World Wide Web Consortium releases draft on how text-to-speech reads annotated text

The World Wide Web Consortium has released a draft guide on how text-to-speech technology should read documents that include ruby annotations. The draft focuses on explaining user needs and reading strategies so that speech output sounds natural across different writing systems.

The World Wide Web Consortium Internationalization Working Group has published the first draft of a Group Note titled Text-to-Speech Rendering of Electronic Documents Containing Ruby: User Requirements. The document aims to describe, in plain terms, what people need when text-to-speech systems read electronic documents that include ruby annotations.

Ruby annotations are short runs of text placed above, beside, or below the main (base) text to show its pronunciation or give a brief gloss. They are common in Japanese, where small pronunciation guides known as furigana are placed next to kanji that readers may not recognise. When a screen reader or other text-to-speech software (which converts written content into spoken words) encounters a document with ruby, it has to decide how to handle both the base text and the annotations so that the result makes sense to listeners.

The draft document explains the different roles that ruby plays in various languages and text formats. For example, in some cases ruby text shows how a word is pronounced, while in others it shows alternate or explanatory text. Understanding these roles helps determine what should be spoken aloud and how it should sound.

Instead of telling developers exactly which algorithms to use or how to code these behaviours, the draft focuses on describing user expectations. It discusses questions such as: Should the reader speak the ruby text before or after the main text? Should it be spoken at all? How should pauses or emphasis be handled? By outlining these kinds of reading strategies, the document helps designers think about how a text-to-speech system should behave so that the output is clear and useful to listeners.
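The draft deliberately leaves implementation open, so the following is only a hypothetical illustration of what "reading strategies" can mean in practice, not anything the working group proposes. It assumes HTML ruby markup (`<ruby>`, `<rt>` and `<rp>` elements) with explicit closing tags, uses Python's standard `html.parser` module, and applies one of three strategies whose names (`"base"`, `"annotation"`, `"both"`) are invented for this sketch. It only chooses which text to hand to a speech engine; pauses and emphasis, which the draft also discusses, would need separate handling (for example through SSML), and real HTML may omit `</rt>` tags, which this simple parser does not infer.

```python
from html.parser import HTMLParser

STRATEGIES = ("base", "annotation", "both")  # hypothetical strategy names


class RubyExtractor(HTMLParser):
    """Collect (base, annotation) pairs from simple HTML ruby markup.

    Assumes explicit </rt> and </rp> closing tags. Text outside <ruby>
    is kept as (text, None).
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # list of (base_text, annotation_or_None)
        self._in_ruby = False
        self._in_rt = False
        self._in_rp = False
        self._base = ""        # base text waiting for its <rt>
        self._rt = ""          # annotation text being collected

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby, self._base = True, ""
        elif tag == "rt":
            self._in_rt, self._rt = True, ""
        elif tag == "rp":
            self._in_rp = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
            self.pairs.append((self._base, self._rt))
            self._base = ""
        elif tag == "rp":
            self._in_rp = False
        elif tag == "ruby":
            if self._base:  # trailing base text with no annotation
                self.pairs.append((self._base, None))
            self._in_ruby, self._base = False, ""

    def handle_data(self, data):
        if self._in_rp:
            return  # <rp> fallback brackets are visual only, so never spoken
        if self._in_rt:
            self._rt += data
        elif self._in_ruby:
            self._base += data
        else:
            self.pairs.append((data, None))


def reading_text(markup, strategy):
    """Return the text a speech engine would receive under one strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    parser = RubyExtractor()
    parser.feed(markup)
    parser.close()
    parts = []
    for base, ann in parser.pairs:
        if ann is None or strategy == "base":
            parts.append(base)        # speak the main text only
        elif strategy == "annotation":
            parts.append(ann)         # speak the ruby reading in its place
        else:
            parts.append(f"{base}({ann})")  # main text followed by ruby
    return "".join(parts)


markup = "私は<ruby>漢字<rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby>を読む"
for s in STRATEGIES:
    print(s, reading_text(markup, s))
```

For pronunciation-type furigana, speaking the annotation in place of the base text (the `"annotation"` strategy here) usually matches what a listener expects, while glosses that add meaning may call for speaking both. That choice is exactly the kind of context-dependent question the draft sets out to describe.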

Framing the draft around user requirements keeps the focus on what listeners need in everyday use and in different contexts. Leaving implementation choices open gives developers flexibility in how they build or improve reading tools, while shared expectations encourage consistent behaviour so that users get reliable results across devices and platforms.

As a draft Group Note, the document is not a formal W3C standard; Notes are informative documents rather than Recommendations. It serves instead as a starting point for discussion and feedback. The working group invites comments from developers, accessibility experts, and others interested in text-to-speech technology to help refine future versions. The goal is to ensure that documents with ruby annotations can be read aloud in ways that are natural and helpful for all users, including people with visual impairments and others who rely on speech output.

Go to Top