Usage requirement of mo audio element #1986
I'm not sure what you're asking by this. It's an instruction to the reading system to render it via TTS when the audio element isn't present and the element doesn't refer to embedded media. If you check what is referenced by the fragment, what would make this complex to test? The test to verify would be to play back the SMIL file and see if the reading system renders the text content of the referenced element using its own TTS playback. If not, it fails. We don't define TTS enhancement technologies like SSML, PLS, and CSS3 Speech, but there's still a requirement for Reading Systems to render the content via TTS in media overlays. That's the section the link on TTS goes to. I agree it would be better to say it's optional except when required by the other element, though. It's always weird to start off saying it's required and then not make it required. But it would be better to stick with the XPath axes terminology ("sibling" not "adjacent"). Maybe we can make this a list to make it easier to read:
|
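The rendering rule described in the comment above (TTS when there is no audio element and the referenced element is not embedded media) can be sketched as a small decision function. This is only an illustration under stated assumptions: `playback_mode`, its parameters, and the returned labels are hypothetical names, not anything defined by the specification or by an existing reading system.

```python
# Hypothetical sketch of the decision a reading system might make for one par.
# The function name, parameters, and return labels are invented for illustration.

EMBEDDED_MEDIA = {"audio", "video"}


def playback_mode(has_audio_sibling, target_local_name):
    """Return a label for how one par could be rendered.

    has_audio_sibling: True if the par contains an audio element next to text.
    target_local_name: local name of the element the text src fragment points to.
    """
    if has_audio_sibling:
        # Simplification: the referenced clip plays. If the target is itself
        # embedded media, a reading system may play both.
        return "audio-clip"
    if target_local_name in EMBEDDED_MEDIA:
        # No audio sibling, but the target carries its own media.
        return "embedded-media"
    # No audio sibling and no embedded media: fall back to text-to-speech.
    return "tts"


for case in [(True, "p"), (False, "video"), (False, "p")]:
    print(case, "->", playback_mode(*case))
```

A verification along the lines suggested above would then exercise the third case at run time and observe whether the text is actually spoken.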
Your proposed text is much better, for sure. What bothers me, I guess, is that all the other statements and requirements in that section are on the structure of the MO file (element must be present, attributes are optional or not, etc.). I.e., for a given MO file, its validity can be checked easily, e.g., via an XML schema of some sort (and maybe by checking the media type of a referenced file). This statement is an outlier, because no schema or media type check can really establish whether the MO is valid. Maybe we cannot help it, but it is still worth rewriting the sentence as you propose to make things clearer. |
It's not a simple schema check, but the outlier cases where it might have problems probably aren't too realistic for EPUB, either. Or put another way, what else widely occurs in the body of an EPUB but audio, video, images and text? That immediately excludes a clear set of HTML elements (header stuff, script, style, etc.). For the rest, you can likely omit the audio element and some playback behaviour will still occur. It may not be what was intended, but you can't solve intent by schema alone. And if you want to get fancier and expect other media types might occur (e.g., so you don't blanket accept whatever is in an I personally think the implementation is more complicated than the verification, but I don't believe this feature is expected to be widely used, at least not the TTS part. I expect it's there primarily for DAISY producers, and I don't see this requirement tripping them up. |
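To make the limits of a structural check concrete, here is a minimal sketch of what a checker can establish from the MO file alone: which par elements have no audio child, and what their text elements reference. The sample document, file names, and the helper `pars_without_audio` are invented for illustration; this is not how EPUBCheck works.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

# Hypothetical Media Overlay document, invented for this example.
SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="p1">
      <text src="chapter1.xhtml#s1"/>
      <audio src="audio/ch1.mp3" clipBegin="0s" clipEnd="3s"/>
    </par>
    <par id="p2">
      <text src="chapter1.xhtml#s2"/>
    </par>
  </body>
</smil>"""


def pars_without_audio(smil_source):
    """Return (par id, text src) for every par that has text but no audio child."""
    root = ET.fromstring(smil_source)
    result = []
    for par in root.iter(f"{{{SMIL_NS}}}par"):
        text = par.find(f"{{{SMIL_NS}}}text")
        audio = par.find(f"{{{SMIL_NS}}}audio")
        if text is not None and audio is None:
            result.append((par.get("id"), text.get("src")))
    return result


print(pars_without_audio(SAMPLE_SMIL))
```

The check can list the candidates, but it says nothing about whether the author intended TTS playback for them, which is the point made in the comments above.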
I am in favor of making |
I think that makes sense, but should we be more explicit in the text element definition about what it can reference so that it's not legal to reference non-content elements? If text must reference palpable content, for example, then omitting audio really shouldn't matter. |
I think it would be cleaner for the audio element to say something like:
@mattgarrish I guess your proposed reference to palpable content would be added to the text element definition. I would agree with that although, again, it is difficult for a checker to verify... |
B.t.w., making the audio element optional is in line with the definition of the par element, which refers to the element as optional. |
What bothers me more now is that I was forgetting that SVG support is optional, so we still have to account for it. I don't believe it has any concept of palpable content that we could similarly cite. (Minor nitpick, but the intro for the text element says: "The text element references an element in the EPUB Content Document." Since an MO doc can reference multiple epub content documents, this should really be "in an EPUB Content Document".) |
For SVG: aren't the only 'palpable' content elements
That should work... |
For text content, yes, but the That means, at least as I understand it, you can associate audio with just about any component of an svg image outside of script and style. I don't know if those are the only two elements that are non-content or if there are others -- I don't create SVGs so can't evaluate the purpose of all the elements just by looking at the list of names. |
Hm. To be honest, I am not even sure what that would mean: I do not see how feedback would work on a drawing in the same way as when the RS goes, say, sentence by sentence, signalling what the audio is currently referring to. My suggestion is that we are getting into new specification territory, which we should not do. |
Ya, I think that's the gist of this note from the authoring spec:
We might want to look at better explaining this, like being more explicit that there isn't even guaranteed support for playback of SVGs at all.
I'm assuming that you could highlight graphical objects. For example, the slices of a pie chart could be described by audio. But I don't recall ever seeing SVG with overlays, and don't know at what level it stops making sense to reference specific graphical elements. |
It may even make sense to reconsider why we're supporting SVG when we expect there to be interoperability problems due to a lack of detailed playback behaviour. |
I do have an example somewhere from many years ago that shows MO sync with SVG. As you can apply CSS to SVG, it works in the same way as MO + HTML, assuming the author defines styles that make sense for SVG. But as there wasn't really any interest in digging into this direction, I think we left it in the spec as "possible but mostly uncharted territory" |
I think supporting SVG by explicitly referring to textual elements (as in #1986 (comment)) is a low-hanging fruit spec-wise (no idea if anybody ever implemented it, though). It may also make sense if SVG is used for FXL. Going beyond that is not obvious. It is much more complicated in practice than text (it would be possible to use one SVG element that describes several slices in a pie chart with a
In both cases we put a note into the spec making that fact clear and that the SVG related work is left for future work. |
Actually... playing with some SVG, and if we want to make things explicit, then the |
FWIW, I created an MO test with SVG (see w3c/epub-tests#118). I hope the test is o.k. from a spec point of view; Thorium does the reading but does not do the highlighting on the SVG file, so that is not entirely o.k. and the web version of the Colibrio reader (which otherwise seems to do a good job for MO) seems not to implement spine level SVG in the first place... |
Actually... maybe we are overthinking the SVG issue. The highlight mechanism is CSS based. Whatever can be styled can be used for highlight. In theory, there is no need for even mentioning the concept of palpable elements or SVG's text elements; things are fine as they are. Well... at least for the Content document. I would think that the A11y spec, or one of the guidelines, should really say that palpable elements should be used as a highlight focus in MO (or something like that). But the content spec seems to be fine to me. |
Is that definable? It can't be whatever accepts a class attribute, since everything accepts a class attribute whether it can be styled or not. |
In HTML we cannot prevent authors from shooting themselves in the foot, i.e., from using a class and its styling on something that cannot be styled. That is, actually, an HTML feature... In our case, the class is added to whatever element is pointed to in the SMIL file: the same thing applies. If that element cannot be styled then... well, too bad for the authors' feet. I do not think we should specify what can be referred to from the SMIL file. If the SVG author (to come back to what triggered this discussion) decides to refer to some complex drawing when the SMIL file makes the RS read some specific text: this should be o.k. And it actually is, per spec. Obviously, there are A11y issues that might have to be added to some implementation guide; you are much more of an expert than I am on that. But for the content document, I believe our best action is... to do nothing :-) |
Well MO rendering just applies a class attribute, the "style" part is up to the author. Like my earlier comment, I think it's up to the author whether it makes sense or not. But the spec mechanism works in either case of HTML or SVG. |
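The point that the mechanism is only "apply a class to the referenced element" can be sketched as follows. The class name `mo-active`, the sample markup, and the helper `mark_active` are assumptions made for illustration; in a real publication the class name is whatever the author declares, and a reading system would work on the live DOM rather than on a parsed string.

```python
import xml.etree.ElementTree as ET

ACTIVE_CLASS = "mo-active"  # hypothetical author-defined class name

# Hypothetical content documents, invented for this example.
SAMPLE_XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p id="s1">Hello</p></body></html>'
)
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="slice1" class="slice" d="M0 0L10 0L10 10Z"/></svg>'
)


def mark_active(doc_source, fragment_id, active_class=ACTIVE_CLASS):
    """Add active_class to the element whose id matches the fragment.

    Returns the modified element, or None if no element has that id.
    The same code path is used for XHTML and SVG; whether the resulting
    style makes sense is left to the author's CSS.
    """
    root = ET.fromstring(doc_source)
    for el in root.iter():
        if el.get("id") == fragment_id:
            classes = el.get("class", "").split()
            if active_class not in classes:
                classes.append(active_class)
            el.set("class", " ".join(classes))
            return el
    return None


print(mark_active(SAMPLE_XHTML, "s1").get("class"))
print(mark_active(SAMPLE_SVG, "slice1").get("class"))
```

Nothing in this sketch distinguishes a paragraph from a pie-chart slice, which is consistent with the view above that the choice of target is the author's responsibility.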
I wonder how we should close this issue. My feeling is: (1) we can modify the audio element's definition with something like:
This is what I proposed in #1986 (comment) (not sure about the SHOULD vs. should, though). (2) The definition of the text element should say "in an EPUB Content Document" instead of "in the Content Document" (see #1986 (comment)) (3) I have the impression that getting into some further specification about palpable elements and its equivalents for SVG would become way more complicated than what it is worth; spec-wise, I think the current text is o.k. Would we want, however, to add a note after the text element definition along the lines of:
I am happy to write up a PR. |
Can we just say it's an OPTIONAL child of Also, is it clear enough that this audio should be an audio representation of the corresponding text element? Should that be in the element definition? I would also move the reading system TTS behavior recommendation to the RS spec, under rendering text. Everything else you suggested sounds great to me. |
…and that is also true for the TTS. Specification-wise you are right but, editorially and for readability, I think it is better if these two pieces of information are made explicit (with a link to the embedded a/v section, b.t.w.). It may be difficult to find those points otherwise.
Well... that may have been the intention of MO, but I am not sure this restriction is actually part of the spec. Isn't it perfectly o.k. to have
which provides a musical background to a video element? My reading is that this is perfectly conformant to the spec (and it should be).
I am not sure what you mean. The content document has a section on TTS (§ 7.3.2.5) and so does the RS document (§ 8.3.3). Both sections are very short, obviously, but they are also different. What is the change you propose? |
Why would we use the definition for the audio element to talk about the text element? It makes the audio element definition less clear.
Ok, I propose two things:
" |
You mean when there is no audio sibling, right? Or do you want to remove the whole of the embedded media section and disallow referencing audio/video from a text element? I am not sure if I like the idea...
Good point about implementation, of course. I already have a not-yet-fully-finished test on embedded video usage (a PR should come to your review soon) and the implementations are, so far, shall we say poor. But not to have them at all may be premature at this point. |
@marisademeglio I may have misunderstood what you said; I went ahead and created PR #2001. We can finalize the text more easily over there, I think. |
I am inclined to disallow it completely. It feels like the purpose of allowing this is to cover all our edge cases, not to provide useful content creation mechanisms or user experiences (I do not know of a single example, even just a test, that uses this mechanism to provide good accessibility). It is also hell on implementations. But I digress -- I don't think we can yet take the liberty of making such a big change. Though as soon as we can, I'm ready! Thanks for making the spec edits! I will review shortly. |
Created PR #2001. This issue will be closed if and when that PR is merged. |
I am a bit bothered by the usage requirement of the audio element in MO. The text says:
As read, this sounds like an untestable requirement to me, unless it is tested at run time. After all, how does a checker know that an HTML file, linked through a text element, is used with text-to-speech? We do not normatively define anything for TTS, after all. Maybe EPUBCheck found a trick that I am missing (@rdeltour?). Otherwise, it may be cleaner to replace that requirement with something like:
I think that content-wise we are saying the same thing, but the checker is not under the obligation of checking this.