Timing of SpeechSynthesis state changes not defined #39
I needed this to write a test for https://bugs.chromium.org/p/chromium/issues/detail?id=679043.
What issues do you have? A transitory pending? Whether speaking goes false at the end of each utterance and true at each new utterance? Whether the engine can remove a second utterance from the queue, and thus start "speaking" it, before it has finished speaking the first?

The spec has the engine initialized to pending=false, speaking=false, and paused=false. Those attributes do not seem to have events directly associated with them; the important events happen to utterances. Testing pending, speaking, and paused during an utterance event is begging for a race error.

When the user calls .speak(utterance), that logically puts the utterance in the queue, so pending=true. The engine removes an utterance from the queue; if it removes the last utterance, then pending=false. The engine starts processing the utterance and building audio to play. When it has a buffer, it starts sending audio to the speakers. There is a window from when an utterance is pulled off the queue to when the audio starts coming out of the speakers; at any time during that window the engine can issue the utterance start event and set speaking=true. Ideally, speaking would mean noise coming out of the speakers, but I do not see that as a strict requirement of the spec. The spec is not clear about the relative order of the start event and the change to the speaking attribute, but I am not sure it needs to be.

At some point the engine finishes processing the utterance and posts the last audio block, but it cannot release the utterance just yet: it must wait for the utterance's last audio block to finish playing. Only then can the engine issue the utterance end event and release the utterance. A reasonable engine will pull the next utterance off the queue before the audio from the previous utterance has finished playing, so pending may go false even though the first utterance has not yet issued its end event.
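The queue behavior described above can be sketched as a toy state machine. This is a hypothetical model for illustration only (class and method names are invented); it is not the real Web Speech API, which is browser-only:

```javascript
// Toy model of the pending/speaking bookkeeping described above.
// Hypothetical; not the actual Web Speech API.
class ToySynth {
  constructor() {
    this.queue = [];       // utterances waiting to be processed
    this.pending = false;  // true while the queue is non-empty
    this.speaking = false; // true from first 'start' until last 'end'
  }
  speak(utterance) {
    this.queue.push(utterance);
    this.pending = true;   // logically set as soon as it is enqueued
  }
  // The engine pulls the next utterance. Note that pending may go false
  // here even though the previous utterance has not fired 'end' yet,
  // because its audio may still be draining to the speakers.
  dequeue() {
    const utt = this.queue.shift();
    if (this.queue.length === 0) this.pending = false;
    return utt;
  }
}

const s = new ToySynth();
s.speak('hello');
console.log(s.pending);  // true: utterance is queued
s.dequeue();
console.log(s.pending);  // false: queue empty, but audio may still be playing
```

The point of the sketch is that pending tracks the queue, not the audio, which is why a test cannot infer "finished speaking" from pending alone.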
The current spec implies that processing may start on the next utterance, but that its start event will not happen until after the previous utterance has issued its end event. (The current description does not allow overlapped utterances, so sequential, ordered events are implied.)

If the user commands pause, the engine should set paused=true and pause the audio system. It must then figure out which sample the audio system paused on so it can determine the current utterance. It may have to issue an end event for the previous utterance, a start event for the current utterance, and a pause event for the current utterance. There may also be mark and boundary events that need to be issued in their proper order. If the .pause() lands after the only utterance has finished speaking, there is no utterance for a pause event, so no pause event is issued.

I don't think the spec covers this (the pause event is "Fired when and if this utterance is paused mid-utterance."), but imagine the speech system has been (1) paused just after it finished utterance 1 but before it has pulled the next utterance off the queue, or (2) paused with no utterances queued or speaking, and then an utterance is added with .speak(). In neither case is there an utterance pause event. The engine should pull the next utterance off the queue (when and if it arrives), issue an utterance start event, and immediately issue an utterance pause event. ("Mid-utterance" should include the start of the utterance, sample 0.)

If the user commands resume, then paused=false and the engine resumes the audio and issues the utterance resume event. There is a subtle question about ordering the transitions of the pending, speaking, and paused attributes with respect to the utterance events, but I don't think a program should ever depend on those timings, because they can change asynchronously.
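The paused-with-empty-queue argument above can be condensed into a tiny sketch. The function name and shape are invented for illustration; the claim being modeled is only that when the engine is already paused as an utterance reaches the front of the queue, 'start' should fire and be immediately followed by 'pause' (sample 0 counting as "mid-utterance"):

```javascript
// Hypothetical sketch of the proposed event ordering when an utterance
// is dequeued. Not part of any real API.
function eventsOnDequeue(enginePaused) {
  const events = ['start'];               // start always fires first
  if (enginePaused) events.push('pause'); // immediate pause at sample 0
  return events;
}

console.log(eventsOnDequeue(false)); // [ 'start' ]
console.log(eventsOnDequeue(true));  // [ 'start', 'pause' ]
```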
The program might be processing an utterance resume event when a subsequent pause has already been executed; the utterance processing must proceed no matter the current state of the speech engine.
It's just that the spec doesn't say exactly when state is manipulated and events are fired. Compare to https://html.spec.whatwg.org/multipage/media.html#dom-media-pause, which has an algorithm that synchronously sets the paused attribute and queues tasks to fire the events. Web Speech could spell out something similar.
In other words, unlike media elements, it looks like Web Speech changes the script-readable state right before the events are fired. I think this is actually better. Nonetheless, the spec doesn't say so in enough detail to write tests asserting as much.
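The two orderings being contrasted can be sketched side by side. Both functions are hypothetical stand-ins: one mimics the media-element style (state flips synchronously, event fires later as a task), the other the observed Web Speech style (state flips just before the event, inside the task):

```javascript
// Hypothetical dispatch sketches; neither is a real browser API.

// Media-element style: state changes synchronously in pause(),
// the event is queued as a task.
function mediaStylePause(state, fire) {
  state.paused = true;                 // visible to script immediately
  setTimeout(() => fire('pause'), 0);  // event delivered later
}

// Observed Web Speech style: nothing changes synchronously; the state
// flips right before the event fires.
function speechStylePause(state, fire) {
  setTimeout(() => {
    state.paused = true;               // flips just before the event
    fire('pause');
  }, 0);
}

const a = { paused: false };
mediaStylePause(a, () => {});
console.log(a.paused);  // true right after the call returns

const b = { paused: false };
speechStylePause(b, () => {});
console.log(b.paused);  // still false until the task runs
```

A test written against the first model would assert `paused` immediately after calling `pause()`; against the second, it can only assert it inside the event handler, which is exactly why the spec needs to pick one.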
I'm still having trouble with your desires. The state transitions of the speech engine (pending, speaking, paused) do not have to be ordered with respect to the state of the utterances (start, marks, paused, resumed, end). Furthermore, the code handling an utterance event should not be looking at the speech engine state. The Web Speech spec is not firing events at the speech engine (except for …).

You can .speak(uttLasting5seconds), field an onstart event for that utterance, wait 1 second, and command .pause() (from outside the event handler). You don't know what the state of the speech engine is after the call, but you should see an onpause event for the utterance. You can then issue a .resume() and expect to see an onresume event followed by an onend event. You cannot rely on this behavior:

```js
utter.onpause = t.step_func(() => {
  utter.onpause = null;
  assert_true(speechSynthesis.paused, 'paused state at pause event');
  speechSynthesis.resume();
  // paused state changes async, right before the resume event
  assert_true(speechSynthesis.paused, 'paused state after resume()');
  utter.onresume = t.step_func_done(() => {
    assert_false(speechSynthesis.paused, 'paused state at resume event');
  });
});
```

It confuses many issues. Why can't .resume() be instantaneous?
I don't have a strong opinion about what the best behavior is; I'm just pointing out that the spec in fact doesn't say what the behavior should be. "paused state changes async, right before the resume event" was just matching what I observed browsers to do.
…d resume(), a=testonly. Automatic update from web-platform-tests: Add tests for SpeechSynthesis pause() and resume() (#12992). For https://bugs.chromium.org/p/chromium/issues/detail?id=679043. Spec bugs: WebAudio/web-speech-api#39, WebAudio/web-speech-api#40. wpt-commits: d85043d2c674aef5a8939c18454c683f82eaab2c; wpt-pr: 12992
https://w3c.github.io/speech-api/#speechsynthesis
The spec doesn't define precisely when the pending, speaking, and paused states change. This makes it impossible to write a detailed test for SpeechSynthesisUtterance pause() and resume() based on the spec.