Timing of SpeechSynthesis state changes not defined #39

Open
foolip opened this issue Sep 13, 2018 · 6 comments

Comments

@foolip
Collaborator

foolip commented Sep 13, 2018

https://w3c.github.io/speech-api/#speechsynthesis

The spec doesn't define precisely when the pending, speaking and paused states change. This makes it impossible to write a detailed test for SpeechSynthesis pause() and resume() based on the spec.

@foolip
Collaborator Author

foolip commented Sep 13, 2018

I needed this to write a test for https://bugs.chromium.org/p/chromium/issues/detail?id=679043.

@GLRoylance

What issues do you have? A transitory pending? Whether speaking goes false at the end of each utterance and goes true at each new utterance? Whether the engine can remove a second utterance from the queue and thus start "speaking" it before it has finished speaking the first?

The spec has the engine initialized to pending=false, speaking=false, and paused=false. Those attributes do not seem to have events directly associated with them. The important events happen to utterances. Testing pending, speaking, and paused during an utterance event is begging for a race error.

When the user does .speak(utterance), logically that puts the utterance in the queue, so pending=true.

The engine removes an utterance from the queue. If the engine removes the last utterance from the queue, then pending=false.

The engine starts processing the utterance and building audio to play. When it has a buffer, it starts sending audio to the speakers.

There's a window from when an utterance is pulled off the queue to when the audio starts coming out of the speakers. Supposedly, anytime during that window the engine can issue the start utterance event and make speaking=true. Ideally, speaking=noise coming out of the speakers, but I do not see that as a strict requirement of the spec. The spec isn't clear about the order of the start utterance event and the change to the speaking attribute, but I'm not sure it needs to be.

At some point, the engine finishes processing the utterance and posts the last audio block, but the engine cannot return the utterance just yet. The engine must wait for the utterance's last audio block to finish playing. Then the engine can issue the utterance end event and release the utterance.

A reasonable engine will pull the next utterance off the queue before the audio from the previous utterance has finished playing. Pending may go false even though the first utterance has not issued an end event. The current spec implies that processing may start on the next utterance, but the start utterance event will not happen until after the previous utterance has issued its end event. (The current description does not allow overlapped utterances / box model, so sequential, ordered, events are implied.)
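
The queue behaviour described above can be sketched as a toy model. This is only an illustration of the reading given in this comment, not the spec's algorithm: the class `EngineModel` and its methods `prefetch()`, `startNext()` and `finishCurrent()` are hypothetical names, and setting speaking back to false between utterances is an assumption, since the spec leaves that open.

```javascript
// Toy model of the engine states described in this comment (hypothetical names).
class EngineModel {
  constructor() {
    this.queue = [];       // utterances added by speak(), not yet pulled by the engine
    this.processing = [];  // utterances pulled early, audio being prepared
    this.current = null;   // utterance whose audio is playing
    this.pending = false;
    this.speaking = false;
    this.events = [];      // log of utterance events as "type:text"
  }

  // speak() only enqueues, so pending becomes true.
  speak(text) {
    this.queue.push(text);
    this.pending = true;
  }

  // The engine pulls the next utterance off the queue, possibly while the
  // previous one is still playing. pending goes false once the queue is empty.
  prefetch() {
    if (this.queue.length === 0) return;
    this.processing.push(this.queue.shift());
    this.pending = this.queue.length > 0;
  }

  // The start event fires only when no other utterance is playing,
  // which keeps the utterance events sequential.
  startNext() {
    if (this.current !== null || this.processing.length === 0) return;
    this.current = this.processing.shift();
    this.speaking = true;
    this.events.push(`start:${this.current}`);
  }

  // Called when the last audio block of the current utterance has played.
  finishCurrent() {
    if (this.current === null) return;
    this.events.push(`end:${this.current}`);
    this.current = null;
    this.speaking = false; // assumption: speaking drops between utterances
  }
}
```

With two queued utterances, calling `prefetch()` a second time while the first is still playing makes pending false before the first end event is logged, which is the case described in the paragraph above.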

If the user commands pause, then the engine should set paused=true and should pause the audio system. It must then figure out which sample the audio system paused on so it can determine the current utterance. It may have to issue an end event for the previous utterance, a start event for the current utterance, and a pause event for the current utterance. There may also be mark and boundary events that need to be issued in their proper order.

If the .pause() hits after the only utterance has finished speaking, then there is no utterance for a pause event, so no pause event is issued.

I don't think the spec covered this (the pause event is "Fired when and if this utterance is paused mid-utterance."), but imagine the speech system has been (1) paused when it just finished utterance 1 but before it has pulled the next utterance off the queue or (2) paused with no utterances in the queue or speaking, and then an utterance is added with .speak(). That means there's no utterance pause event. The engine should pull the next utterance off the queue (when and if it arrives), issue an utterance start event, and immediately issue an utterance pause event. ("Mid-utterance" should include at the start of the utterance (sample 0).)

If the user commands resume, then paused=false and the engine resumes the audio and issues the utterance resume event.
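
A minimal sketch of the pause and resume rules proposed above, under the assumption that "mid-utterance" includes sample 0. The class `PauseModel` and its methods are hypothetical names used only for illustration; nothing here is taken from the spec or from any browser.

```javascript
// Toy model of the pause/resume rules proposed in this comment (hypothetical names).
class PauseModel {
  constructor() {
    this.paused = false;
    this.current = null; // utterance currently being spoken, if any
    this.events = [];    // log of utterance events as "type:text"
  }

  // The engine starts an utterance. If the engine is already paused,
  // a pause event follows the start event immediately.
  start(text) {
    this.current = text;
    this.events.push(`start:${text}`);
    if (this.paused) this.events.push(`pause:${text}`);
  }

  end() {
    if (this.current === null) return;
    this.events.push(`end:${this.current}`);
    this.current = null;
  }

  // pause() always sets the flag, but only fires an event
  // when there is an utterance to fire it at.
  pause() {
    this.paused = true;
    if (this.current !== null) this.events.push(`pause:${this.current}`);
  }

  // resume() clears the flag and fires resume at the current utterance, if any.
  resume() {
    if (!this.paused) return;
    this.paused = false;
    if (this.current !== null) this.events.push(`resume:${this.current}`);
  }
}
```

Pausing an idle model logs nothing; a later `start('hello')` then logs `start:hello` followed by `pause:hello`, matching case (2) above.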

There's a subtle question about ordering the transitions of the pending, speaking, and paused attributes with respect to the utterance events, but I don't think a program should ever depend on those timings because they can change asynchronously. The program might be processing an utterance resume event when a subsequent pause has been executed; the utterance processing must proceed no matter the current state of the speech engine.

@foolip
Collaborator Author

foolip commented Sep 14, 2018

It's just that the spec doesn't say exactly when state is manipulated and events are fired. Compare to https://html.spec.whatwg.org/multipage/media.html#dom-media-pause, which has an algorithm that synchronously sets the paused attribute and says, effectively, "queue a task to fire a simple event named pause".

Web Speech might say:

  1. Return and run the following steps in parallel:
    1. Wait until [some condition is true]
    2. Queue a task to run the following steps
      1. Set [some corresponding state, which is what the paused attribute uses]
      2. Fire a simple event named "pause" at [some target]
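
To make the ordering in these steps concrete, here is a rough sketch that replaces the event loop with an explicit array of tasks so the order is visible. Everything in it is an assumption made for illustration: `taskQueue`, `queueTask`, `runTasks` and `SynthSketch` are hypothetical names, and the callback returned by `pause()` stands in for "[some condition is true]".

```javascript
// Explicit task queue standing in for the event loop (an assumption of this sketch).
const taskQueue = [];
function queueTask(fn) { taskQueue.push(fn); }
function runTasks() { while (taskQueue.length > 0) taskQueue.shift()(); }

class SynthSketch {
  constructor() {
    this.paused = false;
    this.log = []; // names of events "fired" so far
  }

  // Returns immediately without touching script-readable state.
  // The returned callback is what the simulated engine calls once
  // its condition is met; it only queues a task.
  pause() {
    return () => {
      queueTask(() => {
        this.paused = true;     // step 1: set the state behind the paused attribute
        this.log.push('pause'); // step 2: fire the "pause" event
      });
    };
  }
}
```

In this sketch `paused` stays false after `pause()` returns and after the engine signals its condition, and only flips when `runTasks()` runs the queued task, right before the event is logged.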

@foolip
Collaborator Author

foolip commented Sep 14, 2018

In other words, unlike media elements, it looks like Web Speech changes the script-readable state right before events are fired. I think this is actually better. Nonetheless, the spec doesn't say so in enough detail to write tests asserting as much.

@GLRoylance

GLRoylance commented Sep 14, 2018

I'm still having trouble with your desires. The state transitions of the speech engine (pending, speaking, paused) do not have to be ordered with respect to the state of the utterances (start, marks, paused, resumed, end). Furthermore, the code handling an utterance event should not be looking at the speech engine state.

The Web Speech spec is not firing events at the speech engine (except for onvoiceschanged which is async to everything else). The events are fired at utterances.

You can .speak(uttLasting5seconds), field an onstart event for that utterance, wait 1 second, and command .pause() (from outside the event handler). You don't know what the state of the speech engine is after the call, but you should see an onpause event for the utterance. You can then issue a .resume() and expect to see an onresume event followed by an onend event.
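
A rough sketch of that sequence, checking only the order of utterance events and never reading the engine state. The helper `observePauseResume` and its parameters `onDone`, `delayMs` and `schedule` are hypothetical; `synth` is assumed to behave like `window.speechSynthesis` and `utter` like a `SpeechSynthesisUtterance`.

```javascript
// Records the order of utterance events for one speak/pause/resume cycle.
// Only utterance events are observed; pending, speaking and paused are never read.
function observePauseResume(synth, utter, onDone, delayMs = 1000, schedule = setTimeout) {
  const seen = [];
  utter.onstart = () => {
    seen.push('start');
    // Command pause() later, from outside the start event handler.
    schedule(() => synth.pause(), delayMs);
  };
  utter.onpause = () => {
    seen.push('pause');
    synth.resume();
  };
  utter.onresume = () => {
    seen.push('resume');
  };
  utter.onend = () => {
    seen.push('end');
    onDone(seen);
  };
  synth.speak(utter);
}

// In a browser this might be used as:
// observePauseResume(speechSynthesis,
//                    new SpeechSynthesisUtterance('some text lasting about 5 seconds'),
//                    (seen) => console.log(seen.join(' ')));
```

Whether a given browser actually delivers the events in this order is exactly what the spec leaves unstated, so the helper reports what it saw rather than asserting it.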

You cannot rely on this behavior:

    utter.onpause = t.step_func(() => {
      utter.onpause = null;
      assert_true(speechSynthesis.paused, 'paused state at pause event');
      speechSynthesis.resume();
      // paused state changes async, right before the resume event
      assert_true(speechSynthesis.paused, 'paused state after resume()');
      utter.onresume = t.step_func_done(() => {
        assert_false(speechSynthesis.paused, 'paused state at resume event');
      });
    });

It confuses many issues. Why can't .resume() be instantaneous?

@foolip
Collaborator Author

foolip commented Sep 15, 2018

I don't have a strong opinion about what the best behavior is, I'm just pointing out that the spec in fact doesn't say what the behavior should be. "paused state changes async, right before the resume event" was just matching what I observed browsers to do.

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Sep 18, 2018
…d resume(), a=testonly

Automatic update from web-platform-tests
Add tests for SpeechSynthesis pause() and resume() (#12992)

For https://bugs.chromium.org/p/chromium/issues/detail?id=679043

Spec bugs:
WebAudio/web-speech-api#39
WebAudio/web-speech-api#40
--

wpt-commits: d85043d2c674aef5a8939c18454c683f82eaab2c
wpt-pr: 12992