Timing of SpeechSynthesis state changes not defined #39
I needed this to write a test for https://bugs.chromium.org/p/chromium/issues/detail?id=679043.
What issues do you have? A transitory pending? Whether speaking goes false at the end of each utterance and true at each new utterance? Whether the engine can remove a second utterance from the queue, and thus start "speaking" it, before it has finished speaking the first?

The spec has the engine initialized to pending=false, speaking=false, and paused=false. Those attributes do not seem to have events directly associated with them; the important events happen to utterances. Testing pending, speaking, and paused during an utterance event is begging for a race error.

When the user calls .speak(utterance), that logically puts the utterance in the queue, so pending=true. The engine removes an utterance from the queue; if it removes the last utterance, then pending=false. The engine starts processing the utterance and building audio to play. When it has a buffer, it starts sending audio to the speakers. There is a window from when an utterance is pulled off the queue to when the audio starts coming out of the speakers; at any time during that window the engine can issue the utterance start event and set speaking=true. Ideally, speaking would mean noise coming out of the speakers, but I do not see that as a strict requirement of the spec. The spec is not clear about the relative order of the start event and the change to the speaking attribute, but I am not sure it needs to be.

At some point the engine finishes processing the utterance and posts the last audio block, but it cannot release the utterance just yet: it must wait for the utterance's last audio block to finish playing. Only then can the engine issue the utterance end event and release the utterance. A reasonable engine will pull the next utterance off the queue before the audio from the previous utterance has finished playing, so pending may go false even though the first utterance has not yet issued its end event.
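The queue behavior described above can be sketched as a toy state machine. This is a hypothetical model for illustration only (class and method names are invented); it is not the real Web Speech API, which is browser-only:

```javascript
// Toy model of the pending/speaking bookkeeping described above.
// Hypothetical; not the actual Web Speech API.
class ToySynth {
  constructor() {
    this.queue = [];       // utterances waiting to be processed
    this.pending = false;  // true while the queue is non-empty
    this.speaking = false; // true from first 'start' until last 'end'
  }
  speak(utterance) {
    this.queue.push(utterance);
    this.pending = true;   // logically set as soon as it is enqueued
  }
  // The engine pulls the next utterance. Note that pending may go false
  // here even though the previous utterance has not fired 'end' yet,
  // because its audio may still be draining to the speakers.
  dequeue() {
    const utt = this.queue.shift();
    if (this.queue.length === 0) this.pending = false;
    return utt;
  }
}

const s = new ToySynth();
s.speak('hello');
console.log(s.pending);  // true: utterance is queued
s.dequeue();
console.log(s.pending);  // false: queue empty, but audio may still be playing
```

The point of the sketch is that pending tracks the queue, not the audio, which is why a test cannot infer "finished speaking" from pending alone.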
The current spec implies that processing may start on the next utterance, but that its start event will not happen until after the previous utterance has issued its end event. (The current description does not allow overlapped utterances, so sequential, ordered events are implied.)

If the user commands pause, the engine should set paused=true and pause the audio system. It must then figure out which sample the audio system paused on so it can determine the current utterance. It may have to issue an end event for the previous utterance, a start event for the current utterance, and a pause event for the current utterance. There may also be mark and boundary events that need to be issued in their proper order. If the .pause() lands after the only utterance has finished speaking, there is no utterance for a pause event, so no pause event is issued.

I don't think the spec covers this (the pause event is "Fired when and if this utterance is paused mid-utterance."), but imagine the speech system has been (1) paused just after it finished utterance 1 but before it has pulled the next utterance off the queue, or (2) paused with no utterances queued or speaking, and then an utterance is added with .speak(). In neither case is there an utterance pause event. The engine should pull the next utterance off the queue (when and if it arrives), issue an utterance start event, and immediately issue an utterance pause event. ("Mid-utterance" should include the start of the utterance, sample 0.)

If the user commands resume, then paused=false and the engine resumes the audio and issues the utterance resume event. There is a subtle question about ordering the transitions of the pending, speaking, and paused attributes with respect to the utterance events, but I don't think a program should ever depend on those timings, because they can change asynchronously.
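The paused-with-empty-queue argument above can be condensed into a tiny sketch. The function name and shape are invented for illustration; the claim being modeled is only that when the engine is already paused as an utterance reaches the front of the queue, 'start' should fire and be immediately followed by 'pause' (sample 0 counting as "mid-utterance"):

```javascript
// Hypothetical sketch of the proposed event ordering when an utterance
// is dequeued. Not part of any real API.
function eventsOnDequeue(enginePaused) {
  const events = ['start'];               // start always fires first
  if (enginePaused) events.push('pause'); // immediate pause at sample 0
  return events;
}

console.log(eventsOnDequeue(false)); // [ 'start' ]
console.log(eventsOnDequeue(true));  // [ 'start', 'pause' ]
```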
The program might be processing an utterance resume event when a subsequent pause has already been executed; the utterance processing must proceed no matter the current state of the speech engine.
It's just that the spec doesn't say exactly when state is manipulated and events are fired. Compare to https://html.spec.whatwg.org/multipage/media.html#dom-media-pause, which has an algorithm that synchronously sets the paused attribute and queues tasks to fire the events. Web Speech could spell out something similar.
In other words, unlike media elements, it looks like Web Speech changes the script-readable state right before the events are fired. I think this is actually better. Nonetheless, the spec doesn't say so in enough detail to write tests asserting as much.
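The two orderings being contrasted can be sketched side by side. Both functions are hypothetical stand-ins: one mimics the media-element style (state flips synchronously, event fires later as a task), the other the observed Web Speech style (state flips just before the event, inside the task):

```javascript
// Hypothetical dispatch sketches; neither is a real browser API.

// Media-element style: state changes synchronously in pause(),
// the event is queued as a task.
function mediaStylePause(state, fire) {
  state.paused = true;                 // visible to script immediately
  setTimeout(() => fire('pause'), 0);  // event delivered later
}

// Observed Web Speech style: nothing changes synchronously; the state
// flips right before the event fires.
function speechStylePause(state, fire) {
  setTimeout(() => {
    state.paused = true;               // flips just before the event
    fire('pause');
  }, 0);
}

const a = { paused: false };
mediaStylePause(a, () => {});
console.log(a.paused);  // true right after the call returns

const b = { paused: false };
speechStylePause(b, () => {});
console.log(b.paused);  // still false until the task runs
```

A test written against the first model would assert `paused` immediately after calling `pause()`; against the second, it can only assert it inside the event handler, which is exactly why the spec needs to pick one.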
I'm still having trouble with your desires. The state transitions of the speech engine (pending, speaking, paused) do not have to be ordered with respect to the state of the utterances (start, marks, paused, resumed, end). Furthermore, the code handling an utterance event should not be looking at the speech engine state. The Web Speech spec is not firing events at the speech engine (except for …).

You can .speak(uttLasting5seconds), field an onstart event for that utterance, wait 1 second, and command .pause() (from outside the event handler). You don't know what the state of the speech engine is after the call, but you should see an onpause event for the utterance. You can then issue a .resume() and expect to see an onresume event followed by an onend event. You cannot rely on this behavior:

```js
utter.onpause = t.step_func(() => {
  utter.onpause = null;
  assert_true(speechSynthesis.paused, 'paused state at pause event');
  speechSynthesis.resume();
  // paused state changes async, right before the resume event
  assert_true(speechSynthesis.paused, 'paused state after resume()');
  utter.onresume = t.step_func_done(() => {
    assert_false(speechSynthesis.paused, 'paused state at resume event');
  });
});
```

It confuses many issues. Why can't .resume() be instantaneous?
I don't have a strong opinion about what the best behavior is; I'm just pointing out that the spec in fact doesn't say what the behavior should be. "paused state changes async, right before the resume event" was just matching what I observed browsers to do.
…d resume(), a=testonly. Automatic update from web-platform-tests: Add tests for SpeechSynthesis pause() and resume() (#12992). For https://bugs.chromium.org/p/chromium/issues/detail?id=679043. Spec bugs: WebAudio/web-speech-api#39, WebAudio/web-speech-api#40. wpt-commits: d85043d2c674aef5a8939c18454c683f82eaab2c; wpt-pr: 12992
https://w3c.github.io/speech-api/#speechsynthesis
The spec doesn't define precisely when the pending, speaking, and paused states change. This makes it impossible to write a detailed test for SpeechSynthesisUtterance pause() and resume() based on the spec.