Add speech recognition context to the Web Speech API #145

yrw-google · 2025-02-24T20:36:11Z

Introduce a new speech recognition context feature for contextual biasing

yrw-google · 2025-02-27T19:32:43Z

Hi @padenot, can you review this one too, and also take another look at the explainer when you get a chance? Feel free to assign another reviewer if needed.

padenot · 2025-02-28T13:06:16Z

index.bs

+[Exposed=Window]
+interface SpeechRecognitionPhraseList {
+    readonly attribute unsigned long length;
+    getter SpeechRecognitionPhrase item(unsigned long index);


This is invalid WebIDL.

It would be either:

SpeechRecognition item(unsigned long index);

or

getter SpeechRecognition(unsigned long index)

What is the intent here?

yrw-google · Mar 4, 2025

I've changed this to SpeechRecognitionPhrase item(unsigned long index), but using getter is how SpeechRecognitionResultList does it, as do some list objects I've seen in other specs, e.g. https://html.spec.whatwg.org/multipage/common-dom-interfaces.html#the-domstringlist-interface. I thought it was standard to always define a getter for a list, but I don't see that a getter is required for our use case, so I can either keep it or remove it.
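For readers following the getter question, here is a rough TypeScript sketch of the difference being discussed. It assumes that the WebIDL `getter` keyword on an operation taking an `unsigned long` is what makes `list[0]` work in addition to `list.item(0)`. The names `Phrase`, `PhraseList`, `withIndexedGetter`, and the `phrase`/`boost` fields are hypothetical stand-ins, not the spec's definitions, and returning `undefined` for an out-of-range index is a choice made for this sketch only.

```typescript
// Hypothetical stand-in for SpeechRecognitionPhrase (field names are assumptions).
interface Phrase {
  readonly phrase: string;
  readonly boost: number;
}

// Hypothetical stand-in for SpeechRecognitionPhraseList.
class PhraseList {
  private readonly phrases: Phrase[] = [];

  get length(): number {
    return this.phrases.length;
  }

  // What a plain `item(index)` operation provides: method access only.
  item(index: number): Phrase | undefined {
    return this.phrases[index];
  }

  addItem(item: Phrase): void {
    this.phrases.push(item);
  }
}

type IndexedPhraseList = PhraseList & {
  readonly [index: number]: Phrase | undefined;
};

// Approximates what `getter` adds: property access with an array index
// (e.g. `list[0]`) is routed to `list.item(0)`.
function withIndexedGetter(list: PhraseList): IndexedPhraseList {
  return new Proxy(list, {
    get(target, prop, receiver) {
      if (typeof prop === "string" && /^(0|[1-9]\d*)$/.test(prop)) {
        return target.item(Number(prop));
      }
      return Reflect.get(target, prop, receiver);
    },
  }) as IndexedPhraseList;
}
```

With the wrapper, `list[0]` and `list.item(0)` return the same entry; without it, only `item(0)` is available, which is the trade-off the thread is weighing.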

yrw-google · 2025-03-05T00:14:05Z

Hi @padenot, I've updated the specs as well as the explainer according to your comments. Please take a look again when you get a chance. Thanks!

padenot · 2025-03-05T15:00:25Z

Can we please focus on either the explainer or the spec patch? If we have a spec patch, an explainer shouldn't be necessary. If we aren't comfortable writing the spec patch right now because we want to iterate, it doesn't seem useful to update the spec patch.

Let me know which one I should look at first please?

yrw-google · 2025-03-05T21:12:25Z

Hi @padenot, you can focus on the spec patch right now. I'm keeping the spec patch and the explainer in sync and the spec patch has many more details than the explainer, so if we can reach consensus on the spec, we can also reach consensus on the explainer easily.

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

padenot · 2025-03-10T15:42:54Z

> I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

It's not necessary. Details can and should be in informative notes or MDN.

yrw-google · 2025-03-10T22:16:22Z

> > I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.
>
> It's not necessary. Details can and should be in informative notes or MDN.

I've closed the PR for explainer. Can you now review the spec changes?

padenot

Getting there! lmk if I have been unclear in certain parts of the review.

yrw-google · 2025-03-13T23:49:10Z

Hi @padenot, I decided to remove SpeechRecognitionContext and updateContext() since they seem redundant at this point and can cause confusion, so instead we will add SpeechRecognitionPhraseList phrases to SpeechRecognition directly and always update this attribute. I've also addressed your other comments so please take a look again.

padenot

Looking a lot better. Some questions still, but it feels like we're almost there.

yrw-google · 2025-03-27T21:55:27Z

Hi @padenot, a gentle ping to review this.

padenot

Almost there!

padenot · 2025-03-31T12:34:33Z

index.bs

+    The setter steps are:
+    1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
+        and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
+        throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/not-allowed}} error code and abort these steps.


not-supported for consistency? Or we want to clearly distinguish the two? What do we lose by using "normal" errors here?

yrw-google · Mar 31, 2025

My thinking was that when a user sets the mode here but gets a phrases-not-supported error, it seems odd that the error isn't directly related to mode. But we can add an error message to explain this better, so I'll switch to phrases-not-supported for consistency.
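The setter check under discussion can be sketched roughly as follows. This is an illustration only: `RecognizerSketch`, `PhrasesNotSupportedError`, and the `supportsBiasing` callback are hypothetical names, the mode values are placeholders, and modelling the failure as a thrown exception carrying a `phrases-not-supported` code is an assumption made for the sketch rather than what the spec text necessarily says.

```typescript
// Hypothetical error type carrying the error code agreed on in this thread.
class PhrasesNotSupportedError extends Error {
  readonly code = "phrases-not-supported";

  constructor(message: string) {
    super(message);
    this.name = "PhrasesNotSupportedError";
  }
}

// Hypothetical stand-in for the relevant part of SpeechRecognition.
class RecognizerSketch {
  // Stand-in for the phrase list; only its length matters here.
  phrases: string[] = [];

  private currentMode: string;
  private readonly supportsBiasing: (mode: string) => boolean;

  constructor(initialMode: string, supportsBiasing: (mode: string) => boolean) {
    this.currentMode = initialMode;
    this.supportsBiasing = supportsBiasing;
  }

  get mode(): string {
    return this.currentMode;
  }

  set mode(value: string) {
    // Reject the new mode if phrases are present and the system cannot
    // apply contextual biasing in that mode; the old mode is kept.
    if (this.phrases.length > 0 && !this.supportsBiasing(value)) {
      throw new PhrasesNotSupportedError(
        `Mode "${value}" does not support contextual biasing, but phrases is non-empty.`
      );
    }
    this.currentMode = value;
  }
}
```

The message string is where the explanation mentioned above could live, so that a caller who only touched `mode` can see why a phrases-related error was raised.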

padenot · 2025-03-31T12:40:35Z

index.bs

+  <dt><dfn method for=SpeechRecognitionPhraseList>addItem(|item|)</dfn> method</dt>
+  <dd>
+    This method adds the {{SpeechRecognitionPhrase}} object |item| to the list.
+    When invoked, add |item| to the end of {{SpeechRecognitionPhraseList/[[phrases]]}}.


What happens if we add an element twice with different scores? Do we want to use a set instead from the infra link above?

yrw-google · Mar 31, 2025

If we want to avoid the same phrase with different boosts, we will need to use a map like map<string, float> for the phrase-boost pair, rather than a set. I feel like this adds more limitations to the design, which is not desirable. For example, if we later want to add a pronunciation attribute to each SpeechRecognitionPhrase, a list like the current one is easier to extend than a map.

For now, if a user adds the same phrase twice with different boosts, I think the system can use the boost from the second occurrence; this case should be rare.

padenot · Apr 1, 2025

Right, my point is that we can do whatever, but it needs to be specified.

yrw-google · Apr 2, 2025

I've added descriptions for this case.
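A minimal sketch of the "later boost wins" behaviour proposed in this thread, assuming the list stays an ordered list and duplicates are resolved only when the system reads it. The `Phrase` shape and the `effectiveBoosts` helper are hypothetical, and the wording that actually landed in the spec may differ.

```typescript
// Hypothetical stand-in for SpeechRecognitionPhrase (field names are assumptions).
interface Phrase {
  readonly phrase: string;
  readonly boost: number;
}

// Collapse an ordered phrase list into one boost per phrase text.
// When the same phrase appears more than once, the last entry wins.
function effectiveBoosts(phrases: readonly Phrase[]): Map<string, number> {
  const boosts = new Map<string, number>();
  for (const { phrase, boost } of phrases) {
    boosts.set(phrase, boost); // overwrites any earlier boost for this phrase
  }
  return boosts;
}
```

Keeping the stored structure as a list and deriving the map on demand leaves room for extra per-phrase attributes later, which is the extensibility argument made above.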

Introduce a new speech recognition context feature for contextual biasing

Remove SpeechRecognitionContext and add SpeechRecognitionPhraseList to SpeechRecognition directly

Remove updateContext and always update phrases instead

Rename context-not-supported error code to phrases-not-supported

Add removeItem to SpeechRecognitionPhraseList

yrw-google force-pushed the main branch from 62b1598 to 6ce27d9 Compare February 24, 2025 20:42

padenot requested changes Feb 28, 2025

yrw-google force-pushed the main branch 5 times, most recently from 5be471c to f1cd16f Compare March 4, 2025 23:21

yrw-google force-pushed the main branch from f1cd16f to a36a57a Compare March 10, 2025 22:14

yrw-google requested a review from padenot March 10, 2025 22:17

padenot requested changes Mar 11, 2025

yrw-google requested a review from padenot March 13, 2025 23:49

padenot requested changes Mar 17, 2025

yrw-google force-pushed the main branch from 27b5db4 to 7e1c292 Compare March 18, 2025 00:42

yrw-google requested a review from padenot March 18, 2025 00:50

yrw-google force-pushed the main branch from 7e1c292 to 2c63fab Compare March 21, 2025 21:49

padenot requested changes Mar 31, 2025

yrw-google and others added 3 commits March 31, 2025 15:24

Add speech recognition context to the Web Speech API

fc1135d

Introduce a new speech recognition context feature for contextual biasing

Minor updates for comments

69cda24

yrw-google force-pushed the main branch from 2c63fab to 69cda24 Compare March 31, 2025 22:44

yrw-google requested a review from padenot March 31, 2025 22:49

Add descriptions for corner cases

2e4b76b

padenot Feb 28, 2025

yrw-google Mar 4, 2025

padenot Mar 31, 2025

yrw-google Mar 31, 2025

padenot Mar 31, 2025

yrw-google Mar 31, 2025

padenot Apr 1, 2025

yrw-google Apr 2, 2025
