Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add speech recognition context to the Web Speech API #145

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

yrw-google
Copy link

@yrw-google yrw-google commented Feb 24, 2025

Introduce a new speech recognition context feature for contextual biasing


Preview | Diff

@yrw-google
Copy link
Author

yrw-google commented Feb 27, 2025

Hi @padenot, can you review this one too and also take a look at the explainer again when you get a chance? Feel free to assign another reviewer too if needed

index.bs Outdated
[Exposed=Window]
interface SpeechRecognitionPhraseList {
readonly attribute unsigned long length;
getter SpeechRecognitionPhrase item(unsigned long index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is invalid WebIDL.

It would be either:

SpeechRecognition item(unsigned long index);

or

getter SpeechRecognition(unsigned long index)

What is the intent here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this to SpeechRecognitionPhrase item(unsigned long index), but using getter is actually how SpeechRecognitionResultList is doing it, as well as some list objects I've seen in other specs, e.g. https://html.spec.whatwg.org/multipage/common-dom-interfaces.html#the-domstringlist-interface. I thought it's a standard thing to always define a getter for a list, but I don't see that getter is required in our use case, so I can either keep it or remove it.

@yrw-google yrw-google force-pushed the main branch 5 times, most recently from 5be471c to f1cd16f Compare March 4, 2025 23:21
@yrw-google
Copy link
Author

Hi @padenot, I've updated the specs as well as the explainer according to your comments. Please take a look again when you get a chance. Thanks!

@padenot
Copy link
Member

padenot commented Mar 5, 2025

Can we please focus on either the explainer or the spec patch? If we have a spec patch, an explainer shouldn't be necessary. If we aren't comfortable writing the spec patch right now because we want to iterate, it doesn't seem useful to update the spec patch.

Let me know which one I should look at first please?

@yrw-google
Copy link
Author

Hi @padenot, you can focus on the spec patch right now. I'm keeping the spec patch and the explainer in sync and the spec patch has many more details than the explainer, so if we can reach consensus on the spec, we can also reach consensus on the explainer easily.

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

@padenot
Copy link
Member

padenot commented Mar 10, 2025

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

It's not necessary. Details can and should be in informative notes or MDN.

@yrw-google
Copy link
Author

I think the explainer is still necessary when we want to launch this feature since we will be asked for a link to an explainer in many places. The explainer is also a good place to show why we want to add contextual biasing, and provides brief introduction on the changes for people who don't want to learn about every detail in specs.

It's not necessary. Details can and should be in informative notes or MDN.

I've closed the PR for explainer. Can you now review the spec changes?

@yrw-google yrw-google requested a review from padenot March 10, 2025 22:17
Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! lmk if I have been unclear in certain parts of the review.

@yrw-google
Copy link
Author

Hi @padenot, I decided to remove SpeechRecognitionContext and updateContext() since they seem redundant at this point and can cause confusion, so instead we will add SpeechRecognitionPhraseList phrases to SpeechRecognition directly and always update this attribute. I've also addressed your other comments so please take a look again.

@yrw-google yrw-google requested a review from padenot March 13, 2025 23:49
Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better, some questions still, but it feels we're almost there.

@yrw-google
Copy link
Author

Hi @padenot, a gentle ping for reviewing this

Copy link
Member

@padenot padenot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

index.bs Outdated
The setter steps are:
1. If the {{SpeechRecognitionPhraseList/length}} of {{SpeechRecognition/phrases}} is greater than 0
and the system using the given value for {{SpeechRecognition/[[mode]]}} does not support contextual biasing,
throw a {{SpeechRecognitionErrorEvent}} with the {{SpeechRecognitionErrorCode/not-allowed}} error code and abort these steps.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not-supported for consistency? Or we want to clearly distinguish the two? What do we lose by using "normal" errors here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that when user is trying to set the mode here but get a phrases-not-supported error, it seems weird that the error is not related to mode directly, but we can also add error message to explain this better, so I'll switch to phrases-not-supported for consistency.

<dt><dfn method for=SpeechRecognitionPhraseList>addItem(|item|)</dfn> method</dt>
<dd>
This method adds the {{SpeechRecognitionPhrase}} object |item| to the list.
When invoked, add |item| to the end of {{SpeechRecognitionPhraseList/[[phrases]]}}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we add an element twice with different scores? Do we want to use a set instead from the infra link above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to avoid the same phrase with different boosts, we will need to use a map like map<string, float> for the phrase-boost pair, rather than a set. I feel like this adds more limitations to the design which is not as desired. For example, if in the future we want to add a pronunciation attribute for each SpeechRecognitionPhrase, using a list like now is easier than a map.

For now if user adds the phrase twice with different boosts, I think the system can use the boost that's updated at the second phrase, and this case should be rare.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my point is that we can do whatever, but it needs to be specified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added descriptions for this case.

yrw-google and others added 3 commits March 31, 2025 15:24
Introduce a new speech recognition context feature for contextual biasing
Remove SpeechRecognitionContext and add SpeechRecognitionPhraseList to SpeechRecognition directly

Remove updateContext and always update phrases instead

Rename context-not-supported error code to phrases-not-supported

Add removeItem to SpeechRecognitionPhraseList
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants