Commit da6e057

Specify language tag fallback support
This relies on the infrastructure from ECMA-402 to give sensible answers about language support even in the presence of multiple subtags.
1 parent f62dfb9 commit da6e057

1 file changed: +86 -19 lines changed

index.bs

+86-19
@@ -20,6 +20,16 @@ Die On: warning
2020
<pre class="link-defaults">
2121
spec:infra; type:dfn; text:user agent
2222
</pre>
23+
<pre class="anchors">
24+
urlPrefix: https://tc39.es/ecma402/; spec: ECMA-402
25+
type: dfn
26+
text: [[AvailableLocales]]; url: sec-internal-slots
27+
text: Unicode canonicalized locale identifier; url: sec-language-tags
28+
type: abstract-op
29+
text: LookupMatchingLocaleByBestFit; url: sec-lookupmatchinglocalebybestfit
30+
text: IsStructurallyValidLanguageTag; url: sec-isstructurallyvalidlanguagetag
31+
text: CanonicalizeUnicodeLocaleId; url: sec-canonicalizeunicodelocaleid
32+
</pre>
2333

2434
<style>
2535
dl.props { display: grid; grid-template-columns: max-content auto; row-gap: 0.25em; column-gap: 1em; }
@@ -344,9 +354,9 @@ The <dfn attribute for="AI">summarizer</dfn> getter steps are to return [=this=]
344354

345355
1. Set |availableCreateOptions|[(|type|, |format|, |length|)] to the [=current summarizer create options availability=] given |type|, |format|, and |length|.
346356

347-
1. Let |availableLanguages| be the [=current summarizer language availability map=].
357+
1. Let « |readilyAvailableLanguages|, |afterDownloadAvailableLanguages| » be the [=current summarizer language availabilities=].
348358

349-
1. If |availableLanguages| is null, or |availableCreateOptions|'s [=map/values=] [=list/contains=] null, then [=queue a global task=] on the [=AI task source=] given [=this=] to perform the following steps:
359+
1. If |readilyAvailableLanguages| is null, |afterDownloadAvailableLanguages| is null, or |availableCreateOptions|'s [=map/values=] [=list/contains=] null, then [=queue a global task=] on the [=AI task source=] given [=this=] to perform the following steps:
350360

351361
1. [=Reject=] |promise| with an "{{UnknownError}}" {{DOMException}}.
352362

@@ -357,8 +367,10 @@ The <dfn attribute for="AI">summarizer</dfn> getter steps are to return [=this=]
357367
<dl class="props">
358368
: [=AISummarizerCapabilities/available create options=]
359369
:: |availableCreateOptions|
360-
: [=AISummarizerCapabilities/available languages=]
361-
:: |availableLanguages|
370+
: [=AISummarizerCapabilities/readily available languages=]
371+
:: |readilyAvailableLanguages|
372+
: [=AISummarizerCapabilities/after-download available languages=]
373+
:: |afterDownloadAvailableLanguages|
362374
</dl>
363375

364376
1. [=Resolve=] |promise| with |capabilitiesObject|.
@@ -368,16 +380,18 @@ The <dfn attribute for="AI">summarizer</dfn> getter steps are to return [=this=]
368380

369381
Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">available create options</dfn>, a [=map=] from [=tuples=] of ({{AISummarizerType}}, {{AISummarizerFormat}}, {{AISummarizerLength}}) values to {{AICapabilityAvailability}} values, set during creation.
370382

371-
Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">available languages</dfn>, a [=map=] of strings representing BCP 47 language tags to {{AICapabilityAvailability}} values, set during creation. The [=map/values=] will never be "{{AICapabilityAvailability/no}}".
383+
Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">readily available languages</dfn>, a [=set=] of strings representing BCP 47 language tags, set during creation.
384+
385+
Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">after-download available languages</dfn>, a [=set=] of strings representing BCP 47 language tags, set during creation.
372386

373387
<div algorithm>
374388
The <dfn attribute for="AISummarizerCapabilities">available</dfn> getter steps are:
375389

376-
1. If [=this=]'s [=AISummarizerCapabilities/available languages=] [=map/is empty|are empty=], then return "{{AICapabilityAvailability/no}}".
390+
1. If [=this=]'s [=AISummarizerCapabilities/readily available languages=] and [=AISummarizerCapabilities/after-download available languages=] [=map/is empty|are empty=], then return "{{AICapabilityAvailability/no}}".
377391

378392
1. If all of [=this=]'s [=AISummarizerCapabilities/available create options=]'s [=map/values=] are "{{AICapabilityAvailability/no}}", then return "{{AICapabilityAvailability/no}}".
379393

380-
1. If all of [=this=]'s [=AISummarizerCapabilities/available create options=]'s [=map/values=] or all of [=this=]'s [=AISummarizerCapabilities/available languages=]'s [=map/values=] are "{{AICapabilityAvailability/after-download}}", then return "{{AICapabilityAvailability/after-download}}".
394+
1. If [=this=]'s [=AISummarizerCapabilities/readily available languages=] [=map/is empty|are empty=], then return "{{AICapabilityAvailability/after-download}}".
381395

382396
1. Return "{{AICapabilityAvailability/readily}}".
383397
</div>
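
As an illustration only, the updated getter steps can be sketched in JavaScript. The function name `summarizerAvailability` and its parameters are hypothetical and not part of the specification: the two language arguments are assumed to be `Set`s of language tags, and `createOptions` is assumed to be a `Map` whose values are availability strings.

```javascript
// Hypothetical sketch of the `available` getter steps; not spec text.
// `readily` and `afterDownload` are assumed to be Sets of language tags.
// `createOptions` is assumed to be a Map with values "readily",
// "after-download", or "no".
function summarizerAvailability(readily, afterDownload, createOptions) {
  // Step 1: no language is supported at all.
  if (readily.size === 0 && afterDownload.size === 0) return "no";

  // Step 2: every create-options combination is unavailable.
  const values = [...createOptions.values()];
  if (values.every((value) => value === "no")) return "no";

  // Step 3: languages exist, but all of them need a download first.
  if (readily.size === 0) return "after-download";

  // Step 4: at least one language is usable right away.
  return "readily";
}
```

Note that, in this sketch as in the steps above, a single readily available language is enough for the overall answer to be "readily", even if other languages require a download.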
@@ -391,9 +405,23 @@ Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">av
391405
<div algorithm>
392406
The <dfn method for="AISummarizerCapabilities">languageAvailable(|languageTag|)</dfn> method steps are:
393407

394-
1. Return [=this=]'s [=AISummarizerCapabilities/available languages=][|languageTag|], or "{{AICapabilityAvailability/no}}" if no such [=map/entry=] [=map/exists=].
408+
1. If [$IsStructurallyValidLanguageTag$](|languageTag|) is false, then throw a {{TypeError}}.
409+
410+
1. Set |languageTag| to [$CanonicalizeUnicodeLocaleId$](|languageTag|).
411+
412+
1. Let |bestReadilyAvailableMatch| be [$LookupMatchingLocaleByBestFit$]([=this=]'s [=AISummarizerCapabilities/readily available languages=], |languageTag|).
413+
414+
1. If |bestReadilyAvailableMatch| is not undefined, then return "{{AICapabilityAvailability/readily}}".
415+
416+
<p class="note">|bestReadilyAvailableMatch|.\[[locale]] contains the actual language tag from [=this=]'s [=AISummarizerCapabilities/readily available languages=], which might be different from |languageTag|.
417+
418+
1. Let |bestAfterDownloadAvailableMatch| be [$LookupMatchingLocaleByBestFit$]([=this=]'s [=AISummarizerCapabilities/after-download available languages=], |languageTag|).
395419

396-
<p class="issue">Per <a href="https://github.com/WICG/translation-api/issues/11">WICG/translation-api#11</a> it seems we're supposed to do something more complex than just straight string comparison for language tags, but it's not clear what.</p>
420+
1. If |bestAfterDownloadAvailableMatch| is not undefined, then return "{{AICapabilityAvailability/after-download}}".
421+
422+
<p class="note">|bestAfterDownloadAvailableMatch|.\[[locale]] contains the actual language tag from [=this=]'s [=AISummarizerCapabilities/after-download available languages=], which might be different from |languageTag|.
423+
424+
1. Return "{{AICapabilityAvailability/no}}".
397425
</div>
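
The shape of these method steps can be sketched in JavaScript under some loudly stated assumptions. [$LookupMatchingLocaleByBestFit$] is implementation-defined, so the sketch below swaps in a much simpler prefix-truncation fallback (in the style of RFC 4647 lookup) as a stand-in; it is not best fit. Validation and canonicalization are approximated with `Intl.getCanonicalLocales()`, which throws a `RangeError` that the sketch converts to a `TypeError`. The names `truncationLookup` and `languageAvailableSketch` are hypothetical.

```javascript
// Stand-in for LookupMatchingLocaleByBestFit: repeatedly drop the last
// subtag until a tag in `available` is found. This is NOT best fit.
function truncationLookup(available, locale) {
  let candidate = locale;
  while (true) {
    if (available.has(candidate)) return candidate;
    let pos = candidate.lastIndexOf("-");
    if (pos === -1) return undefined;
    // Also drop a dangling singleton subtag such as "-u" or "-x".
    if (pos >= 2 && candidate[pos - 2] === "-") pos -= 2;
    candidate = candidate.slice(0, pos);
  }
}

// Hypothetical sketch of languageAvailable(); `readily` and
// `afterDownload` are assumed to be Sets of canonicalized language tags.
function languageAvailableSketch(readily, afterDownload, languageTag) {
  let canonical;
  try {
    // Rejects structurally invalid tags and canonicalizes valid ones.
    [canonical] = Intl.getCanonicalLocales(String(languageTag));
  } catch {
    throw new TypeError(`Invalid language tag: ${languageTag}`);
  }
  if (truncationLookup(readily, canonical) !== undefined) return "readily";
  if (truncationLookup(afterDownload, canonical) !== undefined) {
    return "after-download";
  }
  return "no";
}
```

Because truncation only ever removes trailing subtags, this stand-in cannot infer a script from a region. With « "zh-Hant" » readily available and « "zh", "zh-Hans" » available after download, it answers "after-download" for "zh-TW" (falling back to "zh"), whereas a best-fit matcher could map "zh-TW" to "zh-Hant".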
398426

399427
<hr>
@@ -413,27 +441,66 @@ Every {{AISummarizerCapabilities}} has an <dfn for="AISummarizerCapabilities">av
413441
</div>
414442

415443
<div algorithm>
416-
The <dfn>current summarizer language availability map</dfn> is given by the following steps. They return a [=map=] from strings representing BCP 47 language tags to {{AICapabilityAvailability}} values, or null. [[!RFC5646]]
444+
The <dfn>current summarizer language availabilities</dfn> are given by the following steps. They return a [=list=] containing two [=list/items=]; each item is either a [=set=] of strings representing [=Unicode canonicalized locale identifier|Unicode canonicalized locale identifiers=], or null. [[!ECMA-402]]
417445

418446
1. [=Assert=]: this algorithm is running [=in parallel=].
419447

420-
1. If there is some error attempting to determine whether the user agent supports summarizing text, which the user agent believes to be transient (such that re-querying the [=current summarizer create options availability=] could stop producing such an error), then return null.
448+
1. If there is some error attempting to determine whether the user agent supports summarizing text, which the user agent believes to be transient (such that re-querying the [=current summarizer language availabilities=] could stop producing such an error), then return « null, null ».
449+
450+
1. Let |readilyAvailableLanguages| and |afterDownloadAvailableLanguages| be empty [=sets=].
451+
452+
1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent supports summarizing text written in that language, without performing any downloading operations:
453+
454+
1. [=set/Append=] |languageTag| to |readilyAvailableLanguages|.
455+
456+
1. [=list/For each=] human language |languageTag|, represented as a [=Unicode canonicalized locale identifier=], for which the user agent believes it can summarize text written in that language, but only after performing a download (e.g., of an AI model or fine-tuning):
457+
458+
1. [=Assert=]: |readilyAvailableLanguages| does not [=set/contain=] |languageTag|.
459+
460+
1. [=set/Append=] |languageTag| to |afterDownloadAvailableLanguages|.
461+
462+
1. If the [=set/union=] of |readilyAvailableLanguages| and |afterDownloadAvailableLanguages| does not meet the [=language tag set completeness rules=], then:
463+
464+
1. Let |missingLanguageTags| be the [=set=] of missing language tags necessary to meet the [=language tag set completeness rules=].
465+
466+
1. [=set/For each=] |languageTag| of |missingLanguageTags|:
467+
468+
1. <span id="readily-or-after-download-implementation-defined"></span> [=set/Append=] |languageTag| to either |readilyAvailableLanguages| or |afterDownloadAvailableLanguages|. Which of the two sets to append to is [=implementation-defined=], and should be guided by considerations similar to those of [$LookupMatchingLocaleByBestFit$] in terms of keeping "best fallback languages" together.
469+
470+
1. Return « |readilyAvailableLanguages|, |afterDownloadAvailableLanguages| ».
471+
</div>
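
The final completion steps of this algorithm can be sketched as follows. This is a minimal sketch resting on two assumptions: that adding the bare language subtag (for example "zh" for "zh-Hant") is an acceptable way to supply the missing language tags, and that the [=implementation-defined=] choice of set is delegated to a caller-supplied callback. The names `completeLanguageAvailabilities` and `chooseBucket` are hypothetical.

```javascript
// Hypothetical sketch: fill in missing bare-language tags so that the
// union of the two sets contains a less narrow tag for every entry.
// `chooseBucket(language, readilySet, afterDownloadSet)` stands in for
// the implementation-defined choice and is assumed to return either
// "readily" or "after-download".
function completeLanguageAvailabilities(readily, afterDownload, chooseBucket) {
  const readilySet = new Set(readily);
  const afterDownloadSet = new Set(afterDownload);

  // Iterate over a snapshot, since the sets are modified in the loop.
  for (const tag of [...readilySet, ...afterDownloadSet]) {
    const language = tag.split("-")[0];
    if (readilySet.has(language) || afterDownloadSet.has(language)) continue;

    const bucket = chooseBucket(language, readilySet, afterDownloadSet);
    (bucket === "readily" ? readilySet : afterDownloadSet).add(language);
  }
  return [readilySet, afterDownloadSet];
}
```

For instance, calling it with `["zh-Hant"]`, `["zh-Hans"]`, and a callback that always returns "after-download" yields « "zh-Hant" » as readily available and « "zh-Hans", "zh" » as available after download.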
472+
473+
<div algorithm>
474+
The <dfn>language tag set completeness rules</dfn> state that for every [=set/item=] |languageTag|, if |languageTag| has more than one subtag, then the set must also contain a less narrow language tag with the same language subtag and a strict subset of the same following subtags (i.e., omitting one or more).
475+
476+
<p class="note">This definition is intended to align with that of [=[[AvailableLocales]]=] in <cite>ECMAScript Internationalization API Specification</cite>. [[ECMA-402]]
421477

422-
1. Let |availableLanguages| be an empty [=map=].
478+
<div class="example" id="example-subtags-intro">
479+
This means that if an implementation supports summarization of "`de-DE`" text, it will also count as supporting "`de`" text.
423480

424-
1. [=list/For each=] human language for which the user agent supports summarizing text written in that language, without performing any downloading operations:
481+
The converse direction is supported not by the [=language tag set completeness rules=], but instead by the use of [$LookupMatchingLocaleByBestFit$], which ensures that if an implementation supports summarizing "`de`" text, it also counts as supporting summarization of "`de-CH`", "`de-Latn-CH`", etc.
482+
</div>
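
A literal check of the rule can be sketched in JavaScript. The function name `meetsCompletenessRules` is hypothetical; the sketch assumes the input is a `Set` of canonicalized tags whose first subtag is the language subtag and whose subtags contain no duplicates.

```javascript
// Hypothetical sketch: returns true if every tag with more than one
// subtag is accompanied by a less narrow tag that has the same language
// subtag and a strict subset of the remaining subtags.
function meetsCompletenessRules(tags) {
  for (const tag of tags) {
    const [language, ...rest] = tag.split("-");
    if (rest.length === 0) continue; // single-subtag tags need nothing

    const hasLessNarrow = [...tags].some((other) => {
      const [otherLanguage, ...otherRest] = other.split("-");
      return (
        otherLanguage === language &&
        otherRest.length < rest.length &&
        otherRest.every((subtag) => rest.includes(subtag))
      );
    });
    if (!hasLessNarrow) return false;
  }
  return true;
}
```

Under this reading, « "de", "de-DE" » passes, while « "zh-Hant", "zh-Hant-TW" » does not, because "zh-Hant" itself still needs a less narrow companion such as "zh".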
425483

426-
1. Let |languageTag| be that language, represented as a BCP 47 language tag string. <span class="issue">Describe how to handle subtags.</span>
484+
<div class="example" id="example-subtags-chinese">
485+
A common setup seen in today's software is to support two types of written Chinese: "traditional Chinese" and "simplified Chinese". Let's suppose that the user agent supports summarizing text written in traditional Chinese readily, and simplified Chinese after a download.
427486

428-
1. Set |availableLanguages|[|languageTag|] to "{{AICapabilityAvailability/readily}}".
487+
One way this could be implemented would be for [=current summarizer language availabilities=] to return that « "`zh-Hant`" » is readily available, and « "`zh`", "`zh-Hans`" » is available after download. This return value conforms to the requirements of the [=language tag set completeness rules=], in ensuring that "`zh`" is present. Per <a class="allow-2119" href="#readily-or-after-download-implementation-defined">the "should"-level guidance</a>, the implementation has determined that "`zh`" belongs in the list of after-download available languages, with "`zh-Hans`", instead of in the list of readily available languages, with "`zh-Hant`".
429488

430-
1. [=list/For each=] human language for which the user agent believes it can summarize text written in that language, but only after performing a download (e.g., of an AI model or fine-tuning):
489+
Combined with the use of [$LookupMatchingLocaleByBestFit$], this means that {{AISummarizerCapabilities/languageAvailable()}} will give the following answers:
431490

432-
1. Let |languageTag| be that language, represented as a BCP 47 language tag string. <span class="issue">Describe how to handle subtags.</span>
491+
<xmp class="language-js">
492+
c.languageAvailable("zh") === "after-download";
493+
c.languageAvailable("zh-Hant") === "readily";
494+
c.languageAvailable("zh-Hans") === "after-download";
433495

434-
1. Set |availableLanguages|[|languageTag|] to "{{AICapabilityAvailability/after-download}}".
496+
c.languageAvailable("zh-TW") === "readily"; // zh-TW will best-fit to zh-Hant
497+
c.languageAvailable("zh-HK") === "readily"; // zh-HK will best-fit to zh-Hant
498+
c.languageAvailable("zh-CN") === "after-download"; // zh-CN will best-fit to zh-Hans
435499

436-
1. Return |availableLanguages|.
500+
c.languageAvailable("zh-BR") === "after-download"; // zh-BR will best-fit to zh
501+
c.languageAvailable("zh-Kana") === "after-download"; // zh-Kana will best-fit to zh
502+
</xmp>
503+
</div>
437504
</div>
438505

439506
<h3 id="summarizer-object">Summarization</h3>
