Version | 42 (draft) | ||
Editors | Mark Davis (markdavis@google.com) and other CLDR committee members | ||
Date | 2022-09-27 | ||
Date | 2022-10-05 | ||
This Version | https://www.unicode.org/reports/tr35/tr35-67/tr35.html | ||
Previous Version | https://www.unicode.org/reports/tr35/tr35-66/tr35.html | ||
Latest Version | https://www.unicode.org/reports/tr35/ |
key (old key name) | key description | example type (old type name) | type description |
---|---|---|---|
A Unicode Calendar Identifier defines a type of calendar. The valid values are those name attribute values in the type elements of key name="ca" in bcp47/calendar.xml. | |||
"ca" (calendar) |
Calendar algorithm (For information on the calendar algorithms associated with the data used with these, see [Calendars].) |
"buddhist" | @@ -729,13 +729,13 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other|
… | |||
Note: Some calendar types are represented by two subtags. In such cases, the first subtag specifies a generic calendar type and the second subtag specifies a calendar algorithm variant. The CLDR uses generic calendar types (single subtag types) for tagging data when calendar algorithm variations within a generic calendar type are irrelevant. For example, type "islamic" is used for specifying Islamic calendar formatting data for all Islamic calendar types, including "islamic-civil" and "islamic-umalqura". | |||
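The fallback described in the note can be sketched as follows. This is a minimal illustration rather than the lookup code of CLDR or any library: the function name `calendar_data_type` and the `tagged_types` set are hypothetical, and the sketch assumes the generic calendar type is always the first subtag of the "ca" value.

```python
def calendar_data_type(ca_value, tagged_types):
    """Pick the calendar type under which formatting data is looked up.

    ca_value: the type of the "ca" key, for example "islamic-civil".
    tagged_types: hypothetical set of calendar types that carry their own data.
    """
    ca_value = ca_value.lower()
    if ca_value in tagged_types:
        return ca_value
    # Fall back to the generic calendar type (the first subtag).
    generic = ca_value.split("-")[0]
    if generic in tagged_types:
        return generic
    raise KeyError(f"no data for calendar type {ca_value!r}")


tagged = {"gregory", "buddhist", "islamic"}
print(calendar_data_type("islamic-umalqura", tagged))  # islamic
print(calendar_data_type("buddhist", tagged))          # buddhist
```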
A Unicode Currency Format Identifier defines a style for currency formatting. The valid values are those name attribute values in the type elements of key name="cf" in bcp47/currency.xml. | |||
"cf" | Currency Format style | "standard" | Negative numbers use the minusSign symbol (the default). |
"account" | Negative numbers use parentheses or equivalent. | ||
A Unicode Collation Identifier defines a type of collation (sort order). The valid values are those name attribute values in the type elements of bcp47/collation.xml. | |||
For information on each collation setting parameter, from ka to vt, see Setting Options | |||
"co" (collation) |
Collation type |
@@ -754,20 +754,20 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other
Special collation type for a modified string search in which a pattern consisting of a sequence of Hangul initial consonants (jamo lead consonants) will match a sequence of Hangul syllable characters whose initial consonants match the pattern. The jamo lead consonants can be represented using conjoining or compatibility jamo. This search collator is best used at SECONDARY strength with an "asymmetric" search as described in the [UCA] section Asymmetric Search and obtained, for example, using ICU4C's usearch facility with attribute USEARCH_ELEMENT_COMPARISON set to value USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD; this ensures that a full Hangul syllable in the search pattern will only match the same syllable in the searched text (instead of matching any syllable with the same initial consonant), while a Hangul initial consonant in the search pattern will match any Hangul syllable in the searched text with the same initial consonant. | |
… | |||
A Unicode Currency Identifier defines a type of currency. The valid values are those name attribute values in the type elements of key name="cu" in bcp47/currency.xml. | |||
"cu" (currency) |
Currency type | ISO 4217 code, plus others in common use |
Codes consisting of 3 ASCII letters that are or have been valid in ISO 4217, plus certain additional codes that are or have been in common use. The list of countries and time periods associated with each currency value is available in Supplemental Currency Data, plus the default number of decimals. The XXX code is given a broader interpretation as Unknown or Invalid Currency. |
A Unicode Dictionary Break Exclusion Identifier specifies scripts to be excluded from dictionary-based text break (for words and lines). The valid values consist of one or more items of type SCRIPT_CODE, as specified in the name attribute value in the type element of key name="dx" in bcp47/segmentation.xml. | |||
"dx" | Dictionary break script exclusions | unicode_script_subtag values |
One or more items of type SCRIPT_CODE, which are valid unicode_script_subtag values. The code Zyyy (Common) can be specified to exclude all scripts, in which case it should be the only SCRIPT_CODE value specified. |
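A minimal sketch of how a "dx" value might be checked against the description above. The function name `parse_dx_value` is hypothetical, the check is syntactic only (four letters per code, not a lookup against the list of valid scripts), and treating the "should be the only value" guidance for Zyyy as a hard error is an assumption of this sketch.

```python
import re

# Syntactic shape of a script code: exactly four ASCII letters.
_SCRIPT_CODE = re.compile(r"[A-Za-z]{4}")


def parse_dx_value(value):
    """Split a "dx" type such as "hani-thai" into title-cased script codes."""
    codes = [subtag.title() for subtag in value.split("-")]
    for code in codes:
        if not _SCRIPT_CODE.fullmatch(code):
            raise ValueError(f"not a script code: {code!r}")
    # Assumption of this sketch: reject Zyyy combined with other codes.
    if "Zyyy" in codes and len(codes) > 1:
        raise ValueError("Zyyy (Common) should be the only script code")
    return codes


print(parse_dx_value("hani-thai"))  # ['Hani', 'Thai']
print(parse_dx_value("zyyy"))       # ['Zyyy']
```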
A Unicode Emoji Presentation Style Identifier specifies a request for the preferred emoji presentation style. This can be used as part of the value for an HTML lang attribute, for example <html lang="sr-Latn-u-em-emoji"> . The valid values are those name attribute values in the type elements of key name="em" in bcp47/variant.xml. | |||
"em" | Emoji presentation style | "emoji" | @@ -776,7 +776,7 @@ The BCP 47 form for keys and types is the canonical form, and recommended. OtherUse a text presentation for emoji characters if possible. |
"default" | Use the default presentation for emoji characters as specified in UTR #51 Section 4, Presentation Style. | ||
A Unicode First Day Identifier defines the preferred first day of the week for calendar display. Specifying "fw" in a locale identifier overrides the default value specified by supplemental week data (see Part 4 Dates, section 4.3 Week Data). The valid values are those name attribute values in the type elements of key name="fw" in bcp47/calendar.xml. | |||
"fw" | First day of week | "sun" | @@ -787,7 +787,7 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other|
"sat" | Saturday | ||
A Unicode Hour Cycle Identifier defines the preferred time cycle. Specifying "hc" in a locale identifier overrides the default value specified by supplemental time data (see Part 4 Dates, section 4.4 Time Data). The valid values are those name attribute values in the type elements of key name="hc" in bcp47/calendar.xml. | |||
"hc" | Hour cycle | "h12" | @@ -799,7 +799,7 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other|
"h24" | Hour system using 1–24; corresponds to 'k' in pattern | ||
A Unicode Line Break Style Identifier defines a preferred line break style corresponding to the CSS level 3 line-break option. Specifying "lb" in a locale identifier overrides the locale’s default style (which may correspond to "normal" or "strict"). The valid values are those name attribute values in the type elements of key name="lb" in bcp47/segmentation.xml. | |||
"lb" | Line break style | "strict" | @@ -809,7 +809,7 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other|
"loose" | CSS lev 3 line-break=loose | ||
A Unicode Line Break Word Identifier defines preferred line break word handling behavior corresponding to the CSS level 3 word-break option. The valid values are those name attribute values in the type elements of key name="lw" in bcp47/segmentation.xml. | |||
"lw" | Line break word handling | "normal" | @@ -821,7 +821,7 @@ The BCP 47 form for keys and types is the canonical form, and recommended. Other|
"phrase" | Prioritize keeping natural phrases (of multiple words) together when breaking, used in short text like title and headline | ||
A Unicode Measurement System Identifier defines a preferred measurement system. Specifying "ms" in a locale identifier overrides the default value specified by supplemental measurement system data (see Part 2 General, section 5 Measurement System Data). The valid values are those name attribute values in the type elements of key name="ms" in bcp47/measure.xml. The determination of preferred units depends on the locale identifier: the keys ms, mu, rg, the base locale (language, script, region) and the user preferences. For information about preferred units and unit conversion, see Unit Conversion and Unit Preferences. | |||
"uksystem" | UK System of measurement: feet, pints, etc.; pints are 20oz | ||
A Measurement Unit Preference Override defines an override for measurement unit preference. The valid values are those name attribute values in the type elements of key name="mu" in bcp47/measure.xml. For information about preferred units and unit conversion, see Unit Conversion and Unit Preferences. | |||
"mu" | Measurement unit override | @@ -845,7 +845,7 @@ The determination of preferred units depends on the locale identifer: the keys m||
"fahrenhe" | Fahrenheit as temperature unit | ||
A Unicode Number System Identifier defines a type of number system. The valid values are those name attribute values in the type elements of bcp47/number.xml. | |||
"nu" (numbers) |
Numbering system | Unicode script subtag |
@@ -878,7 +878,7 @@ The determination of preferred units depends on the locale identifer: the keys m
A unicode_subdivision_id, which is a unicode_region_subtag concatenated with a unicode_subdivision_suffix. For example, gbsct is “gb”+“sct” (where sct represents the subdivision code for Scotland). Thus “en-GB-u-sd-gbsct” represents the language variant “English as used in Scotland”. And both “en-u-sd-usca” and “en-US-u-sd-usca” represent “English as used in California”. See 3.6.5 Subdivision Codes. |
… | |||
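The composition of a unicode_subdivision_id (region subtag plus suffix) can be sketched as below. The function name `split_subdivision_id` is hypothetical. The pattern assumes the LDML grammar in which the region is two letters or three digits and the suffix is one to four alphanumerics, and it checks syntax only, not whether the subdivision exists.

```python
import re

# Region subtag (2 letters or 3 digits) followed by a suffix of 1 to 4 alphanumerics.
_SUBDIVISION_ID = re.compile(r"([a-z]{2}|[0-9]{3})([a-z0-9]{1,4})")


def split_subdivision_id(subdivision_id):
    """Split an "sd" type such as "gbsct" into (region, suffix)."""
    match = _SUBDIVISION_ID.fullmatch(subdivision_id.lower())
    if match is None:
        raise ValueError(f"not a subdivision id: {subdivision_id!r}")
    region, suffix = match.groups()
    return region.upper(), suffix


print(split_subdivision_id("gbsct"))  # ('GB', 'sct')
print(split_subdivision_id("usca"))   # ('US', 'ca')
```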
A Unicode Sentence Break Suppressions Identifier defines a set of data to be used for suppressing certain sentence breaks that would otherwise be found by UAX #29 rules. The valid values are those name attribute values in the type elements of key name="ss" in bcp47/segmentation.xml. | |||
"ss" | Sentence break suppressions | "none" | @@ -886,7 +886,7 @@ The determination of preferred units depends on the locale identifer: the keys m|
"standard" | Use sentence break suppressions data of type "standard" | ||
A Unicode Timezone Identifier defines a timezone. The valid values are those name attribute values in the type elements of bcp47/timezone.xml. | |||
"tz" (timezone) |
Time zone | Unicode short time zone IDs | @@ -894,7 +894,7 @@ The determination of preferred units depends on the locale identifer: the keys m|
A Unicode Variant Identifier defines a special variant used for locales. The valid values are those name attribute values in the type elements of bcp47/variant.xml. | |||
"va" | Common variant type | "posix" | @@ -908,9 +908,9 @@ Additional keys or types might be added in future versions. Implementations of L #### 3.6.2 Numbering System Data -LDML supports multiple numbering systems. The identifiers for those numbering systems are defined in the file **bcp47/number.xml**. For example, for the latest version of the data see [bcp47/number.xml](https://github.com/unicode-org/cldr/tree/main/common/bcp47/number.xml). +LDML supports multiple numbering systems. The identifiers for those numbering systems are defined in the file **bcp47/number.xml**. For example, for the latest version of the data see [bcp47/number.xml](https://github.com/unicode-org/cldr/blob/main/common/bcp47/number.xml). -Details about those numbering systems are defined in **supplemental/numberingSystems.xml**. For example, for the latest version of the data see [supplemental/numberingSystems.xml](https://github.com/unicode-org/cldr/tree/main/common/supplemental/numberingSystems.xml). +Details about those numbering systems are defined in **supplemental/numberingSystems.xml**. For example, for the latest version of the data see [supplemental/numberingSystems.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/numberingSystems.xml). 
 LDML makes certain stability guarantees on this data:
@@ -1301,7 +1301,7 @@ Even though localization should be done as close to the end-user as possible, th
 #### 3.9.1 Message Formatting and Exceptions

-Windows ([FormatMessage](https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage), [String.Format](https://docs.microsoft.com/en-us/dotnet/api/system.string.format)), Java ([MessageFormat](https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html)) and ICU ([MessageFormat](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classMessageFormat.html), [umsg](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/umsg_8h.html)) all provide methods of formatting variables (dates, times, etc) and inserting them at arbitrary positions in a string. This avoids the manual string concatenation that causes severe problems for localization. The question is, where to do this? It is especially important since the original code site that originates a particular message may be far down in the bowels of a component, and passed up to the top of the component with an exception. So we will take that case as representative of this class of issues.
+Windows ([FormatMessage](https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-formatmessage), [String.Format](https://learn.microsoft.com/en-us/dotnet/api/system.string.format?view=net-6.0)), Java ([MessageFormat](https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html)) and ICU ([MessageFormat](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classMessageFormat.html), [umsg](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/umsg_8h.html)) all provide methods of formatting variables (dates, times, etc) and inserting them at arbitrary positions in a string. This avoids the manual string concatenation that causes severe problems for localization. The question is, where to do this? It is especially important since the original code site that originates a particular message may be far down in the bowels of a component, and passed up to the top of the component with an exception. So we will take that case as representative of this class of issues.

 There are circumstances where the message can be communicated with a language-neutral code, such as a numeric error code or mnemonic string key, that is understood outside of the component. If there are arguments that need to accompany that message, such as a number of files or a datetime, those need to accompany the numeric code so that when the localization is finally performed at some point, the full information can be presented to the end-user. This is the best case for localization.
@@ -1340,7 +1340,7 @@ Note that the language of locale data may differ from the language of localized
 #### 3.10.2 Hybrid Locale Identifiers

-Hybrid locales have intermixed content from 2 (or more) languages, often with one language's grammatical structure applied to words in another. These are commonly referred to with portmanteau words such as _Franglais, [Spanglish](https://en.oxforddictionaries.com/definition/spanglish)_ or _Denglish_. Hybrid locales do _not_ reference text simply containing two languages: a book of parallel text containing English and French, such as the following, is not Franglais:
+Hybrid locales have intermixed content from 2 (or more) languages, often with one language's grammatical structure applied to words in another. These are commonly referred to with portmanteau words such as _Franglais, [Spanglish](https://en.wikipedia.org/wiki/Spanglish)_ or _Denglish_. Hybrid locales do _not_ reference text simply containing two languages: a book of parallel text containing English and French, such as the following, is not Franglais: