CLDR-15618 spec: improve docs about value+children (#2395)
- document the invariant against nondistinguishing elements and children
- update the test to link to the docs
- test: add an unreached enum value to the switch statement
- Also update comment about orderedItems
- Also update comment about NFC
- Also add note about the TECHPREVIEW annotation
- Also regenerated ToC for tr35-info.md
srl295 authored Sep 29, 2022
1 parent 2c40a42 commit a37a71d
Showing 3 changed files with 46 additions and 10 deletions.
3 changes: 3 additions & 0 deletions docs/ldml/tr35-info.md
@@ -71,6 +71,7 @@ The LDML specification is divided into the following parts:
* 11 [Version Information](#Version_Information)
* 12 [Parent Locales](#Parent_Locales)
* 13 [Unit Conversion](#Unit_Conversion)
* [Unit Parsing Data](#unit-parsing-data)
* [Constants](#constants)
* [Conversion Data](#conversion-data)
* [Exceptional Cases](#exceptional-cases)
@@ -85,6 +86,8 @@ The LDML specification is divided into the following parts:
* [Mixed Units](#mixed-units)
* [Testing](#testing)
* 14 [Unit Preferences](#Unit_Preferences)
* 14.2 [Unit Preferences Overrides](#Unit_Preferences_Data)
* 14.2 [Unit Preferences Data](#Unit_Preferences_Data)
* [Constraints](#constraints)
* [Caveats](#caveats)

47 changes: 38 additions & 9 deletions docs/ldml/tr35.md
@@ -182,7 +182,7 @@ The LDML specification is divided into the following parts:
* Table: [Part 6 Links](#Part_6_Links): [Supplemental](tr35-info.md) (supplemental data)
* Table: [Part 7 Links](#Part_7_Links): [Keyboards](tr35-keyboards.md) (keyboard mappings)
* [Annex C. LocaleId Canonicalization](#LocaleId_Canonicalization)
* [Definitions](#definitions)
* [LocaleId Definitions](#LocaleId_Definitions)
* [1. Multimap interpretation](#1.-multimap-interpretation)
* [2. Alias elements](#2.-alias-elements)
* [3. Matches](#3.-matches)
@@ -1737,7 +1737,9 @@ Attributes that serve to distinguish multiple elements at the same level are cal
<language type="ab">Abkhazian</language>
```

Distinguishing attributes affect inheritance; two elements with different distinguishing attributes are treated as different for purposes of inheritance. For more information, see [Section 5.5 Valid Attribute Values](#Valid_Attribute_Values). Other attributes are called nondistinguishing (or informational) attributes. These carry separate information, and do not affect inheritance.
Distinguishing attributes affect inheritance; two elements with different distinguishing attributes are treated as different for purposes of inheritance. For more information, see [Section 5.5 Valid Attribute Values](#Valid_Attribute_Values). Other attributes are called value attributes. Value attributes do not affect inheritance, and elements with value attributes may not have child elements (see [XML Format](#XML_Format)).

Non-distinguishing attributes are identified by [DTD Annotations](#DTD_Annotations) such as `@VALUE`.

For any element in an XML file, _an element chain_ is a resolved [[XPath](#XPath)] leading from the root to an element, with attributes on each element in alphabetical order. So in, say, [https://github.com/unicode-org/cldr/blob/main/common/main/el.xml](https://github.com/unicode-org/cldr/blob/main/common/main/el.xml) we may have:

@@ -2324,16 +2326,42 @@ The XML structure is stable over releases. Elements and attributes may be deprec

In general, all translatable text in this format is in element contents, while attributes are reserved for types and non-translated information (such as numbers or dates). The reason that attributes are not used for translatable text is that spaces are not preserved, and we cannot predict where spaces may be significant in translated material.

There are two kinds of elements in LDML: _rule_ elements and _structure_ elements. For structure elements, there are restrictions to allow for effective inheritance and processing:
There are two kinds of elements in LDML: _rule_ elements and _structure_ elements.

For structure elements, there are restrictions to allow for effective inheritance and processing:

1. There is no "mixed" content: if an element has textual content, then it cannot contain any elements.
1. There is no ["mixed" content](https://www.w3.org/TR/xml/#sec-mixed-content): if an element has textual content, then it cannot contain any elements.
2. The [[XPath](#XPath)] leading to the content is unique; no two different pieces of textual content have the same [[XPath](#XPath)].
3. An element that has [value attributes](#Definitions) MUST NOT also have child elements.

To illustrate these restrictions, consider the below chunk of XML:

```xml
<!-- Not correct LDML -->
<unit type="duration-day"
displayName="days"> <!-- #3: @VALUE attribute AND children -->
{0} per day <!-- #1: Mixed content -->
<unitPattern>{0} day</unitPattern> <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
<unitPattern>{0} days</unitPattern> <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
</unit>
```

LDML is actually structured as below (from `en.xml`):

```xml
<unit type="duration-day"> <!-- OK: "type" is distinguishing -->
<displayName>days</displayName>
<unitPattern count="one">{0} day</unitPattern> <!-- "count" is distinguishing -->
<unitPattern count="other">{0} days</unitPattern>
<perUnitPattern>{0} per day</perUnitPattern> <!-- mixed content in an element -->
</unit>
```

Rule elements do not have this restriction, but also do not inherit, except as an entire block. The rule elements are listed in serialElements in the supplemental metadata. See also _[Section 4.2 Inheritance and Validity](#Inheritance_and_Validity)_. For more technical details, see [Updating-DTDs](https://cldr.unicode.org/development/updating-dtds).
Rule elements do not have these restrictions, but also do not inherit, except as an entire block. Items which are ordered have the DTD Annotation `@ORDERED`. See [_DTD Annotations_](#DTD_Annotations) and _[Section 4.2 Inheritance and Validity](#Inheritance_and_Validity)_. For more technical details, see [Updating-DTDs](https://cldr.unicode.org/development/updating-dtds).

Note that the data in examples given below is purely illustrative, and does not match any particular language. For a more detailed example of this format, see [[Example](#LDML)]. There is also a DTD for this format, but _remember that the DTD alone is not sufficient to understand the semantics, the constraints, nor the interrelationships between the different elements and attributes_. You may wish to have copies of each of these to hand as you proceed through the rest of this document.

In particular, all elements allow for draft versions to coexist in the file at the same time. Thus most elements are marked in the DTD as allowing multiple instances. However, unless an element is listed as a serialElement, or has a distinguishing attribute, it can only occur once as a subelement of a given element. Thus, for example, the following is illegal even though allowed by the DTD:
In particular, all elements allow for draft versions to coexist in the file at the same time. Thus most elements are marked in the DTD as allowing multiple instances. However, unless an element is annotated as `@ORDERED`, or has a distinguishing attribute, it can only occur once as a subelement of a given element. Thus, for example, the following is illegal even though allowed by the DTD:

```xml
<languages>
@@ -2343,9 +2371,9 @@ In particular, all elements allow for draft versions to coexist in the file at t

There must be only one instance of these per parent, unless there are other distinguishing attributes (such as an `alt` attribute).

In general, LDML data should be in NFC format. However, certain elements may need to contain characters that are not in NFC, including exemplars, transforms, segmentations, and p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not be normalized (either to NFC or NFD), or their meaning may be changed. Thus LDML documents must not be normalized as a whole. To prevent problems with normalization, no element value can start with a combining slash (U+0338 COMBINING LONG SOLIDUS OVERLAY).
In general, LDML data should be in NFC format. Normalization forms are defined by [[UAX15](https://www.unicode.org/reports/tr41/#UAX15)]. However, certain elements may need to contain characters that are not in NFC, including exemplars, transforms, segmentations, and p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not be normalized (either to NFC or NFD), or their meaning may be changed. Thus LDML documents must not be normalized as a whole. To prevent problems with normalization, no element value can start with a combining slash (U+0338 COMBINING LONG SOLIDUS OVERLAY).
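
As a reading aid for the NFC paragraph in this hunk (not part of the commit), the two checks it describes can be sketched in Java with the standard `java.text.Normalizer` class. The class and method names here are hypothetical. The sketch also assumes the caller only applies `isNfc` to elements that are expected to be in NFC, since the text says exemplars, transforms, segmentations, and certain collation rules must be left unnormalized.

```java
import java.text.Normalizer;

// Hypothetical helper, not taken from the CLDR code base.
class NfcCheckSketch {

    // U+0338 COMBINING LONG SOLIDUS OVERLAY, the "combining slash" named in the text.
    static final int COMBINING_LONG_SOLIDUS_OVERLAY = 0x0338;

    // True if the value is already in Normalization Form C.
    static boolean isNfc(String value) {
        return Normalizer.isNormalized(value, Normalizer.Form.NFC);
    }

    // True if the value starts with U+0338, which the text disallows for element values.
    static boolean startsWithCombiningSlash(String value) {
        return !value.isEmpty() && value.codePointAt(0) == COMBINING_LONG_SOLIDUS_OVERLAY;
    }

    public static void main(String[] args) {
        System.out.println(isNfc("\u00E9"));                       // precomposed e-acute
        System.out.println(isNfc("e\u0301"));                      // e followed by combining acute
        System.out.println(startsWithCombiningSlash("\u0338abc")); // leading combining slash
    }
}
```

The sketch only detects these conditions. It never rewrites the value, in keeping with the rule that LDML documents must not be normalized as a whole.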

Lists, such as singleCountries are space-delimited. That means that they are separated by one or more XML whitespace characters,
Lists, such as singleCountries are space-delimited. That means that they are separated by one or more XML whitespace characters:

* singleCountries
* preferenceOrdering
@@ -3033,6 +3061,7 @@ and are included below the !ELEMENT or !ATTLIST line that they apply to. The cur
| `<!--@DEPRECATED-->` | The element or attribute is deprecated, and should not be used. |
| `<!--@DEPRECATED: attribute-value1, attribute-value2-->` | The attribute values are deprecated, and should not be used. Spaces between tokens are not significant. |
| `<!--@MATCH:{attribute value constraint}-->` | Requires the attribute value to match the constraint. |
| `<!--@TECHPREVIEW-->` | The element is a technical preview of a feature and may be changed or removed at any time. |

There is additional information in the attributeValueValidity.xml file that is used internally for testing. For example, the following line indicates that the 'currency' element in the ldml dtd must have values from the bcp47 'cu' type.

@@ -3404,7 +3433,7 @@ The `languageAlias`, `scriptAlias`, `territoryAlias`, and `variantAlias` element
> canonicalization.
>See §3.8.2 [Legacy Variants](#Legacy_Variants).
### Definitions
### <a name="LocaleId_Definitions">LocaleId Definitions</a>

#### 1. Multimap interpretation

Original file line number Diff line number Diff line change
@@ -212,6 +212,7 @@ private void checkEmpty(Multimap<String, String> m, DtdType type, Element elemen

HashSet<Attribute> valueAttributes = new LinkedHashSet<>();
HashSet<Attribute> distAttributes = new LinkedHashSet<>();
HashSet<Attribute> metadataAttributes = new LinkedHashSet<>(); // TODO: not used currently, ignored
for (Attribute attribute : element.getAttributes().keySet()) {
if (attribute.isDeprecated()) continue;
switch (attribute.getStatus()) {
@@ -222,6 +223,9 @@ private void checkEmpty(Multimap<String, String> m, DtdType type, Element elemen
case distinguished:
distAttributes.add(attribute);
break;
case metadata:
metadataAttributes.add(attribute);
break;
}
}
ElementType elementType = element.getType();
@@ -269,7 +273,7 @@ private void checkEmpty(Multimap<String, String> m, DtdType type, Element elemen
logKnownIssue("cldrbug:9982", "Lower priority fixes to bad xml");
break;
default:
m.put("error", "\t||" + showPath(parents) + "||path has both children AND value attributes"
m.put("error", "\t||" + showPath(parents) + "||DTD has both children AND value attributes: tr35.md#XML_Format"
+ "||" + valueAttributes
+ "||" + children + "||");
break;
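
The invariant this test enforces (restriction 3 in the tr35.md hunk: an element with value attributes must not also have child elements) can be illustrated with a minimal, self-contained Java sketch. It is not the code from the commit. `ElementDecl`, `AttributeStatus`, `badUnit`, and `goodUnit` are hypothetical stand-ins, and the attribute statuses assigned to `type` and `displayName` follow the comments in the spec's two XML examples rather than the real DTD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are the CLDR test's real classes.
class ValueChildrenCheckSketch {

    // The same three statuses that the switch statement in the diff handles.
    enum AttributeStatus { value, distinguished, metadata }

    // Simplified stand-in for a DTD element: attribute statuses plus child element names.
    static final class ElementDecl {
        final String name;
        final Map<String, AttributeStatus> attributes;
        final List<String> children;

        ElementDecl(String name, Map<String, AttributeStatus> attributes, List<String> children) {
            this.name = name;
            this.attributes = attributes;
            this.children = children;
        }
    }

    // Collects the names of attributes whose status is `value`, in declaration order.
    static List<String> valueAttributes(ElementDecl element) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, AttributeStatus> entry : element.attributes.entrySet()) {
            if (entry.getValue() == AttributeStatus.value) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Restriction 3: an element with value attributes must not also have child elements.
    static boolean violatesInvariant(ElementDecl element) {
        return !valueAttributes(element).isEmpty() && !element.children.isEmpty();
    }

    // Shaped like the "Not correct LDML" example: displayName as a value attribute, plus a child.
    static ElementDecl badUnit() {
        Map<String, AttributeStatus> attrs = new LinkedHashMap<>();
        attrs.put("type", AttributeStatus.distinguished);
        attrs.put("displayName", AttributeStatus.value);
        return new ElementDecl("unit", attrs, Arrays.asList("unitPattern"));
    }

    // Shaped like the en.xml example: only a distinguishing attribute, with data in child elements.
    static ElementDecl goodUnit() {
        Map<String, AttributeStatus> attrs = new LinkedHashMap<>();
        attrs.put("type", AttributeStatus.distinguished);
        return new ElementDecl("unit", attrs,
                Arrays.asList("displayName", "unitPattern", "perUnitPattern"));
    }

    public static void main(String[] args) {
        for (ElementDecl e : Arrays.asList(badUnit(), goodUnit())) {
            System.out.println(e.name + " valueAttributes=" + valueAttributes(e)
                    + " children=" + e.children + " violates=" + violatesInvariant(e));
        }
    }
}
```

Metadata attributes are deliberately left out of the check, mirroring the TODO in the diff, which collects them but does not use them yet.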