The GlyphType documentation states:
Accordingly the value for the glyph element will be defined as follows:
Pre-composed representation = base + combining character(s) (decomposed representation)
See http://www.fileformat.info/info/unicode/char/0101/index.htm
"U+0101" = (U+0061) + (U+0304)
"combining characters" ("base characters" in combination with non-spacing marks or characters which are combined to one) are represented as one "glyph", e.g. áàâ.
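The U+0101 example quoted above can be checked with a few lines of Python. This is only an illustrative sketch using the standard `unicodedata` module; it is not part of the ALTO schema or its tooling.

```python
import unicodedata

# Decomposed form: LATIN SMALL LETTER A followed by COMBINING MACRON.
decomposed = "\u0061\u0304"

# NFC normalization composes the pair into the precomposed character.
precomposed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))          # 2 code points
print(len(precomposed))         # 1 code point
print(precomposed == "\u0101")  # True

# NFD normalization recovers the base + combining sequence.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

In this case the precomposed form has one code point, which is why it fits the length=1 restriction.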
This is accompanied by the restriction length=1 for the CONTENT attribute:
<xsd:attribute name="CONTENT" use="required">
  <xsd:annotation>
    <xsd:documentation>
      CONTENT contains the precomposed representation (combining character) of the character from the parent String element.
      The sequence position of the Glyph element matches the position of the character in the String.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:length fixed="true" value="1"/>
      <xsd:whiteSpace value="preserve"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>
Unfortunately, in some alphabets, a precomposed representation does not exist.
For example, in the Hebrew alphabet, it is possible for many letters to have three diacritics:
Even if we ignore cantillation marks, which are limited to biblical text, only a very small portion of the possible combinations exist as precomposed characters.
Thus, there is no precomposed character for "בָּ" or even the more common "בָ".
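A quick way to see this, sketched in Python with the standard `unicodedata` module: NFC normalization, which composes characters wherever Unicode allows it, leaves both Hebrew sequences longer than one code point. The constant names are only for illustration.

```python
import unicodedata

BET = "\u05D1"     # HEBREW LETTER BET
DAGESH = "\u05BC"  # HEBREW POINT DAGESH OR MAPIQ
QAMATS = "\u05B8"  # HEBREW POINT QAMATS

# NFC composes wherever a canonical precomposed character is available.
bet_qamats = unicodedata.normalize("NFC", BET + QAMATS)
bet_dagesh_qamats = unicodedata.normalize("NFC", BET + DAGESH + QAMATS)

print(len(bet_qamats))         # 2: nothing to compose into
print(len(bet_dagesh_qamats))  # 3: still base + two points
```

Neither value can satisfy a fixed length of 1, so a Glyph with this CONTENT would fail validation against the current schema.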
Therefore, to be able to represent Hebrew glyphs properly, we should change the specification to something like:
<xsd:attribute name="CONTENT" use="required">
  <xsd:annotation>
    <xsd:documentation>
      CONTENT contains the representation of the character from the parent String element.
      Precomposed characters are recommended, but it is acceptable to have one base character and zero-to-many combining diacritics.
      The sequence position of the Glyph element matches the position of the character in the String.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="4"/>
      <xsd:whiteSpace value="preserve"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>
We should also remove the text above from the GlyphType documentation.
I'm not sure whether other alphabets would require more than 4 characters; maybe the maxLength restriction could be removed entirely.
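To make the proposed rule concrete, here is a minimal Python sketch of the check "one base character plus zero-to-many combining marks". The function name `is_single_glyph` and its `max_length` parameter are hypothetical; nothing like this exists in the schema, and a length limit alone (as in the XSD above) does not enforce the base-plus-marks structure.

```python
import unicodedata
from typing import Optional


def is_single_glyph(content: str, max_length: Optional[int] = 4) -> bool:
    """Return True if content is one base character followed only by marks.

    max_length mirrors the proposed maxLength value (counted in code points);
    pass None to model dropping the limit entirely. Hypothetical helper.
    """
    if not content:
        return False
    if max_length is not None and len(content) > max_length:
        return False
    base, marks = content[0], content[1:]
    # The first code point must not itself be a mark (general category M*).
    if unicodedata.category(base).startswith("M"):
        return False
    # Every following code point must be a mark.
    return all(unicodedata.category(ch).startswith("M") for ch in marks)


print(is_single_glyph("\u0101"))              # precomposed a with macron
print(is_single_glyph("\u05D1\u05BC\u05B8"))  # bet + dagesh + qamats
print(is_single_glyph("ab"))                  # two base characters
```

Calling it with `max_length=None` accepts any number of marks after the base character, which corresponds to removing the limit.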
Thank you for raising this topic. This change could be a good candidate for 5.0 as well. Perhaps we will find other use cases (other languages) that call for it, along with usage samples.