Why doesn't max/min Graphemes have an associated Locale? #3079
Grapheme length depends on the locale context in which the string is parsed. This leads to inconsistent validation unless there is some way to require a locale on every message, or a default/standard locale.

I noticed the reference validation for this just counts UTF-16 code units, which does guarantee that the number of graphemes stays within the constraint. But one service could consider a record valid because its grapheme count is within the min/max, while another service using a different locale would consider it invalid.

See my Bluesky post where I tested some limits with different large grapheme cluster emojis: https://bsky.app/profile/the.nathanklisch.com/post/3lbkd5cw43c2v

Am I missing something?
Replies: 1 comment 2 replies
The source line you link is just an early-exit optimisation; full grapheme counting happens later, using the Graphemer package: https://www.npmjs.com/package/graphemer

Graphemer claims "it is an implementation on the Default Grapheme Cluster Boundary of UAX #29." You're right that locale-specific rule variations can be used, but the default seems well-defined. (The atproto docs should probably say something about that.)
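To make the early-exit idea concrete, here is a rough sketch of how such a check could be structured. It is not the atproto implementation: the function names `countGraphemes` and `withinMaxGraphemes` are made up for illustration, and it uses the built-in `Intl.Segmenter` as a stand-in for Graphemer, so its boundaries may differ from Graphemer's in edge cases.

```typescript
// Hypothetical helper: count grapheme clusters with the built-in Intl.Segmenter.
// This stands in for the Graphemer package and is an assumption of this sketch.
function countGraphemes(value: string, locale?: string): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(value)) {
    count++;
  }
  return count;
}

// Hypothetical validator: true when `value` has at most `max` graphemes.
function withinMaxGraphemes(value: string, max: number): boolean {
  // Early exit: every grapheme cluster contains at least one UTF-16 code unit,
  // so the grapheme count can never exceed value.length.
  if (value.length <= max) {
    return true;
  }
  // The cheap bound was inconclusive, so do the full grapheme count.
  return countGraphemes(value) <= max;
}

// A family emoji built from four emoji joined by zero-width joiners.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(family.length, countGraphemes(family), withinMaxGraphemes(family, 1));
```

The UTF-16 length only serves as an upper bound here. A string that fails the cheap check, like the family emoji above, is not rejected until the full grapheme count has run.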