Collaborator

@MDLC01 MDLC01 commented Oct 26, 2025

This PR updates the testing infrastructure to also test that standardized variation sequences are valid. I also reorganized build.rs to separate the processing of Codex module files from the part that downloads Unicode data files for the tests.

Standardized variation sequences are defined by Unicode in standardized-variation-sequences.txt. Each one consists of an initial character followed by a variation selector from VS1 to VS14. Notably, standardized variation sequences are disjoint from emoji variation sequences (i.e., presentation sequences), which we have fully supported since #114.

Unicode defines a third kind of variation sequence: ideographic variation sequences, which we do not care about in Codex.

More on variation sequences: https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-23/#G26678.
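
To make the definition above concrete, here is a minimal sketch of how such a check could look. It is not the code from this PR: the function names (`is_vs1_to_vs14`, `split_sequence`, `parse_line`, `is_standardized`) are hypothetical, and the data line layout (`<base> <selector>; description; # comment`, with hexadecimal code points) is an assumption about the Unicode data file. The one thing it relies on from Unicode itself is that VS1 through VS14 are U+FE00 through U+FE0D.

```rust
use std::collections::HashSet;

/// VS1 through VS14 occupy U+FE00..=U+FE0D. VS15 and VS16 (U+FE0E, U+FE0F)
/// are the selectors used by emoji presentation sequences.
fn is_vs1_to_vs14(c: char) -> bool {
    ('\u{FE00}'..='\u{FE0D}').contains(&c)
}

/// Splits `s` into (base, selector) if it is exactly one character
/// followed by one variation selector in VS1..=VS14.
fn split_sequence(s: &str) -> Option<(char, char)> {
    let mut chars = s.chars();
    let base = chars.next()?;
    let selector = chars.next()?;
    if chars.next().is_none() && is_vs1_to_vs14(selector) {
        Some((base, selector))
    } else {
        None
    }
}

/// Parses one data line of the ASSUMED form
/// `0030 FE00; description; # comment` into (base, selector).
/// Returns `None` for blank lines, comment lines, and malformed lines.
fn parse_line(line: &str) -> Option<(char, char)> {
    // Drop the trailing comment, then keep only the first `;`-separated field.
    let data = line.split('#').next()?.trim();
    let field = data.split(';').next()?;
    let mut codes = field
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32));
    let base = codes.next()??;
    let selector = codes.next()??;
    // A sequence has exactly two code points.
    if codes.next().is_some() {
        return None;
    }
    Some((base, selector))
}

/// Returns true if `s` is a VS1..=VS14 sequence listed in `known`.
fn is_standardized(s: &str, known: &HashSet<(char, char)>) -> bool {
    split_sequence(s).is_some_and(|pair| known.contains(&pair))
}
```

A test could collect `parse_line` over every line of the data file into a `HashSet`, then call `is_standardized` on each string that ends in a selector from VS1 to VS14.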

@MDLC01 MDLC01 added the meta Discussion about the structure of this repo label Oct 26, 2025
@MDLC01 MDLC01 added the waiting on reviews Breaking and non-breaking changes need respectively 3 and 2 reviews label Oct 27, 2025