Skip to content

CLDR-16836 kbd: add EBNF to spec for transforms#4261

Merged
srl295 merged 7 commits intounicode-org:mainfrom
srl295:srl295/kbd/cldr-18197/bnf
Jan 9, 2025
Merged

CLDR-16836 kbd: add EBNF to spec for transforms#4261
srl295 merged 7 commits intounicode-org:mainfrom
srl295:srl295/kbd/cldr-18197/bnf

Conversation

@srl295
Copy link
Copy Markdown
Member

@srl295 srl295 commented Jan 1, 2025

CLDR-16836

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@srl295 srl295 self-assigned this Jan 1, 2025
@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 121aab0 to afa4af5 Compare January 2, 2025 20:04
@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • docs/ldml/tr35-keyboards.md is different
  • keyboards/abnf/transform-from-required.abnf is different
  • tools/scripts/keyboard-abnf-tests/check-keyboard-abnf.sh is different
  • tools/scripts/keyboard-abnf-tests/transform-from-required.d/from-match.fail.txt is now changed in the branch
  • tools/scripts/keyboard-abnf-tests/transform-from-required.d/from-match.pass.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: update spec to mention abnf CLDR-18197 kbd: update spec with ABNF Jan 2, 2025
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 2, 2025

fyi @stasm and @aphillips

@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 2, 2025

  • NMTOKEN is a little incomplete - had a tool problem with high values. (worked around it)
  • Need to do the to= side (done)

@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from a461f03 to 308448c Compare January 3, 2025 18:14
@jira-pull-request-webhook
Copy link
Copy Markdown

Notice: the branch changed across the force-push!

  • docs/ldml/tr35-keyboards.md is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: update spec with ABNF CLDR-18197 kbd: add EBNF to spec for transforms Jan 3, 2025
- add keyboard abnf and sample files and automated tests

Temporarily skip non-BMP chars,
see hildjj/node-abnf#25 which is being fixed.
@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 308448c to d38f212 Compare January 3, 2025 21:31
@jira-pull-request-webhook
Copy link
Copy Markdown

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 changed the title CLDR-18197 kbd: add EBNF to spec for transforms CLDR-16836 kbd: add EBNF to spec for transforms Jan 3, 2025
Copy link
Copy Markdown
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

  1. The EBNF links should point to https://unicode.org/reports/tr35/#ebnf. (Since the LDML EBNF is a superset of the vanilla EBNF that should work.
  2. The validity / well-formedness constraints (eg no more than 9 capture groups), should use the w3c syntax.

@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 3, 2025

Looks good!

Thanks!

  1. The EBNF links should point to https://unicode.org/reports/tr35/#ebnf. (Since the LDML EBNF is a superset of the vanilla EBNF that should work.

I will add this:

The following is the [LDML EBNF](./tr35.md#ebnf) format for the grammar:
  1. The validity / well-formedness constraints (eg no more than 9 capture groups), should use the w3c syntax.

The 9 capture groups is 9 inner capture groups. So for example, valid:

/(first)(second)(third)(fourth)(fifth)(sixth)(seventh)(eighth)(?:And possibly, (ninth))?/

but invalid:

/(first)(second)(third)(fourth)(fifth)(sixth)(seventh)(eighth)(?:And possibly, (ninth))?(?:But not, (tenth)!)?/

because it's a nested group it may be challenging to express in the grammar, it's more like a resource (slot) limit.

@srl295 srl295 force-pushed the srl295/kbd/cldr-18197/bnf branch from 199f93e to c0e32b4 Compare January 3, 2025 21:51
@jira-pull-request-webhook
Copy link
Copy Markdown

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295 srl295 requested a review from macchiati January 3, 2025 21:53
@srl295 srl295 marked this pull request as ready for review January 3, 2025 21:54
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 3, 2025

I'd like to get some Java based tooling, but that will be in a separate PR.
Edit: ran into some toolchain snags, so java tooling not ready yet. I'd like to land this current PR first anyways.

Comment thread docs/ldml/tr35-keyboards.md Outdated
Comment thread docs/ldml/tr35-keyboards.md Outdated
Comment thread docs/ldml/tr35-keyboards.md
Comment thread docs/ldml/tr35-keyboards.md Outdated
| DIGIT
| '_'
ASCII-CTRLS
::= [#x1-#x8#xB-#xC#xE-#x1F]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not allowed in XML 1.0

Comment thread docs/ldml/tr35-keyboards.md Outdated
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 4, 2025 via email

Copy link
Copy Markdown
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I requested changes for the constraints.

@srl295 srl295 requested review from macchiati and miloush January 8, 2025 22:46
Comment thread docs/ldml/tr35-keyboards.md Outdated
```ebnf
[ wfc: No more than 9 capture groups may be present. ]
[ vc: all variables referenced must be defined in the <variables> element ]
[ vc: The CLDR repository may define additional constraints on the repertoire, such as requiring all characters to be in a published Unicode version and disallowing private-use characters. ]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[ vc: The CLDR repository may define additional constraints on the repertoire, such as requiring all characters to be in a published Unicode version and disallowing private-use characters. ]
[ vc: If a keyboard definitions is submitted to the the CLDR repository, it must satisfy additional constraints on the character repertoire. For more information, see [CLDR keyboard repertoire constraints](#repertoire-constraints). ]

We have to point to where those constraints are documented. So add a little section header for that and point to it. Also make a change to a similar line below.

I suggest that the contents of that section be:

  • No characters can have any of the following General_Category values in the latest version of the Unicode Standard:
    • Private Use (Co)
    • Surrogate (Cs)
    • Unassigned (Cn)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we can link from ebnf to spec.

Also the repository requirements are a separate activity - I don't see a reason to specify them here in detail (though I'll take the list back to kbd-wg). Perhaps it's better to just say- see a future version of the spec.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this

[ vc: If a keyboard definitions is submitted to the the CLDR repository, it must satisfy additional constraints on the character repertoire. ]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is there is no way for the reader to find out what those requirements are — we can't have undefined vc requirements.

Copy link
Copy Markdown
Member Author

@srl295 srl295 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'd propose to drop it from vc and just note. Or not even note. We don't have the requirements yet.

Yes there's no way to find out the requirements because they are future.

Comment thread docs/ldml/tr35-keyboards.md Outdated
Comment thread docs/ldml/tr35-keyboards.md Outdated
@srl295
Copy link
Copy Markdown
Member Author

srl295 commented Jan 9, 2025

@miloush how does it look?

@srl295 srl295 merged commit 2b78422 into unicode-org:main Jan 9, 2025
@srl295 srl295 deleted the srl295/kbd/cldr-18197/bnf branch January 9, 2025 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants