CLDR-11874 Add conversion of subdivisions to json #4255
Force-pushed from 3b9404d to 2be4ca9.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Force-pushed from 2be4ca9 to 5b10a04.
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
Also added containment for subdivisions.
Force-pushed from 5b10a04 to 9afc856.
Notice: the branch changed across the force-push! ~ Your Friendly Jira-GitHub PR Checker Bot
Code looks great; I need to test it.
"autas": "Tasmania",
"auvic": "Victoria",
"auwa": "Western Australia",
"AW": "Aruba",
"AX": "Åland Islands",
"azabs": "Absheron",
"azaga": "Agstafa",
"azagc": "Aghjabadi",
"azagm": "Agdam",
It looks like it includes the territories as well, not just subdivisions.
Files generated
You're right; for my local builds I set:
This is actually already in the source .xml. Not all territories are included (e.g. "NL", "FR", and "GB" are absent); it seems only territories that are not countries are included. Anyway, the JSON is a direct reproduction of the XML.
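To make "direct reproduction" concrete, here is a minimal Python sketch of reading `<subdivision>` elements into a code-to-name map and dumping it as JSON. It is not the converter from this PR: the function name `subdivisions_to_dict`, the sample fragment, and the flat JSON layout are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Tiny stand-in for common/subdivisions/en.xml (assumed element nesting;
# the real file is much larger).
SAMPLE_XML = """\
<ldml>
  <localeDisplayNames>
    <subdivisions>
      <subdivision type="auwa">Western Australia</subdivision>
      <subdivision type="AW">Aruba</subdivision>
      <subdivision type="azabs">Absheron</subdivision>
    </subdivisions>
  </localeDisplayNames>
</ldml>
"""


def subdivisions_to_dict(xml_text: str) -> dict:
    """Map each <subdivision type="..."> to its display name, in document order."""
    root = ET.fromstring(xml_text)
    return {el.get("type"): (el.text or "") for el in root.iter("subdivision")}


names = subdivisions_to_dict(SAMPLE_XML)
# ensure_ascii=False keeps names such as "Åland Islands" readable in the output.
print(json.dumps(names, ensure_ascii=False, indent=2))
```

Because nothing is filtered, a territory entry such as "AW" in the XML carries straight through to the JSON, which matches what was observed in the generated files.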
@macchiati Indeed, this does seem to be in the source. Is this a bug? https://github.com/unicode-org/cldr/blob/main/common/subdivisions/en.xml#L211-L212
I think what is happening is that some subdivisions are aliases of country codes. We're treating the country codes as predominant (we'll have better data for them). For the JSON version, I think we can skip the country codes; implementers are on notice that they should use the country codes.
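Skipping the country codes could look roughly like the Python sketch below. The helper `drop_territory_codes` is hypothetical, and the rule it uses (a key made of exactly two uppercase ASCII letters is a territory code) is only a heuristic inferred from the sample output in this thread, not something the spec states.

```python
import re

# Heuristic (assumption): territory codes look like "AW" or "AX", while
# subdivision codes are lowercase with a suffix, e.g. "autas" or "azabs".
TERRITORY_CODE = re.compile(r"[A-Z]{2}")


def drop_territory_codes(names: dict) -> dict:
    """Return a copy of the code-to-name map without territory-code keys."""
    return {
        code: name
        for code, name in names.items()
        if not TERRITORY_CODE.fullmatch(code)
    }


sample = {
    "auwa": "Western Australia",
    "AW": "Aruba",
    "AX": "Åland Islands",
    "azabs": "Absheron",
}
filtered = drop_territory_codes(sample)
print(filtered)
```

A filter on the key shape keeps the change local to the JSON output; removing the entries from the XML source would make it unnecessary.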
The XML has
<subdivision type="AW">Aruba</subdivision>
<subdivision type="AX">Åland Islands</subdivision>
<!-- AW : Aruba : NO SUBDIVISIONS
AX : Åland Islands : NO SUBDIVISIONS
AZ : Azerbaijan
-->
The spec doesn't address it. It looks like it's just included incorrectly, unless the goal was to define a subdivision that was just "all of Aruba".
The spec, by the way, at https://unicode.org/reports/tr35/dev/tr35-general.html#Contents gives examples with hyphens in the subdivision codes, and doesn't mention the special subdivisions in common/main.
<!-- from the spec -->
<subdivision type="AL-04">Fier County</subdivision>
<subdivision type="AL-FR">Fier</subdivision> <!-- in AL-04 : Fier County -->
<subdivision type="AL-LU">Lushnjë</subdivision> <!-- in AL-04 : Fier County -->
<subdivision type="AL-MK">Mallakastër</subdivision> <!-- in AL-04 : Fier County -->
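For comparing the hyphenated codes in the spec examples with the keys in the data files (e.g. "autas"), a small normalizer may help. This Python sketch assumes the data-file form is simply the region code plus the suffix, lowercased and without the hyphen; the function name `to_unhyphenated_id` is hypothetical.

```python
def to_unhyphenated_id(code: str) -> str:
    """Turn a hyphenated code such as "AL-FR" into the lowercase form "alfr".

    Assumption: region and suffix are separated by a single hyphen.
    """
    region, sep, suffix = code.partition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"expected REGION-SUFFIX, got {code!r}")
    return (region + suffix).lower()


for spec_code in ("AL-04", "AL-FR", "AL-LU", "AL-MK"):
    print(spec_code, "->", to_unhyphenated_id(spec_code))
```

A bare territory code such as "AW" has no suffix, so it is rejected rather than silently passed through.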
That's a bug. We'll want to exclude the country codes from the XML data (and via that, the JSON), and also modify the spec to make it clear that implementations need to look at the subdivision aliases, such as
I was confused at first, because the TOC doesn't contain any subdivision codes. But I think you mean https://unicode.org/reports/tr35/dev/tr35-general.html#locale-display-name-fields, notably
Is that what you meant? Anyway, can you file a ticket for these two items? (They are related, since we'll want to change the spec to document the relation between the subdivisions and codes.)
@arjanmels Can you look into:
For the data issue: https://unicode-org.atlassian.net/browse/CLDR-18219. It looks like this was fixed before and has since regressed.
This was already being properly converted together with the language, territory etc. aliases (before this PR):
@arjanmels OK, sorry, I missed it. I'm going to approve this and look at the coverage issue separately.
@srl295 great, thank you!
CLDR-11874
ALLOW_MANY_COMMITS=true