-
-
Notifications
You must be signed in to change notification settings - Fork 830
ICU-23177 Add width-aware day period support and fix data loading #3611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
API proposal has been sent. This is the C++ version. I will try to add the Java version tomorrow. |
UnicodeString* fWideAmPms; | ||
int32_t fWideAmPmsCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does increase the stack size of DateFormatSymbols, but it is already so big that the delta is only about 2-3%.
@sffc But this is not yet in the ICU 78 API Proposals document |
icu4c/source/i18n/dtfmtsym.cpp
Outdated
@@ -2396,23 +2450,23 @@ DateFormatSymbols::initializeData(const Locale& locale, const char *type, UError | |||
assignArray(fStandaloneNarrowMonths, fStandaloneNarrowMonthsCount, fShortMonths, fShortMonthsCount); | |||
} | |||
|
|||
// Load AM/PM markers; if wide or narrow not available, use short | |||
// Load AM/PM markers. | |||
// Narrow falls back to Abbr which falls back to Wide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not the way the dayPeriod fallback works in CLDR. There, within a given style (format or stand-alone), wide and narrow both fall back to abbreviated; and stand-alone abbreviated falls back to format abbreviated. The root locale only has the format abbreviated forms.
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds width-aware day period support for AM/PM strings in SimpleDateFormat and DateFormatSymbols, allowing for abbreviated (default), wide, and narrow formatting styles. It also fixes data loading by properly handling separate AmPmMarkersAbbr and AmPmMarkers resources.
- Implements width-aware AM/PM string formatting in SimpleDateFormat using pattern length (a, aaaa, aaaaa)
- Adds new APIs to DateFormatSymbols for getting/setting AM/PM strings with width specification
- Refactors resource loading to distinguish between abbreviated and wide AM/PM markers
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
SimpleDateFormat.java | Updates AM/PM formatting logic to use width-specific strings based on pattern length |
DateFormatSymbols.java | Adds ampmsWide field and width-aware getter/setter methods for AM/PM strings |
dtfmtsym.h | Adds C++ API declarations for width-aware AM/PM string methods |
dtfmtsym.cpp | Implements C++ width-aware AM/PM string handling and resource loading fixes |
smpdtfmt.cpp | Updates C++ SimpleDateFormat to use width-specific AM/PM formatting and parsing |
Test files | Adds comprehensive tests for new AM/PM width functionality and updates existing test expectations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
I am traveling but should have a chance to take a closer look at this sometime in the next 3 days or so. |
@sffc This all looks good to me, with my one comment being that we should change the loading order & fallback for both C and J so that we load abbreviated first, and then have both wide and narrow fall back to abbreviated if they are missing. That way C and J would be consistent with each other and with the CLDR fallback order (in which wide/narrow fallback to abbreviated, and root only provides abbreviated). I will go ahead and approve in case you need to get this in and make those changes later, but I would like those changes at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me, with my one comment being that we should change the loading order & fallback for both C and J so that we load abbreviated first, and then have both wide and narrow fall back to abbreviated if they are missing. That way C and J would be consistent with each other and with the CLDR fallback order (in which wide/narrow fallback to abbreviated, and root only provides abbreviated). I will go ahead and approve in case you need to get this in and make those changes later, but I would like those changes at some point.
Hmmm. I wonder if there is a way with conformance tests to verify the order and targets of aliases. |
I don't believe that the manual fallback code as written in C++ actually does anything. The data loading sink should see the alias in root and do the fallback itself. I consider the C++ code to be merely a robust coding style, consistent with the rest of the symbols loading. In Java, most of the symbols also do not have explicit fallbacks written in code. |
Having a test locale in CLDR to exercise these sorts of things would be valuable. We could leverage en-XA perhaps. |
icu4c/source/i18n/unicode/dtfmtsym.h
Outdated
@@ -436,20 +436,27 @@ class U_I18N_API_CLASS DateFormatSymbols final : public UObject { | |||
/** | |||
* Gets AM/PM strings with the specified width. For example: "A" and "P". | |||
* @param count Filled in with length of the array. | |||
* @param context The usage context: FORMAT, STANDALONE. Currently ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should note that the values actually returned are the format styles.
icu4c/source/i18n/unicode/dtfmtsym.h
Outdated
|
||
/** | ||
* Sets AM/PM strings with the specified width. For example: "A" and "P". | ||
* @param ampms The new AM/PM strings. (not adopted; caller retains ownership) | ||
* @param count The number of strings in the array. | ||
* @param context The usage context: FORMAT, STANDALONE. Currently ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the values returned are the format styles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
c86d020
to
509effe
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
Checklist