Use simple approximation for LunarChinese #7006
Initial feedback. I want to do a more thorough review before this lands
```rust
if !(1900..2100).contains(&case.iso_year) {
    continue;
}
```
?
doesn't pass outside the hardcoded range
Then update the tests to either be in-range or use the new dates?
I'll wait until I have an approval before I delete test cases that I might have to restore later
sgtm
```rust
// This is required for continuity with the hardcoded data
day_fraction_to_ms!((-9) / 24),
```
Thought/Issue: the code is designed for Chinese leap years. Please check Reingold for how Dangi calculates its leap years. Doing a 9-hour new moon adjustment probably isn't the right way to get Dangi alignment.
what about the code is designed for Chinese leap years?
we might not end up needing the correction, I haven't imported the KASI data yet
I'm really concerned by this comment. Dangi and Chinese calculate leap months the exact same way, all our code for it is and has always been the same.
Yeah, I'm confused here too. This adjustment is to match the data, and Korea is also on UTC+9
@robertbastian we now have the KASI data, would appreciate that check being done now.
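To make the effect of the reference-time offset concrete: a new moon instant is assigned to the local day that contains it, so a nine-hour shift can move a new moon across midnight and change which day a month starts on. A minimal sketch, assuming instants are plain `i64` milliseconds since a UTC midnight epoch; `local_day`, `MS_PER_DAY` and `UTC_PLUS_9_MS` are hypothetical names, not the crate's API.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// The UTC+9 offset (9/24 of a day) in milliseconds.
const UTC_PLUS_9_MS: i64 = 9 * MS_PER_DAY / 24;

/// Returns the index of the local day containing `utc_ms` in a zone that is
/// `offset_ms` ahead of UTC. `div_euclid` rounds toward negative infinity,
/// so instants before the epoch land on negative days as expected.
fn local_day(utc_ms: i64, offset_ms: i64) -> i64 {
    (utc_ms + offset_ms).div_euclid(MS_PER_DAY)
}
```

For example, a new moon at 16:00 UTC on day 0 falls on day 0 in UTC but on day 1 in UTC+9 (01:00 local), while shifting the new moon itself by -9 hours, as the `(-9) / 24` correction does, moves it back to day 0 in UTC+9.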
do you want me to fetch the gist and test against that? seems a bit brittle
Dangi data: https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. Please ignore the data for the years 1899 and 2050; it is incomplete.
I'm going to figure out a home for the scraper and scraped data, but for now we can just cite KASI.
The code and data used for fetching this will be pushed up to a separate (private) Unicode repo once we have one. You can find the cleaned up source data in https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. I'm imagining that post-1950 data will change or be removed with #7006 The initial motivation here was to fix the apparent ground truth mismatch found in https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682. Turns out it was a different problem, and it has been fixed in #7013. We may potentially need the same discussion as #6970 about whether we care about these pre-1912 dates, since that's the only time this diverges.
```rust
LunarChineseYearData::simple(
    // Future reference time is probably UTC+9
    day_fraction_to_ms!(9 / 24),
    // This is required for continuity with the hardcoded data
```
these years start on the same day with both methods:
2092, 2093, 2094, 2095, 2096, 2097, 2098, 2103, 2104, 2106, 2107, 2108, 2109, 2110
crucially, not 2101, which is why we need a correction
I don't like adjusting the new moon because it skews not only this year but all future years. I want us to at least try to approximate GB/T, with an average error near zero, even if local errors are several hours one way or the other. Can we just project Reingold two more years before cutting over?
we could, that's why I posted this list
We're not going to get an average error of zero, because the mean lunar cycle varies considerably over centuries. This method does not really align with GB/T, even without the lunar correction.
We should pick a new moon that is close to the mean of the synodic cycle. If the moon was at the outer extreme of the ellipse on the date you picked in January 2000, then we'll be carrying that error forward.
Thought: We should run Reingold's new moon code for several hundred years and pick the new moon instant that minimizes the error. This should be an easy simulation to write, and it makes the approximation less arbitrary.
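A sketch of the suggested simulation: given reference new-moon instants (for example, produced by Reingold's astronomical new-moon code, which is not reproduced here), find the base instant `b` that minimizes the squared error of the linear model `b + n * P`, where `P` is a fixed mean synodic month and `n` is each reference moon's lunation index. With `P` held fixed, the least-squares optimum is just the mean of the residuals `t_n - n * P`, which also makes the average error zero. The function name and the `i64` millisecond representation are assumptions for illustration.

```rust
/// Least-squares base for a linear new-moon model `base + n * period_ms`.
///
/// `reference_ms` holds reference new-moon instants in milliseconds, one per
/// consecutive lunation starting at n = 0 (assumed: no gaps in the data).
/// With the period fixed, the base that minimizes the sum of squared errors
/// is the mean of `t_n - n * period_ms`, truncated to whole milliseconds.
/// Returns `None` for empty input.
fn best_fit_base(reference_ms: &[i64], period_ms: i64) -> Option<i64> {
    if reference_ms.is_empty() {
        return None;
    }
    // Accumulate in i128 so long spans of lunations cannot overflow.
    let sum: i128 = reference_ms
        .iter()
        .enumerate()
        .map(|(n, &t)| t as i128 - n as i128 * period_ms as i128)
        .sum();
    Some((sum / reference_ms.len() as i128) as i64)
}
```

Fitting the period as well would turn this into ordinary linear regression; keeping `P` fixed matches the idea of only choosing a less arbitrary starting new moon.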
I don't consider minimising an error to be a goal here, at least not in this PR
```rust
    NotFound,
}

/// The mean year length according to the Gregorian solar cycle.
```
thought (nb): perhaps this can live in a separate file?
it's closely tied to the duration representation chosen by this code
or do you mean the whole code?
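For reference, the mean lengths involved can be written as whole milliseconds, which is part of why they are tied to the duration representation. A sketch assuming an `i64` millisecond representation: the constant names mirror `MEAN_SYNODIC_MONTH_LENGTH` and `MEAN_GREGORIAN_SOLAR_TERM_LENGTH` from the diff, `MEAN_GREGORIAN_YEAR_LENGTH` is a hypothetical name, and the synodic month value (29.530588861 days, the mean used in Reingold and Dershowitz's *Calendrical Calculations*) is an assumption about what the PR uses.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// The mean year length according to the Gregorian solar cycle:
/// 146_097 days per 400 years, i.e. 365.2425 days, exactly
/// 31_556_952_000 ms.
const MEAN_GREGORIAN_YEAR_LENGTH: i64 = 146_097 * MS_PER_DAY / 400;

/// Mean spacing of major solar terms: one twelfth of the mean Gregorian
/// year, which is also exact in milliseconds.
const MEAN_GREGORIAN_SOLAR_TERM_LENGTH: i64 = MEAN_GREGORIAN_YEAR_LENGTH / 12;

/// Mean synodic month of 29.530588861 days, rounded to the nearest
/// millisecond (assumed value, not taken from the PR).
const MEAN_SYNODIC_MONTH_LENGTH: i64 = 2_551_442_878;
```

Both Gregorian-derived values divide evenly, so no rounding error accumulates when stepping solar terms forward; only the synodic month is rounded.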
```rust
#[path = "chinese/qing_data.rs"]
mod qing_data;

macro_rules! day_fraction_to_ms {
```
nit: docs
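The definition of `day_fraction_to_ms!` isn't shown in this thread, so the following is only a guess at its shape, written with the kind of doc comment being asked for. It accepts `num / den` as two token trees so that `(-9) / 24` works, and multiplies before dividing so that integer arithmetic does not truncate `9 / 24` to zero. Returning a plain `i64` instead of the crate's `Milliseconds` type is a simplification.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Converts a fraction of a day, written as `numerator / denominator`,
/// into milliseconds.
///
/// The fraction is not evaluated as an ordinary Rust expression, because
/// `9 / 24` would be `0` in integer arithmetic. Instead the numerator is
/// multiplied by the length of a day first and then divided by the
/// denominator, so `day_fraction_to_ms!(9 / 24)` is nine hours.
/// Negative numerators must be parenthesized: `day_fraction_to_ms!((-9) / 24)`.
macro_rules! day_fraction_to_ms {
    ($num:tt / $den:tt) => {
        (($num as i64) * MS_PER_DAY / ($den as i64))
    };
}
```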
```rust
    new_moon_correction: Milliseconds,
    related_iso: i32,
) -> LunarChineseYearData {
    fn periodic_duration_on_or_before(
```
nit: docs
```rust
        base_moment: LocalMoment,
        duration: Milliseconds,
    ) -> LocalMoment {
        let num_periods = ((rata_die - base_moment.rata_die + 1)
```
question: what's going on with this +1 / -1?
For functions like this, either the math should be explained so that it's clear what it does, or the function should be documented so I can verify the math; doing neither leaves me guessing. Ideally both are documented.
My understanding is: this function is attempting to find the closest whole period from `base_moment` of period `duration`, looking back from `rata_die`. The +/- 1 handles the edge cases, but I'd like a comment explaining how.
I initially wrote this formula before @robertbastian refactored it. The +1 / -1 is to get us to the final millisecond of the day. We're given a RD, but the function returns "on or before", so we add 1 to the RD and then subtract 1 from the millisecond. We do the -1 because if the new moon occurs exactly at midnight, it belongs to the following day.
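The +1 / -1 described above can be written out as follows. This sketch flattens `LocalMoment` into a plain `i64` count of local milliseconds since the start of day 0 and `RataDie` into an `i64` day number; those simplifications, and the name `periodic_on_or_before`, are assumptions rather than the PR's code. It returns the latest instant `base + k * period` whose local day is on or before `rd`.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Returns the latest moment of the form `base_ms + k * period_ms`
/// (for integer `k`, possibly negative) that falls on local day `rd`
/// or earlier.
///
/// "On or before day `rd`" means at or before the last millisecond of
/// day `rd`, which is `(rd + 1) * MS_PER_DAY - 1`: hence `+ 1` on the day
/// and `- 1` on the millisecond. A moment exactly at midnight belongs to
/// the day it starts, so it is excluded when looking back from the
/// previous day.
fn periodic_on_or_before(rd: i64, base_ms: i64, period_ms: i64) -> i64 {
    let last_ms_of_day = (rd + 1) * MS_PER_DAY - 1;
    // Floor division, so moments before `base_ms` (negative k) work too.
    let k = (last_ms_of_day - base_ms).div_euclid(period_ms);
    base_ms + k * period_ms
}
```

With a moment exactly at the midnight that starts day 1, looking back from day 1 finds it, while looking back from day 0 skips to the previous period.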
idk, I can remove this, we don't have a requirement of millisecond precision here
I think we need the +1 / -1. @Manishearth was just asking why.
well it wouldn't be the end of the world if we didn't move the new moon at the millisecond of exactly midnight to next day. lunar times are +-minutes from reality anyway
```diff
 pub(crate) fn md_from_rd(self, rd: RataDie) -> (u8, u8) {
     debug_assert!(
-        rd < self.next_new_year() || !WELL_BEHAVED_ASTRONOMICAL_RANGE.contains(&rd),
+        rd < self.next_new_year(),
```
issue: I'm worried about hitting these again in V8: can we write a fuzzer for these?
Both RD-to-Chinese and Chinese-from-fields.
It's pretty easy, use `cargo-fuzz`.
But we have a test at the extreme values, right? I don't see what you'd be worried about; there's no floating point math that can degenerate.
For this PR I'd slightly prefer the well-behaved guard be kept around, and removing them is a separate change.
But if you feel confident about the assertions I guess it's fine.
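As a lightweight complement to a `cargo-fuzz` target, here is a sketch of the invariants such a fuzzer would check: RD-to-(month, day) stays in range and round-trips through the from-fields direction. It uses a hypothetical leap-free model in which month boundaries are the local days of mean new moons `base_ms + k * period_ms`; all names are illustrative and not ICU4X APIs, and a real fuzz target would feed arbitrary inputs into the actual conversion code instead of sweeping one year.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Local day containing the `k`-th mean new moon.
fn new_moon_day(base_ms: i64, period_ms: i64, k: i64) -> i64 {
    (base_ms + k * period_ms).div_euclid(MS_PER_DAY)
}

/// Hypothetical leap-free model: the year starts at new moon `k0`.
/// Returns the 1-based (month, day) for `rd`.
fn md_from_rd(base_ms: i64, period_ms: i64, k0: i64, rd: i64) -> (i64, i64) {
    // Index of the last new moon on or before the end of day `rd`.
    let k = ((rd + 1) * MS_PER_DAY - 1 - base_ms).div_euclid(period_ms);
    let month = k - k0 + 1;
    let day = rd - new_moon_day(base_ms, period_ms, k) + 1;
    (month, day)
}

/// Inverse of `md_from_rd` for in-range fields.
fn rd_from_md(base_ms: i64, period_ms: i64, k0: i64, month: i64, day: i64) -> i64 {
    new_moon_day(base_ms, period_ms, k0 + month - 1) + day - 1
}

/// Sweeps every day of the 12 months starting at new moon `k0` and
/// asserts the range and round-trip invariants.
fn check_year(base_ms: i64, period_ms: i64, k0: i64) {
    let start = new_moon_day(base_ms, period_ms, k0);
    let end = new_moon_day(base_ms, period_ms, k0 + 12);
    for rd in start..end {
        let (month, day) = md_from_rd(base_ms, period_ms, k0, rd);
        assert!((1..=12).contains(&month), "month {month} out of range at rd {rd}");
        assert!((1..=30).contains(&day), "day {day} out of range at rd {rd}");
        assert_eq!(rd_from_md(base_ms, period_ms, k0, month, day), rd);
    }
}
```

Because the mean month is between 29 and 30 days long, consecutive new-moon days are always 29 or 30 days apart, which is what keeps `day` within 1..=30.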
```rust
while solar_term < 0 {
    let next_new_moon = new_moon + MEAN_SYNODIC_MONTH_LENGTH;

    if major_solar_term.rata_die < next_new_moon.rata_die {
        solar_term += 1;
        major_solar_term = major_solar_term + MEAN_GREGORIAN_SOLAR_TERM_LENGTH;
    }

    new_moon = next_new_moon;
}
```
Issue: I'm not convinced this is exactly right in how it handles M12L.
You start with `new_moon` and `solar_term` pointing to M11. The first iteration of the loop is guaranteed to not be a leap month, so we go into the second iteration with `new_moon` pointing to (M11 + 1) and `solar_term` of -1. Suppose the second iteration is also a common month (M12). Then we update `new_moon` to (M12 + 1) and `solar_term` to 0. But then we don't run the loop again, even if the currently-pointed-to month is M12L.
```rust
    solar_term += 1;
    major_solar_term = major_solar_term + MEAN_GREGORIAN_SOLAR_TERM_LENGTH;
} else {
    leap_month = Some(month as u8 + 2);
```
Issue: Although I think this technically works, there are two issues a casual observer who knows the Chinese calendar algorithm will think are wrong:
- You shouldn't pick a leap month if there was already a M11L or M12L in the same sui
- You shouldn't override a leap month later in the year if there was one earlier in the year
My version of the code makes both of these conditions explicit. You've chosen, it seems, to basically assert that, since this is linear and periodic, these two conditions will not occur. That might be the case (I haven't thought it through fully), but if that is what you intended, I would expect a paragraph of explanation, since it is a deviation from the requirements of GB/T. (As stated before, I want this to align with GB/T, just using approximate new moons and solar terms instead of the exact new moons and solar terms.)
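The two conditions above can be made explicit even when the inputs are mean (approximate) new moons and major solar terms. A minimal sketch, assuming the caller supplies, in `i64` milliseconds, the new moons of one sui from the start of M11 through the start of the following M11, plus the major solar terms in that span; the function names are hypothetical. A month counts as containing a major term if the term's local day falls in `[day of its new moon, day of the next new moon)`, mirroring the `rata_die` comparison in the loop above.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Local day containing the instant `ms`.
fn local_day(ms: i64) -> i64 {
    ms.div_euclid(MS_PER_DAY)
}

/// Whether any major solar term falls on a local day in `[start, end)`,
/// where `start` and `end` are the days of consecutive new moons.
fn has_major_term(start_ms: i64, end_ms: i64, major_terms_ms: &[i64]) -> bool {
    let (start, end) = (local_day(start_ms), local_day(end_ms));
    major_terms_ms.iter().any(|&t| {
        let d = local_day(t);
        start <= d && d < end
    })
}

/// Returns the 0-based index (counting from M11) of the leap month in a
/// sui, or `None`.
///
/// `new_moons_ms` runs from the start of M11 to the start of the next M11
/// inclusive, so a sui with N months has N + 1 entries.
/// - Only a sui with 13 months has a leap month.
/// - Only the first month without a major term becomes the leap month. If
///   that month is M11L or M12L, no later month of the same sui gets a leap
///   month, and a later month without a major term never overrides it.
fn leap_month_index(new_moons_ms: &[i64], major_terms_ms: &[i64]) -> Option<usize> {
    if new_moons_ms.len() != 14 {
        return None;
    }
    (0..13).find(|&i| !has_major_term(new_moons_ms[i], new_moons_ms[i + 1], major_terms_ms))
}
```

With synthetic 29-day months and major terms every 30 days starting on day 27, the third month (index 2) is the first one without a major term; dropping the last new moon leaves a 12-month sui, which gets no leap month.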
(I'm okay with the approximate algorithm diverging from GB/T, but yes, having comments to that effect would be nice)
#5778
Replaces #6995