Conversation

@robertbastian robertbastian commented Sep 30, 2025

#5778

Replaces #6995

@robertbastian robertbastian requested review from Manishearth, sffc and a team as code owners September 30, 2025 19:42
@robertbastian robertbastian force-pushed the pinqi2 branch 3 times, most recently from 4d87b46 to 00dabee on September 30, 2025 20:00
Member

@sffc sffc left a comment

Initial feedback. I want to do a more thorough review before this lands

Comment on lines +1808 to +1835
if !(1900..2100).contains(&case.iso_year) {
    continue;
}

Member

?

Member Author

doesn't pass outside the hardcoded range

Member

Then update the tests to either be in-range or use the new dates?

Member Author

I'll wait until I have an approval before I delete test cases that I might have to restore later

Member

sgtm

Comment on lines 342 to 376
// This is required for continuity with the hardcoded data
day_fraction_to_ms!((-9) / 24),

Member

Thought/Issue: the code is designed for Chinese leap years. Please check Reingold for how Dangi calculates its leap years. Doing a 9-hour new moon adjustment probably isn't the right way to get Dangi alignment.

Member Author

what about the code is designed for Chinese leap years?

we might not end up needing the correction, I haven't imported the KASI data yet

Member Author

I'm really concerned by this comment. Dangi and Chinese calculate leap months the exact same way, all our code for it is and has always been the same.

Member

Yeah, I'm confused here too. This adjustment is to match the data, and Korea is also on UTC+9

@robertbastian we now have the KASI data, would appreciate that check being done now.

Member Author

do you want me to fetch the gist and test against that? seems a bit brittle

@Manishearth
Member

Dangi data: https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e. Please ignore the data for the year 1899 and 2050, it is incomplete.

@Manishearth
Member

I'm going to figure out a home for the scraper and scraped data, but for now we can just cite KASI.

robertbastian pushed a commit that referenced this pull request Oct 1, 2025
The code and data used for fetching this will be pushed up to a separate
(private) Unicode repo once we have one. You can find the cleaned up
source data in
https://gist.github.com/Manishearth/d8c94a7df22a9eacefc4472a5805322e.

I'm imagining that post-1950 data will change or be removed with
#7006



The initial motivation here was to fix the apparent ground truth
mismatch found in
https://github.com/unicode-org/icu4x/pull/7007/files#r2393049682. Turns
out it was a different problem, and it has been fixed in
#7013.

We may potentially need the same discussion as #6970 about whether we
care about these pre-1912 dates, since that's the only time this
diverges.
@robertbastian robertbastian marked this pull request as draft October 1, 2025 10:32
@robertbastian robertbastian force-pushed the pinqi2 branch 7 times, most recently from 077ed47 to acf1c4a on October 1, 2025 16:52
@robertbastian robertbastian marked this pull request as ready for review October 1, 2025 17:03
LunarChineseYearData::simple(
    // Future reference time is probably UTC+9
    day_fraction_to_ms!(9 / 24),
    // This is required for continuity with the hardcoded data

Member Author

these years start on the same day with both methods:

2092, 2093, 2094, 2095, 2096, 2097, 2098, 2103, 2104, 2106, 2107, 2108, 2109, 2110

crucially, not 2101, which is why we need a correction

Member

I don't like adjusting the new moon because it skews not only this year but all future years. I want us to at least try to approximate GB/T, with an average error near zero, even if local errors are several hours one way or the other. Can we just project Reingold two more years before cutting over?

Member Author

we could, that's why I posted this list

we're not going to get an average error of zero, because the mean lunar cycle varies considerably over centuries. this method does not really align with GB/T, even without the lunar correction

Member

We should pick a new moon that is close to the mean of the synodic cycle. If the moon was at the outer extreme of the ellipse on the date you picked in January 2000, then we'll be carrying that error forward.

Member

Thought: We should run Reingold's new moon code for several hundred years and pick the new moon instant that minimizes the error. This should be an easy simulation to write, and it makes the approximation less arbitrary.
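A minimal sketch of the simulation suggested above, assuming the true new-moon instants are already available as a list of day numbers (for example produced by Reingold's new moon code, which is not reproduced here). For a fixed mean month length, the base instant that minimizes the squared error of the linear model is the mean of the residuals, which also makes the average error zero over the sampled range. The name `best_fit_base` and its inputs are hypothetical and not part of this PR.

```rust
/// Hypothetical helper, not part of the PR.
///
/// `observed[k]` is the instant (in days from some epoch) of the k-th
/// consecutive new moon, and `mean_month` is the fixed period of the
/// linear model `base + k * mean_month`.
///
/// Returns the `base` that minimizes the sum of squared errors, which for a
/// fixed slope is simply the mean of `observed[k] - k * mean_month`.
fn best_fit_base(observed: &[f64], mean_month: f64) -> Option<f64> {
    if observed.is_empty() {
        return None;
    }
    let residual_sum: f64 = observed
        .iter()
        .enumerate()
        .map(|(k, t)| t - k as f64 * mean_month)
        .sum();
    Some(residual_sum / observed.len() as f64)
}
```

With a base chosen this way the per-lunation error still swings by hours in either direction; only the mean over the sampled centuries is zero.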

Member Author

I don't consider minimising an error to be a goal here, at least not in this PR

    NotFound,
}

/// The mean year length according to the Gregorian solar cycle.

Member

thought (nb): perhaps this can live in a separate file?

Member Author

it's closely tied to the duration representation chosen by this code

Member Author

or do you mean the whole code?

#[path = "chinese/qing_data.rs"]
mod qing_data;

macro_rules! day_fraction_to_ms {

Member

nit: docs
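For illustration only, a documented macro with the same call shape could look like the sketch below. The real definition in the PR is not shown in this thread, so the plain `i64` result and the `MS_PER_DAY` constant are assumptions.

```rust
/// Number of milliseconds in one day (assumed helper constant).
const MS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Converts a day fraction written as `numerator / denominator` into whole
/// milliseconds, for example `day_fraction_to_ms!(9 / 24)` for a UTC+9 offset.
///
/// Negative numerators must be parenthesized: `day_fraction_to_ms!((-9) / 24)`.
///
/// Hypothetical sketch: the macro in the PR may use a dedicated
/// `Milliseconds` type rather than a bare `i64`.
macro_rules! day_fraction_to_ms {
    ($n:tt / $d:tt) => {
        (($n as i64) * MS_PER_DAY) / ($d as i64)
    };
}
```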

    new_moon_correction: Milliseconds,
    related_iso: i32,
) -> LunarChineseYearData {
    fn periodic_duration_on_or_before(

Member

nit: docs

        base_moment: LocalMoment,
        duration: Milliseconds,
    ) -> LocalMoment {
        let num_periods = ((rata_die - base_moment.rata_die + 1)

Member

question: what's going on with this +1 / -1 ?

For functions like this, either the math should be explained so that it's clear what it does, or the function should be documented so I can verify the math; doing neither leaves me guessing. Ideally both are documented.

My understanding is: this function is attempting to find the closest whole period from base_moment of period duration looking back from rata_die. The +/- 1 handles the edge cases, but I'd like a comment explaining how.

Member

I initially wrote this formula before @robertbastian refactored it. The +1 / -1 is to get us to the final millisecond of the day. We're given a RD, but the function returns "on or before", so we add 1 to the RD and then subtract 1 from the millisecond. We do the -1 because if the new moon occurs exactly at midnight, it belongs to the following day.
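The explanation above can be sketched as follows. This is not the PR's code: `LocalMoment` here is a simplified stand-in with plain `i64` fields, and `periodic_moment_on_or_before` is a hypothetical name. It shows the two steps in isolation: `+ 1` moves to the start of the following day, and `- 1` steps back one millisecond so that a period boundary falling exactly at midnight is assigned to the following day.

```rust
const MS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Simplified stand-in for the PR's type: a day number plus milliseconds
/// since local midnight of that day.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LocalMoment {
    rata_die: i64,
    local_ms: i64,
}

/// Returns the latest moment of the form `base + n * period_ms` that falls
/// on or before the last millisecond of `rata_die`.
fn periodic_moment_on_or_before(rata_die: i64, base: LocalMoment, period_ms: i64) -> LocalMoment {
    // `+ 1` reaches midnight at the start of the next day; `- 1` steps back
    // to the final millisecond of `rata_die`. Both are measured from
    // midnight of `base.rata_die`.
    let end_of_day_ms = (rata_die - base.rata_die + 1) * MS_PER_DAY - 1;
    // Whole periods that fit between the base moment and that instant,
    // rounding toward negative infinity so earlier dates also work.
    let num_periods = (end_of_day_ms - base.local_ms).div_euclid(period_ms);
    let total_ms = base.local_ms + num_periods * period_ms;
    LocalMoment {
        rata_die: base.rata_die + total_ms.div_euclid(MS_PER_DAY),
        local_ms: total_ms.rem_euclid(MS_PER_DAY),
    }
}
```

With a base at midnight of day 0 and a period of 1.5 days, the boundary at exactly midnight of day 3 is returned for day 3 but not for day 2, which instead gets the boundary at noon of day 1.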

Member Author

idk, I can remove this, we don't have a requirement of millisecond precision here

Member

I think we need the +1 / -1. @Manishearth was just asking why.

Member Author

well it wouldn't be the end of the world if we didn't move a new moon at exactly the millisecond of midnight to the next day. lunar times are +-minutes from reality anyway

pub(crate) fn md_from_rd(self, rd: RataDie) -> (u8, u8) {
    debug_assert!(
-       rd < self.next_new_year() || !WELL_BEHAVED_ASTRONOMICAL_RANGE.contains(&rd),
+       rd < self.next_new_year(),

Member

issue: I'm worried about hitting these again in V8: can we write a fuzzer for these?

Both RD-to-chinese and chinese-from-fields.

It's pretty easy, use cargo-fuzz.

Member Author

but we have a test at the extreme values, right? I don't see what you'd be worried about, there's no floating point math that can degenerate

Member

For this PR I'd slightly prefer the well-behaved guard be kept around, and removing them is a separate change.

But if you feel confident about the assertions I guess it's fine.

Comment on lines +1170 to +1179
while solar_term < 0 {
    let next_new_moon = new_moon + MEAN_SYNODIC_MONTH_LENGTH;

    if major_solar_term.rata_die < next_new_moon.rata_die {
        solar_term += 1;
        major_solar_term = major_solar_term + MEAN_GREGORIAN_SOLAR_TERM_LENGTH;
    }

    new_moon = next_new_moon;
}

Member

Issue: I'm not convinced this is exactly right in how it handles M12L.

You start with new_moon and solar_term pointing to M11. The first iteration of the loop is guaranteed to not be a leap month, so we go into the second iteration with new_moon pointing to (M11 + 1) and solar_term of -1. Suppose the second iteration is also a common month (M12). Then we update new_moon to (M12 + 1) and solar_term to 0. But then we don't run the loop again, even if the currently-pointed-to month is M12L.
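To make the trace above easier to follow, here is a standalone sketch of the same loop with moments reduced to plain millisecond counts. It is not the PR's code: the constants are rounded assumptions, the starting value `solar_term = -2` is inferred from the description above, and `walk_to_new_year` is a hypothetical name. It only reproduces the loop's behavior and does not settle whether M12L is handled correctly.

```rust
const MS_PER_DAY: i64 = 86_400_000;
// Assumed values, rounded to whole milliseconds; the PR's constants are not shown here.
const MEAN_SYNODIC_MONTH_MS: i64 = 2_551_442_877; // about 29.530589 days
const MEAN_SOLAR_TERM_MS: i64 = 2_629_746_000; // 365.2425 days / 12

/// Day number containing the given instant.
fn day_of(ms: i64) -> i64 {
    ms.div_euclid(MS_PER_DAY)
}

/// Walks forward from the M11 new moon until two major solar terms have
/// been consumed. Returns the new moon the loop stops on and the number of
/// months it stepped over.
fn walk_to_new_year(mut new_moon: i64, mut major_solar_term: i64) -> (i64, u32) {
    let mut solar_term = -2; // inferred starting state: pointing at M11
    let mut months_walked = 0;
    while solar_term < 0 {
        let next_new_moon = new_moon + MEAN_SYNODIC_MONTH_MS;
        // The month contains the major solar term if the term's day falls
        // before the day of the next new moon.
        if day_of(major_solar_term) < day_of(next_new_moon) {
            solar_term += 1;
            major_solar_term += MEAN_SOLAR_TERM_MS;
        }
        new_moon = next_new_moon;
        months_walked += 1;
    }
    (new_moon, months_walked)
}
```

When both M11 and M12 contain a major solar term, the loop exits after two months without examining the month that starts at the returned new moon, which is the case the comment above is asking about. When the second month has no major solar term, the loop takes three months.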

    solar_term += 1;
    major_solar_term = major_solar_term + MEAN_GREGORIAN_SOLAR_TERM_LENGTH;
} else {
    leap_month = Some(month as u8 + 2);

Member

Issue: Although I think this technically works, there are two issues a casual observer who knows the Chinese calendar algorithm will think are wrong:

  1. You shouldn't pick a leap month if there was already a M11L or M12L in the same sui
  2. You shouldn't override a leap month later in the year if there was one earlier in the year

My version of the code makes both of these conditions explicit. You've chosen, it seems, to basically assert that, since this is linear and periodic, these two conditions will not occur. That might be the case (haven't thought it through fully), but if that is what you intended, I would expect a paragraph of explanation, since it is a deviation from the requirements of GB/T. (As stated before, I want this to align with GB/T, just using approximate new moons and solar terms instead of the exact new moons and solar terms.)
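As a sketch of what making both conditions explicit could look like, assuming the months of one sui (winter solstice to winter solstice) have already been classified by whether they contain a major solar term: the function below is hypothetical, is not taken from either version of the code discussed here, and follows the usual statement of the rule that only a 13-month sui has a leap month and that the first month without a major solar term is the leap month.

```rust
/// Hypothetical helper, not from the PR.
///
/// `contains_major_term[i]` says whether the i-th lunar month of the sui
/// contains a major solar term. Returns the index of the leap month, if any.
fn leap_month_index(contains_major_term: &[bool]) -> Option<usize> {
    // Only a sui with 13 lunar months has a leap month at all.
    if contains_major_term.len() != 13 {
        return None;
    }
    // `position` stops at the first match, so an earlier leap month (for
    // example M11L or M12L) is never overridden by a later candidate.
    contains_major_term.iter().position(|&has_term| !has_term)
}
```

If the linear, periodic model guarantees that at most one month per sui can lack a major solar term, this reduces to the simpler check in the PR, which is the argument a comment in the code would need to spell out.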

Member

(I'm okay with the approximate algorithm diverging from GB/T, but yes, having comments to that effect would be nice)
