
Calculation speedup for the Gregorian calendar #5849

Open · wants to merge 15 commits into base: main
Conversation

@Nikita-str Nikita-str commented Nov 21, 2024

Solving #5562 by implementing the algorithm from the article for the Gregorian calendar.
The article: "Euclidean Affine Functions and Applications to Calendar Algorithms" by Cassio Neri & Lorenz Schneider (Feb. 2021).

Draft PR:

  • As I understand it, there are many places where parts of this logic are used, and I have only changed the main file.
  • I also need to run/write benchmarks in the repo (so far I have run benches only in a local scratch repo where I tested the algorithm).
  • The Julian calendar can be sped up with the same logic.
  • Tests: should check more than just equality with the previous algorithms.
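For context, here is a sketch of the style of computation involved -- Howard Hinnant's well-known days_from_civil, which is closely related to the Neri–Schneider Euclidean affine functions. This is illustrative only, not the PR's actual code: the year is shifted to start in March so that month lengths collapse into the affine expression (153*mp + 2)/5, and the whole function is branch-light integer arithmetic.

```rust
// Sketch (not the PR's code): Hinnant-style days_from_civil.
fn days_from_civil(y: i32, m: u32, d: u32) -> i64 {
    let y = y as i64 - if m <= 2 { 1 } else { 0 }; // computational year starts in March
    let era = (if y >= 0 { y } else { y - 399 }) / 400; // 400-year Gregorian cycles
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (if m > 2 { m - 3 } else { m + 9 }) as i64; // March-based month, [0, 11]
    let doy = (153 * mp + 2) / 5 + d as i64 - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146097 + doe - 719468 // days since 1970-01-01
}

fn main() {
    assert_eq!(days_from_civil(1970, 1, 1), 0);
    assert_eq!(days_from_civil(2000, 3, 1), 11017);
}
```

Converting to a RataDie (day 1 = ISO 0001-01-01) is then just an offset of 719163 days from the Unix epoch.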

@CLAassistant commented Nov 21, 2024

CLA assistant check
All committers have signed the CLA.

Performance of the new version is much better (~60%) at the end of a year, ~25% better on average, and the same at the start of a year.
Performance of the new version is much better (~400%) (at least on my CPU).
It seems that indexing (in the previous algorithm) is pretty slow.
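For illustration (a sketch, not the PR's code) of why indexing can be dropped: in a March-based year the cumulative-days-before-month function is affine, so month lengths fall out of integer arithmetic rather than a lookup table.

```rust
// Days before March-based month mp (0 = March) -- no table needed.
fn days_before_mp(mp: u32) -> u32 {
    (153 * mp + 2) / 5
}

// Length of month mp, valid for mp in [0, 10] (February, the last
// month of the shifted year, is handled separately as 28/29 days).
fn days_in_mp(mp: u32) -> u32 {
    days_before_mp(mp + 1) - days_before_mp(mp)
}

fn main() {
    // March..January: 31 30 31 30 31 31 30 31 30 31 31
    let lengths: Vec<u32> = (0..11).map(days_in_mp).collect();
    println!("{lengths:?}");
}
```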
@Nikita-str (Author)

TODO: Is this comment true?

Previously, `utils/calendrical_calculations` only had a test of equality with the previous implementation.

[⚠️] In the old Julian algo there was a comment:
```
Julian epoch is equivalent to fixed_from_iso of December 30th of 0 year
1st Jan of 1st year Julian is equivalent to December 30th of 0th year of ISO year
```
Is it true?
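Per Reingold & Dershowitz, julian-epoch is defined as fixed-from-gregorian(December 30, year 0), which is RataDie −1 when day 1 is ISO 0001-01-01, so the old comment appears to be correct. A sketch of the textbook fixed_from_julian formula (again illustrative, not this repo's code) checks it:

```rust
// RataDie of ISO 0000-12-30 (day 1 = ISO 0001-01-01, so 0000-12-31 is day 0).
const JULIAN_EPOCH: i64 = -1;

// fixed_from_julian per Reingold & Dershowitz, "Calendrical Calculations".
fn fixed_from_julian(year: i64, month: i64, day: i64) -> i64 {
    let y = if year < 0 { year + 1 } else { year }; // the Julian calendar has no year 0
    // Proleptic Julian leap years: y % 4 == 0 (positive), y % 4 == 3 (negative).
    let leap = if y > 0 { y.rem_euclid(4) == 0 } else { y.rem_euclid(4) == 3 };
    let adj = if month <= 2 { 0 } else if leap { -1 } else { -2 };
    JULIAN_EPOCH - 1
        + 365 * (y - 1)
        + (y - 1).div_euclid(4)
        + (367 * month - 362) / 12
        + adj
        + day
}

fn main() {
    // 1 Jan of Julian year 1 lands on RataDie -1, i.e. ISO December 30 of year 0.
    assert_eq!(fixed_from_julian(1, 1, 1), -1);
}
```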
@Nikita-str (Author)

iso_old_file.rs is just the old version of the iso.rs file.
The same goes for julian_old_file.rs.

iso_old_algos.rs contains some old algorithm implementations that were changed outside of the iso.rs file.

All of them were copied without changes.

@Nikita-str Nikita-str marked this pull request as ready for review November 26, 2024 23:13
@Nikita-str Nikita-str requested review from Manishearth, sffc and a team as code owners November 26, 2024 23:13
@@ -23,7 +23,7 @@ rust-version.workspace = true

# This is a special exception: The algorithms in this crate are based on "Calendrical Calculations" by Reingold and Dershowitz
# which has its lisp code published at https://github.com/EdReingold/calendar-code2/
-license = "Apache-2.0"
+license = "Apache-2.0" # TODO: Need to add `MIT`/`GNU`?
Member

No, the license was picked by a lawyer. Why would GNU be needed?

@Nikita-str (Author) Dec 3, 2024

There is an implementation of the algorithm (by the author of the article) in C/C++, and that repo is not under the Apache-2.0 license.

Here is a comment mentioning it in the PR.

So I'm not sure whether it is necessary to add either of them or not.

Member

Ah. We'd have to talk to the lawyer again for this.

Typically algorithms themselves aren't copyrightable, however we would indeed need to check with our lawyer to pull in this code.

Member

I believe the policy is, we can redistribute MIT (but not GPL) code under the Apache-2.0 license, and all third-party code should retain its copyright comments inline in the code, similar to the Reingold code.

@sffc (Member) Dec 3, 2024

The first link goes to a file with the following comment at the top

// SPDX-License-Identifier: GPL-3.0-or-later

According to the terms of the GPL license, which are fairly strict, an Apache-licensed crate such as calendrical_calculations would not be able to redistribute that code.

Member

A workaround to this type of issue would be for any GPL-licensed code to live in its own crate, and then either icu_calendar or calendrical_calculations has an optional Cargo feature to consume it. Clients who would like the speedup and are okay consuming GPL-licensed code would need to manually enable the Cargo feature.

I do not know whether such a GPL crate could live in this repository or whether it would need its own repository.

Member

I'd rather avoid introducing GPL licensed code in our dep tree at all.

Author

@sffc @Manishearth
As I understand it, you can implement algorithms from an article without such license restrictions.
And if you compare the code at the link with the code from the PR, it will be pretty clear* that the code was inspired by the article and only afterwards matched against the author's code as a reference. So maybe I can just remove the link to the author's implementation and leave only links to the article?

[*]: because of the naming in the article author's repo (unnamed constants and very short names that say almost nothing, except for the y/m/d ones; those say something, but nothing about how or why). In the PR, some constants were even changed because we have a larger interval of valid dates -- in the author's code they are just magic numbers again, and how would you change such constants without understanding what they are for and why? And of course the PR code contains plenty of comments about the how and why.

Member

Yes, I understand that generally algorithms are not copyrightable, but either way we will have to get approval from our lawyer for this, and they may choose to be more cautious about it. We were already quite cautious about the Reingold & Dershowitz algorithms.

@Manishearth (Member) left a comment

I am in general not super in favor of this much added code for an optimization which I'm not sure matters. Thoughts @sffc?

It is well commented, so we do have that.

@sffc (Member) commented Dec 3, 2024

Thank you for the PR!

I am in general not super in favor of this much added code for an optimization which I'm not sure matters. Thoughts @sffc?

It is well commented, so we do have that.

As far as I can tell, the amount of library code is reduced by this PR. Most of it is in new benches and tests. This looks like a reasonably high quality PR.

@Nikita-str, can you post the results of the benchmarks before and after the change? Are we talking a 5% or 50% speedup, for example?

@Manishearth (Member) commented Dec 3, 2024

the amount of library code is reduced by this PR

no, iso.rs has a pretty large diff. It's also replacing relatively straightforward code with complex code.

The tests and benches are definitely welcome either way!

@Nikita-str (Author)

can you post the results of the benchmarks before and after the change? Are we talking a 5% or 50% speedup, for example?
@sffc

I ran the benches from the PR; they are written to minimize the LTO optimization that is turned on for [profile.bench] in the repo, by making sure the compiler cannot know in advance which data it will receive -- there is a more detailed comment here.

I'm actually not sure that the benches in components/calendar exercise this optimization at all.

And I think working against LTO (not letting it predict the dates) is fair, because I looked at the generated asm (for my machine; I can send the full asm to you if needed) for the new and old implementations of fixed_from_iso, and the ratio between asm code lengths (weighted by instruction cost) corresponds to the run times.
For example:

  • the old code has 5 JMP instructions, the new has 0 -- it is branch-free;
  • the old has 17 MOV instructions, the new has 9;
  • the old has 10 MUL instructions, the new only 3;
  • and so on.

But since I have never seen LTO turned on for benches before, I'm not sure I did everything correctly, and maybe it makes sense to discuss it.
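For readers unfamiliar with the trick being discussed: black_box is an identity function that the optimizer must treat as opaque. A minimal sketch using std::hint::black_box (criterion's black_box plays the same role in the benches):

```rust
use std::hint::black_box;

// A trivial function standing in for fixed_from_iso in this illustration.
fn sum_of_digits(mut n: u32) -> u32 {
    let mut s = 0;
    while n > 0 {
        s += n % 10;
        n /= 10;
    }
    s
}

fn main() {
    // Without black_box, the compiler is free to evaluate this whole call at
    // compile time; with it, the computation actually runs on an "unknown" input.
    let n = black_box(123_456u32);
    println!("{}", sum_of_digits(n));
}
```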


Sorry for the long prelude; now the benches.
Everything with the OLD postfix is the old implementation (the old code is kept for tests, so I use it for benches), and NEW is the new implementation of the same fns:

fixed_from_iso/OLD      time:   [23.898 ms 24.002 ms 24.120 ms]
                        change: [+0.3544% +0.8842% +1.4301%] (p = 0.00 < 0.05)
fixed_from_iso/NEW      time:   [8.4999 ms 8.5454 ms 8.5959 ms]
                        change: [+0.0506% +0.9231% +1.7161%] (p = 0.03 < 0.05)

fixed_from_julian/OLD   time:   [10.362 ms 10.429 ms 10.503 ms]
                        change: [+1.4614% +2.2219% +2.9768%] (p = 0.00 < 0.05)
fixed_from_julian/NEW   time:   [5.9676 ms 5.9940 ms 6.0227 ms]
                        change: [-1.0328% -0.3838% +0.2869%] (p = 0.28 > 0.05)

day_of_week/iso/OLD     time:   [24.078 ms 24.185 ms 24.308 ms]
                        change: [-0.3406% +0.2049% +0.7859%] (p = 0.49 > 0.05)
day_of_week/iso/NEW     time:   [11.559 ms 11.602 ms 11.646 ms]
                        change: [-1.6141% -1.0452% -0.4639%] (p = 0.00 < 0.05)

day_of_year/iso/OLD     time:   [10.370 ms 10.419 ms 10.474 ms]
                        change: [+1.8417% +2.4847% +3.1837%] (p = 0.00 < 0.05)
day_of_year/iso/NEW     time:   [4.9905 ms 5.0421 ms 5.0995 ms]
                        change: [+2.1166% +3.3795% +4.5959%] (p = 0.00 < 0.05)

day_of_year/julian/OLD  time:   [5.4533 ms 5.4917 ms 5.5335 ms]
                        change: [+0.8272% +1.7159% +2.5537%] (p = 0.00 < 0.05)
day_of_year/julian/NEW  time:   [3.8352 ms 3.8482 ms 3.8619 ms]
                        change: [+0.7694% +1.2778% +1.7898%] (p = 0.00 < 0.05)

from_fixed/iso/OLD      time:   [727.22 µs 729.27 µs 731.30 µs]
                        change: [-0.5123% +0.0909% +0.6777%] (p = 0.78 > 0.05)
from_fixed/iso/NEW      time:   [188.33 µs 188.89 µs 189.50 µs]
                        change: [-3.1673% -2.4265% -1.6602%] (p = 0.00 < 0.05)

from_fixed/julian/OLD   time:   [272.92 µs 274.12 µs 275.52 µs]
                        change: [-0.6668% +0.1737% +1.0292%] (p = 0.68 > 0.05)
from_fixed/julian/NEW   time:   [181.39 µs 182.00 µs 182.60 µs]
                        change: [-1.8728% -1.0777% -0.3063%] (p = 0.01 < 0.05)

from_year_day/iso/OLD/AVG
                        time:   [7.6168 ms 7.6470 ms 7.6807 ms]
                        change: [+0.6165% +1.2356% +1.8289%] (p = 0.00 < 0.05)
from_year_day/iso/NEW/AVG
                        time:   [6.8247 ms 6.8469 ms 6.8697 ms]
                        change: [+0.0875% +0.7088% +1.3100%] (p = 0.02 < 0.05)
from_year_day/iso/OLD/START
                        time:   [616.58 µs 618.70 µs 621.31 µs]
                        change: [-0.0837% +0.6122% +1.2558%] (p = 0.07 > 0.05)
from_year_day/iso/NEW/START
                        time:   [878.33 µs 882.02 µs 886.45 µs]
                        change: [-0.7903% +0.2027% +1.0847%] (p = 0.68 > 0.05)
from_year_day/iso/OLD/END
                        time:   [1.5702 ms 1.5779 ms 1.5877 ms]
                        change: [-1.7883% -0.5992% +0.5419%] (p = 0.32 > 0.05)
from_year_day/iso/NEW/END
                        time:   [1.1674 ms 1.1708 ms 1.1741 ms]
                        change: [-1.6080% -0.7991% +0.0059%] (p = 0.06 > 0.05)

A few words about iso::from_year_day:

  • AVG -- all dates within a year
  • START -- dates from the first 2 months
  • END -- dates from the last 2 months

As you can see, the old code was faster for the first 2 months, but its performance degrades with each subsequent month: there was a per-month loop.
The new code is faster on average and for the second half of the months.
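The month-loop difference can be made concrete. A sketch, assuming the new code uses the standard March-based inverse affine mapping from the article (not necessarily the PR's exact constants): the month is recovered from the day-of-year with one multiply-and-divide instead of a 12-step scan, which is why its cost does not grow toward the end of the year.

```rust
// March-based day-of-year (0 = March 1) to March-based month (0 = March).
fn month_from_doy(doy: u32) -> u32 {
    (5 * doy + 2) / 153
}

fn main() {
    assert_eq!(month_from_doy(0), 0);   // March 1
    assert_eq!(month_from_doy(30), 0);  // March 31
    assert_eq!(month_from_doy(31), 1);  // April 1
    assert_eq!(month_from_doy(306), 10); // January 1 (mp 10)
}
```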

And let's compute the time ratio (OLD / NEW):

  • fixed_from_iso: ~2.8
  • fixed_from_julian: ~1.7
  • iso::day_of_week: ~2.05
  • iso::day_of_year: ~2.05
  • julian::day_of_year: ~1.4
  • iso::from_fixed: ~3.7
  • julian::from_fixed: ~1.45
  • iso::from_year_day(AVG): ~1.1
  • iso::from_year_day(START): ~0.7 :(
  • iso::from_year_day(END): ~1.3
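For comparison, here is a sketch of the inverse direction that iso::from_fixed speeds up -- again Hinnant's civil_from_days, which is closely related to the Neri–Schneider approach but is not the PR's code. Fixed day count back to (year, month, day) with divisions instead of loops or table lookups:

```rust
// Sketch (not the PR's code): Hinnant-style civil_from_days.
fn civil_from_days(days: i64) -> (i32, u32, u32) {
    let z = days + 719468; // shift epoch from 1970-01-01 to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146096 }) / 146097;
    let doe = z - era * 146097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // [0, 399]
    let y = yoe + era * 400;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    ((y + if m <= 2 { 1 } else { 0 }) as i32, m, d)
}

fn main() {
    assert_eq!(civil_from_days(0), (1970, 1, 1));
    assert_eq!(civil_from_days(11017), (2000, 3, 1));
}
```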

@Nikita-str (Author)

@Manishearth
Just in case: there are tests for all changed fns that check, over a large enough range, that the new algorithm returns the same result as the previous algorithm.
(Except for a mistake in the previous implementation.)

@sffc (Member) commented Dec 6, 2024

fixed_from_iso and iso_from_fixed are performance critical fns, so it is compelling if we are able to improve their performance by as much as these benches suggest.

Regarding your comment on LTO: we only benchmark optimized code; we assume that users in performance-sensitive environments will be building with all optimizations enabled.

I fully expect that the compiler will unroll the loop, but you have enough black_boxes that I would be surprised if the compiler changed the underlying algorithm in a way that makes assumptions about the input. It's very possible that the algorithm looks unrecognizable.

@Nikita-str (Author)

About fixed_from_iso:

  1. If you remove all the black_boxes but still pass the vec to fixed_from, the ratio stays almost the same. So in the case of iterating over a vec of dates, the previous bench result is roughly correct (2.7).

    fn fixed_from(f: fn(i32, u8, u8) -> RataDie, ymd_vec: &Vec<(i32, u8, u8)>) -> i64 {
        let mut sum = 0;
        for (year, month, day) in ymd_vec.iter().copied() {
            sum += f(year, month, day).to_i64_date();
            sum += f(year + 7, month, (day + 2) >> 1).to_i64_date();
            sum += f(year + 37, (month + 3) >> 1, (day + 3) >> 1).to_i64_date();
            sum += f(year + 137, (month + 7) >> 1, (day + 19) >> 1).to_i64_date();
        }
        sum
    }
    
    fn bench_fixed_from(c: &mut Criterion) {
        const Y: i32 = 1_000;
        const DELTA: i32 = 2_000;
        let ymd_vec = prep_gen_ymd_vec(Y, DELTA);
        
        let mut group = c.benchmark_group("fixed_from_iso");
    
        #[cfg(feature = "bench")]
        group.bench_function("OLD", |b| {
            b.iter(|| fixed_from(old::fixed_from_iso, &ymd_vec))
        });
        group.bench_function("NEW", |b| {
            b.iter(|| fixed_from(new::fixed_from_iso, &ymd_vec))
        });
    }

    You can even remove the black_box from the return in prep_gen_ymd_vec (I didn't expect this; I was afraid that in that case LTO would move the year/month/day loop into the for (year, month, day) in ymd_vec.iter().copied() place).

    Then cargo bench "fixed_from_iso" --features "bench":

    fixed_from_iso/OLD      time:   [21.493 ms 21.603 ms 21.724 ms]
                            change: [-0.5445% +0.1499% +0.8029%] (p = 0.68 > 0.05)
    fixed_from_iso/NEW      time:   [7.8851 ms 7.9164 ms 7.9483 ms]
                            change: [-1.0912% -0.4945% +0.1104%] (p = 0.10 > 0.05)
    

    The ratio is slightly lower: 2.7.

  2. You will also get roughly the same ratio (2.65) with a single date (though there you cannot remove the black_box, otherwise everything will be computed at compile time)

    fn fixed_from_v2(f: fn(i32, u8, u8) -> RataDie) -> i64 {
        let mut sum = 0;
        let (year, month, day) = bb((2024, 12, 7));
        sum += f(year, month, day).to_i64_date();
        sum += f(year + 7, month, (day + 2) >> 1).to_i64_date();
        sum += f(year + 37, (month + 3) >> 1, (day + 3) >> 1).to_i64_date();
        sum += f(year + 137, (month + 7) >> 1, (day + 19) >> 1).to_i64_date();
        sum
    }
    fn bench_fixed_from(c: &mut Criterion) {
            #[cfg(feature = "bench")]
            group.bench_function("v2/OLD", |b| {
                b.iter(|| fixed_from_v2(old::fixed_from_iso))
            });
            group.bench_function("v2/NEW", |b| {
                b.iter(|| fixed_from_v2(new::fixed_from_iso))
            });
        }

    Then cargo bench "fixed_from_iso" --features "bench":

    fixed_from_iso/v2/OLD   time:   [14.520 ns 14.566 ns 14.609 ns]
                            change: [-1.8428% -1.0879% -0.4102%] (p = 0.00 < 0.05)
    fixed_from_iso/v2/NEW   time:   [5.4007 ns 5.4210 ns 5.4418 ns]
                            change: [-0.9680% +0.5250% +2.9374%] (p = 0.70 > 0.05)
    

    The ratio is: 2.65.

  3. But if you move the y/m/d loop into the body of fn fixed_from, then without black_box the ratio will be ~1 (in that case the loop input is a compile-time constant -- is that really the case we want to bench?). If you make the ranges in the y/m/d loop unpredictable, the compiler still knows how the dates change, and the ratio will be ~1.1. And then, if you make the dates themselves unpredictable, the ratio will be about the same as the initial one (2.6).

    fn fixed_from(f: fn(i32, u8, u8) -> RataDie) -> i64 {
        let mut sum = 0;
        for year in -1000..=3000 {
            for month in 1..=12u8 {
                for day in 1..=MONTH_DAYS[(month as usize) - 1] {
                    sum += f(year, month, day).to_i64_date();
                    sum += f(year + 7, month, (day + 2) >> 1).to_i64_date();
                    sum += f(year + 37, (month + 3) >> 1, (day + 3) >> 1).to_i64_date();
                    sum += f(year + 137, (month + 7) >> 1, (day + 19) >> 1).to_i64_date();
                }
            }
        }
        sum
    }
    fixed_from_iso/OLD      time:   [2.4114 ms 2.4211 ms 2.4313 ms]
                            change: [-1.1807% -0.5908% +0.0132%] (p = 0.05 > 0.05)
    fixed_from_iso/NEW      time:   [2.3871 ms 2.3956 ms 2.4045 ms]
                            change: [-0.3403% +0.1353% +0.6289%] (p = 0.60 > 0.05)
    

    The ratio is: ~1

    If we wrap all the y/m/d ranges in black_boxes, the compiler still knows how the dates change:

        for year in bb(-1000..=3000) {
            for month in bb(1..=12u8) {
                for day in bb(1..=MONTH_DAYS[(month as usize) - 1]) {
    fixed_from_iso/OLD      time:   [2.7803 ms 2.7901 ms 2.8005 ms]
                            change: [-1.0568% -0.5314% +0.0267%] (p = 0.06 > 0.05)
    fixed_from_iso/NEW      time:   [2.4587 ms 2.4680 ms 2.4776 ms]
                            change: [+0.7571% +1.2649% +1.7770%] (p = 0.00 < 0.05)
    

    The ratio is: 1.125

    If we do not wrap the ranges but wrap each tuple in a black_box, the compiler cannot predict how the dates change:

        for year in -1000..=3000 {
            for month in 1..=12u8 {
                for day in 1..=MONTH_DAYS[(month as usize) - 1] {
                    let (year, month, day) = bb((year, month, day));
    fixed_from_iso/OLD      time:   [21.261 ms 21.345 ms 21.438 ms]
                            change: [-1.1324% -0.6343% -0.1357%] (p = 0.01 < 0.05)
    fixed_from_iso/NEW      time:   [8.1986 ms 8.2348 ms 8.2716 ms]
                            change: [-1.7247% -1.1298% -0.5004%] (p = 0.00 < 0.05)
    

    The ratio is: 2.6

  4. If you use random dates (and then subtract the time spent generating the random date), you get roughly the initial ratio.

    fn rand_date() -> (i32, u8, u8) {
       let y = rand::thread_rng().gen_range(-3000..=3000);
       let m = rand::thread_rng().gen_range(1..=12);
       let d = rand::thread_rng().gen_range(1..=28);
       (y, m, d)
    }
    fn fixed_from(f: ...) -> i64 {
     let mut sum = 0;
     for _ in 0..100_000 {
       let (year, month, day) = rand_date();
       sum += f(year, month, day).to_i64_date();
       sum += f(year + 7, month, (day + 2) >> 1).to_i64_date();
       sum += f(year + 37, (month + 3) >> 1, (day + 3) >> 1).to_i64_date();
       sum += f(year + 137, (month + 7) >> 1, (day + 19) >> 1).to_i64_date();
     }
     sum
    }
    fixed_from_iso/OLD      time:   [3.2165 ms 3.2271 ms 3.2379 ms]
                           change: [+903.42% +908.88% +914.01%] (p = 0.00 < 0.05)
    fixed_from_iso/NEW      time:   [1.8040 ms 1.8111 ms 1.8186 ms]
                           change: [+898.62% +906.08% +912.84%] (p = 0.00 < 0.05)
    

    But for the following code:

    fn fixed_from(f: ...) -> i64 {
      let mut sum = 0;
      for _ in 0..100_000 {
        let (year, month, day) = rand_date();
        sum += year + (month) as i32 + (day) as i32;
        sum += year + 7 + (month) as i32 + ((day + 2) >> 1) as i32;
        sum += year + 37 + ((month + 3) >> 1) as i32 + ((day + 3) >> 1) as i32;
        sum += year + 137 + ((month + 7) >> 1) as i32 + ((day + 19) >> 1) as i32;
        ...

    The time is:
    time: [1.1717 ms 1.1762 ms 1.1810 ms]
    And the ratio of the algorithms will be: (3.2 - 1.15)/(1.82 - 1.15) = ~3.05
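The overhead subtraction in that last step, spelled out (numbers from the runs above; ~1.15 ms is the loop time spent just generating random dates, shared by both variants):

```rust
// Remove the shared per-loop overhead before comparing the two variants.
fn adjusted_ratio(old_ms: f64, new_ms: f64, overhead_ms: f64) -> f64 {
    (old_ms - overhead_ms) / (new_ms - overhead_ms)
}

fn main() {
    let r = adjusted_ratio(3.2, 1.82, 1.15);
    println!("{r:.2}"); // ~3.05, matching the estimate above
}
```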

So the only question is: do we want to benchmark the performance of concrete loops / loops with some known structure (I guess not), or the general case where we don't know which dates we will get (to me the latter seems like the more realistic use case)?

Oh, one note about the loops: I use them to compute the average performance of the algorithm, because there can be cases like iso::from_year_day. As you can see, I don't touch the algorithm functions themselves, only their args.


Now, about iso_from_fixed, in short:

  1. If you remove black_box from from_fixed (and even from prep_gen_rata_die_vec, which again was unexpected to me), you get exactly the same ratio.
  2. If you do it with a single RataDie, without the RataDie loop but with a black_box (in my case it was let rd = bb(731_137);), the ratio is roughly the same (3.6).
  3. fn from_fixed_v3(f: IsoFromFixedFn) -> i64 {
        let map = |r: Result<(i32, u8, u8), I32CastError>| {
            let x = r.unwrap();
            x.0 as i64 + (x.1 + x.2) as i64
        };
        let mut sum = 0;
        for rd in 730_037..737_373 {
            sum += map(f(RataDie::new(rd)));
            sum += map(f(RataDie::new(rd + 1313)));
            sum += map(f(RataDie::new(rd + 7429)));
            sum += map(f(RataDie::new(rd - 5621)));
        }
        sum
    }
    In this case you get the same ratio (3.9) again. I didn't expect this, but it's quite understandable: it seems LTO could not optimize it into an increment, though maybe a newer version will be able to (it's a constant... so maybe one day it will be optimized down to a constant).

As I understand, I should remove as many black_boxes as I can and commit that, right? I will leave the black_boxes in the gen fns, OK?

@Manishearth (Member)

Problems:

  • A lot of code for perf win in unclear-if-perf-critical code
  • Licensing, the code has been written looking at GPL-licensed reference code

Discussion:

@nekevss: Cassio Neri gave a C++ conference presentation and has other sample code and benchmarks. There's an open issue on jiff as well.
@sffc: Having the algorithm be modular is not a bad idea, to let users plug in their own. Then this code can land somewhere else.
@sffc: I think this is pretty compelling and we should figure out how to make it work.
@Manishearth: Say more about modularity.
@sffc: A few approaches: one is API-based, with a function-call plugin; the second is dependency-based.
@Manishearth: API-based is hard because we use this API everywhere and we'd need a context object. Dependency-based seems fine; someone can publish a GPL crate that can be opted into.

Conclusion:

  • We'd love to get the benchmarks landed as a separate PR
  • We may in the long run want to do dependency-based pluggability.
  • ICU4X team may follow up on the licensing issue.
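The dependency-based pluggability discussed above could look roughly like this in Cargo terms; every crate and feature name below is made up for illustration, not anything that exists:

```toml
# Hypothetical layout: the GPL-derived fast path lives in its own
# separately-published crate, and the Apache-licensed crate only references it
# behind an off-by-default Cargo feature.
[dependencies]
calendar-eaf-gpl = { version = "0.1", optional = true }

[features]
# Clients who accept GPL-licensed code opt in explicitly:
fast-gregorian = ["dep:calendar-eaf-gpl"]
```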

@sffc (Member) commented Dec 13, 2024

I re-read the notes and noticed that it states:

A lot of code for perf win in unclear-if-perf-critical code

I don't agree with this; fixed_from_iso and iso_from_fixed are very much in the critical path of datetime formatting.

@Manishearth (Member) commented Dec 13, 2024

@sffc I don't think that's sufficient justification? They're in the critical path (also, only sometimes? We shouldn't be calling this for most formatting operations unless converting between calendars -- ah, there are week-of-year and day-of-year in these benches, those are more common), but is their performance anywhere close to the order of magnitude of the overall performance of these tasks?

I'm happy to believe these are actually performance critical but so far I have yet to see benchmarks showing these numbers to be relevant. It is factually true that the perf benefit is not yet clear.

@Manishearth (Member)

The yd/ymd benchmarks measure the time taken for ~1,460,000 (365×2×2000) runs of the function, if I'm understanding the setup correctly (I'm on a phone, I might not be). The true time taken by most of these functions is on the order of nanoseconds.

The fixed benchmarks measure time taken for 10,000 runs, which also ends up with numbers in the realm of nanoseconds.

I am extremely skeptical that a single call to these functions during formatting is at all relevant for performance.
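The per-call figures here can be reproduced mechanically; a small sketch, assuming the iteration counts above are right (note the bench bodies call the function four times per iteration, which is not divided out here either):

```rust
// Total bench time divided by iteration count gives nanoseconds per iteration.
fn per_iter_ns(total_ms: f64, iters: u64) -> f64 {
    total_ms * 1.0e6 / iters as f64
}

fn main() {
    // e.g. fixed_from_iso/OLD: 24.002 ms over ~1,464,000 iterations ≈ 16 ns each
    println!("{:.1} ns", per_iter_ns(24.002, 1_464_000));
}
```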

@Manishearth (Member)

Here are the numbers with iteration counts. I've listed the iteration count for each section under ITER, and the per-iteration time is in braces. Note that some of these functions run an operation four times; I haven't divided by four.

A couple of these numbers seem a bit lower than I'd expect, so I'm not yet certain of them, but at the very least they should be the right order of magnitude. Everything is in ns; from_fixed is the only one in the tens of nanoseconds. My understanding is that from_fixed is not in the critical path for formatting unless converting, at which point calendar-specific conversion code is going to eclipse it, not to mention that basically any bit of formatting code will eclipse it anyway.

ITER: 1,464,000
fixed_from_iso/OLD      time:   {16 ns} [23.898 ms 24.002 ms 24.120 ms]
                        change:         [+0.3544% +0.8842% +1.4301%] (p = 0.00 < 0.05)
fixed_from_iso/NEW      time:   {5 ns}  [8.4999 ms 8.5454 ms 8.5959 ms]
                        change:         [+0.0506% +0.9231% +1.7161%] (p = 0.03 < 0.05)

fixed_from_julian/OLD   time:   {7 ns}  [10.362 ms 10.429 ms 10.503 ms]
                        change:         [+1.4614% +2.2219% +2.9768%] (p = 0.00 < 0.05)
fixed_from_julian/NEW   time:   {4 ns}  [5.9676 ms 5.9940 ms 6.0227 ms]
                        change:         [-1.0328% -0.3838% +0.2869%] (p = 0.28 > 0.05)

day_of_week/iso/OLD     time:   {16 ns} [24.078 ms 24.185 ms 24.308 ms]
                        change:         [-0.3406% +0.2049% +0.7859%] (p = 0.49 > 0.05)
day_of_week/iso/NEW     time:   {8 ns}  [11.559 ms 11.602 ms 11.646 ms]
                        change:         [-1.6141% -1.0452% -0.4639%] (p = 0.00 < 0.05)

day_of_year/iso/OLD     time:   {7 ns}  [10.370 ms 10.419 ms 10.474 ms]
                        change:         [+1.8417% +2.4847% +3.1837%] (p = 0.00 < 0.05)
day_of_year/iso/NEW     time:   {3 ns}  [4.9905 ms 5.0421 ms 5.0995 ms]
                        change:         [+2.1166% +3.3795% +4.5959%] (p = 0.00 < 0.05)

day_of_year/julian/OLD  time:   {4 ns}  [5.3453 ms 5.4917 ms 5.5335 ms]
                        change:         [+0.8272% +1.7159% +2.5537%] (p = 0.00 < 0.05)
day_of_year/julian/NEW  time:   {3 ns}  [3.8352 ms 3.8482 ms 3.8619 ms]
                        change:         [+0.7694% +1.2778% +1.7898%] (p = 0.00 < 0.05)

ITER: 10,000
from_fixed/iso/OLD      time:   {72 ns} [727.22 µs 729.27 µs 731.30 µs]
                        change:         [-0.5123% +0.0909% +0.6777%] (p = 0.78 > 0.05)
from_fixed/iso/NEW      time:   {18 ns} [188.33 µs 188.89 µs 189.50 µs]
                        change:         [-3.1673% -2.4265% -1.6602%] (p = 0.00 < 0.05)

from_fixed/julian/OLD   time:   {27 ns} [272.92 µs 274.12 µs 275.52 µs]
                        change:         [-0.6668% +0.1737% +1.0292%] (p = 0.68 > 0.05)
from_fixed/julian/NEW   time:   {18 ns} [181.39 µs 182.00 µs 182.60 µs]
                        change:         [-1.8728% -1.0777% -0.3063%] (p = 0.01 < 0.05)

ITER: 1460000
from_year_day/iso/OLD/AVG
                        time:  {5.2 ns} [7.6168 ms 7.6470 ms 7.6807 ms]
                        change:         [+0.6165% +1.2356% +1.8289%] (p = 0.00 < 0.05)
from_year_day/iso/NEW/AVG
                        time:  {4.6 ns} [6.8247 ms 6.8469 ms 6.8697 ms]
                        change:         [+0.0875% +0.7088% +1.3100%] (p = 0.02 < 0.05)

ITER: 216000

from_year_day/iso/OLD/START
                        time:  {2.8 ns} [616.58 µs 618.70 µs 621.31 µs]
                        change:         [-0.0837% +0.6122% +1.2558%] (p = 0.07 > 0.05)
from_year_day/iso/NEW/START
                        time:  {4.0 ns} [878.33 µs 882.02 µs 886.45 µs]
                        change:         [-0.7903% +0.2027% +1.0847%] (p = 0.68 > 0.05)

ITER: 240000
from_year_day/iso/OLD/END
                        time:  {6.5 ns}   [1.5702 ms 1.5779 ms 1.5877 ms]
                        change:         [-1.7883% -0.5992% +0.5419%] (p = 0.32 > 0.05)
from_year_day/iso/NEW/END
                        time:  {4.8 ns}  [1.1674 ms 1.1708 ms 1.1741 ms]
                        change:         [-1.6080% -0.7991% +0.0059%] (p = 0.06 > 0.05)

@Manishearth (Member)

I don't think that's sufficient justification? They're in the critical path [snip] is their performance at all anywhere close to the order of magnitude of the overall performance of these tasks?

To expand on this, when it comes to whether we should optimize something when there are tradeoffs involved, here's how I look at it:

  • Is the code it is used in performance-critical? (i.e., who cares if datagen is a bit slow)
  • Is this snippet of code a significant chunk of the execution time of that function? Either it's called in a loop or it's just a comparable order of magnitude.

In this case we have a pile of functions, some of which are used in perf-critical code: I think the day-of-year and day-of-week stuff is going to be called in every format for certain skeleta, but the to/from fixed stuff is less critical to formatting? I could be wrong, but AIUI we should not call to/from fixed during formatting unless using a converting calendar.

So right off the bat it's unclear whether we should be optimizing all of these. Then, on top of that, all of these functions take single or low-double-digit nanoseconds (assuming my calculations above are correct; I do not have full faith in them at the moment), and I'd be really surprised if that shows up at the order of magnitude of formatting performance, since formatting actually hits memory and does string writing.
