More optimizations of Sin and Cos #4154

Merged
merged 9 commits into from
Jan 2, 2025

@pleroy pleroy commented Jan 2, 2025

  1. Clean up leftovers from the previous version of OSACA macros.
  2. Simplify the FMA wrappers.
  3. Change all the algorithms except Sin near 0 to operate on the absolute value of their operand and restore the sign at the end if needed.
  4. Extract a deeply nested FMA and replace it with a multiplication. This makes a difference on Zen.
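Item 3 can be illustrated with a minimal sketch, not the code of this pull request. It evaluates an odd function on the absolute value of its argument and restores the sign at the end with SSE2 bit operations. `OddKernelOnPositive` and `OddFunctionWithSignRestored` are hypothetical names, and the polynomial is a placeholder rather than a real Sin approximation.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics.

// Hypothetical placeholder for a kernel that only needs to handle
// non-negative arguments.  The function x + x³ is odd, like Sin.
double OddKernelOnPositive(double const abs_x) {
  return abs_x + abs_x * abs_x * abs_x;
}

// Evaluates the odd function at x by working on |x| and restoring the sign.
double OddFunctionWithSignRestored(double const x) {
  // Only the sign bit of the low lane is set (assumed stand-in for a
  // sign-bit mask constant).
  __m128d const sign_bit = _mm_set_sd(-0.0);
  __m128d const x_0 = _mm_set_sd(x);
  // Extract the sign of x, then clear it to obtain |x|.
  __m128d const sign = _mm_and_pd(sign_bit, x_0);
  double const abs_x = _mm_cvtsd_f64(_mm_andnot_pd(sign_bit, x_0));
  double const abs_value = OddKernelOnPositive(abs_x);
  // For an odd function, xor-ing the saved sign back in restores f(x).
  return _mm_cvtsd_f64(_mm_xor_pd(_mm_set_sd(abs_value), sign));
}
```

For an even function such as Cos the final xor is unnecessary, which is presumably what "if needed" refers to.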

Comparison on Zen 3. Before:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            4.94   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
    sqrtps_xmm0_xmm0           16.72   +0.38   +0.38   +0.38   +0.38   +0.38   +0.38
     mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
  mulsd_xmm0_xmm0_4x           15.20   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
Slope: 1.186186 cycle/TSC
Correlation coefficient: 0.999887
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.07   +0.00   +0.00   +0.00   +0.45   +0.45   +0.45
R    mulsd_xmm0_xmm0       3    3.08   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R mulsd_xmm0_xmm0_4x      12   11.64   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45
       principia_cos           52.21   +0.45   +0.90   +0.90   +0.90   +0.90   +1.35
R   sqrtps_xmm0_xmm0      14   13.90   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45
             std_cos           30.58   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45

After:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            4.94   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
    sqrtps_xmm0_xmm0           16.72   +0.00   +0.38   +0.38   +0.38   +0.38   +0.38
     mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
  mulsd_xmm0_xmm0_4x           15.20   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
Slope: 1.186186 cycle/TSC
Correlation coefficient: 0.999887
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.07   +0.00   +0.00   +0.00   +0.45   +0.45   +0.45
R    mulsd_xmm0_xmm0       3    3.08   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R mulsd_xmm0_xmm0_4x      12   12.10   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
       principia_cos           48.16   +0.90   +0.90   +0.90   +1.35   +1.35   +1.35
R   sqrtps_xmm0_xmm0      14   13.90   +0.00   +0.45   +0.45   +0.45   +0.45   +0.45
             std_cos           30.58   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45
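The Slope, Correlation coefficient and Cycles figures in the Zen 3 "After" output appear consistent with an ordinary least-squares fit of the expected cycle counts of the reference rows (marked R) against their raw TSC minima. The sketch below reproduces that arithmetic under this assumption; `LinearFit`, `FitLeastSquares`, `Zen3AfterFit` and `ToCycles` are hypothetical names, and this is not the benchmark's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LinearFit {
  double slope;        // Cycles per TSC tick.
  double intercept;    // Cycles at a raw TSC reading of 0.
  double correlation;  // Pearson correlation coefficient.
};

// Ordinary least-squares fit of cycles against raw TSC readings.
LinearFit FitLeastSquares(std::vector<double> const& tsc,
                          std::vector<double> const& cycles) {
  assert(tsc.size() == cycles.size() && tsc.size() >= 2);
  double const n = static_cast<double>(tsc.size());
  double mean_x = 0;
  double mean_y = 0;
  for (std::size_t i = 0; i < tsc.size(); ++i) {
    mean_x += tsc[i];
    mean_y += cycles[i];
  }
  mean_x /= n;
  mean_y /= n;
  double sxx = 0;
  double sxy = 0;
  double syy = 0;
  for (std::size_t i = 0; i < tsc.size(); ++i) {
    double const dx = tsc[i] - mean_x;
    double const dy = cycles[i] - mean_y;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  double const slope = sxy / sxx;
  return {slope, mean_y - slope * mean_x, sxy / std::sqrt(sxx * syy)};
}

// Reference rows of the Zen 3 "After" table: raw TSC minima and expected
// cycles for identity, sqrtps_xmm0_xmm0, mulsd_xmm0_xmm0, mulsd_xmm0_xmm0_4x.
LinearFit Zen3AfterFit() {
  return FitLeastSquares({4.94, 16.72, 7.60, 15.20}, {0, 14, 3, 12});
}

// Converts a raw TSC reading to cycles using the fitted line.
double ToCycles(LinearFit const& fit, double const raw_tsc) {
  return fit.slope * raw_tsc + fit.intercept;
}
```

With the rounded values from the table, this fit yields a slope of about 1.1862 and maps the raw 7.60 of `mulsd_xmm0_xmm0` to about 3.08 cycles, in agreement with the figures above.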

Comparison on Golden Cove. Before:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            2.40   +0.06   +0.06   +0.08   +0.08   +0.10   +0.10
    sqrtps_xmm0_xmm0            9.62   +0.02   +0.04   +0.06   +0.06   +0.08   +0.10
     mulsd_xmm0_xmm0            5.16   +0.02   +0.02   +0.04   +0.06   +0.76   +0.80
  mulsd_xmm0_xmm0_4x           11.84   +0.04   +0.06   +0.08   +0.08   +0.10   +0.12
Slope: 1.710658 cycle/TSC
Correlation coefficient: 0.999084
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.27   +0.07   +0.07   +0.10   +0.10   +0.14   +0.17
R    mulsd_xmm0_xmm0       4    4.38   +0.07   +0.07   +0.10   +0.10   +0.14   +1.33
R mulsd_xmm0_xmm0_4x      16   15.88   +0.03   +0.07   +0.10   +0.10   +0.14   +0.17
       principia_cos           52.35   +0.48   +0.72   +0.96   +1.13   +1.78   +2.39
R   sqrtps_xmm0_xmm0      12   11.94   +0.14   +0.17   +0.21   +0.21   +0.24   +0.24
             std_cos           31.58   +0.17   +0.24   +0.31   +0.34   +0.41   +0.51

After:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            2.40   +0.06   +0.06   +0.08   +0.08   +0.10   +0.12
    sqrtps_xmm0_xmm0            9.62   +0.02   +0.04   +0.04   +0.06   +0.08   +0.08
     mulsd_xmm0_xmm0            5.16   +0.02   +0.02   +0.04   +0.04   +0.06   +0.78
  mulsd_xmm0_xmm0_4x           11.86   +0.02   +0.04   +0.06   +0.06   +0.08   +0.10
Slope: 1.707841 cycle/TSC
Correlation coefficient: 0.999116
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.23   +0.03   +0.07   +0.07   +0.07   +0.10   +0.68
R    mulsd_xmm0_xmm0       4    4.41   +0.03   +0.03   +0.07   +0.07   +0.10   +0.14
R mulsd_xmm0_xmm0_4x      16   15.86   +0.03   +0.07   +0.10   +0.14   +0.17   +3.01
       principia_cos           46.05   +0.41   +0.58   +0.72   +0.82   +0.99   +1.26
R   sqrtps_xmm0_xmm0      12   11.96   +0.10   +0.14   +0.17   +0.17   +0.20   +0.24
             std_cos           31.50   +0.17   +0.27   +0.34   +0.38   +0.44   +0.55

@@ -429,30 +434,30 @@ Value CosImplementation(DoublePrecision<Argument> const θ_reduced) {
auto const& e = θ_reduced.error;
double const abs_x = std::abs(x);
__m128d const sign = _mm_and_pd(masks::sign_bit, _mm_set_sd(x));
double const abs_e = _mm_cvtsd_f64(_mm_xor_pd(_mm_set_sd(e), sign));

Suggested change
- double const abs_e = _mm_cvtsd_f64(_mm_xor_pd(_mm_set_sd(e), sign));
+ double const e_abs = _mm_cvtsd_f64(_mm_xor_pd(_mm_set_sd(e), sign));

(meaning e_abs, short for something like e_{abs θ}); otherwise it looks like it is abs(e).
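The naming concern can be made concrete with a small sketch. The expression in the diff flips the sign of `e` exactly when `x` is negative, so the result is the error term that goes with |x|, and it may itself be negative. `ErrorForAbsoluteValue` is a hypothetical name, and `_mm_set_sd(-0.0)` is an assumed stand-in for `masks::sign_bit`, taken here to have only the sign bit of the low lane set.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics.

// Returns the error term matching |x|: e if x is positive, -e if x is
// negative.  This is not |e|.
double ErrorForAbsoluteValue(double const x, double const e) {
  // Assumed stand-in for masks::sign_bit.
  __m128d const sign_bit = _mm_set_sd(-0.0);
  // The sign of x, isolated in the low lane.
  __m128d const sign = _mm_and_pd(sign_bit, _mm_set_sd(x));
  // Xor-ing with the sign of x negates e when x is negative.
  return _mm_cvtsd_f64(_mm_xor_pd(_mm_set_sd(e), sign));
}
```

For a positive `x` and a negative `e`, the function returns `e` unchanged, that is, a negative value, whereas an absolute value of `e` would be positive.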

@eggrobin eggrobin added the LGTM label Jan 2, 2025
@pleroy pleroy merged commit cc70c4f into mockingbirdnest:master Jan 2, 2025
9 checks passed