Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regenerate kernel Horner coeffs: scaled to max val 1, and shave off some degrees for upsampfac=1.25 #499

Merged
merged 14 commits into from
Jul 23, 2024

Conversation

ahbarnett
Copy link
Collaborator

@ahbarnett ahbarnett commented Jul 22, 2024

This automates the ES kernel coeff generation in the new C++ header constexpr static templated array format of Barbone and Lu. This means we now have upsampfac=1.25 header coeff arrays, ready to use by the recent fast xsimd even-odd kernel evaluator of Lu.

In the process I corrected a bug in Ludvig's (or mine?) devel/gen_ker_horner_loop_C_code.m which confused d with nc=d+1, so made all degrees one less than intended.

It also reduces the # coeffs for upsampfac=1.25, so allows faster kernel eval here.
(exception is w=2, where degree goes from 3 to 4. This only affects tol=1e-1. If this causes slowdown, we should switch the degree rule from the current d=ceil(0.6w+3.2) to ceil(0.7w+2.2) or something.

It also rescales the ES kernel to max value 1 throughout the CPU spreadinterp.cpp code, fixing #454.

(GPU remains with the old kernel scale, thus may have overflow in 3d single-prec high-accuracy with large sized inputs, as in the above Issue.)

Finally I made a trivial fix to test/finufft1d_test to make it report ier=1 as a warning for type 3, as should be, and is in all other test codes.

Copy link
Member

@lu1and10 lu1and10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good to me. The only comment is https://github.com/flatironinstitute/finufft/pull/499/files#r1686214108
that std::ceil is not constexpr until c++23. Current code compiles as we use #include "ker_lowupsampfac_horner_allw_loop_constexpr.c", if later we change to use ker_lowupsampfac_horner_allw_loop_constexpr.h, the nc125 function may not compile with C++ standard < c++23.

@ahbarnett
Copy link
Collaborator Author

ahbarnett commented Jul 22, 2024 via email

@ahbarnett
Copy link
Collaborator Author

@lu1and10 and @DiamonDinoia see what you think of this much more maintainable scheme. A static array of ncs (nc for each w) is written. There is no code to maintain - rather a SSOT in the matlab codes. Much cleaner. Please check I used C++ correctly for the ncs definition, at the top of the two ker_*.h headers. Thanks!


template<class T, uint8_t w>
constexpr std::array<std::array<T, w>, nc200<w>()> get_horner_coeffs_200() noexcept {
constexpr auto nc = nc200<w>();
if constexpr (w == 2) {
if constexpr (w==2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces, the formatter will auto add them if run before commit. Please let me know if devnotes.rst or contributing.md are not easy to follow.

src/spreadinterp.cpp Show resolved Hide resolved
FLT c1[] = {2.5079742199350562E+01, -2.5079742199350562E+01, 0.0000000000000000E+00, 0.0000000000000000E+00};
FLT c2[] = {-3.5023281580177050E+00, -3.5023281580177086E+00, 0.0000000000000000E+00, 0.0000000000000000E+00};
FLT c3[] = {-7.3894949249195587E+00, 7.3894949249195632E+00, 0.0000000000000000E+00, 0.0000000000000000E+00};
FLT c0[] = {6.1209111871385724E-01, 6.1209111871385691E-01, 0.0000000000000000E+00, 0.0000000000000000E+00};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need padding anymore. The CPU version generates it (even when xsimd is not used explicitly. The GPU version does not use padding.

src/ker_horner_allw_loop_constexpr.c Show resolved Hide resolved
src/ker_lowupsampfac_horner_allw_loop_constexpr.h Outdated Show resolved Hide resolved
@ahbarnett
Copy link
Collaborator Author

ahbarnett commented Jul 22, 2024

well, that was extremely easy, and result much neater.

Ah, except conflict in spreadinterp.cpp now, will fix tomorrow.

@DiamonDinoia DiamonDinoia merged commit 3e231e2 into master Jul 23, 2024
72 checks passed
@DiamonDinoia
Copy link
Collaborator

New coefficients are a bit faster in 1D, 2/3D small difference

Intel(R)Xeon(R)Gold6244CPU@3 60GHz-3D
Intel(R)Xeon(R)Gold6244CPU@3 60GHz-2D
Intel(R)Xeon(R)Gold6244CPU@3 60GHz-1D

@DiamonDinoia DiamonDinoia added this to the 2.3 milestone Jul 23, 2024
@ahbarnett ahbarnett deleted the gencoef branch October 9, 2024 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants