Complex asinh accuracy refinement #6428

s-oboyle · 2025-10-31T20:57:33Z

Update the complex asinh function to avoid numerical issues.

The current complex asinh function loses accuracy in several places.
These mostly relate to over/underflow and to catastrophic cancellation for difficult inputs.

This new version fixes these accuracy issues while retaining its performance.
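To illustrate the overflow problem, here is a minimal real-valued sketch. It is not the code in this PR: `naive_asinh` and `large_arg_asinh` are hypothetical names, and the large-argument rewrite shown is just one textbook way of avoiding the overflow, assumed here for illustration only.

```cpp
#include <cassert>
#include <cmath>

// Textbook formula asinh(x) = log(x + sqrt(x*x + 1)).
// For large |x| the intermediate x*x overflows to +inf even though
// the final result is small.
inline double naive_asinh(double x)
{
  return std::log(x + std::sqrt(x * x + 1.0));
}

// Illustrative large-argument rewrite: for |x| >> 1,
// asinh(x) ~= sign(x) * (log|x| + log 2), which never forms x*x.
inline double large_arg_asinh(double x)
{
  assert(std::fabs(x) > 1e10); // only meant for large magnitudes
  return std::copysign(std::log(std::fabs(x)) + std::log(2.0), x);
}
```

With `x = 1e200`, `naive_asinh` returns infinity, while `large_arg_asinh` and `std::asinh` both return a finite value near 461.2. The complex case has the same failure mode in `z*z`.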

Perf

On GH100 there is little perf difference. (There used to be a much larger perf gap until #5371 was merged, which the current version benefits from.)
Using the math team's standard math_bench test, we get the following:

Operations/SM/cycle:
casinh():

H100	old	new	new/old
fp64	0.2531	0.2549	1.01
fp32	0.6072	0.6334	1.04

Correctness

The current version has several intervals where accuracy is lost.
Apart from the usual over/underflow suspects, there are also some very subtle intervals where accuracy is badly degraded, especially by catastrophic cancellation very close to ±i.
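As a rough illustration of the cancellation near ±i (a sketch only, not the approach taken in this PR): for z = i·y the quantity 1 + z² equals 1 − y², and as y approaches 1 the subtraction cancels almost every leading bit. The hypothetical helpers below compare the direct form with a factored form that stays exact for this input.

```cpp
#include <cmath>

// For z = i*y, 1 + z*z == 1 - y*y. Near y == 1 the product y*y is rounded
// first (unless the compiler contracts it into an FMA), and the subtraction
// then exposes that rounding error.
inline double one_plus_z2_naive(double y)
{
  return 1.0 - y * y;
}

// Factored form: (1 - y) is exact for y close to 1, so no cancellation
// error is introduced before the multiply.
inline double one_plus_z2_factored(double y)
{
  return (1.0 - y) * (1.0 + y);
}
```

For `y = 1 - 2^-30` the exact value is `2^-29 - 2^-60`. The factored form returns exactly that, while the naive form, when the product is rounded before the subtraction, returns `2^-29` and so loses the low-order term entirely.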

This new version fixes these and testing gives the following:

GPU Correctness

For the new version, an intensive bracket and bisect search, along with testing special hard values, gives:

GPU fp64:
Max ulp real error (4.867,1.742) @ (0.007757045272,-0.0002247045536)    (0x3f7fc5d9fc1f5662,0xbf2d73d56affd72d)
        Ours = (0.007756967678,-0.0002246977954)    Ref = (0.007756967678,-0.0002246977954)
        Ours = (0x3f7fc5c527dd1d58,0xbf2d739b5d7a3961)               Ref = (0x3f7fc5c527dd1d53,0xbf2d739b5d7a3963)

Max ulp imag error (0.1719,5.453) @ (7.198570162e+103,-5.623976789e+101)        (0x558011effdb5a3ad,0xd510120190e898df)
        Ours = (239.8333247,-0.007812471425)    Ref = (239.8333247,-0.007812471425)
        Ours = (0x406dfaaa988c99ba,0xbf7ffff854599f01)               Ref = (0x406dfaaa988c99ba,0xbf7ffff854599efc)

GPU fp32:
Max ulp real error (6.619,2.232) @ (0.007812378462,-0.0007928675623)    (0x3bfffefb,0xba4fd871)
        Ours = (0.007812298369,-0.0007928435807)    Ref = (0.007812301628,-0.0007928434643)
        Ours = (0x3bfffe4f,0xba4fd6d5)               Ref = (0x3bfffe56,0xba4fd6d3)

Max ulp imag error (3.732,5.528) @ (0.007806597743,3.029832988e-05)     (0x3bffce7d,0x37fe292b)
        Ours = (0.007806516718,3.029741674e-05)    Ref = (0.007806518581,3.029740583e-05)
        Ours = (0x3bffcdcf,0x37fe2735)               Ref = (0x3bffcdd3,0x37fe272f)
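For readers who want to sanity-check the hex dumps above, here is a minimal sketch of a bit-pattern ulp distance for fp32. It is an assumption about how one might cross-check the numbers, not the test harness used for this PR; the reported errors are presumably measured against an unrounded higher-precision reference, so they are fractional, whereas this helper only counts representable values between two floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float.
inline float from_bits(std::uint32_t u)
{
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Map a float to an integer that is monotonic in the float's value,
// so that adjacent floats differ by exactly 1 (and +0 == -0).
inline std::int64_t ordered_bits(float x)
{
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  const std::int64_t mag = static_cast<std::int64_t>(u & 0x7fffffffu);
  return (u & 0x80000000u) ? -mag : mag;
}

// Number of representable floats between a and b (NaNs not supported).
inline std::int64_t ulp_distance(float a, float b)
{
  assert(a == a && b == b); // reject NaN inputs
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}
```

Applied to the GPU fp32 real-error case, `ulp_distance(from_bits(0x3bfffe4f), from_bits(0x3bfffe56))` is 7 and the imaginary parts `0xba4fd6d5` / `0xba4fd6d3` are 2 apart, in line with the reported (6.619, 2.232).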

CPU Correctness

CPU fp64:
Max ulp real error (4.125,0) @ (0.01542893159,-0)       (0x3f8f993424afec00,0x8000000000000000)
        Ours = (0.01542831951,-0)    Ref = (0.01542831951,-0)
        Ours = (0x3f8f98e1fdb37251,0x8000000000000000)               Ref = (0x3f8f98e1fdb3724d,0x8000000000000000)

Max ulp imag error (0.5078,3.484) @ (0.8869326854,-1.12001698e-254)     (0x3fec61c0a7b18800,0x8b3505783ad41800)
        Ours = (0.7991224705,-8.379245502e-255)    Ref = (0.7991224705,-8.379245502e-255)
        Ours = (0x3fe99269498d3a37,0x8b2f742347588681)               Ref = (0x3fe99269498d3a36,0x8b2f742347588684)

CPU fp32:
Max ulp real error (4.827,0.5125) @ (0.001131535857,0.9893865585)       (0x3a94500b,0x3f7d4870)
        Ours = (0.007776218932,1.424767017)    Ref = (0.007776216604,1.424766898)
        Ours = (0x3bfecfa7,0x3fb65ec4)               Ref = (0x3bfecfa2,0x3fb65ec3)

Max ulp imag error (0.498,4.695) @ (-5.695841894e+11,3.510264218e+10)   (0xd3049ddd,0x5102c47d)
        Ours = (-27.76321411,0.06155067682)    Ref = (-27.76321411,0.06155069545)
        Ours = (0xc1de1b10,0x3d7c1c90)               Ref = (0xc1de1b10,0x3d7c1c95)

copy-pr-bot · 2025-10-31T20:57:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

davebayer

Some small things

libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h

miscco · 2025-11-03T10:07:50Z

libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h

+// An unsafe sqrt(_Tp + _Tp) extended precision sqrt.
+template <typename _Tp>
+static void __device__ __host__ __forceinline__
+__internal_double_Tp_sqrt_unsafe(_Tp __hi, _Tp __lo, _Tp* __out_hi, _Tp* __out_lo) noexcept


Nitpick: I would strongly prefer that we return a simple struct here instead of using output parameters

template <class _Tp> struct _CCCL_ALIGNAS(2 * sizeof(_Tp)) __cccl_sqrt_return_hilo { _Tp __hi; _Tp __low; };
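A minimal host-only sketch of what the struct-returning shape could look like. Everything here is hypothetical: `sqrt_hilo` and `extended_sqrt_sketch` are made-up names, the alignment attribute and CUDA decorations are omitted, and the single Newton correction is an assumed stand-in for whatever the PR's unsafe extended-precision sqrt actually does.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the value is the unevaluated sum hi + lo.
template <class T>
struct sqrt_hilo
{
  T hi;
  T lo;
};

// Sketch of sqrt(hi + lo) with one Newton correction step.
// "Unsafe" in the sense that no overflow, underflow, zero, or NaN
// handling is attempted.
template <class T>
sqrt_hilo<T> extended_sqrt_sketch(T hi, T lo)
{
  assert(hi > T(0));
  const T s        = std::sqrt(hi);
  const T residual = std::fma(-s, s, hi) + lo; // (hi + lo) - s*s
  const T corr     = residual / (T(2) * s);    // Newton correction to s
  const T out_hi   = s + corr;
  const T out_lo   = corr - (out_hi - s);      // rounding error of s + corr
  return {out_hi, out_lo};
}
```

A caller would then write `const auto r = extended_sqrt_sketch(hi, lo);` and read `r.hi` and `r.lo`, instead of passing two pointers for the results.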

davebayer · 2025-11-03T16:59:09Z

libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h

-#include <cuda/std/__cmath/abs.h>
-#include <cuda/std/__cmath/copysign.h>
-#include <cuda/std/__cmath/isinf.h>
-#include <cuda/std/__cmath/isnan.h>
-#include <cuda/std/__cmath/trigonometric_functions.h>
-#include <cuda/std/__complex/complex.h>
-#include <cuda/std/__complex/exponential_functions.h>
-#include <cuda/std/__complex/logarithms.h>
 #include <cuda/std/__complex/nvbf16.h>
 #include <cuda/std/__complex/nvfp16.h>
 #include <cuda/std/__complex/roots.h>
-#include <cuda/std/limits>
-#include <cuda/std/numbers>


Why are all of these includes dropped?

My prejudice, basically: in math we get compile-time build bugs because of header parsing, so I usually try to take headers out if they're not needed. I was overzealous here, it seems; I'm not sure why this even built on my machine. I've added them back, as I would probably have had to do later anyway after changing the other functions.

s-oboyle · 2025-11-03T17:50:29Z

/ok to test 9629a78

github-actions · 2025-11-03T20:06:36Z

😬 CI Workflow Results

🟥 Finished in 2h 13m: Pass: 80%/90 | Total: 1d 06h | Max: 1h 32m | Hits: 94%/152135

See results here.

s-oboyle added 9 commits October 15, 2025 16:19

moving machine

1401314

Merge branch 'NVIDIA:main' into complex_asinh_accuracy_refinement

4469c9d

Comment cleanup

8b659d0

clang-format

c16cd26

More const's, comment cleanup

99c51dc

Remove unneeded headers. May need to add some back in when the roots.…

63898de

…h header disappears

Merge branch 'NVIDIA:main' into complex_asinh_accuracy_refinement

962f48a

re-enable header error

7844a7e

Add noexcept to internal function

08f5133

s-oboyle assigned miscco Oct 31, 2025

s-oboyle requested a review from a team as a code owner October 31, 2025 20:57

s-oboyle requested a review from pciolkosz October 31, 2025 20:57

github-project-automation bot added this to CCCL Oct 31, 2025

github-project-automation bot moved this to Todo in CCCL Oct 31, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Oct 31, 2025

spell-check

a8c2333

davebayer reviewed Nov 3, 2025

miscco requested changes Nov 3, 2025

github-project-automation bot moved this from In Review to In Progress in CCCL Nov 3, 2025

s-oboyle and others added 7 commits November 3, 2025 16:23

Apply suggestions from code review

c24b96d

Applied suggested constant initializers. Co-authored-by: Michael Schellenberger Costa <[email protected]> Co-authored-by: David Bayer <[email protected]>

Update libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_funct…

bce04ea

…ions.h Replace __device__ __host__ Co-authored-by: Michael Schellenberger Costa <[email protected]>

Updated all constant initialization

d69eafc

Update libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_funct…

9ad1b20

…ions.h Add guards for non-cuda compilers Co-authored-by: Michael Schellenberger Costa <[email protected]>

More non-cuda compiler guards

7996458

Update libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_funct…

55c63ad

…ions.h undo-ing clang-format Co-authored-by: David Bayer <[email protected]>

Changed return method for extended_sqrt

7b6aec1

davebayer reviewed Nov 3, 2025

s-oboyle added 2 commits November 3, 2025 18:46

Added headers back

8f447e8

Merge branch 'main' into complex_asinh_accuracy_refinement

9629a78
