-
Notifications
You must be signed in to change notification settings - Fork 285
Complex asinh accuracy refinement #6428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…h header disappears
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small things
libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h
Outdated
Show resolved
Hide resolved
libcudacxx/include/cuda/std/__complex/inverse_hyperbolic_functions.h
Outdated
Show resolved
Hide resolved
| // An unsafe sqrt(_Tp + _Tp) extended precision sqrt. | ||
| template <typename _Tp> | ||
| static void __device__ __host__ __forceinline__ | ||
| __internal_double_Tp_sqrt_unsafe(_Tp __hi, _Tp __lo, _Tp* __out_hi, _Tp* __out_lo) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I would strongly prefer if we would return a simple struct here instead of inout parameters
template<class _Tp>
struct _CCCL_ALIGNAS(2 * sizeof(_Tp)) __cccl_sqrt_return_hilo {
__Tp __hi;
__Tp __low;
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Applied suggested constant initializers. Co-authored-by: Michael Schellenberger Costa <[email protected]> Co-authored-by: David Bayer <[email protected]>
…ions.h Replace __device__ __host__ Co-authored-by: Michael Schellenberger Costa <[email protected]>
…ions.h Add guards for non-cuda compilers Co-authored-by: Michael Schellenberger Costa <[email protected]>
…ions.h undo-ing clang-format Co-authored-by: David Bayer <[email protected]>
| #include <cuda/std/__cmath/abs.h> | ||
| #include <cuda/std/__cmath/copysign.h> | ||
| #include <cuda/std/__cmath/isinf.h> | ||
| #include <cuda/std/__cmath/isnan.h> | ||
| #include <cuda/std/__cmath/trigonometric_functions.h> | ||
| #include <cuda/std/__complex/complex.h> | ||
| #include <cuda/std/__complex/exponential_functions.h> | ||
| #include <cuda/std/__complex/logarithms.h> | ||
| #include <cuda/std/__complex/nvbf16.h> | ||
| #include <cuda/std/__complex/nvfp16.h> | ||
| #include <cuda/std/__complex/roots.h> | ||
| #include <cuda/std/limits> | ||
| #include <cuda/std/numbers> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are all of these includes dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My prejudice basically, in math we get compile time build bugs because of header parsing so I usually try take them out if they're not needed. I was over zealous here it seems, not sure why this even built on my machine. Added them back, like I would probably have to do later after changing the other functions also.
|
/ok to test 9629a78 |
😬 CI Workflow Results🟥 Finished in 2h 13m: Pass: 80%/90 | Total: 1d 06h | Max: 1h 32m | Hits: 94%/152135See results here. |
Update the complex asinh function to avoid numerical issues.
The current complex asinh function loses accuracy in several places.
These mostly relate to over/underflow, and catastrophic cancellation for tough values.
This new version fixes these accuracy issues while retaining it's perf.
Perf
On GH100 we don't have much perf difference. (There used to be a much large perf gap until #5371 got merged, which the current version is availing of).
Using the math-teams standard math_bench test we have the following:
Operations/SM/cycle:
casinh():Correctness
The current version has several intervals where accuracy gets lost.
Apart from the usual over/underflow suspects, there is also some very subtle intervals where accuracy gets badly thrown out, especially by catastrophic cancellation very close to
+-i.This new version fixes these and testing gives the following:
GPU Correctness
For the new version, an intensive bracket and bisect search, along with testing special hard values, gives:
CPU Correctness