
Improve precision of horner_polynomial #280

Merged
merged 2 commits into main from horner-precision on Jan 8, 2024

Conversation

@hvdijk (Collaborator) commented on Jan 5, 2024

Overview

Improve precision of horner_polynomial

Reason for change

We were previously relying, implicitly, on LLVM's behaviour of promoting f16 operations to f32, which gives extra precision that amounts to the same results as FMA. That promotion was never applied consistently (it varied between platforms), and it is now consistently not applied, resulting in incorrect results for at least half-precision pow. Use FMA to fix that.
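
As an illustrative sketch (not the library's actual horner_polynomial, and the coefficient ordering is an assumption), the idea is that each Horner step becomes a single fused multiply-add, so it is rounded once instead of twice:

```cpp
#include <cmath>
#include <cstddef>

// Sketch only: evaluate c[0] + c[1]*x + ... + c[N-1]*x^(N-1) with Horner's
// method, using fma so each "acc * x + c[i]" step incurs a single rounding.
template <typename T, std::size_t N>
T horner_fma(T x, const T (&coeffs)[N]) {
  T acc = coeffs[N - 1];
  for (std::size_t i = N - 1; i-- > 0;) {
    acc = std::fma(acc, x, coeffs[i]);
  }
  return acc;
}
```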

Description of change

This PR consists of two commits that are not intended to be squashed and are best reviewed separately: a large, mostly mechanical NFC commit that simplifies how horner_polynomial is used, followed by a small commit that fixes its implementation.

Anything else we should know?

With this change, we pass the OpenCL CTS fp16-staging branch's tests for half-precision pow.

Checklist

  • Read and follow the project Code of Conduct.
  • Make sure the project builds successfully with your changes.
  • Run relevant testing locally to avoid regressions.
  • Run clang-format-16 (the most recent version available through pip) on all modified code.

The common use of horner_polynomial is to take an array of coefficients,
and to specify that all coefficients in the array are used. This commit
changes horner_polynomial to allow inference of the size.
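
A hypothetical before/after of that interface change (the names and signatures here are guesses, not the library's actual API):

```cpp
#include <cmath>
#include <cstddef>

// Before (hypothetical): call sites pass the coefficient count explicitly,
// e.g. horner_polynomial(x, poly, 5);
// template <typename T>
// T horner_polynomial(T x, const T *coeffs, std::size_t count);

// After (hypothetical): taking the coefficients by array reference lets the
// compiler deduce N, e.g. horner_polynomial(x, poly);
template <typename T, std::size_t N>
T horner_polynomial(T x, const T (&coeffs)[N]) {
  T acc = coeffs[N - 1];
  for (std::size_t i = N - 1; i-- > 0;)
    acc = std::fma(acc, x, coeffs[i]);
  return acc;
}
```
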
We were previously implicitly relying on LLVM's behaviour of promoting
f16 operations to f32 and getting extra precision that amounts to
getting the same results as FMA. We never did that consistently (it
varied between platforms) and we now consistently do not do that,
resulting in incorrect results for at least half-precision pow. Use FMA
to fix that.

tanpi and tgamma have hardcoded exceptions where the approximation was
known to not be sufficiently precise. These exceptions are updated.
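
To make the rounding difference concrete, here is a small standalone demonstration using float rather than half (the mechanism is the same; the values are chosen purely to expose the difference, and FP contraction must be disabled so the non-FMA expression really rounds twice):

```cpp
#include <cmath>
#include <cstdio>

// Double rounding vs. fma, shown with float. Build with contraction disabled
// (e.g. -ffp-contract=off) so "a * x + b" is a rounded multiply followed by a
// rounded add.
int main() {
  float a = 1.0f + 0x1p-12f;       // values picked so the results differ
  float x = a;
  float b = -(1.0f + 0x1p-11f);

  float twice = a * x + b;         // product rounded, then sum rounded: 0x0p+0
  float once = std::fma(a, x, b);  // exact a*x + b rounded once: 0x1p-24

  std::printf("two roundings: %a\none rounding:  %a\n", twice, once);
  return 0;
}
```
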
@hvdijk hvdijk merged commit f732b95 into uxlfoundation:main Jan 8, 2024
3 checks passed
@hvdijk hvdijk deleted the horner-precision branch January 8, 2024 10:31