Improve precision of horner_polynomial #280
Merged
Overview
Improve precision of horner_polynomial
Reason for change
We previously relied, implicitly, on LLVM's behaviour of promoting f16 operations to f32, which gave extra precision equivalent to using FMA. That promotion was never applied consistently (it varied between platforms), and now it consistently does not happen, producing incorrect results for at least half-precision pow. Use FMA explicitly to fix that.
Description of change
This MR consists of two commits that are not intended to be squashed and are best reviewed separately: a large, mostly mechanical NFC commit that simplifies how horner_polynomial is used, followed by a small commit that fixes its implementation.
Anything else we should know?
With this change, we pass the OpenCL CTS fp16-staging branch's testing for half-precision pow.
Checklist
recent version available through pip) on all modified code.