Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

F64 exp returns infinity slightly too soon #523

Open
bnprks opened this issue Feb 17, 2024 · 0 comments
Open

F64 exp returns infinity slightly too soon #523

bnprks opened this issue Feb 17, 2024 · 0 comments
Labels

Comments

@bnprks
Copy link

bnprks commented Feb 17, 2024

The double-precision exp functions in Sleef appear to return infinity at slightly too low a value.

The current cutoff input above which Sleef returns infinity is:
709.78271114955742909217217426
However,
709.78271289338399673222338991 seems to be a better cutoff.

The cutoff Sleef uses is equal to
log(1.7976900000001013e+308), whereas the cutoff I propose is:
log(1.79769313486231570815e+308) aka log(DBL_MAX)

There are 15.3 million double-precision values between Sleef's cutoff and the one I propose. I tested all of those values on x86 SSE4 and AVX2 with the constant swapped out, and it appears that all of the return values are accurate to within 1 ULP (using expl with long double precision as the reference). This constant also exists in the pow functions where I believe it is similarly set too low, but I'm not sure how to perform exhaustive testing to confirm raising the constant is okay.

The specific line of code relevant is here, though I believe the constant should probably be adjusted in all 4 places it appears.

Testing code I used to check accuracy after adjusting the constant
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#include "sleef.h"

//cmake -B build -S . -G Ninja
//cmake --build build
//gcc -o sleef_exp_example sleef_exp_example.c -lsleef -lm -L./build/lib -I ./build/include -march=native

double ulp_diff(long double expected, double actual);
size_t bitcast_u64(double x);
double bitcast_f64(size_t x);
void checkAccuracy(double *input, double *actual, long double *expected, size_t N, double max_ulp);
// Problematic constant: 709.78271114955742909217217426
// Improved constant:    709.78271289338399673222338991
int main() {
    size_t start = bitcast_u64(709.78271114955742909217217426) - 2;
    size_t end =   bitcast_u64(709.78271289338399673222338991) + 2;
    size_t N = (((end - start) + 7)/8) * 8; // Round to nearest multiple of 8
    
    double *inputs = malloc(N * sizeof(double));
    double *outputs = malloc(N * sizeof(double));
    long double *expected = malloc(N * sizeof(long double));

    for (size_t i = 0; i < N; i++) {
        inputs[i] = bitcast_f64(start + i);
    }
    printf("Make %zu inputs from %.20g to %.20g\n", N, inputs[0], inputs[N-1]);

    for (size_t i = 0; i < N; i += 1) {
        expected[i] = expl((long double) inputs[i]);
    }

    printf("Sleef_expd4_u10avx2:\n");
    for (size_t i = 0; i < N; i += 4) {
        __m256d x = _mm256_loadu_pd(inputs + i);
        x = Sleef_expd4_u10avx2(x);
        _mm256_storeu_pd(outputs + i, x);
    }
    checkAccuracy(inputs, outputs, expected, N, 1.0);
    
    printf("Sleef_expd2_u10sse4:\n");
    for (size_t i = 0; i < N; i++) outputs[i] = -1;
    for (size_t i = 0; i < N; i += 2) {
        __m128d x = _mm_loadu_pd(inputs + i);
        x = Sleef_expd2_u10sse4(x);
        _mm_storeu_pd(outputs + i, x);
    }
    checkAccuracy(inputs, outputs, expected, N, 1.0);
}


double ulp_diff(long double expected, double actual) {
  if (isinf(actual) && ((double) expected) == actual) return 0;

  long double diff = fabsl(expected - ((long double) actual));

  // Calculate lowerp-precision expected1 and expected2 that straddle
  // the actual value of expected
  double expected1 = expected;
  double expected2 =
      nextafter(expected1, (expected1 < expected ? 1 : 1) * INFINITY);
  double ulp = fabsl(((long double) expected1) - ((long double) expected2));

  return diff / ulp;
}

size_t bitcast_u64(double x) {
    size_t y;
    memcpy(&y, &x, sizeof(double));
    return y;
}
double bitcast_f64(size_t x) {
    double y;
    memcpy(&y, &x, sizeof(double));
    return y;
}

void checkAccuracy(double *input, double *actual, long double *expected, size_t N, double max_ulp) {
    double worst_ulp = 0.0;
    int infinite_outputs = 0;
    for (size_t i = 0; i < N; i++) {
        if (isinf((double) expected[i])) {
            infinite_outputs += 1;
        }
        double ulp = ulp_diff(expected[i], actual[i]);
        worst_ulp = ulp > worst_ulp ? ulp : worst_ulp;
        if (ulp > max_ulp || isnan(ulp)) {
            printf("ulp=%.20g (in=%a, out=%a) (expected=%La (%a))\n", ulp, input[i], actual[i], expected[i], (double) expected[i]);
            return;
        }
    }
    printf("All within %f ULP (worst observed=%.20g, infinite outputs=%d)\n", max_ulp, worst_ulp, infinite_outputs);
}
@blapie blapie added the algo label Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants