Skip fast exp/log/pow/sin/cosine tests without sse 4.1 #8541

Merged
merged 1 commit into from
Dec 27, 2024
14 changes: 9 additions & 5 deletions src/IROperator.h
@@ -970,28 +970,32 @@ Expr pow(Expr x, Expr y);
  * mantissa. Vectorizes cleanly. */
 Expr erf(const Expr &x);

-/** Fast vectorizable approximation to some trigonometric functions for Float(32).
- * Absolute approximation error is less than 1e-5. */
+/** Fast vectorizable approximation to some trigonometric functions for
+ * Float(32). Absolute approximation error is less than 1e-5. Slow on x86 if
+ * you don't have at least sse 4.1. */
 // @{
 Expr fast_sin(const Expr &x);
 Expr fast_cos(const Expr &x);
 // @}

 /** Fast approximate cleanly vectorizable log for Float(32). Returns
  * nonsense for x <= 0.0f. Accurate up to the last 5 bits of the
- * mantissa. Vectorizes cleanly. */
+ * mantissa. Vectorizes cleanly. Slow on x86 if you don't
+ * have at least sse 4.1. */
 Expr fast_log(const Expr &x);

 /** Fast approximate cleanly vectorizable exp for Float(32). Returns
  * nonsense for inputs that would overflow or underflow. Typically
  * accurate up to the last 5 bits of the mantissa. Gets worse when
- * approaching overflow. Vectorizes cleanly. */
+ * approaching overflow. Vectorizes cleanly. Slow on x86 if you don't
+ * have at least sse 4.1. */
 Expr fast_exp(const Expr &x);

 /** Fast approximate cleanly vectorizable pow for Float(32). Returns
  * nonsense for x < 0.0f. Accurate up to the last 5 bits of the
  * mantissa for typical exponents. Gets worse when approaching
- * overflow. Vectorizes cleanly. */
+ * overflow. Vectorizes cleanly. Slow on x86 if you don't
+ * have at least sse 4.1. */
 Expr fast_pow(Expr x, Expr y);

 /** Fast approximate inverse for Float(32). Corresponds to the rcpps
6 changes: 6 additions & 0 deletions test/performance/fast_pow.cpp
@@ -20,6 +20,12 @@ int main(int argc, char **argv) {
     printf("HL_TARGET is: %s\n", hl_target.to_string().c_str());
     printf("HL_JIT_TARGET is: %s\n", hl_jit_target.to_string().c_str());

+    if (hl_jit_target.arch == Target::X86 &&
+        !hl_jit_target.has_feature(Target::SSE41)) {
+        printf("[SKIP] These intrinsics are known to be slow on x86 without sse 4.1.\n");
+        return 0;
+    }
+
     if (hl_jit_target.arch == Target::WebAssembly) {
         printf("[SKIP] Performance tests are meaningless and/or misleading under WebAssembly interpreter.\n");
         return 0;
7 changes: 7 additions & 0 deletions test/performance/fast_sine_cosine.cpp
@@ -10,6 +10,13 @@ using namespace Halide::Tools;

 int main(int argc, char **argv) {
     Target target = get_jit_target_from_environment();
+
+    if (target.arch == Target::X86 &&
+        !target.has_feature(Target::SSE41)) {
+        printf("[SKIP] These intrinsics are known to be slow on x86 without sse 4.1.\n");
+        return 0;
+    }
+
     if (target.arch == Target::WebAssembly) {
         printf("[SKIP] Performance tests are meaningless and/or misleading under WebAssembly interpreter.\n");
         return 0;
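Both performance tests apply the same two skip checks in the same order. As a rough illustration only, and not part of this PR, the decision could be modelled as one helper. Everything in the sketch below is hypothetical: `MockTarget`, `Arch`, `Feature`, and `perf_test_skip_reason` are stand-ins written so the snippet compiles without Halide, and they are not Halide's `Target` API.

```cpp
#include <set>

// Hypothetical stand-ins for Halide::Target, just enough to model the guard.
// These are NOT Halide types; they exist only so the sketch is self-contained.
enum class Arch { X86, ARM, WebAssembly };
enum class Feature { SSE41, AVX2 };

struct MockTarget {
    Arch arch;
    std::set<Feature> features;

    // Mirrors the shape of the has_feature() call used in the tests above.
    bool has_feature(Feature f) const {
        return features.count(f) != 0;
    }
};

// Returns the [SKIP] reason for a target, or nullptr if the performance
// test should run. The checks and their order follow the two tests in this PR.
inline const char *perf_test_skip_reason(const MockTarget &t) {
    if (t.arch == Arch::X86 && !t.has_feature(Feature::SSE41)) {
        return "These intrinsics are known to be slow on x86 without sse 4.1.";
    }
    if (t.arch == Arch::WebAssembly) {
        return "Performance tests are meaningless and/or misleading under WebAssembly interpreter.";
    }
    return nullptr;
}
```

A test's `main` would print `[SKIP]` followed by the returned reason and return 0 when the result is non-null. The PR itself keeps the checks inline in each test, which avoids adding a shared helper for two call sites.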