Add SIMD optimization for int_to_float conversion #580
base: main
Conversation
Add SIMD fast paths for converting custom bit-depth floats to f32:

- 32-bit float passthrough: simple bitcast using SIMD.
- 16-bit float (f16/half-precision): SIMD conversion with scalar fallback for subnormal values.

The 16-bit float SIMD path handles normal, zero, and inf/nan cases directly, falling back to scalar for the rare subnormal case, which requires variable-iteration normalization.

Also adds a BitDepth::f16() test helper and comprehensive unit tests for the conversion functions.
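For reference, the bit manipulation the f16 path performs can be sketched as a scalar routine. This is a minimal illustration, not the PR's actual code; `f16_bits_to_f32` is a hypothetical helper name. It shows why subnormals are the awkward case: they need a variable-iteration loop to renormalize the mantissa, while every other case is a fixed shift-and-rebias.

```rust
/// Scalar sketch: convert IEEE 754 half-precision bits to f32.
/// Layout of f16: 1 sign bit, 5 exponent bits, 10 fraction bits.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = (bits as u32 & 0x8000) << 16;
    let exp = (bits >> 10) & 0x1f;
    let frac = bits as u32 & 0x3ff;
    let out = match exp {
        0 if frac == 0 => sign, // signed zero
        0 => {
            // Subnormal: renormalize with a variable-iteration loop.
            // This is the case the SIMD path punts to scalar.
            let mut e: i32 = 127 - 15 + 1; // rebias, pre-adjusted for the loop
            let mut f = frac;
            while f & 0x400 == 0 {
                f <<= 1;
                e -= 1;
            }
            sign | ((e as u32) << 23) | ((f & 0x3ff) << 13)
        }
        0x1f => sign | 0x7f80_0000 | (frac << 13), // inf / NaN
        // Normal: rebias exponent (15 -> 127) and widen the fraction.
        _ => sign | ((exp as u32 + 127 - 15) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}
```

The SIMD versions in the PR do the fixed-shift cases in vector registers and only drop to a loop like the one above for subnormal lanes.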
Benchmark @ 85ee297: comparing 352a1543 (Base) vs a1817c3d (PR)
Address veluca93 review: add load_f16_bits() and store_f16() methods to the F32SimdVec trait instead of implementing the conversion in convert.rs.

- AVX2+F16C: hardware _mm256_cvtph_ps/_mm256_cvtps_ph
- AVX-512: hardware _mm512_cvtph_ps/_mm512_cvtps_ph
- SSE4.2/NEON/scalar: scalar fallback

Simplifies convert.rs by ~100 lines.
```rust
fn load_f16_bits(d: Self::Descriptor, mem: &[u16]) -> Self {
    assert!(mem.len() >= Self::LEN);
    // Check for F16C at runtime and use hardware conversion if available
    if is_x86_feature_detected!("f16c") {
```
That's not a good idea. Given that f16c is as common as avx2 (if not more), let's just always require f16c for the AVX2 path.
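The reviewer's suggestion amounts to folding the F16C check into the one-time dispatch that already selects the AVX2 path, rather than re-checking per call inside `load_f16_bits`. A minimal sketch, with a hypothetical `use_avx2_path` dispatch helper (the actual dispatch in jxl_simd may look different):

```rust
/// Sketch: require avx2 and f16c together at dispatch time, so the
/// AVX2 code path can unconditionally use the hardware f16 intrinsics.
#[cfg(target_arch = "x86_64")]
fn use_avx2_path() -> bool {
    // f16c is at least as widespread as avx2, so gating on both
    // costs essentially nothing in practice.
    is_x86_feature_detected!("avx2") && is_x86_feature_detected!("f16c")
}

#[cfg(not(target_arch = "x86_64"))]
fn use_avx2_path() -> bool {
    false
}
```

With this gating, the per-call `is_x86_feature_detected!("f16c")` branch (and its scalar fallback arm) disappears from the hot path.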
```rust
fn store_f16(this: F32VecNeon, dest: &mut [u16]) {
    assert!(dest.len() >= F32VecNeon::LEN);
    // TODO: Use vcvt_f16_f32 once Rust stdarch fix lands
```
I think at this point I would just use inline ASM here, but we can do that as a follow-up.
```rust
unsafe fn load_f16_impl(d: Avx512Descriptor, mem: &[u16]) -> F32VecAvx512 {
    // SAFETY: mem.len() >= 16 is checked by caller, and avx512f is available
    unsafe {
        let bits = _mm256_loadu_si256(mem.as_ptr() as *const __m256i);
```
Only the loadu needs to be in an unsafe block.
```rust
unsafe {
    // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC = 0
    let bits = _mm512_cvtps_ph::<0>(v);
    _mm256_storeu_si256(dest.as_mut_ptr() as *mut __m256i, bits);
```
Similarly, only the store needs to be in an unsafe block.
```rust
// AVX512 implies F16C, so we can always use hardware conversion
#[target_feature(enable = "avx512f")]
#[inline]
unsafe fn load_f16_impl(d: Avx512Descriptor, mem: &[u16]) -> F32VecAvx512 {
```
This function does not need to be unsafe if we move the assert inside.
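The pattern the reviewer is pointing at: when the length check lives inside the function, the safety invariant is established locally, so the function itself can be safe and `unsafe` shrinks to just the intrinsic or raw-pointer call. A simplified sketch with hypothetical types (the real function uses AVX-512 loads):

```rust
/// Sketch: a safe entry point whose internal assert justifies the
/// unsafe block, instead of pushing the precondition onto callers.
fn load_16_checked(mem: &[u16]) -> [u16; 16] {
    // This assert establishes the invariant the unsafe code relies on,
    // so no `unsafe fn` (and no caller-side SAFETY obligation) is needed.
    assert!(mem.len() >= 16);
    let mut out = [0u16; 16];
    // SAFETY: the assert above guarantees at least 16 readable elements.
    unsafe {
        core::ptr::copy_nonoverlapping(mem.as_ptr(), out.as_mut_ptr(), 16);
    }
    out
}
```

The same reshuffle applies to `load_f16_impl`: move the caller's length check into the function body, drop `unsafe` from the signature, and keep the `#[target_feature]` dispatch handled by the descriptor as before.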
```rust
// SAFETY: dest.len() >= 16 is checked by caller, and avx512f is available
unsafe {
    // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC = 0
    let bits = _mm512_cvtps_ph::<0>(v);
```
Let's please use ::<{_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC}>.
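The point of the suggestion is that a braced const expression of named flags is valid as a const-generic argument, so the magic `0` (whose accompanying comment is easy to get out of sync with the actual flag values) can be replaced by names. A self-contained illustration of the syntax, with stand-in constants rather than the real `core::arch` ones:

```rust
// Illustrative stand-ins for the stdarch rounding-control flags;
// the real constants live in core::arch::x86_64.
const FROUND_TO_NEAREST_INT: i32 = 0x00;
const FROUND_NO_EXC: i32 = 0x08;

/// Stand-in for an intrinsic taking its rounding mode as a const generic.
fn rounding_mode<const MODE: i32>() -> i32 {
    MODE
}

fn demo() -> i32 {
    // Braced const expressions are allowed in const-generic position,
    // so the call site reads as intent rather than a magic literal.
    rounding_mode::<{ FROUND_TO_NEAREST_INT | FROUND_NO_EXC }>()
}
```

Note that the named form also makes it visible that the combined value is not necessarily `0`, which the literal-plus-comment style in the diff obscures.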
```rust
use super::{F32SimdVec, I32SimdVec, SimdDescriptor, SimdMask};

/// Convert f16 bits (as u16) to f32.
```
There's already https://github.com/libjxl/jxl-rs/blob/main/jxl/src/util/float16.rs that has conversion code.
I think we should use that type and code (perhaps by moving the code to the jxl_simd crate), instead of using u16.
SIMD fast paths for the int_to_float function, which converts custom bit-depth floats stored as i32 back to f32:

- 32-bit float: straightforward bitcast via SIMD.
- 16-bit float (f16): SIMD handles normal values, zeros, and inf/nan. Subnormals fall back to scalar since they need a variable-iteration normalization loop.
Waiting for perf CI to see the impact.