Skip to content

Conversation

Shnatsel
Copy link
Member

@Shnatsel Shnatsel commented Jun 3, 2025

The code isn't vectorized on baseline x86_64 but at least per-block bounds checks in the inner loop are gone.

Assembly:
before: https://rust.godbolt.org/z/f5r1noP51
after: https://rust.godbolt.org/z/1fx9qebvh

If I tell the compiler to target Zen 3, it does vectorize the code, so perhaps this function would be a good target for multiversioning: https://rust.godbolt.org/z/jrnor5vqY
But that's out of scope of this PR.

@kornelski kornelski merged commit e9a3fb4 into image-rs:main Jun 3, 2025
7 checks passed
@kornelski
Copy link
Contributor

Couple more panics can be removed:

    let block_size = 4 << size_bits;
    if block_size == 0 || width == 0 || block_xsize == 0 {
        return;
    }let Some(row_tf_data) = transform_data.get(row_transform_data_start..) else { continue; };

This is probably negligible for code speed, but makes the compiled code nicer.

@Shnatsel
Copy link
Member Author

Shnatsel commented Jun 3, 2025

Another thing that removes a lot of code is turning this

#[inline]
pub(crate) fn subsample_size(size: u16, bits: u8) -> u16 {
    ((u32::from(size) + (1u32 << bits) - 1) >> bits)
        .try_into()
        .unwrap()
}

into this

#[inline]
pub(crate) fn subsample_size(size: u16, bits: u8) -> u16 {
    ((u32::from(size) + (1u32 << bits) - 1) >> bits) as u16
}

which makes the godbolt assembly way shorter.

But I don't think it matters for performance and it's nice to have a sanity check so I left it as is.

@Shnatsel Shnatsel deleted the wip-color-transform-opt branch June 3, 2025 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants