cross3, dot3, scale, bias benchmark (AOS) - scalar always faster than zmath on M1 #5

dmurph · 2024-07-14T19:46:23Z

I'm consistently seeing the scalar version run faster than zmath on an M1 Mac when building with `-Doptimize=ReleaseFast`.

Example: cross3, dot3, scale, bias benchmark (AOS) - scalar version: 0.9780s, zmath version: 1.0045s

I noticed that calls to the `swizzle` function actually generate extra CPU instructions. See the `dot4Old` function in this godbolt and try switching between the commented-out line and the one next to it.

Changing `cross3` to use `@shuffle` directly seems to help the benchmark:

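```zig
pub inline fn cross3(v0: Vec, v1: Vec) Vec {
    // Rotate lanes: xmm0 = (y0, z0, x0, w0), xmm1 = (z1, x1, y1, w1)
    var xmm0 = @shuffle(f32, v0, undefined, [4]i32{ 1, 2, 0, 3 });
    var xmm1 = @shuffle(f32, v1, undefined, [4]i32{ 2, 0, 1, 3 });
    var result = xmm0 * xmm1;
    // Rotate again: xmm0 = (z0, x0, y0, w0), xmm1 = (y1, z1, x1, w1)
    xmm0 = @shuffle(f32, xmm0, undefined, [4]i32{ 1, 2, 0, 3 });
    xmm1 = @shuffle(f32, xmm1, undefined, [4]i32{ 2, 0, 1, 3 });
    result = result - xmm0 * xmm1;
    // Clear the w lane so the result is a pure 3D vector.
    return andInt(result, f32x4_mask3);
}
```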
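To sanity-check a shuffle-based `cross3`, a small standalone sketch can compare it against a plain scalar cross product. It assumes a Zig 0.11+ toolchain (single-argument `@bitCast`). The names `maskXYZ`, `cross3Shuffle`, and `cross3Scalar` are hypothetical, and `maskXYZ` only imitates what zmath's `andInt(result, f32x4_mask3)` does by clearing the bits of the w lane. Run it with `zig test`.

```zig
const std = @import("std");

const Vec = @Vector(4, f32);

// Hypothetical stand-in for zmath's `andInt(v, f32x4_mask3)`:
// AND the bit pattern with all-ones in x, y, z and zero in w.
fn maskXYZ(v: Vec) Vec {
    const mask = @Vector(4, u32){ 0xffff_ffff, 0xffff_ffff, 0xffff_ffff, 0 };
    const bits: @Vector(4, u32) = @bitCast(v);
    return @bitCast(bits & mask);
}

// Shuffle-based cross product with the same lane rotations as above.
fn cross3Shuffle(v0: Vec, v1: Vec) Vec {
    var a = @shuffle(f32, v0, undefined, [4]i32{ 1, 2, 0, 3 }); // (y0, z0, x0, w0)
    var b = @shuffle(f32, v1, undefined, [4]i32{ 2, 0, 1, 3 }); // (z1, x1, y1, w1)
    var result = a * b;
    a = @shuffle(f32, a, undefined, [4]i32{ 1, 2, 0, 3 }); // (z0, x0, y0, w0)
    b = @shuffle(f32, b, undefined, [4]i32{ 2, 0, 1, 3 }); // (y1, z1, x1, w1)
    result = result - a * b;
    return maskXYZ(result);
}

// Scalar reference: the textbook cross product on plain arrays.
fn cross3Scalar(a: [3]f32, b: [3]f32) [3]f32 {
    return .{
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    };
}

test "shuffle cross3 matches scalar cross product" {
    // The w inputs (7, 9) are arbitrary; the w lane must come out as 0.
    const r = cross3Shuffle(Vec{ 1, 2, 3, 7 }, Vec{ 4, 5, 6, 9 });
    const s = cross3Scalar(.{ 1, 2, 3 }, .{ 4, 5, 6 });
    try std.testing.expectEqual(s[0], r[0]);
    try std.testing.expectEqual(s[1], r[1]);
    try std.testing.expectEqual(s[2], r[2]);
    try std.testing.expectEqual(@as(f32, 0), r[3]);
}
```

Both versions perform the same multiplies and subtractions per lane, so on small integer inputs like these the results should match exactly; only the data layout (one SIMD register versus three scalars) differs.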

I recommend replacing `swizzle` with `@shuffle` everywhere. Also, `dot2` looks odd; there are a lot of potential performance improvements in zmath.


dmurph mentioned this issue Jul 15, 2024

[zmath] Replace swizzles with shuffles & remove some unnecessary math complexity to increase perf. zig-gamedev/zig-gamedev#637 (Draft)

hazeycode transferred this issue from zig-gamedev/zig-gamedev Nov 5, 2024

hazeycode changed the title ~~zmath: cross3, dot3, scale, bias benchmark (AOS) - scalar always faster than zmath on M1~~ cross3, dot3, scale, bias benchmark (AOS) - scalar always faster than zmath on M1 Nov 5, 2024
