std.mem.swap is slow #25334

### Zig Version

0.15.1

### Steps to Reproduce and Observed Behavior

## Zig 

* https://godbolt.org/z/xjsWxbT4W

```zig
const std = @import("std");

const Huge = struct {
    data: [128]u128,
};

export fn f(x: *Huge, y: *Huge) void {
    std.mem.swap(Huge, x, y);
}
```

```asm
f:
        push    rbp
        mov     rbp, rsp
        push    r15
        push    r14
        push    rbx
        sub     rsp, 2056
        lea     r15, [rbp - 2080]
        mov     edx, 2048
        mov     r14, rdi
        mov     rbx, rsi
        mov     rsi, r14
        mov     rdi, r15
        call    memcpy@PLT
        mov     edx, 2048
        mov     rdi, r14
        mov     rsi, rbx
        call    memcpy@PLT
        mov     edx, 2048
        mov     rdi, rbx
        mov     rsi, r15
        call    memcpy@PLT
        add     rsp, 2056
        pop     rbx
        pop     r14
        pop     r15
        pop     rbp
        ret
```


## Rust

https://godbolt.org/z/a7dv8jx7a

```rust
struct Huge([u128; 128]);

pub extern fn f(x: &mut Huge, y: &mut Huge) {
    std::mem::swap(x, y);
}
```

```asm
example::f::h4bf74e253b89ab56:
        xor     eax, eax
.LBB0_1:
        movaps  xmm0, xmmword ptr [rdi + 8*rax]
        movaps  xmm1, xmmword ptr [rdi + 8*rax + 16]
        movaps  xmm2, xmmword ptr [rsi + 8*rax]
        movaps  xmm3, xmmword ptr [rsi + 8*rax + 16]
        movaps  xmmword ptr [rdi + 8*rax], xmm2
        movaps  xmmword ptr [rdi + 8*rax + 16], xmm3
        movaps  xmmword ptr [rsi + 8*rax], xmm0
        movaps  xmmword ptr [rsi + 8*rax + 16], xmm1
        movaps  xmm0, xmmword ptr [rdi + 8*rax + 32]
        movaps  xmm1, xmmword ptr [rdi + 8*rax + 48]
        movaps  xmm2, xmmword ptr [rsi + 8*rax + 32]
        movaps  xmm3, xmmword ptr [rsi + 8*rax + 48]
        movaps  xmmword ptr [rdi + 8*rax + 32], xmm2
        movaps  xmmword ptr [rdi + 8*rax + 48], xmm3
        movaps  xmmword ptr [rsi + 8*rax + 32], xmm0
        movaps  xmmword ptr [rsi + 8*rax + 48], xmm1
        add     rax, 8
        cmp     rax, 256
        jne     .LBB0_1
        ret
```

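Zig emits three `memcpy` calls through a 2048-byte stack temporary, while Rust swaps the data 16 bytes at a time through `xmm` registers in an unrolled loop, with no calls and no stack buffer. I _think_ the two implementations are not actually equivalent: they give different results if the two `Huge` structs overlap! Zig assumes the arguments may alias and generates suboptimal code as a result, but it should instead _assert_ that they do not alias (adding `noalias` annotations doesn't change the generated code).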
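To make the overlap claim concrete, here is a minimal model of the two strategies, written in Rust over a plain byte buffer. The function names are hypothetical. The "via temp" version uses memmove-like copies and the element-wise version works one byte at a time, so this only illustrates the idea and is not a byte-exact reproduction of either compiler's output.

```rust
/// Models the three-memcpy strategy: copy x to a full-size temporary,
/// copy y over x, then copy the temporary over y.
pub fn swap_via_temp(buf: &mut [u8], a: usize, b: usize, n: usize) {
    let tmp = buf[a..a + n].to_vec(); // tmp = *x (O(n) extra memory)
    buf.copy_within(b..b + n, a); // *x = *y
    buf[b..b + n].copy_from_slice(&tmp); // *y = tmp
}

/// Models the in-register loop: exchange the two regions piece by piece
/// (one byte per step here, where the real loop moves 16-byte vectors).
pub fn swap_elementwise(buf: &mut [u8], a: usize, b: usize, n: usize) {
    for i in 0..n {
        buf.swap(a + i, b + i);
    }
}
```

With two 4-byte regions at offsets 0 and 2 of `[0, 1, 2, 3, 4, 5]` (so they overlap in bytes 2..4), `swap_via_temp` leaves `[2, 3, 0, 1, 2, 3]` while `swap_elementwise` leaves `[2, 3, 4, 5, 0, 1]`. For disjoint regions both produce the same result.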


### Expected Behavior

Zig should use O(1) extra memory to swap two structs of size N, with a no-alias safety check in `std.mem.swap`.
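A rough sketch of the requested behavior, again in Rust so it can sit next to the comparison above. The function `swap_bytes_chunked` and the 64-byte `CHUNK` size are assumptions made for illustration, not anything taken from Zig's or Rust's standard library. It checks that the two ranges are disjoint, then swaps through a fixed-size buffer, so the extra memory does not grow with N.

```rust
use std::ptr;

/// Hypothetical helper: swap `n` bytes at `x` and `y` using a fixed-size
/// temporary, after asserting that the two ranges do not overlap.
///
/// # Safety
/// Both pointers must be valid for reads and writes of `n` bytes.
pub unsafe fn swap_bytes_chunked(x: *mut u8, y: *mut u8, n: usize) {
    // Illustrative chunk size; a real implementation would tune this.
    const CHUNK: usize = 64;

    // No-alias safety check: [x, x+n) and [y, y+n) must be disjoint.
    let (xa, ya) = (x as usize, y as usize);
    assert!(xa + n <= ya || ya + n <= xa, "swap arguments overlap");

    let mut tmp = [0u8; CHUNK]; // O(1) extra memory, independent of n
    let mut off = 0;
    while off < n {
        let len = CHUNK.min(n - off); // last chunk may be shorter
        unsafe {
            ptr::copy_nonoverlapping(x.add(off), tmp.as_mut_ptr(), len);
            ptr::copy_nonoverlapping(y.add(off), x.add(off), len);
            ptr::copy_nonoverlapping(tmp.as_ptr(), y.add(off), len);
        }
        off += len;
    }
}
```

Because the ranges are known to be disjoint after the assertion, each chunk copy can be a non-overlapping copy, which is what lets an optimizer keep the data in registers instead of calling `memcpy` on a full-size temporary.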
