GdsFileCopier wait_io alignment fix corrupting weights for gpt-oss-120b on DGX Spark #38

When using fastsafetensors as part of vLLM on a DGX Spark (Blackwell GB10, SM121 CUDA device), GdsFileCopier is used by default. The `noalign` parameter passed into `wait_io` is hardcoded to `False`, which triggers the alignment-fixing code in GdsFileCopier. That code appears to alter the loaded weights, which corrupts model responses from gpt-oss-120b and likely affects other models as well.

Turning on debug logging, I see messages like this while the model is loading:

```
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=0, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=1073741824, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=2147483648, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=3221225472, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=4294967296, tmp=0xe1e940000000
```
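To make the pattern in these messages easier to see, here is a small sketch that parses the `wait_io` debug lines and summarizes them. The helper names (`parse_wait_io`, `count_steps`) are hypothetical and not part of fastsafetensors or vLLM; the regular expression only assumes the message format shown in the log above.

```python
import re

# Matches the "wait_io: fix misalignment" debug lines shown above.
LOG_RE = re.compile(
    r"wait_io: fix misalignment, src=(0x[0-9a-fA-F]+), "
    r"misaligned_bytes=(\d+), count=(\d+), tmp=(0x[0-9a-fA-F]+)"
)


def parse_wait_io(lines):
    """Hypothetical helper: extract the numeric fields from each matching line."""
    records = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            records.append(
                {
                    "src": int(m.group(1), 16),
                    "misaligned_bytes": int(m.group(2)),
                    "count": int(m.group(3)),
                    "tmp": int(m.group(4), 16),
                }
            )
    return records


def count_steps(records):
    """Hypothetical helper: difference in `count` between consecutive messages."""
    counts = [r["count"] for r in records]
    return [b - a for a, b in zip(counts, counts[1:])]


# The five lines from the log above, without the process prefix.
sample = [
    "wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, "
    f"count={c}, tmp=0xe1e940000000"
    for c in (0, 1073741824, 2147483648, 3221225472, 4294967296)
]

records = parse_wait_io(sample)
steps = count_steps(records)

GIB = 2**30
print("count steps in GiB:", [s // GIB for s in steps])
print("src - tmp in GiB:", (records[0]["src"] - records[0]["tmp"]) // GIB)
print("misaligned_bytes values:", sorted({r["misaligned_bytes"] for r in records}))
```

In this log, `count` advances by exactly 1 GiB per message, `src` and `tmp` are 1 GiB apart, and `misaligned_bytes` is 8 throughout. The sketch only summarizes those numbers; it does not establish which part of the alignment fix is at fault.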

I'm able to restore expected model responses by hardcoding `noalign=True` in a local build of fastsafetensors or by hardcoding `nogds=True` in the vLLM integration of fastsafetensors.
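One way to narrow down which tensors are affected would be to compare per-tensor checksums between a known-good load (for example with `nogds=True`) and the default GDS path. The following is a minimal sketch under stated assumptions: `tensor_digests` and `changed_tensors` are hypothetical helpers, the inputs are plain `name -> bytes` mappings (how the raw bytes are obtained from each loader is left out), and the toy data is made up purely to show the comparison.

```python
import hashlib


def tensor_digests(tensors):
    """Hypothetical helper: map each tensor name to the SHA-256 of its raw bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in tensors.items()}


def changed_tensors(reference, candidate):
    """Hypothetical helper: names whose bytes differ from, or are missing in, candidate."""
    ref = tensor_digests(reference)
    cand = tensor_digests(candidate)
    return sorted(name for name, digest in ref.items() if cand.get(name) != digest)


# Toy data only: stands in for weights loaded via two different code paths.
good_load = {
    "layer0.weight": bytes(range(32)),
    "layer1.weight": bytes(range(32, 64)),
}
suspect_load = {
    "layer0.weight": bytes(range(32)),
    "layer1.weight": bytes(32),  # differs from the reference on purpose
}

print(changed_tensors(good_load, suspect_load))
```

Hashing raw bytes avoids any floating-point tolerance questions: if the two load paths are meant to produce identical weights, every digest should match.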
