Closed
Description
When using fastsafetensors as part of vLLM on a DGX Spark (Blackwell GB10, SM121 CUDA device), GdsFileCopier is used by default. The noalign parameter passed into wait_io is hardcoded to False, which triggers the alignment-fixing code in GdsFileCopier. That code path appears to alter the loaded weights: model responses from gpt-oss-120b are corrupted, and other models are likely affected too.
Turning on debug logging, I see messages like this while the model is loading:
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=0, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=1073741824, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=2147483648, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=3221225472, tmp=0xe1e940000000
(EngineCore_DP0 pid=609395) wait_io: fix misalignment, src=0xe1e980000000, misaligned_bytes=8, count=4294967296, tmp=0xe1e940000000
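To make the pattern in these messages easier to see, here is a small sketch that parses them from a captured log. The regular expression is written against the message format shown above only. The helper name `parse_misalignment` is made up for illustration and is not part of fastsafetensors or vLLM.

```python
import re

# Matches the "fix misalignment" debug message exactly as it appears in the
# log excerpt above; other fastsafetensors messages are ignored.
LOG_RE = re.compile(
    r"wait_io: fix misalignment, src=(0x[0-9a-f]+), "
    r"misaligned_bytes=(\d+), count=(\d+), tmp=(0x[0-9a-f]+)"
)


def parse_misalignment(lines):
    """Return one dict per 'fix misalignment' line found in `lines`."""
    events = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            events.append({
                "src": int(m.group(1), 16),
                "misaligned_bytes": int(m.group(2)),
                "count": int(m.group(3)),
                "tmp": int(m.group(4), 16),
            })
    return events


# Two of the lines from the excerpt above, used as sample input.
sample = [
    "(EngineCore_DP0 pid=609395) wait_io: fix misalignment, "
    "src=0xe1e980000000, misaligned_bytes=8, count=0, tmp=0xe1e940000000",
    "(EngineCore_DP0 pid=609395) wait_io: fix misalignment, "
    "src=0xe1e980000000, misaligned_bytes=8, count=1073741824, tmp=0xe1e940000000",
]

events = parse_misalignment(sample)
for e in events:
    # Report count in GiB and the distance between the src and tmp buffers.
    print(
        f"count={e['count'] / 2**30:.0f} GiB, "
        f"misaligned_bytes={e['misaligned_bytes']}, "
        f"src-tmp={(e['src'] - e['tmp']) / 2**30:.0f} GiB"
    )
```

Run over the full excerpt, this shows count advancing in 1 GiB steps with a constant 8-byte misalignment, while src and tmp stay fixed 1 GiB apart.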
I'm able to restore expected model responses by hardcoding noalign=True in a local build of fastsafetensors or by hardcoding nogds=True in the vLLM integration of fastsafetensors.
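One way to narrow down which tensors are affected would be to load the same checkpoint twice, once through the default GDS path and once with a workaround applied, and compare per-tensor checksums. The sketch below only covers the comparison step and assumes the tensors have already been dumped to raw bytes keyed by name. How to obtain those bytes is left out, and the function names and the tensor name in the demo are hypothetical.

```python
import hashlib


def tensor_digests(tensors):
    """Map each tensor name to the SHA-256 hex digest of its raw bytes."""
    return {name: hashlib.sha256(raw).hexdigest() for name, raw in tensors.items()}


def changed_tensors(reference, candidate):
    """Return sorted names whose bytes differ from, or are missing in, `candidate`."""
    ref = tensor_digests(reference)
    cand = tensor_digests(candidate)
    return sorted(name for name in ref if cand.get(name) != ref[name])


# Toy data standing in for two loads of the same checkpoint. The second load
# simulates an 8-byte shift in one tensor; this is an illustration only, not a
# claim about what the alignment fix actually does.
payload = bytes(range(64))
reference = {"layer.0.weight": payload, "layer.0.bias": b"\x01\x02\x03\x04"}
candidate = {"layer.0.weight": payload[8:] + payload[:8], "layer.0.bias": b"\x01\x02\x03\x04"}

print(changed_tensors(reference, candidate))
```

An empty result would mean both load paths produce byte-identical tensors; any names listed point at where to look first.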