
Conversation

Mirza-Samad-Ahmed-Baig

Summary:
This change fixes a potential precision loss in _initialize_affine_weight_cpu by initializing master_weight directly with params_dtype instead of creating it in torch.float and then casting.

Problem:
Previously, master_weight was created as a torch.float32 tensor and then cast to params_dtype.
If params_dtype is torch.float16 or torch.bfloat16, the cast rounds the freshly initialized float32 values down to the lower-precision format.
While each individual rounding error is small, in large-scale training such errors can accumulate and potentially affect convergence.
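
For illustration, the existing pattern looks roughly like the following. This is a minimal standalone sketch: the sizes, dtype, and init function are placeholders, not the actual Megatron-LM arguments.

```python
import torch

# Placeholder values standing in for the layer's real configuration.
output_size, input_size = 1024, 4096
params_dtype = torch.bfloat16
init_method = torch.nn.init.xavier_uniform_

# Old pattern: allocate and initialize the master weight in float32,
# then downcast to the configured params_dtype afterwards.
master_weight = torch.empty(output_size, input_size, dtype=torch.float, requires_grad=False)
init_method(master_weight)
master_weight = master_weight.to(dtype=params_dtype)  # values are rounded to bf16 here
```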

Solution:
Initialize master_weight directly with params_dtype.
This avoids the extra cast and applies the intended precision from the outset.
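
Under the same placeholder setup as above, the changed initialization would look like this sketch:

```python
import torch

output_size, input_size = 1024, 4096         # placeholders for the real layer shape
params_dtype = torch.bfloat16                # placeholder for the configured dtype
init_method = torch.nn.init.xavier_uniform_  # placeholder init function

# New pattern: allocate directly in params_dtype so the initializer writes
# values at the target precision and no separate cast is needed.
master_weight = torch.empty(output_size, input_size, dtype=params_dtype, requires_grad=False)
init_method(master_weight)
```
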
Benefits:
Improved numerical precision: the weights carry the intended precision from initialization onward, which matters for mixed-precision training.

Code clarity: The initialization now reflects the intended data type explicitly.

Impact:
Improves numerical stability in Megatron-LM without altering other behaviors.
No API changes or backward compatibility issues.

@sbhavani added the bug label on Sep 9, 2025