Commit fe52b2a
Bias running average computation in float (#738)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Computing the bias running average in bf16 produces incorrect estimates. Impact on accuracy for the Qwen2.5-7B model:

| Running average dtype | NVFP4_AFFINE_KV |
| --- | --- |
| BF16 | 59.11% |
| Float | 71.81% |

## Usage

Use examples/lm_eval/mmlu.py with a batch size of 1.

Note: the issue is masked at larger batch sizes.

## Testing

- Ran the MMLU benchmark with mmlu.py and nv-eval.
- Also plotted the bf16 and float running averages for different layers; one example, for layer 0 of Qwen2.5-7B:

<img width="2100" height="600" alt="bf16 vs float running average, layer 0 of Qwen2.5-7B" src="https://github.com/user-attachments/assets/715059c5-34a4-495e-b6f1-0b57cf0c08af" />

Note: for the larger values, bf16 shows a smaller value than float.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: NA
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: NA

Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
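To make the failure mode described in the overview concrete, here is a minimal sketch (not part of the commit; the magnitudes and sample count are invented) comparing the running-average recurrence in bf16 and float32. Because the intermediate product `avg * cnt` grows with the sample count, bf16's roughly 8 bits of mantissa eventually round away each newly added sample:

```python
import torch

# Illustrative only: the recurrence avg <- (avg * cnt + x) / (cnt + 1)
# degrades in bf16 because avg * cnt grows with cnt and bf16 rounding
# then swallows the newly added sample. Values here are made up.
torch.manual_seed(0)
samples = 300.0 + torch.randn(512)  # large-magnitude "bias" samples

avg_bf16 = samples[0].to(torch.bfloat16)
avg_fp32 = samples[0].float()
for cnt in range(1, len(samples)):
    x = samples[cnt]
    # bf16 update: every intermediate value is rounded to bf16 precision
    avg_bf16 = (avg_bf16 * cnt + x.to(torch.bfloat16)) / (cnt + 1)
    # float32 update (what the fix does): accumulate in full precision
    avg_fp32 = (avg_fp32 * cnt + x) / (cnt + 1)

print(f"bf16 running mean: {avg_bf16.item():.4f}")
print(f"fp32 running mean: {avg_fp32.item():.4f}")
print(f"true mean:         {samples.mean().item():.4f}")
```

With values around 300 and a few hundred samples, the bf16 estimate drifts visibly from the true mean, while the float32 estimate tracks it closely.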
1 parent 4eb1835 commit fe52b2a

File tree

1 file changed: +7 -1 lines changed

  • modelopt/torch/quantization/calib

modelopt/torch/quantization/calib/bias.py

Lines changed: 7 additions & 1 deletion
```diff
@@ -134,7 +134,13 @@ def collect(self, x: torch.Tensor):
             if self._calib_bias is None:
                 self._calib_bias = bias_
             else:
-                self._calib_bias = (self._calib_bias * self._cnt + bias_) / (self._cnt + 1)
+                dtype = bias_.dtype
+                # Convert bias to float for numerical stability
+                self._calib_bias = (self._calib_bias.float() * self._cnt + bias_.float()) / (
+                    self._cnt + 1
+                )
+                self._calib_bias = self._calib_bias.to(dtype)
+
             self._cnt += 1
         elif self._method == "max_min":
             max_, min_ = compute_maxmin(x, self._axis)
```
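The essence of the fix, as a standalone sketch (hypothetical helper name, mirroring the diff above): the recurrence accumulates in float32, and only the stored statistic is cast back to the original dtype, so each step rounds a value near the mean once instead of rounding the large intermediate product `avg * cnt`.

```python
import torch

def running_average_step(avg: torch.Tensor, new: torch.Tensor, cnt: int) -> torch.Tensor:
    """One step of the patched recurrence: accumulate in float32,
    then cast back to the incoming (possibly bf16) dtype."""
    dtype = new.dtype
    out = (avg.float() * cnt + new.float()) / (cnt + 1)
    return out.to(dtype)

# Usage: the stored average stays bf16 between steps, but the arithmetic
# that combines it with the count happens in float32.
avg = torch.tensor(300.0, dtype=torch.bfloat16)
avg = running_average_step(avg, torch.tensor(304.0, dtype=torch.bfloat16), cnt=1)
print(avg)  # tensor(302., dtype=torch.bfloat16)
```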
