Commit fe52b2a
Bias running average computation in float (#738)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** Computing the bias running average in bf16 produces incorrect estimates. This change computes the running average in float instead.
Impact on accuracy for the Qwen2.5-7B model:

| Running average dtype | NVFP4_AFFINE_KV |
| -- | -- |
| BF16 | 59.11% |
| Float | 71.81% |
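
The sketch below illustrates why a low-precision accumulator can underestimate a running average. It is not the code from this PR: the update rule (accumulate a sum, then divide) is an assumption, and `to_bf16` and `running_mean` are hypothetical helpers that emulate bfloat16 rounding with the standard library rather than using a real bf16 tensor.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Emulation for illustration only: goes through float32 first and
    does not handle NaN or infinity.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Keep the upper 16 bits of the float32 pattern, rounding ties to even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def running_mean(values, low_precision: bool) -> float:
    """Accumulate a sum one value at a time, then divide by the count.

    Assumed update rule; when low_precision is True the accumulator is
    rounded to bfloat16 after every addition.
    """
    total = 0.0
    for v in values:
        total = total + v
        if low_precision:
            total = to_bf16(total)
    return total / len(values)


values = [1.0] * 1000
mean_bf16 = running_mean(values, low_precision=True)
mean_float = running_mean(values, low_precision=False)
print(f"bf16 accumulator:  {mean_bf16}")
print(f"float accumulator: {mean_float}")
```

With an 8-bit significand, the emulated bf16 accumulator stops growing once it reaches 256, because 256 + 1 rounds back to 256. The low-precision mean therefore ends up well below the float mean, even though every input is exactly 1.0.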
## Usage
Run `examples/lm_eval/mmlu.py` with a batch size of 1.
Note: larger batch sizes mask the issue.
## Testing
- Ran the MMLU benchmark with mmlu.py and nv-eval
- Also plotted the bf16 and float running averages for different layers.
Example for layer 0 in Qwen2.5-7B:
<img width="2100" height="600" alt="image"
src="https://github.com/user-attachments/assets/715059c5-34a4-495e-b6f1-0b57cf0c08af"
/>
Note: for larger values, the bf16 running average is smaller than the float running average.
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: NA
- **Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**:
NA
## Additional Information
<!-- E.g. related issue. -->
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>

1 parent 4eb1835 commit fe52b2a
1 file changed, with 7 additions and 1 deletion: original line 137 is replaced by new lines 137-143.