The current implementation of SiLUT stores an extra tensor holding the
first-order gradient. This PR removes that buffer by recomputing the
first-order gradient on the fly in `silut_double_backward`, which
reduces the memory footprint at the cost of roughly 0.5% extra
computation time.
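For reference, the sketch below illustrates the recompute-instead-of-store idea for a plain SiLU double backward; it is not the actual `silut_double_backward` kernel, and the tensor name `x` and the function signature are illustrative assumptions.

```python
import torch


def silu_double_backward_sketch(
    grad_grad: torch.Tensor,  # incoming grad w.r.t. the first-order gradient
    grad_out: torch.Tensor,   # grad_output from the first backward pass
    x: torch.Tensor,          # input saved from the forward pass
):
    """Sketch of a SiLU double backward that recomputes the first-order
    gradient from the saved input instead of reading a cached tensor,
    trading a small amount of compute for memory."""
    sig = torch.sigmoid(x)
    # First-order gradient, recomputed on the fly (previously stored):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    grad1 = sig * (1.0 + x * (1.0 - sig))
    # Second-order gradient of SiLU:
    # sigmoid(x) * (1 - sigmoid(x)) * (2 + x * (1 - 2 * sigmoid(x)))
    grad2 = sig * (1.0 - sig) * (2.0 + x * (1.0 - 2.0 * sig))
    # Gradients propagated back to grad_output and to x.
    d_grad_out = grad_grad * grad1
    d_x = grad_grad * grad_out * grad2
    return d_x, d_grad_out
```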
I benchmarked this PR on OMat with 9 DPA-3 layers and `batch_size=auto:512`:
| Metric                  | Before  | After   | Change  |
|-------------------------|---------|---------|---------|
| Peak memory             | 25.0 GB | 21.7 GB | -13%    |
| Time per 100 steps      | 30.29 s | 30.46 s | +0.56%  |
The correctness of this modification is covered by
`source/tests/pt/test_custom_activation.py`.
## Summary by CodeRabbit
- **Refactor**
- Streamlined internal computation logic by refining variable naming for
clarity.
- Updated public method signatures to return outputs as a structured
tuple for more consistent integration.
---------
Signed-off-by: Chun Cai <[email protected]>
Co-authored-by: Duo <[email protected]>