
Conversation

@GeYuhong commented Sep 2, 2025

Description

This PR adapts Transformer Engine to activation offloading, a new feature in Megatron-LM (NVIDIA/Megatron-LM#1752).

Activation offloading selects the inputs of specific modules (such as core_attn, qkv_linear, router_fc1), offloads them to CPU during the forward pass, and reloads them to GPU during the backward pass.
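For readers unfamiliar with the mechanism, here is a minimal sketch of the general offload/reload idea using PyTorch's public saved-tensors hooks. This is not the Megatron-LM or TE implementation referenced above; the hook functions and their names are illustrative assumptions.

```python
# Minimal sketch (not the actual Megatron-LM/TE code): offload tensors saved
# for backward to CPU in the forward pass and reload them to the original
# device when the backward pass needs them.
import torch

def offload_to_cpu(tensor):
    # Pack hook: called in the forward pass when autograd saves a tensor.
    # Remember the original device so the tensor can be restored there.
    return (tensor.device, tensor.to("cpu", non_blocking=True))

def reload_to_gpu(packed):
    # Unpack hook: called in the backward pass when the saved tensor is used.
    device, cpu_tensor = packed
    return cpu_tensor.to(device, non_blocking=True)

def run_with_activation_offload(module, inp):
    # All tensors saved for backward inside this context go through the hooks.
    with torch.autograd.graph.saved_tensors_hooks(offload_to_cpu, reload_to_gpu):
        return module(inp)
```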

When offloading modules that include weights (nn.Parameter), the attributes attached to those weights (such as main_grad and grad_added_to_main_grad) are stripped by torch. Therefore, this feature requires modifying the basic modules in TE (such as grouped_linear.py and layernorm_linear.py) to preserve these necessary attributes.
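A hedged illustration of the problem and one way to work around it: moving a tensor through CPU produces a new tensor object, so Python-level attributes on the original are lost unless they are copied over explicitly. The attribute list and helper names below are assumptions for the example, not the TE code in this PR.

```python
# Illustrative sketch only: preserve Python-level attributes on a weight
# tensor across a CPU round trip by saving and re-attaching them.
import torch

_PRESERVED_ATTRS = ("main_grad", "grad_added_to_main_grad")

def offload_preserving_attrs(weight: torch.Tensor):
    # Collect the attributes that a plain .to("cpu") / .to(device) would drop.
    saved = {name: getattr(weight, name)
             for name in _PRESERVED_ATTRS if hasattr(weight, name)}
    return weight.detach().to("cpu"), weight.device, saved

def reload_preserving_attrs(cpu_copy, device, saved):
    restored = cpu_copy.to(device)
    for name, value in saved.items():
        setattr(restored, name, value)  # re-attach the stripped attributes
    return restored
```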

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add support in grouped_linear.py, linear.py, and layernorm_linear.py for retrieving the offload_activation flag, based on whether the input tensor carries the offloading_activation attribute.
  • Save the grad_added_to_main_grad attribute in the forward pass and restore it in the backward pass in grouped_linear.py, linear.py, and layernorm_linear.py (see the sketch after this list).
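The sketch below shows the pattern the two items above describe, reduced to a toy autograd.Function: read the offload flag from the input tensor in forward, stash grad_added_to_main_grad on the context, and restore it in backward. It is not the actual TE diff; the class and the attribute-handling details are assumptions based on the description above.

```python
# Hedged sketch of the save/restore pattern, not the real TE modules.
import torch

class _LinearLike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, weight):
        # Flag detection: the input tensor may carry an offloading attribute.
        ctx.offload_activation = getattr(inp, "offload_activation", False)
        # Preserve a weight attribute that offloading would otherwise strip.
        ctx.grad_added_to_main_grad = getattr(weight, "grad_added_to_main_grad", False)
        ctx.save_for_backward(inp, weight)
        return inp @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        # Restore the preserved attribute before the backward weight update.
        weight.grad_added_to_main_grad = ctx.grad_added_to_main_grad
        return grad_out @ weight, grad_out.t() @ inp
```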

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

lhb8125 and others added 5 commits September 18, 2025 07:00
Signed-off-by: Hongbin Liu <[email protected]>

Hongbinl/adapt for offload activation
@nvMelissa added the megatron, community-contribution, and waiting-for-feedback labels on Oct 9, 2025
@nvMelissa (Collaborator) commented Oct 15, 2025

Hi @GeYuhong, thanks for your contribution!
Our repository requires all commits to be signed off to comply with the Developer Certificate of Origin (DCO).
One or more of your commits are missing the Signed-off-by line. Please correct this issue so we can proceed with checks. For more details, refer to our CONTRIBUTING file: https://github.com/NVIDIA/TransformerEngine/blob/main/CONTRIBUTING.rst
Thank you! cc: @timmoon10
