
Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Jan 8, 2026

What does this PR do?

Type of change: New Feature

Overview:

  • Added Context Parallel (CP) support by patching torch ring attention (a minimal sketch of the idea follows this list).
  • Requires the following library versions for stable CP:
    • torch 2.8.0
    • transformers 5.0.0
    • accelerate 1.12.0
  • Moved to FSDP2.
  • Removed unused arguments from the training script (--multi_gpu, fsdp_wrap_layer).
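
A minimal sketch of the underlying idea, assuming torch's experimental context_parallel API (function and mesh names are illustrative; the actual patch in this PR may differ):

# Illustrative sketch only: run SDPA under torch's experimental ring-attention
# context parallelism. Not the exact patch from this PR.
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.experimental import context_parallel

def cp_attention(q, k, v, cp_size: int):
    """q/k/v: full-sequence [B, H, S, D] tensors, identical on every CP rank."""
    cp_mesh = init_device_mesh("cuda", (cp_size,), mesh_dim_names=("cp",))

    # Shard the buffers along the sequence dim (dim=2); inside the context,
    # scaled_dot_product_attention dispatches to ring attention across ranks.
    with context_parallel(cp_mesh, buffers=(q, k, v), buffer_seq_dims=(2, 2, 2)):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out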

Usage

./launch_train.sh --model $MODEL \
    --output_dir $OUTPUT_DIR \
    --data $DATA \
    --num_epochs 0.1 \
    --train_bs 1 \
    --eagle_config eagle_config.json \
    --training_seq_len 1024 \
    --cp_size 2    # newly added
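
The relationship between --cp_size and the launcher topology is assumed to be the usual one (the exact checks live in the training scripts and may differ): cp_size should divide the number of GPUs, and the remaining factor becomes the data-parallel width. For example:

# Assumed relationship only; the actual validation lives in launch_train.sh /
# the training script and may differ.
import os

world_size = int(os.environ.get("WORLD_SIZE", "8"))  # e.g. 8 x H100
cp_size = 2                                          # value passed via --cp_size

assert world_size % cp_size == 0, "cp_size must divide the number of GPUs"
dp_size = world_size // cp_size                      # data-parallel (FSDP2) width
print(f"world={world_size}, cp={cp_size}, dp={dp_size}")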

Testing

  • SDPA-level correctness: tested TTT attention with and without CP; diffs < 1% (a sketch of the comparison helper appears after the tables below).
=== Compare context-parallel (CP) outputs and grads with non-CP ===
Forward output comparison (CP vs Non-CP):
  Absolute diff (adiff) cp_out vs out: 0.001953125
  Relative diff (rdiff) cp_out vs out: 0.00182342529296875
WQ (query proj) grad comparison (CP vs Non-CP):
  Absolute diff (adiff) cp_wq_grad vs wq_grad: 0.0078125
  Relative diff (rdiff) cp_wq_grad vs wq_grad: 0.00347900390625
WK (key proj) grad comparison (CP vs Non-CP):
  Absolute diff (adiff) cp_wk_grad vs wk_grad: 0.0078125
  Relative diff (rdiff) cp_wk_grad vs wk_grad: 0.002471923828125
WV (value proj) grad comparison (CP vs Non-CP):
  Absolute diff (adiff) cp_wv_grad vs wv_grad: 0.25
  Relative diff (rdiff) cp_wv_grad vs wv_grad: 0.0069580078125
==============================================================
  • E2E training accuracy
    (Llama3.1-8B, unsynthesized Magpie)
    [image: E2E training accuracy comparison, CP vs non-CP]
  • Peak memory
    (Llama3.1-8B, 8xH100, train_length=4k)

    cp_size   max_memory_allocated (MB)   max_memory_reserved (MB)
    1         65040.20                    79018.00
    2         50409.17                    73098.00
    4         45120.92                    72052.00
    8         38882.12                    66484.00
  • Max training length test
    (Llama3.1-8B, H100)

    cp_size   6k    12k   24k   48k
    1         OK    OOM   OOM   OOM
    2         OK    OK    OOM   OOM
    4         OK    OK    OK    OOM
    8         OK    OK    OK    OK
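
For reference, a minimal sketch of how adiff/rdiff comparisons like the ones above can be computed (helper and variable names are illustrative, not the exact test code from this PR):

# Illustrative comparison helper, assuming CP outputs have already been
# gathered along the sequence dim so both tensors have the same shape.
import torch

def report_diff(name: str, cp_tensor: torch.Tensor, ref_tensor: torch.Tensor) -> None:
    """Print max absolute diff and a scale-normalized relative diff."""
    diff = (cp_tensor - ref_tensor).abs()
    adiff = diff.max().item()
    rdiff = (diff.max() / ref_tensor.abs().max().clamp_min(1e-12)).item()
    print(f"  Absolute diff (adiff) {name}: {adiff}")
    print(f"  Relative diff (rdiff) {name}: {rdiff}")

# Example usage (hypothetical tensor names):
# report_diff("cp_out vs out", cp_out, out)
# report_diff("cp_wq_grad vs wq_grad", cp_wq_grad, wq_grad)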

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Signed-off-by: h-guo18 <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Jan 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@h-guo18 h-guo18 self-assigned this Jan 8, 2026
@copy-pr-bot

copy-pr-bot bot commented Jan 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@h-guo18 h-guo18 marked this pull request as ready for review January 9, 2026 23:42
@h-guo18 h-guo18 requested a review from a team as a code owner January 9, 2026 23:42