Describe the bug
When training a DeepSeek 16B model with GRPO + SGLang + Megatron in verl, I found that the time taken per training step steadily increases; specifically, the time spent in update_policy keeps growing. Is there a bug here?
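For reference, here is roughly how I log the per-step time. This is a minimal sketch around a generic callable; the actor/batch names in the usage comment are placeholders, not the actual verl API.

```python
import time
import torch

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds), syncing CUDA so the
    measurement includes all queued GPU work."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return result, time.perf_counter() - start

# Hypothetical usage inside the training loop; `actor` and `batch` stand in
# for the real verl objects.
# _, dt = timed(actor.update_policy, batch)
# print(f"step {step}: update_policy took {dt:.2f}s")
```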
Stack trace/logs
I found that the time spent in two megatron-core functions, backward_step and get_grad_norm, keeps increasing. The relevant call stacks are below.
stack trace 1:
forward_backward_pipelining_without_interleaving (core/pipeline_parallel/schedules.py:1959)
backward_step (core/pipeline_parallel/schedules.py:400)
backward (torch/autograd/__init__.py:347)
_engine_run_backward (torch/autograd/graph.py:823)
stack trace 2:
get_grad_norm (core/optimizer/optimizer.py:192)
get_grad_norm_fp32 (core/optimizer/clip_grads.py:137)
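I can also collect torch.profiler traces if that helps. Below is a sketch of what I would run; run_one_step is a placeholder for whatever executes a single training step (not a real verl or Megatron function), and the idea is to compare a trace from an early step against one from a later, slower step to see which ops are growing.

```python
import os
import torch
from torch.profiler import profile, ProfilerActivity

def profile_step(run_one_step, step: int, out_dir: str = "./traces") -> None:
    """Profile one training step and dump a Chrome trace so an early step
    can be compared against a later (slower) step."""
    os.makedirs(out_dir, exist_ok=True)
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
                 record_shapes=True) as prof:
        run_one_step()
    prof.export_chrome_trace(os.path.join(out_dir, f"step_{step}.json"))
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```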
Environment:
- Megatron-LM: megatron-core==0.12.0
- PyTorch: 2.6.0
- CUDA version: 12.4
- NCCL version: 2.21.5