
Logging behaviour when steps_per_generation > gradient_accumulation_steps in GRPOTrainer #4086

@vatsal-kr

Description

Hello. I am training a model with GRPO using steps_per_generation = 2 * gradient_accumulation_steps, and my logs look like the following:

  0%|          | 0/1562 [00:00<?, ?it/s]INFO 09-15 14:29:21 [executor_base.py:204] It took 0.292539 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 09-15 14:29:21 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:31:05 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:31:07 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 49.25 GiB memory is still in use.
INFO 09-15 14:31:07 [executor_base.py:188] It took 1.295267 seconds to fall asleep.
INFO 09-15 14:31:16 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:31:17 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 48.55 GiB memory is still in use.
INFO 09-15 14:31:17 [executor_base.py:188] It took 1.595915 seconds to fall asleep.
INFO 09-15 14:31:28 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:31:29 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 49.11 GiB memory is still in use.
INFO 09-15 14:31:29 [executor_base.py:188] It took 1.461282 seconds to fall asleep.
INFO 09-15 14:31:45 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:31:47 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 49.25 GiB memory is still in use.
INFO 09-15 14:31:47 [executor_base.py:188] It took 1.713201 seconds to fall asleep.
[2025-09-15 14:37:19,498] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | optimizer_allgather: 33.42 | optimizer_gradients: 14.04 | optimizer_step: 58.16
[2025-09-15 14:37:19,501] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd_microstep: 121398.40 | bwd_microstep: 122186.67 | bwd_inner_microstep: 121895.88 | bwd_allreduce_microstep: 286.20 | step_microstep: 135.19
[2025-09-15 14:37:19,503] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd: 121396.43 | bwd: 122186.46 | bwd_inner: 121896.55 | bwd_allreduce: 285.83 | step: 135.20

  0%|          | 1/1562 [08:01<208:41:00, 481.27s/it]
                                                     
{'loss': 0.0389, 'grad_norm': 0.06330768764019012, 'learning_rate': 0.0, 'num_tokens': 6265709.0, 'completions/mean_length': 3095.2470703125, 'completions/min_length': 378.0, 'completions/max_length': 8192.0, 'completions/clipped_ratio': 0.060546875, 'completions/mean_terminated_length': 2766.76611328125, 'completions/min_terminated_length': 378.0, 'completions/max_terminated_length': 8192.0, 'rewards/list_reward/mean': 0.0029296875, 'rewards/list_reward/std': 0.054073650389909744, 'rewards/format_reward/mean': -0.0400390625, 'rewards/format_reward/std': 0.1961466521024704, 'rewards/soft_overlong_punishment/mean': -0.1396808624267578, 'rewards/soft_overlong_punishment/std': 0.33162686228752136, 'reward': -0.1767902374267578, 'reward_std': 0.47051888704299927, 'frac_reward_zero_std': 0.0, 'sampling/sampling_logp_difference/mean': 0.04104730114340782, 'sampling/sampling_logp_difference/max': 3.437645196914673, 'sampling/importance_sampling_ratio/min': 0.032140281051397324, 'sampling/importance_sampling_ratio/mean': 0.9999222755432129, 'sampling/importance_sampling_ratio/max': 2.0, 'kl': 0.0, 'entropy': 0.8257587123662233, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}

  0%|          | 1/1562 [08:01<208:41:00, 481.27s/it][2025-09-15 14:37:19,728] [INFO] [profiler.py:82:start_profile] Flops profiler started
[2025-09-15 14:40:07,671] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | optimizer_allgather: 32.00 | optimizer_gradients: 14.01 | optimizer_step: 45.43
[2025-09-15 14:40:08,113] [INFO] [profiler.py:230:end_profile] Flops profiler finished
[2025-09-15 14:40:08,115] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd_microstep: 41583.50 | bwd_microstep: 123261.98 | bwd_inner_microstep: 122977.34 | bwd_allreduce_microstep: 279.95 | step_microstep: 104.55
[2025-09-15 14:40:08,117] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd: 0.00 | bwd: 0.00 | bwd_inner: 122977.77 | bwd_allreduce: 278.81 | step: 0.00

  0%|          | 2/1562 [10:49<128:51:13, 297.36s/it]
                                                     
{'loss': -0.0409, 'grad_norm': 0.05034307762980461, 'learning_rate': 1.2658227848101265e-08, 'kl': 0.0, 'entropy': 0.825460085645318, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}

  0%|          | 2/1562 [10:49<128:51:13, 297.36s/it]INFO 09-15 14:40:10 [executor_base.py:204] It took 0.988489 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 09-15 14:40:10 [executor_base.py:204] It took 1.057249 seconds to wake up tags {'kv_cache', 'weights'}.
INFO 09-15 14:40:10 [executor_base.py:204] It took 1.080800 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 09-15 14:40:10 [executor_base.py:204] It took 1.007129 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 09-15 14:40:10 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:40:10 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:40:10 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:40:10 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:42:08 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:42:09 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 67.18 GiB memory is still in use.
INFO 09-15 14:42:09 [executor_base.py:188] It took 1.463854 seconds to fall asleep.
INFO 09-15 14:42:20 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:42:21 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 67.74 GiB memory is still in use.
INFO 09-15 14:42:21 [executor_base.py:188] It took 1.460260 seconds to fall asleep.
INFO 09-15 14:42:35 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:42:36 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 67.88 GiB memory is still in use.
INFO 09-15 14:42:36 [executor_base.py:188] It took 1.503160 seconds to fall asleep.
INFO 09-15 14:42:47 [block_pool.py:280] Successfully reset prefix cache
INFO 09-15 14:42:48 [gpu_worker.py:115] Sleep mode freed 51.57 GiB memory, 77.17 GiB memory is still in use.
INFO 09-15 14:42:48 [executor_base.py:188] It took 1.600564 seconds to fall asleep.
[2025-09-15 14:48:06,446] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | optimizer_allgather: 32.02 | optimizer_gradients: 14.23 | optimizer_step: 45.45
[2025-09-15 14:48:06,449] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd_microstep: 114578.15 | bwd_microstep: 117430.47 | bwd_inner_microstep: 117137.45 | bwd_allreduce_microstep: 286.56 | step_microstep: 105.41
[2025-09-15 14:48:06,452] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd: 114577.10 | bwd: 117430.12 | bwd_inner: 117138.34 | bwd_allreduce: 286.16 | step: 105.42

  0%|          | 3/1562 [18:48<164:33:29, 379.99s/it]
                                                     
{'loss': -0.0684, 'grad_norm': 0.043870024383068085, 'learning_rate': 2.531645569620253e-08, 'num_tokens': 12020746.0, 'completions/mean_length': 3445.5126953125, 'completions/min_length': 324.0, 'completions/max_length': 8192.0, 'completions/clipped_ratio': 0.1181640625, 'completions/mean_terminated_length': 2809.493896484375, 'completions/min_terminated_length': 324.0, 'completions/max_terminated_length': 8192.0, 'rewards/list_reward/mean': 0.0078125, 'rewards/list_reward/std': 0.08808542788028717, 'rewards/format_reward/mean': -0.091796875, 'rewards/format_reward/std': 0.2888800799846649, 'rewards/soft_overlong_punishment/mean': -0.1961212158203125, 'rewards/soft_overlong_punishment/std': 0.3845544755458832, 'reward': -0.2801055908203125, 'reward_std': 0.6255616545677185, 'frac_reward_zero_std': 0.0, 'sampling/sampling_logp_difference/mean': 0.040210433304309845, 'sampling/sampling_logp_difference/max': 4.728130340576172, 'sampling/importance_sampling_ratio/min': 0.00884298887103796, 'sampling/importance_sampling_ratio/mean': 0.9999879598617554, 'sampling/importance_sampling_ratio/max': 2.0, 'kl': 0.000720990683475975, 'entropy': 0.8006362821906805, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}

  0%|          | 3/1562 [18:48<164:33:29, 379.99s/it][2025-09-15 14:50:48,366] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | optimizer_allgather: 32.14 | optimizer_gradients: 14.02 | optimizer_step: 45.35
[2025-09-15 14:50:48,368] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd_microstep: 37983.79 | bwd_microstep: 119344.26 | bwd_inner_microstep: 119051.60 | bwd_allreduce_microstep: 286.91 | step_microstep: 105.03
[2025-09-15 14:50:48,370] [INFO] [logging.py:107:log_dist] [Rank 0] time (ms) | fwd: 37983.53 | bwd: 119343.62 | bwd_inner: 119052.35 | bwd_allreduce: 286.32 | step: 105.34

  0%|          | 4/1562 [21:30<127:11:36, 293.90s/it]
                                                     
{'loss': -0.0506, 'grad_norm': 0.04704352095723152, 'learning_rate': 3.79746835443038e-08, 'kl': 0.0007260022939590272, 'entropy': 0.8083300068974495, 'clip_ratio/low_mean': 0.0, 'clip_ratio/low_min': 0.0, 'clip_ratio/high_mean': 0.0, 'clip_ratio/high_max': 0.0, 'clip_ratio/region_mean': 0.0, 'epoch': 0.0}

The completion and reward stats are printed every two optimizer steps (i.e., once per vLLM generation step) rather than at every optimizer step. Is this behaviour intended?
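For reference, here is a minimal sketch of a configuration that should reproduce the reported logging pattern. The model name, dataset, and reward function are placeholders, and the original run additionally used vLLM (hence the sleep/wake log lines above); only the steps_per_generation = 2 * gradient_accumulation_steps relationship is taken from this report.

```python
# Hedged repro sketch, not the exact script from this report.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def dummy_reward(completions, **kwargs):
    # Placeholder reward just to make the example self-contained.
    return [-len(c) / 8192 for c in completions]

train_dataset = Dataset.from_dict({"prompt": ["Say hello."] * 64})

args = GRPOConfig(
    output_dir="grpo-logging-repro",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # one optimizer step per 4 micro-batches
    steps_per_generation=8,         # = 2 * gradient_accumulation_steps, so each
                                    # generation round feeds 2 optimizer steps
    logging_steps=1,                # log on every optimizer step
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    reward_funcs=dummy_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```

With this configuration, completions are only sampled once per steps_per_generation micro-steps, so the per-generation metrics (completions/*, rewards/*, sampling/*) would only be recorded on the optimizer steps that include a generation round, which would be consistent with those keys appearing on every other log line above.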
