new grpo logic (#274)
qgallouedec authored Feb 11, 2025
1 parent 82b2a65 commit 52aa875
Showing 3 changed files with 8 additions and 8 deletions.
6 changes: 3 additions & 3 deletions recipes/DeepSeek-R1-Distill-Qwen-7B/grpo/config_demo.yaml
@@ -31,12 +31,12 @@ lr_scheduler_type: cosine
 max_prompt_length: 512
 max_completion_length: 1024
 max_steps: -1
-num_generations: 2
+num_generations: 7
 num_train_epochs: 1
 output_dir: data/DeepSeek-R1-Distill-Qwen-7B-GRPO
 overwrite_output_dir: true
-per_device_eval_batch_size: 4
-per_device_train_batch_size: 2
+per_device_eval_batch_size: 32
+per_device_train_batch_size: 16
 push_to_hub: true
 report_to:
 - wandb
6 changes: 3 additions & 3 deletions recipes/Qwen2.5-1.5B-Instruct/grpo/config_demo.yaml
@@ -33,12 +33,12 @@ lr_scheduler_type: cosine
 max_prompt_length: 512
 max_completion_length: 1024
 max_steps: -1
-num_generations: 2
+num_generations: 7
 num_train_epochs: 1
 output_dir: data/Qwen2.5-1.5B-Open-R1-GRPO
 overwrite_output_dir: true
-per_device_eval_batch_size: 4
-per_device_train_batch_size: 2
+per_device_eval_batch_size: 32
+per_device_train_batch_size: 16
 push_to_hub: true
 report_to:
 - wandb
4 changes: 2 additions & 2 deletions recipes/Qwen2.5-Math-7B/grpo/config_simple_rl.yaml
@@ -37,8 +37,8 @@ num_generations: 7
 num_train_epochs: 1
 output_dir: data/Qwen-2.5-7B-Simple-RL
 overwrite_output_dir: true
-per_device_eval_batch_size: 2
-per_device_train_batch_size: 2
+per_device_eval_batch_size: 16
+per_device_train_batch_size: 16
 push_to_hub: true
 report_to:
 - wandb
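
The commit does not say why `num_generations` and the per-device batch sizes change together. One plausible reading, stated here as an assumption rather than something the diff confirms, is that the new GRPO logic requires the global train batch size (number of training processes times `per_device_train_batch_size`) to be evenly divisible by `num_generations`. The sketch below checks that condition for a candidate config. The helper names and the process count of 7 are hypothetical and are not taken from the recipes.

```python
from typing import List


def global_train_batch_size(num_processes: int, per_device_train_batch_size: int) -> int:
    """Total number of completions processed per step across all training processes."""
    return num_processes * per_device_train_batch_size


def is_compatible(num_processes: int, per_device_train_batch_size: int, num_generations: int) -> bool:
    """Assumed rule (not stated in the commit): the global batch must split
    evenly into groups of `num_generations` completions per prompt."""
    total = global_train_batch_size(num_processes, per_device_train_batch_size)
    return total % num_generations == 0


def compatible_generation_counts(num_processes: int, per_device_train_batch_size: int) -> List[int]:
    """List every num_generations >= 2 that satisfies the assumed divisibility rule."""
    total = global_train_batch_size(num_processes, per_device_train_batch_size)
    return [g for g in range(2, total + 1) if total % g == 0]


if __name__ == "__main__":
    # Hypothetical setup: 7 training processes, values from the updated recipes.
    print(is_compatible(num_processes=7, per_device_train_batch_size=16, num_generations=7))
    print(compatible_generation_counts(num_processes=7, per_device_train_batch_size=16))
```

Under this assumed rule, a different process count would call for a different `num_generations` or batch size, so the values in the recipes should be read as tied to a particular hardware layout rather than as universal defaults.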