Add GRPOConfig for Arbor #8882
Can you update the tutorials as well and test that they still work? Those are in docs/docs/tutorials/rl_multihop and docs/docs/tutorials/rl_papillon.
@@ -0,0 +1,138 @@
from dataclasses import dataclass, field
Is it possible for us to make a teleprompt/grpo/ directory where we can put grpo.py and grpo_config.py? The GEPA optimizer does this, so I think there is already a precedent for it.
@zhassan223 Great work! Requested some small changes but wonderful progress!
…ted GRPOConfig for rl multihop notebook
" multitask=True,\n",
" num_dspy_examples_per_grpo_step=4,\n",
" num_samples_per_input=8,\n",
" num_rollouts_per_grpo_step=8,#changed from num_generations since that parameter doesn't exist anymore\n",
Good catch!
… backwards compatibility to only use GRPOConfig dataclass
Just a few more things!
@zhassan223 it looks like a few tests failed too, could you take a look?
Adds a new dataclass for the GRPO train_kwargs, replacing the plain dictionary. This is cleaner and adds argument checks inside the dataclass itself.
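For anyone unfamiliar with the pattern, here is a minimal sketch of what argument checks inside a dataclass can look like. The class name `ExampleGRPOConfig`, its fields, its defaults, and the `to_dict` helper are all hypothetical and chosen only for illustration; this is not the `GRPOConfig` added in this PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ExampleGRPOConfig:
    """Hypothetical stand-in for a train_kwargs dictionary (not the PR's class)."""

    # All field names and defaults below are assumptions for illustration.
    learning_rate: float = 1e-5
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    report_to: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so bad values fail early.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.per_device_train_batch_size < 1:
            raise ValueError("per_device_train_batch_size must be >= 1")
        if self.gradient_accumulation_steps < 1:
            raise ValueError("gradient_accumulation_steps must be >= 1")

    def to_dict(self) -> Dict[str, Any]:
        # Convert back to a plain dict for code that still expects one.
        return asdict(self)


config = ExampleGRPOConfig(learning_rate=2e-5)
train_kwargs = config.to_dict()
```

Compared with a dictionary, a misspelled keyword such as `ExampleGRPOConfig(learning_rte=2e-5)` raises a `TypeError` at construction time, and out-of-range values raise a `ValueError` in `__post_init__` instead of surfacing later during training.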