[DOCS] Lora without regret #4181
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
First pass, will go back after finishing the referenced blog
docs/source/lora_without_regret.md (outdated):

```
# TODO: local command
```

> To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.

Suggested change:

```diff
-To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
+To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
```
I don't think we need uv to run the script locally.
I was going to use a custom uv-based script. I'll use the standard TRL scripts instead.
Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: sergiopaniego <[email protected]>
awesome!!! just a few ideas and we're good to go :)
```bash
uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \
    --model_name_or_path Qwen/Qwen3-0.6B \
```
By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a `chat_template_kwargs` column that contains `{"enable_thinking": false}` if you want to optimise only the non-reasoning mode.

Alternatively, you could pick a model like Gemma3 which doesn't reason.
My mistake. The script and model choice don't align. In the SmolLM3 reasoning script I do:

```python
def make_conversation(example):
    prompt = [{"role": "user", "content": example["problem"]}]
    # Return both keys so `datasets.map` keeps the chat_template_kwargs column;
    # mutating `example` while returning only {"prompt": prompt} would drop it.
    return {"prompt": prompt, "chat_template_kwargs": {"enable_thinking": False}}
```

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py
btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
LGTM with a few minor suggestions.
Feel free to merge even if you don't apply all the suggestions; we can still refine later :)
Co-authored-by: Quentin Gallouédec <[email protected]>
@qgallouedec Thanks for the review. I've responded, but you'll need to merge.
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: sergiopaniego <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>
This is a draft PR for a docs page that implements the blog post 'LoRA Without Regret' in TRL.

@edbeeching is going to review and share a script.

@sergiopaniego

- example with SFT
- example with GRPO
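To make the scope concrete, here is a minimal sketch of what the SFT example could look like; the model, dataset, and LoRA hyperparameters are illustrative placeholders, not the final values from the docs page:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Illustrative dataset; the docs page may use a different one
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # placeholder model choice
    train_dataset=dataset,
    args=SFTConfig(output_dir="Qwen3-0.6B-SFT-LoRA"),
    # LoRA adapter on all linear layers; rank and alpha are placeholders
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```

The GRPO example would follow the same pattern with `GRPOTrainer` and a reward function in place of `SFTTrainer`.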