
Conversation


@burtenshaw burtenshaw commented Sep 30, 2025

This is a draft PR for a docs page implementing the blog post 'LoRA Without Regret' in TRL.

@edbeeching is going to review and share a script.
@sergiopaniego

Example with SFT:

```bash
hf jobs uv run \
    --flavor a100-large \
    --timeout 8h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/fce24305833f2ecacfe8da181901d345/raw/sft_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-3B-Instruct \
    --dataset_name open-thoughts/OpenThoughts-114k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing \
    --eval_strategy no \
    --use_peft \
    --lora_r 256 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --output_dir Qwen2.5-3B-OpenThoughts-LoRA \
    --report_to trackio \
    --push_to_hub
```
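For reference, the PEFT-related flags in this command map roughly onto the following TRL + PEFT setup. This is a sketch for orientation only, not the contents of the gist script:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

# Mirrors --lora_r / --lora_alpha / --lora_target_modules: LoRA on every linear layer.
peft_config = LoraConfig(r=256, lora_alpha=16, target_modules="all-linear")

training_args = SFTConfig(
    output_dir="Qwen2.5-3B-OpenThoughts-LoRA",
    learning_rate=2.0e-5,
    num_train_epochs=1,
    packing=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    push_to_hub=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```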

Example with GRPO:

```bash
hf jobs uv run \
    --flavor a100-large \
    --timeout 6h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/f3fd519cb7efd647254c60b6b904cbcb/raw/c688abe1a9487090bb931b51ecec12c6737cdc52/grpo_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --output_dir grpo-Qwen2.5-VL-3B-Instruct-LoRA \
    --learning_rate 1e-5 \
    --gradient_checkpointing \
    --torch_dtype bfloat16 \
    --max_prompt_length 2048 \
    --max_completion_length 1024 \
    --use_vllm \
    --vllm_mode colocate \
    --use_peft \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --log_completions \
    --report_to trackio \
    --push_to_hub
```

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@sergiopaniego sergiopaniego left a comment


First pass, will go back after finishing the referenced blog

```
# TODO: local command
```

To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
Member

Suggested change:
```diff
- To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
+ To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
```

Member

I don't think we need uv to run the script locally.

Collaborator Author

I was going to use a custom uv-based script. I'll use the standard TRL scripts instead.


@sergiopaniego sergiopaniego left a comment


Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.


@sergiopaniego sergiopaniego left a comment


awesome!!! just a few ideas and we're good to go :)

```bash
uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \
    --model_name_or_path Qwen/Qwen3-0.6B \
```
Member

By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a `chat_template_kwargs` column that has `{"enable_thinking": false}` if you want to only optimise the non-reasoning mode.

Alternatively you could pick a model like Gemma3 which doesn't reason.
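A minimal sketch of that dataset copy, assuming a `datasets`-compatible source; the dataset ids here are placeholders, not the ones used by the script:

```python
from datasets import load_dataset

# Placeholder dataset id; substitute the dataset the GRPO script trains on.
dataset = load_dataset("your-org/grpo-prompts", split="train")

# Add the column described above so the chat template disables "think" mode.
dataset = dataset.map(
    lambda example: {"chat_template_kwargs": {"enable_thinking": False}}
)

# Push the copy to the Hub for the training script to load.
dataset.push_to_hub("your-org/grpo-prompts-nothink")
```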


@burtenshaw burtenshaw Oct 2, 2025


My mistake. The script and model choice don't align. In the SmolLM3 reasoning script I do:

```python
def make_conversation(example):
    # Build the prompt and disable "think" mode via chat_template_kwargs.
    # Both keys are returned so datasets.map picks them up as columns.
    prompt = [{"role": "user", "content": example["problem"]}]
    return {
        "prompt": prompt,
        "chat_template_kwargs": {"enable_thinking": False},
    }
```
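For context, this is roughly how the mapping is applied, assuming a dataset with a `problem` column; the dataset id below is a placeholder, the real one is set inside `grpo.py` on the Hub:

```python
from datasets import load_dataset

# Placeholder dataset id for illustration only.
dataset = load_dataset("your-org/math-problems", split="train")
dataset = dataset.map(make_conversation)
```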

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py


@sergiopaniego sergiopaniego left a comment


btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.


@qgallouedec qgallouedec left a comment


LGTM with a few minor suggestions.

Feel free to merge even if you don't apply all the suggestions, we can still refine later :)

@burtenshaw
Collaborator Author

@qgallouedec Thanks for the review. I've responded but you'll need to merge.

@kashif kashif merged commit 1eff7da into huggingface:main Oct 3, 2025
1 check passed
qgallouedec added a commit that referenced this pull request Oct 6, 2025
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: sergiopaniego <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>