
Conversation


@burtenshaw burtenshaw commented Sep 30, 2025

This is a draft PR for a docs page implementing the blog post 'LoRA Without Regret' in TRL.

@edbeeching is going to review and share a script.
@sergiopaniego

Example with SFT:

```bash
hf jobs uv run \
    --flavor a100-large \
    --timeout 8h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/fce24305833f2ecacfe8da181901d345/raw/sft_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-3B-Instruct \
    --dataset_name open-thoughts/OpenThoughts-114k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing \
    --eval_strategy no \
    --use_peft \
    --lora_r 256 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --output_dir Qwen2.5-3B-OpenThoughts-LoRA \
    --report_to trackio \
    --push_to_hub
```
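For reference, the PEFT-related flags in this command map roughly onto the following TRL + PEFT setup. This is a sketch for orientation only, not the contents of the gist script:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

# Mirrors --lora_r / --lora_alpha / --lora_target_modules: LoRA on every linear layer.
peft_config = LoraConfig(r=256, lora_alpha=16, target_modules="all-linear")

training_args = SFTConfig(
    output_dir="Qwen2.5-3B-OpenThoughts-LoRA",
    learning_rate=2.0e-5,
    num_train_epochs=1,
    packing=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    push_to_hub=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```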

Example with GRPO:

```bash
hf jobs uv run \
    --flavor a100-large \
    --timeout 6h \
    --secrets HF_TOKEN \
    "https://gist.githubusercontent.com/burtenshaw/f3fd519cb7efd647254c60b6b904cbcb/raw/c688abe1a9487090bb931b51ecec12c6737cdc52/grpo_lora.py" \
    --model_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --output_dir grpo-Qwen2.5-VL-3B-Instruct-LoRA \
    --learning_rate 1e-5 \
    --gradient_checkpointing \
    --torch_dtype bfloat16 \
    --max_prompt_length 2048 \
    --max_completion_length 1024 \
    --use_vllm \
    --vllm_mode colocate \
    --use_peft \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_target_modules all-linear \
    --log_completions \
    --report_to trackio \
    --push_to_hub
```

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@sergiopaniego sergiopaniego left a comment


First pass, will go back after finishing the referenced blog

```
# TODO: local command
```

To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
Member

Suggested change:
```diff
- To run th script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
+ To run the script locally, you will need to have `uv` installed. Check out the [uv documentation](https://docs.astral.sh/uv/) for more details.
```

Member

I don't think we need uv to run the script locally.

Collaborator Author

I was going to use a custom uv-based script. I'll use the standard TRL scripts instead.


@sergiopaniego sergiopaniego left a comment


Thanks for the development @burtenshaw !! 🙌
Adding some more comments. Maybe we could add pointers to the blogs for each key finding.


@sergiopaniego sergiopaniego left a comment


awesome!!! just a few ideas and we're good to go :)

```bash
uv run "https://huggingface.co/datasets/burtenshaw/lora-without-regrets/resolve/main/grpo.py" \
    --model_name_or_path Qwen/Qwen3-0.6B \
```
Member

By default this model operates in "think" mode and thus produces many more tokens than the 4096 you've allocated. The best thing to do would be to copy the dataset (or make a subset) with a `chat_template_kwargs` column that has `{"enable_thinking": false}` if you want to only optimise the non-reasoning mode.

Alternatively you could pick a model like Gemma3 which doesn't reason.
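A minimal sketch of that dataset copy, assuming a `datasets`-compatible source; the dataset ids here are placeholders, not the ones used by the script:

```python
from datasets import load_dataset

# Placeholder dataset id; substitute the dataset the GRPO script trains on.
dataset = load_dataset("your-org/grpo-prompts", split="train")

# Add the column described above so the chat template disables "think" mode.
dataset = dataset.map(
    lambda example: {"chat_template_kwargs": {"enable_thinking": False}}
)

# Push the copy to the Hub for the training script to load.
dataset.push_to_hub("your-org/grpo-prompts-nothink")
```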


@burtenshaw burtenshaw Oct 2, 2025


My mistake. The script and model choice don't align. In the SmolLM3 reasoning script I do:

```python
def make_conversation(example):
    # Build the prompt and disable "think" mode via chat_template_kwargs.
    # Both keys are returned so datasets.map picks them up as columns.
    prompt = [{"role": "user", "content": example["problem"]}]
    return {
        "prompt": prompt,
        "chat_template_kwargs": {"enable_thinking": False},
    }
```
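For context, this is roughly how the mapping is applied, assuming a dataset with a `problem` column; the dataset id below is a placeholder, the real one is set inside `grpo.py` on the Hub:

```python
from datasets import load_dataset

# Placeholder dataset id for illustration only.
dataset = load_dataset("your-org/math-problems", split="train")
dataset = dataset.map(make_conversation)
```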

I'll update the script now on the hub: https://huggingface.co/datasets/burtenshaw/lora-without-regrets/blob/main/grpo.py


@sergiopaniego sergiopaniego left a comment


btw there is https://huggingface.co/datasets/trl-lib/documentation-images in case you want to use it.


@qgallouedec qgallouedec left a comment


LGTM with a few minor suggestions.

Feel free to merge even if you don't apply all the suggestions, we can still refine later :)

@burtenshaw
Collaborator Author

@qgallouedec Thanks for the review. I've responded but you'll need to merge.

@kashif kashif merged commit 1eff7da into huggingface:main Oct 3, 2025
1 check passed
qgallouedec added a commit that referenced this pull request Oct 6, 2025
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: sergiopaniego <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Kashif Rasul <[email protected]>