Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions docs/tutorials/nemo-rl-grpo/single-node-training.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ The Nemotron Nano 9B v2 model uses a custom chat template that must be modified
```bash
tokenizer_config_path=$(find $PWD/.cache/hub/models--nvidia--NVIDIA-Nemotron-Nano-9B-v2 -name tokenizer_config.json)
sed -i 's/enable_thinking=true/enable_thinking=false/g' $tokenizer_config_path
sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\''\].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path
sed -i 's/{%- if messages\[-1\]\['\''role'\'\'] == '\''assistant'\'' -%}{%- set ns.last_turn_assistant_content = messages\[-1\]\['\''content'\'\'].strip() -%}{%- set messages = messages\[:-1\] -%}{%- endif -%}//g' $tokenizer_config_path
```

**✅ Success Check**: The `sed` commands complete without errors.
Expand All @@ -75,6 +75,10 @@ sed -i 's/{%- if messages\[-1\]\['\''role'\''\] == '\''assistant'\'' -%}{%- set

**Estimated time**: ~15-30 minutes

:::{note}
The configuration file and training script referenced below are located in the NeMo RL repository that you cloned during {doc}`Setup <setup>`. Make sure you're in the `RL/` directory before running the training command.
:::

By default, this runs only 3 training steps (`grpo.max_num_steps=3`) as a small test run in preparation for multi-node training. If you are using a single node for the full training run, you can remove this value. The full training will take several hours.

```bash
Expand Down Expand Up @@ -116,4 +120,4 @@ The end of the command above does the following:
2. `&`: This final ampersand runs the job in the background, which frees up your terminal to do other things. You can view all the background jobs using the `jobs` command. If you need to quit the training run, you can use the `fg` command to bring the job from the background into the foreground and then Ctrl+C like normal.
:::

**✅ Success Check**: Training completes 3 steps on single node without any issues. Check the logs for errors and verify that training steps are progressing.
**✅ Success Check**: Training completes 3 steps on single node without any issues. Check the logs for errors and verify that training steps are progressing.