
Conversation

@lbliii
Contributor

@lbliii lbliii commented Dec 1, 2025

This PR creates a dedicated Get Started section that provides scoped articles for:

  • quickstart
  • installation
  • per-algorithm onboarding based off of example scripts
  • cluster setup

This is currently an exploration for what your Get Started could look like. I'm looking for feedback.

Summary by CodeRabbit

  • Documentation
    • Reorganized documentation structure with new "Getting Started" section.
    • Added comprehensive guides for installation, cluster setup, and training workflows (SFT, DPO, GRPO).
    • Redesigned main documentation index with improved navigation and layout.
    • Updated documentation links across guides to reflect new structure.
    • Consolidated and streamlined setup documentation for clarity.


@lbliii lbliii self-assigned this Dec 1, 2025
@lbliii lbliii changed the title from "Llane/get started section" to "docs: get started section" Dec 1, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 1, 2025
@lbliii lbliii force-pushed the llane/get-started-section branch from 3df12ee to 16da7b7 Compare December 1, 2025 20:41
@lbliii lbliii requested a review from jgerh December 2, 2025 18:11
Contributor

@jgerh jgerh left a comment


Completed the review of get-started articles. The articles are clear, accessible, and engaging. Consistent style makes them easy to navigate, the teaching tone feels welcoming, and putting the "Steps" section upfront gives a clear path. The “What’s happening” sections also add helpful context, the index is a strong entry point into RL, and the Sphinx cards and tabs keep the presentation compact and visually interesting.

@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough


This PR reorganizes the documentation, moving the setup, installation, and quickstart guides from docs/about/ and the root docs/ directory into a new docs/get-started/ structure. It also updates cross-references throughout the docs and restructures the main documentation index.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Removed Documentation: docs/about/clusters.md, docs/about/installation.md, docs/about/quick-start.md, docs/cluster.md, docs/local-workstation.md | Deleted cluster setup, installation prerequisites, quick-start guides, and local workstation documentation previously located in root and about directories. |
| New Get-Started Guides: docs/get-started/cluster.md, docs/get-started/installation.md, docs/get-started/index.md, docs/get-started/sft.md, docs/get-started/dpo.md, docs/get-started/grpo.md | Added comprehensive setup, installation, and quickstart documentation for multi-node Slurm clusters, bare-metal environments, and training workflows (SFT, DPO, GRPO). |
| Updated Cross-References: docs/debugging.md, docs/guides/dpo.md, docs/guides/grpo.md, docs/guides/rm.md, docs/guides/sft.md | Updated documentation links from ../cluster.md to ../get-started/cluster.md and adjusted specific anchor references to reflect new file locations. |
| Documentation Index Restructuring: docs/index.md | Added YAML front matter metadata, expanded introductory section, reorganized navigation from broad headings into feature-focused quickstart layout, updated toctree structures with new paths (get-started, guides, design-docs), and converted list items to grid-item-card blocks. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify all cross-reference updates are correctly applied across docs/guides/* and docs/debugging.md
  • Validate completeness of new documentation files, particularly docs/get-started/index.md and docs/get-started/installation.md for accuracy and migration of content from removed files
  • Ensure docs/index.md toctree restructuring with new paths and sections is semantically correct and improves documentation hierarchy
  • Confirm no essential content was lost during relocation from docs/about/ and root directories to docs/get-started/
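The first check in the list above (that relocated links still resolve) could be automated along these lines. This is a minimal sketch, not part of the PR: the function name `find_broken_links` is hypothetical, it only looks at inline `[text](target)` links, and it does not validate `#anchor` fragments or skip links that appear inside code fences.

```python
import re
from pathlib import Path

# Matches inline Markdown links such as [Cluster Setup](cluster.md#anchor).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_broken_links(docs_root: Path) -> list[tuple[str, str]]:
    """Return (source file, link target) pairs whose target file is missing."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        for target in LINK_RE.findall(md_file.read_text(encoding="utf-8")):
            # Skip external URLs, mail links, and same-page anchors.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any "#anchor" suffix and resolve relative to the source file.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).resolve().exists():
                broken.append((md_file.relative_to(docs_root).as_posix(), target))
    return broken
```

Calling `find_broken_links(Path("docs"))` from the repository root would list any guide that still points at a removed path such as `../cluster.md`.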

Suggested reviewers

  • aschilling-nv
  • snowmanwwg
  • terrykong

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs: get started section' is directly related to the main change: the PR creates a dedicated 'Get Started' section with multiple new documentation files and reorganizes existing documentation to support this structure. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains only documentation restructuring with no code changes, API modifications, or functional updates affecting numerics, convergence, or performance. |

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
docs/get-started/cluster.md (1)

70-71: Replace "re-queueing" with "resubmitting".

Line 71 uses "re-queueing", which is flagged by analysis tools. The standard spelling in this context is "requeuing" (without a hyphen); "resubmitting" is a clearer alternative.

-Interactive mode launches the cluster and gives you a shell on the **Head Node**. This is perfect for debugging because you can run scripts, check files, and kill/restart jobs without re-queueing.
+Interactive mode launches the cluster and gives you a shell on the **Head Node**. This is perfect for debugging because you can run scripts, check files, and kill/restart jobs without resubmitting.
docs/get-started/index.md (3)

123-131: Specify language for code block.

The code block showing expected output should have a language specified for consistency with other examples in the documentation.

-   **Example output**:
-   ```
+   **Example output**:
+   ```text
    Initializing Ray cluster...
    Ray dashboard available at http://127.0.0.1:8265
    Loading model: Qwen/Qwen2.5-1.5B-Instruct
    Training started...
    Step 1: reward=0.25, policy_kl_error=0.001
    Step 2: reward=0.31, policy_kl_error=0.002
    ...
-   ```
+   ```
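For reference, the MD040 condition flagged here (an opening fence with no language) can be approximated in a few lines. This is a simplified sketch and not markdownlint's implementation: the function name is hypothetical, and it ignores tilde fences and nested fences of different lengths.

```python
FENCE = "`" * 3  # the three-backtick fence marker


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language."""
    missing = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            # This fence closes the block opened earlier.
            in_fence = False
        else:
            in_fence = True
            # An opening fence made only of backticks carries no info string.
            if stripped.lstrip("`").strip() == "":
                missing.append(number)
    return missing
```

Running it over the text of docs/get-started/index.md would be expected to report the fence that markdownlint flags at line 123, assuming the file matches what the linter saw.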

137-197: Convert dropdown headers from emphasis to proper Markdown headings.

Lines 139, 148, 168, and 172 use bold text as pseudo-headers within dropdown sections. Per Markdown best practices (MD036), these should be converted to proper heading levels (e.g., #### Controlling GPU Usage) for semantic correctness and consistency.

-:::{dropdown} 💡 How to Control GPU Usage & Run Concurrent Jobs
+:::{dropdown} 💡 How to Control GPU Usage & Run Concurrent Jobs

-**Controlling GPU Usage**
+#### Controlling GPU Usage

Apply the same pattern to the other emphasis-as-heading instances at lines 148, 168, and 172.


24-26: Add periods for consistency with other steps sections.

Per previous feedback, steps sections should have periods for consistency. Lines 24-26 lack terminal punctuation.

 **Steps**:
 
-1. Install `uv` and system prerequisites
-2. Clone the repository and initialize the environment
-3. Run a sample GRPO training job to verify installation
+1. Install `uv` and system prerequisites.
+2. Clone the repository and initialize the environment.
+3. Run a sample GRPO training job to verify installation.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1cad374 and 0831e2e.

📒 Files selected for processing (17)
  • docs/about/clusters.md (0 hunks)
  • docs/about/installation.md (0 hunks)
  • docs/about/quick-start.md (0 hunks)
  • docs/cluster.md (0 hunks)
  • docs/debugging.md (1 hunks)
  • docs/get-started/cluster.md (1 hunks)
  • docs/get-started/dpo.md (1 hunks)
  • docs/get-started/grpo.md (1 hunks)
  • docs/get-started/index.md (1 hunks)
  • docs/get-started/installation.md (1 hunks)
  • docs/get-started/sft.md (1 hunks)
  • docs/guides/dpo.md (1 hunks)
  • docs/guides/grpo.md (1 hunks)
  • docs/guides/rm.md (1 hunks)
  • docs/guides/sft.md (1 hunks)
  • docs/index.md (1 hunks)
  • docs/local-workstation.md (0 hunks)
💤 Files with no reviewable changes (5)
  • docs/local-workstation.md
  • docs/about/installation.md
  • docs/about/quick-start.md
  • docs/about/clusters.md
  • docs/cluster.md
🧰 Additional context used
📓 Path-based instructions (2)
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/guides/dpo.md
  • docs/guides/grpo.md
  • docs/debugging.md
  • docs/get-started/installation.md
  • docs/get-started/cluster.md
  • docs/get-started/dpo.md
  • docs/get-started/grpo.md
  • docs/guides/rm.md
  • docs/guides/sft.md
  • docs/get-started/sft.md
  • docs/index.md
  • docs/get-started/index.md
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • docs/guides/dpo.md
  • docs/guides/grpo.md
  • docs/debugging.md
  • docs/get-started/installation.md
  • docs/get-started/cluster.md
  • docs/get-started/dpo.md
  • docs/get-started/grpo.md
  • docs/guides/rm.md
  • docs/guides/sft.md
  • docs/get-started/sft.md
  • docs/index.md
  • docs/get-started/index.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to docs/**/*.md : Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.

Applied to files:

  • docs/get-started/grpo.md
  • docs/get-started/index.md
🪛 LanguageTool
docs/get-started/cluster.md

[grammar] ~71-~71: Ensure spelling is correct
Context: ...files, and kill/restart jobs without re-queueing. 1. Submit the Request Ask for the...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~183-~183: Consider a different adjective to strengthen your wording.
Context: ...ation The following variables allow for deeper customization of the Ray cluster. Most ...

(DEEP_PROFOUND)

🪛 markdownlint-cli2 (0.18.1)
docs/get-started/cluster.md

203-203: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe

(MD055, table-pipe-style)


203-203: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe

(MD055, table-pipe-style)


203-203: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)

docs/get-started/index.md

123-123: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


139-139: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


148-148: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


168-168: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


172-172: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Docs_Tests
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (14)
docs/guides/dpo.md (1)

9-9: Consistency check on cluster documentation reference.

This file now points to ../get-started/cluster.md, consistent with the restructuring. Ensure this path is correct and the target file exists.

docs/guides/rm.md (1)

7-7: Consistency check on cluster documentation reference.

This file now points to ../get-started/cluster.md, consistent with the documentation restructuring across other guides. Verify the target file exists at the new location.

docs/guides/grpo.md (1)

7-7: Consistency check on cluster documentation reference.

This file now points to ../get-started/cluster.md, consistent with the documentation restructuring. Verify the target file exists.

docs/get-started/dpo.md (3)

153-153: Relative path to cluster.md is correct.

The file docs/get-started/cluster.md exists and the reference [Cluster Setup](cluster.md) on line 153 is a valid relative link to the sibling file.


10-10: Verify that downstream documentation files reference this cross-reference target.

The MyST syntax (gs-dpo)= is correctly formatted to create a referenceable target that downstream files can link to using {ref}`gs-dpo`. Confirm that files referencing this target exist and resolve correctly in documentation builds.
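One way to confirm this outside a full Sphinx build is to compare the `{ref}` roles against the explicit targets defined in the Markdown sources. The sketch below is an approximation under stated assumptions: `unresolved_refs` is a hypothetical helper, it only knows about explicit `(label)=` targets, and labels that Sphinx generates automatically are not covered, so a real build remains the authoritative check.

```python
import re
from pathlib import Path

# A MyST target label sits alone on a line, e.g. "(gs-dpo)=".
TARGET_RE = re.compile(r"^\(([^)]+)\)=\s*$", re.MULTILINE)
# A {ref} role, either {ref}`gs-dpo` or {ref}`link text <gs-dpo>`.
REF_RE = re.compile(r"\{ref\}`(?:[^`<]*<)?([^`>]+)>?`")


def unresolved_refs(docs_root: Path) -> set[str]:
    """Return {ref} labels that have no matching (label)= target in the docs."""
    texts = [p.read_text(encoding="utf-8") for p in docs_root.rglob("*.md")]
    targets = {label for text in texts for label in TARGET_RE.findall(text)}
    refs = {label.strip() for text in texts for label in REF_RE.findall(text)}
    return refs - targets
```

An empty result from `unresolved_refs(Path("docs"))` suggests every explicit reference has a target; a non-empty set names the labels to investigate.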


1-8: Verify this file is registered in docs/index.md.

As per the coding guideline, all new markdown files under docs/**/*.md must be added to the documentation index. Confirm that this file appears in the appropriate section of docs/index.md.

docs/get-started/sft.md (1)

153-153: The relative path reference is correct. The file docs/get-started/cluster.md exists and the link [Cluster Setup](cluster.md) on line 153 resolves properly to the sibling file.

docs/debugging.md (1)

12-12: The target section get-started/cluster.md#2-submit-a-job exists at line 53 with the heading "## 2. Submit a Job". The link is correctly configured and points to a valid section describing the job submission workflow.

docs/get-started/installation.md (1)

1-8: Ensure this file is registered in docs/index.md.

Per the coding guidelines, new markdown files under docs/**/*.md must be added to the documentation index. This file should appear in the appropriate section of docs/index.md (likely under "Getting Started" or similar).

docs/guides/sft.md (1)

7-7: The link update is correct and the target file exists.

The relative path ../get-started/cluster.md from docs/guides/sft.md correctly resolves to docs/get-started/cluster.md, which exists and contains the relevant cluster setup information. The documentation index (docs/index.md) already includes proper references to guides/sft.md, so no additional updates are needed.

docs/get-started/grpo.md (1)

1-146: Well-structured GRPO quickstart guide.

The content is well-organized with clear progression from prerequisites through data preparation, configuration, execution, monitoring, and scaling. Code examples are properly formatted with appropriate language syntax highlighting. Cross-references to related documentation (cluster setup and advanced GRPO guide) are correct.

docs/index.md (3)

202-213: Get Started section properly registered in documentation index.

The new get-started documents are correctly added to the main docs/index.md toctree with appropriate maxdepth and organization. The structure aligns with the coding guideline requirement to update the index when new markdown docs are added under docs/**/*.md. All six new get-started pages are registered and linked in the Quickstarts grid section.

Per coding guidelines, this satisfies the requirement: "Update docs/index.md when a new markdown doc is added under docs/**/*.md".
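The guideline quoted above could also be checked mechanically. The following is a rough sketch, assuming that a page counts as registered when its path (with or without the `.md` suffix) appears anywhere in the index text; `unregistered_docs` is a hypothetical name, and pages that are only reachable through a nested toctree would be reported as missing by this simple test.

```python
from pathlib import Path


def unregistered_docs(docs_root: Path, index_name: str = "index.md") -> list[str]:
    """List Markdown files that the top-level index never mentions."""
    index_text = (docs_root / index_name).read_text(encoding="utf-8")
    missing = []
    for md_file in sorted(docs_root.rglob("*.md")):
        rel = md_file.relative_to(docs_root).as_posix()
        if rel == index_name:
            continue
        # Matching on the suffix-free path covers entries written
        # either as "get-started/grpo" or as "get-started/grpo.md".
        stem = rel[: -len(".md")]
        if stem not in index_text:
            missing.append(rel)
    return missing
```

Because this is a plain substring match, it can miss a gap when one path is a prefix of another, so it is best treated as a first-pass filter.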


59-98: Documentation navigation structure is well-organized.

The restructured Quickstarts section (lines 59-98) with grid cards clearly guides users to the new get-started pages. The section properly links to:

  • Installation & Quickstart (get-started/index.md)
  • GRPO Quickstart (get-started/grpo.md)
  • SFT Quickstart (get-started/sft.md)
  • DPO Quickstart (get-started/dpo.md)

This aligns well with the PR objective to create a dedicated Get Started section with scoped articles.


184-287: All referenced files in the toctree sections exist and are correctly located. The cross-references are valid with no stale or missing file references detected.


Contributor

@terrykong terrykong left a comment


@lbliii can you git mv the source files so that github can show us the diff? it's hard to tell what changes were made

ex:

git mv docs/cluster.md docs/get-started/cluster.md

can you do that to all the markdown files that were moved and then modified? it will make reviewing much easier
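For several files, the same request can be scripted. The sketch below only builds the `git mv` command strings so they can be reviewed before running; `git_mv_commands` is a hypothetical helper, and only the first old-to-new pair comes from the comment above. The second pair is a guess at how a removed file maps to a new one and would need to be confirmed by the author.

```python
import shlex


def git_mv_commands(renames: dict[str, str]) -> list[str]:
    """Build one `git mv` command per (old path -> new path) pair."""
    return [shlex.join(["git", "mv", old, new]) for old, new in renames.items()]


# Only the first pair is taken from the review comment; the second is an
# assumed mapping for illustration and should be verified before use.
renames = {
    "docs/cluster.md": "docs/get-started/cluster.md",
    "docs/about/installation.md": "docs/get-started/installation.md",
}
for command in git_mv_commands(renames):
    print(command)
```

Committing the renames on their own, before any content edits, keeps each file similar enough for Git's rename detection to present it as a move, which is what lets GitHub show a per-file diff.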

… links

- Remove redundant cluster redirect and local-workstation files
- Add algorithm quickstart guides for DPO, GRPO, and SFT
- Restructure and enhance cluster, installation, and quickstart content
- Update index, debugging, and guide links for new structure
- Restore missing sbatch flags, env vars, and auth setup

Signed-off-by: Lawrence Lane <[email protected]>
@lbliii
Contributor Author

lbliii commented Dec 4, 2025

@lbliii can you git mv the source files so that github can show us the diff? it's hard to tell what changes were made

ex:

git mv docs/cluster.md docs/get-started/cluster.md

can you do that to all the markdown files that were moved and then modified? it will make reviewing much easier

Not sure if I did this correctly, but:

| Commit | What Reviewer Sees |
| --- | --- |
| 80f2fe5 | Clean renames (click here first) |
| 87ab66f | initial content changes |
| 5122545 | sbatch syntax fix |
| 955c4a3 | job-name placeholder fix |

if i need to regenerate the branch/PR in a different way, let me know.

Signed-off-by: Lawrence Lane <[email protected]>
@shashank3959
Contributor

do you think it makes sense to also link the in-depth installation from the index page?

@shashank3959
Contributor

Good to add docker commands as another installation option.

