- Clone the repository

```bash
git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train
```

Or if you already have the repo cloned without submodules:

```bash
git submodule update --init --recursive --depth 50
```

- Install uv

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```

- Install dependencies

```bash
uv sync
```

To install the bundled PRIME-RL environment packages used by some examples:

```bash
uv sync --extra envs
```

For flash attention support:

```bash
uv sync --extra fa2  # flash-attn 2
uv sync --extra fa3  # flash-attn 2 + 3 (use for H100s)
uv sync --extra fa4  # flash-attn 2, 3, & 4 (use for B200s)
```

The legacy extra names `flash-attn-2`, `flash-attn-3`, and `flash-attn-4` remain supported for backward compatibility.

To install both the env and flash attention extras:

```bash
uv sync --extra envs --extra fa3
```

`medarc_slurm` is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in `rl_slurm` and `sft_slurm` commands but adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.
```bash
# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2

# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu

# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
    --output-dir runs/my-sft \
    --gpus 2 \
    --priority low \
    --mail all \
    --mail-user email@domain.com \
    --slurm-resume

# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
    --output-dir runs/my-rl \
    --train-gpus 1 \
    --infer-gpus 2 \
    --dependency afterok:123456 \
    --test-only
```

Generated artifacts are written to `--output-dir`:
- `sft.sh` or `rl.sh` — the SLURM batch script
- `configs/` — resolved TOML subconfigs passed to each component
You can pass PRIME-RL config overrides directly as extra flags (for example `--wandb.project my-proj --wandb.name my-run`). You may also insert `--` before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat `--config`, with later files overriding earlier ones.
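For example, a layered setup might look like the following. The file split and the `[train]` key below are hypothetical and only illustrate the override mechanics; the exact schema depends on your PRIME-RL version:

```toml
# base.toml — shared defaults (keys are illustrative, not a real schema)
[wandb]
project = "my-proj"

[train]
micro_batch_size = 4  # hypothetical key for illustration
```

```toml
# override.toml — run-specific overrides; the later --config wins on overlap
[wandb]
name = "my-run"
```

Passing both as `medarc_slurm sft --config base.toml --config override.toml --output-dir runs/my-sft --gpus 2` would apply `base.toml` first and then `override.toml` on top of it.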
`medarc_slurm` defaults `--account` to `training`. You can override it with `--account <name>`.
Email notification mode is `--mail all` or `--mail begin_end` (with `--mail-user`).
Use `--dependency "<expr>"` to pass SLURM dependencies and `--test-only` to run `sbatch` validation without submitting.
When `--slurm-resume` is enabled, SLURM will automatically requeue preempted jobs. This sets PRIME-RL's `ckpt.resume_step=-1`, which tells PRIME-RL to resume from the latest checkpoint.
If not set, a preempted job will be canceled and will require manual resubmission.
Use `--priority` to set the SLURM priority level (QoS) for the job. Priority controls preemption: jobs at a higher QoS can preempt jobs at a lower QoS. Two tiers are available to MedARC members:
| Value | When to use |
|---|---|
| `normal` | Normal scheduling; can preempt `low` jobs. Default when unset. |
| `low` | Background or exploratory runs; can be preempted by `normal` jobs. |
```bash
# Normal priority (default)
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# Low priority — can be preempted by normal-priority jobs
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low
```

Within a QoS tier, pass `--nice <value>` to further adjust scheduling order. Higher values yield more to other jobs at the same QoS level. While preemption is triggered by QoS, `nice` influences it in two ways:
- Which jobs get preempted first: when a higher-QoS job needs resources, SLURM preferentially preempts jobs with higher nice values.
- Requeue order after preemption: preempted jobs are requeued, and those with higher nice values are scheduled later than those with lower nice values.
```bash
# Yield to other low-priority jobs in the queue
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low --nice 100

# Long sweep — deprioritize as much as possible within the low tier and resume if preempted
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --priority low --nice 200 --slurm-resume
```

Suggested `--nice` values:
| Value | When to use |
|---|---|
| `0` (default) | No adjustment; scheduled normally within the QoS tier |
| `100` | Yield to other jobs at the same QoS level |
| `200` | Long-running sweeps or archival jobs that should rarely run ahead of others |
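The interaction between QoS tiers and `nice` described above can be sketched in a few lines. This is an illustrative model, not SLURM's actual scheduler: the `Job` fields and QoS ranking are assumptions made for the example.

```python
# Illustrative sketch (NOT SLURM's real scheduler) of how QoS and nice
# interact: QoS decides *whether* preemption happens, nice decides
# *which* victim goes first and the order jobs come back after requeue.
from dataclasses import dataclass

QOS_RANK = {"low": 0, "normal": 1}  # assumed ranking: higher preempts lower

@dataclass
class Job:
    name: str
    qos: str
    nice: int

def preemption_victims(jobs, incoming_qos):
    """Jobs an incoming higher-QoS job may preempt, highest nice first."""
    victims = [j for j in jobs if QOS_RANK[j.qos] < QOS_RANK[incoming_qos]]
    return sorted(victims, key=lambda j: -j.nice)

def requeue_order(jobs):
    """Requeued jobs with lower nice are scheduled before higher nice."""
    return sorted(jobs, key=lambda j: j.nice)

running = [
    Job("sweep-a", "low", 200),
    Job("explore-b", "low", 100),
    Job("train-c", "normal", 0),
]
# An incoming normal-QoS job preempts sweep-a (nice 200) before explore-b.
victims = preemption_victims(running, "normal")
print([j.name for j in victims])          # ['sweep-a', 'explore-b']
print([j.name for j in requeue_order(victims)])  # ['explore-b', 'sweep-a']
```

The toy model mirrors the two bullets above: `train-c` is never a victim within its own tier, and after preemption the lower-`nice` job returns to service first.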
Pass --slurm-resume to automatically resume training from the latest checkpoint when a job is requeued (e.g. after preemption or node failure). This sets PRIME-RL's ckpt.resume_step=-1, which tells the trainer to discover and load the most recent checkpoint in the output directory.
```bash
# SFT with automatic checkpoint resumption
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --slurm-resume

# RL with automatic checkpoint resumption
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --slurm-resume
```

For local training via `medarc_train`, the equivalent flag is `--resume`:

```bash
medarc_train sft --config config.toml --output-dir runs/my-sft --resume
```

Run `medarc_slurm sft --help` or `medarc_slurm rl --help` for more details on available options.
Each example has its own README with setup instructions, SFT/RL commands, and eval steps:
| Example | GPUs | Description |
|---|---|---|
| reverse_text | 1 (shared) | Single-GPU SFT + RL on a toy text reversal task |
| hendrycks_sanity | 4 | Multi-GPU RL on Hendrycks MATH (sanity subset) |
| alphabet_sort | 8 | Full-node RL on alphabet sorting |
All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup.
Examples that rely on PRIME-RL environment packages require installing the envs extra first: uv sync --extra envs.