- Clone the repository

```bash
git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train
```

Or if you already have the repo cloned without submodules:

```bash
git submodule update --init --recursive --depth 50
```

- Install uv

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```

- Install dependencies

```bash
uv sync
```

To install the bundled PRIME-RL environment packages used by some examples:

```bash
uv sync --extra envs
```

For flash attention support:

```bash
uv sync --extra fa2  # flash-attn 2
uv sync --extra fa3  # flash-attn 2 + 3 (use for H100s)
uv sync --extra fa4  # flash-attn 2, 3, & 4 (use for B200s)
```

The legacy extra names `flash-attn-2`, `flash-attn-3`, and `flash-attn-4` remain supported for backward compatibility.

To install both the env and flash attention extras:

```bash
uv sync --extra envs --extra fa3
```

`medarc_slurm` is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in `rl_slurm` and `sft_slurm` commands but adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.
```bash
# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2

# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu

# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
    --output-dir runs/my-sft \
    --gpus 2 \
    --priority low \
    --mail all \
    --mail-user email@domain.com \
    --slurm-resume

# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
    --output-dir runs/my-rl \
    --train-gpus 1 \
    --infer-gpus 2 \
    --dependency afterok:123456 \
    --test-only
```

Generated artifacts are written to `--output-dir`:
- `sft.sh` or `rl.sh` — the SLURM batch script
- `configs/` — resolved TOML subconfigs passed to each component
You can pass PRIME-RL config overrides directly as extra flags (for example `--wandb.project my-proj --wandb.name my-run`). You may also insert `--` before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat `--config`, with later files overriding earlier ones.
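For example, a layered setup might look like the following. The file split and the `[train]` key below are hypothetical and only illustrate the override mechanics; the exact schema depends on your PRIME-RL version:

```toml
# base.toml — shared defaults (keys are illustrative, not a real schema)
[wandb]
project = "my-proj"

[train]
micro_batch_size = 4  # hypothetical key for illustration
```

```toml
# override.toml — run-specific overrides; the later --config wins on overlap
[wandb]
name = "my-run"
```

Passing both as `medarc_slurm sft --config base.toml --config override.toml --output-dir runs/my-sft --gpus 2` would apply `base.toml` first and then `override.toml` on top of it.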
`medarc_slurm` defaults `--account` to `training`. You can override it with `--account <name>`.
Email notification mode is `--mail all` or `--mail begin_end` (with `--mail-user`).
Use `--dependency "<expr>"` to pass SLURM dependencies and `--test-only` to run `sbatch` validation without submitting.
When `--slurm-resume` is enabled, SLURM will automatically requeue preempted jobs. This sets PRIME-RL's `ckpt.resume_step=-1`, which tells PRIME-RL to resume from the latest checkpoint.
If not set, a preempted job will be canceled and will require manual resubmission.
Use `--priority` to set the SLURM priority level (QoS) for the job. Priority controls preemption: jobs at a higher QoS can preempt jobs at a lower QoS. Two tiers are available to MedARC members:
| Value | When to use |
|---|---|
| `normal` | Normal scheduling; can preempt `low` jobs. Default when unset. |
| `low` | Background or exploratory runs; can be preempted by `normal` jobs. |
```bash
# Normal priority (default)
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# Low priority — can be preempted by normal-priority jobs
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low
```

Within a QoS tier, pass `--nice <value>` to further adjust scheduling order. Higher values yield more to other jobs at the same QoS level. While preemption is triggered by QoS, `nice` influences it in two ways:
- Which jobs get preempted first: when a higher-QoS job needs resources, SLURM preferentially preempts jobs with higher nice values.
- Requeue order after preemption: preempted jobs are requeued, and those with higher nice values are scheduled later than those with lower nice values.
```bash
# Yield to other low-priority jobs in the queue
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low --nice 100

# Long sweep — deprioritize as much as possible within the low tier and resume if preempted
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --priority low --nice 200 --slurm-resume
```

Suggested `--nice` values:
| Value | When to use |
|---|---|
| `0` (default) | No adjustment; scheduled normally within the QoS tier |
| `100` | Yield to other jobs at the same QoS level |
| `200` | Long-running sweeps or archival jobs that should rarely run ahead of others |
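The interaction between QoS tiers and `nice` described above can be sketched in a few lines. This is an illustrative model, not SLURM's actual scheduler: the `Job` fields and QoS ranking are assumptions made for the example.

```python
# Illustrative sketch (NOT SLURM's real scheduler) of how QoS and nice
# interact: QoS decides *whether* preemption happens, nice decides
# *which* victim goes first and the order jobs come back after requeue.
from dataclasses import dataclass

QOS_RANK = {"low": 0, "normal": 1}  # assumed ranking: higher preempts lower

@dataclass
class Job:
    name: str
    qos: str
    nice: int

def preemption_victims(jobs, incoming_qos):
    """Jobs an incoming higher-QoS job may preempt, highest nice first."""
    victims = [j for j in jobs if QOS_RANK[j.qos] < QOS_RANK[incoming_qos]]
    return sorted(victims, key=lambda j: -j.nice)

def requeue_order(jobs):
    """Requeued jobs with lower nice are scheduled before higher nice."""
    return sorted(jobs, key=lambda j: j.nice)

running = [
    Job("sweep-a", "low", 200),
    Job("explore-b", "low", 100),
    Job("train-c", "normal", 0),
]
# An incoming normal-QoS job preempts sweep-a (nice 200) before explore-b.
victims = preemption_victims(running, "normal")
print([j.name for j in victims])          # ['sweep-a', 'explore-b']
print([j.name for j in requeue_order(victims)])  # ['explore-b', 'sweep-a']
```

The toy model mirrors the two bullets above: `train-c` is never a victim within its own tier, and after preemption the lower-`nice` job returns to service first.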
Pass --slurm-resume to automatically resume training from the latest checkpoint when a job is requeued (e.g. after preemption or node failure). This sets PRIME-RL's ckpt.resume_step=-1, which tells the trainer to discover and load the most recent checkpoint in the output directory.
```bash
# SFT with automatic checkpoint resumption
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --slurm-resume

# RL with automatic checkpoint resumption
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --slurm-resume
```

For local training via `medarc_train`, the equivalent flag is `--resume`:

```bash
medarc_train sft --config config.toml --output-dir runs/my-sft --resume
```

Run `medarc_slurm sft --help` or `medarc_slurm rl --help` for more details on available options.
Each example has its own README with setup instructions, SFT/RL commands, and eval steps:
| Example | GPUs | Description |
|---|---|---|
| reverse_text | 1 (shared) | Single-GPU SFT + RL on a toy text reversal task |
| hendrycks_sanity | 4 | Multi-GPU RL on Hendrycks MATH (sanity subset) |
| alphabet_sort | 8 | Full-node RL on alphabet sorting |
All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup.
Examples that rely on PRIME-RL environment packages require installing the envs extra first: uv sync --extra envs.