med-lm-train

Setup + Installation

  1. Clone the repository
git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train

Or if you already have the repo cloned without submodules:

git submodule update --init --recursive --depth 50
  2. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
  3. Install dependencies
uv sync

To install the bundled PRIME-RL environment packages used by some examples:

uv sync --extra envs

For flash attention support:

uv sync --extra fa2   # flash-attn 2
uv sync --extra fa3   # flash-attn 2 + 3 (use for H100s)
uv sync --extra fa4   # flash-attn 2, 3, & 4 (use for B200s)

Legacy extra names flash-attn-2, flash-attn-3, and flash-attn-4 remain supported for backward compatibility.

And to install both env and flash attention extras:

uv sync --extra envs --extra fa3

medarc_slurm

medarc_slurm is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in rl_slurm and sft_slurm commands, adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.

# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2

# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu

# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
  --output-dir runs/my-sft \
  --gpus 2 \
  --priority low \
  --mail all \
  --mail-user email@domain.com \
  --slurm-resume

# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
  --output-dir runs/my-rl \
  --train-gpus 1 \
  --infer-gpus 2 \
  --dependency afterok:123456 \
  --test-only

Generated artifacts are written to --output-dir:

  • sft.sh or rl.sh — the SLURM batch script
  • configs/ — resolved TOML subconfigs passed to each component

You can pass PRIME-RL config overrides directly as extra flags (for example --wandb.project my-proj --wandb.name my-run). You may also insert -- before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat --config with later files overriding earlier ones.
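
As an illustration, a dotted CLI override such as --wandb.project my-proj corresponds to a TOML fragment like the following (a sketch only; the exact schema is defined by PRIME-RL):

```toml
# Hypothetical PRIME-RL config fragment; keys mirror the CLI overrides above.
[wandb]
project = "my-proj"
name = "my-run"
```

When layering multiple --config files, later files override earlier ones key by key, so a small override file like this can be stacked on top of a shared base config.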

medarc_slurm defaults --account to training; override it with --account <name>. Email notification modes are --mail all and --mail begin_end (combined with --mail-user). Use --dependency "<expr>" to pass SLURM dependencies, and --test-only to run sbatch validation without submitting.

Job resumption

When --slurm-resume is enabled, SLURM automatically requeues preempted jobs. The flag also sets PRIME-RL's ckpt.resume_step=-1, which tells PRIME-RL to resume from the latest checkpoint.

If not set, a preempted job is canceled and must be resubmitted manually.

Job priority (QoS)

Use --priority to set the SLURM priority level (QoS) for the job. Priority controls preemption: jobs at a higher QoS value can preempt jobs at a lower QoS value. Two tiers are available to MedARC members:

Value    When to use
normal   Normal scheduling; can preempt low jobs. Default when unset.
low      Background or exploratory runs; can be preempted by normal jobs.
# Normal priority (default)
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# Low priority — can be preempted by normal-priority jobs
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low

Fine-tuning with --nice

Within a QoS tier, pass --nice <value> to further adjust scheduling order. Higher values yield more to other jobs at the same QoS level. While preemption itself is triggered by QoS, nice influences it in two ways:

  • Which jobs get preempted first: when a higher-QoS job needs resources, SLURM preferentially preempts jobs with higher nice values.
  • Requeue order after preemption: preempted jobs are requeued, and those with higher nice values are scheduled later than those with lower nice values.
# Yield to other low-priority jobs in the queue
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --priority low --nice 100

# Long sweep — deprioritize as much as possible within the low tier and resume if preempted
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --priority low --nice 200 --slurm-resume

Suggested --nice values:

Value         When to use
0 (default)   No adjustment; scheduled normally within the QoS tier
100           Yield to other jobs at the same QoS level
200           Long-running sweeps or archival jobs that should rarely run ahead of others

Job resumption with --slurm-resume

Pass --slurm-resume to automatically resume training from the latest checkpoint when a job is requeued (e.g. after preemption or node failure). This sets PRIME-RL's ckpt.resume_step=-1, which tells the trainer to discover and load the most recent checkpoint in the output directory.

# SFT with automatic checkpoint resumption
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 --slurm-resume

# RL with automatic checkpoint resumption
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 2 --infer-gpus 2 --slurm-resume
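
To make the "latest checkpoint" behavior concrete, here is a minimal sketch of discovery by step number on disk. The step_<N> directory naming is an assumption for illustration, not necessarily PRIME-RL's actual checkpoint layout:

```shell
# Create a hypothetical checkpoint directory layout (names are illustrative).
mkdir -p /tmp/ckpt-demo/step_10 /tmp/ckpt-demo/step_200 /tmp/ckpt-demo/step_30
# resume_step=-1 conceptually means "pick the highest step number", i.e.:
ls -d /tmp/ckpt-demo/step_* | sort -t_ -k2 -n | tail -n 1
```

With ckpt.resume_step=-1 the trainer performs this discovery itself, so in this sketch a requeued job would continue from step 200 rather than restarting from scratch.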

For local training via medarc_train, the equivalent flag is --resume:

medarc_train sft --config config.toml --output-dir runs/my-sft --resume

Run medarc_slurm sft --help or medarc_slurm rl --help for more details on available options.

Examples

Each example has its own README with setup instructions, SFT/RL commands, and eval steps:

Example            GPUs         Description
reverse_text       1 (shared)   Single-GPU SFT + RL on a toy text-reversal task
hendrycks_sanity   4            Multi-GPU RL on Hendrycks MATH (sanity subset)
alphabet_sort      8            Full-node RL on alphabet sorting

All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup. Examples that rely on PRIME-RL environment packages require installing the envs extra first: uv sync --extra envs.
