Releases: oumi-ai/oumi
v0.4.0
Oumi v0.4 Changelog
✨ gpt-oss Training and Inference
OpenAI released two highly anticipated open-weight models in August 2025: gpt-oss-20b and gpt-oss-120b. They’re mixture-of-experts (MoE) reasoning models with strong tool-use performance, and are optimized with native 4-bit quantization for memory-efficient training and inference. You can now run training and inference on these models in Oumi!
Usage Example:
# Train gpt-oss-20b with LoRA on a single GPU
oumi train -c oumi://configs/recipes/gpt_oss/sft/20b_lora_single_gpu_train.yaml
# Run local inference on gpt-oss-120b using vLLM
oumi infer -i -c oumi://configs/recipes/gpt_oss/inference/120b_vllm_infer.yaml
⚡ DeepSpeed Support
DeepSpeed is a powerful and configurable optimization library that allows you to train large models efficiently using techniques like distributed training and memory optimization. Oumi now supports DeepSpeed in addition to PyTorch’s native Fully Sharded Data Parallel (FSDP) for distributed training!
Usage Example:
# Train Llama 3.1 8B using DeepSpeed’s ZeRO-3 optimization strategy
oumi train -c oumi://configs/examples/deepspeed/llama3_1_8b_deepspeed_z3_train.yaml
# Combine DeepSpeed with YARN RoPE scaling to enable training on longer contexts!
# Train Qwen2.5 7B with 128k token context length using YARN and DeepSpeed
oumi train -c oumi://configs/projects/limo/qwen2.5_7b_fft_yarn_deepspeed.yaml
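As a rough illustration of why ZeRO-3 sharding matters at this scale, the sketch below estimates model-state memory per GPU. It assumes mixed-precision Adam training costs about 16 bytes of model state per parameter (the accounting used in the ZeRO paper, not a number taken from Oumi's configs) and ignores activations. The function name and the 8-GPU setup are hypothetical.

```python
# Assumption: 16 bytes of model state per parameter under mixed-precision Adam
# (2 for fp16 weights + 2 for fp16 gradients + 12 for fp32 optimizer states).
BYTES_PER_PARAM = 16


def model_state_gb(num_params: float, num_gpus: int, zero3: bool) -> float:
    """Return approximate model-state memory per GPU in GB (activations excluded)."""
    total_bytes = num_params * BYTES_PER_PARAM
    # ZeRO-3 partitions weights, gradients, and optimizer states across GPUs;
    # without it, every GPU holds a full replica.
    per_gpu_bytes = total_bytes / num_gpus if zero3 else total_bytes
    return per_gpu_bytes / 1e9


if __name__ == "__main__":
    params = 8e9  # approximate parameter count of an 8B model
    print("replicated:", model_state_gb(params, num_gpus=8, zero3=False), "GB per GPU")
    print("ZeRO-3:    ", model_state_gb(params, num_gpus=8, zero3=True), "GB per GPU")
```

Under these assumptions, a fully replicated 8B model would not fit on a single 80 GB GPU, while sharding across 8 GPUs leaves room for activations.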
🗄️ CLI Tool for Hugging Face Cache Management
When using datasets and models from Hugging Face Hub, over time it becomes hard to track what’s been cached, how much space it’s taking up, etc. In #1897, @aniruddh-alt added an oumi cache utility to the Oumi CLI. It lets you view, add to, and delete from the local Hugging Face Hub cache, and get more information about cached entries.
Usage Example:
# View what’s in the cache
oumi cache ls
# Filter for items containing the substring “llama”, and sort by name
oumi cache ls -f "*llama*" --sort name
# Download a model to cache
oumi cache get Qwen/Qwen3-0.6B
# View information about the cached model
oumi cache card Qwen/Qwen3-0.6B
# Remove a model from cache
oumi cache rm Qwen/Qwen3-0.6B
🎯 Vision DPO and KTO Support
We have added support for two new training methods: Direct Preference Optimization (DPO) on Vision-Language Models and Kahneman-Tversky Optimization (KTO). Special thanks to @efsiatras for implementing KTO support in #1538!
Usage Example:
# Vision DPO on Qwen2.5-VL 3B
oumi train -c oumi://configs/recipes/vision/qwen2_5_vl_3b/dpo/train.yaml
# KTO on Phi-3
oumi train -c oumi://configs/recipes/phi3/kto/train.yaml
🛠️ Developer Experience
- Upgrade several package dependencies to latest versions
- Additional GGUF, macOS LlamaCPP, and remote frontier model inference configs by @penfever in #1923 and #1947
- Add Pre-Populated GitHub Issue Link On Failures by @rlehman221 in #1936
- Add Verbose Flag to Oumi Train by @rlehman221 in #1940
- Enable users to log data samples during training for debugging by @shanghongsim in #1943
New Contributors
- @efsiatras made their first contribution in #1538
- @rlehman221 made their first contribution in #1936
All Contributors
@aniruddh-alt, @efsiatras, @jgreer013, @kaisopos, @oelachqar, @penfever, @rlehman221, @ryan-arman, @shanghongsim, @stefanwebb, @taenin, @wizeng23
Full Changelog: v0.3.0...v0.4.0
v0.3.0
Oumi v0.3 Changelog
🔧 Model Quantization (NEW)
Quantization is a family of methods for reducing model size, for example before deployment. Oumi now supports applying Activation-aware Weight Quantization (AWQ) to all models. See how in our notebook.
Usage Example:
# Quick start - quantize TinyLlama to 4-bit
oumi quantize --method awq_q4_0 --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --output quantized_model
# With configuration file
oumi quantize --config quantization_config.yaml
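To get a feel for the size reduction, here is a back-of-the-envelope sketch of weight storage at different bit widths. It is plain arithmetic, not an Oumi API: the helper name is hypothetical, the parameter count is approximate, and real AWQ checkpoints are somewhat larger because they also store per-group scales and zero points.

```python
def weight_size_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB, ignoring scales and zero points."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 1.1e9  # approximate parameter count of a 1.1B model
    fp16 = weight_size_gb(params, 16)
    int4 = weight_size_gb(params, 4)
    print(f"16-bit: {fp16:.2f} GB, 4-bit: {int4:.2f} GB, ratio: {fp16 / int4:.1f}x")
```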
⚖️ Judge API V2 (MAJOR UPDATE)
LLM-as-a-Judge is a method for using foundation models to evaluate other foundation models. We’ve overhauled Oumi’s LLM-as-a-Judge interface for ease of use and flexibility. Check out our notebook here.
Usage Example:
from oumi.judges.simple_judge import SimpleJudge
# Built-in truthfulness judge
simple_judge = SimpleJudge(judge_config="oumi://configs/projects/judges/generic/truthfulness.yaml")
dataset = [{"request": "What is the capital of France?", "response": "Rome"}]
outputs = simple_judge.judge(dataset)
🎯 Adaptive Inference (NEW)
💪 Adaptive Inference, as we term it, refers to new features in Oumi for resuming training (or any task) when a job has crashed, as well as optimizing inference parallelization to maximize bandwidth. Learn more in our notebook.
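The resume-after-crash idea can be sketched in a few lines. This is not Oumi's implementation: the function name, the JSONL scratch-file format, and the infer_fn callback are assumptions made for illustration. Each result is appended to a scratch file as soon as it is produced, and a rerun skips anything already recorded there.

```python
import json
from pathlib import Path
from typing import Callable


def run_resumable(
    prompts: list[str],
    infer_fn: Callable[[str], str],
    scratch_path: Path,
) -> list[str]:
    """Run infer_fn over prompts, skipping prompts already saved in scratch_path."""
    done: dict[int, str] = {}
    if scratch_path.exists():
        # Load results written by a previous (possibly crashed) run.
        with scratch_path.open() as f:
            for line in f:
                record = json.loads(line)
                done[record["index"]] = record["response"]

    with scratch_path.open("a") as f:
        for index, prompt in enumerate(prompts):
            if index in done:
                continue  # already finished before the interruption
            response = infer_fn(prompt)
            f.write(json.dumps({"index": index, "response": response}) + "\n")
            f.flush()  # persist each result immediately
            done[index] = response

    return [done[i] for i in range(len(prompts))]
```

The sketch assumes every line in the scratch file was written completely; a production version would also need to tolerate a truncated final line.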
🛠️ Developer Experience
- Updated contributing guidelines
- Enhanced documentation
- Tutorial notebook fixes
- Improved error handling and testing
- MLflow integration improvements
- Multi-node verl Slurm job support
- Rich logging handler option
Full Changelog: v0.2.1...v0.3.0
v0.2.1
What's Changed
- Set infer_online and infer_from_file to private by @jgreer013 in #1745
- Update launch.md by @shanghongsim in #1781
- Add adaptive semaphore to enable future adaptive throughput scenarios by @jgreer013 in #1780
- Fix a pyright regression by @taenin in #1783
- Judge API V2 | Fix judge config from repo path by @kaisopos in #1782
- Add permutable attributes and combination sampling for data synthesis by @jgreer013 in #1773
- Removed collator in finetuning tutorial notebook by @shanghongsim in #1788
- Update our contributing guidelines. by @taenin in #1789
- Add adaptive concurrency controller in preparation for adaptive inference by @jgreer013 in #1784
- Fixed issue with final conversations not consistently being saved by @jgreer013 in #1795
- Add support for ingesting datasets for synthesis by @jgreer013 in #1790
- Add support for adaptive inference by @jgreer013 in #1791
- Add support for Example Sources in Synthesis by @jgreer013 in #1797
- Webinar announcement and other news by @stefanwebb in #1800
- Added utm_source parameters by @stefanwebb in #1802
- Add code to handle document ingestion by @jgreer013 in #1796
- Add code for handling basic document segmentation by @jgreer013 in #1803
- Update mflow support in oumi trainer by @oelachqar in #1804
- Add multi-node verl SLURM job by @wizeng23 in #1798
- Fixed various tutorial notebooks by @shanghongsim in #1792
- Add parameter logging to oumi trainer by @oelachqar in #1807
- Judge API V2 | Enable prompt variable replacement by YAML by @kaisopos in #1805
- [tiny] Update train config comment header by @wizeng23 in #1809
- Add experimental option to use the rich logging handler by @oelachqar in #1810
New Contributors
- @shanghongsim made their first contribution in #1781
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Highlights
GRPO support for trl and verl trainers
Oumi now supports GRPO training for both the trl and verl libraries! This allows you to run GRPO training with no/low code using Oumi's configs. You can also benefit from other features of the Oumi platform, such as custom evaluation and launching remote jobs.
Running GRPO training in Oumi is as simple as:
- Create a reward function, and register it to Oumi's reward function registry using @register("<my_reward_fn>", RegistryType.REWARD_FUNCTION).
- Create a dataset class to process your HF dataset into the format needed for your target framework, and register it to Oumi's dataset registry using @register_dataset("@hf-org-name/my-dataset-name").
- Create an Oumi training config with your model, dataset, reward function, and hyperparameters. For specific details on setting up the config for GRPO, see our documentation.
- Launch the training job locally using the oumi train CLI, or launch a remote job using the oumi launch CLI.
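As a sketch of the first step, a reward function for a letter-counting task might look like the following. The signature assumes the trl GRPO convention, where completions and extra dataset columns arrive as lists and a list of floats is returned; verl may expect a different shape. The function name, the target_count column, and the scoring scheme are hypothetical. The registration decorator from the steps above is shown only as a comment so the example runs without Oumi installed.

```python
import re

# In Oumi, this function would be registered with the decorator shown in the
# steps above: @register("<my_reward_fn>", RegistryType.REWARD_FUNCTION)


def letter_count_reward(
    completions: list[str], target_count: list[int], **kwargs
) -> list[float]:
    """Score each completion by how close its last integer is to the target."""
    rewards = []
    for completion, target in zip(completions, target_count):
        numbers = re.findall(r"-?\d+", completion)
        if not numbers:
            rewards.append(-1.0)  # no numeric answer at all
            continue
        error = abs(int(numbers[-1]) - target)
        # Exact answers get 1.0; wrong answers get a penalty capped at -1.0.
        rewards.append(1.0 if error == 0 else -min(error, 10) / 10)
    return rewards


if __name__ == "__main__":
    print(letter_count_reward(["There are 3 r's", "I think 2", "no idea"], [3, 3, 3]))
```

A graded penalty rather than a flat 0/1 score gives the policy a signal even when the answer is close but wrong, which is one common design choice for this kind of task.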
For an end-to-end example using Oumi + trl, check out our notebook walkthrough. For verl, check out our multi-modal Geometry3K config. Finally, check out our blog post for more information.
Models built with Oumi: HallOumi and CoALM
We’re proud to announce the release of two models built with Oumi: HallOumi and CoALM! Both of these were trained on Oumi, and we provide recipes to reproduce their training from scratch.
- 🧀 HallOumi: A truly open-source claim verification (hallucination detection) model developed by Oumi, outperforming Claude Sonnet, OpenAI o1, DeepSeek R1, Llama 405B, and Gemini Pro at only 8B parameters. Check out the Oumi recipe to train the model here.
- 🤖 CoALM: Conversational Agentic Language Model (CoALM) is a unified approach that integrates both conversational and agentic capabilities. It includes an instruction tuning dataset and three trained models (8B, 70B, 405B). The project was a partnership between the ConvAI Lab at UIUC and Oumi, and the paper was accepted to ACL. Check out the Oumi recipes to train the models here.
New model support: Llama 4, Qwen3, Falcon H1, and more
We’ve added support for many recent models to Oumi, with tested recipes that work out-of-the-box!
- Vision Language Models
- Text-to-text LLMs
Support for Slurm and Frontier clusters
At Oumi, we want to unify and simplify the processes for running jobs on remote clusters. We have now added support for launching jobs on Slurm clusters, and on Frontier, a supercomputer at the Oak Ridge Leadership Computing Facility.
What's Changed
- [bugfix] Allow prerelease when building docker image by @oelachqar in #1753
- Update link to Oumi banner image in README by @wizeng23 in #1752
- docs: add a badge and link to the social network Twitter by @Radovenchyk in #1751
- Support OLCF (Oak Ridge Leadership Computing Facility) Frontier HPC cluster in Oumi launcher by @nikg4 in #1721
- Judge API V2 | Core Functionality by @kaisopos in #1717
- Update oumi distributed torchrun to fallback to oumi train -c cfg.yaml .... on a single-node with 1 GPU by @nikg4 in #1755
- deps: Upgrade verl to 0.4.0 by @wizeng23 in #1749
- add DCVLR logo to readme by @penfever in #1754
- Judge API V2 | Few-Shots by @kaisopos in #1746
- Update infer.md to fix a broken link by @ryan-arman in #1756
- Judge API V2 | minor nit by @kaisopos in #1757
- [Evaluation] Disabling flaky MMMU test by @kaisopos in #1758
- Automatically tail SkyPilot logs by @wizeng23 in #1761
- Enable vLLM for trl GRPO jobs by @wizeng23 in #1760
- Judge API V2 | Implement CLI by @kaisopos in #1759
- Updates to Oumi news for May, June by @stefanwebb in #1763
- Additional news items by @stefanwebb in #1764
- Judge API V2 | Support for built-in judges by @kaisopos in #1762
- [bug] safetensors v0.6.0rc0 is causing a regression, prevent upgrading by @oelachqar in #1772
- [verl] Support resuming from checkpoint by @wizeng23 in #1766
- Upgrade accelerate and peft by @wizeng23 in #1774
- [tiny] Pin flash-attn version by @wizeng23 in #1775
- Pin the version of lm_eval to prevent a breaking change in the 4.9 release by @taenin in #1777
- Update inference to resume from temporary result file when possible by @jgreer013 in #1734
- [tiny] Fix gradient checkpointing for Oumi trainer by @wizeng23 in #1778
- [tiny] Remove use_liger argument by @wizeng23 in #1779
- Judge API V2 | Merge Judge and Inference configs by @kaisopos in #1776
Full Changelog: v0.1.14...v0.2.0
v0.1.14
What's Changed
- Record latency histograms in base inference engine by @nikg4 in #1702
- Feat: add falcon-e integration by @younesbelkada in #1705
- [tiny] Minor update to fix the failing pre-commit checks by @oelachqar in #1707
- Add collator kwargs field to DataParams by @oelachqar in #1708
- [vision] Add option to process images individually by @oelachqar in #1706
- Update dev_setup.md to correct the order of steps by @ryan-arman in #1709
- Add configs for molmo support by @oelachqar in #1710
- [tiny] fix pre-commits checks on a fresh install by @oelachqar in #1711
- Add config for the Molmo O variant by @oelachqar in #1712
- Add experimental molmo grpo config and train aliases by @oelachqar in #1713
- Update installation.md to fix subversion handling by adding required … by @ryan-arman in #1715
- Frontier: Fix -n param in launcher script by @nikg4 in #1720
- Fix Falcon H1 dependency setup by @wizeng23 in #1723
- letter count notebook improvements by @penfever in #1697
- [vision] Update vision feature generator to support training on completions only by @oelachqar in #1722
- [tiny] fix bug with vl collator by @oelachqar in #1725
- Add data synthesis config, params, and unit tests by @jgreer013 in #1700
- Add support for additional exception types for remote inference engine, as well as fast failing for non-retryable status codes. by @jgreer013 in #1704
- Adds DPO + QLoRA example for Falcon-H1 by @stefanwebb in #1719
- Update inference to always write intermediate results to file. by @jgreer013 in #1724
- Added doc for new QLoRA param by @stefanwebb in #1727
- Readme for Falcon-E and note on extra dependencies required by @stefanwebb in #1729
- Add generic vision dataset by @oelachqar in #1726
- [tiny][bug] make git cmd optional by @oelachqar in #1730
- [tiny][bug] Add missing molmo feature by @oelachqar in #1731
- [tiny] Update phi3-vision configs to use oumi trainer by @oelachqar in #1733
- Minor bugfixes for 2 clouds in launcher code by @nikg4 in #1728
- Update dev_setup.md to add additional instructions by @ryan-arman in #1736
- Update trl to 0.18 by @wizeng23 in #1693
- Update Verl trainer to export models in HF format by @nikg4 in #1714
- Add lmms-lab/multimodal-open-r1-8k-verified dataset by @oelachqar in #1732
- Add placeholders for DCVLR by @oelachqar in #1738
- add debug logging capabilities to collators by @aniruddh-alt in #1678
- [bug] update trainer to save processor when training with fsdp by @oelachqar in #1742
- Add model revision param by @oelachqar in #1740
- Add ability to customize HF model config via model.model_kwargs by @oelachqar in #1741
- Add docker release workflow by @oelachqar in #1743
- [bug] fix rank/local rank parsing for docker env by @oelachqar in #1747
- deps: Update vLLM to 0.8.3 by @wizeng23 in #1739
- [docs] update dcvlr readme by @oelachqar in #1748
- Dcvlr by @penfever in #1750
New Contributors
- @younesbelkada made their first contribution in #1705
- @ryan-arman made their first contribution in #1709
- @stefanwebb made their first contribution in #1719
- @aniruddh-alt made their first contribution in #1678
Full Changelog: v0.1.13...v0.1.14
v0.1.13
What's Changed
- Update dev_setup.md by @wizeng23 in #1641
- [tiny] Remove vllm install commands by @wizeng23 in #1643
- Support for custom processor args: misc improvements by @nikg4 in #1642
- Add Countdown dataset and reward function by @wizeng23 in #1645
- Adding LoRA train config for Qwen-VL 2.0 by @optas in #1637
- [Evaluation] Convenience function for standard config retrieval by @kaisopos in #1644
- Add demo script by @oelachqar in #1647
- [bug] fix build errors by @oelachqar in #1649
- Adding LoRA train config for SmolVLM by @optas in #1639
- [tiny] Update cli help shorthand by @oelachqar in #1648
- Oelachqar/update hooks by @oelachqar in #1650
- Add verl PPO trainer by @wizeng23 in #1646
- Fix a missing dependency in the verl trainer. by @taenin in #1651
- Integrate verl GRPO trainer into train script by @wizeng23 in #1652
- Update e2e tests to run on lambda by @wizeng23 in #1653
- Add Qwen3 32B configs by @wizeng23 in #1661
- Add Qwen3 30B A3B configs by @wizeng23 in #1665
- [verl] Populate verl config from Oumi config by @wizeng23 in #1659
- Provide option to configure label_ignore_index in training config by @nikg4 in #1666
- [Documentation] Custom Evaluations (PR 1-of-2) by @kaisopos in #1664
- InterVL-3.0 SFT with limited training capabilities by @optas in #1663
- Add verl GRPO Countdown configs by @wizeng23 in #1668
- Set explicit permissions for our test workflows. by @taenin in #1670
- Add support for repetition_penalty in GrpoParams by @REDDITARUN in #1654
- Fix broken tests due to precommit violations by @taenin in #1671
- [Documentation] Custom Evaluations (PR 2-of-2) by @kaisopos in #1669
- Migrate to logger.warning usage by @emmanuel-ferdman in #1673
- Update the Oumi launcher and e2e tests to support runpod. by @taenin in #1672
- Switch back to using GCP for e2e tests. by @taenin in #1675
- Mark an e2e test as is_lora by @taenin in #1676
- Add Phi4 reasoning plus configs by @wizeng23 in #1674
- Fix a test breakage caused by a new Click version (8.2.0) by @taenin in #1679
- chore: edited the link to the stars badge by @Radovenchyk in #1681
- Update verl GRPO countdown configs by @wizeng23 in #1682
- [very nit] center oumi logo in the cli by @oelachqar in #1683
- [tiny] Update training environments doc by @wizeng23 in #1686
- Add Geometry3K VLM dataset by @nikg4 in #1687
- Add torchao version to pyproject.toml by @nikg4 in #1688
- [Feature] Save evaluation config as YAML in output_dir #1546 by @asish-kun in #1680
- Create a script to calculate memory used during training by @wizeng23 in #1441
- Support VLM-s with VERL_GRPO trainer by @nikg4 in #1689
- docs: Add GRPO/verl documentation by @wizeng23 in #1690
- Update GRPO letter counting reward function and hparams for stability by @jgreer013 in #1692
- [GRPO] Update letter counting notebook by @wizeng23 in #1694
- Add Lambda Inference Engine by @oelachqar in #1695
- Basic shell script for launching jobs on OLCF Frontier HPC cluster by @nikg4 in #1691
- Add CoALM dataset class by @oelachqar in #1696
- Added exponential backoff and content-type error handling in remote inference engine by @abhiramvad in #1685
- Make SFT datasets usable with GRPO_TRL trainer by @nikg4 in #1698
- Implement Falcon H1 by @dhiaEddineRhaiem in #1699
- [tiny] Remove deprecated use_async_dataset from configs by @wizeng23 in #1701
- Add sample inference configs for HuggingFaceTB/SmolVLM-Instruct by @nikg4 in #1703
New Contributors
- @REDDITARUN made their first contribution in #1654
- @emmanuel-ferdman made their first contribution in #1673
- @Radovenchyk made their first contribution in #1681
- @asish-kun made their first contribution in #1680
- @abhiramvad made their first contribution in #1685
- @dhiaEddineRhaiem made their first contribution in #1699
Full Changelog: v0.1.12...v0.1.13
v0.1.12
What's Changed
- Add vllm to gpu optional dependencies by @wizeng23 in #1614
- [HallOumi] Update inference notebook by @wizeng23 in #1613
- Update llama4 GCP jobs for non-dev environments. by @taenin in #1621
- Update transformers to 4.51.0 by @wizeng23 in #1620
- Lazy load skypilot by @taenin in #1622
- Add additional_model_kwargs and additional_trainer_kwargs to train function by @hommayushi3 in #1624
- Added 3 Pixmo vision-language datasets by @jrwana in #1523
- [GRPO] Add notebook to demonstrate GRPO & evaluation for letter counting by @wizeng23 in #1625
- [Remote Inference] Update Default Params by @kaisopos in #1630
- Update trl to 0.16 by @wizeng23 in #1631
- Support custom processor args in ModelParams by @nikg4 in #1634
- Support BerryBench evaluation by @wizeng23 in #1635
- [Remote Inference] Error checking for api_key by @kaisopos in #1638
- Rename cnn_mnist_example to cnn_mnist_tutorial by @wizeng23 in #1640
- [Remote Inference][GCP] Constructing api_url from the Project ID and Region by @kaisopos in #1636
Full Changelog: v0.1.11...v0.1.12
v0.1.11
Oumi v0.1.11 Release Notes 🚀
Key Highlights
Model Support 🤖
- Integrated Llama 4 (Scout and Maverick variants) with complete workflow configs 🦙
- Added LoRA training for Phi3, Phi4, and Qwen2.5-VL multimodal models 🖼️
Developer Experience 💻
- Introduced MLflow integration for experiment tracking 📝
- Enhanced CLI with convenient alias functionality ⌨️
HallOumi Framework 🧠
- Added examples for HallOumi
- Added dedicated inference notebooks for both generative and classifier approaches 📓
Welcome to our new contributors @hommayushi3 and @gabrielaugz! 👋
Full Changelog: v0.1.10...v0.1.11
v0.1.10
v0.1.9
What's Changed
- Add QwQ full fine-tune and QLoRA configs by @wizeng23 in #1518
- Update TRL to 0.15 and fix Liger/dataset code by @wizeng23 in #1507
- [tiny] Remove vLLM Colab link and fix Alpaca Eval quickstart by @wizeng23 in #1530
- Evaluation: Inference optimizations by @kaisopos in #1522
- Qwen2.5 VL: Replace "from source" install with transformers>=0.49 by @nikg4 in #1528
- [Evaluation] Renaming evaluation_platform → evaluation_backend by @kaisopos in #1526
- [tiny] Clean up datasets code by @wizeng23 in #1529
- Minor logging improvements in BaseMapDataset by @nikg4 in #1532
- Upload scripts used in a Weekly Walkthrough by @taenin in #1533
- Update VisionLanguageConversationFeatureGenerator by @nikg4 in #1531
- [docs] add security.md by @oelachqar in #1534
- [Evaluation] Custom evaluation notebook: a reliability classifier by @kaisopos in #1535
- Multimodal: Limit max number of images per Conversation by @nikg4 in #1536
- Auto-populate and validate params specific to vision_language_sft collator in TrainingConfig by @nikg4 in #1537
- Update Oumi Env to use Rich formatting by @taenin in #1541
- Update oumi launch to use Rich formatting by @taenin in #1543
- Update oumi evaluate to use rich formatting. by @taenin in #1544
- Update the CLI to replace all prints with Rich prints. by @taenin in #1547
- Render the oumi env command as a shell block in bug reports. by @taenin in #1548
- Define Conversation proto bufs by @nikg4 in #1550
- [Evaluation] Modifying Alpaca Eval results format to be consistent with LM Harness by @kaisopos in #1551
- Augmenting logging training/model statistics by @optas in #1545
- Misc no-op code cleanups by @nikg4 in #1553
- Add code used for the evaluation demo. by @taenin in #1556
- Add OUMI_FORCE_EDITABLE_INSTALL env var to do editable Oumi install from source in job configs by @wizeng23 in #1420
- Add letter counting GRPO example by @wizeng23 in #1539
- Remove UV install from notebooks as this breaks colab by @taenin in #1558
- [Evaluation] Updates in hallucination notebook by @kaisopos in #1552
- [Evaluations] Custom evals: Adding support for eval_kwargs by @kaisopos in #1557
- Logging message update in log_number_of_model_parameters by @nikg4 in #1560
- [Evaluation][Custom] Removing restrictions and better error checking by @kaisopos in #1561
- Support text truncation (max_length) for vision_language_sft collator by @nikg4 in #1559
- phi 4 multimodal training version 1 ( with limitations ) by @optas in #1555
- Phi-4 basic inference with native/vllm by @optas in #1563
- [minor] phi4 train improvements by @optas in #1564
- Fix printing errors in oumi env for non-string values. by @taenin in #1565
Full Changelog: v0.1.8...v0.1.9