NeMo Skills

In this repository we provide pipelines to improve "skills" of large language models (LLMs). Currently we focus on the ability to solve mathematical problems, but you can use our pipelines for many other tasks as well.

Here are some of the things we support.

  • Easily convert models between NeMo, vLLM and TensorRT-LLM formats.
  • Host a server in any of the above formats for large-scale synthetic data generation. You can also call the NVIDIA NIM API or the OpenAI API through the same interface, making it easy to switch from quick prototyping to large-scale Slurm jobs.
  • Evaluate your models on many popular benchmarks (it's easy to add new benchmarks or customize existing settings). The following benchmarks are supported out-of-the-box:
    • Math problem solving: gsm8k, math, amc23, aime24 (and many more)
    • Coding skills: human-eval, mbpp
    • Chat/instruction following: ifeval, arena-hard
    • General knowledge: mmlu (generative)
  • Train models using NeMo-Aligner.
  • We support other pipelines as well, such as LLM-based dataset decontamination or using LLM-as-a-judge. And it's easy to add new workflows!

To get started, follow the prerequisites and then run ns --help to see all available commands and their options.

OpenMathInstruct-2

Using our pipelines we created the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (more than 600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset.

The models trained on this dataset achieve strong results on common mathematical benchmarks.

| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|---|---|---|---|---|---|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.1 | 68.0 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
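The majority@256 rows report accuracy when 256 solutions are sampled per problem and the most common final answer is taken as the prediction. As a minimal sketch of that voting step (the function name is illustrative, not part of the NeMo-Skills API):

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer among sampled solutions.

    `answers` holds the answer extracted from each sampled solution;
    `None` marks samples where no final answer could be parsed.
    """
    answers = [a for a in answers if a is not None]
    if not answers:
        return None
    # most_common(1) gives [(answer, count)] for the top answer.
    return Counter(answers).most_common(1)[0][0]


# Example: 5 sampled solutions to one problem; "42" wins the vote.
print(majority_vote(["42", "41", "42", None, "42"]))  # prints "42"
```

In practice the vote is taken over hundreds of samples per problem, which is why majority@256 improves substantially over single-sample (greedy) accuracy in the table above.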

We provide all instructions to fully reproduce our results.

Nemo Inspector

We also provide a convenient tool for visualizing inference results and analyzing data.

Demos (GIFs in the repository): overview of the tool, the inference page, and the analyze page.

Papers

If you find our work useful, please consider citing us!

New paper TBD

@article{toshniwal2024openmath,
  title   = {OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset},
  author  = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},
  year    = {2024},
  journal = {arXiv preprint arXiv:2402.10176}
}

Disclaimer: This project is strictly for research purposes and is not an official NVIDIA product.
