In this repository we provide a pipeline to improve the "skills" of large language models (LLMs). Currently we focus on the ability to solve simple mathematical problems, but more skills are coming (such as coding and table understanding).
Our pipeline consists of 3 steps and can be directly applied to any LLM that is supported in NVIDIA's NeMo Toolkit.
- Setup
  - Pick a "student" model that you want to improve, e.g. Mistral-7B.
  - [optionally] Pick a "teacher" model, e.g. Mixtral-8x7B (the student model itself can also be used).
  - Choose evaluation benchmarks and training datasets, e.g. GSM8K and MATH.
- Generate synthetic data
  - Write a few examples of the kind of solutions you want the student LLM to learn, e.g. to teach it to use code to solve math problems.
  - Run a large-scale generation of diverse solutions on the training datasets, showing your examples in the prompt to the teacher model.
  - Filter the generated solutions based on correctness and quality (see the sketch after this list).
- Finetune the student model on the generated dataset.
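To make the filtering step concrete, here is a minimal sketch of correctness-based filtering. It assumes generated solutions are stored as JSONL with hypothetical `predicted_answer` and `expected_answer` fields; the actual pipeline's data format and filtering logic may differ.

```python
# Minimal sketch of correctness-based filtering (hypothetical JSONL schema:
# each line has "predicted_answer" and "expected_answer" fields).
import json

def filter_correct(input_path: str, output_path: str) -> None:
    """Keep only generated solutions whose final answer matches the ground truth."""
    with open(input_path) as fin, open(output_path, "w") as fout:
        for line in fin:
            sample = json.loads(line)
            # Normalize both answers before comparing to avoid trivial mismatches.
            predicted = str(sample["predicted_answer"]).strip()
            expected = str(sample["expected_answer"]).strip()
            if predicted == expected:
                fout.write(line)

filter_correct("generated_solutions.jsonl", "filtered_solutions.jsonl")
```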
We release a series of OpenMath models improved with this pipeline. They are among the best open models for solving mathematical problems and are currently the only state-of-the-art open models that do not rely on OpenAI for data generation!
| model | GSM8K (greedy) | MATH (greedy) | GSM8K (majority@50) | MATH (majority@50) |
| --- | ---: | ---: | ---: | ---: |
| GPT-4 [1] | 94.4 | 56.2 | - | - |
| GPT-4 + code [2] | 92.9 | 69.7 | - | - |
| OpenMath-CodeLlama-7B (nemo \| HF) | 75.9 | 43.6 | 84.8 | 55.6 |
| OpenMath-Mistral-7B (nemo \| HF) | 80.2 | 44.5 | 86.9 | 57.2 |
| OpenMath-CodeLlama-13B (nemo \| HF) | 78.8 | 45.5 | 86.8 | 57.6 |
| OpenMath-CodeLlama-34B (nemo \| HF) | 80.7 | 48.3 | 88.0 | 60.2 |
| OpenMath-Llama2-70B (nemo \| HF) | 84.7 | 46.3 | 90.1 | 58.3 |
| OpenMath-CodeLlama-70B (nemo \| HF) | 84.6 | 50.7 | 90.8 | 60.4 |
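Here, greedy refers to greedy decoding and majority@50 to majority voting over 50 sampled solutions. A minimal sketch of the voting metric, assuming the final answers have already been extracted from each sampled solution:

```python
# Minimal sketch of majority@k: pick the most frequent answer among k samples.
from collections import Counter

def majority_at_k(sampled_answers: list[str]) -> str:
    """Return the most common extracted answer among the k sampled solutions."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Example: 50 sampled answers for one problem, 30 of which agree on "42".
answers = ["42"] * 30 + ["41"] * 15 + ["40"] * 5
assert majority_at_k(answers) == "42"
```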
We also release OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs generated using the permissively licensed Mixtral-8x7B model.
Please see our paper "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" for more details!
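For quick exploration, the dataset can be loaded with the Hugging Face `datasets` library. The `nvidia/OpenMathInstruct-1` id below is our assumption; check the dataset page for the exact name.

```python
# Minimal sketch of loading OpenMathInstruct-1 from the Hugging Face Hub.
# The dataset id is an assumption; verify it on the Hub before running.
from datasets import load_dataset

dataset = load_dataset("nvidia/OpenMathInstruct-1", split="train")
print(dataset[0])  # inspect one problem-solution pair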
Run inference with our models in just a few commands!
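As a rough illustration (our repo provides its own commands), a minimal Hugging Face `transformers` sketch might look like this; the model id is an assumption, so substitute one of the HF links from the table above.

```python
# Minimal inference sketch via Hugging Face transformers.
# The model id is hypothetical; substitute one of the HF links from the table.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/OpenMath-Mistral-7B-v0.1-hf"  # assumption: verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What is 15% of 240?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```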
We provide all instructions to fully reproduce our results.
If you want to improve your own models or learn more about our pipeline, read through the relevant docs below.
Any model that is supported by NeMo can be used as a "student". Many popular models are supported, e.g. LLaMA2, CodeLLaMA, Mistral-7B and Mixtral-8x7B. For the "teacher" you can use virtually any openly available LLM, since only inference support is needed.
We currently support the following datasets:
Evaluation:
Training:
Please check out evaluation and finetuning sections to learn more!
If you find our work useful, please consider citing us!
@article{toshniwal2024openmath,
title = {OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset},
author = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},
year = {2024},
journal = {arXiv preprint arXiv:2402.10176}
}
Disclaimer: This project is strictly for research purposes and is not an official product from NVIDIA.