⚡️ Nanotron

Installation • Quick Start • Features • Benchmarks • Contributing

Pretraining models made easy

Nanotron is a library for pretraining transformer models. It provides a simple and flexible API to pretrain models on custom datasets, and it is built with the following principles in mind:

Simplicity: Nanotron is designed to be easy to use. It provides a simple and flexible API to pretrain models on custom datasets.
Performance: Optimized for speed and scalability, Nanotron uses the latest techniques to train models faster and more efficiently.

📚 Check out our Ultrascale Playbook - A comprehensive guide to efficiently scale LLM training with Nanotron!

Installation

To run the code in this project, first create a Python virtual environment using e.g. uv:

uv venv nanotron --python 3.11 && source nanotron/bin/activate && uv pip install --upgrade pip

Tip

Hugging Face cluster users: add export UV_LINK_MODE=copy to your .bashrc to suppress cache warnings from uv.

Next, install PyTorch:

uv pip install torch --index-url https://download.pytorch.org/whl/cu124

Then install the core dependencies with:

uv pip install -e .

To run the example scripts, install the remaining dependencies as follows:

# Quote the extras so that shells such as zsh do not expand the brackets
uv pip install datasets transformers "datatrove[io]" numba wandb
# Fused kernels
uv pip install ninja triton "flash-attn>=2.5.0" --no-build-isolation

Next, log in to your Hugging Face and Weights & Biases accounts:

huggingface-cli login
wandb login

Finally, check whether your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:

git-lfs --version

If it isn't installed, run:

sudo apt-get install git-lfs
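
To confirm the steps above worked, a small check script can help. The following is a minimal sketch, not part of Nanotron: the function name check_environment and the lists of names are assumptions based on the packages and tools installed in this section (the flash-attn package is imported as flash_attn). It only tests whether each module can be located and whether each tool is on your PATH; it does not verify versions or GPU support.

```python
import importlib.util
import shutil

# Import names for the packages installed above (assumed list; adjust as needed).
# Note that the flash-attn distribution is imported as `flash_attn`.
MODULES = ["torch", "nanotron", "datasets", "transformers", "datatrove",
           "numba", "wandb", "triton", "flash_attn"]
# Command-line tools used in the installation steps.
TOOLS = ["torchrun", "huggingface-cli", "wandb", "git-lfs"]


def check_environment(modules=MODULES, tools=TOOLS):
    """Return {name: bool} for modules that can be located and tools on PATH."""
    report = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[f"tool:{name}"] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {key}")
```

Any entry reported as MISSING points back to the corresponding install command in this section.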

Quick Start

Training a tiny Llama model

The following command will train a tiny Llama model on a single node of 8 x H100s in about 10 minutes:

CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml

The model will be saved in the checkpoints directory as specified in the config file.
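
If you prefer to start training from a Python script (for example, to sweep over several configs), the command above can be wrapped as shown below. This is a minimal sketch under stated assumptions: the helper names build_train_command, build_env and launch are hypothetical and not part of Nanotron; they only reproduce the torchrun invocation and the CUDA_DEVICE_MAX_CONNECTIONS setting shown above.

```python
import os
import shlex
import subprocess


def build_train_command(config_file, nproc_per_node=8):
    """Return the torchrun argv for run_train.py, mirroring the command above."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "run_train.py",
        "--config-file",
        config_file,
    ]


def build_env(base_env=None):
    """Copy the environment and set the variable used in the command above."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"
    return env


def launch(config_file, nproc_per_node=8, dry_run=True):
    """Print the command (dry run) or execute it and return the exit code."""
    cmd = build_train_command(config_file, nproc_per_node)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, env=build_env(), check=False).returncode


if __name__ == "__main__":
    # Dry run by default so nothing is launched by accident.
    launch("examples/config_tiny_llama.yaml", nproc_per_node=8, dry_run=True)
```

Run it from the repository root so that run_train.py and the config path resolve, and pass dry_run=False to actually start training.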

Note

You can use examples/config_tiny_llama.py to generate your own training config.

For detailed instructions on training your first model, check out our Your First Training guide. For multi-node training with Slurm, see our Multi-Node Training guide.

Run generation from your checkpoint

torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/{checkpoint_number}/ --tp 1 --pp 1

Increase the value of --tp (tensor parallel) to accelerate generation with multiple GPUs, and use a larger value of --pp (pipeline parallel) for very large models.
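
When changing --tp and --pp, the number of launched processes has to change with them. The sketch below assumes the usual convention for combined parallelism, namely that the process count equals tp × pp × dp (with dp = 1 here); this README does not state that rule explicitly, so treat it as an assumption. The helper names and the checkpoint path in the usage line are hypothetical.

```python
def required_processes(tp, pp, dp=1):
    """Number of processes, assuming world size = tp * pp * dp."""
    if min(tp, pp, dp) < 1:
        raise ValueError("tp, pp and dp must all be >= 1")
    return tp * pp * dp


def generate_command(ckpt_path, tp=1, pp=1):
    """Build the run_generate.py command line for the given parallel sizes."""
    nproc = required_processes(tp, pp)
    return (
        f"torchrun --nproc_per_node={nproc} run_generate.py "
        f"--ckpt-path {ckpt_path} --tp {tp} --pp {pp}"
    )


if __name__ == "__main__":
    # Hypothetical checkpoint path, for illustration only.
    print(generate_command("checkpoints/10/", tp=2, pp=1))
```

With tp=1 and pp=1 this reproduces the single-process command shown above.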

Debugging with VSCode

To debug with VSCode, add the following configuration to your launch.json file:

{
    "name": "run_train.py",
    "type": "python",
    "request": "launch",
    "program": "torchrun", // or full path to torchrun by running `which torchrun`
    "console": "integratedTerminal",
    "justMyCode": false,
    "args": [
        "--nproc_per_node=2",
        "run_train.py",
        "--config-file=examples/config_tiny_llama.yaml", // or use examples/config_tiny_llama.py to generate your own config
    ],
    "env": {
        // "NANOTRON_BENCHMARK": "1", // enable to benchmark your training for a couple of steps
        "CUDA_DEVICE_MAX_CONNECTIONS": "1",
        "WANDB_MODE": "disabled",
    }
},

Note

For more information, see the Debugging Nanotron example (on multiple GPUs).

Custom examples

You can find more examples in the /examples directory:

Example	Description
`custom-dataloader`	Plug a custom dataloader into Nanotron
`datatrove`	Use the datatrove library to load data
`doremi`	Use DoReMi to speed up training
`mamba`	Train an example Mamba model
`moe`	Train an example Mixture-of-Experts (MoE) model
`mup`	Use spectral µTransfer to scale up your model
`examples/config_tiny_llama_with_s3_upload.yaml`	Automatically upload checkpoints to S3

We're working on adding more examples. Feel free to open a PR to add your own! 🚀

Benchmarks

We've conducted extensive benchmarking of Nanotron across various model sizes and configurations. The complete benchmark data, configurations, and logs are available in our ultrascale-playbook-data repository.

The diagram above shows the best configurations we found for each model size and node count in nanotron v0.5, highlighting MFU (Model FLOPs Utilization) and memory usage. These are the most efficient training setups identified in our benchmarks. Stay tuned for more optimizations! 🚀

For detailed analysis and best practices derived from these benchmarks, see our Ultrascale Playbook.
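
To make the MFU figure concrete, here is a minimal sketch of how it is commonly estimated. It is not Nanotron's own benchmarking code. It assumes the widely used approximation of about 6 × N training FLOPs per token for a dense transformer with N parameters (which ignores attention FLOPs), and it takes the per-GPU peak throughput as an input that you supply from your hardware's datasheet. The numbers in the usage example are hypothetical and are not benchmark results.

```python
def model_flops_per_token(num_params):
    """Approximate training FLOPs per token for a dense transformer (6 * N)."""
    return 6 * num_params


def mfu(num_params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Model FLOPs Utilization: achieved model FLOPs/s over peak hardware FLOPs/s."""
    if num_gpus < 1 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus must be >= 1 and peak_flops_per_gpu > 0")
    achieved = model_flops_per_token(num_params) * tokens_per_second
    peak = num_gpus * peak_flops_per_gpu
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical inputs: 1B parameters, 400k tokens/s across 8 GPUs,
    # each with an assumed peak of 1e15 FLOPs/s.
    value = mfu(num_params=1e9, tokens_per_second=400_000,
                num_gpus=8, peak_flops_per_gpu=1e15)
    print(f"MFU = {value:.1%}")
```

Because the 6 × N rule leaves out attention, it underestimates the model FLOPs for long sequences; use it for rough comparisons only.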

Features

We currently support the following features:

And we have on our roadmap:

Credits

We would like to thank everyone working on LLMs, especially those who share their work openly and from whom we took great inspiration: Nvidia for Megatron-LM/apex, Microsoft for DeepSpeed, and HazyResearch for flash-attn.
