ATOM (AiTer Optimized Model) is a lightweight vLLM-like inference engine focused on integration and optimization on top of aiter.
- ROCm Optimized: Built on AMD's ROCm platform with torch.compile support
- Model Support: Compatible with Deepseek, Qwen, Llama, and Mixtral.
- Easy Integration: Simple API for quick deployment

Prerequisites:
- AMD GPU with ROCm support
- Docker
Pull and start the ROCm PyTorch container:

```bash
docker pull rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.8.0

docker run -it --network=host \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v $HOME:/home/$USER \
    -v /mnt:/mnt \
    -v /data:/data \
    --shm-size=16G \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.8.0
```

Inside the container, install aiter:

```bash
pip install aiter -i https://mkmartifactory.amd.com/artifactory/api/pypi/hw-orc3pypi-prod-local/simple
```
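Optionally, before building ATOM, confirm that the container's ROCm build of PyTorch can see the GPU. This is a minimal sanity-check sketch using only standard PyTorch APIs (nothing ATOM-specific):

```python
# Minimal sanity check inside the ROCm container (standard PyTorch APIs only).
import torch

# On ROCm builds of PyTorch, the torch.cuda.* API is backed by HIP,
# and torch.version.hip reports the HIP version (it is None on CUDA builds).
print("HIP version:", torch.version.hip)
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```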
Clone and install ATOM:

```bash
git clone https://github.com/ROCm/ATOM.git
cd ./ATOM
pip install .
```

The default optimization level is 3 (running with torch.compile). Supported models include Deepseek, Qwen, Llama, and Mixtral.
```bash
python -m atom.examples.simple_inference --model meta-llama/Meta-Llama-3-8B
```

Note: First-time execution may take approximately 10 minutes for model compilation.
Profile offline inference:

```bash
python -m atom.examples.profile_offline --model Qwen/Qwen3-0.6B
```

Or profile offline with a custom input length:

```bash
python -m atom.examples.profile_offline --model Qwen/Qwen3-0.6B --random-input --input-length 1024 --output-length 32
```

Profile online inference (after starting the server):

```bash
python -m atom.examples.profile_online
```

Or profile online with a custom input length:

```bash
python -m atom.examples.profile_online --model Qwen/Qwen3-0.6B --random-input --input-length 1024 --output-length 32
```

Or send the start-profile and stop-profile requests directly:
```bash
curl -s -S -X POST http://127.0.0.1:8000/start_profile
curl -s -S -X POST http://127.0.0.1:8000/stop_profile
```
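The same profiling endpoints can also be driven from a script. Below is a minimal sketch, assuming an ATOM server is already listening on 127.0.0.1:8000 and serving Qwen/Qwen3-0.6B (see the server startup commands below); the /v1/completions payload is illustrative:

```python
# Sketch: wrap a client request between start_profile / stop_profile calls.
# Assumes an ATOM OpenAI-compatible server is running on 127.0.0.1:8000 and
# serving Qwen/Qwen3-0.6B; adjust the model name and prompt to your setup.
import requests

BASE = "http://127.0.0.1:8000"

# Start server-side profiling.
requests.post(f"{BASE}/start_profile").raise_for_status()

# Issue the workload to be profiled (illustrative OpenAI-style completion request).
resp = requests.post(
    f"{BASE}/v1/completions",
    json={"model": "Qwen/Qwen3-0.6B", "prompt": "Hello, my name is", "max_tokens": 32},
)
resp.raise_for_status()

# Stop profiling.
requests.post(f"{BASE}/stop_profile").raise_for_status()
```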
Run the online throughput benchmark. First, start the server:

```bash
python -m atom.entrypoints.openai_server --model Qwen/Qwen3-0.6B

# or, for DeepSeek-R1:
python -m atom.entrypoints.openai_server --model deepseek-ai/DeepSeek-R1 -tp 8 --block-size 1
```
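Before benchmarking, you can verify that the server answers completion requests. A minimal sketch using the openai Python client (assumed to be installed separately); the api_key value is a placeholder, since the local server is not expected to enforce authentication:

```python
# Sketch: send one completion request to the local server to check it is serving.
# Assumes the `openai` Python client is installed and the server above is running
# on localhost:8000; the api_key is a placeholder, not a real credential.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Qwen/Qwen3-0.6B",  # use the model name the server was started with
    prompt="The capital of France is",
    max_tokens=16,
)
print(completion.choices[0].text)
```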
Then run the benchmark:

```bash
MODEL=deepseek-ai/DeepSeek-R1
ISL=1024
OSL=1024
CONC=128
PORT=8000
RESULT_FILENAME=Deepseek-R1-result

python benchmark_serving.py \
    --model=$MODEL --backend=vllm --base-url=http://localhost:$PORT \
    --dataset-name=random \
    --random-input-len=$ISL --random-output-len=$OSL \
    --random-range-ratio 0.8 \
    --num-prompts=$(( $CONC * 10 )) \
    --max-concurrency=$CONC \
    --request-rate=inf --ignore-eos \
    --save-result --percentile-metrics="ttft,tpot,itl,e2el" \
    --result-dir=./ --result-filename=$RESULT_FILENAME.json
```
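The benchmark saves its metrics to the file given by --result-filename. A minimal sketch for inspecting that JSON without assuming its exact schema:

```python
# Sketch: dump the saved benchmark result; no assumptions about its exact schema.
# Assumes the benchmark above was run with --result-filename=Deepseek-R1-result.
import json
from pprint import pprint

with open("Deepseek-R1-result.json") as f:
    result = json.load(f)

# Print every metric the benchmark recorded.
pprint(result)
```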
ATOM demonstrates significant performance improvements over vLLM:

| Model | Framework | Tokens | Time | Throughput |
|---|---|---|---|---|
| Qwen3-0.6B | ATOM | 4096 | 0.25s | 16,643.74 tok/s |
| Qwen3-0.6B | vLLM | 4096 | 0.63s | 6,543.06 tok/s |
| Llama-3.1-8B-Instruct-FP8-KV | ATOM | 4096 | 0.68s | 5,983.37 tok/s |
| Llama-3.1-8B-Instruct-FP8-KV | vLLM | 4096 | 1.68s | 2,432.62 tok/s |
Deepseek-V3:
| Concurrency | ISL/OSL | Num Prompts | vLLM Throughput | ATOM Throughput |
|---|---|---|---|---|
| 16 | 1024/1024 | 128 | 423.68 tok/s | 922.03 tok/s |
| 32 | 1024/1024 | 128 | 629.06 tok/s | 1488.52 tok/s |
| 64 | 1024/1024 | 128 | 760.22 tok/s | 2221.25 tok/s |
| 128 | 1024/1024 | 128 | 1107.93 tok/s | 2254.88 tok/s |
First, install lm-eval to test model accuracy:
```bash
pip install "lm-eval[api]"
```

Next, start an OpenAI-compatible server using openai_server.py:

```bash
python -m atom.entrypoints.openai_server --model meta-llama/Meta-Llama-3-8B
```

Finally, run the evaluation on your chosen tasks:
```bash
lm_eval --model local-completions \
    --model_args model=meta-llama/Meta-Llama-3-8B,base_url=http://localhost:8000/v1/completions,num_concurrent=8,max_retries=3,tokenized_requests=False \
    --tasks gsm8k \
    --num_fewshot 3
```

This project was adapted from nano-vllm (https://github.com/GeeeekExplorer/nano-vllm).
