We propose Rodimus*, comprising Rodimus and Rodimus+, which aims to break the accuracy-efficiency trade-off of vanilla Transformers by introducing several innovative features.
Rodimus:
- Linear attention-based, purely recurrent model.
- Incorporates Data-Dependent Tempered Selection (DDTS) for semantic compression.
- Reduced memory usage.
Rodimus+:
- Hybrid model combining Rodimus with Sliding Window Shared-Key Attention (SW-SKA).
- Enhances semantic, token, and head compression.
Rodimus+-Coder:
- We train and open-source the lightweight Rodimus+-Coder models, available in 1.6B and 4B sizes, achieving performance that surpasses SOTA models of similar size.
- Constant memory footprint with better language-modeling performance.
- Better scaling performance than the Transformer.
- A genuinely lightweight model, free of the O(T) KV-cache memory complexity (see the sketch below).
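To make the memory claim concrete, here is a hypothetical back-of-the-envelope comparison of a Transformer KV cache against a fixed-size recurrent state (all dimensions below are assumed values for illustration, not Rodimus's actual configuration):

```python
# Back-of-the-envelope memory comparison (illustrative only; all sizes are
# assumed values, not the actual Rodimus configuration).
BYTES = 2            # fp16
LAYERS, HEADS, HEAD_DIM = 24, 16, 128
STATE_DIM = 64       # assumed per-head recurrent state expansion

def kv_cache_bytes(seq_len: int) -> int:
    # A Transformer KV cache grows linearly with sequence length T: O(T).
    return 2 * LAYERS * HEADS * HEAD_DIM * seq_len * BYTES  # keys + values

def recurrent_state_bytes() -> int:
    # A purely recurrent model keeps a fixed-size state, independent of T.
    return LAYERS * HEADS * HEAD_DIM * STATE_DIM * BYTES

for T in (2_048, 32_768):
    print(f"T={T}: KV cache {kv_cache_bytes(T) / 2**20:.0f} MiB "
          f"vs fixed state {recurrent_state_bytes() / 2**20:.0f} MiB")
```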
These checkpoints completed training before the paper was submitted; they can be used to reproduce the benchmarks reported in the paper.
If you want a more practical model, we strongly recommend downloading the checkpoints under Rodimus+-Coder.
Model (2024/10/01) | #Total Params | Training Tokens | Context Length | Download |
---|---|---|---|---|
Rodimus-1.4B-Base | 1.4B | 500B | 2K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-1.6B-Base | 1.6B | 1T | 2K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-Coder-1.6B-Base-20241001 | 1.6B | 2.5T | 4K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-Coder-1.6B-Base-20241001 is the model enhanced by the multi-stage training on math and code datasets described in the paper.
Refer to the following table to choose the model that fits your use case. If you are located in mainland China, we also provide the models on modelscope.cn to speed up the download process.
Model | #Total Params | Training Tokens | Context Length | Download |
---|---|---|---|---|
Rodimus+-Coder-1.6B-Base | 1.6B | 8.2T | 4K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-Coder-1.6B-Chat | 1.6B | - | 4K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-Coder-4B-Base | 4B | 8.2T | 4K | 🤗 HuggingFace 🤖 ModelScope |
Rodimus+-Coder-4B-Chat | 4B | - | 4K | 🤗 HuggingFace 🤖 ModelScope |
We re-evaluated the Qwen-series models ourselves; metrics for the other model series are quoted from their original papers. For the detailed evaluation code, please refer to the evaluation method of Ling-Coder-Lite in CodeFuse-Evaluation.
Datasets | Qwen2.5-Coder-1.5B | Rodimus+-Coder-1.6B-Base | Gemma2-2B-PT | Qwen2.5-Coder-3B | Rodimus+-Coder-4B-Base | Gemma3-4B-PT | Qwen2.5-Coder-7B |
---|---|---|---|---|---|---|---|
Coding Tasks | |||||||
HumanEval | 41.5 | 51.2 | 19.5 | 51.8 | 60.4 | 36.0 | 60.4 |
HumanEval+ | 34.8 | 45.1 | - | 40.9 | 52.4 | - | 50.6 |
MBPP | 57.2 | 51.2 | 31.0 | 62.6 | 64.6 | 46.0 | 70.0 |
MBPP+ | 66.1 | 62.2 | - | 65.9 | 71.4 | - | 70.1 |
BCB-Completion | 21.6 | 17.9 | - | 26.2 | 30.8 | - | 30.4 |
MultiPL-E | 46.1 | 52.5 | - | 49.4 | 60.7 | - | 56.9 |
CRUXEval | 38.5 | 45.1 | - | 44.6 | 56.4 | - | 56.8 |
Coding Avg. | 43.7 | 46.5 | - | 48.8 | 56.7 | - | 56.4 |
General Tasks | |||||||
C-EVAL | 55.2 | 56.7 | - | 65.3 | 70.2 | - | 69.1 |
CMMLU | 54.5 | 52.3 | - | 65.4 | 68.3 | - | 72.7 |
MMLU | 55.5 | 51.1 | 52.2 | 63.3 | 62.6 | 59.6 | 70.5 |
BBH | 21.8 | 46.8 | 42.4 | 32.5 | 61.9 | 50.9 | 67.3 |
General Avg. | 46.8 | 51.7 | - | 56.6 | 65.8 | - | 69.9 |
Mathematics Tasks | |||||||
GSM8K | 60.4 | 68.7 | 25.0 | 72.1 | 78.5 | 38.4 | 83.4 |
MATH | 23.7 | 29.0 | 16.4 | 31.9 | 37.0 | 24.2 | 42.2 |
Math Avg. | 41.9 | 48.9 | 20.7 | 52.0 | 57.8 | 31.3 | 62.8 |
Overall | |||||||
Overall | 44.4 | 48.4 | - | 51.7 | 59.6 | - | 61.6 |
Datasets | Qwen2.5-Coder-1.5B-Instruct | Rodimus+-Coder-1.6B-Chat | Gemma2-2B-IT | Qwen2.5-Coder-3B-Instruct | Phi-4-Mini-3.8B | Rodimus+-Coder-4B-Chat | Gemma3-4B-IT | Qwen2.5-Coder-7B-Instruct |
---|---|---|---|---|---|---|---|---|
Coding Tasks | ||||||||
HumanEval | 64.6 | 76.8 | 20.1 | 79.9 | 74.4 | 86.6 | 71.3 | 87.2 |
HumanEval+ | 63.4 | 73.8 | - | 80.5 | 68.3 | 82.9 | - | 82.3 |
MBPP | 51.0 | 59.0 | 36.6 | 59.2 | 65.3 | 68.0 | 63.2 | 75.8 |
MBPP+ | 53.0 | 66.4 | - | 61.9 | 63.8 | 68.5 | - | 75.1 |
LCB(24.08-24.11) | 4.0 | 10.9 | - | 13.0 | - | 13.9 | - | 22.8 |
BCB-Instruct | 10.8 | 21.5 | - | 21.7 | 33.8 | 26.6 | - | 30.6 |
HumanEval-Mul | 50.8 | 57.3 | - | 67.4 | - | 70.6 | - | 76.1 |
MBPP-Mul | 43.4 | 52.4 | - | 53.4 | - | 59.6 | - | 61.4 |
MBXP-EN | 55.8 | 75.5 | - | 76.0 | - | 87.3 | - | 87.7 |
MBXP-CN | 48.8 | 75.0 | - | 68.7 | - | 84.3 | - | 83.5 |
CRUXEval | 28.6 | 55.0 | - | 51.6 | - | 63.2 | - | 69.3 |
HumanEvalFix | 38.9 | 52.6 | - | 55.5 | - | 68.8 | - | 69.3 |
Spider | 61.2 | 71.4 | - | 71.8 | 42.2 | 73.5 | - | 82.0 |
Coding Avg. | 44.2 | 57.5 | - | 58.5 | - | 65.7 | - | 69.5 |
General Tasks | ||||||||
C-EVAL | 51.5 | 50.8 | - | 62.0 | - | 61.6 | - | 66.4 |
CMMLU | 45.2 | 50.5 | - | 60.1 | - | 62.0 | - | 64.9 |
MMLU | 52.0 | 49.3 | 56.1 | 61.7 | 67.3 | 57.5 | 58.1 | 66.1 |
BBH | 24.2 | 58.7 | 41.4 | 57.3 | 70.4 | 63.7 | 72.2 | 59.1 |
General Avg. | 43.2 | 52.3 | - | 60.3 | - | 61.2 | - | 64.1 |
Mathematics Tasks | ||||||||
GSM8K | 54.4 | 68.5 | 62.6 | 73.5 | 88.6 | 79.2 | 89.2 | 79.5 |
MATH | 38.1 | 33.5 | 27.2 | 44.1 | 64.0 | 44.1 | 75.6 | 60.8 |
Math Avg. | 46.2 | 51.0 | 44.9 | 58.8 | 68.8 | 61.7 | 82.4 | 70.1 |
Overall | ||||||||
Overall | 44.2 | 55.8 | - | 58.9 | - | 64.3 | - | 68.4 |
- The latest version of `transformers` is recommended (at least 4.42.0).
- We evaluate our models with `python=3.8` and `torch==2.1.2`.
- If you use Rodimus, you need to install `flash-linear-attention`, `causal_conv1d`, and `triton>=2.2.0`. If you use Rodimus+, you additionally need to install `flash-attention`.
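To confirm the environment is set up, a quick version check can be run first (a minimal sketch; the PyPI distribution names, e.g. `causal-conv1d` and `flash-attn`, are assumptions and may differ from the import names):

```python
# Minimal environment check; the PyPI distribution names below are
# assumptions and may differ from the package import names.
import importlib.metadata as md

for pkg in ("transformers", "torch", "triton",
            "flash-linear-attention", "causal-conv1d", "flash-attn"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```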
In `examples/generation_script.py`, we provide a code snippet showing how to use the model for generation:
```python
import torch

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# Load the model and tokenizer from a local checkpoint directory.
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda",
).eval()

# Inference.
input_prompt = "你好!你是谁?"  # "Hello! Who are you?"
model_inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=32)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```
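For sampling instead of the default greedy decoding, the standard `transformers` generation arguments apply (a minimal sketch reusing `model`, `tokenizer`, and `model_inputs` from above; the hyperparameter values are arbitrary examples):

```python
# Sampling-based generation; the hyperparameter values are arbitrary examples.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=64,
    do_sample=True,    # stochastic sampling instead of greedy search
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```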
In `examples/chat_script.py`, we further show how to chat with Rodimus+:
```python
import torch

from modeling_rodimus import RodimusForCausalLM
from tokenization_rodimus_fast import RodimusTokenizer

# Load the model and tokenizer from a local checkpoint directory.
ckpt_dir = "model_path"
tokenizer = RodimusTokenizer.from_pretrained(ckpt_dir)
model = RodimusForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.float16,
    device_map="cuda",
).eval()

# Inference.
input_prompt = "简单介绍一下大型语言模型。"  # "Briefly introduce large language models."
messages = [
    {"role": "HUMAN", "content": input_prompt}
]

# Render the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    system='You are Rodimus$+$, created by AntGroup. You are a helpful assistant.',
    tokenize=False,
)
print(text)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=2048)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response)
```
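To continue a multi-turn conversation, append the model's reply and the next user turn to `messages`, then re-apply the chat template (a minimal sketch reusing the objects above; the `"ASSISTANT"` role name is our assumption, so check the tokenizer's chat template for the exact role keys):

```python
# Multi-turn chat sketch, reusing `model`, `tokenizer`, `messages`, and
# `response` from the example above. The "ASSISTANT" role name is an
# assumption; inspect the tokenizer's chat template for the exact role key.
messages.append({"role": "ASSISTANT", "content": response})
messages.append({"role": "HUMAN", "content": "What are some typical applications?"})

text = tokenizer.apply_chat_template(
    messages,
    system='You are Rodimus$+$, created by AntGroup. You are a helpful assistant.',
    tokenize=False,
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_length=2048)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```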
If you find our work helpful, please consider citing us:
```bibtex
@inproceedings{he2025rodimus,
  title={Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions},
  author={Zhihao He and Hang Yu and Zi Gong and Shizhan Liu and Jianguo Li and Weiyao Lin},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=IIVYiJ1ggK}
}
```