# EverMemModel

License: MIT

This repository contains the official implementation for the paper: EverMemModel.

πŸ“ Abstract

Large Language Models (LLMs) struggle in knowledge-intensive domains that require deep, specialized knowledge. While Retrieval-Augmented Generation (RAG) is a common solution, its decoupled retrieve-then-read pipeline suffers from misaligned objectives and is prone to performance degradation from distractor documents. We propose EverMemModel, a unified, end-to-end trainable memory model that can handle memory contexts on the scale of 100 million tokens. Our model achieves state-of-the-art results on both retrieval and question-answering benchmarks, significantly outperforming traditional RAG pipelines and long-context models.

## ✨ Key Contributions

- **End-to-End Memory Model**: We propose EverMemModel, a unified architecture that seamlessly integrates retrieval and generation, moving beyond the limitations of decoupled RAG systems.
- **State-of-the-Art Performance**: EverMemModel achieves SOTA performance on both a retrieval benchmark (NQ320K) and question-answering tasks (MS MARCO and TriviaQA).
- **Massive-Scale Context**: Thanks to its efficient architecture, EverMemModel is one of the first models capable of handling contexts of up to 100M tokens.
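To make the first contribution concrete, the sketch below contrasts the calling pattern of a decoupled retrieve-then-read pipeline with that of a unified memory model. All function names here are illustrative stand-ins, not the EverMemModel implementation or API.

```python
# Illustrative contrast between decoupled RAG and a unified memory model.
# Hypothetical names throughout; this is NOT the repository's code.

def decoupled_rag(query, retriever, reader, k=5):
    """Retrieve-then-read: the retriever and reader are trained separately,
    so the ranking objective is not aligned with final answer quality, and
    distractor documents pass straight through to the reader."""
    docs = retriever(query)[:k]      # frozen or separately trained component
    return reader(query, docs)

def unified_memory_model(query, model, memory):
    """End-to-end memory model: a single network conditions directly on the
    memory, so memory access and generation share one training objective."""
    return model(query, memory)

# Toy stand-ins just to show the calling pattern:
toy_retriever = lambda q: ["doc_a", "doc_b", "doc_c"]
toy_reader = lambda q, docs: f"answer({q} | {len(docs)} docs)"
toy_model = lambda q, mem: f"answer({q} | memory[{len(mem)} tokens])"

print(decoupled_rag("who?", toy_retriever, toy_reader, k=2))
# answer(who? | 2 docs)
print(unified_memory_model("who?", toy_model, ["tok"] * 8))
# answer(who? | memory[8 tokens])
```

The point of the contrast: in the first function, errors made by `retriever` cannot be corrected by gradient signal from `reader`; in the second, there is only one trainable component.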

## 🚀 Results

### Retrieval Performance (NQ320K)

EverMemModel sets a new state of the art on the retrieval task. The best result is in bold.

| Method | NQ320K (Full text) R@1 |
| --- | --- |
| *Sparse retrieval* | |
| BM25 (Robertson & Zaragoza, 2009b) | 29.7 |
| DocT5Query (Nogueira et al., 2019) | 38.0 |
| *Dense retrieval* | |
| DPR (Karpukhin et al., 2020b) | 50.2 |
| ANCE (Xiong et al., 2021) | 50.2 |
| GTR-Base (Ni et al., 2021) | 56.0 |
| Sentence-T5 (Ni et al., 2022) | 53.6 |
| HCE-J (Chen et al., 2025) | 71.2 |
| Qwen3-Embedding-0.6B (Zhang et al., 2025) | 54.0 |
| Qwen3-Embedding-4B (Zhang et al., 2025) | 62.6 |
| *Generative retrieval* | |
| DSI-QG (Zhuang et al., 2022) | 63.1 |
| NCI (Wang et al., 2022) | 66.4 |
| GenRet (Sun et al., 2023) | 68.1 |
| Self Retrieval (Tang et al., 2024) | 73.3 |
| Ours (EverMemModel) | **75.5** |
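R@1 above denotes Recall@1: the fraction of queries for which a relevant document appears at rank 1. A minimal sketch of the standard Recall@k computation (function and variable names are illustrative, not from this repository):

```python
def recall_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries whose top-k ranked documents contain at least
    one relevant document."""
    hits = sum(
        1 for ranked, relevant in zip(ranked_lists, relevant_sets)
        if relevant & set(ranked[:k])  # any overlap with the relevant set
    )
    return hits / len(ranked_lists)

# Toy example with two queries:
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = [{"d1"}, {"d2"}]
print(recall_at_k(ranked, relevant, k=1))  # 0.5 (only the second query hits at rank 1)
print(recall_at_k(ranked, relevant, k=3))  # 1.0
```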

### Question Answering Performance (MS MARCO and TriviaQA)

EverMemModel significantly outperforms both strong RAG baselines and large-context models.

| Dataset | Docs | Qwen3RAG-QA (R@1) | Qwen3RAG-QA (R@5) | Qwen3RAG-QA (R@10) | Gemini-2.5-Flash | EverMemModel (Ours) |
| --- | --- | --- | --- | --- | --- | --- |
| MS MARCO (0.8M Tokens) | 8,389 | 2.235 | 2.535 | 2.548 | 2.710 | 3.812 |
| MS MARCO (7.1M Tokens) | 75,574 | 2.225 | 2.521 | 2.759 | N/A† | 2.774 |
| TriviaQA (0.87M Tokens) | 607 | 3.69 | 4.10 | 4.36 | 3.29 | 4.53 |
| TriviaQA (8.71M Tokens) | 5,721 | 3.27 | 3.53 | 3.86 | N/A† | 4.22 |

† Input exceeds Gemini-2.5-Flash's maximum context length.

### Training Time

## 🌐 Our Homepage

Click here to visit EverMind AI's official website
