# EverMemModel

License: MIT

This repository contains the official implementation for the paper: EverMemModel.

πŸ“ Abstract

Large Language Models (LLMs) struggle in knowledge-intensive domains that require deep, specialized knowledge. While Retrieval-Augmented Generation (RAG) is a common solution, its decoupled retrieve-then-read pipeline suffers from misaligned objectives and is prone to performance degradation from distractor documents. We propose EverMemModel, a unified, end-to-end trainable memory model that can handle memory contexts on the scale of 100 million tokens. Our model achieves state-of-the-art results on both retrieval and question-answering benchmarks, significantly outperforming traditional RAG pipelines and long-context models.

## ✨ Key Contributions

- **End-to-End Memory Model**: We propose EverMemModel, a unified architecture that seamlessly integrates retrieval and generation, moving beyond the limitations of decoupled RAG systems.
- **State-of-the-Art Performance**: EverMemModel achieves SOTA performance on both a retrieval benchmark (NQ320K) and question-answering tasks (MS MARCO and TriviaQA).
- **Massive-Scale Context**: Thanks to its efficient architecture, EverMemModel is one of the first models capable of handling contexts of up to 100M tokens.
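To make the first contribution concrete, the sketch below contrasts the calling pattern of a decoupled retrieve-then-read pipeline with that of a unified memory model. All function names here are illustrative stand-ins, not the EverMemModel implementation or API.

```python
# Illustrative contrast between decoupled RAG and a unified memory model.
# Hypothetical names throughout; this is NOT the repository's code.

def decoupled_rag(query, retriever, reader, k=5):
    """Retrieve-then-read: the retriever and reader are trained separately,
    so the ranking objective is not aligned with final answer quality, and
    distractor documents pass straight through to the reader."""
    docs = retriever(query)[:k]      # frozen or separately trained component
    return reader(query, docs)

def unified_memory_model(query, model, memory):
    """End-to-end memory model: a single network conditions directly on the
    memory, so memory access and generation share one training objective."""
    return model(query, memory)

# Toy stand-ins just to show the calling pattern:
toy_retriever = lambda q: ["doc_a", "doc_b", "doc_c"]
toy_reader = lambda q, docs: f"answer({q} | {len(docs)} docs)"
toy_model = lambda q, mem: f"answer({q} | memory[{len(mem)} tokens])"

print(decoupled_rag("who?", toy_retriever, toy_reader, k=2))
# answer(who? | 2 docs)
print(unified_memory_model("who?", toy_model, ["tok"] * 8))
# answer(who? | memory[8 tokens])
```

The point of the contrast: in the first function, errors made by `retriever` cannot be corrected by gradient signal from `reader`; in the second, there is only one trainable component.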

## 🚀 Results

### Retrieval Performance (NQ320K)

EverMemModel sets a new state of the art on the retrieval task. The best result is in bold.

| Method | NQ320K (Full text) R@1 |
| --- | --- |
| *Sparse retrieval* | |
| BM25 (Robertson & Zaragoza, 2009b) | 29.7 |
| DocT5Query (Nogueira et al., 2019) | 38.0 |
| *Dense retrieval* | |
| DPR (Karpukhin et al., 2020b) | 50.2 |
| ANCE (Xiong et al., 2021) | 50.2 |
| GTR-Base (Ni et al., 2021) | 56.0 |
| Sentence-T5 (Ni et al., 2022) | 53.6 |
| HCE-J (Chen et al., 2025) | 71.2 |
| Qwen3-Embedding-0.6B (Zhang et al., 2025) | 54.0 |
| Qwen3-Embedding-4B (Zhang et al., 2025) | 62.6 |
| *Generative retrieval* | |
| DSI-QG (Zhuang et al., 2022) | 63.1 |
| NCI (Wang et al., 2022) | 66.4 |
| GenRet (Sun et al., 2023) | 68.1 |
| Self Retrieval (Tang et al., 2024) | 73.3 |
| Ours (EverMemModel) | **75.5** |
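R@1 above denotes Recall@1: the fraction of queries for which a relevant document appears at rank 1. A minimal sketch of the standard Recall@k computation (function and variable names are illustrative, not from this repository):

```python
def recall_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries whose top-k ranked documents contain at least
    one relevant document."""
    hits = sum(
        1 for ranked, relevant in zip(ranked_lists, relevant_sets)
        if relevant & set(ranked[:k])  # any overlap with the relevant set
    )
    return hits / len(ranked_lists)

# Toy example with two queries:
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = [{"d1"}, {"d2"}]
print(recall_at_k(ranked, relevant, k=1))  # 0.5 (only the second query hits at rank 1)
print(recall_at_k(ranked, relevant, k=3))  # 1.0
```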

### Question Answering Performance (MS MARCO and TriviaQA)

EverMemModel significantly outperforms both strong RAG baselines and large-context models.

| Dataset | Docs | Qwen3RAG-QA (R@1) | Qwen3RAG-QA (R@5) | Qwen3RAG-QA (R@10) | Gemini-2.5-Flash | EverMemModel (Ours) |
| --- | --- | --- | --- | --- | --- | --- |
| MS MARCO (0.8M Tokens) | 8,389 | 2.235 | 2.535 | 2.548 | 2.710 | 3.812 |
| MS MARCO (7.1M Tokens) | 75,574 | 2.225 | 2.521 | 2.759 | N/A† | 2.774 |
| TriviaQA (0.87M Tokens) | 607 | 3.69 | 4.10 | 4.36 | 3.29 | 4.53 |
| TriviaQA (8.71M Tokens) | 5,721 | 3.27 | 3.53 | 3.86 | N/A† | 4.22 |

† Input exceeds Gemini-2.5-Flash's maximum context length.

### Training Time

## 🌐 Our Homepage

Click here to visit EverMind AI's official website
