News • Introduction • Methodology
- [2025-08-23] Paper accepted at EMNLP Findings 2025
- [2025-08-23] DeAR reaches 90.97 average nDCG on NovelEval, outperforming GPT-4 by +3.09 points!
- [2025-08-23] Fast inference: 2.2 s per query (pointwise), 11.16 s (listwise)
DeAR (Deep Agent Rank) is a novel dual-stage document reranking framework that decouples fine-grained relevance scoring from holistic cross-document analysis. By combining knowledge distillation with reasoning agents, DeAR achieves superior accuracy and interpretability compared to single-stage approaches.
- Dual-Stage Architecture: Pointwise scoring + listwise reasoning (sketched below)
- Knowledge Distillation: Transfer from a 13B teacher to 3B/8B students
- Reasoning Agents: Chain-of-Thought guided reranking
- Efficient Training: LoRA adapters for lightweight fine-tuning
- SOTA Performance: Surpasses GPT-4 on multiple benchmarks
- Open Source: No proprietary API dependencies
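
To make the dual-stage design concrete, here is a minimal sketch of how the two stages compose at inference time. All function and method names are hypothetical placeholders, not the repository's actual API:

```python
# Minimal sketch of the dual-stage pipeline (hypothetical API; the real
# repository interfaces may differ). Stage 1 scores every candidate
# independently; Stage 2 reasons over the head of the list.

def dear_rerank(query, documents, pointwise_student, listwise_agent):
    # Stage 1: fine-grained pointwise relevance scoring, keep the top-100.
    scored = sorted(documents,
                    key=lambda d: pointwise_student.score(query, d),
                    reverse=True)[:100]

    # Stage 2: CoT-guided listwise reasoning over the top-20 candidates.
    head = listwise_agent.rerank(query, scored[:20])

    # Final ranking: reasoned head followed by the remaining candidates.
    return head + scored[20:]
```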
| Dataset | DeAR-L | GPT-4 | Improvement |
|---|---|---|---|
| NovelEval | 90.97 | 87.88 | +3.09 |
| DL20 | 68.71 | - | +5.1 vs. baselines |
| Natural Questions | 54.29 | - | SOTA |
| Covid | 88.36 | 85.51 | +2.85 |
- Teacher: Frozen 13B LLaMA model generates relevance logits
- Student: Compact 3B/8B models learn via a hybrid loss (sketched after this list):
  - Cross-Entropy loss (ranking objective)
  - RankNet loss (pairwise preferences)
  - KL-Divergence loss (teacher alignment)
- Output: Top-100 ranked candidates
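
A minimal PyTorch sketch of such a hybrid objective over one query's candidate list; the loss weights and temperature are illustrative placeholders, not the paper's values:

```python
import torch
import torch.nn.functional as F

def hybrid_distillation_loss(student_logits, teacher_logits, labels,
                             w_ce=1.0, w_rank=1.0, w_kl=1.0, tau=1.0):
    """Hybrid loss over one query's candidate list.

    student_logits, teacher_logits: [num_docs] relevance logits.
    labels: [num_docs] binary relevance labels.
    Weights and temperature are illustrative placeholders.
    """
    # Cross-entropy ranking objective: softmax over the candidate list,
    # with the relevant documents as the target distribution.
    target = labels.float() / labels.float().sum().clamp(min=1)
    ce = -(target * F.log_softmax(student_logits, dim=-1)).sum()

    # RankNet pairwise loss: every relevant document should outscore
    # every non-relevant one.
    diff = student_logits.unsqueeze(1) - student_logits.unsqueeze(0)  # s_i - s_j
    pref = (labels.unsqueeze(1) > labels.unsqueeze(0)).float()        # 1 if i preferred over j
    rank = (pref * F.softplus(-diff)).sum() / pref.sum().clamp(min=1)

    # KL divergence: align the student's listwise distribution with the
    # frozen teacher's temperature-scaled distribution.
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="sum") * tau ** 2

    return w_ce * ce + w_rank * rank + w_kl * kl
```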
- Synthetic Data: 20K GPT-4o-generated reasoning examples
- Chain-of-Thought: Step-by-step ranking explanations
- Training: Supervised fine-tuning on the top-20 candidates (see the sketch below)
- Output: Final interpretable ranking with justifications
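
As a rough illustration of this fine-tuning step, a hedged sketch using Hugging Face `transformers` and `peft` (LoRA, per the features list); the checkpoint name, LoRA settings, and training text are placeholders:

```python
# Hedged sketch of Stage 2 supervised fine-tuning on the synthetic CoT
# data. The checkpoint name, hyperparameters, and example text are
# illustrative, not the repository's actual configuration.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder, not the repo's checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA adapters keep the fine-tuning lightweight (cf. the features list).
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM"))

# Each training string is prompt + CoT reasoning + final ranking line.
texts = ["Query: ...\nPassages: [1] ... [20] ...\n"
         "Reasoning: ...\n### Final Reranking: [3] > [1] > ..."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
    remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="dear-listwise-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```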

Example of RankLLM training prompt used to generate synthetic reasoning data. The model follows structured steps: (1) identify information requirements, (2) match passages to requirements, (3) provide final ranking with reasoning.
```bash
git clone https://github.com/your-username/DeAR-Reranking.git
cd DeAR-Reranking
pip install -r requirements.txt
```
The generation process follows the structured prompt format shown in the figure above, ensuring consistent reasoning patterns across all synthetic examples.
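
For illustration, a hedged sketch of such a generation loop using the OpenAI Python client; the prompt wording below paraphrases the structured steps and is not the repository's exact template:

```python
# Hypothetical sketch of the synthetic reasoning-data generation step.
# Uses the OpenAI Python client; prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()

PROMPT = """You are a passage-ranking assistant.
Query: {query}

Passages:
{passages}

Follow these steps:
1. List the information requirements of the query.
2. Match each passage to those requirements.
3. Give the final ranking as: ### Final Reranking: [i] > [j] > ...
"""

def generate_example(query: str, passages: list[str]) -> str:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": PROMPT.format(query=query, passages=numbered)}],
    )
    return response.choices[0].message.content
```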
nDCG@10 on TREC DL19 and DL20:

| Method | DL19 | DL20 | Avg |
|---|---|---|---|
| BM25 | 50.58 | 47.96 | 49.27 |
| MonoT5-3B | 71.83 | 68.89 | 70.36 |
| RankGPT-4 | 75.59 | 70.56 | 73.08 |
| DeAR-L-8B | 77.91 | 75.63 | 76.77 |
nDCG@10 on eight BEIR datasets:

| Method | Covid | NFCorpus | Touche | DBPedia | SciFact | News | Robust04 | Signal |
|---|---|---|---|---|---|---|---|---|
| BM25 | 59.47 | 30.75 | 44.22 | 31.80 | 67.89 | 39.52 | 40.70 | 33.05 |
| MonoT5-3B | 80.71 | 38.97 | 32.41 | 44.45 | 76.57 | 48.49 | 56.71 | 32.55 |
| DeAR-L-8B | 88.36 | 40.56 | 37.23 | 47.12 | 74.95 | 52.89 | 62.18 | 34.40 |
Results on NovelEval-2306:

| Method | nDCG@1 | nDCG@5 | nDCG@10 | Avg |
|---|---|---|---|---|
| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
| DeAR-L-8B | 92.86 | 88.04 | 92.01 | 90.97 |
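
For reference, the nDCG@k values above follow the standard exponential-gain definition:

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)},
\qquad
\mathrm{nDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
$$

where $\mathrm{rel}_i$ is the graded relevance of the document at rank $i$ and $\mathrm{IDCG@}k$ is the DCG@k of the ideal ordering.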
| Method | nDCG@10 | Latency (s) | Speed Rank |
|---|---|---|---|
| DeAR-P-8B | 74.5 | 2.2 | 1st |
| DeAR-L-8B | 75.54 | 11.16 | 2nd |
| RankZephyr | 74.2 | 21.58 | 4th |
| RankVicuna | 66.82 | 17.86 | 3rd |
- MS MARCO: Pointwise distillation (40K queries)
- Synthetic Reasoning: GPT-4o-generated CoT examples (20K)
We generate 20K high-quality reasoning examples using the structured prompt shown above. Each example contains:
- Query: Original search query
- Documents: Top candidate passages with IDs [1], [2], [3]...
- Reasoning Steps: Step-by-step CoT explanation
- Final Ranking: Structured output format
For example: `### Final Reranking: [1] > [2] > [3]`
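
To make the record structure concrete, here is a hypothetical example of a single synthetic training record; all field names and contents are invented for illustration and may differ from the released data schema:

```python
# Hypothetical synthetic training record (illustrative schema only).
example = {
    "query": "what causes the aurora borealis",
    "documents": [
        "[1] Auroras occur when charged solar particles collide with gases in the upper atmosphere ...",
        "[2] The aurora borealis is best viewed at high northern latitudes ...",
        "[3] The solar wind is a stream of charged particles released by the Sun ...",
    ],
    "reasoning": ("The query asks for a physical cause. Passage [1] explains the "
                  "mechanism directly; [3] identifies the particle source; "
                  "[2] only describes where auroras are visible."),
    "final_ranking": "### Final Reranking: [1] > [3] > [2]",
}
```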
The prompt guides GPT-4o to:
- List information requirements for the query
- Match passages to these requirements
- Provide final ranking using document identifiers
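
Downstream, the final-ranking line can be parsed back into an ordered list of document IDs; a small self-contained sketch:

```python
import re

def parse_final_ranking(output: str) -> list[int]:
    """Extract ordered document IDs from a '### Final Reranking:' line."""
    match = re.search(r"### Final Reranking:\s*(.+)", output)
    if match is None:
        raise ValueError("no final ranking found in model output")
    return [int(doc_id) for doc_id in re.findall(r"\[(\d+)\]", match.group(1))]

print(parse_final_ranking("### Final Reranking: [1] > [3] > [2]"))  # [1, 3, 2]
```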
- TREC DL19/20: Deep Learning tracks
- BEIR: 8 diverse retrieval datasets
- NovelEval-2306: Novel query generalization
- Natural Questions & WebQA: Open-domain QA
- Bug fixes and improvements
- New evaluation benchmarks
- Alternative reasoning strategies
- Performance optimizations
- Documentation improvements
| Model | Parameters | Stage | Performance | Speed |
|---|---|---|---|---|
| DeAR-3B-P | 3B | Pointwise | High | Very fast |
| DeAR-8B-P | 8B | Pointwise | Higher | Fast |
| DeAR-3B-L | 3B | Listwise | Very high | Moderate |
| DeAR-8B-L | 8B | Listwise | Highest | Moderate |
Track the latest results on our community leaderboard: DeAR Leaderboard (coming soon)
If you use DeAR in your research, please cite our paper:
```bibtex
@misc{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdelrahman Abdallah and Jamshid Mozafari and Bhawna Piryani and Adam Jatowt},
  year={2025},
  eprint={2508.16998},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
- Authors: Abdelrahman Abdallah, Jamshid Mozafari
- Institution: University of Innsbruck
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Star this repo if DeAR helps your research!
Questions? Open an issue or contact the authors.
Follow us for updates: @YourHandle