A Comprehensive Survey on Long Context Language Modeling 💡

This repository provides a collection of papers and resources focused on Long Context Language Modeling. For a clear taxonomy and more insights into the methodology, please refer to our survey, A Comprehensive Survey on Long Context Language Modeling, with an overview shown below.

We welcome suggestions from peers for improving this paper list and the survey, and we are committed to updating the repository regularly.

If you would like your paper to be included, or have any suggested modifications to this survey and repository, please feel free to raise an issue or send an email to [email protected], [email protected], or [email protected]. We sincerely appreciate your collaboration!

We would like to extend our sincere gratitude to Awesome-LLM-Long-Context-Modeling, which provided a valuable reference for the expansion of this project and the development of the survey.

We would also like to mention Thus Spake Long-Context Large Language Model (GitHub), a concurrent survey that details the development history of long-context LLMs. Its authors have created a video, set to the Thus Spake Zarathustra symphony, introducing LCLM-related work.

If you find our survey useful for your research, please consider citing the following paper:

@article{liu2025comprehensive,
  title={A Comprehensive Survey on Long Context Language Modeling},
  author={Liu, Jiaheng and Zhu, Dawei and Bai, Zhiqi and He, Yancheng and Liao, Huanxuan and Que, Haoran and Wang, Zekun and Zhang, Chenchen and Zhang, Ge and Zhang, Jiebin and others},
  journal={arXiv preprint arXiv:2503.17407},
  year={2025}
}

Updates

  • [2025.03.25] Our paper is now available on arXiv.
  • [2025.03.13] We have been in contact with the authors of the concurrent work, and both parties will promote each other's work going forward.
  • [2025.03.11] We released the first version of the survey on Long Context Language Modeling [lclm-survey.pdf] and open-sourced this repository.

Table of Contents

Paper List

Data

Pretraining

  1. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. J. Mach. Learn. Res. 2020

  2. Scaling Language Models: Methods, Analysis & Insights from Training Gopher. Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, H. Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, et al. Arxiv 2021

  3. Structured Packing in LLM Training Improves Long Context Utilization. Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś. Arxiv 2024.

  4. SemDeDup: Data-efficient learning at web-scale through semantic deduplication. Amro Abbas, Kushal Tirumala, Daniel Simig, Surya Ganguli, Ari S. Morcos. Arxiv 2023

  5. SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. Daria Soboleva, Faisal Al-Khateeb, Robert Myers, Jacob R Steeves, Joel Hestness, Nolan Dey. Arxiv 2023

  6. In-Context Pretraining: Language Modeling Beyond Document Boundaries. Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Wen-tau Yih, Mike Lewis. ICLR 2024 Spotlight.         GitHub Repo stars

  7. Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance. Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu. Arxiv 2024

  8. Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models. Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang. Arxiv 2024.         GitHub Repo stars

  9. LongWanjuan: Towards Systematic Measurement for Long Text Quality. Xiaoran Liu, Kai Lv, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin. ACL 2024

  10. Map-neo: Highly capable and transparent bilingual large language model series. Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, others. Arxiv 2024

  11. Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model. Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu. Arxiv 2024.

  12. Data Engineering for Scaling Language Models to 128K Context. Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng. Arxiv 2024.         GitHub Repo stars

  13. RegMix: Data Mixture as Regression for Language Model Pre-training. Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin. Arxiv 2024

  14. How to Train Long-Context Language Models (Effectively). Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen. Arxiv 2024.         GitHub Repo stars

  15. LongAttn: Selecting Long-context Training Data via Token-level Attention. Longyun Wu, Dawei Zhu, Guangxiang Zhao, Zhuocheng Yu, Junfeng Ran, Xiangyu Wong, Lin Sun, Sujian Li. Arxiv 2025.         GitHub Repo stars

  16. Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models. Junfeng Tian, Da Zheng, Yang Cheng, Rui Wang, Colin Zhang, Debing Zhang. Arxiv 2024.         GitHub Repo stars

Posttraining

  1. The NarrativeQA Reading Comprehension Challenge. Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette. ACL 2018

  2. Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Francis Christiano, Jan Leike, Ryan J. Lowe. Arxiv 2022

  3. SlimPajama: A 627B token cleaned and deduplicated version of RedPajama. Daria Soboleva, Faisal Al-Khateeb, Robert Myers, Jacob R Steeves, Joel Hestness, Nolan Dey. Arxiv 2023

  4. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. Arxiv 2023

  5. WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models. Conghui He, Zhenjiang Jin, Chaoxi Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Da Lin. Arxiv 2023

  6. LongWanjuan: Towards Systematic Measurement for Long Text Quality. Xiaoran Liu, Kai Lv, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin. ACL 2024

  7. LOGO--Long cOntext aliGnment via efficient preference Optimization. Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang. Arxiv 2024

  8. LongAlign: A Recipe for Long Context Alignment of Large Language Models. Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li. ACL 2024

  9. What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices. Zhi Chen, Qiguang Chen, Libo Qin, Qipeng Guo, Haijun Lv, Yicheng Zou, Wanxiang Che, Hang Yan, Kai Chen, Dahua Lin. Arxiv 2024.         GitHub Repo stars

  10. Weaver: Foundation Models for Creative Writing. Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu, Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing Wang, Yiru Wang, Siran Ding, Jiayang Huang, Jiayi Xu, Yilihamu Tayier, Zhenyu Hu, Yuan Gao, Chengfeng Zheng, Yueshu Ye, Yihang Li, Lei Wan, Xinyue Jiang, Yujie Wang, Siyu Cheng, Zhule Song, Xiangru Tang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang, Wangchunshu Zhou. Arxiv 2024

  11. LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024.         GitHub Repo stars

  12. LongReward: Improving Long-context Large Language Models with AI Feedback. Jiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li. Arxiv 2024

  13. ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities. Peng Xu, Wei Ping, Xianchao Wu, Zihan Liu, Mohammad Shoeybi, Bryan Catanzaro. Arxiv 2024.

  14. LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia. ICLR 2024 Oral.         GitHub Repo stars

  15. ORPO: Monolithic Preference Optimization without Reference Model. Jiwoo Hong, Noah Lee, James Thorne. EMNLP 2024

  16. Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training. Junqing He, Kunhao Pan, Xiaoqun Dong, Zhuoyang Song, LiuYiBo LiuYiBo, Qianguosun Qianguosun, Yuxin Liang, Hao Wang, Enming Zhang, Jiaxing Zhang. ACL 2024

  17. Make Your LLM Fully Utilize the Context. Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen. NeurIPS 2024

  18. LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information. Bowen Ping, Jiali Zeng, Fandong Meng, Shuo Wang, Jie Zhou, Shanghang Zhang. Arxiv 2025.

  19. LongAttn: Selecting Long-context Training Data via Token-level Attention. Longyun Wu, Dawei Zhu, Guangxiang Zhao, Zhuocheng Yu, Junfeng Ran, Xiangyu Wong, Lin Sun, Sujian Li. Arxiv 2025.         GitHub Repo stars

Model

Position Embeddings

  1. An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding. Tong Wu, Yanpeng Zhao, Zilong Zheng. NeurIPS 2024. GitHub Repo stars

  2. PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training. Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. Arxiv 2023. GitHub Repo stars

  3. Contextual Position Encoding: Learning to Count What's Important. Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. Arxiv 2024.

  4. Why Does the Effective Context Length of LLMs Fall Short?. Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong. Arxiv 2024.

  5. HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation. Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu. Arxiv 2024.

  6. DAPE: Data-Adaptive Positional Encoding for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li. NeurIPS 2024. GitHub Repo stars

  7. Convolutional sequence to sequence learning. Jonas Gehring and Michael Auli and David Grangier and Denis Yarats and Yann N. Dauphin. Arxiv 2017

  8. Self-attention with relative position representations. Peter Shaw and Jakob Uszkoreit and Ashish Vaswani. Arxiv 2018

  9. Encoding word order in complex embeddings. Benyou Wang and Donghao Zhao and Christina Lioma and Qiuchi Li and Peng Zhang and Jakob Grue Simonsen. Arxiv 2020

  10. Train short, test long: Attention with linear biases enables input length extrapolation. Ofir Press and Noah A. Smith and Mike Lewis. Arxiv 2022

  11. Kerple: Kernelized relative positional embedding for length extrapolation. Ta-Chung Chi and Ting-Han Fan and Peter J. Ramadge and Alexander I. Rudnicky. Arxiv 2022

  12. Dissecting transformer length extrapolation via the lens of receptive field analysis. Ta-Chung Chi and Ting-Han Fan and Alexander I. Rudnicky and Peter J. Ramadge. Arxiv 2023

  13. A length-extrapolatable transformer. Yutao Sun and Li Dong and Barun Patra and Shuming Ma and Shaohan Huang and Alon Benhaim and Vishrav Chaudhary and Xia Song and Furu Wei. Arxiv 2022

  14. Functional interpolation for relative positions improves long context transformers. Shanda Li and Chong You and Guru Guruganesh and Joshua Ainslie and Santiago Ontanon and Manzil Zaheer and Sumit Sanghai and Yiming Yang and Sanjiv Kumar and Srinadh Bhojanapalli. Arxiv 2024

  15. Latent positional information is in the self-attention variance of transformer language models without positional embeddings. Arxiv 2023

  16. Extending context window of large language models via positional interpolation. Shouyuan Chen and Sherman Wong and Liangjian Chen and Yuandong Tian. Arxiv 2023

  17. Randomized positional encodings boost length generalization of transformers. Anian Ruoss and Grégoire Delétang and Tim Genewein and Jordi Grau-Moya and Róbert Csordás and Mehdi Bennani and Shane Legg and Joel Veness. Arxiv 2023

  18. Yarn: Efficient context window extension of large language models. Bowen Peng and Jeffrey Quesnelle and Honglu Fan and Enrico Shippole. Arxiv 2023

  19. Clex: Continuous length extrapolation for large language models. Guanzheng Chen and Xin Li and Zaiqiao Meng and Shangsong Liang and Lidong Bing. Arxiv 2024

  20. Effective long-context scaling of foundation models. Wenhan Xiong and Jingyu Liu and Igor Molybog and Hejia Zhang and Prajjwal Bhargava and Rui Hou and Louis Martin and Rashi Rungta and Karthik Abinav Sankararaman and Barlas Oguz and Madian Khabsa and Han Fang and Yashar Mehdad and Sharan Narang and Kshitiz Malik and Angela Fan and Shruti Bhosale and Sergey Edunov and Mike Lewis and Sinong Wang and Hao Ma. Arxiv 2023

  21. Giraffe: Adventures in expanding context lengths in llms. Arka Pal and Deep Karkhanis and Manley Roberts and Samuel Dooley and Arvind Sundararajan and Siddartha Naidu. Arxiv 2023

  22. Resonance rope: Improving context length generalization of large language models. Suyuchen Wang and Ivan Kobyzev and Peng Lu and Mehdi Rezagholizadeh and Bang Liu. Arxiv 2024

  23. Long context alignment with short instructions and synthesized positions. Wenhao Wu and Yizhong Wang and Yao Fu and Xiang Yue and Dawei Zhu and Sujian Li. Arxiv 2024

  24. Two stones hit one bird: Bilevel positional encoding for better length extrapolation. Zhenyu He and Guhao Feng and Shengjie Luo and Kai Yang and Liwei Wang and Jingjing Xu and Zhi Zhang and Hongxia Yang and Di He. Arxiv 2024

  25. Found in the middle: How language models use long contexts better via plug-and-play positional encoding. Zhenyu Zhang and Runjin Chen and Shiwei Liu and Zhewei Yao and Olatunji Ruwase and Beidi Chen and Xiaoxia Wu and Zhangyang Wang. Arxiv 2024

  26. Llm maybe longlm: Self-extend llm context window without tuning. Hongye Jin and Xiaotian Han and Jingfeng Yang and Zhimeng Jiang and Zirui Liu and Chia-Yuan Chang and Huiyuan Chen and Xia Hu. Arxiv 2024

  27. Longrope: Extending llm context window beyond 2 million tokens. Yiran Ding and Li Lyna Zhang and Chengruidong Zhang and Yuanyuan Xu and Ning Shang and Jiahang Xu and Fan Yang and Mao Yang. Arxiv 2024

  28. The impact of positional encoding on length generalization in transformers. Amirhossein Kazemnejad and Inkit Padhi and Karthikeyan Natesan Ramamurthy and Payel Das and Siva Reddy. Arxiv 2024

  29. Roformer: Enhanced transformer with rotary position embedding. Jianlin Su and Yu Lu and Shengfeng Pan and Ahmed Murtadha and Bo Wen and Yunfeng Liu. Arxiv 2023

  30. Training-free long-context scaling of large language models. Chenxin An and Fei Huang and Jun Zhang and Shansan Gong and Xipeng Qiu and Chang Zhou and Lingpeng Kong. Arxiv 2024

Architecture

  1. Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. Arxiv 2019. GitHub Repo stars

  2. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. ICML 2020. GitHub Repo stars

  3. Block-Recurrent Transformers. DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur. Arxiv 2023. GitHub Repo stars

  4. Memorizing Transformers. Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy. Arxiv 2022. GitHub Repo stars

  5. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. Arxiv 2023.

  6. Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu. Arxiv 2023.

  7. Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal. Arxiv 2024.

  8. Weighted Grouped Query Attention in Transformers. Sai Sena Chinnakonduru, Astarag Mohapatra. Arxiv 2024.

  9. Associative Recurrent Memory Transformer. Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev. ICML 2024 Workshop. GitHub Repo stars

  10. Simple linear attention language models balance the recall-throughput tradeoff. Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré. Arxiv 2024. GitHub Repo stars

  11. DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han. Arxiv 2024. GitHub Repo stars

  12. TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention. Lijie Yang, Zhihao Zhang, Zhuofu Chen, Zikun Li, Zhihao Jia. Arxiv 2024. GitHub Repo stars

  13. Selective Attention Improves Transformer. Yaniv Leviathan, Matan Kalman, Yossi Matias. Arxiv 2024.

  14. SnapKV: LLM Knows What You are Looking for Before Generation. Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen. Arxiv 2024. GitHub Repo stars

  15. Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures. Evan Lucas, Dylan Kangas, Timothy C Havens. Arxiv 2024.

  16. An Empirical Study of Mamba-based Language Models. Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro. Arxiv 2024. GitHub Repo stars

  17. Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models. Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. Arxiv 2024. GitHub Repo stars

  18. Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention. Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong. Arxiv 2024.

  19. SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs. Yizhao Gao, Zhichen Zeng, Dayou Du, Shijie Cao, Hayden Kwok-Hay So, Ting Cao, Fan Yang, Mao Yang. Arxiv 2024. GitHub Repo stars

  20. Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling. Yingfa Chen, Xinrong Zhang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun. Arxiv 2024. GitHub Repo stars

  21. Taipan: Efficient and Expressive State Space Language Models with Selective Attention. Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen. Arxiv 2024.

  22. Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou. Arxiv 2024. GitHub Repo stars

  23. Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen. Arxiv 2024. GitHub Repo stars

  24. ReMamba: Equip Mamba with Effective Long-Sequence Modeling. Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao. Arxiv 2024.

  25. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng. Arxiv 2025.

  26. MoBA: Mixture of Block Attention for Long-Context LLMs. Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu. Arxiv 2025. GitHub Repo stars

  27. MiniMax-01: Scaling Foundation Models with Lightning Attention. MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu. Arxiv 2025. GitHub Repo stars

  28. Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks. Jongho Park and Jaeseung Park and Zheyang Xiong and Nayoung Lee and Jaewoong Cho and Samet Oymak and Kangwook Lee and Dimitris Papailiopoulos. Arxiv 2024

  29. A new approach to linear filtering and prediction problems. Tamer Basar. IEEE 2001

  30. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter. Arxiv 2016

  31. Neural Discrete Representation Learning. Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu. Arxiv 2018

  32. Improving spiking dynamical networks: Accurate delays, higher-order synapses, and time cells. Aaron R. Voelker, Chris Eliasmith. IEEE 2018

  33. Improving language understanding by generative pre-training. Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. OpenAI 2018

  34. Memformer: The Memory-Augmented Transformer. Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu. Arxiv 2020

  35. Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. Arxiv 2020

  36. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers. Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré. Arxiv 2021

  37. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention. Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. Arxiv 2021

  38. Efficient attention: Attention with linear complexities. Zhuoran Shen and Mingyuan Zhang and Haiyu Zhao and Shuai Yi and Hongsheng Li. Arxiv 2024

  39. ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention. Yang Liu, Jiaxiang Liu, Li Chen, Yuxiang Lu, Shikun Feng, Zhida Feng, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang. Arxiv 2022

  40. cosFormer: Rethinking Softmax in Attention. Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong. Arxiv 2022

  41. Rethinking Attention with Performers. Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller. Arxiv 2022

  42. Multi-head state space model for speech recognition. Yassir Fathullah and Chunyang Wu and Yuan Shangguan and Junteng Jia and Wenhan Xiong and Jay Mahadeokar and Chunxi Liu and Yangyang Shi and Ozlem Kalinli and Mike Seltzer and Mark J. F. Gales. Arxiv 2023

  43. Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Arxiv 2023

  44. Retentive Network: A Successor to Transformer for Large Language Models. Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei. Arxiv 2023

  45. Scaling Transformer to 1M tokens and beyond with RMT. Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev. Arxiv 2024

  46. FLatten Transformer: Vision Transformer using Focused Linear Attention. Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang. Arxiv 2023

  47. TRAMS: Training-free Memory Selection for Long-range Language Modeling. Haofei Yu and Cunxiang Wang and Yue Zhang and Wei Bi. Arxiv 2023

  48. Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model. Yinghan Long and Sayeed Shafayet Chowdhury and Kaushik Roy. Arxiv 2023

  49. Transformer-VQ: Linear-Time Transformers via Vector Quantization. Lucas D. Lingle. Arxiv 2024

  50. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. Tri Dao and Albert Gu. Arxiv 2024

  51. Block-state transformers. Mahan Fathi and Jonathan Pilault and Orhan Firat and Christopher Pal and Pierre-Luc Bacon and Ross Goroshin. Arxiv 2023

  52. Extensible Embedding: A Flexible Multipler For LLM's Context Length. Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang. Arxiv 2024

  53. DeciMamba: Exploring the Length Extrapolation Potential of Mamba. Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes. Arxiv 2024

  54. CORM: Cache Optimization with Recent Message for Large Language Model Inference. Jincheng Dai, Zhuowei Huang, Haiyun Jiang, Chen Chen, Deng Cai, Wei Bi, Shuming Shi. Arxiv 2024

  55. Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. Arxiv 2020. GitHub Repo stars

  56. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao. ICLR 2024 Oral.

  57. PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling. Zefan Cai, Yichi Zhang, Bofei Gao, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao. Arxiv 2024.

  58. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Shikuan Hong, Yiwu Yao, Gongyi Wang. Arxiv 2024.

  59. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning. Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao. Arxiv 2024.

  60. Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han. ICML 2024. GitHub Repo stars

  61. Efficient Streaming Language Models with Attention Sinks. Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis. Arxiv 2023. GitHub Repo stars

  62. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference. Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao. Arxiv 2024. GitHub Repo stars

  63. MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2024. GitHub Repo stars Static Badge

  64. LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi. Arxiv 2024.

  65. DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs. Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding. Arxiv 2024.

  66. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang "Atlas" Wang, Beidi Chen. Arxiv 2023

  67. Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, Anshumali Shrivastava. Arxiv 2023

  68. Loki: Low-rank Keys for Efficient Sparse Attention. Prajwal Singhania, Siddharth Singh, Shwai He, Soheil Feizi, Abhinav Bhatele. Arxiv 2024

  69. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. Arxiv 2024

  70. Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. Kevin Zhou. Arxiv 2025

  71. LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation. Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin. Arxiv 2025

  72. Hierarchical Attention Networks for Document Classification. Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, Eduard H. Hovy. Arxiv 2016

  73. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Clément Hongler, Franck Gabriel. Arxiv 2018

  74. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc Viet Le, Ruslan Salakhutdinov. Arxiv 2019

  75. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Arxiv 2019

  76. HiPPO: Recurrent Memory with Optimal Polynomial Projections. Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré. Arxiv 2020

  77. Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. Arxiv 2020

  78. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Arxiv 2020

  79. Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling. Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang. Arxiv 2021

  80. Nyströmformer: A Nyström-based Algorithm for Approximating Self-Attention. Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh. Arxiv 2021

  81. GLM: General Language Model Pretraining with Autoregressive Blank Infilling. Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang. Arxiv 2022

  82. OPT: Open Pre-trained Transformer Language Models. Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona T. Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer. Arxiv 2022

  83. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilic, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, et al. Arxiv 2022

  84. Efficiently Modeling Long Sequences with Structured State Spaces. Albert Gu, Karan Goel, Christopher Ré. Arxiv 2022

  85. NTK-ALiBi: Long Text Extrapolation of ALiBi Position Encoding through Interpolation. * *. Arxiv 2023

  86. LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. Arxiv 2023

  87. Position Interpolation Improves ALiBi Extrapolation. Faisal Al-Khateeb, Nolan Dey, Daria Soboleva, Joel Hestness. Arxiv 2023

  88. Efficient Prompting via Dynamic In-Context Learning. Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan. Arxiv 2023

  89. RWKV: Reinventing RNNs for the Transformer Era. Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Leon Derczynski, Xingjian Du, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Jiaju Lin, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Johan S. Wind, Stanislaw Wozniak, Zhenyuan Zhang, Qinghua Zhou, Jian Zhu, Rui-Jie Zhu. Arxiv 2023

  90. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai. Arxiv 2023

  91. Baichuan 2: Open Large-scale Language Models. Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu. Arxiv 2023

  92. Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence. Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Jiaju Lin, Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Cahya Wirawan, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, Peng Zhou, Jian Zhu, Rui-Jie Zhu. Arxiv 2024

  93. Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use. Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan. Arxiv 2024

  94. QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han. Arxiv 2024

  95. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. Aleksandar Botev, Soham De, Samuel L. Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, et al. Arxiv 2024

  96. SnapKV: LLM Knows What You are Looking for Before Generation. Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen. Arxiv 2024

  97. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference. Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao. Arxiv 2024

  98. HiRoPE: Length Extrapolation for Code Models Using Hierarchical Position. Kechi Zhang, Ge Li, Huangzhao Zhang, Zhi Jin. Arxiv 2024

  99. DAPE V2: Process Attention Score as Feature Map for Length Extrapolation. Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael K. Ng, Xin Jiang, Zhenguo Li, Yu Li. Arxiv 2024

  100. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. Arxiv 2024

  101. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao. Arxiv 2024

  102. LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models. Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi. Arxiv 2024

  103. LongHeads: Multi-Head Attention is Secretly a Long Context Processor. Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang. Arxiv 2024

  104. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu, Tri Dao. Arxiv 2024

  105. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, et al. Arxiv 2024

  106. Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks. Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos. Arxiv 2024

  107. You Only Cache Once: Decoder-Decoder Architectures for Language Models. Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei. Arxiv 2024

  108. Zamba: A Compact 7B SSM Hybrid Model. Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge. Arxiv 2024

  109. Qwen2.5-1M Technical Report. An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, Zipeng Zhang. Arxiv 2025

  110. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads. Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Danning Ke, Shikuan Hong, Yiwu Yao, Gongyi Wang. Arxiv 2025

  111. LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation. Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin. Arxiv 2025

  112. Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression. Haoyu Wang, Tong Teng, Tianyu Guo, An Xiao, Duyu Tang, Hanting Chen, Yunhe Wang. Arxiv 2025.

  113. Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs. Tao Ji, Bin Guo, Yuanbin Wu, Qipeng Guo, Lixing Shen, Zhan Chen, Xipeng Qiu, Qi Zhang, Tao Gui. Arxiv 2025.         GitHub Repo stars

  114. SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention. Hong Yankun, Li Xing, Zhen Hui-Ling, Yu Xianzhi, Liu Wulong, Yuan Mingxuan. Arxiv 2025.

  115. Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference. Yaohua Tang, Zhicheng Hu, Kun Cheng, Fan Mo, Qiheng Lv, Hua Wang, Zhi Chen. Arxiv 2025.

  116. DBudgetKV: Dynamic Budget in KV Cache Compression for Ensuring Optimal Performance. Xuanfan Ni, Liyan Xu, Chenyang Lyu, Longyue Wang, Mo Yu, Lemao Liu, Fandong Meng, Jie Zhou, Piji Li. Arxiv 2025.

  117. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse. Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang. Arxiv 2025.         GitHub Repo stars

  118. FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference. Bingzhe Zhao, Ke Cheng, Aomufei Yuan, Yuxuan Tian, Ruiguang Zhong, Chengchen Hu, Tong Yang, Lian Yu. Arxiv 2025.

  119. CoKV: Optimizing KV Cache Allocation via Cooperative Game. Qiheng Sun, Hongwei Zhang, Haocheng Xia, Jiayao Zhang, Jinfei Liu, Kui Ren. Arxiv 2025.         GitHub Repo stars

  120. MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference. Zhongwei Wan, Hui Shen, Xin Wang, Che Liu, Zheda Mai, Mi Zhang. NAACL 2025.         GitHub Repo stars

  121. FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference. Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou. ICLR 2025 Oral.

  122. WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models. Jian Yuan, Ziwei He, Haoli Bai, Jingwen Leng, Bo Jiang. ICASSP 2025.

  123. Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs. Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant Nair, Poulami Das. Arxiv 2025.

  124. KVCrush: Key value cache size-reduction using similarity in head-behaviour. Gopi Krishna Jha, Sameh Gobriel, Liubov Talamanova, Alexander Kozlov, Nilesh Jain. Arxiv 2025.

  125. EliteKV: Scalable KV Cache Compression via RoPE Frequency Selection and Joint Low-Rank Projection. Yuhao Zhou, Sirui Song, Boyang Liu, Zhiheng Xi, Senjie Jin, Xiaoran Fan, Zhihao Zhang, Wei Li, Xuanjing Huang. Arxiv 2025.

  126. Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving. Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng. Arxiv 2025.

  127. Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression. Nathan Godey, Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini, Éric de la Clergerie, Benoît Sagot. Arxiv 2025.         GitHub Repo stars

  128. TokenButler: Token Importance is Predictable. Yash Akhauri, Ahmed F AbouElhamayed, Yifei Gao, Chi-Chih Chang, Nilesh Jain, Mohamed S. Abdelfattah. Arxiv 2025.         GitHub Repo stars

  129. Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA. Nils Graef, Andrew Wasielewski. Arxiv 2025.         GitHub Repo stars

  130. LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference. Guangtao Wang, Shubhangi Upasani, Chen Wu, Darshan Gandhi, Jonathan Li, Changran Hu, Bo Li, Urmish Thakker. ICLR 2025.

  131. KV-Distill: Nearly Lossless Learnable Context Compression for LLMs. Vivek Chari, Guanghui Qin, Benjamin Van Durme. Arxiv 2025.         GitHub Repo stars

  132. Radar: Fast Long-Context Decoding for Any Transformer. Yongchang Hao, Mengyao Zhai, Hossein Hajimirsadeghi, Sepidehsadat Hosseini, Frederick Tung. ICLR 2025.

  133. PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention. Lida Chen, Dong Xu, Chenxin An, Xintao Wang, Yikai Zhang, Jiangjie Chen, Zujie Liang, Feng Wei, Jiaqing Liang, Yanghua Xiao, Wei Wang. Arxiv 2025.         GitHub Repo stars

  134. Cost-Optimal Grouped-Query Attention for Long-Context LLMs. Yingfa Chen, Yutong Wu, Xu Han, Zhiyuan Liu, Maosong Sun. Arxiv 2025.         GitHub Repo stars

Hybrid Architecture

  1. C4AI Command R7B: A 7 Billion Parameter Multilingual Model. Cohere, Cohere For AI. Arxiv 2024

  2. Jamba: A hybrid transformer-mamba language model. Opher Lieber and Barak Lenz and Hofit Bata and Gal Cohen and Jhonathan Osin and Itay Dalmedigos and Erez Safahi and Shaked Meirom and Yonatan Belinkov and Shai Shalev-Shwartz and Omri Abend and Raz Alon and Tomer Asida and Amir Bergman and Roman Glozman and Michael Gokhman and Avashalom Manevich and Nir Ratner and Noam Rozen and Erez Shwartz and Mor Zusman and Yoav Shoham. Arxiv 2024

  3. Hymba: A hybrid-head architecture for small language models. Xin Dong and Yonggan Fu and Shizhe Diao and Wonmin Byeon and Zijia Chen and Ameya Sunil Mahabaleshwarkar and Shih-Yang Liu and Matthijs Van Keirsbilck and Min-Hung Chen and Yoshi Suhara and Yingyan Lin and Jan Kautz and Pavlo Molchanov. Arxiv 2024

  4. Zamba: A compact 7b ssm hybrid model. Paolo Glorioso and Quentin Anthony and Yury Tokpanov and James Whittington and Jonathan Pilault and Adam Ibrahim and Beren Millidge. Arxiv 2024

  5. Goldfinch: High performance rwkv/transformer hybrid with linear pre-fill and extreme kv-cache compression. Daniel Goldstein and Fares Obeid and Eric Alcaide and Guangyu Song and Eugene Cheah. Arxiv 2024

  6. Gemma 2: Improving open language models at a practical size. Gemma Team: Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, Shantanu Thakoor, Jean-Bastien Grill, Behnam Neyshabur, Olivier Bachem, et al. Arxiv 2024

  7. Jamba-1.5: Hybrid transformer-mamba models at scale. Jamba Team and Barak Lenz and Alan Arazi and Amir Bergman and Avshalom Manevich and Barak Peleg and Ben Aviram and Chen Almagor and Clara Fridman and Dan Padnos and Daniel Gissin and Daniel Jannai and Dor Muhlgay and Dor Zimberg and Edden M Gerber and Elad Dolev and Eran Krakovsky and Erez Safahi and Erez Schwartz and Gal Cohen and Gal Shachaf and Haim Rozenblum and Hofit Bata and Ido Blass and Inbal Magar and Itay Dalmedigos and Jhonathan Osin and Julie Fadlon and Maria Rozman and Matan Danos and Michael Gokhman and Mor Zusman and Naama Gidron and Nir Ratner and Noam Gat and Noam Rozen and Oded Fried and Ohad Leshno and Omer Antverg and Omri Abend and Opher Lieber and Or Dagan and Orit Cohavi and Raz Alon and Ro'i Belson and Roi Cohen and Rom Gilad and Roman Glozman and Shahar Lev and Shaked Meirom and Tal Delbari and Tal Ness and Tomer Asida and Tom Ben Gal and Tom Braude and Uriya Pumerantz and Yehoshua Cohen and Yonatan Belinkov and Yuval Globerson and Yuval Peleg Levy and Yoav Shoham. Arxiv 2024

  8. RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. Aleksandar Botev and Soham De and Samuel L Smith and Anushan Fernando and George-Cristian Muraru and Ruba Haroun and Leonard Berrada and Razvan Pascanu and Pier Giuseppe Sessa and Robert Dadashi and Léonard Hussenot and Johan Ferret and Sertan Girgin and Olivier Bachem and Alek Andreev and Kathleen Kenealy and Thomas Mesnard and Cassidy Hardin and Surya Bhupatiraju and Shreya Pathak and Laurent Sifre and Morgane Rivière and Mihir Sanjay Kale and Juliette Love and Pouya Tafti and Armand Joulin and Noah Fiedel and Evan Senter and Yutian Chen and Srivatsan Srinivasan and Guillaume Desjardins and David Budden and Arnaud Doucet and Sharad Vikram and Adam Paszke and Trevor Gale and Sebastian Borgeaud and Charlie Chen and Andy Brock and Antonia Paterson and Jenny Brennan and Meg Risdal and Raj Gundluru and Nesh Devanathan and Paul Mooney and Nilay Chauhan and Phil Culliton and Luiz Gustavo Martins and Elisa Bandy and David Huntsperger and Glenn Cameron and Arthur Zucker and Tris Warkentin and Ludovic Peran and Minh Giang and Zoubin Ghahramani and Clément Farabet and Koray Kavukcuoglu and Demis Hassabis and Raia Hadsell and Yee Whye Teh and Nando de Frietas. Arxiv 2024

  9. The Zamba2 Suite: Technical Report. Paolo Glorioso and Quentin Anthony and Yury Tokpanov and Anna Golubeva and Vasudev Shyam and James Whittington and Jonathan Pilault and Beren Millidge. Arxiv 2024

  10. You only cache once: Decoder-decoder architectures for language models. Yutao Sun and Li Dong and Yi Zhu and Shaohan Huang and Wenhui Wang and Shuming Ma and Quanlu Zhang and Jianyong Wang and Furu Wei. Arxiv 2024

Workflow Design

Prompt Compression

  1. Prompt Compression for Large Language Models: A Survey. Zongqian Li, Yinhong Liu, Yixuan Su, Nigel Collier. Arxiv 2024.

Hard Prompt Compression
  1. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2023. GitHub Repo stars

  2. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. Arxiv 2023. GitHub Repo stars

  3. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression. Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang. Arxiv 2024. GitHub Repo stars

  4. Compressing Context to Enhance Inference Efficiency of Large Language Models. Yucheng Li, Bo Dong, Chenghua Lin, Frank Guerin. Arxiv 2023. GitHub Repo stars

  5. TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning. Shivam Shandilya, Menglin Xia, Supriyo Ghosh, Huiqiang Jiang, Jue Zhang, Qianhui Wu, Victor Rühle. Arxiv 2024.

  6. Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference. Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke. Arxiv 2024. GitHub Repo stars

  7. AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models. Qianchi Zhang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng. Arxiv 2024.

  8. Learning to Compress Prompt in Natural Language Formats. Yu-Neng Chuang, Tianwei Xing, Chia-Yuan Chang, Zirui Liu, Xun Chen, Xia Hu. Arxiv 2024.

  9. TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction. Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian. Arxiv 2023

  10. Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation. Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen. Arxiv 2024

  11. Discrete Prompt Compression With Reinforcement Learning. Hoyoun Jung, Kyung-Joong Kim. Arxiv 2024

  12. CompAct: Compressing Retrieved Documents Actively for Question Answering. Chanwoong Yoon, Taewhoo Lee, Hyeon Hwang, Minbyul Jeong, Jaewoo Kang. Arxiv 2024

Soft Prompt Compression

  1. Adapting Language Models to Compress Contexts. Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen. Arxiv 2023. GitHub Repo stars

  2. xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token. Xin Cheng, Xun Wang, Xingxing Zhang, Tao Ge, Si-Qing Chen, Furu Wei, Huishuai Zhang, Dongyan Zhao. Arxiv 2024. GitHub Repo stars

  3. In-context Autoencoder for Context Compression in a Large Language Model. Tao Ge, Hu Jing, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei. ICLR 2024. GitHub Repo stars

  4. The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. Arxiv 2021

  5. Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models. David Wingate, Mohammad Shoeybi, Taylor Sorensen. Arxiv 2022

  6. Learning to Compress Prompts with Gist Tokens. Jesse Mu, Xiang Lisa Li, Noah Goodman. Arxiv 2024

  7. Unifying Demonstration Selection and Compression for In-Context Learning. Jun Gao, Ziqiang Cao, Wenjie Li. Arxiv 2024

  8. Long Context Compression with Activation Beacon. Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou. Arxiv 2024

  9. 500xCompressor: Generalized Prompt Compression for Large Language Models. Zongqian Li, Yixuan Su, Nigel Collier. Arxiv 2024

  10. DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models. Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang. Arxiv 2025.

  11. EFPC: Towards Efficient and Flexible Prompt Compression. Yun-Hao Cao, Yangsong Wang, Shuzheng Hao, Zhenxing Li, Chengjun Zhan, Sichao Liu, Yi-Qi Hu. Arxiv 2025.

Memory-Based

  1. Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement. Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark. EMNLP 2022

  2. Augmenting Language Models with Long-Term Memory. Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei. NeurIPS 2023

  3. MEMORYLLM: Towards Self-Updatable Large Language Models. Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian J. McAuley. ICML 2024

  4. MemoryBank: Enhancing Large Language Models with Long-Term Memory. Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang. Arxiv 2023. GitHub Repo stars

RAG-Based

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. NAACL 2019

  2. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. Gautier Izacard, Edouard Grave. ACL 2021

  3. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen. EMNLP 2023

  4. Query Rewriting in Retrieval-Augmented Large Language Models. Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan. EMNLP 2023

  5. REPLUG: Retrieval-Augmented Black-Box Language Models. Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Richard James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih. ACL 2024

  6. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu. Arxiv 2024

  7. Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference. Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli. Arxiv 2024

  8. Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning. Giulio Corallo, Orion Weller, Fabio Petroni, Paolo Papotti. Arxiv 2025.

  9. Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention. Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch. Arxiv 2025.         GitHub Repo stars

Agent-Based

  1. Re3: Generating Longer Stories With Recursive Reprompting and Revision. Kevin Yang, Yuandong Tian, Nanyun Peng, Dan Klein. EMNLP 2022

  2. Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading. Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz. Arxiv 2023.

  3. PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents. Simeng Sun, Yang Liu, Shuohang Wang, Dan Iter, Chenguang Zhu, Mohit Iyyer. EACL 2024.         GitHub Repo stars

  4. Learning to Reason and Memorize with Self-Notes. Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar. NeurIPS 2023

  5. GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models. Shilong Li, Yancheng He, Hangyu Guo, Xingyuan Bu, Ge Bai, Jie Liu, Jiaheng Liu, Xingwei Qu, Yangguang Li, Wanli Ouyang, Wenbo Su, Bo Zheng. Arxiv 2024.

  6. A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts. Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer. Arxiv 2024.

  7. RoleAgent: Building, Interacting, and Benchmarking High-quality Role-Playing Agents from Scripts. Jiaheng Liu, Zehao Ni, Haoran Que, Tao Sun, Noah Wang, Jian Yang, Jiakai Wang, Hongcheng Guo, Z.Y. Peng, Ge Zhang, Jiayi Tian, Xingyuan Bu, Ke Xu, Wenge Rong, Junran Peng, Zhaoxiang Zhang. NeurIPS 2024

  8. Chain of Agents: Large Language Models Collaborating on Long-Context Tasks. Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik. Arxiv 2024.

  9. LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration. Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang. Arxiv 2024.

Evaluation

Long-Context Comprehension

  1. Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks. Chonghua Wang, Haodong Duan, Songyang Zhang, Dahua Lin, Kai Chen. Arxiv 2024. GitHub Repo stars

  2. BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev. Arxiv 2024. GitHub Repo stars

  3. DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities. Hui Dai, Dan Pechi, Xinyi Yang, Garvit Banga, Raghav Mantri. Arxiv 2024. GitHub Repo stars

  4. Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data. Seiji Maekawa, Hayate Iso, Nikita Bhutani. Arxiv 2024. GitHub Repo stars

  5. LongIns: A Challenging Long-context Instruction-based Exam for LLMs. Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang. Arxiv 2024.

  6. Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs. Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu. Arxiv 2024. GitHub Repo stars

  7. LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios. Xiaodong Wu, Minhao Wang, Yichen Liu, Xiaoming Shi, He Yan, Xiangju Lu, Junmin Zhu, Wei Zhang. Arxiv 2024.

  8. Long Range Arena: A Benchmark for Efficient Transformers. Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler. Arxiv 2020

  9. LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion. Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen. Arxiv 2025.

  10. Evaluating Multilingual Long-Context Models for Retrieval and Reasoning. Agrawal, Ameeta and Dang, Andy and Nezhad, Sina Bagheri and Pokharel, Rhitabrat and Scheinberg, Russell. ACL 2024.

  11. M4le: A multi-ability multi-range multi-task multi-domain long-context evaluation benchmark for large language models. Kwan, Wai-Chung and Zeng, Xingshan and Wang, Yufei and Sun, Yusen and Li, Liangyou and Shang, Lifeng and Liu, Qun and Wong, Kam-Fai. ACL 2024.

  12. Michelangelo: Long context evaluations beyond haystacks via latent structure queries. Vodrahalli, Kiran and Ontanon, Santiago and Tripuraneni, Nilesh and Xu, Kelvin and Jain, Sanil and Shivanna, Rakesh and Hui, Jeffrey and Dikkala, Nishanth and Kazemi, Mehran and Fatemi, Bahare and others. Arxiv 2024.

  13. Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models. Amey Hengle, Prasoon Bajpai, Soham Dan, Tanmoy Chakraborty. Arxiv 2024. GitHub Repo stars

  14. Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?. Jonathan Roberts, Kai Han, Samuel Albanie. Arxiv 2024. GitHub Repo stars

  15. NoLiMa: Long-Context Evaluation Beyond Literal Matching. Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, Seunghyun Yoon, Hinrich Schütze. Arxiv 2025.

  16. RULER: What’s the Real Context Size of Your Long-Context Language Models?. Hsieh, Cheng-Ping and Sun, Simeng and Kriman, Samuel and Acharya, Shantanu and Rekesh, Dima and Jia, Fei and Ginsburg, Boris. COLM 2024.

  17. S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Model. Lei, Fangyu and Liu, Qian and Huang, Yiming and He, Shizhu and Zhao, Jun and Liu, Kang. NAACL 2024.

  18. Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems. Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu. Arxiv 2024. GitHub Repo stars

  19. LongHealth: A Question Answering Benchmark with Long Clinical Documents. Lisa Adams, Felix Busch, Tianyu Han, Jean-Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem. Arxiv 2024.

  20. Mathhay: An automated benchmark for long-context mathematical reasoning in llms. Wang, Lei and Dong, Shan and Xu, Yuhui and Dong, Hanze and Wang, Yalu and Saha, Amrita and Lim, Ee-Peng and Xiong, Caiming and Sahoo, Doyen. Arxiv 2024.

  21. RepoQA: Evaluating Long Context Code Understanding. Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding, Yuhan Katherine Wang, Jun Yang, Lingming Zhang. Arxiv 2024. GitHub Repo stars         Static Badge

  22. Bamboo: A comprehensive benchmark for evaluating long text modeling capacities of large language models. Dong, Zican and Tang, Tianyi and Li, Junyi and Zhao, Wayne Xin and Wen, Ji-Rong. ACL 2024.

  23. Clongeval: A chinese benchmark for evaluating long-context large language models. Qiu, Zexuan and Li, Jingjing and Huang, Shijue and Jiao, Xiaoqi and Zhong, Wanjun and King, Irwin. EMNLP 2024.

  24. Detectiveqa: Evaluating long-context reasoning on detective novels. Xu, Zhe and Ye, Jiasheng and Liu, Xiangyang and Sun, Tianxiang and Liu, Xiaoran and Guo, Qipeng and Li, Linlin and Liu, Qun and Huang, Xuanjing and Qiu, Xipeng. Arxiv 2024.

  25. ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage. Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang. Arxiv 2024. GitHub Repo stars

  26. Extending long context evaluation beyond 100k tokens. Zhang, Xinrong and Chen, Yingfa and Hu, Shengding and Xu, Zihang and Chen, Junhao and Hao, Moo and Han, Xu and Thai, Zhen and Wang, Shuo and Liu, Zhiyuan and others. ACL 2024.

  27. Helmet: How to evaluate long-context language models effectively and thoroughly. Yen, Howard and Gao, Tianyu and Hou, Minmin and Ding, Ke and Fleischer, Daniel and Izsak, Peter and Wasserblat, Moshe and Chen, Danqi. ICLR 2025.

  28. L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?. Zecheng Tang and Keyan Zhou and Juntao Li and Baibei Ji and Jianye Hou and Min Zhang. Arxiv 2024.

  29. L-eval: Instituting standardized evaluation for long context language models. An, Chenxin and Gong, Shansan and Zhong, Ming and Zhao, Xingjian and Li, Mukai and Zhang, Jun and Kong, Lingpeng and Qiu, Xipeng. ACL 2024.

  30. Long Input Benchmark for Russian Analysis. Igor Churin, Murat Apishev, Maria Tikhonova, Denis Shevelev, Aydar Bulatov, Yuri Kuratov, Sergej Averkiev, Alena Fenogenova. Arxiv 2024.

  31. Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?. Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu. Arxiv 2024. GitHub Repo stars

  32. LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall. Qi, Zehan and Xu, Rongwu and Guo, Zhijiang and Wang, Cunxiang and Zhang, Hao and Xu, Wei. ACL 2024.

  33. Longbench: A bilingual, multitask benchmark for long context understanding. Bai, Yushi and Lv, Xin and Zhang, Jiajie and Lyu, Hongchang and Tang, Jiankai and Huang, Zhidian and Du, Zhengxiao and Liu, Xiao and Zeng, Aohan and Hou, Lei and others. ACL 2024.

  34. LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks. Yushi Bai, Shangqing Tu, Jiajie Zhang, Hao Peng, Xiaozhi Wang, Xin Lv, Shulin Cao, Jiazheng Xu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024. GitHub Repo stars

  35. Longcite: Enabling llms to generate fine-grained citations in long-context qa. Zhang, Jiajie and Bai, Yushi and Lv, Xin and Gu, Wanjun and Liu, Danqing and Zou, Minhao and Cao, Shulin and Hou, Lei and Dong, Yuxiao and Feng, Ling and others. Arxiv 2024.

  36. Long-context llms struggle with long in-context learning. Li, Tianle and Zhang, Ge and Do, Quy Duc and Yue, Xiang and Chen, Wenhu. TMLR.

  37. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu. Arxiv 2024. GitHub Repo stars

  38. Leave no document behind: Benchmarking long-context llms with extended multi-doc qa. Wang, Minzheng and Chen, Longze and Cheng, Fu and Liao, Shengyi and Zhang, Xinghua and Wu, Bingli and Yu, Haiyang and Xu, Nan and Zhang, Lei and Luo, Run and others. EMNLP 2024.

  39. LooGLE: Can Long-Context Language Models Understand Long Contexts?. Li, Jiaqi and Wang, Mengmeng and Zheng, Zilong and Zhang, Muhan. ACL 2024.

  40. LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K. Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang. Arxiv 2024. GitHub Repo stars

  41. Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation. Kaijian Zou, Muhammad Khalifa, Lu Wang. Arxiv 2024. GitHub Repo stars

  42. Marathon: A race through the realm of long context with large language models. Zhang, Lei and Li, Yunshui and Liu, Ziqiang and Liu, Junhao and Chen, Longze and Luo, Run and Yang, Min and others. ACL 2024.

  43. One Thousand and One Pairs: A "novel" challenge for long-context language models. Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer. Arxiv 2024. GitHub Repo stars         Static Badge

  44. Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding. Zhihan Zhang, Yixin Cao, Chenchen Ye, Yunshan Ma, Lizi Liao, Tat-Seng Chua. Arxiv 2024.

  45. Zeroscrolls: A zero-shot benchmark for long text understanding. Shaham, Uri and Ivgi, Maor and Efrat, Avia and Berant, Jonathan and Levy, Omer. EMNLP 2023.

  46. DocFinQA: A Long-Context Financial Reasoning Dataset. Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner. ACL 2024

  47. FinTextQA: A Dataset for Long-form Financial Question Answering. Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang. Arxiv 2024.

  48. Long Code Arena: a Set of Benchmarks for Long-Context Code Models. Bogomolov, Egor and Eliseeva, Aleksandra and Galimzyanov, Timur and Glukhov, Evgeniy and Shapkin, Anton and Tigina, Maria and Golubev, Yaroslav and Kovrigin, Alexander and van Deursen, Arie and Izadi, Maliheh and others. Arxiv 2024.

  49. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens. Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan. Arxiv 2024. GitHub Repo stars

  50. Examining Long-Context Large Language Models for Environmental Review Document Comprehension. Phan, Hung and Acharya, Anurag and Meyur, Rounak and Chaturvedi, Sarthak and Sharma, Shivam and Parker, Mike and Nally, Dan and Jannesari, Ali and Pazdernik, Karl and Halappanavar, Mahantesh and others. Arxiv 2024.

  51. Train short, test long: Attention with linear biases enables input length extrapolation. Ofir Press and Noah A. Smith and Mike Lewis. Arxiv 2022

  52. PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training. Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. Arxiv 2023. GitHub Repo stars

  53. Landmark Attention: Random-Access Infinite Context Length for Transformers. Amirkeivan Mohtashami, Martin Jaggi. Arxiv 2023. GitHub Repo stars

  54. NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?. Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen. Arxiv 2024. GitHub Repo stars

  55. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. Fabbri, Alexander Richard and Li, Irene and She, Tianwei and Li, Suyi and Radev, Dragomir. ACL 2019.

  56. Ms marco: A human-generated machine reading comprehension dataset. Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li. Arxiv 2016.

  57. U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack. Yunfan Gao, Yun Xiong, Wenlong Wu, Zijing Huang, Bohan Li, Haofen Wang. Arxiv 2025.         GitHub Repo stars

  58. L2M: Mutual Information Scaling Law for Long-Context Language Modeling. Zhuo Chen, Oriol Mayné i Comas, Zhuotao Jin, Di Luo, Marin Soljačić. Arxiv 2025.

Long-Form Generation

  1. ELI5: Long form question answering. Fan, Angela and Jernite, Yacine and Perez, Ethan and Grangier, David and Weston, Jason and Auli, Michael. Arxiv 2019.

  2. Ms marco: A human-generated machine reading comprehension dataset. Nguyen, Tri and Rosenberg, Mir and Song, Xia and Gao, Jianfeng and Tiwary, Saurabh and Majumder, Rangan and Deng, Li. Arxiv 2016.

  3. Expertqa: Expert-curated questions and attributed answers. Malaviya, Chaitanya and Lee, Subin and Chen, Sihao and Sieber, Elizabeth and Yatskar, Mark and Roth, Dan. NAACL 2024.

  4. Proxyqa: An alternative framework for evaluating long-form text generation with large language models. Tan, Haochen and Guo, Zhijiang and Shi, Zhan and Xu, Lu and Liu, Zhili and Feng, Yunlong and Li, Xiaoguang and Wang, Yasheng and Shang, Lifeng and Liu, Qun and others. ACL 2024

  5. LongGenBench: Long-context Generation Benchmark. Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu. EMNLP 2024.

  6. ASQA: Factoid questions meet long-form answers. Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei. EMNLP 2022.

  7. Qasa: advanced question answering on scientific articles. Lee, Yoonjoo and Lee, Kyungjae and Park, Sunghyun and Hwang, Dasol and Kim, Jaehyeon and Lee, Hong-in and Lee, Moontae. PMLR 2023.

  8. CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems. Sara Rosenthal, Avirup Sil, Radu Florian, Salim Roukos. Arxiv 2024. GitHub Repo stars

  9. LONG2RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall. Qi, Zehan and Xu, Rongwu and Guo, Zhijiang and Wang, Cunxiang and Zhang, Hao and Xu, Wei. ACL 2024.

  10. A Benchmark for Long-Form Medical Question Answering. Pedram Hosseini, Jessica M. Sin, Bing Ren, Bryceton G. Thomas, Elnaz Nouri, Ali Farahanchi, Saeed Hassanpour. NeurIPS 2024. GitHub Repo stars

  11. OLAPH: Improving Factuality in Biomedical Long-form Question Answering. Minbyul Jeong, Hyeon Hwang, Chanwoong Yoon, Taewhoo Lee, Jaewoo Kang. Arxiv 2024. GitHub Repo stars

  12. Factscore: Fine-grained atomic evaluation of factual precision in long form text generation. Min, Sewon and Krishna, Kalpesh and Lyu, Xinxi and Lewis, Mike and Yih, Wen-tau and Koh, Pang Wei and Iyyer, Mohit and Zettlemoyer, Luke and Hajishirzi, Hannaneh. EMNLP 2023.

  13. Long-form factuality in large language models. Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le. Arxiv 2024. GitHub Repo stars

  14. Large Language Models Still Exhibit Bias in Long Text. Wonje Jeung, Dongjae Jeon, Ashkan Yousefpour, Jonghyun Choi. Arxiv 2024.

  15. Aquamuse: Automatically generating datasets for query-based multi-document summarization. Kulkarni, Sayali and Chammas, Sheide and Zhu, Wan and Sha, Fei and Ie, Eugene. Arxiv 2020.

  16. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. Fabbri, Alexander Richard and Li, Irene and She, Tianwei and Li, Suyi and Radev, Dragomir. ACL 2019.

  17. LCFO: Long Context and Long Form Output Dataset and Benchmarking. Marta R. Costa-jussà, Pierre Andrews, Mariano Coria Meglioli, Joy Chen, Joe Chuang, David Dale, Christophe Ropers, Alexandre Mourachko, Eduardo Sánchez, Holger Schwenk, Tuan Tran, Arina Turkatenko, Carleigh Wood. Arxiv 2024.

  18. LongForm: Effective Instruction Tuning with Reverse Instructions. Koksal, Abdullatif and Schick, Timo and Korhonen, Anna and Schutze, Hinrich. EMNLP 2024.

  19. Suri: Multi-constraint Instruction Following for Long-form Text Generation. Chau Minh Pham, Simeng Sun, Mohit Iyyer. EMNLP 2024.         GitHub Repo stars

  20. LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs. Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li. Arxiv 2024. GitHub Repo stars

  21. Language Models can Self-Lengthen to Generate Long Texts. Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin. Arxiv 2024. GitHub Repo stars

  22. LOT: A story-centric benchmark for evaluating Chinese long text understanding and generation. Guan, Jian and Feng, Zhuoer and Chen, Yamei and He, Ruilin and Mao, Xiaoxi and Fan, Changjie and Huang, Minlie. TACL 2022.

  23. Longlamp: A benchmark for personalized long-form text generation. Kumar, Ishita and Viswanathan, Snigdha and Yerra, Sushrita and Salemi, Alireza and Rossi, Ryan A and Dernoncourt, Franck and Deilamsalehy, Hanieh and Chen, Xiang and Zhang, Ruiyi and Agarwal, Shubham and others. Arxiv 2024.

  24. DOLOMITES: Domain-Specific Long-Form Methodical Tasks. Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti. Arxiv 2024.

  25. LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs. Yuhao Wu, Ming Shan Hee, Zhiqing Hu, Roy Ka-Wei Lee. Arxiv 2024. GitHub Repo stars

  26. LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation. Xi Ye, Fangcong Yin, Yinghui He, Joie Zhang, Howard Yen, Tianyu Gao, Greg Durrett, Danqi Chen. Arxiv 2025. GitHub Repo stars         Static Badge

  27. Hellobench: Evaluating long text generation capabilities of large language models. Que, Haoran and Duan, Feiyu and He, Liqun and Mou, Yutao and Zhou, Wangchunshu and Liu, Jiaheng and Rong, Wenge and Wang, Zekun Moore and Yang, Jian and Zhang, Ge and others. Arxiv 2024.         GitHub Repo stars

  28. The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input. Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, Kate Olszewska, Lukas Haas, Michelle Liu, Nate Keating, Adam Bloniarz, Carl Saroufim, Corey Fry, Dror Marcus, Doron Kukliansky, Gaurav Singh Tomar, James Swirhun, Jinwei Xing, Lily Wang, Madhu Gurumurthy, Michael Aaron, Moran Ambar, Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das. Arxiv 2025. Static Badge

  29. RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery. Hongchao Gu, Dexun Li, Kuicai Dong, Hao Zhang, Hang Lv, Hao Wang, Defu Lian, Yong Liu, Enhong Chen. Arxiv 2025.

  30. DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation. Ming Wang, Fang Wang, Minghao Hu, Li He, Haiyang Wang, Jun Zhang, Tianwei Yan, Li Li, Zhunchen Luo, Wei Luo, Xiaoying Bai, Guotong Geng. Arxiv 2025.         GitHub Repo stars

  31. Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation. Junhao Zhang, Richong Zhang, Fanshuang Kong, Ziyang Miao, Yanhan Ye, Yaowei Zheng. Arxiv 2025.         GitHub Repo stars

  32. Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models. Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Jürgen Schmidhuber. Arxiv 2025.         GitHub Repo stars

AI Infrastructure

Training

  1. Mixed precision training. Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, others. Arxiv 2017

  2. Megatron-lm: Training multi-billion parameter language models using model parallelism. Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. Arxiv 2019

  3. Efficient sequence packing without cross-contamination: Accelerating large language models without impacting performance. Mario Michael Krell, Matej Kosec, Sergio P Perez, Andrew Fitzgibbon. Arxiv 2021

  4. Fptq: Fine-grained post-training quantization for large language models. Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo Zhang, Xiangxiang Chu, Yerui Sun, Li Du, Yuchen Xie. Arxiv 2023

  5. Striped attention: Faster ring attention for causal transformers. William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley. Arxiv 2023

  6. Pytorch fsdp: experiences on scaling fully sharded data parallel. Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, others. Arxiv 2023

  7. Deepspeed ulysses: System optimizations for enabling training of extreme long sequence transformer models. Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He. Arxiv 2023

  8. Ring attention with blockwise transformers for near-infinite context. Hao Liu, Matei Zaharia, Pieter Abbeel. Arxiv 2023

  9. Fp8-lm: Training fp8 large language models. Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, others. Arxiv 2023

  10. Structured packing in llm training improves long context utilization. Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś. Arxiv 2023

  11. Understanding llms: A comprehensive overview from training to inference. Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, others. Arxiv 2024

  12. DeepSeek-V3 Technical Report. DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J.L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R.J. Chen, R.L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S.S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W.L. Xiao, Wangding Zeng et al. (100 additional authors not shown). Arxiv 2025.         GitHub Repo stars

  13. Longalign: A recipe for long context alignment of large language models. Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li. Arxiv 2024

  14. Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models. Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang. Arxiv 2024.         GitHub Repo stars

  15. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. Tri Dao. Arxiv 2023.         GitHub Repo stars

  16. Longskywork: A training recipe for efficiently extending context length in large language models. Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, others. Arxiv 2024

  17. DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning. Keer Lu, Xiaonan Nie, Zheng Liang, Da Pan, Shusen Zhang, Keshi Zhao, Weipeng Chen, Zenan Zhou, Guosheng Dong, Bin Cui, others. Arxiv 2024

  18. How to Train Long-Context Language Models (Effectively). Tianyu Gao, Alexander Wettig, Howard Yen, Danqi Chen. Arxiv 2024.         GitHub Repo stars

  19. SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models. Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi. Arxiv 2024

  20. Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum. Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel. Arxiv 2024

  21. Enhancing training efficiency using packing with flash attention. Achintya Kundu, Rhui Dih Lee, Laura Wynter, Raghu Kiran Ganti, Mayank Mishra. Arxiv 2024

  22. FLUX: fast software-based communication overlap on gpus through kernel fusion. Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Chengji Yao, Ziheng Jiang, others. Arxiv 2024

  23. Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies. Felix Brakel, Uraz Odyurt, Ana-Lucia Varbanescu. Arxiv 2024

  24. Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences. Haoyang Li, Fangcheng Fu, Sheng Lin, Hao Ge, Xuanyu Wang, Jiawen Niu, Jie Jiang, Bin Cui. Arxiv 2024

  25. Collage: Light-Weight Low-Precision Strategy for LLM Training. Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan. Arxiv 2024

  26. COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training. Haocheng Xi, Han Cai, Ligeng Zhu, Yao Lu, Kurt Keutzer, Jianfei Chen, Song Han. Arxiv 2024

  27. When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training. Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang. Arxiv 2024.         GitHub Repo stars

  28. Efficient training of large language models on distributed infrastructures: a survey. Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, others. Arxiv 2024

  29. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng. Arxiv 2025.

  30. Qwen2.5-1M Technical Report. An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, others. Arxiv 2025

  31. MoBA: Mixture of Block Attention for Long-Context LLMs. Enzhe Lu, Zhejun Jiang, Jingyuan Liu, Yulun Du, Tao Jiang, Chao Hong, Shaowei Liu, Weiran He, Enming Yuan, Yuzhi Wang, Zhiqi Huang, Huan Yuan, Suting Xu, Xinran Xu, Guokun Lai, Yanru Chen, Huabin Zheng, Junjie Yan, Jianlin Su, Yuxin Wu, Neo Y. Zhang, Zhilin Yang, Xinyu Zhou, Mingxing Zhang, Jiezhong Qiu. Arxiv 2025.         GitHub Repo stars

Inference

  1. Speed: Speculative pipelined execution for efficient decoding. Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao. Arxiv 2023

  2. vtensor: Flexible virtual tensor management for efficient llm serving. Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, others. Arxiv 2024

  3. Fastdecode: High-throughput gpu-efficient llm serving using heterogeneous pipelines. Jiaao He, Jidong Zhai. Arxiv 2024

  4. KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head. Isaac Rehg. Arxiv 2024.         GitHub Repo stars

  5. Magicdec: Breaking the latency-throughput tradeoff for long context generation with speculative decoding. Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Beidi Chen. Arxiv 2024

  6. QAQ: Quality Adaptive Quantization for LLM KV Cache. Shichen Dong, Wen Cheng, Jiayu Qin, Wei Wang. Arxiv 2024

  7. Wkvquant: Quantizing weight and key/value cache for large language models gains more. Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie. Arxiv 2024

  8. Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression. Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen. Arxiv 2024.

  9. ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition. Lu Ye, Ze Tao, Yong Huang, Yang Li. Arxiv 2024.

  10. Memserve: Context caching for disaggregated llm serving with elastic memory pool. Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, others. Arxiv 2024

  11. Efficient llm inference with i/o-aware partial kv cache recomputation. Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram. Arxiv 2024

  12. GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM. Hao Kang, Qingru Zhang, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna, Tuo Zhao. Arxiv 2024

  13. Scbench: A kv cache-centric analysis of long-context methods. Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, others. Arxiv 2024

  14. Mooncake: A kvcache-centric disaggregated architecture for llm serving. Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu. Arxiv 2024

  15. LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification. Penghui Yang, Cunxiao Du, Fengzhuo Zhang, Haonan Wang, Tianyu Pang, Chao Du, Bo An. Arxiv 2025.         GitHub Repo stars

  16. Long-Context Inference with Retrieval-Augmented Speculative Decoding. Guanzheng Chen, Qilong Feng, Jinjie Ni, Xin Li, Michael Qizhe Shieh. Arxiv 2025.         GitHub Repo stars

Interpretability

Performance Analysis

  1. Longrope: Extending llm context window beyond 2 million tokens. Ding, Yiran and Zhang, Li Lyna and Zhang, Chengruidong and Xu, Yuanyuan and Shang, Ning and Xu, Jiahang and Yang, Fan and Yang, Mao. ICML 2024. GitHub Repo stars
  2. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Team, Gemini and Georgiev, Petko and Lei, Ving Ian and Burnell, Ryan and Bai, Libin and Gulati, Anmol and Tanzer, Garrett and Vincent, Damien and Pan, Zhufeng and Wang, Shibo and others. Arxiv 2024.
  3. RULER: What’s the Real Context Size of Your Long-Context Language Models? Hsieh, Cheng-Ping and Sun, Simeng and Kriman, Samuel and Acharya, Shantanu and Rekesh, Dima and Jia, Fei and Ginsburg, Boris. Arxiv 2024. GitHub Repo stars
  4. Lost in the middle: How language models use long contexts. Liu, Nelson F and Lin, Kevin and Hewitt, John and Paranjape, Ashwin and Bevilacqua, Michele and Petroni, Fabio and Liang, Percy. ACL 2024.
  5. Make Your LLM Fully Utilize the Context. Shengnan An and Zexiong Ma and Zeqi Lin and Nanning Zheng and Jian-Guang Lou and Weizhu Chen. NeurIPS 2024. GitHub Repo stars
  6. Never Lost in the Middle: Mastering Long-Context Question Answering with Position-Agnostic Decompositional Training. He, Junqing and Pan, Kunhao and Dong, Xiaoqun and Song, Zhuoyang and Liu, Yibo and Sun, Qianguo and Liang, Yuxin and Wang, Hao and Zhang, Enming and Zhang, Jiaxing. ACL 2024.
  7. Compression Represents Intelligence Linearly. Huang, Yuzhen and Zhang, Jinghan and Shan, Zifei and He, Junxian. COLM 2024. GitHub Repo stars
  8. Can Perplexity Reflect Large Language Model's Ability in Long Text Understanding? Hu, Yutong and Huang, Quzhe and Tao, Mingxu and Zhang, Chen and Feng, Yansong. ICLR 2024.
  9. Do Long-Range Language Models Actually Use Long-Range Context? Sun, Simeng and Krishna, Kalpesh and Mattarella-Micke, Andrew and Iyyer, Mohit. ACL 2021.
  10. Extending context window of large language models via positional interpolation. Chen, Shouyuan and Wong, Sherman and Chen, Liangjian and Tian, Yuandong. Arxiv 2023.
  11. What is Wrong with Perplexity for Long-context Language Modeling? Fang, Lizhe and Wang, Yifei and Liu, Zhaoyang and Zhang, Chenheng and Jegelka, Stefanie and Gao, Jinyang and Ding, Bolin and Wang, Yisen. ICLR 2025.
  12. Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach. Li, Zhuowan and Li, Cheng and Zhang, Mingyang and Mei, Qiaozhu and Bendersky, Michael. ACL 2024.
  13. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. Jin, Bowen and Yoon, Jinsung and Han, Jiawei and Arik, Sercan O. Arxiv 2024.
  14. Longrag: Enhancing retrieval-augmented generation with long-context llms. Jiang, Ziyan and Ma, Xueguang and Chen, Wenhu. Arxiv 2024. GitHub Repo stars

Model Structure Analysis

  1. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, Ren Ng. NeurIPS 2020. GitHub Repo stars

  2. In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, Chris Olah. Arxiv 2022

  3. YaRN: Efficient Context Window Extension of Large Language Models. Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole. ICLR 2024. GitHub Repo stars

  4. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. ICLR 2023. GitHub Repo stars

  5. Scaling laws of rope-based extrapolation. Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin. ICLR 2024. GitHub Repo stars

  6. Base of RoPE Bounds Context Length. Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen. NeurIPS 2024

  7. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. NAACL 2024. GitHub Repo stars

  8. Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. ACL Findings 2024

  9. Interpreting and Improving Large Language Models in Arithmetic Calculation. Wei Zhang, Chaoqun Wan, Yonggang Zhang, Yiu-Ming Cheung, Xinmei Tian, Xu Shen, Jieping Ye. ICML 2024

  10. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning. Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao. ICLR 2025. GitHub Repo stars

  11. Rope to Nope and Back Again: A New Hybrid Attention Strategy. Bowen Yang, Bharat Venkitesh, Dwarak Talupuru, Hangyu Lin, David Cairuz, Phil Blunsom, Acyr Locatelli. Arxiv 2025

  12. Retrieval Head Mechanistically Explains Long-Context Factuality. Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu. ICLR 2025. GitHub Repo stars

Application

Agent

  1. Agents: An Open-source Framework for Autonomous Language Agents. Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan. Arxiv 2023

  2. ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, Yuan Cao. ICLR 2023

  3. The Rise and Potential of Large Language Model Based Agents: A Survey. Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, Tao Gui. Arxiv 2023

  4. Benchmarking Large Language Models As AI Research Agents. Qian Huang, Jian Vora, Percy Liang, Jure Leskovec. Arxiv 2023

  5. VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?. Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue. Arxiv 2024

  6. Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study. Weihao Tan, Ziluo Ding, Wentao Zhang, Boyu Li, Bohan Zhou, Junpeng Yue, Haochong Xia, Jiechuan Jiang, Longtao Zheng, Xinrun Xu, Yifei Bi, Pengjie Gu, Xinrun Wang, Börje F. Karlsson, Bo An, Zongqing Lu. Arxiv 2024

  7. TravelAgent: An AI Assistant for Personalized Travel Planning. Aili Chen, Xuyang Ge, Ziquan Fu, Yanghua Xiao, Jiangjie Chen. Arxiv 2024

  8. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press. NeurIPS 2024

  9. GPTSwarm: Language Agents as Optimizable Graphs. Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber. ICML 2024

  10. SWE-bench: Can Language Models Resolve Real-world Github Issues?. Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik R. Narasimhan. ICLR 2024

  11. AutoCodeRover: Autonomous Program Improvement. Yuntong Zhang, Haifeng Ruan, Zhiyu Fan, Abhik Roychoudhury. ISSTA 2024

  12. OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu. NeurIPS 2024

  13. WebArena: A Realistic Web Environment for Building Autonomous Agents. Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig. ICLR 2024

  14. Agentless: Demystifying LLM-based Software Engineering Agents. Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang. Arxiv 2024

  15. Symbolic Learning Enables Self-Evolving Agents. Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang. Arxiv 2024

  16. MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Lilian Weng, Aleksander Madry. Arxiv 2024

RAG

  1. How Can Recommender Systems Benefit from Large Language Models: A Survey. Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang. 2023
  2. A Comprehensive Survey of Retrieval-Augmented Generation RAG: Evolution, Current Landscape and Future Directions. Shailja Gupta, Rajesh Ranjan, Surya Narayan Singh. 2024
  3. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan Ö. Arik. 2024
  4. LitLLM: A Toolkit for Scientific Literature Review. Shubham Agarwal, Issam H. Laradji, Laurent Charlin, Christopher Pal. 2024
  5. SPAR: Personalized Content-Based Recommendation via Long Engagement Attention. Chiyu Zhang, Yifei Sun, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Sinong Wang, Rong Jin, Sem Park, Ning Yao, Bo Long. 2024
  6. ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation. Jianghao Lin, Rong Shan, Chenxu Zhu, Kounianhua Du, Bo Chen, Shigang Quan, Ruiming Tang, Yong Yu, Weinan Zhang. 2024
  7. HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications. Rishi Kalra, Zekun Wu, Ayesha Gulley, Airlie Hilliard, Xin Guan, Adriano S. Koshiyama, Philip C. Treleaven. 2024
  8. In Defense of RAG in the Era of Long-Context Language Models. Tan Yu, Anbang Xu, Rama Akkiraju. 2024
  9. Let long-term interests talk: An disentangled learning model for recommendation based on short-term interests generation. Sirui Duan, Mengya Ouyang, Rong Wang, Qian Li, Yunpeng Xiao. 2025

Chatbot

  1. MemoryBank: Enhancing Large Language Models with Long-Term Memory. Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, Yanlin Wang. Arxiv 2023. GitHub Repo stars
  2. Augmenting Language Models with Long-Term Memory. Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei. 2023
  3. Kimi Chat. Moonshot AI. 2023
  4. Character AI. Character AI. 2023
  5. I’m Pi, Your personal AI. Inflection. 2023
  6. Prompted LLMs as Chatbot Modules for Long Open-domain Conversation. Gibbeum Lee, Volker Hartmann, Jongho Park, Dimitris Papailiopoulos, Kangwook Lee. 2023
  7. Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention. Eunkyung Jo, Yuin Jeong, SoHyun Park, Daniel A. Epstein, Young-Ho Kim. 2024
  8. Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models. Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi. 2024
  9. Memory and New Controls for ChatGPT. OpenAI. 2024

Code

  1. GitHub Copilot. GitHub. 2022
  2. RepoFusion: Training Code Models to Understand Your Repository. Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak. 2023
  3. Repository-Level Prompt Generation for Large Language Models of Code. Disha Shrivastava, Hugo Larochelle, Daniel Tarlow. 2023
  4. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen. 2023
  5. Granite Code Models: A Family of Open Foundation Models for Code Intelligence. Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Yi Zhou, Chris Johnson, Aanchal Goyal, Hima Patel, S. Yousaf Shah, Petros Zerfos, Heiko Ludwig, Asim Munawar, Maxwell Crouse, Pavan Kapanipathi, Shweta Salaria, Bob Calio, Sophia Wen, Seetharami Seelam, Brian Belgodere, Carlos A. Fonseca, Amith Singhee, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda. 2024
  6. RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion. Huy Nhat Phan, Hoang Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui. 2024
  7. Qwen2.5-Coder Technical Report. Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin. 2024
  8. A Survey on Large Language Models for Code Generation. Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim. 2024
  9. StarCoder 2 and The Stack v2: The Next Generation. Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian J. McAuley, Han Hu, Torsten Scholak, Sébastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, et al. 2024
  10. Cursor - The AI Code Editor. Anysphere. 2025

NLP Tasks

  1. Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. Arxiv 2020. GitHub Repo stars
  2. Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. NeurIPS 2020. GitHub Repo stars
  3. LongEmbed: Extending Embedding Models for Long Context Retrieval. Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li. Arxiv 2024. GitHub Repo stars
  4. Document-Level Neural Machine Translation with Hierarchical Attention Networks. Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, James Henderson. 2018
  5. Improving the Transformer Translation Model with Document-Level Context. Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Min Zhang, Yang Liu. 2018
  6. HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization. Xingxing Zhang, Furu Wei, Ming Zhou. 2019
  7. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. 2020
  8. G-Transformer for Document-Level Machine Translation. Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, Weihua Luo. 2021
  9. LongT5: Efficient Text-To-Text Transformer for Long Sequences. Mandy Guo, Joshua Ainslie, David C. Uthus, Santiago Ontañón, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang. 2022
  10. Large Language Models for Information Retrieval: A Survey. Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, Ji-Rong Wen. 2023
  11. Improving Long Context Document-Level Machine Translation. Christian Herold, Hermann Ney. 2023
  12. Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents. Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao. 2023
  13. Document-Level Machine Translation with Large Language Models. Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu. 2023
  14. Benchmarking and Improving Long-Text Translation with Large Language Models. Longyue Wang, Zefeng Du, Wenxiang Jiao, Chenyang Lyu, Jianhui Pang, Leyang Cui, Kaiqiang Song, Derek F. Wong, Shuming Shi, Zhaopeng Tu. 2024
  15. A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models. Chenyang Lyu, Zefeng Du, Jitao Xu, Yitao Duan, Minghao Wu, Teresa Lynn, Alham Fikri Aji, Derek F. Wong, Longyue Wang. 2024
  16. Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT. Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré. 2024
  17. Improving Text Embeddings with Large Language Models. Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei. 2024
  18. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. Hanlei Jin, Yang Zhang, Dan Meng, Jun Wang, Jinghua Tan. 2024
  19. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu. 2024
  20. New Embedding Models and API Updates. OpenAI. 2024
  21. A study of extractive summarization of long documents incorporating local topic and hierarchical information. Ting Wang, Chuan Yang, Maoyang Zou, Jiaying Liang, Dong Xiang, Wenjie Yang, Hongyang Wang, Jia Li. 2024
  22. Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications. Aditi S. Godbole, Jabin Geevarghese George, Smita Shandilya. 2024

Multimodal Tasks

  1. Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts. Aditya Sharma, Michael Saxon, William Yang Wang. Arxiv 2024. Static Badge
  2. Many-Shot In-Context Learning in Multimodal Foundation Models. Yixing Jiang, Jeremy Irvin, Ji Hun Wang, Muhammad Ahmed Chaudhry, Jonathan H. Chen, Andrew Y. Ng. Arxiv 2024. GitHub Repo stars
  3. LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models. Shangqing Tu, Yucheng Wang, Daniel Zhang-Li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou, Huiqin Liu, Zhiyuan Liu, Bin Xu, Juanzi Li. Arxiv 2025. GitHub Repo stars

Specific Domains

  1. Abstractive Text Summarization by Incorporating Reader Comments. Shen Gao, Xiuying Chen, Piji Li, Zhaochun Ren, Lidong Bing, Dongyan Zhao, Rui Yan. 2019
  2. A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges. Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, Stefan Zohren. 2024
  3. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens. Yongqi Fan, Hongli Sun, Kui Xue, Xiaofan Zhang, Shaoting Zhang, Tong Ruan. 2024
  4. Promises and pitfalls of artificial intelligence for legal applications. Sayash Kapoor, Peter Henderson, Arvind Narayanan. 2024
  5. Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications. Aditi S. Godbole, Jabin Geevarghese George, Smita Shandilya. 2024
  6. DocFinQA: A Long-Context Financial Reasoning Dataset. Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner. 2024
  7. Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers. Lukas Hilgert, Danni Liu, Jan Niehues. 2024
  8. LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents. Ahmed Masry, Amir Hajian. 2024

Future Directions

Long CoT

  1. When More is Less: Understanding Chain-of-Thought Length in LLMs. Yuyang Wu, Yifei Wang, Tianqi Du, Stefanie Jegelka, Yisen Wang. Arxiv 2025.

  2. LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!. Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica. Arxiv 2025.

  3. Monte Carlo Tree Diffusion for System 2 Planning. Jaesik Yoon, Hyeonseo Cho, Doojin Baek, Yoshua Bengio, Sungjin Ahn. Arxiv 2025.

  4. Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning. Qifan Yu, Zhenyu He, Sijie Li, Xun Zhou, Jun Zhang, Jingjing Xu, Di He. Arxiv 2025.

  5. CoT-Valve: Length-Compressible Chain-of-Thought Tuning. Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang. Arxiv 2025.

  6. Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity. Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan. Arxiv 2025.

  7. DRT: Deep Reasoning Translation via Long Chain-of-Thought. Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou. Arxiv 2024.

  8. Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs. Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu. Arxiv 2024.

  9. O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?. Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, Weizhe Yuan, Pengfei Liu. Arxiv 2024.

  10. OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning. Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang. Arxiv 2024.

  11. Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning. Libo Wang. Arxiv 2025.

  12. SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities. Fengqing Jiang, Zhangchen Xu, Yuetai Li, Luyao Niu, Zhen Xiang, Bo Li, Bill Yuchen Lin, Radha Poovendran. Arxiv 2025.

  13. Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning. Qingwen Lin, Boyan Xu, Zijian Li, Zhifeng Hao, Keli Zhang, Ruichu Cai. Arxiv 2025.

  14. Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?. Zhiyuan Zeng, Qinyuan Cheng, Zhangyue Yin, Yunhua Zhou, Xipeng Qiu. Arxiv 2025.

  15. TokenSkip: Controllable Chain-of-Thought Compression in LLMs. Heming Xia, Yongqi Li, Chak Tou Leong, Wenjie Wang, Wenjie Li. Arxiv 2025.

  16. LightThinker: Thinking Step-by-Step Compression. Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang. Arxiv 2025.

  17. Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning. Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei. Arxiv 2025.

  18. Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?. Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng. Arxiv 2025.

  19. Towards Widening The Distillation Bottleneck for Reasoning Models. Huifeng Yin, Yu Zhao, Minghao Wu, Xuanfan Ni, Bo Zeng, Hao Wang, Tianqi Shi, Liangying Shao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang. Arxiv 2025.

  20. What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret. Yufeng Yuan, Yu Yue, Ruofei Zhu, Tiantian Fan, Lin Yan. Arxiv 2025.

  21. MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving. Ruida Wang, Rui Pan, Yuxin Li, Jipeng Zhang, Yizhen Jia, Shizhe Diao, Renjie Pi, Junjie Hu, Tong Zhang. Arxiv 2025.

  22. START: Self-taught Reasoner with Tools. Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Xiang Wang, Bowen Yu, Binyuan Hui, Junyang Lin, Dayiheng Liu. Arxiv 2025.

  23. L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning. Pranjal Aggarwal, Sean Welleck. Arxiv 2025.

  24. InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models. Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang. Arxiv 2025.

  25. Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval. Yuwei Zhang, Jayanth Srinivasa, Gaowen Liu, Jingbo Shang. Arxiv 2025.

  26. "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding. Hyunbin Jin, Je Won Yeom, Seunghyun Bae, Taesup Kim. Arxiv 2025.

  27. Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond. Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang. Arxiv 2025.

Acknowledgments

Please contact us if we missed your name in this list; we will add you back as soon as possible!

Contributors

Star History

