If you like our project, please give us a star ⭐ on GitHub to stay up to date with the latest updates.
This repository provides the papers mentioned in the survey "A Survey on Latent Reasoning".
If you find our survey useful for your research, please consider citing the following paper:
@article{map2025latent,
  title   = {A Survey on Latent Reasoning},
  author  = {M-A-P},
  journal = {arXiv preprint},
  year    = {2025}
}
[2025-07-08]
We have released the arXiv preprint: A Survey on Latent Reasoning.
[2025-07-04]
We have initialized the repository.
We welcome feedback, suggestions, and contributions that help improve this survey and repository and make them valuable resources for the entire community. We will actively maintain this repository by incorporating new research as it emerges. If you have suggestions about our taxonomy, notice any missed papers, or spot preprint arXiv entries that have since been accepted at a venue, please let us know.
If you want to add your work or model to this list, please do not hesitate to email [email protected] or open a pull request.
Markdown format:
* | **Paper Name** | Name of Conference or Journal + Year | Release Date | [Paper](link) - [Code](link) |
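For example, a filled-in entry for the first paper in this list could look like the row below (the arXiv link is shown for illustration; replace `link` with the official code repository, or drop the Code link if none exists):

* | **Universal Transformers** | ICLR 2019 | Jul 2018 | [Paper](https://arxiv.org/abs/1807.03819) - [Code](link) |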
- Citation
- Update News
- Explicit Reasoning vs. Latent Reasoning
- Contents
- Papers
  - Latent CoT Reasoning
  - Mechanistic Interpretability
  - Towards Infinite-depth Reasoning
- Acknowledgement
- Contributors
Title | Venue | Date | Links |
---|---|---|---|
Universal Transformers | ICLR 2019 | Jul 2018 | Paper - Code |
CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference | ICLR 2025 | Oct 2023 | Paper |
AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures | TMLR 2025 | Feb 2024 | Paper - Code |
Relaxed recursive transformers: Effective parameter sharing with layer-wise LoRA | ICLR 2025 | Oct 2024 | Paper |
Byte Latent Transformer: Patches Scale Better Than Tokens | ACL 2025 Outstanding Paper | Dec 2024 | Paper - Code |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | ICLR 2025 | Feb 2025 | Paper - Code |
Pretraining Language Models to Ponder in Continuous Space | arXiv | May 2025 | Paper - Code |
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation | arXiv | Jul 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Think before you speak: Training Language Models With Pause Tokens | ICLR 2024 | Oct 2023 | Paper |
Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Oct 2023 | Paper - Code |
Let's Think Dot by Dot: Hidden computation in transformer language models | COLM 2024 | Apr 2024 | Paper - Code |
Disentangling memory and reasoning ability in large language models | ACL 2025 (main) | Nov 2024 | Paper - Code |
Training Large Language Models to Reason in a Continuous Latent Space | arXiv | Dec 2024 | Paper - Code |
Compressed chain of thought: Efficient reasoning through dense representations | arXiv | Dec 2024 | Paper |
Multimodal Latent Language Modeling with Next-Token Diffusion | arXiv | Dec 2024 | Paper - Page |
Efficient Reasoning with Hidden Thinking | arXiv | Jan 2025 | Paper - Code |
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | ICML 2025 | Feb 2025 | Paper |
LightThinker: Thinking step-by-step compression | arXiv | Feb 2025 | Paper - Code |
CODI: Compressing chain-of-thought into continuous space via self-distillation | arXiv | Feb 2025 | Paper - Code |
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | arXiv | May 2025 | Paper |
Hybrid Latent Reasoning via Reinforcement Learning | arXiv | May 2025 | Paper - Code |
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | arXiv | Jun 2025 | Paper - Code |
Parallel Continuous Chain-of-Thought with Jacobi Iteration | arXiv | Jun 2025 | Paper - Code |
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | arXiv | Jun 2025 | Paper - Code |
SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought | arXiv | Aug 2025 | Paper |
Title | Venue | Date | Links |
---|---|---|---|
From explicit CoT to implicit CoT: Learning to internalize CoT step by step | arXiv | May 2024 | Paper |
On the inductive bias of stacking towards improving reasoning | NeurIPS 2024 | Jun 2024 | Paper |
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | arXiv | Nov 2024 | Paper - Code |
Training large language models to reason in a continuous latent space | arXiv | Dec 2024 | Paper |
Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning | arXiv | Feb 2025 | Paper |
Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | arXiv | May 2025 | Paper - Code - Project |
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | arXiv | May 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Can you learn an algorithm? generalizing from easy to hard problems with recurrent networks | NeurIPS 2021 | Oct 2021 | Paper - Code |
Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper - Code |
Simulation of graph algorithms with looped transformers | arXiv | Feb 2024 | Paper - Code |
Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Feb 2024 | Paper - Code |
Can looped transformers learn to implement multi-step gradient descent for in-context learning? | arXiv | Oct 2024 | Paper |
Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent | arXiv | Oct 2024 | Paper |
Disentangling memory and reasoning ability in large language models | arXiv | Nov 2024 | Paper |
LatentPrompt: Optimizing Prompts in Latent Space | arXiv | Aug 2025 | Paper |
Temporal Hidden-state Methods
Hidden-state Based Methods
Title | Venue | Date | Links |
---|---|---|---|
Gated linear attention transformers with hardware-efficient training | arXiv | Dec 2023 | Paper - Code |
Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence | arXiv | Apr 2024 | Paper - Code |
HGRN2: Gated linear RNNs with state expansion | arXiv | Apr 2024 | Paper - Code |
Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality | arXiv | May 2024 | Paper - Code |
Parallelizing linear transformers with the delta rule over sequence length | arXiv | Jun 2024 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Learning to (learn at test time): RNNs with expressive hidden states | arXiv | Jul 2024 | Paper |
Gated Delta Networks: Improving Mamba2 with Delta Rule | arXiv | Dec 2024 | Paper - Code |
Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
Lattice: Learning to efficiently compress the memory | arXiv | Apr 2025 | Paper |
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization | arXiv | Apr 2025 | Paper |
Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration | arXiv | May 2025 | Paper |
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers | SSRN | May 2025 | Paper - Code |
Training-induced Hidden-State Conversion
Title | Venue | Date | Links |
---|---|---|---|
Linearizing large language models | arXiv | May 2024 | Paper |
Transformers to SSMs: Distilling quadratic knowledge to subquadratic models | NeurIPS 2024 | Jun 2024 | Paper |
LoLCATs: On Low-Rank Linearizing of Large Language Models | ICLR 2025 | Oct 2024 | Paper |
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing | arXiv | Feb 2025 | Paper - Code |
Liger: Linearizing Large Language Models to Gated Recurrent Structures | arXiv | Mar 2025 | Paper - Code |
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows | arXiv | Jul 2025 | Paper |
Title | Venue | Date | Links |
---|---|---|---|
Towards a mechanistic interpretation of multi-step reasoning capabilities of language models | arXiv | Oct 2023 | Paper - Code |
Iteration head: A mechanistic study of chain-of-thought | NeurIPS 2024 | Jun 2024 | Paper |
Towards understanding how transformer perform multi-step reasoning with matching operation | arXiv | Jun 2024 | Paper |
Do LLMs Really Think Step-by-step In Implicit Reasoning? | arXiv | Nov 2024 | Paper |
Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
How Do LLMs Perform Two-Hop Reasoning in Context? | arXiv | Feb 2025 | Paper |
Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
A little depth goes a long way: The expressive power of log-depth transformers | arXiv | Mar 2025 | Paper |
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | arXiv | Jul 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting | NeurIPS 2019 | Jul 2019 | Paper |
Transformer feed-forward layers are key-value memories | EMNLP 2021 | Dec 2020 | Paper |
Interpretability in the wild: a circuit for indirect object identification in GPT-2 small | arXiv | Nov 2022 | Paper |
miCSE: Mutual information contrastive learning for low-shot sentence embeddings | arXiv | Nov 2022 | Paper - Code |
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model | NeurIPS 2023 | May 2023 | Paper |
A mechanistic interpretation of arithmetic reasoning in language models using causal mediation analysis | EMNLP 2023 | May 2023 | Paper - Code |
Why lift so heavy? slimming large language models by cutting off the layers | arXiv | Feb 2024 | Paper |
Do large language models latently perform multi-hop reasoning? | EACL 2024 | Feb 2024 | Paper |
Understanding and Patching Compositional Reasoning in LLMs | ACL 2024 (Finding) | Feb 2024 | Paper - Code |
How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning | ICLR 2024 | Feb 2024 | Paper |
The Unreasonable Ineffectiveness of the Deeper Layers | arXiv | Mar 2024 | Paper |
Inheritune: Training Smaller Yet More Attentive Language Models | arXiv | Apr 2024 | Paper - Code |
Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization | ICML 2024 | May 2024 | Paper - Code |
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | NeurIPS 2024 | May 2024 | Paper - Code |
Loss landscape geometry reveals stagewise development of transformers | Hi-DL 2024 | Jun 2024 | Paper |
Hopping too late: Exploring the limitations of large language models on multi-hop queries | arXiv | Jun 2024 | Paper |
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | arXiv | Jun 2024 | Paper |
Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | arXiv | Aug 2024 | Paper |
Unveiling induction heads: Provable training dynamics and feature learning in transformers | arXiv | Sep 2024 | Paper |
Investigating layer importance in large language models | arXiv | Sep 2024 | Paper |
Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations | arXiv | Oct 2024 | Paper - Code |
Understanding Layer Significance in LLM Alignment | arXiv | Oct 2024 | Paper |
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | ICLR 2025 | Oct 2024 | Paper - Code |
Does representation matter? exploring intermediate layers in large language models | arXiv | Dec 2024 | Paper |
Layer by Layer: Uncovering Hidden Representations in Language Models | ICML 2025 (oral) | Feb 2025 | Paper - Code |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | arXiv | Feb 2025 | Paper - Code |
The Curse of Depth in Large Language Models | arXiv | Feb 2025 | Paper - Code |
Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis | arXiv | Feb 2025 | Paper |
An explainable transformer circuit for compositional generalization | arXiv | Feb 2025 | Paper |
Emergent Abilities in Large Language Models: A Survey | arXiv | Mar 2025 | Paper |
Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights | arXiv | May 2025 | Paper |
Do Language Models Use Their Depth Efficiently? | arXiv | May 2025 | Paper |
Void in Language Models | arXiv | May 2025 | Paper |
LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking | arXiv | Aug 2025 | Paper |
Title | Venue | Date | Links |
---|---|---|---|
On the computational power of neural nets | JCSS | 1995 | Paper |
Long Short-Term Memory | Neural Computation | 1997 | Paper |
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | EMNLP 2014 | Jun 2014 | Paper |
On the Turing completeness of modern neural network architectures | IJCNN 2021 | Jan 2019 | Paper |
Recurrent memory transformer | NeurIPS 2022 | Jul 2022 | Paper |
Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper |
On limitations of the transformer architecture | COLM 2024 | Nov 2023 | Paper |
Investigating Recurrent Transformers with Dynamic Halt | arXiv | Feb 2024 | Paper |
Chain of thought empowers transformers to solve inherently serial problems | ICLR 2024 | Feb 2024 | Paper |
Quiet-STaR: Language models can teach themselves to think before speaking | arXiv | Mar 2024 | Paper |
Ask, and it shall be given: On the Turing completeness of prompting | arXiv | Nov 2024 | Paper |
Reinforcement Pre-Training | arXiv | Jun 2025 | Paper |
Constant Bit-size Transformers Are Turing Complete | arXiv | Jun 2025 | Paper |
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | arXiv | Jul 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Structured denoising diffusion models in discrete state-spaces | NeurIPS 2021 | Jul 2021 | Paper |
Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | Jun 2024 | Paper |
Your absorbing discrete diffusion secretly models the conditional distributions of clean data | arXiv | Jun 2024 | Paper |
Learning Iterative Reasoning through Energy Diffusion | ICML 2024 | Jun 2024 | Paper - Project |
Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | Jun 2024 | Paper - Project |
Simple and effective masked diffusion language models | NeurIPS 2024 | Jun 2024 | Paper - Code |
Scaling up Masked Diffusion Models on Text | arXiv | Oct 2024 | Paper - Project |
MMaDA: Multimodal large diffusion language models | arXiv | May 2025 | Paper - Project |
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models | arXiv | Aug 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project |
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning | ICLR 2025 | Feb 2025 | Paper - Project |
dKV-Cache: The Cache for Diffusion Language Models | arXiv | May 2025 | Paper - Project |
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | arXiv | May 2025 | Paper - Project |
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | arXiv | May 2025 | Paper |
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | arXiv | May 2025 | Paper - Project |
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | arXiv | Jun 2025 | Paper - Project |
Diffusion Beats Autoregressive in Data-Constrained Settings | arXiv | Jul 2025 | Paper - Project - Code |
Title | Venue | Date | Links |
---|---|---|---|
Diffusion-LM Improves Controllable Text Generation | NeurIPS 2022 | May 2022 | Paper - Project |
Continuous diffusion for categorical data | arXiv | Dec 2022 | Paper |
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | ICLR 2023 | Mar 2023 | Paper - Project |
Likelihood-Based Diffusion Language Models | NeurIPS 2023 | May 2023 | Paper - Project |
Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
TESS: Text-to-Text Self-Conditioned Simplex Diffusion | EACL 2024 | Feb 2024 | Paper - Project |
TESS 2: A Large-Scale Generalist Diffusion Language Model | arXiv | Feb 2025 | Paper |
Title | Venue | Date | Links |
---|---|---|---|
Scaling Diffusion Language Models via Adaptation from Autoregressive Models | ICLR 2025 | Oct 2024 | Paper - Project |
Large Language Models to Diffusion Finetuning | ICML 2025 | Jan 2025 | Paper - Code |
Dream 7B: a large diffusion language model | Blog | Apr 2025 | Paper - Code |
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities | Technical Report | May 2025 | Paper |
Mercury: Ultra-Fast Language Models Based on Diffusion | arXiv | Jun 2025 | Paper - Page |
Title | Venue | Date | Links |
---|---|---|---|
MEMORYLLM: Towards Self-Updatable Large Language Models | ICML 2024 | Feb 2024 | Paper - Code |
Leave No Context Behind: Efficient infinite context transformers with infini-attention | arXiv | Apr 2024 | Paper - Project |
Learning to (learn at test time): RNNs with expressive hidden states | arXiv | Jul 2024 | Paper |
Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
M+: Extending MemoryLLM with Scalable Long-Term Memory | ICML 2025 | May 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
Implicit Language Models are RNNs: Balancing Parallelization and Expressivity | ICML 2025 | Feb 2025 | Paper - Code |
Title | Venue | Date | Links |
---|---|---|---|
A Survey of diffusion models in natural language processing | TACL | May 2023 | Paper |
Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis | CVPR 2025 (oral) | Dec 2024 | Paper - Code |
Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project - Code |
- Awesome-Latent-CoT: a curated list of papers exploring latent chain-of-thought reasoning in LLMs.
- Awesome-Efficient-Reasoning: a curated list of works on making LLM reasoning cheaper and faster.
- Efficient Reasoning Models: A Survey: the companion repo to the survey, aggregating methods and benchmarks for "shorter, smaller, faster" reasoning models.