
If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.


This repository collects the papers discussed in the survey "A Survey on Latent Reasoning".

πŸ“‘ Citation

If you find our survey useful for your research, please consider citing the following paper:

```bibtex
@article{map2025latent,
  title={A Survey on Latent Reasoning},
  author={M-A-P},
  journal={arXiv preprint},
  year={2025}
}
```

πŸ“£ Update News

[2025-07-08] We have released the survey on arXiv: A Survey on Latent Reasoning.

[2025-07-04] We have initialized the repository.

πŸ†š Explicit Reasoning vs. Latent Reasoning

⚑ Contributing

We welcome feedback, suggestions, and contributions that help improve this survey and repository and make them valuable resources for the entire community. We will actively maintain the repository by incorporating new research as it emerges. If you have suggestions about our taxonomy, notice any missed papers, or know of arXiv preprints that have since been accepted to a venue, please let us know.

If you want to add your work or model to this list, please email [email protected] or open a pull request.

Markdown format:

```markdown
| **Paper Name** | Name of Conference or Journal + Year | Release Date | [Paper](link) - [Code](link) |
```
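As a concrete illustration, here is how an existing entry from this list (Universal Transformers) would look in that format; the `link` placeholders stand in for the actual paper and code URLs:

```markdown
| **Universal Transformers** | ICLR 2019 | Jul 2018 | [Paper](link) - [Code](link) |
```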

πŸ’Ό Contents

πŸ“œ Papers

🧠 Latent CoT Reasoning


πŸ”„ Activation-based Recurrent Methods

🧱 Architectural Recurrence
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Universal transformers | ICLR 2019 | Jul 2018 | Paper - Code |
| CoTFormer: A Chain-of-Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference | ICLR 2025 | Oct 2023 | Paper |
| AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures | TMLR 2025 | Feb 2024 | Paper - Code |
| Relaxed recursive transformers: Effective parameter sharing with layer-wise LoRA | ICLR 2025 | Oct 2024 | Paper |
| Byte Latent Transformer: Patches Scale Better Than Tokens | ACL 2025 (Outstanding Paper) | Dec 2024 | Paper - Code |
| Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | ICLR 2025 | Feb 2025 | Paper - Code |
| Pretraining Language Models to Ponder in Continuous Space | arXiv | May 2025 | Paper - Code |
| Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation | arXiv | Jul 2025 | Paper - Code |
πŸ‹οΈ Training-induced Recurrence
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Think before you speak: Training Language Models With Pause Tokens | ICLR 2024 | Oct 2023 | Paper |
| Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Oct 2023 | Paper - Code |
| Let's Think Dot by Dot: Hidden computation in transformer language models | COLM 2024 | Apr 2024 | Paper - Code |
| Disentangling memory and reasoning ability in large language models | ACL 2025 (main) | Nov 2024 | Paper - Code |
| Training Large Language Models to Reason in a Continuous Latent Space | arXiv | Dec 2024 | Paper - Code |
| Compressed chain of thought: Efficient reasoning through dense representations | arXiv | Dec 2024 | Paper |
| Multimodal Latent Language Modeling with Next-Token Diffusion | arXiv | Dec 2024 | Paper - Page |
| Efficient Reasoning with Hidden Thinking | arXiv | Jan 2025 | Paper - Code |
| Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | ICML 2025 | Feb 2025 | Paper |
| Lightthinker: Thinking step-by-step compression | arXiv | Feb 2025 | Paper - Code |
| Codi: Compressing chain-of-thought into continuous space via self-distillation | arXiv | Feb 2025 | Paper - Code |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | arXiv | May 2025 | Paper |
| Hybrid Latent Reasoning via Reinforcement Learning | arXiv | May 2025 | Paper - Code |
| Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | arXiv | Jun 2025 | Paper - Code |
| Parallel Continuous Chain-of-Thought with Jacobi Iteration | arXiv | Jun 2025 | Paper - Code |
| Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | arXiv | Jun 2025 | Paper - Code |
| SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought | arXiv | Aug 2025 | Paper |
🎯 Training Strategies for Recurrent Reasoning
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| From explicit cot to implicit cot: Learning to internalize cot step by step | arXiv | May 2024 | Paper |
| On the inductive bias of stacking towards improving reasoning | NeurIPS 2024 | Jun 2024 | Paper |
| Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | arXiv | Nov 2024 | Paper - Code |
| Training large language models to reason in a continuous latent space | arXiv | Dec 2024 | Paper |
| Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning | arXiv | Feb 2025 | Paper |
| Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | arXiv | May 2025 | Paper - Code - Project |
| Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | arXiv | May 2025 | Paper - Code |
✨ Applications and Capabilities
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Can you learn an algorithm? generalizing from easy to hard problems with recurrent networks | NeurIPS 2021 | Oct 2021 | Paper - Code |
| Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper - Code |
| Simulation of graph algorithms with looped transformers | arXiv | Feb 2024 | Paper - Code |
| Guiding Language Model Reasoning with Planning Tokens | COLM 2024 | Feb 2024 | Paper - Code |
| Can looped transformers learn to implement multi-step gradient descent for in-context learning? | arXiv | Oct 2024 | Paper |
| Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent | arXiv | Oct 2024 | Paper |
| Disentangling memory and reasoning ability in large language models | arXiv | Nov 2024 | Paper |
| LatentPrompt: Optimizing Prompts in Latent Space | arXiv | Aug 2025 | Paper |

⏳ Temporal Hidden-state Methods

πŸ“¦ Hidden-state based methods
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Gated linear attention transformers with hardware-efficient training | arXiv | Dec 2023 | Paper - Code |
| Eagle and finch: Rwkv with matrix-valued states and dynamic recurrence | arXiv | Apr 2024 | Paper - Code |
| Hgrn2: Gated linear rnns with state expansion | arXiv | Apr 2024 | Paper - Code |
| Transformers are ssms: Generalized models and efficient algorithms through structured state space duality | arXiv | May 2024 | Paper - Code |
| Parallelizing linear transformers with the delta rule over sequence length | arXiv | Jun 2024 | Paper - Code |
βš™οΈ Optimization-based State Evolution
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Learning to (learn at test time): Rnns with expressive hidden states | arXiv | Jul 2024 | Paper |
| Gated Delta Networks: Improving Mamba2 with Delta Rule | arXiv | Dec 2024 | Paper - Code |
| Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
| Lattice: Learning to efficiently compress the memory | arXiv | Apr 2025 | Paper |
| It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization | arXiv | Apr 2025 | Paper |
| Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
| Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration | arXiv | May 2025 | Paper |
| Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers | SSRN | May 2025 | Paper - Code |

🎭 Training-induced Hidden-State Conversion
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Linearizing large language models | arXiv | May 2024 | Paper |
| Transformers to ssms: Distilling quadratic knowledge to subquadratic models | NeurIPS 2024 | Jun 2024 | Paper |
| LoLCATs: On Low-Rank Linearizing of Large Language Models | ICLR 2025 | Oct 2024 | Paper |
| Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing | arXiv | Feb 2025 | Paper - Code |
| Liger: Linearizing Large Language Models to Gated Recurrent Structures | arXiv | Mar 2025 | Paper - Code |
| Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows | arXiv | Jul 2025 | Paper |

πŸ”¬ Mechanistic Interpretability


🧐 Do Layer Stacks Reflect Latent CoT?

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Towards a mechanistic interpretation of multi-step reasoning capabilities of language models | arXiv | Oct 2023 | Paper - Code |
| Iteration head: A mechanistic study of chain-of-thought | NeurIPS 2024 | Jun 2024 | Paper |
| Towards understanding how transformer perform multi-step reasoning with matching operation | arXiv | Jun 2024 | Paper |
| Do LLMs Really Think Step-by-step In Implicit Reasoning? | arXiv | Nov 2024 | Paper |
| Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
| How Do LLMs Perform Two-Hop Reasoning in Context? | arXiv | Feb 2025 | Paper |
| Reasoning with latent thoughts: On the power of looped transformers | arXiv | Feb 2025 | Paper |
| A little depth goes a long way: The expressive power of log-depth transformers | arXiv | Mar 2025 | Paper |
| Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer | arXiv | Jul 2025 | Paper - Code |

πŸ› οΈ Mechanisms of Latent CoT in Layer Representation

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting | NeurIPS 2019 | Jul 2019 | Paper |
| Transformer feed-forward layers are key-value memories | EMNLP 2021 | Dec 2020 | Paper |
| Interpretability in the wild: a circuit for indirect object identification in GPT-2 small | arXiv | Nov 2022 | Paper |
| micse: Mutual information contrastive learning for low-shot sentence embeddings | arXiv | Nov 2022 | Paper - Code |
| How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model | NeurIPS 2023 | May 2023 | Paper |
| A mechanistic interpretation of arithmetic reasoning in language models using causal mediation analysis | EMNLP 2023 | May 2023 | Paper - Code |
| Why lift so heavy? slimming large language models by cutting off the layers | arXiv | Feb 2024 | Paper |
| Do large language models latently perform multi-hop reasoning? | EACL 2024 | Feb 2024 | Paper |
| Understanding and Patching Compositional Reasoning in LLMs | ACL 2024 (Findings) | Feb 2024 | Paper - Code |
| How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning | ICLR 2024 | Feb 2024 | Paper |
| The Unreasonable Ineffectiveness of the Deeper Layers | arXiv | Mar 2024 | Paper |
| Inheritune: Training Smaller Yet More Attentive Language Models | arXiv | Apr 2024 | Paper - Code |
| Grokked transformers are implicit reasoners: A mechanistic journey to the edge of generalization | ICML 2024 | May 2024 | Paper - Code |
| Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning | NeurIPS 2024 | May 2024 | Paper - Code |
| Loss landscape geometry reveals stagewise development of transformers | Hi-DL 2024 | Jun 2024 | Paper |
| Hopping too late: Exploring the limitations of large language models on multi-hop queries | arXiv | Jun 2024 | Paper |
| Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | arXiv | Jun 2024 | Paper |
| Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons | arXiv | Aug 2024 | Paper |
| Unveiling induction heads: Provable training dynamics and feature learning in transformers | arXiv | Sep 2024 | Paper |
| Investigating layer importance in large language models | arXiv | Sep 2024 | Paper |
| Unifying and Verifying Mechanistic Interpretations: A Case Study with Group Operations | arXiv | Oct 2024 | Paper - Code |
| Understanding Layer Significance in LLM Alignment | arXiv | Oct 2024 | Paper |
| Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | ICLR 2025 | Oct 2024 | Paper - Code |
| Does representation matter? exploring intermediate layers in large language models | arXiv | Dec 2024 | Paper |
| Layer by Layer: Uncovering Hidden Representations in Language Models | ICML 2025 (oral) | Feb 2025 | Paper - Code |
| Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | arXiv | Feb 2025 | Paper - Code |
| The Curse of Depth in Large Language Models | arXiv | Feb 2025 | Paper - Code |
| Back attention: Understanding and enhancing multi-hop reasoning in large language models | arXiv | Feb 2025 | Paper |
| The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis | arXiv | Feb 2025 | Paper |
| An explainable transformer circuit for compositional generalization | arXiv | Feb 2025 | Paper |
| Emergent Abilities in Large Language Models: A Survey | arXiv | Mar 2025 | Paper |
| Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights | arXiv | May 2025 | Paper |
| Do Language Models Use Their Depth Efficiently? | arXiv | May 2025 | Paper |
| Void in Language Models | arXiv | May 2025 | Paper |
| LLMs are Single-threaded Reasoners: Demystifying the Working Mechanism of Soft Thinking | arXiv | Aug 2025 | Paper |

πŸ’» Turing Completeness of Layer-Based Latent CoT

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| On the computational power of neural nets | JCSS 1995 | – | Paper |
| Long Short-Term Memory | Neural Computation 1997 | – | Paper |
| Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation | EMNLP 2014 | Jun 2014 | Paper |
| On the turing completeness of modern neural network architectures | IJCNN 2021 | Jan 2019 | Paper |
| Recurrent memory transformer | NeurIPS 2022 | Jul 2022 | Paper |
| Looped transformers as programmable computers | ICML 2023 | Jun 2023 | Paper |
| On limitations of the transformer architecture | COLM 2024 | Nov 2023 | Paper |
| Investigating Recurrent Transformers with Dynamic Halt | arXiv | Feb 2024 | Paper |
| Chain of thought empowers transformers to solve inherently serial problems | ICLR 2024 | Feb 2024 | Paper |
| Quiet-star: Language models can teach themselves to think before speaking | arXiv | Mar 2024 | Paper |
| Ask, and it shall be given: On the Turing completeness of prompting | arXiv | Nov 2024 | Paper |
| Reinforcement Pre-Training | arXiv | Jun 2025 | Paper |
| Constant Bit-size Transformers Are Turing Complete | arXiv | Jun 2025 | Paper |
| Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought | arXiv | Jul 2025 | Paper - Code |

♾️ Towards Infinite-depth Reasoning


πŸŒ€ Spatial Infinite Reasoning: Text Diffusion Models

⬛ Masked Diffusion Models (Temporal-only)
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Structured denoising diffusion models in discrete state-spaces | NeurIPS 2021 | Jul 2021 | Paper |
| Discrete diffusion modeling by estimating the ratios of the data distribution | ICML 2024 | Jun 2024 | Paper |
| Your absorbing discrete diffusion secretly models the conditional distributions of clean data | arXiv | Jun 2024 | Paper |
| Learning Iterative Reasoning through Energy Diffusion | ICML 2024 | Jun 2024 | Paper - Project |
| Simplified and generalized masked diffusion for discrete data | NeurIPS 2024 | Jun 2024 | Paper - Project |
| Simple and effective masked diffusion language models | NeurIPS 2024 | Jun 2024 | Paper - Code |
| Scaling up Masked Diffusion Models on Text | arXiv | Oct 2024 | Paper - Project |
| MMaDa: Multimodal large diffusion language models | arXiv | May 2025 | Paper - Project |
| Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models | arXiv | Aug 2025 | Paper - Code |
⬛ Masked Diffusion Models (With Cache)
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
| Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project |
| Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning | ICLR 2025 | Feb 2025 | Paper - Project |
| dKV-Cache: The Cache for Diffusion Language Models | arXiv | May 2025 | Paper - Project |
| dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching | arXiv | May 2025 | Paper - Project |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | arXiv | May 2025 | Paper |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | arXiv | May 2025 | Paper - Project |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | arXiv | Jun 2025 | Paper - Project |
| Diffusion Beats Autoregressive in Data-Constrained Settings | arXiv | Jul 2025 | Paper - Project - Code |
πŸ”— Embedding-based Diffusion Models
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Diffusion-LM Improves Controllable Text Generation | NeurIPS 2022 | May 2022 | Paper - Project |
| Continuous diffusion for categorical data | arXiv | Dec 2022 | Paper |
| Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | ICLR 2023 | Mar 2023 | Paper - Project |
| Likelihood-Based Diffusion Language Models | NeurIPS 2023 | May 2023 | Paper - Project |
| Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | ICLR 2024 | Feb 2024 | Paper - Project |
| TESS: Text-to-Text Self-Conditioned Simplex Diffusion | EACL 2024 | Feb 2024 | Paper - Project |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | arXiv | Feb 2025 | Paper |
🧬 Hybrid AR-Diffusion Models
| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | ICLR 2025 | Oct 2024 | Paper - Project |
| Large Language Models to Diffusion Finetuning | ICML 2025 | Jan 2025 | Paper - Code |
| Dream 7B: a large diffusion language model | Blog | Apr 2025 | Paper - Code |
| Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities | Technical Report | May 2025 | Paper |
| Mercury: Ultra-Fast Language Models Based on Diffusion | arXiv | Jun 2025 | Paper - Page |

πŸ•ΈοΈ Towards an 'Infinitely Long' Optimiser Network

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| MEMORYLLM: Towards Self-Updatable Large Language Models | ICML 2024 | Feb 2024 | Paper - Code |
| Leave No Context Behind: Efficient infinite context transformers with infini-attention | arXiv | Apr 2024 | Paper - Project |
| Learning to (learn at test time): Rnns with expressive hidden states | arXiv | Jul 2024 | Paper |
| Titans: Learning to memorize at test time | arXiv | Jan 2025 | Paper |
| Atlas: Learning to optimally memorize the context at test time | arXiv | May 2025 | Paper |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | ICML 2025 | May 2025 | Paper - Code |

πŸ“Œ Implicit Fixed Point RNNs

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| Implicit Language Models are RNNs: Balancing Parallelization and Expressivity | ICML 2025 | Feb 2025 | Paper - Code |

πŸ’¬ Discussion

| Title | Venue | Date | Links |
| --- | --- | --- | --- |
| A Survey of diffusion models in natural language processing | TACL | May 2023 | Paper |
| Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis | CVPR 2025 (oral) | Dec 2024 | Paper - Code |
| Large Language Diffusion Models | ICLR 2025 Workshop | Feb 2025 | Paper - Project - Code |

πŸ‘ Acknowledgement

β™₯️ Contributors

