HITL Mechanisms within ReAct-based AI Agents #743
LearningGp started this conversation in General
Is the AgentScope in the table the Python version or the Java version of AgentScope?
About HITL
As LLM-driven agents evolve from simple QA tools into autonomous systems capable of complex reasoning, tool invocation, and long-term planning, "Human-in-the-Loop" (HITL) has ascended from a mere debugging aid to a core architectural element of production-grade systems. In high-stakes domains such as financial trading, enterprise decision-making, and code generation, complete autonomy often carries uncontrollable risks of hallucination and cascading errors. Consequently, building efficient, reliable intervention mechanisms aligned with human intent, without compromising Agent autonomy, has become a central question in current Agent framework R&D.
HITL In AI Agents (ReAct)
HITL under the ReAct paradigm differs significantly from HITL in ChatClients or Workflows. In ChatClients, HITL is primarily dialogue-driven and initiated by humans (commands); in Workflows, there are explicit HITL nodes. Under the ReAct paradigm, however, Agents possess autonomous action capabilities: purely dialogue-based intervention is insufficient, while adding an explicit HITL node at every step overly restricts the Agent's autonomy. This document therefore provides a preliminary survey of HITL practices and theory within the ReAct paradigm.
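The contrast above can be made concrete with a toy sketch. Below is a minimal ReAct-style action loop in which a pre-tool hook is the seam where system-initiated HITL attaches: unlike a Workflow, there is no fixed HITL node, so the hook fires on every autonomous action the model decides to take. All names here are illustrative assumptions, not drawn from any framework mentioned in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class ReActAgent:
    """Toy ReAct action loop; `pre_tool_hook` is the system-initiated HITL seam."""
    tools: dict[str, Callable[..., str]]
    # Hook returns False to veto the call; in a real system it would
    # surface an approval prompt to a human instead of a lambda.
    pre_tool_hook: Optional[Callable[[ToolCall], bool]] = None
    trace: list[str] = field(default_factory=list)

    def step(self, call: ToolCall) -> str:
        # The hook intercepts every autonomous action, not a fixed node.
        if self.pre_tool_hook and not self.pre_tool_hook(call):
            result = f"[vetoed by human] {call.name}"
        else:
            result = self.tools[call.name](**call.args)
        self.trace.append(result)
        return result

# Usage: approve reads, veto destructive operations.
agent = ReActAgent(
    tools={"read": lambda path: f"contents of {path}",
           "delete": lambda path: f"deleted {path}"},
    pre_tool_hook=lambda call: call.name != "delete",  # stand-in for a human prompt
)
print(agent.step(ToolCall("read", {"path": "a.txt"})))    # contents of a.txt
print(agent.step(ToolCall("delete", {"path": "a.txt"})))  # [vetoed by human] delete
```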
HITL Classification
Based on a review of related framework practices and industry implementations, current HITL mechanisms can be grouped into five types from a business perspective, and by initiator into "Model-Initiated" and "System-Initiated" modes.
Detailed Introduction
- Human as Reviewer
  - Planning/Intent Confirmation
  - Critical Operation Confirmation (ref: hitl-chat)
- Human as Collaborator
  - Information Acquisition/Ambiguity Resolution: via a `UserAgent` that acts as a special sub-agent to interact with the Agent.
  - Manual Collaboration
    - Model Initiated: via a `UserAgent` that acts as a special sub-agent to interact with the Agent.
    - System Initiated: via a `UserAgent` as an Agent node in the orchestration.
- Human as Commander
  - User Real-time Intervention
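The model-initiated collaboration case (Information Acquisition/Ambiguity Resolution) is often implemented by exposing the human as an ordinary tool, so the model itself decides when to initiate HITL, in contrast to a system-wired `UserAgent` node. A minimal sketch follows; the function and queue names are assumptions for illustration, not any framework's API.

```python
import queue
import threading

# The human's answers arrive asynchronously; a real system would use a
# UI or messaging channel instead of an in-process queue.
human_inbox: "queue.Queue[str]" = queue.Queue()

def ask_human(question: str) -> str:
    """Tool the model can call when it hits ambiguity.

    Blocks the agent's reasoning loop until a human answers.
    """
    print(f"AGENT ASKS: {question}")
    return human_inbox.get(timeout=5)

# Simulate a human answering from another thread while the agent waits.
threading.Thread(
    target=lambda: human_inbox.put("use the prod database"),
    daemon=True,
).start()

answer = ask_human("Which database did you mean: prod or staging?")
print(answer)  # use the prod database
```

Because `ask_human` is registered like any other tool, the decision to involve a human lives in the model's reasoning rather than in the orchestration graph, which is exactly the model-initiated/system-initiated distinction drawn above.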
Framework Support Status
Note: AI-generated, currently under manual verification.
S-1: Mechanism support via hooks + interrupt API (ref: hitl-chat).
M-2: Not supported.
M-3: Partial support via UserAgent.
S-2: Mechanism support via interrupt API.
- LangGraph: `interrupt_before`. Allows freezing threads and modifying State ("Time Travel") at any time, perfectly matching the "trajectory correction" requirement.
- LlamaIndex: `InputRequiredEvent` to pause workflows; supports global Context control.
- AutoGen: `UserProxyAgent`'s `human_input_mode` (ALWAYS/TERMINATE). Great for conversational intervention (Class M), but less intuitive than graph frameworks for granular tool interception (S-1).
- A dedicated `Human In The Loop` node and tool approval toggle (S-1). Logic is linear; difficult to implement complex "Plan-Modify-Retry" loops.
- CrewAI: `human_input=True` forces the Agent to ask the human before submitting task results (M-1). Lacks low-level interception capability for individual Tool calls.
- LangChain: M via `HumanInputRun`; S-1 via `HumanApprovalCallback`. Lacks graph looping capabilities; S-2 is extremely difficult to implement.
- Semantic Kernel: `AutoFunctionInvocationFilter` perfectly implements pre-execution approval for tools (S-1). Model-initiated interaction requires manual Prompt design.
- PydanticAI: `ResultValidator` checks final results.
- A `y/n` confirmation loop. Defaults to intercepting file IO/Shell execution (S-1). Users can correct direction via Feedback after each inference step (S-2).
- MetaGPT: `HumanProvider` or Code Review stages. Processes are rigid; difficult to intervene in real-time at arbitrary moments.
- TBD
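The S-2 pattern of freezing execution, letting a human edit state, and resuming (which the graph-style frameworks above support natively via interrupt APIs and checkpoints) can be sketched generically. This is not a real framework API; every class and attribute name below is a hypothetical stand-in.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class AgentState:
    plan: list[str]                       # remaining steps
    done: list[str] = field(default_factory=list)  # executed steps

class InterruptibleRunner:
    """Checkpoint before each step so a human can rewind ("time travel")."""

    def __init__(self, state: AgentState):
        self.state = state
        self.checkpoints: list[AgentState] = []

    def run(self, max_steps: int = 10) -> AgentState:
        for _ in range(max_steps):
            if not self.state.plan:
                break
            # Snapshot state before acting, enabling later rewind.
            self.checkpoints.append(copy.deepcopy(self.state))
            self.state.done.append(self.state.plan.pop(0))
        return self.state

    def rewind(self, to: int) -> None:
        """Restore an earlier checkpoint so a human can correct the trajectory."""
        self.state = copy.deepcopy(self.checkpoints[to])

runner = InterruptibleRunner(AgentState(plan=["search", "summarize", "send_email"]))
runner.run(max_steps=2)                      # executes "search", then "summarize"
runner.rewind(1)                             # back to just before "summarize"
runner.state.plan[0] = "summarize_briefly"   # human edits the plan in place
final = runner.run()
print(final.done)  # ['search', 'summarize_briefly', 'send_email']
```

The checkpoint list is what makes trajectory correction cheap: without per-step snapshots, a human can only veto the next action, not undo a wrong turn.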
Is the HITL classification reasonable? Are there missing scenarios or industry implementations?
Does every scenario require a corresponding example, or is relevant documentation sufficient?
Is it necessary to implement built-in information inquiry tools and human intervention confirmation tools, or should they only be presented as examples?
REF
https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems
https://www.elastic.co/search-labs/blog/human-in-the-loop-hitllanggraph-elasticsearch
https://www.reddit.com/r/n8n/comments/1nt8jyj/the_future_of_ai_agents_humanintheloop_is_the/
https://aws.amazon.com/cn/blogs/machine-learning/implement-human-in-the-loop-confirmation-with-amazon-bedrock-agents/
https://gemini.google.com/share/8626f1a62770
https://arxiv.org/html/2505.00753
https://arxiv.org/html/2507.22358v1
https://github.com/bytedance/deer-flow