Problem description
The WorkerBackend interface defined in the refactoring design document (docs/design/hiclaw-controller-refactor.md Section 3.2) has only five methods: Create, Delete, Status, Exec, and Logs.
However, other sections of the same document describe core features that depend on Start, Stop, and Update capabilities, and those features cannot be implemented against the current interface definition.
Missing capabilities and the features that depend on them
Missing Start/Stop
Sections 5.2-5.3 of the design document define the Worker sleep/wake lifecycle:
Running ──idle timeout──> Sleeping ──wake──> Running
The Team Leader starts and stops Worker containers through hiclaw worker wake and hiclaw worker sleep. The CLI command table in Section 5.3 lists this explicitly:
| Operation | CLI command | Controller behavior |
| --- | --- | --- |
| Wake | hiclaw worker wake --name W | Backend.Start(W) |
| Sleep | hiclaw worker sleep --name W | Backend.Stop(W) |
However, the WorkerBackend interface has neither a Start nor a Stop method.
docker-proxy (docker-proxy/security.go) already supports container start/stop/restart operations, so the capability exists in the current system; it just has not been abstracted into the interface layer.
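As a rough illustration of the gap, the sketch below adds Start and Stop to a trimmed-down WorkerBackend and wires them to the wake/sleep operations from the Section 5.3 table. Only the method names come from this issue; every signature, the in-memory memBackend, the Wake/Sleep helpers, and the placeholder image name are assumptions made for the example, and Exec and Logs are left out for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// WorkerBackend with the proposed Start and Stop methods added.
// Method names come from the issue; all signatures are assumptions.
// Exec and Logs are left out to keep the sketch short.
type WorkerBackend interface {
	Create(ctx context.Context, name, image string) error
	Delete(ctx context.Context, name string) error
	Status(ctx context.Context, name string) (string, error)
	Start(ctx context.Context, name string) error // would back `hiclaw worker wake`
	Stop(ctx context.Context, name string) error  // would back `hiclaw worker sleep`
}

var errNotFound = errors.New("worker not found")

// memBackend is a hypothetical in-memory backend; the map value records
// whether the worker's container is currently running.
type memBackend struct{ running map[string]bool }

// Compile-time check that memBackend satisfies the sketched interface.
var _ WorkerBackend = (*memBackend)(nil)

func newMemBackend() *memBackend { return &memBackend{running: map[string]bool{}} }

func (b *memBackend) Create(_ context.Context, name, _ string) error {
	b.running[name] = true
	return nil
}

func (b *memBackend) Delete(_ context.Context, name string) error {
	delete(b.running, name)
	return nil
}

func (b *memBackend) Status(_ context.Context, name string) (string, error) {
	up, ok := b.running[name]
	switch {
	case !ok:
		return "", errNotFound
	case up:
		return "Running", nil
	default:
		return "Sleeping", nil
	}
}

// setRunning flips the running flag of an existing worker.
func (b *memBackend) setRunning(name string, up bool) error {
	if _, ok := b.running[name]; !ok {
		return errNotFound
	}
	b.running[name] = up
	return nil
}

func (b *memBackend) Start(_ context.Context, name string) error { return b.setRunning(name, true) }
func (b *memBackend) Stop(_ context.Context, name string) error  { return b.setRunning(name, false) }

// Wake and Sleep mirror the "Controller behavior" column of the CLI table.
func Wake(b WorkerBackend, name string) error  { return b.Start(context.Background(), name) }
func Sleep(b WorkerBackend, name string) error { return b.Stop(context.Background(), name) }

func main() {
	ctx := context.Background()
	b := newMemBackend()
	_ = b.Create(ctx, "W", "example/worker:1") // image name is a placeholder
	_ = Sleep(b, "W")
	s, _ := b.Status(ctx, "W")
	fmt.Println(s)
	_ = Wake(b, "W")
	s, _ = b.Status(ctx, "W")
	fmt.Println(s)
}
```

With Start and Stop on the interface, the CLI handlers need no knowledge of which backend is in use; a Docker-based implementation could delegate to the start/stop operations docker-proxy already exposes.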
Missing Update
Section 8.4 of the design document describes the rolling upgrade process for the Runtime engine:
WorkerReconciler detects the image change → creates a new version of the Worker instance → waits for the new instance to be ready → deletes the old instance
This "create new + delete old" model is reasonable in Docker mode, but in K8s mode the more natural approach is to patch the Pod spec directly (an in-place upgrade). Without an Update method, K8sBackend cannot use K8s's native rolling update capability and is forced onto the "delete the old and create the new" path.
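To make the difference concrete, here is a minimal sketch in which a single Update method lets each backend choose its own strategy. The Update signature, the dockerLike and k8sLike fakes, the Upgrade helper, and the image names are all hypothetical; the fakes only record which operations they would perform and do not talk to Docker or Kubernetes.

```go
package main

import (
	"context"
	"fmt"
)

// WorkerBackend shows only the proposed Update method; its signature is an
// assumption, and Create, Delete, Status, Exec and Logs are omitted.
type WorkerBackend interface {
	Update(ctx context.Context, name, image string) error
}

// fakeState is shared bookkeeping for the two hypothetical backends below:
// the image each worker runs and a log of the operations performed.
type fakeState struct {
	images map[string]string
	ops    []string
}

type dockerLike struct{ fakeState }
type k8sLike struct{ fakeState }

// Update on the Docker-style backend follows the Section 8.4 flow:
// create the new instance, then delete the old one (readiness wait elided).
func (d *dockerLike) Update(_ context.Context, name, image string) error {
	old, ok := d.images[name]
	if !ok {
		return fmt.Errorf("worker %q not found", name)
	}
	d.ops = append(d.ops, "create "+name+"@"+image, "delete "+name+"@"+old)
	d.images[name] = image
	return nil
}

// Update on the K8s-style backend records a single in-place patch.
func (k *k8sLike) Update(_ context.Context, name, image string) error {
	if _, ok := k.images[name]; !ok {
		return fmt.Errorf("worker %q not found", name)
	}
	k.ops = append(k.ops, "patch "+name+" image="+image)
	k.images[name] = image
	return nil
}

// Upgrade is what a reconciler could call once it detects an image change.
func Upgrade(b WorkerBackend, name, image string) error {
	return b.Update(context.Background(), name, image)
}

// seed returns state with one worker "W" on a placeholder image.
func seed() fakeState {
	return fakeState{images: map[string]string{"W": "example/worker:1"}}
}

func main() {
	d := &dockerLike{seed()}
	k := &k8sLike{seed()}
	for _, b := range []WorkerBackend{d, k} {
		if err := Upgrade(b, "W", "example/worker:2"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(d.ops)
	fmt.Println(k.ops)
}
```

The caller issues the same Upgrade call in both cases; whether that becomes two operations or one patch is decided inside the backend, which is the flexibility the current five-method interface does not offer.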
Why this is an interface design issue rather than an implementation issue
The WorkerBackend interface is defined in Phase 1, and Phase 2 (K8sBackend), Phase 3 (Team Leader lifecycle management), and Phase 4 (rolling upgrade) all depend on it. If the interface defined in Phase 1 is incomplete, adding methods in a later phase means:
- Every Backend already implemented (DockerBackend, K8sBackend) has to be extended with the new methods
- The state machine logic in the Reconciler has to be rewritten (from two actions, Create/Delete, to five: Create/Start/Stop/Update/Delete)
- The Worker Phase state machine (Pending → Running → Sleeping → Updating → Running) has to be redesigned
The interface is the backbone of the architecture, so it needs to be designed completely in Phase 1.
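The size of the Reconciler change can be seen in a small transition table. The sketch below is one possible reading of the phases and actions named above, not the design document's state machine: the mapping from actions to phase changes, the Ready event, and the Deleted terminal phase are all assumptions introduced for illustration.

```go
package main

import "fmt"

type Phase string
type Action string

const (
	Pending  Phase = "Pending"
	Running  Phase = "Running"
	Sleeping Phase = "Sleeping"
	Updating Phase = "Updating"
	Deleted  Phase = "Deleted" // hypothetical terminal phase, not named in the issue
)

const (
	Create Action = "Create"
	Start  Action = "Start"
	Stop   Action = "Stop"
	Update Action = "Update"
	Delete Action = "Delete"
	Ready  Action = "Ready" // hypothetical readiness event, not a Backend call
)

// transitions lists the allowed (phase, action) pairs.
// The mapping is an assumption made for this sketch.
var transitions = map[Phase]map[Action]Phase{
	Pending:  {Create: Running},
	Running:  {Stop: Sleeping, Update: Updating},
	Sleeping: {Start: Running},
	Updating: {Ready: Running},
}

// Next returns the phase that follows p when action a is applied,
// or an error if the action is not allowed in that phase.
func Next(p Phase, a Action) (Phase, error) {
	if a == Delete && p != Deleted {
		return Deleted, nil // Delete is accepted from any live phase
	}
	if next, ok := transitions[p][a]; ok {
		return next, nil
	}
	return p, fmt.Errorf("action %s is not valid in phase %s", a, p)
}

func main() {
	p := Pending
	for _, a := range []Action{Create, Stop, Start, Update, Ready} {
		next, err := Next(p, a)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", p, a, next)
		p = next
	}
}
```

With only Create and Delete, the table collapses to a single row; each action added later introduces new rows and new error cases, which is why retrofitting them touches the whole Reconciler rather than one function.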
Current code location
- Refactoring design: docs/design/hiclaw-controller-refactor.md Section 3.2 (WorkerBackend interface definition)
- Depends on Start/Stop: Section 5.2-5.3 (Team Leader Worker lifecycle management)
- Depends on Update: Section 8.4 (Runtime engine upgrade)
- Existing start/stop capability: docker-proxy/security.go (container start/stop/restart is already supported)
Related