hiclaw-controller refactoring: WorkerBackend interface lacks Start/Stop/Update and cannot support Worker lifecycle management #556

## Problem description

The `WorkerBackend` interface defined in the refactoring design document (`docs/design/hiclaw-controller-refactor.md`, Section 3.2) has only five methods: `Create`, `Delete`, `Status`, `Exec`, and `Logs`.

However, other sections of the same document describe core features that depend on **Start/Stop/Update** capabilities, and those features cannot be implemented against the interface as currently defined.
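As a rough illustration of the gap, the following Go sketch lists the five existing methods next to the three proposed ones. The design document's actual signatures are not reproduced in this issue, so every parameter and return type here, along with the `WorkerSpec` and `WorkerStatus` types, is an assumption made for the example.

```go
package main

import (
	"context"
	"io"
)

// WorkerSpec and WorkerStatus are hypothetical placeholder types for this
// sketch; the types in the design document may look different.
type WorkerSpec struct {
	Name  string
	Image string
}

type WorkerStatus struct {
	Phase string // for example "Running" or "Sleeping"
}

// WorkerBackend shows the five methods named in Section 3.2 followed by the
// three methods this issue proposes. All signatures are assumptions.
type WorkerBackend interface {
	// Existing methods (names from Section 3.2).
	Create(ctx context.Context, spec WorkerSpec) error
	Delete(ctx context.Context, name string) error
	Status(ctx context.Context, name string) (WorkerStatus, error)
	Exec(ctx context.Context, name string, cmd []string) ([]byte, error)
	Logs(ctx context.Context, name string) (io.ReadCloser, error)

	// Proposed additions.
	Start(ctx context.Context, name string) error
	Stop(ctx context.Context, name string) error
	Update(ctx context.Context, spec WorkerSpec) error
}
```

Keeping all eight methods in one interface means each Backend states its lifecycle support explicitly at compile time, rather than through optional type assertions.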

## Missing capabilities and the features that depend on them

### Missing Start/Stop

Sections 5.2-5.3 of the design document define the Worker sleep/wake lifecycle:

```
Running ──idle timeout──> Sleeping ──wake──> Running
```

The Team Leader starts and stops Worker containers through `hiclaw worker wake` and `hiclaw worker sleep`. The CLI command table in Section 5.3 spells out the mapping:

| Operation | CLI command | Controller behavior |
|------|---------|----------------|
| Wake | `hiclaw worker wake --name W` | `Backend.Start(W)` |
| Sleep | `hiclaw worker sleep --name W` | `Backend.Stop(W)` |

However, the `WorkerBackend` interface has no `Start` or `Stop` method.

docker-proxy (`docker-proxy/security.go`) already supports container start/stop/restart operations, so the capability exists in the current system. It has just not been abstracted into the interface layer.

### Missing Update

Section 8.4 of the design document describes the rolling upgrade flow for the Runtime engine:

> WorkerReconciler detects an image change → creates a new version of the Worker instance → waits for the new instance to become ready → deletes the old instance

This "create new, then delete old" pattern is reasonable in Docker mode. In K8s mode, however, the more natural approach is to patch the Pod spec directly (an in-place upgrade). Without an `Update` method, K8sBackend cannot take advantage of K8s's native rolling update capability and is forced down the same delete-and-recreate path.
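To make the difference concrete, here is a small sketch in which the reconciler calls a single `Update` method and each backend chooses its own strategy. All names (`instance`, `replaceBackend`, `inPlaceBackend`, `reconcileImage`) and the `Update(name, image)` signature are hypothetical, and the in-memory maps only model the observable effect: a replaced instance gets a new ID, while an in-place update keeps it.

```go
package main

import "fmt"

// instance is a hypothetical record of one Worker instance.
type instance struct {
	ID    int // changes only when the instance is recreated
	Image string
}

// updater is the one method the reconciler needs in this sketch.
type updater interface {
	Update(name, image string) error
}

// replaceBackend models the Section 8.4 flow: create new, then delete old.
type replaceBackend struct {
	nextID  int
	workers map[string]instance
}

func (b *replaceBackend) Update(name, image string) error {
	if _, ok := b.workers[name]; !ok {
		return fmt.Errorf("worker %q not found", name)
	}
	b.nextID++
	// Create the new instance. A real backend would wait for readiness here.
	fresh := instance{ID: b.nextID, Image: image}
	// Overwriting the entry stands in for deleting the old instance.
	b.workers[name] = fresh
	return nil
}

// inPlaceBackend models patching the spec of the existing instance.
type inPlaceBackend struct {
	workers map[string]instance
}

func (b *inPlaceBackend) Update(name, image string) error {
	w, ok := b.workers[name]
	if !ok {
		return fmt.Errorf("worker %q not found", name)
	}
	w.Image = image // the instance ID is preserved
	b.workers[name] = w
	return nil
}

// reconcileImage calls Update only when the desired image differs.
func reconcileImage(b updater, name, current, desired string) error {
	if current == desired {
		return nil
	}
	return b.Update(name, desired)
}
```

With this shape the reconciler does not need to know which strategy a backend uses; the choice between recreating and patching stays inside the backend.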

## Why this is an interface design issue rather than an implementation issue

The `WorkerBackend` interface is defined in Phase 1, and Phase 2 (K8sBackend), Phase 3 (Team Leader lifecycle management), and Phase 4 (rolling upgrade) all depend on it. If the Phase 1 interface is incomplete, adding methods in later phases will mean:

- Every Backend already implemented (DockerBackend, K8sBackend) has to be extended with the new methods
- The state machine logic in the Reconciler has to be rewritten (from two actions, Create/Delete, to five: Create/Start/Stop/Update/Delete)
- The Worker Phase state machine (Pending → Running → Sleeping → Updating → Running) has to be redesigned

The interface is the backbone of the architecture and needs to be designed completely, once, in Phase 1.
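One way to picture the five-action state machine mentioned above is as a transition table. The sketch below is an illustration under stated assumptions, not the design document's definition: the exact set of allowed transitions, the `Ready` event that ends an update, and the terminal `Deleted` phase are all hypothetical.

```go
package main

import "fmt"

type Phase string
type Action string

const (
	Pending  Phase = "Pending"
	Running  Phase = "Running"
	Sleeping Phase = "Sleeping"
	Updating Phase = "Updating"
	Deleted  Phase = "Deleted" // hypothetical terminal phase
)

const (
	Create Action = "Create"
	Start  Action = "Start"
	Stop   Action = "Stop"
	Update Action = "Update"
	Delete Action = "Delete"
	Ready  Action = "Ready" // hypothetical event: the updated worker is ready
)

// transitions lists which action is valid in which phase (assumed).
var transitions = map[Phase]map[Action]Phase{
	Pending:  {Create: Running},
	Running:  {Stop: Sleeping, Update: Updating},
	Sleeping: {Start: Running},
	Updating: {Ready: Running},
}

// next returns the phase that follows p when action a is applied.
// Delete is accepted from any phase except Deleted itself.
func next(p Phase, a Action) (Phase, error) {
	if a == Delete && p != Deleted {
		return Deleted, nil
	}
	if to, ok := transitions[p][a]; ok {
		return to, nil
	}
	return p, fmt.Errorf("action %s is not valid in phase %s", a, p)
}
```

A table like this makes the cost of a late interface change visible: with only Create and Delete, the map has one useful row, so adding Start, Stop, and Update afterwards reshapes the whole table rather than extending it.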

## Current code location

- Refactoring design: `docs/design/hiclaw-controller-refactor.md` Section 3.2 (`WorkerBackend` interface definition)
- Depends on Start/Stop: Sections 5.2-5.3 (Team Leader Worker lifecycle management)
- Depends on Update: Section 8.4 (Runtime engine upgrade)
- Existing start/stop capability: `docker-proxy/security.go` (container start/stop/restart already supported)

## Related

- Refactoring design proposal #551
- Reconciler idempotency issue #555
