117 changes: 117 additions & 0 deletions docs/sphinx_doc/source/getting-started/embedded-mode.md
# Embedded Mode

## Background

TuFT is designed to serve as a **transparent compute service layer** for RL training frameworks like Trinity and veRL. In production, TuFT typically runs as a standalone daemon (`tuft launch`), and users must:

1. Write a `tuft_config.yaml` configuration file
2. Manually start the server with `tuft launch --config ...`
3. Set the `TINKER_BASE_URL` environment variable for clients to connect

This manual setup creates friction, especially for:
- **RL framework users** who just want to run training scripts without learning TuFT internals
- **Development/debugging** workflows where quick iteration is key
- **CI pipelines** that need reproducible, self-contained environments

**Embedded mode** solves this by providing a `tuft.init()` API — similar to `ray.init()` — that handles service discovery, configuration generation, startup, and connection automatically.

## Two Modes of Operation

| | Daemon Mode | Embedded Mode |
|---|---|---|
| How to start | `tuft launch --config ...` | `tuft.init(model=...)` |
| Lifecycle | Independent process, manually managed | Follows main process, auto-cleanup via `atexit` |
| Best for | Production deployments, multi-user shared clusters | Dev/debug, training scripts, CI |
| Service discovery | User sets `TINKER_BASE_URL` manually | Automatic (env var → address file → process scan → default port) |

**Both modes coexist**: `tuft.init()` first tries to discover an existing daemon. Only when no running service is found does it start an embedded instance.

## Quick Start

```python
import tuft

# Initialize TuFT — auto-discovers existing service or starts one
tuft.init(model="/path/to/Qwen2.5-0.5B-Instruct")

# Use the service client for training
training_client = tuft.create_training_client(
    base_model="Qwen2.5-0.5B-Instruct",
    rank=8,
)
# ... your training loop ...

# Optional: explicit shutdown (atexit handles this automatically)
tuft.shutdown()
```

### Other `init()` patterns

```python
# Connect to a specific running server
tuft.init(address="http://gpu-cluster:10610")

# Use an existing config file
tuft.init(config="/path/to/tuft_config.yaml")

# No arguments — relies on env vars or default config file
tuft.init()

# Get a service client (auto-inits if not already done)
service_client = tuft.get_service_client()
```

## Service Discovery Priority

When `tuft.init()` is called, it tries to find an existing service in this order:

1. `address=...` argument passed to `init()`
2. `TUFT_ADDRESS` environment variable
3. Address file at `~/.tuft/tuft_current_server`
4. Process scan (looks for running `tuft launch` or `uvicorn` processes)
5. Default port probe: `http://127.0.0.1:10610`
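The discovery chain above can be sketched as a simple fallback loop. This is an illustrative sketch only, not TuFT internals: `candidate_addresses`, `discover`, and the `is_alive` probe are hypothetical names introduced for this example.

```python
import os
from pathlib import Path

DEFAULT_ADDRESS = "http://127.0.0.1:10610"

def candidate_addresses(address=None):
    """Yield possible service addresses in the documented priority order."""
    if address:                                   # 1. explicit argument
        yield address
    env_addr = os.environ.get("TUFT_ADDRESS")
    if env_addr:                                  # 2. environment variable
        yield env_addr
    addr_file = Path.home() / ".tuft" / "tuft_current_server"
    if addr_file.exists():                        # 3. address file
        yield addr_file.read_text().strip()
    # 4. a process scan would go here (omitted in this sketch)
    yield DEFAULT_ADDRESS                         # 5. default port probe

def discover(address=None, is_alive=lambda addr: False):
    """Return the first candidate that responds to a health probe, or None."""
    for addr in candidate_addresses(address):
        if is_alive(addr):
            return addr
    return None
```

Only when `discover()` comes back empty does embedded mode fall through to starting a new server.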

If no service is found, embedded mode starts a new one using configuration from:

1. `config=...` argument passed to `init()`
2. `TUFT_CONFIG` environment variable
3. `model=...` argument → auto-generates minimal config
4. `TUFT_MODEL_PATH` environment variable → auto-generates minimal config
5. Default config file: `~/.tuft/configs/tuft_config.yaml`
6. None available → raises `RuntimeError` with helpful guidance
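The configuration fallback can be sketched the same way. Again this is illustrative: `resolve_config` and its `(kind, value)` return shape are not part of the TuFT API.

```python
import os
from pathlib import Path

def resolve_config(config=None, model=None):
    """Pick a config source in the documented priority order (sketch)."""
    if config:                                              # 1. explicit argument
        return ("file", config)
    if os.environ.get("TUFT_CONFIG"):                       # 2. env var
        return ("file", os.environ["TUFT_CONFIG"])
    if model:                                               # 3. model= argument
        return ("generated", model)
    if os.environ.get("TUFT_MODEL_PATH"):                   # 4. env var
        return ("generated", os.environ["TUFT_MODEL_PATH"])
    default = Path.home() / ".tuft" / "configs" / "tuft_config.yaml"
    if default.exists():                                    # 5. default file
        return ("file", str(default))
    raise RuntimeError(                                     # 6. nothing available
        "No TuFT configuration found: pass config= or model=, "
        "or set TUFT_CONFIG / TUFT_MODEL_PATH."
    )
```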

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TUFT_ADDRESS` | Address of running TuFT service | — |
| `TUFT_API_KEY` | API authentication key | Auto-generated |
| `TUFT_CONFIG` | Path to configuration file | — |
| `TUFT_MODEL_PATH` | Model path for auto-config generation | — |
| `TUFT_ENABLE_AUTO_CONNECT` | Enable auto-connect in `get_service_client()` | `"1"` |
| `TUFT_HOME` | TuFT home directory | `~/.tuft` |
| `TUFT_HOST` | Server bind address | `127.0.0.1` |
| `TUFT_PORT` | Server bind port | `10610` |
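Assuming the host and port variables are read as plain strings with the defaults listed above, the bind-address resolution amounts to the following sketch (`server_bind` is a hypothetical helper, not a real TuFT function):

```python
def server_bind(env):
    """Resolve the server bind address from an environment mapping."""
    host = env.get("TUFT_HOST", "127.0.0.1")   # default: loopback only
    port = int(env.get("TUFT_PORT", "10610"))  # default: 10610
    return host, port

# With no overrides the embedded server binds to the documented defaults;
# exporting TUFT_PORT before the first tuft.init() call moves it.
```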

## Lifecycle

- **Embedded services** are tied to the main process. When the Python process exits (normally or via signal), the embedded TuFT server is automatically terminated via `atexit`.
- **Daemon services** (`tuft launch`) are independent and persist until manually stopped.
- `tuft.shutdown()` can be called explicitly to stop an embedded service early.
- `tuft.init()` is **idempotent** — calling it multiple times is safe (no-op after first success).
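One way the lifecycle above could be implemented is a child process plus an `atexit` hook. The sketch below is illustrative under that assumption: the `EmbeddedServer` class, module-level `init`, and the placeholder sleep command are not TuFT internals.

```python
import atexit
import subprocess
import sys

class EmbeddedServer:
    """Minimal sketch of an embedded server tied to the main process."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(cmd)
        atexit.register(self.shutdown)   # auto-cleanup when the process exits

    def shutdown(self):
        if self.proc.poll() is None:     # still running: stop it
            self.proc.terminate()
            self.proc.wait(timeout=10)

_server = None  # module-level singleton makes init() idempotent

def init(cmd=(sys.executable, "-c", "import time; time.sleep(60)")):
    global _server
    if _server is None:                  # no-op after the first success
        _server = EmbeddedServer(list(cmd))
    return _server
```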

## Integration with RL Frameworks

For framework integrations (e.g., Trinity), the pattern is:

```python
import tuft

# In your framework's initialization code:
tuft.init(model=model_path, ignore_reinit_error=True)
service_client = tuft.get_service_client()

# Use service_client as before...
```

This requires no changes to the user's workflow — the framework handles TuFT setup transparently.
9 changes: 9 additions & 0 deletions docs/sphinx_doc/source/getting-started/index.md
Install TuFT from source, PyPI, or Docker.

Run your first training and sampling example with TuFT.
:::

:::{grid-item-card} Embedded Mode
:link: embedded-mode
:link-type: doc
:shadow: none

Use `tuft.init()` for automatic service discovery and startup.
:::
```

```{toctree}

installation
quickstart
embedded-mode
```
117 changes: 117 additions & 0 deletions docs/sphinx_doc/source_zh/getting-started/embedded-mode.md
# Embedded Mode

## Background

TuFT is designed as a **transparent compute service layer** for RL training frameworks such as Trinity. In production, TuFT usually runs as a standalone daemon (`tuft launch`), and users must:

1. Write a `tuft_config.yaml` configuration file
2. Manually start the service with `tuft launch --config ...`
3. Set the `TINKER_BASE_URL` environment variable for clients to connect

This manual setup adds friction, especially for:
- **RL framework users** who just want to run training scripts without learning TuFT installation and configuration
- **Development and debugging** workflows that need fast iteration
- **CI pipelines** that need reproducible, self-contained environments

**Embedded mode** solves this by providing a `tuft.init()` API, similar to `ray.init()`, that handles service discovery, configuration generation, startup, and connection automatically.

## Two Modes of Operation

| | Daemon Mode | Embedded Mode |
|---|---|---|
| How to start | `tuft launch --config ...` | `tuft.init(model=...)` |
| Lifecycle | Independent process, manually managed | Follows the main process, auto-cleanup via `atexit` |
| Best for | Production deployments, multi-user shared clusters | Dev/debug, training scripts, CI |
| Service discovery | User sets `TINKER_BASE_URL` manually | Automatic (env var → address file → process scan → default port) |

**The two modes coexist**: `tuft.init()` first tries to discover an existing daemon service. Only when no running service is found does it start an embedded instance.

## Quick Start

```python
import tuft

# Initialize TuFT: auto-discovers an existing service or starts a new one
tuft.init(model="/path/to/Qwen2.5-0.5B-Instruct")

# Use the service client for training
training_client = tuft.create_training_client(
    base_model="Qwen2.5-0.5B-Instruct",
    rank=8,
)
# ... your training loop ...

# Optional: explicit shutdown (atexit handles this automatically)
tuft.shutdown()
```

### Other `init()` patterns

```python
# Connect to a specific running server
tuft.init(address="http://gpu-cluster:10610")

# Use an existing config file
tuft.init(config="/path/to/tuft_config.yaml")

# No arguments: rely on env vars or the default config file
tuft.init()

# Get a service client (auto-inits if not already initialized)
service_client = tuft.get_service_client()
```

## Service Discovery Priority

When `tuft.init()` is called, it tries to find an existing service in this order:

1. `address=...` argument passed explicitly
2. `TUFT_ADDRESS` environment variable
3. Address file `~/.tuft/tuft_current_server`
4. Process scan (looks for running `tuft launch` or `uvicorn` processes)
5. Default port probe: `http://127.0.0.1:10610`

If no service is found, embedded mode resolves a configuration and starts one, in this priority order:

1. `config=...` argument passed explicitly
2. `TUFT_CONFIG` environment variable
3. `model=...` argument → auto-generates a minimal config
4. `TUFT_MODEL_PATH` environment variable → auto-generates a minimal config
5. Default config file: `~/.tuft/configs/tuft_config.yaml`
6. None available → raises `RuntimeError` with guidance

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TUFT_ADDRESS` | Address of a running TuFT service | — |
| `TUFT_API_KEY` | API authentication key | Auto-generated |
| `TUFT_CONFIG` | Path to the configuration file | — |
| `TUFT_MODEL_PATH` | Model path (for auto-generated config) | — |
| `TUFT_ENABLE_AUTO_CONNECT` | Enable auto-connect in `get_service_client()` | `"1"` |
| `TUFT_HOME` | TuFT home directory | `~/.tuft` |
| `TUFT_HOST` | Server bind address | `127.0.0.1` |
| `TUFT_PORT` | Server bind port | `10610` |
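Assuming `get_service_client()` checks the auto-connect flag as the literal string `"1"` (the documented default), the gate can be sketched as follows; `auto_connect_enabled` is a hypothetical helper, not part of the TuFT API.

```python
def auto_connect_enabled(env):
    """Sketch: auto-connect is on unless TUFT_ENABLE_AUTO_CONNECT is changed."""
    return env.get("TUFT_ENABLE_AUTO_CONNECT", "1") == "1"

# Export TUFT_ENABLE_AUTO_CONNECT=0 before starting your script to require
# an explicit tuft.init() call instead of implicit auto-connection.
```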

## Lifecycle

- **Embedded services** are bound to the main process. When the Python process exits (normally or via a signal), the embedded TuFT service is automatically terminated via `atexit`.
- **Daemon services** (`tuft launch`) run independently and persist until manually stopped.
- `tuft.shutdown()` can be called explicitly to stop an embedded service early.
- `tuft.init()` is **idempotent**: calling it multiple times is safe (a no-op after the first success).

## Integration with RL Frameworks

The pattern for framework integrations (e.g., Trinity) is:

```python
import tuft

# In your framework's initialization code:
tuft.init(model=model_path, ignore_reinit_error=True)
service_client = tuft.get_service_client()

# Use service_client as before...
```

This requires no change to the user's workflow: the framework handles TuFT configuration and startup transparently.
9 changes: 9 additions & 0 deletions docs/sphinx_doc/source_zh/getting-started/index.md

Run your first training and inference example with TuFT.
:::

:::{grid-item-card} Embedded Mode
:link: embedded-mode
:link-type: doc
:shadow: none

Use `tuft.init()` for automatic service discovery and startup.
:::
```

```{toctree}

installation
quickstart
embedded-mode
```
2 changes: 2 additions & 0 deletions examples/chat_sft/train.py

def connect(cfg: Config) -> tinker.ServiceClient:
    print(f"[1/6] connect service: {cfg.base_url}")
    # Alternative: use tuft.get_service_client() for auto-discovery/embedded mode
    # import tuft; return tuft.get_service_client()
    return tinker.ServiceClient(base_url=cfg.base_url, api_key=cfg.api_key)


2 changes: 2 additions & 0 deletions examples/countdown_rl/train.py

def connect(cfg: Config) -> tinker.ServiceClient:
    print(f"[1/6] connect service: {cfg.base_url}")
    # Alternative: use tuft.get_service_client() for auto-discovery/embedded mode
    # import tuft; return tuft.get_service_client()
    return tinker.ServiceClient(base_url=cfg.base_url, api_key=cfg.api_key)


87 changes: 87 additions & 0 deletions examples/embedded_quickstart/train.py
"""Embedded TuFT quickstart — demonstrates auto-init (embedded mode).

This example shows how to use TuFT in embedded mode, where the service
is automatically started and managed within your training script's lifecycle.

No manual `tuft launch` or configuration files needed!

Usage:
    python train.py --model /path/to/Qwen2.5-0.5B-Instruct

The script will:
1. Auto-detect the model and GPU configuration
2. Start a TuFT server in the background
3. Connect and run a minimal training loop
4. Automatically shut down the server on exit
"""

from __future__ import annotations

import argparse

from tinker import types

import tuft


def main():
    parser = argparse.ArgumentParser(description="Embedded TuFT quickstart")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Path to the base model (e.g., /path/to/Qwen2.5-0.5B-Instruct)",
    )
    parser.add_argument("--rank", type=int, default=8, help="LoRA rank")
    parser.add_argument("--steps", type=int, default=5, help="Number of training steps")
    args = parser.parse_args()

    # =========================================================================
    # Step 1: Initialize TuFT in embedded mode
    # This will auto-detect GPUs, generate a minimal config, and start the server.
    # If a TuFT server is already running, it will connect to it instead.
    # =========================================================================
    print(f"[1/4] Initializing TuFT with model: {args.model}")
    tuft.init(model=args.model)
    print("      TuFT initialized (mode: embedded)")

    # =========================================================================
    # Step 2: Create a training client
    # =========================================================================
    print(f"[2/4] Creating LoRA training client (rank={args.rank})")
    training_client = tuft.create_training_client(
        base_model=args.model,
        rank=args.rank,
        train_mlp=True,
        train_attn=True,
    )

    # =========================================================================
    # Step 3: Run a minimal training loop
    # =========================================================================
    print(f"[3/4] Running {args.steps} training steps (with fake data)")
    for step in range(args.steps):
        # Create a fake training datum (in practice, use real tokenized data)
        datum = types.Datum(
            model_input=types.ModelInput.from_ints([101, 42, 37, 102]),
            loss_fn_inputs={
                "target_tokens": types.TensorData(
                    data=[101, 99, 73, 102], dtype="int64", shape=[4]
                ),
                "weights": types.TensorData(data=[1.0, 1.0, 1.0, 1.0], dtype="float32", shape=[4]),
            },
        )
        training_client.forward_backward([datum], loss_fn="cross_entropy").result()
        training_client.optim_step(types.AdamParams(learning_rate=1e-4)).result()
        print(f"      Step {step + 1}/{args.steps} complete")

    # =========================================================================
    # Step 4: Clean up (optional — atexit handles this automatically)
    # =========================================================================
    print("[4/4] Shutting down TuFT")
    tuft.shutdown()
    print("      Done!")


if __name__ == "__main__":
    main()