117 changes: 117 additions & 0 deletions docs/sphinx_doc/source/getting-started/embedded-mode.md
# Embedded Mode

## Background

TuFT is designed to serve as a **transparent compute service layer** for RL training frameworks like Trinity and veRL. In production, TuFT typically runs as a standalone daemon (`tuft launch`), and users must:

1. Write a `tuft_config.yaml` configuration file
2. Manually start the server with `tuft launch --config ...`
3. Set the `TINKER_BASE_URL` environment variable for clients to connect

This manual setup creates friction, especially for:
- **RL framework users** who just want to run training scripts without learning TuFT internals
- **Development/debugging** workflows where quick iteration is key
- **CI pipelines** that need reproducible, self-contained environments

**Embedded mode** solves this by providing a `tuft.init()` API — similar to `ray.init()` — that handles service discovery, configuration generation, startup, and connection automatically.

## Two Modes of Operation

| | Daemon Mode | Embedded Mode |
|---|---|---|
| How to start | `tuft launch --config ...` | `tuft.init(model=...)` |
| Lifecycle | Independent process, manually managed | Follows main process, auto-cleanup via `atexit` |
| Best for | Production deployments, multi-user shared clusters | Dev/debug, training scripts, CI |
| Service discovery | User sets `TINKER_BASE_URL` manually | Automatic (env var → address file → process scan → default port) |

**Both modes coexist**: `tuft.init()` first tries to discover an existing daemon. Only when no running service is found does it start an embedded instance.

## Quick Start

```python
import tuft

# Initialize TuFT — auto-discovers existing service or starts one
tuft.init(model="/path/to/Qwen2.5-0.5B-Instruct")

# Use the service client for training
training_client = tuft.create_training_client(
    base_model="Qwen2.5-0.5B-Instruct",
    rank=8,
)
# ... your training loop ...

# Optional: explicit shutdown (atexit handles this automatically)
tuft.shutdown()
```

### Other `init()` patterns

```python
# Connect to a specific running server
tuft.init(address="http://gpu-cluster:10610")

# Use an existing config file
tuft.init(config="/path/to/tuft_config.yaml")

# No arguments — relies on env vars or default config file
tuft.init()

# Get a service client (auto-inits if not already done)
service_client = tuft.get_service_client()
```

## Service Discovery Priority

When `tuft.init()` is called, it tries to find an existing service in this order:

1. `address=...` argument passed to `init()`
2. `TUFT_ADDRESS` environment variable
3. Address file at `~/.tuft/tuft_current_server`
4. Process scan (looks for running `tuft launch` or `uvicorn` processes)
5. Default port probe: `http://127.0.0.1:10610`
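The discovery chain above can be sketched as a simple fallback loop. This is an illustrative sketch only, not TuFT internals: `candidate_addresses`, `discover`, and the `is_alive` probe are hypothetical names introduced for this example.

```python
import os
from pathlib import Path

DEFAULT_ADDRESS = "http://127.0.0.1:10610"

def candidate_addresses(address=None):
    """Yield possible service addresses in the documented priority order."""
    if address:                                   # 1. explicit argument
        yield address
    env_addr = os.environ.get("TUFT_ADDRESS")
    if env_addr:                                  # 2. environment variable
        yield env_addr
    addr_file = Path.home() / ".tuft" / "tuft_current_server"
    if addr_file.exists():                        # 3. address file
        yield addr_file.read_text().strip()
    # 4. a process scan would go here (omitted in this sketch)
    yield DEFAULT_ADDRESS                         # 5. default port probe

def discover(address=None, is_alive=lambda addr: False):
    """Return the first candidate that responds to a health probe, or None."""
    for addr in candidate_addresses(address):
        if is_alive(addr):
            return addr
    return None
```

Only when `discover()` comes back empty does embedded mode fall through to starting a new server.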

If no service is found, embedded mode starts a new one using configuration from:

1. `config=...` argument passed to `init()`
2. `TUFT_CONFIG` environment variable
3. `model=...` argument → auto-generates minimal config
4. `TUFT_MODEL_PATH` environment variable → auto-generates minimal config
5. Default config file: `~/.tuft/configs/tuft_config.yaml`
6. None available → raises `RuntimeError` with helpful guidance
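The configuration fallback can be sketched the same way. Again this is illustrative: `resolve_config` and its `(kind, value)` return shape are not part of the TuFT API.

```python
import os
from pathlib import Path

def resolve_config(config=None, model=None):
    """Pick a config source in the documented priority order (sketch)."""
    if config:                                              # 1. explicit argument
        return ("file", config)
    if os.environ.get("TUFT_CONFIG"):                       # 2. env var
        return ("file", os.environ["TUFT_CONFIG"])
    if model:                                               # 3. model= argument
        return ("generated", model)
    if os.environ.get("TUFT_MODEL_PATH"):                   # 4. env var
        return ("generated", os.environ["TUFT_MODEL_PATH"])
    default = Path.home() / ".tuft" / "configs" / "tuft_config.yaml"
    if default.exists():                                    # 5. default file
        return ("file", str(default))
    raise RuntimeError(                                     # 6. nothing available
        "No TuFT configuration found: pass config= or model=, "
        "or set TUFT_CONFIG / TUFT_MODEL_PATH."
    )
```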

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TUFT_ADDRESS` | Address of running TuFT service | — |
| `TUFT_API_KEY` | API authentication key | Auto-generated |
| `TUFT_CONFIG` | Path to configuration file | — |
| `TUFT_MODEL_PATH` | Model path for auto-config generation | — |
| `TUFT_ENABLE_AUTO_CONNECT` | Enable auto-connect in `get_service_client()` | `"1"` |
| `TUFT_HOME` | TuFT home directory | `~/.tuft` |
| `TUFT_HOST` | Server bind address | `127.0.0.1` |
| `TUFT_PORT` | Server bind port | `10610` |
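Assuming the host and port variables are read as plain strings with the defaults listed above, the bind-address resolution amounts to the following sketch (`server_bind` is a hypothetical helper, not a real TuFT function):

```python
def server_bind(env):
    """Resolve the server bind address from an environment mapping."""
    host = env.get("TUFT_HOST", "127.0.0.1")   # default: loopback only
    port = int(env.get("TUFT_PORT", "10610"))  # default: 10610
    return host, port

# With no overrides the embedded server binds to the documented defaults;
# exporting TUFT_PORT before the first tuft.init() call moves it.
```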

## Lifecycle

- **Embedded services** are tied to the main process. When the Python process exits (normally or via signal), the embedded TuFT server is automatically terminated via `atexit`.
- **Daemon services** (`tuft launch`) are independent and persist until manually stopped.
- `tuft.shutdown()` can be called explicitly to stop an embedded service early.
- `tuft.init()` is **idempotent** — calling it multiple times is safe (no-op after first success).
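One way the lifecycle above could be implemented is a child process plus an `atexit` hook. The sketch below is illustrative under that assumption: the `EmbeddedServer` class, module-level `init`, and the placeholder sleep command are not TuFT internals.

```python
import atexit
import subprocess
import sys

class EmbeddedServer:
    """Minimal sketch of an embedded server tied to the main process."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(cmd)
        atexit.register(self.shutdown)   # auto-cleanup when the process exits

    def shutdown(self):
        if self.proc.poll() is None:     # still running: stop it
            self.proc.terminate()
            self.proc.wait(timeout=10)

_server = None  # module-level singleton makes init() idempotent

def init(cmd=(sys.executable, "-c", "import time; time.sleep(60)")):
    global _server
    if _server is None:                  # no-op after the first success
        _server = EmbeddedServer(list(cmd))
    return _server
```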

## Integration with RL Frameworks

For framework integrations (e.g., Trinity), the pattern is:

```python
import tuft

# In your framework's initialization code:
tuft.init(model=model_path, ignore_reinit_error=True)
service_client = tuft.get_service_client()

# Use service_client as before...
```

This requires no changes to the user's workflow — the framework handles TuFT setup transparently.
9 changes: 9 additions & 0 deletions docs/sphinx_doc/source/getting-started/index.md
Install TuFT from source, PyPI, or Docker.

Run your first training and sampling example with TuFT.
:::

:::{grid-item-card} Embedded Mode
:link: embedded-mode
:link-type: doc
:shadow: none

Use `tuft.init()` for automatic service discovery and startup.
:::
```

```{toctree}

installation
quickstart
embedded-mode
```
117 changes: 117 additions & 0 deletions docs/sphinx_doc/source_zh/getting-started/embedded-mode.md
# Embedded Mode

## Background

TuFT is designed as a **transparent compute service layer** for RL training frameworks such as Trinity. In production, TuFT usually runs as a standalone daemon (`tuft launch`), and users must:

1. Write a `tuft_config.yaml` configuration file
2. Manually start the service with `tuft launch --config ...`
3. Set the `TINKER_BASE_URL` environment variable for clients to connect

This manual setup adds friction, especially for:
- **RL framework users** who just want to run training scripts without learning TuFT installation and configuration
- **Development and debugging** workflows that need fast iteration
- **CI pipelines** that need reproducible, self-contained environments

**Embedded mode** solves this by providing a `tuft.init()` API, similar to `ray.init()`, that handles service discovery, configuration generation, startup, and connection automatically.

## Two Modes of Operation

| | Daemon Mode | Embedded Mode |
|---|---|---|
| How to start | `tuft launch --config ...` | `tuft.init(model=...)` |
| Lifecycle | Independent process, manually managed | Follows the main process, auto-cleanup via `atexit` |
| Best for | Production deployments, multi-user shared clusters | Dev/debug, training scripts, CI |
| Service discovery | User sets `TINKER_BASE_URL` manually | Automatic (env var → address file → process scan → default port) |

**The two modes coexist**: `tuft.init()` first tries to discover an existing daemon service. Only when no running service is found does it start an embedded instance.

## Quick Start

```python
import tuft

# Initialize TuFT: auto-discovers an existing service or starts a new one
tuft.init(model="/path/to/Qwen2.5-0.5B-Instruct")

# Use the service client for training
training_client = tuft.create_training_client(
    base_model="Qwen2.5-0.5B-Instruct",
    rank=8,
)
# ... your training loop ...

# Optional: explicit shutdown (atexit handles this automatically)
tuft.shutdown()
```

### Other `init()` patterns

```python
# Connect to a specific running server
tuft.init(address="http://gpu-cluster:10610")

# Use an existing config file
tuft.init(config="/path/to/tuft_config.yaml")

# No arguments: rely on env vars or the default config file
tuft.init()

# Get a service client (auto-inits if not already initialized)
service_client = tuft.get_service_client()
```

## Service Discovery Priority

When `tuft.init()` is called, it tries to find an existing service in this order:

1. `address=...` argument passed explicitly
2. `TUFT_ADDRESS` environment variable
3. Address file `~/.tuft/tuft_current_server`
4. Process scan (looks for running `tuft launch` or `uvicorn` processes)
5. Default port probe: `http://127.0.0.1:10610`

If no service is found, embedded mode resolves a configuration and starts one, in this priority order:

1. `config=...` argument passed explicitly
2. `TUFT_CONFIG` environment variable
3. `model=...` argument → auto-generates a minimal config
4. `TUFT_MODEL_PATH` environment variable → auto-generates a minimal config
5. Default config file: `~/.tuft/configs/tuft_config.yaml`
6. None available → raises `RuntimeError` with guidance

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `TUFT_ADDRESS` | Address of a running TuFT service | — |
| `TUFT_API_KEY` | API authentication key | Auto-generated |
| `TUFT_CONFIG` | Path to the configuration file | — |
| `TUFT_MODEL_PATH` | Model path (for auto-generated config) | — |
| `TUFT_ENABLE_AUTO_CONNECT` | Enable auto-connect in `get_service_client()` | `"1"` |
| `TUFT_HOME` | TuFT home directory | `~/.tuft` |
| `TUFT_HOST` | Server bind address | `127.0.0.1` |
| `TUFT_PORT` | Server bind port | `10610` |
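Assuming `get_service_client()` checks the auto-connect flag as the literal string `"1"` (the documented default), the gate can be sketched as follows; `auto_connect_enabled` is a hypothetical helper, not part of the TuFT API.

```python
def auto_connect_enabled(env):
    """Sketch: auto-connect is on unless TUFT_ENABLE_AUTO_CONNECT is changed."""
    return env.get("TUFT_ENABLE_AUTO_CONNECT", "1") == "1"

# Export TUFT_ENABLE_AUTO_CONNECT=0 before starting your script to require
# an explicit tuft.init() call instead of implicit auto-connection.
```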

## Lifecycle

- **Embedded services** are bound to the main process. When the Python process exits (normally or via a signal), the embedded TuFT service is automatically terminated via `atexit`.
- **Daemon services** (`tuft launch`) run independently and persist until manually stopped.
- `tuft.shutdown()` can be called explicitly to stop an embedded service early.
- `tuft.init()` is **idempotent**: calling it multiple times is safe (a no-op after the first success).

## Integration with RL Frameworks

The pattern for framework integrations (e.g., Trinity) is:

```python
import tuft

# In your framework's initialization code:
tuft.init(model=model_path, ignore_reinit_error=True)
service_client = tuft.get_service_client()

# Use service_client as before...
```

This requires no change to the user's workflow: the framework handles TuFT configuration and startup transparently.
9 changes: 9 additions & 0 deletions docs/sphinx_doc/source_zh/getting-started/index.md

Run your first training and inference example with TuFT.
:::

:::{grid-item-card} Embedded Mode
:link: embedded-mode
:link-type: doc
:shadow: none

Use `tuft.init()` for automatic service discovery and startup.
:::
```

```{toctree}

installation
quickstart
embedded-mode
```
2 changes: 2 additions & 0 deletions examples/chat_sft/train.py

def connect(cfg: Config) -> tinker.ServiceClient:
    print(f"[1/6] connect service: {cfg.base_url}")
    # Alternative: use tuft.get_service_client() for auto-discovery/embedded mode
    # import tuft; return tuft.get_service_client()
    return tinker.ServiceClient(base_url=cfg.base_url, api_key=cfg.api_key)


2 changes: 2 additions & 0 deletions examples/countdown_rl/train.py

def connect(cfg: Config) -> tinker.ServiceClient:
    print(f"[1/6] connect service: {cfg.base_url}")
    # Alternative: use tuft.get_service_client() for auto-discovery/embedded mode
    # import tuft; return tuft.get_service_client()
    return tinker.ServiceClient(base_url=cfg.base_url, api_key=cfg.api_key)


87 changes: 87 additions & 0 deletions examples/embedded_quickstart/train.py
"""Embedded TuFT quickstart — demonstrates auto-init (embedded mode).

This example shows how to use TuFT in embedded mode, where the service
is automatically started and managed within your training script's lifecycle.

No manual `tuft launch` or configuration files needed!

Usage:
    python train.py --model /path/to/Qwen2.5-0.5B-Instruct

The script will:
1. Auto-detect the model and GPU configuration
2. Start a TuFT server in the background
3. Connect and run a minimal training loop
4. Automatically shut down the server on exit
"""

from __future__ import annotations

import argparse

from tinker import types

import tuft


def main():
    parser = argparse.ArgumentParser(description="Embedded TuFT quickstart")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Path to the base model (e.g., /path/to/Qwen2.5-0.5B-Instruct)",
    )
    parser.add_argument("--rank", type=int, default=8, help="LoRA rank")
    parser.add_argument("--steps", type=int, default=5, help="Number of training steps")
    args = parser.parse_args()

    # =========================================================================
    # Step 1: Initialize TuFT in embedded mode
    # This will auto-detect GPUs, generate a minimal config, and start the server.
    # If a TuFT server is already running, it will connect to it instead.
    # =========================================================================
    print(f"[1/4] Initializing TuFT with model: {args.model}")
    tuft.init(model=args.model)
    print("      TuFT initialized (mode: embedded)")

    # =========================================================================
    # Step 2: Create a training client
    # =========================================================================
    print(f"[2/4] Creating LoRA training client (rank={args.rank})")
    training_client = tuft.create_training_client(
        base_model=args.model,
        rank=args.rank,
        train_mlp=True,
        train_attn=True,
    )

    # =========================================================================
    # Step 3: Run a minimal training loop
    # =========================================================================
    print(f"[3/4] Running {args.steps} training steps (with fake data)")
    for step in range(args.steps):
        # Create a fake training datum (in practice, use real tokenized data)
        datum = types.Datum(
            model_input=types.ModelInput.from_ints([101, 42, 37, 102]),
            loss_fn_inputs={
                "target_tokens": types.TensorData(
                    data=[101, 99, 73, 102], dtype="int64", shape=[4]
                ),
                "weights": types.TensorData(data=[1.0, 1.0, 1.0, 1.0], dtype="float32", shape=[4]),
            },
        )
        training_client.forward_backward([datum], loss_fn="cross_entropy").result()
        training_client.optim_step(types.AdamParams(learning_rate=1e-4)).result()
        print(f"      Step {step + 1}/{args.steps} complete")

    # =========================================================================
    # Step 4: Clean up (optional — atexit handles this automatically)
    # =========================================================================
    print("[4/4] Shutting down TuFT")
    tuft.shutdown()
    print("      Done!")


if __name__ == "__main__":
    main()