This repository contains example PyWorkers used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:
- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request workload for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
Important: The core PyWorker framework (`Worker`, `WorkerConfig`, `HandlerConfig`, `BenchmarkConfig`, `LogActionConfig`) is provided by the `vastai` / `vastai-sdk` Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on worker implementations and examples, not the framework internals.
Use this repository as:
- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends
If you are looking for the framework code itself, refer to the Vast.ai SDK.
Typical layout:
- `workers/` - Example worker implementations (each worker is usually a self-contained folder)
- Each example typically includes (see the sketch after this list):
  - `worker.py` (the entrypoint used by Serverless)
  - Optional sample workflows / payloads (for ComfyUI-based workers)
  - Optional local test harness scripts
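A sketch of that layout with illustrative names; only `worker.py` is expected by the templates, and the other file names here are hypothetical placeholders:

```
workers/
  <example>/
    worker.py        # entrypoint started by the Serverless template
    workflows/       # optional: sample ComfyUI workflows / payloads
    test_local.py    # optional: local test harness (name is illustrative)
```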
On each worker instance, the template’s startup script typically:
- Clones your repository from `PYWORKER_REPO`
- Installs dependencies from `requirements.txt`
- Starts the model server (vLLM, TGI, ComfyUI, etc.)
- Runs `python worker.py`
Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.

A PyWorker is usually a single `worker.py` that uses SDK configuration objects:
```python
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()
```

This repository contains example PyWorkers corresponding to common Vast templates, including:
- vLLM: OpenAI-compatible completions/chat endpoints with parallel request support
- TGI (Text Generation Inference): OpenAI-compatible endpoints and log-based readiness
- ComfyUI (Image / JSON workflows): `/generate/sync` for ComfyUI workflow execution
- ComfyUI Wan 2.2 (T2V): ComfyUI workflow execution producing video outputs
- ComfyUI ACE Step (Text-to-Music): ComfyUI workflow execution producing audio outputs
Exact worker paths and naming may vary by template; use the workers/ directory as the source of truth.
- Install Python dependencies for the examples you plan to run: `pip install -r requirements.txt`
- Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
  - You know the model server URL/port
  - You have a log file path you can tail for readiness/error detection
- Run the worker with `python worker.py` or, if running an example from a subfolder, `python workers/<example>/worker.py`
Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
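One low-friction way to make those adjustments is to read overrides from environment variables before building the `WorkerConfig`. This is a minimal sketch, assuming you edit the example's `worker.py` yourself; the variable names `MODEL_SERVER_PORT` and `MODEL_LOG_FILE` are placeholders invented here, not variables the templates or the SDK define:

```python
import os

# Hypothetical local overrides; the environment variable names below are
# illustrative only and are not set by the Vast templates or the SDK.
model_server_port = int(os.environ.get("MODEL_SERVER_PORT", "18000"))
model_log_file = os.environ.get("MODEL_LOG_FILE", "/var/log/model/server.log")

# Pass these values to WorkerConfig(...) in place of the hard-coded
# model_server_port and model_log_file used in the example above.
```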
To use a custom PyWorker with Serverless:
- Create a public Git repository containing:
  - `worker.py`
  - `requirements.txt`
- In your Serverless template / endpoint configuration, set (example values below):
  - `PYWORKER_REPO` to your Git repository URL
  - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)
- The template startup script will clone/install and run your `worker.py`.
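For example, the endpoint environment could contain values like these; the repository URL and ref are placeholders, not real values:

```
PYWORKER_REPO=https://github.com/<your-org>/<your-pyworker>
PYWORKER_REF=main
```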
When implementing your own worker:
- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost (see the sketch after this list):
  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
  - Non-LLMs: a constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly one `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect "model loaded" and "fatal error" log lines.
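As a concrete illustration of the workload guidance above, here is a minimal sketch of two calculator functions. The whitespace split is a deliberately rough stand-in for real tokenization, and `100.0` is simply the constant suggested above; both follow the `workload_calculator` shape used in the configuration example (a callable that takes the request payload and returns a float):

```python
# Rough LLM workload: prompt length plus requested output tokens.
# Splitting on whitespace is only an approximation of real tokenization.
def llm_workload(payload: dict) -> float:
    prompt_tokens = len(str(payload.get("prompt", "")).split())
    max_tokens = int(payload.get("max_tokens") or 0)
    return float(prompt_tokens + max_tokens)


# Flat workload for backends where every request costs roughly the same,
# e.g., one ComfyUI workflow execution per request.
def fixed_workload(payload: dict) -> float:
    return 100.0
```

Either function can then be passed as `workload_calculator=llm_workload` (or `fixed_workload`) in the corresponding `HandlerConfig`.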
- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/