Vast PyWorker Examples

This repository contains example PyWorkers used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:

  • Exposes one or more HTTP routes (e.g., /v1/completions, /generate/sync)
  • Optionally validates/transforms request payloads
  • Computes per-request workload for autoscaling
  • Forwards requests to the local model server
  • Optionally supports FIFO queueing when the backend cannot process concurrent requests
  • Detects readiness/failure from model logs and runs a benchmark to estimate throughput

Important: The core PyWorker framework (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the vastai Python package (vastai-sdk, https://github.com/vast-ai/vast-sdk). This repository focuses on worker implementations and examples, not the framework internals.
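
For example, with a worker like the one shown later in this README running locally, a client talks to the PyWorker route rather than the model server directly; the worker computes the request's workload, queues or forwards it, and returns the backend's response. A minimal sketch, assuming a completions-style route (the port, route, and payload fields are illustrative):

import requests  # third-party HTTP client; any HTTP client works

# Illustrative values: use the route and port your PyWorker actually exposes.
PYWORKER_URL = "http://127.0.0.1:3000/v1/completions"

payload = {"prompt": "Write a haiku about GPUs.", "max_tokens": 64}

# The PyWorker validates the payload, records its workload for autoscaling,
# forwards it to the local model server, and returns that server's response.
response = requests.post(PYWORKER_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json())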

Repository Purpose

Use this repository as:

  • A reference for how Vast templates wire up worker.py
  • A starting point for implementing your own custom Serverless PyWorker
  • A collection of working examples for common model backends

If you are looking for the framework code itself, refer to the Vast.ai SDK.

Project Structure

Typical layout:

  • workers/
    • Example worker implementations (each worker is usually a self-contained folder)
    • Each example typically includes:
      • worker.py (the entrypoint used by Serverless)
      • Optional sample workflows / payloads (for ComfyUI-based workers)
      • Optional local test harness scripts

How Serverless launches worker.py

On each worker instance, the template’s startup script typically:

  1. Clones your repository from PYWORKER_REPO
  2. Installs dependencies from requirements.txt
  3. Starts the model server (vLLM, TGI, ComfyUI, etc.)
  4. Runs:
    python worker.py

Your worker.py builds a WorkerConfig, constructs a Worker, and starts the PyWorker HTTP server.

worker.py

A PyWorker is usually a single worker.py that uses SDK configuration objects:

from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()
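
In this example, the per-request workload is simply the request's max_tokens, so longer generations count as proportionally more work for autoscaling, and the benchmark payloads use the same fields so measured throughput is expressed in the same units. The LogActionConfig patterns tell the PyWorker which log lines mean the backend has finished loading, which indicate a fatal error, and which are informational.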

Included Examples

This repository contains example PyWorkers corresponding to common Vast templates, including:

  • vLLM: OpenAI-compatible completions/chat endpoints with parallel request support
  • TGI (Text Generation Inference): OpenAI-compatible endpoints and log-based readiness
  • ComfyUI (Image / JSON workflows): /generate/sync for ComfyUI workflow execution
  • ComfyUI Wan 2.2 (T2V): ComfyUI workflow execution producing video outputs
  • ComfyUI ACE Step (Text-to-Music): ComfyUI workflow execution producing audio outputs

Exact worker paths and naming may vary by template; use the workers/ directory as the source of truth.

Getting Started (Local)

  1. Install Python dependencies for the examples you plan to run:

    pip install -r requirements.txt
  2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:

    • You know the model server URL/port
    • You have a log file path you can tail for readiness/error detection
  3. Run the worker:

    python worker.py

    or, if running an example from a subfolder:

    python workers/<example>/worker.py

Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust model_server_port and model_log_file for local usage.
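
One way to do that (a sketch; these environment variable names are hypothetical, not something the templates or SDK define) is to read the values in worker.py with the template defaults as fallbacks:

import os

# Hypothetical local overrides; fall back to the values used in the example above.
model_server_port = int(os.environ.get("MODEL_SERVER_PORT", "18000"))
model_log_file = os.environ.get("MODEL_LOG_FILE", "/var/log/model/server.log")

# Pass these into WorkerConfig(...) in place of the hard-coded
# model_server_port and model_log_file shown in the worker.py example.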

Deploying on Vast Serverless

To use a custom PyWorker with Serverless:

  1. Create a public Git repository containing:

    • worker.py
    • requirements.txt
  2. In your Serverless template / endpoint configuration, set:

    • PYWORKER_REPO to your Git repository URL
    • (Optional) PYWORKER_REF to a git ref (branch, tag, or commit)
  3. The template startup script will clone your repository, install its dependencies, and run your worker.py.

Guidance for Custom Workers

When implementing your own worker:

  • Define one HandlerConfig per route you want to expose.
  • Choose a workload function that correlates with compute cost (see the sketch after this list):
    • LLMs: prompt tokens + max output tokens (or max_tokens as a simpler proxy)
    • Non-LLMs: a constant cost per request (e.g., 100.0) is often sufficient
  • Set allow_parallel_requests=False for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
  • Configure exactly one BenchmarkConfig across all handlers to enable capacity estimation.
  • Use LogActionConfig to reliably detect “model loaded” and “fatal error” log lines.
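
As a rough sketch of the two workload approaches (the payload field names and the characters-per-token heuristic are illustrative, not an SDK requirement):

def llm_workload(payload: dict) -> float:
    """Estimate work as prompt tokens plus requested output tokens."""
    prompt = payload.get("prompt") or ""
    prompt_tokens = len(prompt) / 4  # crude heuristic: ~4 characters per token
    max_output_tokens = float(payload.get("max_tokens", 0))
    return prompt_tokens + max_output_tokens

def constant_workload(payload: dict) -> float:
    """Fixed cost per request, often good enough for non-LLM backends."""
    return 100.0

Either function can be passed as the workload_calculator of a HandlerConfig, just like the lambda in the worker.py example above.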
