A Universal LLM API Gateway & Transformation Layer.
Plexus is a high-performance API gateway that unifies access to multiple AI providers (OpenAI, Anthropic, Google, GitHub Copilot, and more) under a single endpoint. Switch models and providers without rewriting client code.
All configuration has moved from YAML files to the database. This change is now live and affects all users upgrading from previous versions.
- Automatic Migration: On first launch after upgrading, Plexus automatically imports your existing `plexus.yaml` and `auth.json` into the database
- Backward Compatibility: If `ADMIN_KEY` is not set as an environment variable, Plexus will attempt to read it from `plexus.yaml` and display a prominent warning banner
- Action Required: Set `ADMIN_KEY` as an environment variable before the next restart to avoid the deprecation warning
Required Environment Variables:

```shell
export ADMIN_KEY="your-secure-admin-password"      # Recommended - will use plexus.yaml as fallback with warning
export DATABASE_URL="sqlite:///app/data/plexus.db" # Optional (default shown)
export ENCRYPTION_KEY="your-generated-hex-key"     # Optional - encrypts sensitive data at rest (generate once: openssl rand -hex 32)
export PORT="4000"                                 # Optional
```

Note: While the application will start without `ADMIN_KEY` set (by reading from `plexus.yaml`), a large deprecation warning will be displayed at startup and in the dashboard. Set `ADMIN_KEY` as an environment variable to suppress this warning.
See Configuration Migration Details below for full documentation.
Plexus sits in front of your LLM providers and handles protocol translation, load balancing, failover, and usage tracking transparently. Send any supported request format to Plexus and it routes to the right provider, transforms as needed, and returns the response in the format your client expects.
Key capabilities:
- Unified API surface – Accept OpenAI (`/v1/chat/completions`), Anthropic (`/v1/messages`), Responses (`/v1/responses`), Gemini (`/v1beta`), Embeddings, Audio, and Images
- Multi-provider routing – Route to OpenAI, Anthropic, Google Gemini, DeepSeek, Groq, OpenRouter, and any OpenAI-compatible provider
- OAuth providers – Authenticate via GitHub Copilot, Anthropic Claude, OpenAI Codex, Gemini CLI, and Antigravity through OAuth (no API key required)
- Model aliasing & load balancing – Define virtual model names backed by multiple real providers with `random`, `cost`, `performance`, `latency`, or `in_order` selectors
- Vision fallthrough – Automatically convert images to text descriptions for models that don't natively support vision, ensuring compatibility across all providers
- Intelligent failover – Exponential backoff cooldowns automatically remove unhealthy providers from rotation
- Usage tracking – Per-request cost, token counts, latency, and TPS metrics with a built-in dashboard
- MCP proxy – Proxy Model Context Protocol servers through Plexus with per-request session isolation
- User quotas – Per-API-key rate limiting by requests or tokens with rolling, daily, or weekly windows, plus cost limits
- Admin dashboard – Web UI for configuration, usage analytics, debug traces, and quota monitoring
Plexus allows you to use vision-capable aliases with backend models that don't natively support images. When enabled, Plexus automatically intercepts images in the request, sends them to a high-performance "descriptor" model (like Gemini 3 Flash or GPT-5.3-Codex) to generate text descriptions, and then passes those descriptions to the non-vision target.
This enables you to use cheap or specialized models for the main task while still supporting image inputs transparently.
Setup is simple: Enable Vision Fallthrough for any model alias directly in the Admin UI under the Models tab. Specify a global "Descriptor Model" in the settings to handle the image-to-text conversion.
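With fallthrough enabled, clients keep sending the standard OpenAI-style multimodal payload; nothing changes on the wire. A sketch of such a request (the alias name `fast`, the API key, and the image URL are placeholders):

```shell
# Standard OpenAI-style multimodal request; Plexus swaps the image for a
# text description before forwarding to a non-vision target.
cat > vision-request.json <<'EOF'
{
  "model": "fast",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this picture?" },
        { "type": "image_url",
          "image_url": { "url": "https://example.com/cat.png" } }
      ]
    }
  ]
}
EOF
# Send it exactly like any other chat completion:
# curl -X POST http://localhost:4000/v1/chat/completions \
#   -H "Authorization: Bearer sk-plexus-my-key" \
#   -H "Content-Type: application/json" \
#   -d @vision-request.json
```

The target model only ever sees text, so the same alias works whether or not the backend supports vision.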
`DATABASE_URL` tells Plexus where to store usage data (a default SQLite path is used if unset). Use a local SQLite file for simple deployments, or a PostgreSQL connection string for production.
`ADMIN_KEY` sets the administrative password used to log in to the dashboard. Set it as an environment variable; otherwise Plexus falls back to reading it from `plexus.yaml` and displays a deprecation warning.
```shell
docker run -p 4000:4000 \
  -v $(pwd)/config/plexus.yaml:/app/config/plexus.yaml \
  -v plexus-data:/app/data \
  -e ADMIN_KEY="your-secure-admin-password" \
  -e DATABASE_URL=sqlite:///app/data/plexus.db \
  -e ENCRYPTION_KEY="your-generated-hex-key" \
  ghcr.io/mcowger/plexus:latest
```

Download the latest pre-built binary from GitHub Releases:
```shell
# macOS (Apple Silicon)
curl -L https://github.com/mcowger/plexus/releases/latest/download/plexus-macos -o plexus
chmod +x plexus
DATABASE_URL=sqlite://./data/plexus.db ./plexus

# Linux (x64)
curl -L https://github.com/mcowger/plexus/releases/latest/download/plexus-linux -o plexus
chmod +x plexus
DATABASE_URL=sqlite://./data/plexus.db ./plexus

# Windows (x64) - download plexus.exe from the releases page, then:
# set DATABASE_URL=sqlite://./data/plexus.db && plexus.exe
```

The binary is self-contained (no runtime or dependencies required). By default it looks for `config/plexus.yaml` relative to the working directory.
```shell
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-plexus-my-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "Hello!"}]}'
```

The dashboard is at http://localhost:4000 – log in with your `adminKey`.
OAuth providers (GitHub Copilot, Anthropic, OpenAI Codex, etc.) use credentials managed through the Admin UI. See Configuration: OAuth Providers.
See Installation Guide for Docker Compose, building from source, and all environment variable options.
For complete documentation on the configuration migration from YAML to database, including all breaking changes, migration steps, and post-migration cleanup, see config-to-database.md.
Define model aliases backed by one or more providers. Choose how targets are selected:
| Selector | Behavior |
|---|---|
| `random` | Distribute requests randomly across healthy targets (default) |
| `in_order` | Try providers in order; fall back when one is unhealthy |
| `cost` | Always route to the cheapest configured provider |
| `performance` | Route to the highest tokens/sec provider (with exploration) |
| `latency` | Route to the lowest time-to-first-token provider |
Use `priority: api_match` to prefer providers that natively speak the incoming API format, enabling pass-through optimization.
→ See Configuration: models
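As an illustration only (the authoritative schema lives in the Admin UI and Configuration: models; the field names and model IDs below are assumptions, not the documented format), an alias backed by two targets with an `in_order` selector might look like:

```yaml
models:
  fast:                  # virtual model name that clients request
    selector: in_order   # try targets top to bottom, skipping unhealthy ones
    priority: api_match  # prefer targets that natively speak the incoming API
    targets:
      - provider: groq
        model: llama-3.3-70b-versatile
      - provider: openai
        model: gpt-4o-mini
```

Clients simply request `"model": "fast"`; Plexus resolves the alias to a concrete provider per request.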
Plexus supports protocol translation between:
- OpenAI chat completions format (`/v1/chat/completions`)
- OpenAI responses format (`/v1/responses`)
- Anthropic messages format (`/v1/messages`)
- Google Gemini native format
- Any OpenAI-compatible provider (DeepSeek, Groq, OpenRouter, Together, etc.)
A request sent in Anthropic format can be routed to an OpenAI provider – Plexus handles the transformation in both directions, including streaming and tool use.
→ See API Reference
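To make the translation concrete, here is the same prompt in both chat formats (the layouts follow the public OpenAI and Anthropic API shapes; the alias `fast` is a placeholder). Anthropic carries `system` and a required `max_tokens` at the top level, while OpenAI carries the system prompt as a message:

```shell
# Anthropic /v1/messages shape
cat > anthropic.json <<'EOF'
{
  "model": "fast",
  "max_tokens": 256,
  "system": "You are terse.",
  "messages": [{ "role": "user", "content": "Hello!" }]
}
EOF

# Equivalent OpenAI /v1/chat/completions shape
cat > openai.json <<'EOF'
{
  "model": "fast",
  "messages": [
    { "role": "system", "content": "You are terse." },
    { "role": "user", "content": "Hello!" }
  ]
}
EOF
```

Either payload can be sent to Plexus; if the selected provider speaks the other protocol, the gateway converts the request and the response (including streaming chunks) between the two shapes.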
Use AI services you already have subscriptions to without managing API keys. Plexus integrates with pi-ai to support OAuth-backed providers:
- Anthropic Claude
- OpenAI Codex
- GitHub Copilot
- Google Gemini CLI
- Google Antigravity
OAuth credentials are stored in the database (imported from `auth.json` on upgrade) and managed through the Admin UI.
→ See Configuration: OAuth Providers
Limit how much each API key can consume using rolling, daily, or weekly windows:
Limit types: `tokens`, `requests`, or `cost` (dollar spending).
→ See Configuration: user_quotas
When a provider fails, Plexus removes it from rotation using exponential backoff: 2 min → 4 min → 8 min → ... → 5 hr cap. Successful requests reset the counter. Set disable cooldown: true on a provider to opt it out entirely.
→ See Configuration: cooldown
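The schedule described above can be sketched in a few lines of shell (illustrative only; the real logic lives inside Plexus):

```shell
# Backoff schedule: 2-minute base, doubling per consecutive failure,
# capped at 300 minutes (5 hours). A success resets to the base.
base=2
cap=300
cooldown=$base
for failure in 1 2 3 4 5 6 7 8 9 10; do
  echo "failure $failure: cool down for ${cooldown} min"
  cooldown=$((cooldown * 2))
  if [ "$cooldown" -gt "$cap" ]; then
    cooldown=$cap
  fi
done
```

After eight consecutive failures the cooldown reaches the 5-hour ceiling and stays there until a request succeeds.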
Proxy Model Context Protocol servers through Plexus. Only streamable HTTP transport is supported. Each request gets an isolated MCP session, preventing tool sprawl across clients.
→ See Configuration: MCP Servers
Plexus supports AES-256-GCM encryption for all sensitive data stored in the database, including API key secrets, OAuth access/refresh tokens, provider API keys, and MCP server headers.
Enable encryption:

```shell
# Generate once and persist in your .env or secret manager:
# openssl rand -hex 32
export ENCRYPTION_KEY="your-generated-hex-key"
```

On first startup with `ENCRYPTION_KEY` set, existing plaintext values are automatically encrypted. Without the key, the system operates in plaintext mode (backward compatible). See Configuration: Encryption for details.
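AES-256 uses a 32-byte key, which is exactly 64 hex characters from `openssl rand -hex 32`; a quick sanity check before deploying:

```shell
# Generate a key and verify its length (32 bytes = 64 hex characters)
ENCRYPTION_KEY="$(openssl rand -hex 32)"
echo "${#ENCRYPTION_KEY}"   # prints 64
```

Keep the key stable across restarts: data encrypted under one key cannot be read with another.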
Full support for OpenAI's `/v1/responses` endpoint, including stateful multi-turn conversations via `previous_response_id`, response storage with a 7-day TTL, and function calling.
→ See Responses API Reference
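Multi-turn chaining via `previous_response_id` follows the standard Responses API request shape; a sketch (the alias `fast` and the response id are placeholders):

```shell
# First turn: a plain Responses API request
cat > turn1.json <<'EOF'
{ "model": "fast", "input": "My name is Ada." }
EOF

# Follow-up turn: reference the stored first response by its id
# ("resp_abc123" stands in for the id returned by turn 1)
cat > turn2.json <<'EOF'
{
  "model": "fast",
  "input": "What is my name?",
  "previous_response_id": "resp_abc123"
}
EOF
```

Because Plexus stores responses for 7 days, the follow-up turn can be sent any time within that window without resending the earlier conversation.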
MIT License – see LICENSE file.




