Problem
Currently, 9Router does not transmit images to upstream providers, even when using vision-capable models like Claude Sonnet 4.5.
Test Case
I tested sending an image via the OpenAI-compatible API format:
```bash
curl -X POST http://192.168.11.233:20128/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "kr/claude-sonnet-4.5",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What do you see in this image?"},
          {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
      }
    ]
  }'
```
Expected: Claude describes the image.
Actual: Claude responds "I don't see any image in your message", indicating the image block was dropped before reaching the provider.
Impact
This prevents using 9Router with:
- Claude Sonnet/Opus (vision support)
- GPT-4V/GPT-5 (vision support)
- Gemini (multimodal)
- Any other vision-capable models
Use Case
I'm using 9Router with OpenClaw (personal AI assistant) and need vision support for analyzing screenshots, diagrams, and images sent via Telegram/WhatsApp.
Proposed Solution
- Detect `image_url` content blocks in the OpenAI format
- Translate to the provider-specific format:
  - Claude: convert to Anthropic's vision format with `source.type: "base64"`
  - OpenAI: pass through as-is
  - Gemini: convert to the `inline_data` format
- Preserve image data through the routing pipeline
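A minimal sketch of the translation step described above, assuming base64 data URLs as in the test case. Function names and the parsing helper are illustrative, not existing 9Router internals; the target shapes follow the public Anthropic and Gemini API formats:

```python
import re

# Matches a base64 data URL, e.g. "data:image/png;base64,iVBOR..."
_DATA_URL = re.compile(r"data:(?P<media_type>image/[\w.+-]+);base64,(?P<data>.+)", re.DOTALL)

def parse_data_url(url):
    """Split a base64 data URL into (media_type, base64_data)."""
    m = _DATA_URL.match(url)
    if not m:
        raise ValueError("only base64 data URLs are handled in this sketch")
    return m.group("media_type"), m.group("data")

def to_anthropic(part):
    """OpenAI `image_url` content block -> Anthropic vision block."""
    media_type, data = parse_data_url(part["image_url"]["url"])
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }

def to_gemini(part):
    """OpenAI `image_url` content block -> Gemini `inline_data` part."""
    media_type, data = parse_data_url(part["image_url"]["url"])
    return {"inline_data": {"mime_type": media_type, "data": data}}
```

OpenAI-bound requests would skip this step entirely and forward the block unchanged.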
Environment
- 9Router version: 0.2.99
- Model tested: kr/claude-sonnet-4.5 (via Kiro provider)
- Client: OpenClaw via custom provider config
Additional Context
The `/v1/models` endpoint doesn't expose model capabilities (text vs. multimodal), which makes it hard for clients to know which models support images.
It would also be great to add:
- Model capability metadata in the `/v1/models` response
- Documentation on multimodal support per provider
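One possible shape for the capability metadata, as a sketch only; the `capabilities` field and its keys are hypothetical, not an existing 9Router schema:

```json
{
  "id": "kr/claude-sonnet-4.5",
  "object": "model",
  "capabilities": {
    "vision": true,
    "input_modalities": ["text", "image"]
  }
}
```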
Thanks for this amazing project! 🚀