The orra Plan Engine supports various reasoning and embedding models to power its orchestration capabilities. This document explains the supported models, configuration options, and hosting environments.
The Plan Engine supports the following reasoning models:
| Model | Description | Recommended Use Cases |
|---|---|---|
| `o1-mini` | OpenAI's smaller reasoning model | General purpose orchestration for most applications |
| `o3-mini` | OpenAI's enhanced reasoning model | Complex orchestration scenarios requiring deeper reasoning |
| `deepseek-r1` | DeepSeek's reasoning model | Open source alternative with strong reasoning capabilities |
| `qwq-32b` | Qwen's 32B parameter model | On-prem or self-hosted deployments, privacy-sensitive applications |
The Plan Engine supports the following embedding models:
| Model | Description | Recommended Use Cases |
|---|---|---|
| `text-embedding-3-small` | OpenAI's embedding model | General purpose embedding for most applications |
| `jina-embeddings-v2-small-en` | Jina AI's embedding model | Open source alternative for self-hosted deployments |
The Plan Engine supports model variants with specific suffixes for specialized hardware, optimizations, or context windows:
- Hardware-specific optimization (e.g., `deepseek-r1-mlx`)
- Quantization levels (e.g., `qwq-32b-q8`, `deepseek-r1-q4`)
- Context window sizes (e.g., `model-8k`)
- Version specifications (e.g., `model-v1`, `model-v2.5`)
- Simple alphanumeric suffixes (e.g., `model-beta`, `model-cuda`)
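As a rough illustration of this naming convention only (a hypothetical helper, not the Plan Engine's actual parsing logic), the suffixes above can be stripped to recover a base model name:

```python
import re

# Hypothetical helper, for illustration only: strips the variant suffixes
# described above from a model name.
VARIANT_SUFFIX = re.compile(
    r"-(?:mlx|cuda|beta"       # hardware-specific / simple alphanumeric suffixes
    r"|q\d+"                   # quantization levels, e.g. -q4, -q8
    r"|\d+k"                   # context window sizes, e.g. -8k
    r"|v\d+(?:\.\d+)?"         # version specifications, e.g. -v1, -v2.5
    r")$"
)

def base_model(name: str) -> str:
    """Remove trailing variant suffixes, one at a time."""
    while (match := VARIANT_SUFFIX.search(name)):
        name = name[: match.start()]
    return name

assert base_model("deepseek-r1-mlx") == "deepseek-r1"
assert base_model("qwq-32b-q8") == "qwq-32b"
```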
The Plan Engine uses environment variables for model configuration. These can be:

- Set in your environment
- Specified in the `.env` file in the `planengine` directory
- Set in the `_env` template file when deploying
Important: The Plan Engine requires all model endpoints to provide an OpenAI-compatible API. This is the standard interface for communicating with AI models, and most modern model serving solutions support this format.
```
# Reasoning Model Configuration
LLM_MODEL=o1-mini                            # Choose your preferred reasoning model
LLM_API_KEY=your_api_key                     # Optional if your endpoint doesn't require auth
LLM_API_BASE_URL=https://api.openai.com/v1   # Default for OpenAI, change for self-hosted/other providers

# Embedding Model Configuration
EMBEDDINGS_MODEL=text-embedding-3-small      # Choose your preferred embedding model
EMBEDDINGS_API_KEY=your_api_key              # Optional if your endpoint doesn't require auth
EMBEDDINGS_API_BASE_URL=https://api.openai.com/v1   # Default for OpenAI, change for self-hosted
```
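Because every endpoint must speak the OpenAI format, a quick smoke test with the official `openai` Python client catches configuration mistakes before the Plan Engine starts. This is a minimal sketch, assuming the `openai` package is installed and the environment variables above are set; the same probe works unchanged against any OpenAI-compatible provider:

```python
import os

from openai import OpenAI  # pip install openai

# Reuse the Plan Engine's own configuration to probe the endpoint.
client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", ""),
    base_url=os.environ.get("LLM_API_BASE_URL", "https://api.openai.com/v1"),
)

response = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "o1-mini"),
    messages=[{"role": "user", "content": "Reply with the single word: OK"}],
)
print(response.choices[0].message.content)
```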
orra allows flexible hosting options for models to meet different operational needs.
Recommended Models (cloud providers):

- `o1-mini` and `o3-mini` via OpenAI or Azure
- `text-embedding-3-small` via OpenAI or Azure
Configuration Example for OpenAI:

```
LLM_MODEL=o1-mini
LLM_API_KEY=your_openai_key
LLM_API_BASE_URL=https://api.openai.com/v1

EMBEDDINGS_MODEL=text-embedding-3-small
EMBEDDINGS_API_KEY=your_openai_key
EMBEDDINGS_API_BASE_URL=https://api.openai.com/v1
```
Configuration Example for Azure:

```
LLM_MODEL=o1-mini
LLM_API_KEY=your_azure_key
LLM_API_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment/completions?api-version=2023-05-15

EMBEDDINGS_MODEL=text-embedding-3-small
EMBEDDINGS_API_KEY=your_azure_key
EMBEDDINGS_API_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-embedding-deployment/embeddings?api-version=2023-05-15
```
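Azure authenticates with an `api-key` header and routes requests by deployment name and API version, which is why its base URLs look different. With the `openai` package, the same smoke test can go through its `AzureOpenAI` client; a sketch with placeholder resource and deployment names mirroring the example above:

```python
from openai import AzureOpenAI  # pip install openai

# Placeholder values mirroring the example configuration above.
client = AzureOpenAI(
    api_key="your_azure_key",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com",
)

# For Azure, "model" is the name of your deployment, not the model itself.
response = client.chat.completions.create(
    model="your-deployment",
    messages=[{"role": "user", "content": "Reply with the single word: OK"}],
)
print(response.choices[0].message.content)
```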
AI model hosting platforms provide an excellent option for accessing open source models without managing infrastructure. This is particularly useful for `deepseek-r1`, `qwq-32b`, and `jina-embeddings-v2-small-en`.

Popular platforms include Together AI and Fireworks AI; configuration examples for both follow.
Configuration Example for Together AI:

```
# For DeepSeek-R1
LLM_MODEL=deepseek-r1
LLM_API_KEY=your_together_ai_key
LLM_API_BASE_URL=https://api.together.xyz/v1

# For Jina Embeddings
EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_together_ai_key
EMBEDDINGS_API_BASE_URL=https://api.together.xyz/v1
```
Configuration Example for Fireworks AI:

```
# For QwQ-32B
LLM_MODEL=qwq-32b
LLM_API_KEY=your_fireworks_key
LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1

# For Jina Embeddings
EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_fireworks_key
EMBEDDINGS_API_BASE_URL=https://api.fireworks.ai/inference/v1
```
Benefits of Using AI Hosting Platforms:
- Quick Setup: No need to manage infrastructure or worry about hardware requirements
- Cost Efficiency: Pay-as-you-go pricing models often make these platforms cost-effective
- Model Availability: Easy access to the latest open-source models without deployment hassles
- Scalability: Built-in scaling based on your usage patterns
- API Compatibility: These platforms typically offer OpenAI-compatible APIs, making integration seamless
Recommended Models (self-hosted):

- `qwq-32b` for reasoning
- `jina-embeddings-v2-small-en` for embeddings
Model Serving Requirements: When self-hosting models, your model serving solution must expose an OpenAI-compatible API. Here are popular options:

- vLLM: Provides high-performance model serving with an OpenAI-compatible interface by default
- Text Generation Inference (TGI): Can be configured with OpenAI-compatible endpoints
- Ollama: Enable the OpenAI compatibility mode
- LM Studio: Offers OpenAI-compatible endpoints
Configuration Example:

```
LLM_MODEL=qwq-32b
LLM_API_KEY=your_internal_key   # If authentication is required
LLM_API_BASE_URL=http://internal-llm-endpoint:8000/v1

EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_internal_key   # If authentication is required
EMBEDDINGS_API_BASE_URL=http://internal-embedding-endpoint:8000/v1
```
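A common failure mode with self-hosted servers is a mismatch between `LLM_MODEL` and the model ID the server actually registered. Listing the models the endpoint exposes catches this early; a sketch assuming the `openai` package and the internal endpoint from the example above:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", ""),
    base_url=os.environ.get("LLM_API_BASE_URL", "http://internal-llm-endpoint:8000/v1"),
)

# /v1/models is part of the OpenAI-compatible surface that most serving
# solutions (vLLM, Ollama in compatibility mode, LM Studio) expose.
served = [m.id for m in client.models.list().data]
expected = os.environ.get("LLM_MODEL", "qwq-32b")
if expected not in served:
    raise SystemExit(f"{expected!r} not found; endpoint serves: {served}")
print(f"{expected!r} is available")
```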
Recommended Models (local deployment):

- `qwq-32b` with quantization for reasoning
- `jina-embeddings-v2-small-en` for embeddings
Configuration Example:

```
LLM_MODEL=qwq-32b-q8
LLM_API_KEY=                    # Often not needed for local deployments
LLM_API_BASE_URL=http://localhost:8000/v1

EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=             # Often not needed for local deployments
EMBEDDINGS_API_BASE_URL=http://localhost:8001/v1
```
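Note that the reasoning and embedding models run as two separate local servers here (ports 8000 and 8001), so each should be verified independently. The embedding side can be exercised the same way as the chat side; a sketch against the local endpoint above:

```python
from openai import OpenAI

# Local embedding server from the example above; auth usually isn't needed.
client = OpenAI(api_key="", base_url="http://localhost:8001/v1")

result = client.embeddings.create(
    model="jina-embeddings-v2-small-en",
    input="orra Plan Engine smoke test",
)
print(len(result.data[0].embedding), "dimensions returned")
```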
When choosing your model configuration, consider these factors:

- Reasoning Complexity: For complex multi-agent orchestration with sophisticated reasoning, prefer `o3-mini` or `deepseek-r1`
- Latency Requirements: For lower latency, choose:
  - Cloud providers with regional deployments closer to your application
  - Models with smaller parameter sizes like `o1-mini`
  - Quantized versions of larger models (e.g., `qwq-32b-q8`)
- Privacy and Security: For sensitive workloads requiring data privacy:
  - Use Azure OpenAI with your own dedicated endpoint
  - Deploy open-source models (`deepseek-r1`, `qwq-32b`, `jina-embeddings-v2-small-en`) in your own environment
  - Ensure proper network isolation and access controls
- Resource Constraints: For resource-limited environments:
  - Use quantized models (models with `-q4` or `-q8` suffixes)
  - Consider hardware-optimized versions (e.g., `-mlx` for Apple Silicon)