The orra Plan Engine supports various reasoning and embedding models to power its orchestration capabilities. This document explains the supported models, configuration options, and hosting environments.
The Plan Engine supports the following reasoning models:
| Model | Description | Recommended Use Cases |
|---|---|---|
| `o1-mini` | OpenAI's smaller reasoning model | General purpose orchestration for most applications |
| `o3-mini` | OpenAI's enhanced reasoning model | Complex orchestration scenarios requiring deeper reasoning |
| `deepseek-r1` | DeepSeek's reasoning model | Open source alternative with strong reasoning capabilities |
| `qwq-32b` | Qwen's 32B parameter model | On-prem or self-hosted deployments, privacy-sensitive applications |
The Plan Engine supports the following embedding models:
| Model | Description | Recommended Use Cases |
|---|---|---|
| `text-embedding-3-small` | OpenAI's embedding model | General purpose embedding for most applications |
| `jina-embeddings-v2-small-en` | Jina AI's embedding model | Open source alternative for self-hosted deployments |
The Plan Engine supports model variants with specific suffixes for specialized hardware, optimizations, or context windows:
- Hardware-specific optimization (e.g., `deepseek-r1-mlx`)
- Quantization levels (e.g., `qwq-32b-q8`, `deepseek-r1-q4`)
- Context window sizes (e.g., `model-8k`)
- Version specifications (e.g., `model-v1`, `model-v2.5`)
- Simple alphanumeric suffixes (e.g., `model-beta`, `model-cuda`)
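As a rough illustration of this naming convention only (a hypothetical helper, not the Plan Engine's actual parsing logic), the suffixes above can be stripped to recover a base model name:

```python
import re

# Hypothetical helper, for illustration only: strips the variant suffixes
# described above from a model name.
VARIANT_SUFFIX = re.compile(
    r"-(?:mlx|cuda|beta"       # hardware-specific / simple alphanumeric suffixes
    r"|q\d+"                   # quantization levels, e.g. -q4, -q8
    r"|\d+k"                   # context window sizes, e.g. -8k
    r"|v\d+(?:\.\d+)?"         # version specifications, e.g. -v1, -v2.5
    r")$"
)

def base_model(name: str) -> str:
    """Remove trailing variant suffixes, one at a time."""
    while (match := VARIANT_SUFFIX.search(name)):
        name = name[: match.start()]
    return name

assert base_model("deepseek-r1-mlx") == "deepseek-r1"
assert base_model("qwq-32b-q8") == "qwq-32b"
```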
The Plan Engine uses environment variables for model configuration. These can be:

- Set in your environment
- Specified in the `.env` file in the `planengine` directory
- Set in the `_env` template file when deploying
Important: The Plan Engine requires all model endpoints to provide an OpenAI-compatible API. This is the standard interface for communicating with AI models, and most modern model serving solutions support this format.
```
# Reasoning Model Configuration
LLM_MODEL=o1-mini                            # Choose your preferred reasoning model
LLM_API_KEY=your_api_key                     # Optional if your endpoint doesn't require auth
LLM_API_BASE_URL=https://api.openai.com/v1   # Default for OpenAI, change for self-hosted/other providers

# Embedding Model Configuration
EMBEDDINGS_MODEL=text-embedding-3-small      # Choose your preferred embedding model
EMBEDDINGS_API_KEY=your_api_key              # Optional if your endpoint doesn't require auth
EMBEDDINGS_API_BASE_URL=https://api.openai.com/v1   # Default for OpenAI, change for self-hosted
```
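Because every endpoint must speak the OpenAI format, a quick smoke test with the official `openai` Python client catches configuration mistakes before the Plan Engine starts. This is a minimal sketch, assuming the `openai` package is installed and the environment variables above are set; the same probe works unchanged against any OpenAI-compatible provider:

```python
import os

from openai import OpenAI  # pip install openai

# Reuse the Plan Engine's own configuration to probe the endpoint.
client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", ""),
    base_url=os.environ.get("LLM_API_BASE_URL", "https://api.openai.com/v1"),
)

response = client.chat.completions.create(
    model=os.environ.get("LLM_MODEL", "o1-mini"),
    messages=[{"role": "user", "content": "Reply with the single word: OK"}],
)
print(response.choices[0].message.content)
```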
orra allows flexible hosting options for models to meet different operational needs.
Recommended Models (cloud providers):

- `o1-mini` and `o3-mini` via OpenAI or Azure
- `text-embedding-3-small` via OpenAI or Azure
Configuration Example for OpenAI:

```
LLM_MODEL=o1-mini
LLM_API_KEY=your_openai_key
LLM_API_BASE_URL=https://api.openai.com/v1

EMBEDDINGS_MODEL=text-embedding-3-small
EMBEDDINGS_API_KEY=your_openai_key
EMBEDDINGS_API_BASE_URL=https://api.openai.com/v1
```
Configuration Example for Azure:

```
LLM_MODEL=o1-mini
LLM_API_KEY=your_azure_key
LLM_API_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment/completions?api-version=2023-05-15

EMBEDDINGS_MODEL=text-embedding-3-small
EMBEDDINGS_API_KEY=your_azure_key
EMBEDDINGS_API_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-embedding-deployment/embeddings?api-version=2023-05-15
```
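Azure authenticates with an `api-key` header and routes requests by deployment name and API version, which is why its base URLs look different. With the `openai` package, the same smoke test can go through its `AzureOpenAI` client; a sketch with placeholder resource and deployment names mirroring the example above:

```python
from openai import AzureOpenAI  # pip install openai

# Placeholder values mirroring the example configuration above.
client = AzureOpenAI(
    api_key="your_azure_key",
    api_version="2023-05-15",
    azure_endpoint="https://your-resource.openai.azure.com",
)

# For Azure, "model" is the name of your deployment, not the model itself.
response = client.chat.completions.create(
    model="your-deployment",
    messages=[{"role": "user", "content": "Reply with the single word: OK"}],
)
print(response.choices[0].message.content)
```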
AI model hosting platforms provide an excellent option for accessing open source models without managing infrastructure. This is particularly useful for `deepseek-r1`, `qwq-32b`, and `jina-embeddings-v2-small-en`.

Popular platforms include Together AI and Fireworks AI; configuration examples for both follow.
Configuration Example for Together AI:

```
# For DeepSeek-R1
LLM_MODEL=deepseek-r1
LLM_API_KEY=your_together_ai_key
LLM_API_BASE_URL=https://api.together.xyz/v1

# For Jina Embeddings
EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_together_ai_key
EMBEDDINGS_API_BASE_URL=https://api.together.xyz/v1
```
Configuration Example for Fireworks AI:

```
# For QwQ-32B
LLM_MODEL=qwq-32b
LLM_API_KEY=your_fireworks_key
LLM_API_BASE_URL=https://api.fireworks.ai/inference/v1

# For Jina Embeddings
EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_fireworks_key
EMBEDDINGS_API_BASE_URL=https://api.fireworks.ai/inference/v1
```
Benefits of Using AI Hosting Platforms:
- Quick Setup: No need to manage infrastructure or worry about hardware requirements
- Cost Efficiency: Pay-as-you-go pricing models often make these platforms cost-effective
- Model Availability: Easy access to the latest open-source models without deployment hassles
- Scalability: Built-in scaling based on your usage patterns
- API Compatibility: These platforms typically offer OpenAI-compatible APIs, making integration seamless
Recommended Models (self-hosted):

- `qwq-32b` for reasoning
- `jina-embeddings-v2-small-en` for embeddings
Model Serving Requirements: When self-hosting models, your model serving solution must expose an OpenAI-compatible API. Here are popular options:

- vLLM: Provides high-performance model serving with an OpenAI-compatible interface by default
- Text Generation Inference (TGI): Can be configured with OpenAI-compatible endpoints
- Ollama: Enable the OpenAI compatibility mode
- LM Studio: Offers OpenAI-compatible endpoints
Configuration Example:

```
LLM_MODEL=qwq-32b
LLM_API_KEY=your_internal_key   # If authentication is required
LLM_API_BASE_URL=http://internal-llm-endpoint:8000/v1

EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=your_internal_key   # If authentication is required
EMBEDDINGS_API_BASE_URL=http://internal-embedding-endpoint:8000/v1
```
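A common failure mode with self-hosted servers is a mismatch between `LLM_MODEL` and the model ID the server actually registered. Listing the models the endpoint exposes catches this early; a sketch assuming the `openai` package and the internal endpoint from the example above:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", ""),
    base_url=os.environ.get("LLM_API_BASE_URL", "http://internal-llm-endpoint:8000/v1"),
)

# /v1/models is part of the OpenAI-compatible surface that most serving
# solutions (vLLM, Ollama in compatibility mode, LM Studio) expose.
served = [m.id for m in client.models.list().data]
expected = os.environ.get("LLM_MODEL", "qwq-32b")
if expected not in served:
    raise SystemExit(f"{expected!r} not found; endpoint serves: {served}")
print(f"{expected!r} is available")
```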
Recommended Models (local deployment):

- `qwq-32b` with quantization for reasoning
- `jina-embeddings-v2-small-en` for embeddings
Configuration Example:

```
LLM_MODEL=qwq-32b-q8
LLM_API_KEY=                    # Often not needed for local deployments
LLM_API_BASE_URL=http://localhost:8000/v1

EMBEDDINGS_MODEL=jina-embeddings-v2-small-en
EMBEDDINGS_API_KEY=             # Often not needed for local deployments
EMBEDDINGS_API_BASE_URL=http://localhost:8001/v1
```
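Note that the reasoning and embedding models run as two separate local servers here (ports 8000 and 8001), so each should be verified independently. The embedding side can be exercised the same way as the chat side; a sketch against the local endpoint above:

```python
from openai import OpenAI

# Local embedding server from the example above; auth usually isn't needed.
client = OpenAI(api_key="", base_url="http://localhost:8001/v1")

result = client.embeddings.create(
    model="jina-embeddings-v2-small-en",
    input="orra Plan Engine smoke test",
)
print(len(result.data[0].embedding), "dimensions returned")
```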
When choosing your model configuration, consider these factors:

- Reasoning Complexity: For complex multi-agent orchestration with sophisticated reasoning, prefer `o3-mini` or `deepseek-r1`
- Latency Requirements: For lower latency, choose:
  - Cloud providers with regional deployments closer to your application
  - Models with smaller parameter sizes like `o1-mini`
  - Quantized versions of larger models (e.g., `qwq-32b-q8`)
- Privacy and Security: For sensitive workloads requiring data privacy:
  - Use Azure OpenAI with your own dedicated endpoint
  - Deploy open-source models (`deepseek-r1`, `qwq-32b`, `jina-embeddings-v2-small-en`) in your own environment
  - Ensure proper network isolation and access controls
- Resource Constraints: For resource-limited environments:
  - Use quantized models (models with `-q4` or `-q8` suffixes)
  - Consider hardware-optimized versions (e.g., `-mlx` for Apple Silicon)