Commit e8a1368

Merge pull request #50 from thushan/feature/healthy-routing
feat: Improved health recovery
2 parents 9cb6e06 + 0728c23 commit e8a1368

77 files changed, +4273 -502 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ debug/
 logs/
 tmp/
 *.exe
-olla
+/olla
 # Local config files (never ship these)
 config.local.yaml
 config/*.local.yaml

CLAUDE.md

Lines changed: 2 additions & 2 deletions
@@ -87,9 +87,9 @@ olla/
 ## Response Headers
 - `X-Olla-Endpoint`: Backend name
 - `X-Olla-Model`: Model used
-- `X-Olla-Backend-Type`: ollama/openai/lmstudio
+- `X-Olla-Backend-Type`: ollama/openai/lmstudio/vllm
 - `X-Olla-Request-ID`: Request ID
-- `X-Olla-Response-Time`: Total time (trailer)
+- `X-Olla-Response-Time`: Total processing time
 
 ## Testing
 - Unit tests: Components in isolation

config/config.yaml

Lines changed: 41 additions & 3 deletions
@@ -37,12 +37,37 @@ proxy:
   connection_timeout: 60s
   response_timeout: 900s
   read_timeout: 600s
-  max_retries: 3
-  retry_backoff: 500ms
+
+  # DEPRECATED as of v0.0.16 - These fields are no longer used
+  # max_retries: 3 # Replaced by retry.max_attempts
+  # retry_backoff: 500ms # Now uses intelligent exponential backoff
+
+  # Connection failure retry settings (applies to both Sherpa and Olla engines)
+  # When enabled, the proxy will automatically retry failed requests on other healthy endpoints
+  retry:
+    enabled: true # Enable automatic retry on connection failures
+    on_connection_failure: true # Retry when connection to backend fails (connection refused, reset, timeout)
+    max_attempts: 0 # Maximum retry attempts (0 = try all available endpoints once)
+    # Connection errors that trigger retry:
+    # - Connection refused (backend is down)
+    # - Connection reset (backend crashed)
+    # - Connection timeout (backend is overloaded)
+    # - Network unreachable (network issues)
+    # Failed endpoints are immediately marked as unhealthy and removed from the retry pool
 
 discovery:
   type: "static"
   refresh_interval: 30s
+
+  # Health check and recovery settings
+  health_check:
+    initial_delay: 1s # Delay before first health check
+    # When an endpoint fails during request processing:
+    # - It's immediately marked as unhealthy
+    # - Consecutive failures increment, causing exponential backoff
+    # - Next check time = now + (consecutive_failures * 2) seconds (max 60s)
+    # - Health checker will automatically recover endpoints when they're back online
+
   static:
     endpoints:
       - url: "http://localhost:11434"
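
The `health_check` comment in the hunk above gives a concrete formula for the next check time. A minimal sketch of that formula, read literally, is shown here; the function name and constant are illustrative and not taken from the Olla source. Note that, as written in the comment, the delay grows linearly with the failure count even though the comment describes it as exponential backoff.

```python
from datetime import datetime, timedelta

# Cap taken from the config comment ("max 60s").
MAX_BACKOFF_SECONDS = 60


def next_check_time(now: datetime, consecutive_failures: int) -> datetime:
    """Return now + (consecutive_failures * 2) seconds, capped at 60 seconds.

    Hypothetical helper: it mirrors the comment in config.yaml, not Olla's code.
    """
    delay = min(consecutive_failures * 2, MAX_BACKOFF_SECONDS)
    return now + timedelta(seconds=delay)
```

Under this reading, the cap is reached once an endpoint has failed 30 times in a row.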
@@ -51,7 +76,7 @@ discovery:
         priority: 100
         model_url: "/api/tags"
         health_check_url: "/"
-        check_interval: 2s
+        check_interval: 2s # How often to check when healthy
         check_timeout: 1s
       - url: "http://localhost:11234"
         name: "local-lm-studio"
@@ -80,10 +105,23 @@ discovery:
 model_registry:
   type: "memory"
   enable_unifier: true
+
   unification:
     enabled: true
     stale_threshold: 24h # How long to keep models in memory after last seen
     cleanup_interval: 10m # How often to check for stale models
+
+  # Model routing strategy (v0.0.16+)
+  # Controls how requests are routed when models aren't available on all endpoints
+  routing_strategy:
+    type: "strict" # Options: strict, optimistic, discovery
+    options:
+      # Fallback behavior when model not found (optimistic mode)
+      fallback_behavior: "compatible_only" # Options: compatible_only, all, none
+
+      # Discovery mode settings
+      discovery_timeout: 2s # Timeout for discovery refresh
+      discovery_refresh_on_miss: false # Refresh discovery when model not found
 
 logging:
   level: "info" # debug, info, warn, error

docs/content/api-reference/overview.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ All responses include:
 | `X-Olla-Endpoint` | Backend endpoint name |
 | `X-Olla-Model` | Model used (if applicable) |
 | `X-Olla-Backend-Type` | Provider type (ollama/lmstudio/openai/vllm) |
-| `X-Olla-Response-Time` | Total processing time (trailer) |
+| `X-Olla-Response-Time` | Total processing time |
 
 ## Error Responses
 
docs/content/concepts/health-checking.md

Lines changed: 37 additions & 6 deletions
@@ -128,12 +128,13 @@ endpoints:
 
 When an endpoint fails, Olla implements exponential backoff:
 
-1. **First failure**: Check again after `check_interval`
+1. **First failure**: Check again after `check_interval` (no backoff)
 2. **Second failure**: Wait `check_interval * 2`
 3. **Third failure**: Wait `check_interval * 4`
-4. **Max backoff**: Capped at 5 minutes
+4. **Fourth failure**: Wait `check_interval * 8`
+5. **Max backoff**: Capped at `check_interval * 12` or 60 seconds (whichever is lower)
 
-This reduces load on failing endpoints while still detecting recovery.
+This reduces load on failing endpoints while still detecting recovery quickly on the first failure.
 
 ### Fast Recovery Detection
 
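
The backoff schedule documented in this hunk can be sketched as a small function. This is an illustration under stated assumptions, not Olla's implementation: the name `backoff_delay` is hypothetical, the multiplier is assumed to be `2 ** (failures - 1)`, and the cap is assumed to be the lower of `check_interval * 12` and 60 seconds.

```python
def backoff_delay(check_interval: float, consecutive_failures: int) -> float:
    """Seconds to wait before the next health check of a failing endpoint.

    Hypothetical helper following the documented schedule:
    1st failure -> check_interval, 2nd -> x2, 3rd -> x4, 4th -> x8,
    capped at min(check_interval * 12, 60 seconds).
    """
    if consecutive_failures < 1:
        return check_interval
    # 2**4 = 16 already exceeds the 12x cap, so larger exponents are unnecessary.
    exponent = min(consecutive_failures - 1, 4)
    cap = min(check_interval * 12, 60.0)
    return min(check_interval * (2 ** exponent), cap)
```

With the `check_interval: 2s` used in `config/config.yaml`, this yields waits of 2, 4, 8 and 16 seconds, then holds at 24 seconds.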
@@ -143,6 +144,17 @@ When an unhealthy endpoint might be recovering:
 2. **Success Threshold**: After 2 successful checks, mark healthy
 3. **Full Traffic**: Resume normal routing
 
+### Automatic Model Discovery on Recovery
+
+When an endpoint recovers from an unhealthy state, Olla automatically:
+
+1. **Detects Recovery**: Health check transitions from unhealthy to healthy
+2. **Triggers Discovery**: Automatically initiates model discovery
+3. **Updates Catalog**: Refreshes the unified model catalog with latest models
+4. **Resumes Routing**: Endpoint is immediately available for request routing
+
+This ensures the model catalog stays up-to-date even if models were added/removed while the endpoint was down.
+
 ## Health Check Types
 
 ### HTTP GET Health Checks
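
The recovery flow added in this hunk (unhealthy to healthy transition, then model discovery) can be sketched as a tracker that fires a callback once per recovery. The class name, the `on_recovered` callback and the method names are hypothetical; only the success threshold of 2 comes from the surrounding documentation.

```python
from typing import Callable


class EndpointHealth:
    """Tracks one endpoint and fires a callback on unhealthy -> healthy."""

    def __init__(self, name: str, on_recovered: Callable[[str], None],
                 success_threshold: int = 2) -> None:
        self.name = name
        self.healthy = True
        self._successes = 0
        self._threshold = success_threshold
        self._on_recovered = on_recovered  # e.g. trigger model discovery

    def record_check(self, ok: bool) -> None:
        if not ok:
            # Any failure marks the endpoint unhealthy and resets progress.
            self.healthy = False
            self._successes = 0
            return
        if self.healthy:
            return  # already healthy, nothing to transition
        self._successes += 1
        if self._successes >= self._threshold:
            self.healthy = True
            self._successes = 0
            self._on_recovered(self.name)
```

Firing the callback only on the transition, rather than on every successful check, keeps discovery from running repeatedly while an endpoint stays healthy.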
@@ -169,6 +181,25 @@ endpoints:
 # Health check also validates model availability
 ```
 
+## Connection Failure Handling
+
+### Automatic Retry on Connection Failures
+
+When a request fails due to connection issues, Olla automatically:
+
+1. **Detects Failure**: Identifies connection refused, reset, or timeout errors
+2. **Marks Unhealthy**: Immediately updates endpoint status to unhealthy
+3. **Retries Request**: Automatically tries the next available healthy endpoint
+4. **Updates Health**: Triggers exponential backoff for failed endpoint
+
+This happens transparently without dropping the user request. The retry behaviour is automatic and built-in as of v0.0.16.
+
+Connection errors that trigger automatic retry:
+- **Connection Refused**: Backend service is down
+- **Connection Reset**: Backend crashed or restarted
+- **Connection Timeout**: Backend is overloaded
+- **Network Unreachable**: Network connectivity issues
+
 ## Circuit Breaker Integration
 
 Health checks work with the circuit breaker to prevent cascade failures:
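
The connection-failure retry described in this hunk can be sketched as a loop over candidate endpoints. This is a sketch under assumptions, not Olla's code: `send_with_retry`, `send` and `mark_unhealthy` are hypothetical names, `max_attempts=0` is taken to mean "try every endpoint once" as the config comment states, and "network unreachable" is left out because Python reports it as a plain `OSError`.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Refused and reset are ConnectionError subclasses; timeouts raise TimeoutError.
RETRYABLE = (ConnectionError, TimeoutError)


def send_with_retry(endpoints: Sequence[str],
                    send: Callable[[str], T],
                    mark_unhealthy: Callable[[str], None],
                    max_attempts: int = 0) -> T:
    """Try each endpoint in order until one succeeds."""
    limit = len(endpoints) if max_attempts == 0 else min(max_attempts, len(endpoints))
    last_error: Optional[BaseException] = None
    for endpoint in endpoints[:limit]:
        try:
            return send(endpoint)
        except RETRYABLE as exc:
            # A failed endpoint is marked unhealthy and never retried here.
            mark_unhealthy(endpoint)
            last_error = exc
    raise ConnectionError("all endpoints failed") from last_error
```

Because each endpoint is attempted at most once per request, a request cannot loop indefinitely even when every backend is down.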
@@ -193,10 +224,10 @@ Health checks work with the circuit breaker to prevent cascade failures:
 
 The circuit breaker activates after consecutive failures:
 
-1. **Failure Threshold**: 3 consecutive failures trigger opening
+1. **Failure Threshold**: 3 failures (health checker) or 5 failures (Olla proxy engine)
 2. **Open Duration**: Circuit stays open for 30 seconds
-3. **Half-Open Test**: Send 3 test requests
-4. **Recovery**: 2 successful tests close the circuit
+3. **Half-Open Test**: Allows one test request through
+4. **Recovery**: First successful request closes the circuit
 
 ## Monitoring Health Status
 
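
The revised circuit breaker rules in this hunk (open after a failure threshold, stay open 30 seconds, let one test request through, close on its first success) can be sketched as follows. The class and method names are hypothetical, and the injectable `clock` parameter is an assumption added to make the sketch testable; it does not reflect Olla's actual types.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means closed
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        if self._clock() - self._opened_at < self._open_seconds:
            return False  # open: reject until the open duration elapses
        if self._probe_in_flight:
            return False  # half-open: only one test request at a time
        self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        # The first successful request closes the circuit.
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._probe_in_flight = False
        self._failures += 1
        # A failed probe reopens the circuit; otherwise open at the threshold.
        if self._opened_at is not None or self._failures >= self._threshold:
            self._opened_at = self._clock()
```

Passing `failure_threshold=5` would correspond to the value the hunk gives for the Olla proxy engine, with the default of 3 matching the health checker.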