4 changes: 4 additions & 0 deletions AGENTS.md
@@ -48,6 +48,10 @@ export LD_LIBRARY_PATH="../../sdk/runanywhere-commons/dist/linux/x86_64:../../sd
./build/test-pipeline /path/to/audio.wav
```

### Hybrid Routing System

The SDK includes a hybrid routing system for STT (extensible to LLM/TTS). See [docs/impl/hybrid-routing.md](docs/impl/hybrid-routing.md) for architecture, confidence cascade, API key setup, and adding new providers.

### Standard commands

See `CLAUDE.md` for comprehensive build/test/lint commands for all SDK platforms. See `CONTRIBUTING.md` for contributor setup flow.

5 changes: 5 additions & 0 deletions CLAUDE.md
@@ -317,12 +317,17 @@ The SDK uses Kotlin Multiplatform to share code across JVM, Android, and Native
- `ModelManager` - Model downloading and lifecycle
- `ConfigurationService` - Environment-specific configuration

### Hybrid Routing System

The SDK routes AI requests between local and cloud backends automatically. See [docs/impl/hybrid-routing.md](docs/impl/hybrid-routing.md) for full architecture, confidence cascade logic, API key setup, and how to add new providers.

### Design Patterns

1. **Repository Pattern**: Data access abstraction with platform-specific implementations
2. **Service Container**: Centralized dependency injection
3. **Event Bus**: Reactive communication between components
4. **Provider Pattern**: Platform-specific service providers (STT, VAD)
5. **Hybrid Router**: Condition-based backend selection with confidence cascade (see `sdk/runanywhere-kotlin/src/commonMain/.../routing/`)

### Platform Requirements

204 changes: 204 additions & 0 deletions docs/impl/hybrid-routing.md
@@ -0,0 +1,204 @@
# Hybrid Routing System

The hybrid routing system decides which AI backend handles each request at runtime. Local backends are preferred by default. When local inference confidence is low, the system automatically cascades to a cloud backend. Adding a new provider requires implementing one interface and one registration call.

## How it works

When `transcribeWithOptions(audio, options)` is called:

1. The router gathers all registered backends for the requested capability (STT, LLM, TTS).
2. Backends whose conditions fail are excluded (e.g., cloud backend excluded when offline, local excluded when model not loaded).
3. The routing policy is applied. `LOCAL_ONLY` keeps only local backends, `CLOUD_ONLY` keeps only cloud.
4. Remaining candidates are scored and sorted. Highest score wins.
5. The top-ranked (primary) backend, normally the local one, transcribes the audio.
6. The result's confidence score is checked. If it is below the threshold (0.5), the same audio is sent to the next candidate (normally cloud). This is the confidence cascade.
7. After cloud fallback, the local model is restored so the next request routes locally again.
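Steps 2 to 4 amount to a filter-then-sort pipeline. The sketch below is a simplification under stated assumptions: `Candidate`, `Policy`, and `rankCandidates` are hypothetical names, not the SDK's `BackendDescriptor` or `RoutingPolicy` API; each backend's conditions are collapsed into one lambda; and each score is assumed to be precomputed.

```kotlin
// Hypothetical, simplified stand-ins; not the SDK's RoutingPolicy or BackendDescriptor.
enum class Policy { AUTO, LOCAL_ONLY, CLOUD_ONLY }

data class Candidate(
    val id: String,
    val isLocal: Boolean,
    val score: Int,                    // assumed precomputed: base priority plus bonuses
    val conditionsPass: () -> Boolean, // network up, model loaded, API key set, ...
)

// Step 2: drop backends whose conditions fail.
// Step 3: apply the policy filter.
// Step 4: sort by score, highest first; the head of the list is the primary backend.
fun rankCandidates(all: List<Candidate>, policy: Policy): List<Candidate> =
    all.filter { it.conditionsPass() }
        .filter {
            when (policy) {
                Policy.LOCAL_ONLY -> it.isLocal
                Policy.CLOUD_ONLY -> !it.isLocal
                Policy.AUTO -> true
            }
        }
        .sortedByDescending { it.score }
```

With Whisper at 270 and Sarvam at 80 and both conditions passing, `AUTO` puts Whisper first and leaves Sarvam as the cascade target; `LOCAL_ONLY` leaves only Whisper, so no cascade is possible.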

The confidence score is currently a mock (`Random.nextFloat()`). Replace it with the confidence value the C++ layer already returns; see "What is mocked" below.

## Confidence cascade

This is per-request routing, not per-chunk. The full audio is transcribed locally first. If confidence is low, the full audio is re-sent to cloud.

```
Record full audio
        |
Whisper transcribes (local) -> result + confidence
        |
confidence >= 0.5?
  YES -> return local result
  NO  -> load Sarvam -> send same audio to cloud -> return cloud result
         restore Whisper model for next request
```

The cascade only triggers when:
- The primary backend is local (`isLocalOnly`)
- Confidence is below threshold (currently 0.5)
- There is a next candidate in the sorted list (cloud backend)

If the cloud fallback also fails, the local result is returned despite low confidence.
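The cascade rules above fit in one function. This is a hedged sketch, not the code in `RunAnywhere+STT.jvmAndroid.kt`: `Transcript` and `transcribeWithCascade` are hypothetical names, the backend calls are plain (non-`suspend`) lambdas so the example runs without kotlinx.coroutines, and restoring the local model after a *failed* cloud call (not only a successful one) is an assumption.

```kotlin
// Hypothetical stand-in for the routing-related fields of STTOutput.
data class Transcript(
    val text: String,
    val backendId: String,
    val confidence: Float,
    val wasFallback: Boolean = false,
    val primaryConfidence: Float? = null,
)

// Per-request cascade: the full audio goes to the primary backend first and is
// re-sent, unchanged, to the next candidate only when confidence is too low.
fun transcribeWithCascade(
    audio: ByteArray,
    primaryIsLocal: Boolean,
    primary: (ByteArray) -> Transcript,
    next: ((ByteArray) -> Transcript)?,
    threshold: Float = 0.5f,
    restoreLocalModel: () -> Unit,
): Transcript {
    val local = primary(audio)
    val fallback = next ?: return local // no further candidate: keep the local result
    if (!primaryIsLocal || local.confidence >= threshold) return local
    return try {
        fallback(audio).copy(wasFallback = true, primaryConfidence = local.confidence)
    } catch (e: Exception) {
        local // cloud also failed: return the low-confidence local result
    } finally {
        restoreLocalModel() // reload the local model so the next request routes locally
    }
}
```

Keeping the threshold as a defaulted parameter makes the 0.5 cut-off easy to vary in unit tests without touching the cascade logic.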

## Routing result metadata

`STTOutput` includes routing fields the UI can display:

- `routingBackendId` — which backend produced the result (e.g., "whisper-local", "sarvam-cloud")
- `routingBackendName` — human-readable name (e.g., "Whisper (Local)", "Sarvam AI (Cloud)")
- `wasFallback` — true if the result came from cloud after a low-confidence local result
- `primaryConfidence` — the local confidence score that triggered the fallback (null if no fallback)
- `confidence` — the confidence score of the final result
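A UI can build its routing label from these fields alone. The sketch below mirrors what the example app's routing row shows, but `RoutingFields` and `routingSummary` are hypothetical names, and rendering confidence as a whole-number percentage is a presentational choice.

```kotlin
// Hypothetical subset of STTOutput's routing fields, for illustration only.
data class RoutingFields(
    val routingBackendId: String?,
    val routingBackendName: String?,
    val wasFallback: Boolean,
    val primaryConfidence: Float?,
    val confidence: Float,
)

// Returns a one-line summary, or null when the output carries no routing info.
fun routingSummary(f: RoutingFields): String? {
    val id = f.routingBackendId ?: return null
    val name = f.routingBackendName ?: id // fall back to the ID when no display name is set
    val pct = (f.confidence * 100).toInt()
    val primary = f.primaryConfidence
    return if (f.wasFallback && primary != null) {
        "Cloud fallback via $name: $pct% (local ${(primary * 100).toInt()}%)"
    } else {
        "$name: $pct%"
    }
}
```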

## Routing conditions

Each backend declares its own conditions. The router never injects conditions from outside.

Available conditions:

- `LocalOnly` — marks the backend as local. Adds +50 score bonus.
- `NetworkRequired` — excluded when offline.
- `ModelAvailability(modelId, isModelLoaded)` — excluded when the specific model is not loaded. The check is a lambda evaluated at routing time. WhisperSTTBackend checks that the loaded model ID contains "whisper" or "sherpa". SarvamSTTBackend loads its model on demand.
- `Custom(description, check)` — arbitrary check. Used for "API key configured?" on Sarvam.
- `QualityTier(HIGH | STANDARD | LOW)` — affects ranking under `PREFER_ACCURACY` policy.
- `CostModel(costPerMinuteCents)` — free backends get +20 bonus.
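Some of these conditions exclude a backend outright, while others only affect ranking. The sketch below makes that split explicit. `Condition` and `isEligible` are hypothetical simplifications of `RoutingCondition`; the real type may carry more data, and the `isOnline` flag stands in for whatever check `NetworkAvailability.kt` performs.

```kotlin
// Hypothetical mirror of the condition kinds listed above.
enum class Quality { HIGH, STANDARD, LOW }

sealed interface Condition {
    object LocalOnly : Condition
    object NetworkRequired : Condition
    class ModelAvailability(val modelId: String, val isModelLoaded: () -> Boolean) : Condition
    class Custom(val description: String, val check: () -> Boolean) : Condition
    class QualityTier(val tier: Quality) : Condition
    class CostModel(val costPerMinuteCents: Float) : Condition
}

// Lambdas are evaluated here, at routing time, so a model loaded or an API key
// configured after registration is picked up on the next request.
fun isEligible(conditions: List<Condition>, isOnline: Boolean): Boolean =
    conditions.all { c ->
        when (c) {
            Condition.NetworkRequired -> isOnline
            is Condition.ModelAvailability -> c.isModelLoaded()
            is Condition.Custom -> c.check()
            // Ranking-only conditions never exclude a backend.
            Condition.LocalOnly, is Condition.QualityTier, is Condition.CostModel -> true
        }
    }
```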

## Routing policies

Set via `STTOptions.routingPolicy`:

- `AUTO` — local wins by default (score 270 vs 80). Cloud is fallback via confidence cascade.
- `PREFER_LOCAL` — local gets additional +50 bonus, cloud gets -30 penalty.
- `PREFER_ACCURACY` — `QualityTier(HIGH)` gets +50 bonus.
- `LOCAL_ONLY` — cloud excluded entirely. No cascade.
- `CLOUD_ONLY` — local excluded entirely.
- `FRAMEWORK_PREFERRED` — `preferredFramework` match gets +200 bonus.
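As a sketch of the policy rules above, the function below returns a score adjustment per backend, or `null` when the policy excludes it. The enum mirrors the documented policy names but is not the SDK type; `BackendInfo` and `policyAdjustment` are hypothetical, and modelling `QualityTier(HIGH)` as a boolean is a simplification. The bonus values are the ones listed here.

```kotlin
// Mirrors the documented policy names; not the SDK's RoutingPolicy type.
enum class RoutingPolicy { AUTO, PREFER_LOCAL, PREFER_ACCURACY, LOCAL_ONLY, CLOUD_ONLY, FRAMEWORK_PREFERRED }

// Hypothetical summary of what a backend declares.
data class BackendInfo(val isLocal: Boolean, val highQuality: Boolean, val framework: String)

// Score adjustment for [b] under [policy]; null means the backend is excluded.
fun policyAdjustment(policy: RoutingPolicy, b: BackendInfo, preferredFramework: String? = null): Int? =
    when (policy) {
        RoutingPolicy.AUTO -> 0
        RoutingPolicy.PREFER_LOCAL -> if (b.isLocal) 50 else -30
        RoutingPolicy.PREFER_ACCURACY -> if (b.highQuality) 50 else 0
        RoutingPolicy.LOCAL_ONLY -> if (b.isLocal) 0 else null
        RoutingPolicy.CLOUD_ONLY -> if (b.isLocal) null else 0
        RoutingPolicy.FRAMEWORK_PREFERRED -> if (b.framework == preferredFramework) 200 else 0
    }
```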

## Default scoring

| Backend | Base | LocalOnly | CostFree | Total |
|----------------|------|-----------|----------|-------|
| Whisper (local) | 200 | +50 | +20 | 270 |
| Sarvam (cloud) | 80 | -- | -- | 80 |
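The totals in the table are plain sums. The hypothetical helper below makes the arithmetic explicit; the +50 and +20 bonuses come from the conditions list above.

```kotlin
// Hypothetical helper reproducing the default-scoring table (AUTO policy, no adjustments).
fun defaultScore(basePriority: Int, isLocalOnly: Boolean, isCostFree: Boolean): Int {
    var score = basePriority
    if (isLocalOnly) score += 50 // LocalOnly bonus
    if (isCostFree) score += 20  // free CostModel bonus
    return score
}

val whisperScore = defaultScore(basePriority = 200, isLocalOnly = true, isCostFree = true) // 270
val sarvamScore = defaultScore(basePriority = 80, isLocalOnly = false, isCostFree = false) // 80
```

If the `PREFER_LOCAL` adjustments are simply added to these totals (an assumption about how the router combines them), the gap widens to 320 vs 50.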

## API key setup (Sarvam)

The Sarvam API key is set in the example app at:

```
examples/android/RunAnywhereAI/app/src/main/java/com/runanywhere/runanywhereai/data/ModelList.kt
```

```kotlin
Sarvam.register(apiKey = "YOUR_SARVAM_API_KEY")
```

If no API key is set, the `Custom("Sarvam API key configured")` condition fails and Sarvam is excluded from candidates. The app works with local-only STT.

For production, move the key to `secrets.properties`, `local.properties` / BuildConfig, or remote config.
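One way to follow the `BuildConfig` route is sketched below for `app/build.gradle.kts`. It assumes the Android Gradle Plugin 8.x Kotlin DSL and an untracked `secrets.properties` in the project root; the `SARVAM_API_KEY` property name is hypothetical.

```kotlin
// app/build.gradle.kts (sketch). The import belongs at the top of the script.
import java.util.Properties

// Load secrets.properties if present; it should be listed in .gitignore.
val secrets = Properties().apply {
    val file = rootProject.file("secrets.properties")
    if (file.exists()) file.inputStream().use { load(it) }
}

android {
    defaultConfig {
        // Empty string when the file or key is missing, so the build still succeeds.
        buildConfigField("String", "SARVAM_API_KEY", "\"${secrets.getProperty("SARVAM_API_KEY", "")}\"")
    }
    buildFeatures {
        buildConfig = true // BuildConfig generation is off by default in AGP 8
    }
}
```

The registration in `ModelList.kt` could then become `BuildConfig.SARVAM_API_KEY.takeIf { it.isNotBlank() }?.let { Sarvam.register(apiKey = it) }`, which skips registering Sarvam entirely when no key is configured.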

## Language mapping (Sarvam)

Sarvam requires Indian locale codes (e.g., `en-IN`, `hi-IN`). The `SarvamSTTBackend` maps bare language codes automatically:

- `en` -> `en-IN`
- `hi` -> `hi-IN`
- `auto` -> `unknown`
- Codes already containing `-IN` are passed through
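The mapping rules above can be sketched as a single function. `toSarvamLanguageCode` is a hypothetical name, and reducing other locales such as `en-US` to their language subtag before appending `-IN` is an assumption, not documented behavior.

```kotlin
// Sketch of the documented mapping; the handling of non-Indian locales is an assumption.
fun toSarvamLanguageCode(code: String): String = when {
    code.equals("auto", ignoreCase = true) -> "unknown"
    code.contains("-IN") -> code // already an Indian locale: pass through
    else -> "${code.substringBefore('-').lowercase()}-IN"
}
```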

## Adding a new backend

Create one file implementing `STTBackend`:

```kotlin
class GoogleSTTBackend : STTBackend {

    override fun descriptors() = listOf(
        BackendDescriptor(
            moduleId = "google-stt",
            moduleName = "Google Cloud STT",
            capability = SDKComponent.STT,
            // Assumes InferenceFramework has a GOOGLE entry; add one if it does not.
            inferenceFramework = InferenceFramework.GOOGLE,
            basePriority = 80,
            conditions = listOf(
                RoutingCondition.NetworkRequired,
                RoutingCondition.QualityTier(BackendQuality.HIGH),
                RoutingCondition.CostModel(costPerMinuteCents = 1.5f),
                // GoogleBridge is a placeholder for wherever your app keeps the API key.
                RoutingCondition.Custom("API key set", check = { GoogleBridge.hasApiKey() }),
            ),
        )
    )

    override suspend fun transcribe(audioData: ByteArray, options: STTOptions): STTOutput {
        // Call your HTTP API here and map the response to an STTOutput.
        TODO("Google Cloud STT request not implemented")
    }
}
```

Register it in `HybridRouterRegistry.initialize()`:

```kotlin
val backends: List<STTBackend> = listOf(
    WhisperSTTBackend(),
    SarvamSTTBackend(),
    GoogleSTTBackend(), // add this line
)
```

Nothing else changes.

## Adding LLM or TTS support

Same pattern. Create an `LLMBackend` interface alongside `STTBackend`:

```kotlin
interface LLMBackend : RoutableBackend {
    suspend fun generate(prompt: String, options: LLMOptions): LLMOutput
}
```

The `HybridRouter` class requires no changes; it is capability-agnostic.
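Capability-agnostic here means the routing side only ever handles the base interface and filters by capability. The sketch below illustrates that idea in the spirit of `HybridRouterRegistry` mapping `moduleId` to backend. All names are hypothetical simplifications: the SDK's `RoutableBackend` works through descriptors and `SDKComponent`, and its `LLMBackend.generate` is `suspend` and takes `LLMOptions`.

```kotlin
// Hypothetical, simplified interfaces; not the SDK's RoutableBackend or LLMBackend.
interface RoutableBackend {
    val moduleId: String
    val capability: String // stand-in for SDKComponent: "STT", "LLM", "TTS"
}

interface LLMBackend : RoutableBackend {
    fun generate(prompt: String): String
}

// A capability-agnostic registry: it never references STTBackend or LLMBackend.
class BackendRegistry(backends: List<RoutableBackend>) {
    val byId: Map<String, RoutableBackend> = backends.associateBy { it.moduleId }

    fun candidates(capability: String): List<RoutableBackend> =
        byId.values.filter { it.capability == capability }

    // After routing picks a moduleId, the caller casts to the interface it needs.
    inline fun <reified B : RoutableBackend> resolve(moduleId: String): B? = byId[moduleId] as? B
}

// Toy LLM backend used only to exercise the registry.
class EchoLLM : LLMBackend {
    override val moduleId = "echo-llm"
    override val capability = "LLM"
    override fun generate(prompt: String) = "echo: $prompt"
}
```

Registering `EchoLLM()` adds it to `candidates("LLM")` only, so STT routing is unaffected.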

## File locations

```
sdk/runanywhere-kotlin/src/
  commonMain/.../routing/
    RoutingCondition.kt     conditions a backend can declare
    RoutingContext.kt       runtime snapshot passed to the router
    RoutingPolicy.kt        user-level preference enum
    BackendDescriptor.kt    backend self-declaration type
    RoutableBackend.kt      interface all routable backends implement
    STTBackend.kt           interface for STT-capable backends
    HybridRouter.kt         the decision engine
    RoutingResult.kt        routing metadata type

  commonMain/.../STT/STTTypes.kt
    STTOutput includes routingBackendId, wasFallback, primaryConfidence

  jvmAndroidMain/.../routing/
    HybridRouterRegistry.kt   singleton, initializes router, maps moduleId to backend
    NetworkAvailability.kt    cross-platform network check (reflection for Android)

  jvmAndroidMain/.../backends/stt/
    WhisperSTTBackend.kt   local Whisper backend (ModelAvailability checks loaded model ID)
    SarvamSTTBackend.kt    Sarvam cloud backend (on-demand model loading, language mapping)

  jvmAndroidMain/.../public/extensions/
    RunAnywhere+STT.jvmAndroid.kt   confidence cascade logic, model restoration after fallback

Tests:
  commonTest/.../routing/HybridRouterTest.kt                          9 unit tests, no device needed
  androidInstrumentedTest/.../routing/STTRoutingInstrumentedTest.kt   5 device tests

Example app:
  examples/android/RunAnywhereAI/.../data/ModelList.kt              Sarvam.register(apiKey)
  examples/android/RunAnywhereAI/.../stt/SpeechToTextViewModel.kt   uses transcribeWithOptions, shows routing info
  examples/android/RunAnywhereAI/.../stt/SpeechToTextScreen.kt      RoutingInfoRow composable
```

## What is mocked

The confidence score used for cascade decisions is currently `Random.nextFloat()`. To replace with real confidence:

1. The C++ Whisper backend already returns a confidence value in `TranscriptionResult.confidence`
2. In `RunAnywhere+STT.jvmAndroid.kt`, replace `val mockConfidence = kotlin.random.Random.nextFloat()` with the actual `result.confidence` from the backend
3. Remove the `confidence = mockConfidence` override in the `copy()` call
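In miniature, the change lets the backend's own confidence flow through. `TranscriptionResult` and `SttOutput` below are hypothetical stand-ins for the C++ result and `STTOutput`, shown before and after the swap.

```kotlin
import kotlin.random.Random

// Hypothetical stand-ins for the C++ result and the Kotlin STTOutput.
data class TranscriptionResult(val text: String, val confidence: Float)
data class SttOutput(val text: String, val confidence: Float)

// Before: a random number overrides the real value, so cascade decisions are arbitrary.
fun toOutputMocked(result: TranscriptionResult): SttOutput {
    val mockConfidence = Random.nextFloat()
    return SttOutput(result.text, result.confidence).copy(confidence = mockConfidence)
}

// After: use result.confidence directly; no copy() override is needed.
fun toOutput(result: TranscriptionResult): SttOutput = SttOutput(result.text, result.confidence)
```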

@@ -2,6 +2,7 @@ package com.runanywhere.runanywhereai.data

import timber.log.Timber
import com.runanywhere.runanywhereai.data.models.AppModel
import com.runanywhere.sdk.cloud.sarvam.Sarvam
import com.runanywhere.sdk.core.onnx.ONNX
import com.runanywhere.sdk.core.types.InferenceFramework
import com.runanywhere.sdk.llm.llamacpp.LlamaCPP
@@ -182,12 +183,30 @@ object ModelList {
        try {
            LlamaCPP.register(priority = 100)
            ONNX.register(priority = 100)
-           Timber.i("Backends registered")
            Timber.i("Local backends registered")
        } catch (e: Exception) {
-           Timber.e(e, "Failed to register backends")
            Timber.e(e, "Failed to register local backends")
            return
        }

        // Cloud backends
        try {
            Sarvam.register(apiKey = "YOUR_SARVAM_API_KEY")

            // Register Sarvam model in C++ registry (not in UI model lists)
            RunAnywhere.registerModel(
                id = "sarvam:saarika:v2.5",
                name = "Sarvam Saarika v2.5",
                url = "",
                framework = InferenceFramework.SARVAM,
                modality = ModelCategory.SPEECH_RECOGNITION,
                memoryRequirement = 0,
            )
            Timber.i("Cloud backends registered")
        } catch (e: Exception) {
            Timber.w(e, "Failed to register cloud backends (non-fatal)")
        }

        val allModels = listOf(
            "LLM/STT/TTS" to (llmModels + sttModels + ttsModels),
            "Embedding" to embeddingModels,

@@ -112,7 +112,9 @@ class ModelSelectionViewModel(
        // Filter models by context - matches iOS relevantCategories filtering
        val filteredModels =
            allModels.filter { model ->
-               isModelRelevantForContext(model.category, context)
                isModelRelevantForContext(model.category, context) &&
                    // Exclude cloud backends from model picker — routing handles them
                    model.framework != InferenceFramework.SARVAM
            }
        Timber.d("📦 Filtered to ${filteredModels.size} models for context $context")


@@ -15,6 +15,7 @@ import androidx.compose.foundation.layout.width
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.automirrored.filled.KeyboardArrowRight
import androidx.compose.material.icons.filled.Cloud
import androidx.compose.material.icons.filled.Description
import androidx.compose.material.icons.filled.GraphicEq
import androidx.compose.material.icons.filled.Speed

@@ -142,6 +142,16 @@ fun SpeechToTextScreen(
            modifier = Modifier.weight(1f),
        )

        // Routing info
        if (uiState.routingBackendId != null) {
            RoutingInfoRow(
                backendName = uiState.routingBackendName ?: uiState.routingBackendId!!,
                wasFallback = uiState.wasFallback,
                confidence = uiState.metrics?.confidence ?: 0f,
                primaryConfidence = uiState.primaryConfidence,
            )
        }

        uiState.errorMessage?.let { error ->
            Text(
                text = error,
@@ -1072,3 +1082,77 @@ private fun RecordingButton(
        }
    }
}

@Composable
private fun RoutingInfoRow(
    backendName: String,
    wasFallback: Boolean,
    confidence: Float,
    primaryConfidence: Float?,
) {
    Surface(
        modifier = Modifier
            .fillMaxWidth()
            .padding(horizontal = 16.dp, vertical = 4.dp),
        shape = RoundedCornerShape(8.dp),
        color = if (wasFallback) {
            AppColors.primaryOrange.copy(alpha = 0.1f)
        } else {
            AppColors.primaryGreen.copy(alpha = 0.1f)
        },
    ) {
        Row(
            modifier = Modifier.padding(horizontal = 12.dp, vertical = 8.dp),
            verticalAlignment = Alignment.CenterVertically,
            horizontalArrangement = Arrangement.SpaceBetween,
        ) {
            Column(modifier = Modifier.weight(1f)) {
                Text(
                    text = if (wasFallback) "Cloud Fallback"
                    else if (backendName.contains("Cloud", ignoreCase = true) || backendName.contains("Sarvam", ignoreCase = true)) "Cloud"
                    else "Local",
                    style = MaterialTheme.typography.labelMedium,
                    fontWeight = FontWeight.Bold,
                    color = if (wasFallback || backendName.contains("Cloud", ignoreCase = true) || backendName.contains("Sarvam", ignoreCase = true)) AppColors.primaryOrange else AppColors.primaryGreen,
                )
                Text(
                    text = backendName,
                    style = MaterialTheme.typography.bodySmall,
                    color = AppColors.textSecondary,
                )
            }
            Column(horizontalAlignment = Alignment.End) {
                Text(
                    text = "${(confidence * 100).toInt()}%",
                    style = MaterialTheme.typography.titleMedium,
                    fontWeight = FontWeight.Bold,
                    color = when {
                        confidence >= 0.5f -> AppColors.primaryGreen
                        else -> AppColors.primaryOrange
                    },
                )
                Text(
                    text = "confidence",
                    style = MaterialTheme.typography.labelSmall,
                    color = AppColors.textSecondary,
                )
            }
            if (wasFallback && primaryConfidence != null) {
                Spacer(modifier = Modifier.width(12.dp))
                Column(horizontalAlignment = Alignment.End) {
                    Text(
                        text = "${(primaryConfidence * 100).toInt()}%",
                        style = MaterialTheme.typography.titleMedium,
                        fontWeight = FontWeight.Bold,
                        color = Color.Red.copy(alpha = 0.7f),
                    )
                    Text(
                        text = "local score",
                        style = MaterialTheme.typography.labelSmall,
                        color = AppColors.textSecondary,
                    )
                }
            }
        }
    }
}