This guide shows client-facing examples for the active unified RAG API. For a compact endpoint list, see RAG_API_Documentation.md. For implementation internals, see RAG-Developer-Guide.md.
Base URL: http://localhost:8000/api/v1/rag
Single-user deployments use:
X-API-KEY: your-api-key

Multi-user deployments use:
Authorization: Bearer <JWT>

All JSON examples below also require:
Content-Type: application/json

Endpoints:
- POST /ablate
- POST /search
- POST /search/stream
- POST /batch
- POST /batch/resume/{checkpoint_id}
- GET /simple
- GET /advanced
- GET /capabilities
- GET /vlm/backends
- GET /features
- POST /feedback/implicit
- GET /health/simple
- GET /health
- GET /health/live
- GET /health/ready
- GET /cache/stats
- POST /cache/clear
- GET /cache/warm
- GET /metrics/summary
- GET /costs/summary
- GET /batch/jobs
- POST /quality-gate
- POST /baseline/save
- GET /regression/check
- POST /regression/check
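
The two authentication modes differ only in which header carries the credential, so a client can keep that choice in one place. The following is a minimal TypeScript sketch under that assumption; `AuthConfig`, `buildHeaders`, and `endpointUrl` are hypothetical names introduced for illustration and are not part of the API.

```typescript
// Hypothetical client-side helper types; not defined by the RAG API itself.
type AuthConfig =
  | { mode: "single-user"; apiKey: string }
  | { mode: "multi-user"; jwt: string };

const BASE_URL = "http://localhost:8000/api/v1/rag";

// Build request headers: X-API-KEY for single-user deployments,
// Authorization: Bearer <JWT> for multi-user deployments, and
// Content-Type only when the request carries a JSON body.
function buildHeaders(auth: AuthConfig, hasJsonBody: boolean): Record<string, string> {
  const headers: Record<string, string> = {};
  if (auth.mode === "single-user") {
    headers["X-API-KEY"] = auth.apiKey;
  } else {
    headers["Authorization"] = `Bearer ${auth.jwt}`;
  }
  if (hasJsonBody) {
    headers["Content-Type"] = "application/json";
  }
  return headers;
}

// Join the base URL with an endpoint path such as "/search".
function endpointUrl(path: string): string {
  return `${BASE_URL}${path.startsWith("/") ? path : `/${path}`}`;
}
```

With such a helper, switching a client between deployment types means changing one `AuthConfig` value rather than editing every request.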
POST /search accepts UnifiedRAGRequest.
Common fields:
type SearchMode = "fts" | "vector" | "hybrid";
type PublicSource = "media_db" | "notes" | "characters" | "chats" | "kanban" | "sql";

interface UnifiedSearchRequest {
  query: string;
  sources?: PublicSource[];
  search_mode?: SearchMode;
  top_k?: number;
  min_score?: number;
  keyword_filter?: string[];
  expand_query?: boolean;
  enable_reranking?: boolean;
  reranking_strategy?: "flashrank" | "cross_encoder" | "hybrid" | "llama_cpp" | "llm_scoring" | "two_tier" | "none";
  enable_citations?: boolean;
  enable_generation?: boolean;
  strategy?: "standard" | "agentic";
  session_id?: string;
}

curl -X POST http://localhost:8000/api/v1/rag/search \
  -H "X-API-KEY: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning concepts",
    "sources": ["media_db"],
    "search_mode": "hybrid",
    "top_k": 5
  }'

curl -X POST http://localhost:8000/api/v1/rag/search \
  -H "X-API-KEY: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do transformer models use attention?",
    "sources": ["media_db", "notes"],
    "search_mode": "hybrid",
    "enable_generation": true,
    "enable_citations": true,
    "top_k": 10
  }'

Agentic mode is selected through the unified search route.
curl -X POST http://localhost:8000/api/v1/rag/search \
  -H "X-API-KEY: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Compare the competing explanations in my saved material",
    "sources": ["media_db", "notes"],
    "strategy": "agentic",
    "enable_generation": true,
    "agentic_top_k_docs": 5,
    "agentic_quote_spans": true
  }'

POST /search/stream streams NDJSON and requires enable_generation: true.
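
Because network chunks do not align with line boundaries, an NDJSON client has to buffer partial lines until the newline arrives. The TypeScript sketch below isolates that framing step so it can be tested without a server. `StreamEvent` and `drainNdjson` are hypothetical names, and the event shape is an assumption: only `type: "delta"` with a `text` field appears in this guide.

```typescript
// Assumed event shape; other event types and fields may exist.
interface StreamEvent {
  type: string;
  text?: string;
  [key: string]: unknown;
}

interface DrainResult {
  events: StreamEvent[];
  rest: string;
}

// Parse every complete line in the buffered text as one JSON event.
// Text after the last newline is returned in `rest` for the next chunk.
function drainNdjson(buffered: string): DrainResult {
  const lines = buffered.split("\n");
  const rest = lines.pop() ?? "";
  const events: StreamEvent[] = [];
  for (const line of lines) {
    if (!line.trim()) continue; // skip blank lines
    events.push(JSON.parse(line) as StreamEvent);
  }
  return { events, rest };
}

// Simulated chunks: the first JSON event is split across two chunks.
const chunks = [
  '{"type":"delta","text":"Hel',
  'lo"}\n{"type":"delta","text":" world"}\n'
];

let pending = "";
let answer = "";
for (const chunk of chunks) {
  const { events, rest } = drainNdjson(pending + chunk);
  pending = rest;
  for (const event of events) {
    if (event.type === "delta" && typeof event.text === "string") {
      answer += event.text;
    }
  }
}
```

After both chunks are processed, `answer` holds the concatenated delta text and `pending` is empty.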
const response = await fetch("http://localhost:8000/api/v1/rag/search/stream", {
  method: "POST",
  headers: {
    "X-API-KEY": "your-api-key",
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    query: "Explain retrieval augmented generation",
    sources: ["media_db"],
    search_mode: "hybrid",
    enable_generation: true // required by /search/stream
  })
});
if (!response.ok) {
  throw new Error(`Stream request failed with status ${response.status}`);
}

// Print answer text as "delta" events arrive (process.stdout is Node.js-specific).
function handleLine(line) {
  if (!line.trim()) return;
  const event = JSON.parse(line);
  if (event.type === "delta") {
    process.stdout.write(event.text);
  }
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Each complete line is one JSON event; a partial line stays in the buffer.
  let newlineIndex;
  while ((newlineIndex = buffer.indexOf("\n")) >= 0) {
    handleLine(buffer.slice(0, newlineIndex));
    buffer = buffer.slice(newlineIndex + 1);
  }
}

// Flush a final event in case the stream ends without a trailing newline.
handleLine(buffer + decoder.decode());

curl -X POST http://localhost:8000/api/v1/rag/batch \
  -H "X-API-KEY: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": [
      "What is RAG?",
      "Explain vector search",
      "When should I use FTS?"
    ],
    "sources": ["media_db"],
    "search_mode": "hybrid",
    "max_concurrent": 5,
    "enable_checkpoint": true
  }'

Resume a checkpointed batch:
curl -X POST http://localhost:8000/api/v1/rag/batch/resume/checkpoint-id \
  -H "X-API-KEY: your-api-key"

Simple:
curl -G http://localhost:8000/api/v1/rag/simple \
  -H "X-API-KEY: your-api-key" \
  --data-urlencode "query=machine learning" \
  --data-urlencode "top_k=5"

Advanced:
curl -G http://localhost:8000/api/v1/rag/advanced \
  -H "X-API-KEY: your-api-key" \
  --data-urlencode "query=quantum computing notes" \
  --data-urlencode "with_citations=true" \
  --data-urlencode "with_answer=true"

Use runtime discovery rather than hard-coding optional feature support:
curl http://localhost:8000/api/v1/rag/capabilities \
  -H "X-API-KEY: your-api-key"

Related discovery endpoints:
curl http://localhost:8000/api/v1/rag/features \
  -H "X-API-KEY: your-api-key"
curl http://localhost:8000/api/v1/rag/vlm/backends \
  -H "X-API-KEY: your-api-key"

curl http://localhost:8000/api/v1/rag/health/live \
  -H "X-API-KEY: your-api-key"
curl http://localhost:8000/api/v1/rag/health/ready \
  -H "X-API-KEY: your-api-key"
curl http://localhost:8000/api/v1/rag/cache/stats \
  -H "X-API-KEY: your-api-key"

UnifiedRAGResponse commonly includes:
{
  "documents": [
    {
      "id": "doc_123",
      "content": "Matched content...",
      "metadata": {"title": "Example"},
      "score": 0.92
    }
  ],
  "query": "machine learning concepts",
  "expanded_queries": [],
  "metadata": {"sources_searched": ["media_db"]},
  "timings": {"total": 0.25},
  "generated_answer": "Optional generated answer",
  "citations": []
}

- Prefer POST /search for most use cases.
- Use search_mode: "hybrid" unless you explicitly need exact text matching or vector-only behavior. Vector behavior is source-dependent; sources without vector indexing use their source-specific retrieval path.
- Use strategy: "agentic" through POST /search when you need the agentic retrieval strategy.
- Use POST /search/stream for generated answers that should render incrementally.
- Call /capabilities at startup if your client exposes optional controls.
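
The routing guidance above can be encoded as a small client-side function. This is a minimal TypeScript sketch, not part of the API: `SearchOptions`, `PlannedCall`, and `planSearchCall` are hypothetical names, and keeping agentic requests on POST /search even when incremental rendering is requested is a conservative assumption, made because this guide only shows agentic mode on that route.

```typescript
// Hypothetical client-side types; field names mirror request fields used in this guide.
type SearchMode = "fts" | "vector" | "hybrid";

interface SearchOptions {
  query: string;
  search_mode?: SearchMode;
  strategy?: "standard" | "agentic";
  enable_generation?: boolean;
}

interface PlannedCall {
  path: "/search" | "/search/stream";
  body: SearchOptions;
}

// Choose the route and fill defaults:
// - search_mode falls back to "hybrid" when the caller does not set it
// - agentic requests stay on POST /search (assumption, see lead-in)
// - incremental rendering uses POST /search/stream with enable_generation: true
function planSearchCall(options: SearchOptions, renderIncrementally: boolean): PlannedCall {
  const body: SearchOptions = {
    ...options,
    search_mode: options.search_mode ?? "hybrid"
  };
  if (renderIncrementally && options.strategy !== "agentic") {
    // The stream route requires enable_generation: true.
    return { path: "/search/stream", body: { ...body, enable_generation: true } };
  }
  return { path: "/search", body };
}
```

A caller would pass the returned `path` and `body` to whatever HTTP layer it already uses; the function itself performs no network access.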