Commit bab38a4

docs: document 10-minute semantic router quickstart flow
Signed-off-by: samzong <[email protected]>
1 parent b3658bc commit bab38a4

File tree

8 files changed (+672 −0 lines)


examples/quickstart/README.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
# Semantic Router Quickstart

> ⚠️ This is an initial skeleton for the 10-minute quickstart workflow. Content will be expanded in follow-up tasks.
## Goal

Provide a single command path (`make quickstart`) that prepares the router, runs a small evaluation, and surfaces a concise report so that new users can validate the system within 10 minutes.
## Structure

- `quickstart.sh` – orchestrates dependency checks, model downloads, and service startup.
- `quick-eval.sh` – executes a minimal benchmark run and captures results.
- `config-quickstart.yaml` – opinionated defaults for running the router locally.
- `sample-data/` – trimmed datasets used for fast evaluation.
- `templates/` – report and config templates shared by quickstart scripts.
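The README describes `quickstart.sh` as starting with dependency checks. A minimal sketch of that pattern is below; the checked tools (`sh`, `ls`) are placeholders chosen so the snippet runs anywhere, not the script's real dependency list.

```shell
# Hedged sketch of a dependency-check loop like the one quickstart.sh
# is described as performing; tool names here are illustrative only.
missing=0
for tool in sh ls; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool" >&2
    missing=1
  fi
done
# a real bootstrap script would exit non-zero when $missing is 1
```

The POSIX-portable `command -v` is used rather than `which`, since the latter is not guaranteed to exist on minimal images.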
## Next Steps

1. Teach the benchmark loader to honor `QUICKSTART_SAMPLE_ROOT` so local JSONL slices are used offline.
2. Add a Makefile target (`quickstart`) that chains router bootstrap and quick evaluation.
3. Create CI smoke tests that run the 10-minute flow with the trimmed datasets.
## Quick Evaluation

Run the standalone evaluator once the router is healthy. A typical flow looks like:

```bash
./examples/quickstart/quickstart.sh &   # starts the router in the background (stop with kill %1)
./examples/quickstart/quick-eval.sh --dataset mmlu --samples 5 --mode router
```
The evaluation script will place raw artifacts under `examples/quickstart/results/<timestamp>/raw` and derive:

- `quickstart-summary.csv` – compact metrics table for spreadsheets or dashboards.
- `quickstart-report.md` – Markdown summary suitable for PRs or runbooks.
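The resulting layout can be sketched as follows; this uses a temp directory instead of `examples/quickstart/results/` so it runs anywhere, and the timestamp format is an assumption rather than the script's actual one.

```shell
# Hedged sketch of the results layout described above (temp dir, assumed
# timestamp format; not the actual quick-eval.sh implementation).
ts=$(date +%Y%m%d-%H%M%S)
out_dir="${TMPDIR:-/tmp}/quickstart-results/$ts"
mkdir -p "$out_dir/raw"                   # raw benchmark artifacts land here
: > "$out_dir/quickstart-summary.csv"     # compact metrics table
: > "$out_dir/quickstart-report.md"       # Markdown summary
ls "$out_dir"
```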
Key flags:

- `--mode router|vllm|both` – selects whether the router, the direct vLLM endpoint, or both are evaluated.
- `--samples` – trades runtime against statistical confidence.
- `--output-dir` – custom destination (defaults to a timestamped folder).
- All settings also respect `QUICKSTART_*` environment overrides.
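The environment-override pattern can be sketched like this; the exact variable names (`QUICKSTART_SAMPLES`, `QUICKSTART_MODE`) are assumed to mirror the flag names and are not confirmed by this commit.

```shell
# Hedged sketch: flag defaults overridable via QUICKSTART_* environment
# variables. Variable names are assumptions mirroring the flags above.
QUICKSTART_SAMPLES="${QUICKSTART_SAMPLES:-5}"
QUICKSTART_MODE="${QUICKSTART_MODE:-router}"
echo "samples=$QUICKSTART_SAMPLES mode=$QUICKSTART_MODE"
```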
## Local Sample Data

The `sample-data/` directory now includes trimmed JSONL slices for quick runs:

- `mmlu-sample.jsonl` – 10 multi-category academic questions.
- `arc-sample.jsonl` – 10 middle-school science questions with ARC-style options.

Each record follows the same schema that the benchmark loader expects (`question_id`, `category`, `question`, `options`, `answer`, optional `cot_content`). Sizes stay under 10 KB per file so the quickstart remains lightweight.
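A record in that schema might look like the following; the values are illustrative and are not copied from `mmlu-sample.jsonl`.

```shell
# Hedged example of one JSONL record matching the schema above
# (question_id, category, question, options, answer, cot_content).
sample="${TMPDIR:-/tmp}/demo-sample.jsonl"
cat > "$sample" <<'EOF'
{"question_id": "demo-1", "category": "general", "question": "What is 2 + 2?", "options": ["3", "4", "5", "6"], "answer": "B", "cot_content": ""}
EOF
wc -l < "$sample"   # one record per line, per the JSON Lines convention
```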
**Integration hook**: upcoming work will extend `bench/vllm_semantic_router_bench` to read from these JSONL files whenever `QUICKSTART_SAMPLE_ROOT` is set (falling back to Hugging Face datasets otherwise). Keep the files committed and deterministic so that the automated 10-minute flow can depend on them once the loader change lands.
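The planned fallback behavior can be sketched as a simple branch; this is a sketch of the intent described above, not the pending loader change itself.

```shell
# Hedged sketch of the planned loader fallback: prefer local JSONL slices
# when QUICKSTART_SAMPLE_ROOT is set and present, else use Hugging Face.
sample_root="${QUICKSTART_SAMPLE_ROOT:-}"
if [ -n "$sample_root" ] && [ -f "$sample_root/mmlu-sample.jsonl" ]; then
  source_used="local:$sample_root/mmlu-sample.jsonl"
else
  source_used="huggingface"
fi
echo "dataset source: $source_used"
```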
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
```yaml
# Quickstart configuration tuned for a single-node developer setup.
# Keeps routing options minimal while remaining compatible with the default assets
# shipped by `make download-models`.

bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true

semantic_cache:
  enabled: false
  backend_type: "memory"

prompt_guard:
  enabled: false
  use_modernbert: true
  model_id: "models/jailbreak_classifier_modernbert-base_model"
  threshold: 0.7
  use_cpu: true
  jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json"

classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    threshold: 0.6
    use_cpu: true
    use_modernbert: true
    category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    threshold: 0.7
    use_cpu: true
    pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

vllm_endpoints:
  - name: "local-vllm"
    address: "127.0.0.1"
    port: 8000
    models:
      - "openai/gpt-oss-20b"
    weight: 1

model_config:
  "openai/gpt-oss-20b":
    preferred_endpoints: ["local-vllm"]
    reasoning_family: "gpt-oss"
    pii_policy:
      allow_by_default: true

categories:
  - name: general
    system_prompt: "You are a helpful and knowledgeable assistant. Provide concise, accurate answers."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.7
        use_reasoning: false

  - name: reasoning
    system_prompt: "You explain your reasoning with clear numbered steps before giving a final answer."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.6
        use_reasoning: true

  - name: safety
    system_prompt: "You prioritize safe completions and refuse harmful requests."
    model_scores:
      - model: openai/gpt-oss-20b
        score: 0.5
        use_reasoning: false

default_model: openai/gpt-oss-20b

reasoning_families:
  gpt-oss:
    type: "chat_template_kwargs"
    parameter: "thinking"

api:
  batch_classification:
    metrics:
      enabled: false

# Tool auto-selection is available but disabled for quickstart.
tools:
  enabled: false
  top_k: 3
  similarity_threshold: 0.2
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true
```
