Commit f3f699b

feat(llm): add sip flow audit example
1 parent 784ba60 commit f3f699b

12 files changed

Lines changed: 304 additions & 6 deletions

.mem/hot.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -32,7 +32,7 @@
3232
- DTMF is media event; RFC4733 telephone-event primary.
3333
- G.711 PCMU/PCMA required early; do not depend on stdlib `audioop`.
3434
- Asterisk GPL risk reinforces separate Python process, not loadable Asterisk module.
35-
- Current implementation version: `1.6.1`.
35+
- Current implementation version: `1.7.0`.
3636
- `AGENTS.md` requires small commit blocks with version bump, `CHANGELOG.md`, `TODO.md`, `.spec/*`, `.mem/*`, validation, and explicit staged paths.
3737
- `sipx` package now exists with core modules for events, timeline, verdict, artifacts, metrics, capabilities, expectations, actors, scenarios, and harness runtime.
3838
- `MockBackend` is the default no-network backend for `Harness()`.
@@ -43,6 +43,7 @@
4343
- Public Mizu demo profile is in `examples/mizu/harness.toml`; private proxy test data must not be committed.
4444
- `LLMChatClient` exists in `sipx.llm`; live LLM validation is opt-in via `SIPX_LLM_API_KEY` and secrets must not be committed.
4545
- `LLMChatClient.from_env()` works with only `SIPX_LLM_API_KEY`; optional base URL/model/timeout use concrete defaults.
46+
- `examples/llm/sip_flow_audit.py` is the richer runnable LLM example; it emits structured SIP behavior/risk/findings/actions JSON.
4647
- Native softphone calls can send in-dialog SIP INFO DTMF via `NativeSoftphone.send_dtmf()` and CLI `sipx call --dtmf`.
4748
- Phone CLI network commands require a profile or explicit `--aor` and `--registrar`; without config they fail before opening sockets.
4849
- Phone CLI derives default remote host/port from `--registrar` when `--remote-host/--remote-port` are omitted.

.spec/checks.md

Lines changed: 11 additions & 1 deletion
@@ -334,7 +334,7 @@
334334
| 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.6.0` changes. |
335335
| 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.6.0` changes. |
336336
| 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.6.0-final` | pass | Built final `1.6.0` sdist and wheel outside the repo. |
337-
| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or Mistral-specific names in code/tests/examples/docs/TOML/YAML. |
337+
| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or provider-specific names in code/tests/examples/docs/TOML/YAML. |
338338
| 2026-06-08 | final `docker --version` | blocked | Docker command is unavailable in this WSL environment; Asterisk integration remains opt-in and unrun. |
339339
| 2026-06-08 | `python -m pytest tests/test_llm.py tests/test_examples_templates.py` | pass | 8 passed, 1 skipped after LLM env default regression. |
340340
| 2026-06-08 | minimal LLM env default smoke | pass | `SIPX_LLM_API_KEY=test-key` builds `LLMChatClient.from_env()` with default base URL, model, and timeout. |
@@ -344,6 +344,16 @@
344344
| 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.6.1` fix. |
345345
| 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.6.1` fix. |
346346
| 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.6.1-final` | pass | Built final `1.6.1` sdist and wheel outside the repo. |
347+
| 2026-06-08 | `uv lock` | pass | Updated lockfile project version from `1.6.1` to `1.7.0`. |
348+
| 2026-06-08 | focused SIP-flow audit tests | pass | `tests/test_examples_templates.py tests/test_llm.py`: 11 passed, 1 skipped after auth-redaction audit fix. |
349+
| 2026-06-08 | runnable LLM/native example smoke | pass | Direct LLM smoke and SIP-flow audit skipped cleanly without key; native CLI flow printer emitted runnable commands. |
350+
| 2026-06-08 | final `python -m pytest` | pass | 128 passed, 3 skipped after `1.7.0` SIP-flow audit example. |
351+
| 2026-06-08 | final `ruff check .` | pass | All lint checks passed after `1.7.0` changes. |
352+
| 2026-06-08 | final `ruff format --check .` | pass | 80 files already formatted after `1.7.0` changes. |
353+
| 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.7.0` changes. |
354+
| 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.7.0` changes. |
355+
| 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.7.0-final` | pass | Built final `1.7.0` sdist and wheel outside the repo. |
356+
| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or provider-specific names in code/tests/examples/docs/TOML/YAML. |
347357

348358
## Validation Policy
349359

.spec/handoff.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22

33
## Summary
44

5-
Project planning environment was initialized from `IDEA.md`. Blocks `0.2.0` through `1.6.1` added initial product code: harness core, mock backend, scenario artifacts, CLI run/export/replay/profile/phone/raw-SIP commands, reports, profiles, mixed actor binding, media primitives, redaction, SIP parser primitives, SDP audio offer/answer, RTP packet stats, RFC4733 DTMF, SIP dialog skeletons, INVITE/non-INVITE client transactions, REGISTER helper/flow, Digest auth helper, UAS INVITE skeleton, BYE helper, real UDP Native SIP transport/backend, strict INVITE/ACK/BYE call flow, CANCEL runtime, REGISTER over-UDP orchestration, INVITE Digest retry, transaction retransmission timers, parser fuzz tests, Asterisk ARI control-plane client/event backend, Asterisk channel/bridge/playback/hangup/DTMF timeline mapping, WebSocket media MVP, inbound `Stasis(sipx)` example, Docker Asterisk lab, headless native technical softphone, lab-only native SIP hooks, package-manager console script execution, GitHub CI/release workflows, fail-fast phone CLI config validation, curl-like SIP request commands, redacted packet debug output, native softphone SDP negotiation, simple `LLMChatClient` templates/tests, fixed LLM env defaults, in-dialog SIP INFO DTMF, runnable native examples, and passing `uv run ty check` type validation. SPEC T1-T43 plus V24-V35/B4-B10 are complete.
5+
Project planning environment was initialized from `IDEA.md`. Blocks `0.2.0` through `1.7.0` added initial product code: harness core, mock backend, scenario artifacts, CLI run/export/replay/profile/phone/raw-SIP commands, reports, profiles, mixed actor binding, media primitives, redaction, SIP parser primitives, SDP audio offer/answer, RTP packet stats, RFC4733 DTMF, SIP dialog skeletons, INVITE/non-INVITE client transactions, REGISTER helper/flow, Digest auth helper, UAS INVITE skeleton, BYE helper, real UDP Native SIP transport/backend, strict INVITE/ACK/BYE call flow, CANCEL runtime, REGISTER over-UDP orchestration, INVITE Digest retry, transaction retransmission timers, parser fuzz tests, Asterisk ARI control-plane client/event backend, Asterisk channel/bridge/playback/hangup/DTMF timeline mapping, WebSocket media MVP, inbound `Stasis(sipx)` example, Docker Asterisk lab, headless native technical softphone, lab-only native SIP hooks, package-manager console script execution, GitHub CI/release workflows, fail-fast phone CLI config validation, curl-like SIP request commands, redacted packet debug output, native softphone SDP negotiation, simple `LLMChatClient` templates/tests, fixed LLM env defaults, richer LLM SIP-flow audit example with auth redaction checks, in-dialog SIP INFO DTMF, runnable native examples, and passing `uv run ty check` type validation. SPEC T1-T45 plus V24-V37/B4-B11 are complete.
66

77
## Read First
88

.spec/state.md

Lines changed: 3 additions & 0 deletions
@@ -210,6 +210,9 @@ Implement `sipx` in verified commit blocks. Block `1.6.0` adds generic OpenAI-co
210210
- Bumped `pyproject.toml` version from `1.6.0` to `1.6.1`.
211211
- Recorded `SPEC.md` B10 and V35 after direct LLM example execution failed when optional `SIPX_LLM_TIMEOUT` was missing.
212212
- Fixed `LLMChatClient.from_env()` to use concrete defaults for optional env settings.
213+
- Bumped `pyproject.toml` version from `1.6.1` to `1.7.0`.
214+
- Added `examples/llm/sip_flow_audit.py` as a richer runnable LLM SIP-flow audit example with deterministic signals and structured JSON output.
215+
- Recorded `SPEC.md` B11 and V37 after focused validation caught redacted auth being treated as unredacted in the SIP-flow audit.
213216

214217
## Active Decision
215218

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
11
# CHANGELOG
22

3+
## 1.7.0 - 2026-06-08
4+
5+
- Added runnable `examples/llm/sip_flow_audit.py` for richer LLM SIP-flow analysis.
6+
- The audit example extracts deterministic SIP signals, asks the LLM for structured JSON, and reports behavior, risk score, protocol findings, media assessment, and next actions.
7+
- The audit example flags unredacted SIP auth headers while accepting `[REDACTED]` auth markers.
8+
- Documented how to run the quick LLM smoke and richer SIP-flow audit examples directly or through `sipx scenario run`.
9+
- Bumped package version to `1.7.0`.
10+
311
## 1.6.1 - 2026-06-08
412

513
- Fixed `LLMChatClient.from_env()` so missing optional `SIPX_LLM_*` settings use concrete defaults instead of dataclass slot descriptors.

README.md

Lines changed: 6 additions & 0 deletions
@@ -304,12 +304,15 @@ Scenario examples are run with the harness CLI:
304304

305305
```bash
306306
uv run sipx scenario run examples/llm/semantic_smoke.py --artifacts-dir artifacts
307+
uv run sipx scenario run examples/llm/sip_flow_audit.py --artifacts-dir artifacts
307308
```
308309

309310
You can also run the LLM scenario file directly:
310311

311312
```bash
312313
uv run python examples/llm/semantic_smoke.py
314+
uv run python examples/llm/sip_flow_audit.py
315+
uv run python examples/llm/sip_flow_audit.py --trace-file /path/to/sip-trace.txt
313316
```
314317

315318
The public Mizu demo server profile lives at `examples/mizu/harness.toml`:
@@ -326,8 +329,11 @@ export SIPX_LLM_API_KEY=...
326329
export SIPX_LLM_BASE_URL=https://api.openai.com/v1
327330
export SIPX_LLM_MODEL=gpt-4o-mini
328331
uv run sipx scenario run examples/llm/semantic_smoke.py --artifacts-dir artifacts
332+
uv run python examples/llm/sip_flow_audit.py --trace-file /path/to/sip-trace.txt
329333
```
330334

335+
`semantic_smoke.py` is the quick smoke test. `sip_flow_audit.py` is the richer example: it extracts deterministic SIP signals, asks the LLM for structured JSON, and returns a summary, behavior, risk score, protocol findings, media assessment, and next actions.
336+
331337
Templates live under `examples/llm`, `examples/asterisk`, and `examples/native`. The live LLM smoke test is skipped unless `SIPX_LLM_API_KEY` is set.
332338

333339
GitHub automation lives under `.github/workflows`:

SPEC.md

Lines changed: 5 additions & 0 deletions
@@ -112,6 +112,8 @@ V32: LLM integrations ! external provider keys only via environment/runtime inje
112112
V33: validation gate ! `uv run ty check` passes before an implementation block is complete; system-interpreter tool absence is reported separately.
113113
V34: operational DTMF ! confirmed native calls can send DTMF via SIP INFO `application/dtmf-relay`; CLI examples show call, OPTIONS, MESSAGE, INFO, and DTMF flows without hardcoded private secrets.
114114
V35: LLM env config ! missing optional `SIPX_LLM_*` vars use concrete defaults; no dataclass descriptors or internal objects leak into runtime parsing.
115+
V36: LLM SIP audit examples ! runnable directly and via `sipx scenario run`; output structured behavior/risk/findings/actions while deterministic SIP checks remain separate from LLM judgment.
116+
V37: LLM SIP audit security ! redacted auth markers are accepted; unredacted `Authorization`/`Proxy-Authorization` values are flagged before LLM analysis.
115117

116118
## §T
117119

@@ -160,6 +162,8 @@ T40|x|add opt-in generic OpenAI-compatible LLM client/tests and LLM/native/Aster
160162
T41|x|clear baseline type-check diagnostics for current implementation surfaces|V33,I.ci
161163
T42|x|add in-dialog SIP INFO DTMF support plus richer native CLI/Python examples|V13,V25,V30,V31,V34,I.cmd
162164
T43|x|fix LLM env defaults for direct example execution|V32,V35,I.api
165+
T44|x|add richer runnable LLM SIP-flow audit example|V5,V13,V32,V36,I.cmd
166+
T45|x|fix SIP-flow audit auth redaction detection|V13,V36,V37,I.cmd
163167

164168
## §B
165169

@@ -175,3 +179,4 @@ B7|2026-06-08|authenticated real proxy INVITE reached `603 Declined`; outbound s
175179
B8|2026-06-08|`uv run ty check` baseline had 29 diagnostics from dynamic call, mapping, URI, SDP, and media frame typing|V33
176180
B9|2026-06-08|DTMF implementation added helper/softphone call path but backend method/import was incomplete during focused validation|V34
177181
B10|2026-06-08|`LLMChatClient.from_env()` read dataclass slot descriptors as defaults when optional env vars were missing, causing timeout float parsing failure|V35
182+
B11|2026-06-08|SIP-flow audit treated `Authorization: [REDACTED]` as unredacted auth during deterministic security checks|V37
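V37 and B11 imply that a captured trace has to carry `[REDACTED]` markers on its auth headers before it is handed to the audit. The following is a minimal sketch of that preparation step plus a line-based check in the spirit of V37. The helper names `redact_auth_headers` and `has_unredacted_auth` are hypothetical and not part of `sipx`, and folded (multi-line) header values are not handled.

```python
import re

# Matches Authorization / Proxy-Authorization header lines, case-insensitively.
# [^\r\n]* consumes the value but leaves the line ending untouched.
_AUTH_HEADER = re.compile(
    r"^(?P<name>[ \t]*(?:Proxy-)?Authorization)[ \t]*:[^\r\n]*",
    re.IGNORECASE | re.MULTILINE,
)


def redact_auth_headers(trace: str) -> str:
    """Replace the value of every auth header line with the [REDACTED] marker."""
    return _AUTH_HEADER.sub(lambda m: f"{m.group('name')}: [REDACTED]", trace)


def has_unredacted_auth(trace: str) -> bool:
    """Return True if any auth header line lacks the [REDACTED] marker."""
    for line in trace.splitlines():
        normalized = line.strip().upper()
        if normalized.startswith(("AUTHORIZATION:", "PROXY-AUTHORIZATION:")):
            if "[REDACTED]" not in normalized:
                return True
    return False
```

A trace passed through `redact_auth_headers` no longer trips `has_unredacted_auth`, while challenge headers such as `WWW-Authenticate` are left as they are.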

TODO.md

Lines changed: 8 additions & 0 deletions
@@ -420,6 +420,14 @@ Implement `sipx` in small verified blocks. Current code now has harness core, mo
420420
- [x] Fixed `LLMChatClient.from_env()` optional env defaults for direct example execution.
421421
- [x] Added regression coverage for minimal `SIPX_LLM_API_KEY` env config.
422422

423+
## Block 1.7.0 Done
424+
425+
- [x] Bumped package version to `1.7.0`.
426+
- [x] Added runnable `examples/llm/sip_flow_audit.py`.
427+
- [x] Added deterministic SIP signals plus structured LLM audit JSON output.
428+
- [x] Added deterministic auth-redaction checks for SIP-flow audit traces.
429+
- [x] Documented direct and `sipx scenario run` commands for the quick smoke and SIP-flow audit examples.
430+
423431
## Blocked Or Pending
424432

425433
- [ ] `python -m ty check` still needs the system interpreter environment synced; configured validation now uses passing `uv run ty check`.

examples/llm/sip_flow_audit.py

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
from __future__ import annotations

import argparse
import asyncio
import json
import os
from pathlib import Path
from typing import Any

from sipx import Harness, LLMChatClient, Verdict, scenario


SAMPLE_TRACE = """
SIP RX 203.0.113.20:5060
SIP/2.0 401 Unauthorized
WWW-Authenticate: Digest realm="pbx.example.com", nonce="n1", qop="auth"
CSeq: 1 REGISTER

SIP TX 203.0.113.20:5060
REGISTER sip:pbx.example.com SIP/2.0
Authorization: [REDACTED]
CSeq: 2 REGISTER

SIP RX 203.0.113.20:5060
SIP/2.0 200 OK
CSeq: 2 REGISTER

SIP TX 203.0.113.20:5060
INVITE sip:ivr@example.com SIP/2.0
Content-Type: application/sdp
CSeq: 1 INVITE

v=0
m=audio 41000 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000

SIP RX 203.0.113.20:5060
SIP/2.0 180 Ringing
CSeq: 1 INVITE

SIP RX 203.0.113.20:5060
SIP/2.0 200 OK
Content-Type: application/sdp
CSeq: 1 INVITE

v=0
m=audio 52000 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000

SIP TX 203.0.113.20:5060
ACK sip:ivr@example.com SIP/2.0
CSeq: 1 ACK

SIP TX 203.0.113.20:5060
INFO sip:ivr@example.com SIP/2.0
Content-Type: application/dtmf-relay
CSeq: 2 INFO

Signal=1
Duration=160

SIP RX 203.0.113.20:5060
SIP/2.0 200 OK
CSeq: 2 INFO

SIP TX 203.0.113.20:5060
BYE sip:ivr@example.com SIP/2.0
CSeq: 3 BYE

SIP RX 203.0.113.20:5060
SIP/2.0 200 OK
CSeq: 3 BYE
""".strip()


@scenario("sip_flow_audit", provider="openai-compatible")
async def scenario(h: Harness) -> Verdict:
    trace = _load_trace_from_env()
    result = await audit_trace(trace)
    h.timeline.record("llm", "sip_flow_audit", data=result)
    print(json.dumps(result, indent=2, sort_keys=True))
    if result["status"] == "skipped":
        return Verdict.skipped(reason=str(result["reason"]))
    if result["deterministic"]["critical_findings"]:
        return Verdict.failed(reason="deterministic SIP audit found critical issues")
    return Verdict.passed(reason="SIP flow audit completed")


async def audit_trace(trace: str) -> dict[str, Any]:
    deterministic = _deterministic_audit(trace)
    if not os.getenv("SIPX_LLM_API_KEY"):
        return {
            "status": "skipped",
            "reason": "SIPX_LLM_API_KEY not set",
            "deterministic": deterministic,
        }

    client = LLMChatClient.from_env()
    prompt = _audit_prompt(trace, deterministic)
    raw = await client.complete(
        prompt,
        system=(
            "You audit SIP call flows. Return strict JSON only. "
            "Do not include markdown fences. Do not include secrets."
        ),
        max_tokens=1200,
    )
    llm = _parse_json_object(raw)
    return {
        "status": "completed",
        "deterministic": deterministic,
        "llm": llm,
    }


def _load_trace_from_env() -> str:
    path = os.getenv("SIPX_LLM_TRACE_FILE")
    if path:
        return Path(path).read_text(encoding="utf-8")
    return SAMPLE_TRACE


def _deterministic_audit(trace: str) -> dict[str, Any]:
    upper = trace.upper()
    critical: list[str] = []
    warnings: list[str] = []
    signals = {
        "register_digest_challenge": "401 UNAUTHORIZED" in upper
        and "REGISTER" in upper,
        "invite_has_sdp_offer": "INVITE " in upper
        and "CONTENT-TYPE: APPLICATION/SDP" in upper
        and "M=AUDIO" in upper,
        "dtmf_info": "INFO " in upper and "APPLICATION/DTMF-RELAY" in upper,
        "clean_bye": "BYE " in upper and "CSEQ: 3 BYE" in upper and "200 OK" in upper,
    }
    if "INVITE " in upper and "SIP/2.0 200 OK" in upper:
        invite_ok_index = upper.find("SIP/2.0 200 OK", upper.find("INVITE "))
        invite_answer_window = upper[invite_ok_index : invite_ok_index + 500]
        if "CONTENT-TYPE: APPLICATION/SDP" not in invite_answer_window:
            critical.append("INVITE reached 200 OK without an SDP answer nearby")
    for line in trace.splitlines():
        normalized = line.strip().upper()
        if (
            normalized.startswith(("AUTHORIZATION:", "PROXY-AUTHORIZATION:"))
            and "[REDACTED]" not in normalized
        ):
            critical.append("trace contains an unredacted authorization header")
            break
    if "APPLICATION/DTMF-RELAY" in upper and "DURATION=" not in upper:
        warnings.append("DTMF relay body has no Duration field")
    return {
        "signals": signals,
        "critical_findings": critical,
        "warnings": warnings,
    }


def _audit_prompt(trace: str, deterministic: dict[str, Any]) -> str:
    return json.dumps(
        {
            "task": "Audit this SIP flow for interoperability and behavior.",
            "required_json_shape": {
                "summary": "one paragraph",
                "behavior": "accepted|rejected|incomplete|unknown",
                "risk_score": "integer 0-100",
                "protocol_findings": [
                    {
                        "severity": "info|warning|critical",
                        "evidence": "quote from trace",
                        "meaning": "what it implies",
                        "recommendation": "what to do next",
                    }
                ],
                "media_assessment": {
                    "sdp": "short assessment",
                    "dtmf": "short assessment",
                    "rtp_readiness": "short assessment",
                },
                "next_actions": ["ordered actions"],
            },
            "deterministic_signals": deterministic,
            "sip_trace": trace,
        },
        indent=2,
    )


def _parse_json_object(text: str) -> dict[str, Any]:
    stripped = text.strip()
    if stripped.startswith("```"):
        stripped = stripped.strip("`")
        if stripped.lower().startswith("json"):
            stripped = stripped[4:].strip()
    start = stripped.find("{")
    end = stripped.rfind("}")
    if start < 0 or end < start:
        raise ValueError("LLM response did not contain a JSON object")
    value = json.loads(stripped[start : end + 1])
    if not isinstance(value, dict):
        raise ValueError("LLM response JSON must be an object")
    return value


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Audit a SIP trace with an LLM.")
    parser.add_argument(
        "--trace-file",
        help="Path to a text SIP trace. Defaults to the embedded healthy sample.",
    )
    args = parser.parse_args(argv)

    if args.trace_file:
        os.environ["SIPX_LLM_TRACE_FILE"] = args.trace_file
    verdict = asyncio.run(Harness().run(scenario))
    reason = f": {verdict.reason}" if verdict.reason else ""
    print(f"{verdict.status}{reason}")
    return 0 if verdict.status in {"passed", "skipped"} else 1


if __name__ == "__main__":
    raise SystemExit(main())
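The example requests a fixed JSON shape from the LLM but, as committed, only checks that the reply parses to an object. The following is a minimal sketch of a follow-up shape check, assuming callers want to confirm the reply matches the `required_json_shape` requested in `_audit_prompt`. The `validate_audit_reply` helper is hypothetical and not part of this commit.

```python
from __future__ import annotations

from typing import Any

_BEHAVIORS = {"accepted", "rejected", "incomplete", "unknown"}
_SEVERITIES = {"info", "warning", "critical"}


def validate_audit_reply(llm: dict[str, Any]) -> list[str]:
    """Return shape problems; an empty list means the reply matches the requested shape."""
    problems: list[str] = []
    if not isinstance(llm.get("summary"), str):
        problems.append("summary must be a string")
    behavior = llm.get("behavior")
    if not isinstance(behavior, str) or behavior not in _BEHAVIORS:
        problems.append("behavior must be accepted|rejected|incomplete|unknown")
    score = llm.get("risk_score")
    # bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        problems.append("risk_score must be an integer in 0-100")
    findings = llm.get("protocol_findings")
    if not isinstance(findings, list):
        problems.append("protocol_findings must be a list")
    else:
        for index, finding in enumerate(findings):
            severity = finding.get("severity") if isinstance(finding, dict) else None
            if not isinstance(severity, str) or severity not in _SEVERITIES:
                problems.append(f"protocol_findings[{index}] has an invalid severity")
    if not isinstance(llm.get("media_assessment"), dict):
        problems.append("media_assessment must be an object")
    if not isinstance(llm.get("next_actions"), list):
        problems.append("next_actions must be a list")
    return problems
```

One possible use is to call it on the `llm` value returned by `audit_trace` and record any problems on the timeline, which keeps the deterministic checks and the LLM judgment separate as V36 requires.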

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
[project]
22
name = "sipx"
3-
version = "1.6.1"
3+
version = "1.7.0"
44
description = "Python programmable Voice/SIP harness for call automation, IVR testing, technical softphones, and media validation"
55
readme = "README.md"
66
requires-python = ">=3.14"
