feat(llm): add sip flow audit example

oornnery · oornnery · commit f3f699b9484c · 2026-06-08T23:30:23.000-03:00
diff --git a/.mem/hot.md b/.mem/hot.md
@@ -32,7 +32,7 @@
 - DTMF is media event; RFC4733 telephone-event primary.
 - G.711 PCMU/PCMA required early; do not depend on stdlib `audioop`.
 - Asterisk GPL risk reinforces separate Python process, not loadable Asterisk module.
-- Current implementation version: `1.6.1`.
+- Current implementation version: `1.7.0`.
 - `AGENTS.md` requires small commit blocks with version bump, `CHANGELOG.md`, `TODO.md`, `.spec/*`, `.mem/*`, validation, and explicit staged paths.
 - `sipx` package now exists with core modules for events, timeline, verdict, artifacts, metrics, capabilities, expectations, actors, scenarios, and harness runtime.
 - `MockBackend` is the default no-network backend for `Harness()`.
@@ -43,6 +43,7 @@
 - Public Mizu demo profile is in `examples/mizu/harness.toml`; private proxy test data must not be committed.
 - `LLMChatClient` exists in `sipx.llm`; live LLM validation is opt-in via `SIPX_LLM_API_KEY` and secrets must not be committed.
 - `LLMChatClient.from_env()` works with only `SIPX_LLM_API_KEY`; optional base URL/model/timeout use concrete defaults.
+- `examples/llm/sip_flow_audit.py` is the richer runnable LLM example; it emits structured SIP behavior/risk/findings/actions JSON.
 - Native softphone calls can send in-dialog SIP INFO DTMF via `NativeSoftphone.send_dtmf()` and CLI `sipx call --dtmf`.
 - Phone CLI network commands require a profile or explicit `--aor` and `--registrar`; without config they fail before opening sockets.
 - Phone CLI derives default remote host/port from `--registrar` when `--remote-host/--remote-port` are omitted.
diff --git a/.spec/checks.md b/.spec/checks.md
@@ -334,7 +334,7 @@
 | 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.6.0` changes. |
 | 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.6.0` changes. |
 | 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.6.0-final` | pass | Built final `1.6.0` sdist and wheel outside the repo. |
-| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or Mistral-specific names in code/tests/examples/docs/TOML/YAML. |
+| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or provider-specific names in code/tests/examples/docs/TOML/YAML. |
 | 2026-06-08 | final `docker --version` | blocked | Docker command is unavailable in this WSL environment; Asterisk integration remains opt-in and unrun. |
 | 2026-06-08 | `python -m pytest tests/test_llm.py tests/test_examples_templates.py` | pass | 8 passed, 1 skipped after LLM env default regression. |
 | 2026-06-08 | minimal LLM env default smoke | pass | `SIPX_LLM_API_KEY=test-key` builds `LLMChatClient.from_env()` with default base URL, model, and timeout. |
@@ -344,6 +344,16 @@
 | 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.6.1` fix. |
 | 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.6.1` fix. |
 | 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.6.1-final` | pass | Built final `1.6.1` sdist and wheel outside the repo. |
+| 2026-06-08 | `uv lock` | pass | Updated lockfile project version from `1.6.1` to `1.7.0`. |
+| 2026-06-08 | focused SIP-flow audit tests | pass | `tests/test_examples_templates.py tests/test_llm.py`: 11 passed, 1 skipped after auth-redaction audit fix. |
+| 2026-06-08 | runnable LLM/native example smoke | pass | Direct LLM smoke and SIP-flow audit skipped cleanly without key; native CLI flow printer emitted runnable commands. |
+| 2026-06-08 | final `python -m pytest` | pass | 128 passed, 3 skipped after `1.7.0` SIP-flow audit example. |
+| 2026-06-08 | final `ruff check .` | pass | All lint checks passed after `1.7.0` changes. |
+| 2026-06-08 | final `ruff format --check .` | pass | 80 files already formatted after `1.7.0` changes. |
+| 2026-06-08 | final `uv run ty check` | pass | Configured type-check gate passes after `1.7.0` changes. |
+| 2026-06-08 | final `git diff --check` | pass/no output | No whitespace errors after `1.7.0` changes. |
+| 2026-06-08 | final `uv build --out-dir /tmp/opencode/sipx-build-1.7.0-final` | pass | Built final `1.7.0` sdist and wheel outside the repo. |
+| 2026-06-08 | final secret/provider-name scan | pass/no matches | No private proxy markers, inline LLM keys, SIP auth headers, or provider-specific names in code/tests/examples/docs/TOML/YAML. |
 
 ## Validation Policy
 
diff --git a/.spec/handoff.md b/.spec/handoff.md
@@ -2,7 +2,7 @@
 
 ## Summary
 
-Project planning environment was initialized from `IDEA.md`. Blocks `0.2.0` through `1.6.1` added initial product code: harness core, mock backend, scenario artifacts, CLI run/export/replay/profile/phone/raw-SIP commands, reports, profiles, mixed actor binding, media primitives, redaction, SIP parser primitives, SDP audio offer/answer, RTP packet stats, RFC4733 DTMF, SIP dialog skeletons, INVITE/non-INVITE client transactions, REGISTER helper/flow, Digest auth helper, UAS INVITE skeleton, BYE helper, real UDP Native SIP transport/backend, strict INVITE/ACK/BYE call flow, CANCEL runtime, REGISTER over-UDP orchestration, INVITE Digest retry, transaction retransmission timers, parser fuzz tests, Asterisk ARI control-plane client/event backend, Asterisk channel/bridge/playback/hangup/DTMF timeline mapping, WebSocket media MVP, inbound `Stasis(sipx)` example, Docker Asterisk lab, headless native technical softphone, lab-only native SIP hooks, package-manager console script execution, GitHub CI/release workflows, fail-fast phone CLI config validation, curl-like SIP request commands, redacted packet debug output, native softphone SDP negotiation, simple `LLMChatClient` templates/tests, fixed LLM env defaults, in-dialog SIP INFO DTMF, runnable native examples, and passing `uv run ty check` type validation. SPEC T1-T43 plus V24-V35/B4-B10 are complete.
+Project planning environment was initialized from `IDEA.md`. Blocks `0.2.0` through `1.7.0` added initial product code: harness core, mock backend, scenario artifacts, CLI run/export/replay/profile/phone/raw-SIP commands, reports, profiles, mixed actor binding, media primitives, redaction, SIP parser primitives, SDP audio offer/answer, RTP packet stats, RFC4733 DTMF, SIP dialog skeletons, INVITE/non-INVITE client transactions, REGISTER helper/flow, Digest auth helper, UAS INVITE skeleton, BYE helper, real UDP Native SIP transport/backend, strict INVITE/ACK/BYE call flow, CANCEL runtime, REGISTER over-UDP orchestration, INVITE Digest retry, transaction retransmission timers, parser fuzz tests, Asterisk ARI control-plane client/event backend, Asterisk channel/bridge/playback/hangup/DTMF timeline mapping, WebSocket media MVP, inbound `Stasis(sipx)` example, Docker Asterisk lab, headless native technical softphone, lab-only native SIP hooks, package-manager console script execution, GitHub CI/release workflows, fail-fast phone CLI config validation, curl-like SIP request commands, redacted packet debug output, native softphone SDP negotiation, simple `LLMChatClient` templates/tests, fixed LLM env defaults, richer LLM SIP-flow audit example with auth redaction checks, in-dialog SIP INFO DTMF, runnable native examples, and passing `uv run ty check` type validation. SPEC T1-T45 plus V24-V37/B4-B11 are complete.
 
 ## Read First
 
diff --git a/.spec/state.md b/.spec/state.md
@@ -210,6 +210,9 @@ Implement `sipx` in verified commit blocks. Block `1.6.0` adds generic OpenAI-co
 - Bumped `pyproject.toml` version from `1.6.0` to `1.6.1`.
 - Recorded `SPEC.md` B10 and V35 after direct LLM example execution failed when optional `SIPX_LLM_TIMEOUT` was missing.
 - Fixed `LLMChatClient.from_env()` to use concrete defaults for optional env settings.
+- Bumped `pyproject.toml` version from `1.6.1` to `1.7.0`.
+- Added `examples/llm/sip_flow_audit.py` as a richer runnable LLM SIP-flow audit example with deterministic signals and structured JSON output.
+- Recorded `SPEC.md` B11 and V37 after focused validation caught redacted auth being treated as unredacted in the SIP-flow audit.
 
 ## Active Decision
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,13 @@
 # CHANGELOG
 
+## 1.7.0 - 2026-06-08
+
+- Added runnable `examples/llm/sip_flow_audit.py` for richer LLM SIP-flow analysis.
+- The audit example extracts deterministic SIP signals, asks the LLM for structured JSON, and reports behavior, risk score, protocol findings, media assessment, and next actions.
+- The audit example flags unredacted SIP auth headers while accepting `[REDACTED]` auth markers.
+- Documented how to run the quick LLM smoke and richer SIP-flow audit examples directly or through `sipx scenario run`.
+- Bumped package version to `1.7.0`.
+
 ## 1.6.1 - 2026-06-08
 
 - Fixed `LLMChatClient.from_env()` so missing optional `SIPX_LLM_*` settings use concrete defaults instead of dataclass slot descriptors.
diff --git a/README.md b/README.md
@@ -304,12 +304,15 @@ Scenario examples are run with the harness CLI:
 
 ```bash
 uv run sipx scenario run examples/llm/semantic_smoke.py --artifacts-dir artifacts
+uv run sipx scenario run examples/llm/sip_flow_audit.py --artifacts-dir artifacts
 ```
 
 You can also run the LLM scenario file directly:
 
 ```bash
 uv run python examples/llm/semantic_smoke.py
+uv run python examples/llm/sip_flow_audit.py
+uv run python examples/llm/sip_flow_audit.py --trace-file /path/to/sip-trace.txt
 ```
 
 The public Mizu demo server profile lives at `examples/mizu/harness.toml`:
@@ -326,8 +329,11 @@ export SIPX_LLM_API_KEY=...
 export SIPX_LLM_BASE_URL=https://api.openai.com/v1
 export SIPX_LLM_MODEL=gpt-4o-mini
 uv run sipx scenario run examples/llm/semantic_smoke.py --artifacts-dir artifacts
+uv run python examples/llm/sip_flow_audit.py --trace-file /path/to/sip-trace.txt
 ```
 
+`semantic_smoke.py` is the quick smoke test. `sip_flow_audit.py` is the richer example: it extracts deterministic SIP signals, asks the LLM for structured JSON, and reports the summary, behavior, risk score, protocol findings, media assessment, and next actions.
+
 Templates live under `examples/llm`, `examples/asterisk`, and `examples/native`. The live LLM smoke test is skipped unless `SIPX_LLM_API_KEY` is set.
 
 GitHub automation lives under `.github/workflows`:
diff --git a/SPEC.md b/SPEC.md
@@ -112,6 +112,8 @@ V32: LLM integrations ! external provider keys only via environment/runtime inje
 V33: validation gate ! `uv run ty check` passes before an implementation block is complete; system-interpreter tool absence is reported separately.
 V34: operational DTMF ! confirmed native calls can send DTMF via SIP INFO `application/dtmf-relay`; CLI examples show call, OPTIONS, MESSAGE, INFO, and DTMF flows without hardcoded private secrets.
 V35: LLM env config ! missing optional `SIPX_LLM_*` vars use concrete defaults; no dataclass descriptors or internal objects leak into runtime parsing.
+V36: LLM SIP audit examples ! runnable directly and via `sipx scenario run`; output structured behavior/risk/findings/actions while deterministic SIP checks remain separate from LLM judgment.
+V37: LLM SIP audit security ! redacted auth markers are accepted; unredacted `Authorization`/`Proxy-Authorization` values are flagged before LLM analysis.
 
 ## §T
 
@@ -160,6 +162,8 @@ T40|x|add opt-in generic OpenAI-compatible LLM client/tests and LLM/native/Aster
 T41|x|clear baseline type-check diagnostics for current implementation surfaces|V33,I.ci
 T42|x|add in-dialog SIP INFO DTMF support plus richer native CLI/Python examples|V13,V25,V30,V31,V34,I.cmd
 T43|x|fix LLM env defaults for direct example execution|V32,V35,I.api
+T44|x|add richer runnable LLM SIP-flow audit example|V5,V13,V32,V36,I.cmd
+T45|x|fix SIP-flow audit auth redaction detection|V13,V36,V37,I.cmd
 
 ## §B
 
@@ -175,3 +179,4 @@ B7|2026-06-08|authenticated real proxy INVITE reached `603 Declined`; outbound s
 B8|2026-06-08|`uv run ty check` baseline had 29 diagnostics from dynamic call, mapping, URI, SDP, and media frame typing|V33
 B9|2026-06-08|DTMF implementation added helper/softphone call path but backend method/import was incomplete during focused validation|V34
 B10|2026-06-08|`LLMChatClient.from_env()` read dataclass slot descriptors as defaults when optional env vars were missing, causing timeout float parsing failure|V35
+B11|2026-06-08|SIP-flow audit treated `Authorization: [REDACTED]` as unredacted auth during deterministic security checks|V37
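Reviewer note on V37/B11: the rule can be sketched as a standalone check that accepts `[REDACTED]` markers and flags any other `Authorization`/`Proxy-Authorization` value. `find_unredacted_auth` is a hypothetical helper, not a function from `sipx`; unlike the example script, it returns line numbers and header names so the report never echoes the credential itself.

```python
from __future__ import annotations

AUTH_HEADER_NAMES = {"AUTHORIZATION", "PROXY-AUTHORIZATION"}
REDACTION_MARKER = "[REDACTED]"


def find_unredacted_auth(trace: str) -> list[tuple[int, str]]:
    """Return (line_number, header_name) for auth headers that are not redacted."""
    findings: list[tuple[int, str]] = []
    for number, line in enumerate(trace.splitlines(), start=1):
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "Name: value" header line
        is_auth = name.strip().upper() in AUTH_HEADER_NAMES
        if is_auth and REDACTION_MARKER not in value.upper():
            findings.append((number, name.strip()))
    return findings


sample = "\n".join(
    [
        "REGISTER sip:pbx.example.com SIP/2.0",
        "Authorization: [REDACTED]",
        'Proxy-Authorization: Digest username="alice", response="deadbeef"',
        'WWW-Authenticate: Digest realm="pbx.example.com", nonce="n1"',
    ]
)
print(find_unredacted_auth(sample))
```

Matching on the exact header name keeps `WWW-Authenticate` challenges out of the findings, since they carry no client credentials.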
diff --git a/TODO.md b/TODO.md
@@ -420,6 +420,14 @@ Implement `sipx` in small verified blocks. Current code now has harness core, mo
 - [x] Fixed `LLMChatClient.from_env()` optional env defaults for direct example execution.
 - [x] Added regression coverage for minimal `SIPX_LLM_API_KEY` env config.
 
+## Block 1.7.0 Done
+
+- [x] Bumped package version to `1.7.0`.
+- [x] Added runnable `examples/llm/sip_flow_audit.py`.
+- [x] Added deterministic SIP signals plus structured LLM audit JSON output.
+- [x] Added deterministic auth-redaction checks for SIP-flow audit traces.
+- [x] Documented direct and `sipx scenario run` commands for the quick smoke and SIP-flow audit examples.
+
 ## Blocked Or Pending
 
 - [ ] `python -m ty check` still needs the system interpreter environment synced; configured validation now uses passing `uv run ty check`.
diff --git a/examples/llm/sip_flow_audit.py b/examples/llm/sip_flow_audit.py
@@ -0,0 +1,223 @@
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import os
+from pathlib import Path
+from typing import Any
+
+from sipx import Harness, LLMChatClient, Verdict, scenario
+
+
+SAMPLE_TRACE = """
+SIP RX 203.0.113.20:5060
+SIP/2.0 401 Unauthorized
+WWW-Authenticate: Digest realm="pbx.example.com", nonce="n1", qop="auth"
+CSeq: 1 REGISTER
+
+SIP TX 203.0.113.20:5060
+REGISTER sip:pbx.example.com SIP/2.0
+Authorization: [REDACTED]
+CSeq: 2 REGISTER
+
+SIP RX 203.0.113.20:5060
+SIP/2.0 200 OK
+CSeq: 2 REGISTER
+
+SIP TX 203.0.113.20:5060
+INVITE sip:ivr@example.com SIP/2.0
+Content-Type: application/sdp
+CSeq: 1 INVITE
+
+v=0
+m=audio 41000 RTP/AVP 0 8 101
+a=rtpmap:0 PCMU/8000
+a=rtpmap:101 telephone-event/8000
+
+SIP RX 203.0.113.20:5060
+SIP/2.0 180 Ringing
+CSeq: 1 INVITE
+
+SIP RX 203.0.113.20:5060
+SIP/2.0 200 OK
+Content-Type: application/sdp
+CSeq: 1 INVITE
+
+v=0
+m=audio 52000 RTP/AVP 0 101
+a=rtpmap:0 PCMU/8000
+a=rtpmap:101 telephone-event/8000
+
+SIP TX 203.0.113.20:5060
+ACK sip:ivr@example.com SIP/2.0
+CSeq: 1 ACK
+
+SIP TX 203.0.113.20:5060
+INFO sip:ivr@example.com SIP/2.0
+Content-Type: application/dtmf-relay
+CSeq: 2 INFO
+
+Signal=1
+Duration=160
+
+SIP RX 203.0.113.20:5060
+SIP/2.0 200 OK
+CSeq: 2 INFO
+
+SIP TX 203.0.113.20:5060
+BYE sip:ivr@example.com SIP/2.0
+CSeq: 3 BYE
+
+SIP RX 203.0.113.20:5060
+SIP/2.0 200 OK
+CSeq: 3 BYE
+""".strip()
+
+
+@scenario("sip_flow_audit", provider="openai-compatible")
+async def scenario(h: Harness) -> Verdict:
+    trace = _load_trace_from_env()
+    result = await audit_trace(trace)
+    h.timeline.record("llm", "sip_flow_audit", data=result)
+    print(json.dumps(result, indent=2, sort_keys=True))
+    if result["status"] == "skipped":
+        return Verdict.skipped(reason=str(result["reason"]))
+    if result["deterministic"]["critical_findings"]:
+        return Verdict.failed(reason="deterministic SIP audit found critical issues")
+    return Verdict.passed(reason="SIP flow audit completed")
+
+
+async def audit_trace(trace: str) -> dict[str, Any]:
+    deterministic = _deterministic_audit(trace)
+    if not os.getenv("SIPX_LLM_API_KEY"):
+        return {
+            "status": "skipped",
+            "reason": "SIPX_LLM_API_KEY not set",
+            "deterministic": deterministic,
+        }
+
+    client = LLMChatClient.from_env()
+    prompt = _audit_prompt(trace, deterministic)
+    raw = await client.complete(
+        prompt,
+        system=(
+            "You audit SIP call flows. Return strict JSON only. "
+            "Do not include markdown fences. Do not include secrets."
+        ),
+        max_tokens=1200,
+    )
+    llm = _parse_json_object(raw)
+    return {
+        "status": "completed",
+        "deterministic": deterministic,
+        "llm": llm,
+    }
+
+
+def _load_trace_from_env() -> str:
+    path = os.getenv("SIPX_LLM_TRACE_FILE")
+    if path:
+        return Path(path).read_text(encoding="utf-8")
+    return SAMPLE_TRACE
+
+
+def _deterministic_audit(trace: str) -> dict[str, Any]:
+    upper = trace.upper()
+    critical: list[str] = []
+    warnings: list[str] = []
+    signals = {
+        "register_digest_challenge": "401 UNAUTHORIZED" in upper
+        and "REGISTER" in upper,
+        "invite_has_sdp_offer": "INVITE " in upper
+        and "CONTENT-TYPE: APPLICATION/SDP" in upper
+        and "M=AUDIO" in upper,
+        "dtmf_info": "INFO " in upper and "APPLICATION/DTMF-RELAY" in upper,
+        "clean_bye": "BYE " in upper and "CSEQ: 3 BYE" in upper and "200 OK" in upper,
+    }
+    if "INVITE " in upper and "SIP/2.0 200 OK" in upper:
+        invite_ok_index = upper.find("SIP/2.0 200 OK", upper.find("INVITE "))
+        window = upper[invite_ok_index : invite_ok_index + 500]
+        if invite_ok_index >= 0 and "CONTENT-TYPE: APPLICATION/SDP" not in window:
+            critical.append("INVITE reached 200 OK without an SDP answer nearby")
+    for line in trace.splitlines():
+        normalized = line.strip().upper()
+        if (
+            normalized.startswith(("AUTHORIZATION:", "PROXY-AUTHORIZATION:"))
+            and "[REDACTED]" not in normalized
+        ):
+            critical.append("trace contains an unredacted authorization header")
+            break
+    if "APPLICATION/DTMF-RELAY" in upper and "DURATION=" not in upper:
+        warnings.append("DTMF relay body has no Duration field")
+    return {
+        "signals": signals,
+        "critical_findings": critical,
+        "warnings": warnings,
+    }
+
+
+def _audit_prompt(trace: str, deterministic: dict[str, Any]) -> str:
+    return json.dumps(
+        {
+            "task": "Audit this SIP flow for interoperability and behavior.",
+            "required_json_shape": {
+                "summary": "one paragraph",
+                "behavior": "accepted|rejected|incomplete|unknown",
+                "risk_score": "integer 0-100",
+                "protocol_findings": [
+                    {
+                        "severity": "info|warning|critical",
+                        "evidence": "quote from trace",
+                        "meaning": "what it implies",
+                        "recommendation": "what to do next",
+                    }
+                ],
+                "media_assessment": {
+                    "sdp": "short assessment",
+                    "dtmf": "short assessment",
+                    "rtp_readiness": "short assessment",
+                },
+                "next_actions": ["ordered actions"],
+            },
+            "deterministic_signals": deterministic,
+            "sip_trace": trace,
+        },
+        indent=2,
+    )
+
+
+def _parse_json_object(text: str) -> dict[str, Any]:
+    stripped = text.strip()
+    if stripped.startswith("```"):
+        stripped = stripped.strip("`")
+        if stripped.lower().startswith("json"):
+            stripped = stripped[4:].strip()
+    start = stripped.find("{")
+    end = stripped.rfind("}")
+    if start < 0 or end < start:
+        raise ValueError("LLM response did not contain a JSON object")
+    value = json.loads(stripped[start : end + 1])
+    if not isinstance(value, dict):
+        raise ValueError("LLM response JSON must be an object")
+    return value
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description="Audit a SIP trace with an LLM.")
+    parser.add_argument(
+        "--trace-file",
+        help="Path to a text SIP trace. Defaults to the embedded healthy sample.",
+    )
+    args = parser.parse_args(argv)
+
+    if args.trace_file:
+        os.environ["SIPX_LLM_TRACE_FILE"] = args.trace_file
+    verdict = asyncio.run(Harness().run(scenario))
+    reason = f": {verdict.reason}" if verdict.reason else ""
+    print(f"{verdict.status}{reason}")
+    return 0 if verdict.status in {"passed", "skipped"} else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
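Reviewer note: `_parse_json_object` only guarantees that the LLM reply is a JSON object; it does not check the reply against `required_json_shape`. A minimal sketch of such a check follows. `audit_shape_problems` is a hypothetical helper that is not part of this commit, and the allowed values are copied from the prompt's shape description.

```python
from __future__ import annotations

from typing import Any

ALLOWED_BEHAVIORS = {"accepted", "rejected", "incomplete", "unknown"}
ALLOWED_SEVERITIES = {"info", "warning", "critical"}


def audit_shape_problems(llm: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks valid."""
    problems: list[str] = []

    behavior = llm.get("behavior")
    if not isinstance(behavior, str) or behavior not in ALLOWED_BEHAVIORS:
        problems.append("behavior must be accepted|rejected|incomplete|unknown")

    score = llm.get("risk_score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        problems.append("risk_score must be an integer 0-100")

    findings = llm.get("protocol_findings")
    if not isinstance(findings, list):
        problems.append("protocol_findings must be a list")
    else:
        for index, finding in enumerate(findings):
            severity = finding.get("severity") if isinstance(finding, dict) else None
            if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
                problems.append(f"protocol_findings[{index}].severity is invalid")

    if not isinstance(llm.get("next_actions"), list):
        problems.append("next_actions must be a list")
    return problems


print(audit_shape_problems({"behavior": "accepted", "risk_score": 250}))
```

A caller could record these problems as warnings next to the deterministic findings rather than failing the scenario, which keeps LLM judgment separate from the deterministic checks as V36 requires.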
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "sipx"
-version = "1.6.1"
+version = "1.7.0"
 description = "Python programmable Voice/SIP harness for call automation, IVR testing, technical softphones, and media validation"
 readme = "README.md"
 requires-python = ">=3.14"
diff --git a/tests/test_examples_templates.py b/tests/test_examples_templates.py
diff --git a/uv.lock b/uv.lock