
Conversation

@tbarbugli (Member) commented Oct 27, 2025

  • Deepgram v2 / Flux
  • Vogent turn detection
  • Smart Turn v3
  • Various agent bugfixes
  • Audio utilities

Summary by CodeRabbit

  • New Features

    • Added Vogent and SmartTurn turn-detection systems, a Deepgram streaming STT implementation, AgentOptions, a telemetry example, and new Vogent/SmartTurn tests.
  • Refactor

    • Standardized audio on a shared PcmData/AudioFormat surface, switched event payloads to include participant context, introduced tracing spans across agent flows, and reworked VAD buffering/APIs.
  • Removals

    • Removed Moonshine and Silero plugins, their docs and many associated tests.
  • Documentation

    • Added PCM/audio management guidance, VAD refactor plans, onboarding content, and a repository workflow guide.

@coderabbitai bot commented Oct 27, 2025

Walkthrough

Refactors audio plumbing to use external PcmData/AudioFormat, replaces user_metadata with participant across events, adds AgentOptions and span-based tracing, migrates VAD/turn-detection to PcmData buffers, removes Moonshine/Silero plugins, and introduces SmartTurn/Vogent/Deepgram rewrites plus docs and tests updates.
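
For orientation, the shared audio surface everything now standardizes on looks roughly like this (a minimal sketch; the import path, `AudioFormat.S16`, and the `duration` property are taken from callsites elsewhere in this PR, the rest is illustrative):

```python
# Sketch: constructing audio on the shared PcmData/AudioFormat surface.
import numpy as np
from getstream.video.rtc.track_util import PcmData, AudioFormat

# 100 ms of 16 kHz mono int16 audio (silence as a stand-in signal)
samples = np.zeros(1600, dtype=np.int16)
pcm = PcmData(samples=samples, sample_rate=16000, format=AudioFormat.S16)

print(pcm.duration)  # duration is used for logging in e.g. the Fish STT plugin
```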

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Agent core & tracing**<br>`agents-core/vision_agents/core/agents/agents.py` | Add `AgentOptions` + `default_agent_options()`, accept `options` in `Agent.__init__`, introduce span-based tracing (`span` context manager, `start_tracing`/`end_tracing`), integrate tracing into join/teardown, make `_setup_turn_detection` async, and propagate spans across handlers. |
| **Event model & transport**<br>`agents-core/vision_agents/core/events/base.py`, `plugins/getstream/.../stream_edge_transport.py`, `agents-core/vision_agents/core/edge/sfu_events.py` | Rename `user_metadata` → `participant` on `BaseEvent` and emitted payloads (keep legacy `user_metadata` slot); add private `_participant` + setter on SFU event wrappers; include `participant` when emitting `TrackAdded`/`TrackRemoved`; add tracer usage in stream_edge. |
| **PcmData externalization**<br>`agents-core/vision_agents/core/edge/types.py`, `agents-core/vision_agents/core/edge/events.py`, `agents-core/vision_agents/core/llm/realtime.py`, `conftest.py`, `tests/...`, `plugins/*/*/tts.py`, `plugins/*/*/stt.py` | Remove the local `PcmData` implementation; import `PcmData` and `AudioFormat` from `getstream.video.rtc.track_util`; update callsites to use `AudioFormat.S16` and adapt constructors/usages. |
| **Turn-detection core & API**<br>`agents-core/vision_agents/core/turn_detection/*`, `agents-core/vision_agents/core/turn_detection/events.py` | Replace `speaker_id` with `participant`, remove the legacy `TurnEventData`/protocol, add async `start`/`stop`, change `process_audio` to accept a participant and an optional conversation, and emit `TurnStartedEvent`/`TurnEndedEvent` with updated fields (`trailing_silence_ms`, `duration_ms`). |
| **VAD refactor & base API**<br>`agents-core/vision_agents/core/vad/vad.py`, `docs/vad-refactor-plan*.md`, `tests/test_vad_base.py` | Switch buffering/windowing to `PcmData` rolling buffers; the constructor takes `window_samples`/`channels`/`audio_format` and derives `frame_size`; update frame processing, partial/full emission, and flush/reset semantics; add VAD base docs/tests. |
| **New turn-detection plugins & tests**<br>`plugins/smart_turn/.../smart_turn_detection.py`, `plugins/vogent/.../vogent_turn_detection.py`, `plugins/*/tests/*`, `plugins/vogent/example/basic_usage.py` | Add `SmartTurnDetection` and `VogentTurnDetection` (Silero/Whisper/ONNX/Vogent flows), async model download/start, transcription/completion prediction, tests, examples, and package exports. |
| **Deepgram STT rework**<br>`plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py`, `plugins/deepgram/vision_agents/plugins/deepgram/stt.py` (deleted), `plugins/deepgram/.../utils.py` (deleted) | Replace the legacy Deepgram STT with a new async Flux-style implementation (websocket/client lifecycle, participant-scoped emits); remove the old module and silence utility. |
| **Observability & TTS metrics**<br>`agents-core/vision_agents/core/observability/*`, `agents-core/vision_agents/core/tts/tts.py`, `agents-core/vision_agents/core/tts/manual_test.py` | Remove the `tts_first_byte_ms` and `tts_bytes_streamed` metrics and per-chunk metric emissions; rename `user_metadata` → `participant` on TTS events; adapt test/manual usage. |
| **Plugins removed / workspace updates**<br>`plugins/moonshine/**`, `plugins/silero/**`, `pyproject.toml`, `plugins/vogent/pyproject.toml`, `agents-core/pyproject.toml` | Remove Moonshine and Silero plugin code/tests/packaging; add the vogent plugin with its tests/examples; update workspace membership and dependencies (add torchvision, bump getstream, uv override for numpy). |
| **Utilities & docs**<br>`agents-core/vision_agents/core/utils/utils.py`, `.gitignore`, `DEVELOPMENT.md`, `docs/ai/instructions/*`, `.claude/agents/repo-workflow-guide.md` | Remove `to_mono`, add `input_text` to `Instructions`, add `*.onnx` to `.gitignore`, update audio docs to recommend `PcmData`/`track_util`, add a repo workflow guide and expanded onboarding docs. |
| **Examples & small plugin tweaks**<br>`examples/*`, `plugins/*/example/*`, `examples/other_examples/*` | Update examples to use `TurnDetection(buffer_in_seconds=...)`, add a telemetry helper, adapt transcript handlers to use `event.participant` for user info; minor formatting/annotation normalizations. |
| **Tests: migrate/add/remove**<br>`tests/test_events.py`, `tests/test_tts_base.py`, `tests/test_utils.py`, `tests/test_pcm_data.py` (deleted), `tests/test_vad_base.py` (added), `plugins/*/tests/*` | Migrate tests to `PcmData`/`AudioFormat`, tighten event assertions, add VAD base tests, remove legacy `PcmData` unit tests, and add/remove plugin tests to match the plugin changes. |
| **SFU/events wrappers**<br>`agents-core/vision_agents/core/edge/sfu_events.py` | Add private `_participant` fields and public `participant` setters on SFU event wrappers; preserve payload-derived getters. |
| **Misc & formatting**<br>many files | Numerous minor import/path, string-quote, annotation-style, and formatting normalizations across the codebase. |
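
One new surface from the first row worth calling out: `Agent` now accepts an `options` argument. A hypothetical wiring sketch (only the `AgentOptions`, `default_agent_options`, and `options` names come from this PR; the remaining constructor arguments are elided because they are unchanged):

```python
# Sketch: opting into the new AgentOptions surface.
from vision_agents.core.agents.agents import Agent, AgentOptions, default_agent_options

def build_agent(**agent_kwargs) -> Agent:
    options: AgentOptions = default_agent_options()  # start from defaults, then tweak
    return Agent(options=options, **agent_kwargs)
```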

Sequence Diagram(s)

```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#f3f4f6","actorBorder":"#cbd5e1"}}}%%
sequenceDiagram
    participant App as Application
    participant Agent as Agent
    participant TD as TurnDetection
    participant Audio as AudioSource
    participant Events as EventBus

    App->>+Agent: join(call)
    Agent->>Agent: merge AgentOptions
    Agent->>+TD: await _setup_turn_detection()
    TD->>TD: await start()
    TD-->>-Agent: ready
    Agent-->>-App: session created

    Audio->>+Agent: on_audio_received(pcm, participant)
    Agent->>+TD: process_audio(pcm, participant, conversation)
    TD->>TD: windowing (PcmData)
    alt speech starts
        TD->>Events: emit TurnStartedEvent(participant)
    else speech ends
        TD->>Events: emit TurnEndedEvent(participant)
    end
    TD-->>-Agent: processed
```
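
The PR's tests exercise this flow directly; condensed into a usage sketch (signatures and import paths are taken from the SmartTurn/Vogent tests below; the audio argument is a placeholder `PcmData`):

```python
# Sketch of the new turn-detection API, adapted from this PR's tests.
from vision_agents.core.edge.types import Participant
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

async def detect_turns(td, pcm) -> None:
    """Feed one PcmData buffer through a TurnDetector and log turn events."""
    participant = Participant(user_id="mia", original={})
    conversation = InMemoryConversation(instructions="be nice", messages=[])

    @td.events.subscribe
    async def on_start(event: TurnStartedEvent):
        print("turn started for", event.participant)

    @td.events.subscribe
    async def on_stop(event: TurnEndedEvent):
        print("turn ended for", event.participant)

    await td.start()
    await td.process_audio(pcm, participant, conversation)
    await td.stop()
```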
```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#fffaf0","actorBorder":"#e6e1d9"}}}%%
sequenceDiagram
    participant Smart as SmartTurnDetection
    participant Silero as SileroVAD
    participant Whisper as WhisperFeature
    participant ONNX as Turn ONNX
    participant Events as EventBus

    Smart->>+Smart: await start()
    Smart->>+Silero: prepare silero (async)
    Smart->>+Whisper: prepare whisper (thread)
    Smart->>+ONNX: prepare onnx (thread)
    Silero-->>-Smart: ready
    Whisper-->>-Smart: ready
    ONNX-->>-Smart: ready

    Note over Smart,Silero: per-chunk processing
    Smart->>+Silero: prob(chunk)
    Silero-->>-Smart: prob
    alt prob > threshold
        Smart->>Smart: buffer segment
        Smart->>+Whisper: transcribe segment
        Whisper-->>-Smart: transcript
        Smart->>+ONNX: predict completion(features)
        ONNX-->>-Smart: completion_prob
        alt completion_prob > vogent_threshold
            Smart->>Events: emit TurnEndedEvent(participant)
        end
    end
```
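
In code terms, the per-chunk branch in this diagram reduces to a small decision loop. A self-contained skeleton (all callables are placeholders for the real Silero/Whisper/ONNX sessions; thresholds are illustrative defaults):

```python
# Illustrative skeleton of the SmartTurn per-chunk flow shown above.
from typing import Awaitable, Callable, List

async def on_chunk(
    chunk: bytes,
    buffer: List[bytes],
    vad_prob: Callable[[bytes], Awaitable[float]],        # Silero VAD
    transcribe: Callable[[List[bytes]], Awaitable[str]],  # Whisper
    predict_completion: Callable[[str], float],           # turn ONNX head
    emit_turn_ended: Callable[[], None],
    vad_threshold: float = 0.5,
    completion_threshold: float = 0.5,
) -> None:
    prob = await vad_prob(chunk)
    if prob <= vad_threshold:
        return                              # no speech in this chunk
    buffer.append(chunk)                    # grow the current speech segment
    transcript = await transcribe(buffer)   # transcribe the buffered segment
    if predict_completion(transcript) > completion_threshold:
        emit_turn_ended()                   # the model judges the turn complete
```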

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus on:

  • agents async initialization, AgentOptions propagation, and tracing lifecycle (agents-core/vision_agents/core/agents/agents.py)
  • Turn-detection signature/event refactor and cross-repo callsites (core, plugins, tests)
  • New SmartTurnDetection and VogentTurnDetection modules (async model downloads, ONNX/Whisper integration)
  • VAD refactor to PcmData rolling buffers and duration/frame calculations
  • Global migration to external PcmData/AudioFormat ensuring no leftover local assumptions
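
On the last point, a quick repo audit can catch leftovers (a sketch; the old import path is inferred from the removed local module in agents-core/vision_agents/core/edge/types.py):

```bash
# Any import still pointing at the removed local PcmData is a migration miss.
rg -n "from vision_agents\.core\.edge\.types import .*PcmData" --glob '!**/.venv/**'

# Cross-check that callsites import from the external surface instead.
rg -n "from getstream\.video\.rtc\.track_util import" --glob '!**/.venv/**'
```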


Suggested reviewers

  • maxkahan
  • dangusev
  • d3xvn

Poem

I name the participant; the room goes black,
a thin wire of sound snaps, opening its mouth.
Buffers gather like ash, windows shuttering time—
a small, precise end is measured, boxed and signed.
The turn folds inward; a dark petal of silence falls.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 49.83%, below the required 80.00% threshold. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title references three real, significant components of the changeset: Vogent (new turn-detection plugin), Smart Turn (turn detection v3), and audio utilities (externalized PcmData usage), and these align with the stated PR objectives. However, it omits other substantial changes: the agent architecture refactoring (AgentOptions, tracing integration), removal of the Silero and Moonshine plugins, the user_metadata → participant event rename, and the wide-reaching PcmData externalization. The "WIP" prefix suggests incomplete work, though the changes appear comprehensive. The title captures highlights but not the full scope. |
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch vogent-tommaso

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 21

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
agents-core/vision_agents/core/utils/utils.py (1)

84-97: Add empty bytes validation in gemini_realtime.py:440-442.

The function now returns empty bytes on conversion failure instead of raising exceptions. The current try-except block in gemini_realtime.py will not catch this silent failure—empty bytes will be passed directly to Blob() and sent to the API, masking the underlying conversion error.

Add an explicit check after calling frame_to_png_bytes():

png_bytes = frame_to_png_bytes(frame)
if not png_bytes:
    self.logger.error("Failed to convert frame to PNG bytes")
    return
blob = Blob(data=png_bytes, mime_type="image/png")

The test in test_gemini_realtime.py correctly validates the empty bytes case (lines 96-97).

docs/ai/instructions/ai-tts.md (1)

18-25: Update example to use AudioFormat enum.

The example still uses format="s16" (string literal), but the codebase has migrated to format=AudioFormat.S16. Update the documentation to reflect current best practices.

- from getstream.video.rtc.track_util import PcmData
+ from getstream.video.rtc.track_util import PcmData, AudioFormat

  async def stream_audio(self, text: str, *_, **__) -> PcmData:
      audio_bytes = await my_sdk.tts.bytes(text=..., ...)
      # sample_rate, channels and format depend on what the TTS model returns
-     return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format="s16")
+     return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format=AudioFormat.S16)
agents-core/vision_agents/core/agents/agents.py (1)

825-879: Turn events use participant, not speaker_id → barge-in and responses won’t trigger.

The plugins emit TurnStartedEvent/TurnEndedEvent(participant=...), but this handler checks event.speaker_id. This prevents TTS interruption and LLM responses.

-        if isinstance(event, TurnStartedEvent):
-            # Interrupt TTS when user starts speaking (barge-in)
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+        if isinstance(event, TurnStartedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
+            # Interrupt TTS when user starts speaking (barge-in)
+            if speaker_id and speaker_id != self.agent_user.id:
                 if self.tts:
                     self.logger.info(
-                        f"👉 Turn started - interrupting TTS for participant {event.speaker_id}"
+                        f"👉 Turn started - interrupting TTS for participant {speaker_id}"
                     )
                     try:
                         await self.tts.stop_audio()
                     except Exception as e:
                         self.logger.error(f"Error stopping TTS: {e}")
                 else:
                     self.logger.info(
-                        f"👉 Turn started - participant speaking {event.speaker_id} : {event.confidence}"
+                        f"👉 Turn started - participant speaking {speaker_id} : {event.confidence}"
                     )
             else:
                 # Agent itself started speaking - this is normal
                 self.logger.debug(
-                    f"👉 Turn started - agent speaking {event.speaker_id}"
+                    f"👉 Turn started - agent speaking {speaker_id}"
                 )
-        elif isinstance(event, TurnEndedEvent):
+        elif isinstance(event, TurnEndedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
             self.logger.info(
-                f"👉 Turn ended - participant {event.speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
+                f"👉 Turn ended - participant {speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
             )
             # When turn detection is enabled, trigger LLM response when user's turn ends
             # This is the signal that the user has finished speaking and expects a response
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+            if speaker_id and speaker_id != self.agent_user.id:
                 # Get the accumulated transcript for this speaker
-                transcript = self._pending_user_transcripts.get(event.speaker_id, "")
+                transcript = self._pending_user_transcripts.get(speaker_id, "")
                 if transcript and transcript.strip():
                     self.logger.info(
-                        f"🤖 Triggering LLM response after turn ended for {event.speaker_id}"
+                        f"🤖 Triggering LLM response after turn ended for {speaker_id}"
                     )
-                    # Create participant object if we have metadata
-                    participant = None
-                    if hasattr(event, "custom") and event.custom:
-                        # Try to extract participant info from custom metadata
-                        participant = event.custom.get("participant")
+                    # Prefer the participant carried by the event
+                    participant = getattr(event, "participant", None)
                     # Trigger LLM response with the complete transcript
                     if self.llm:
                         await self.simple_response(transcript, participant)
                     # Clear the pending transcript for this speaker
-                    self._pending_user_transcripts[event.speaker_id] = ""
+                    self._pending_user_transcripts[speaker_id] = ""
agents-core/vision_agents/core/turn_detection/events.py (2)

12-25: Update the docstring to match the actual implementation.

The docstring references speaker_id and duration fields, but the actual implementation uses participant (line 23) and duration is not present. This creates confusion for developers reading the API documentation.

Apply this diff to update the docstring:

     """
     Event emitted when a speaker starts their turn.
     
     Attributes:
-        speaker_id: ID of the speaker who started speaking
+        participant: Participant who started speaking
         confidence: Confidence level of the turn detection (0.0-1.0)
-        duration: Duration of audio processed (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """

31-43: Update the docstring to match the actual implementation.

The docstring references speaker_id and duration fields, but the implementation uses participant (line 41) and duration is not present.

Apply this diff:

     """
     Event emitted when a speaker completes their turn.
     
     Attributes:
-        speaker_id: ID of the speaker who finished speaking
+        participant: Participant who finished speaking
         confidence: Confidence level of the turn completion detection (0.0-1.0)
-        duration: Duration of the turn (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """
♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/tts.py (1)

45-47: Verify AudioFormat.S16 enum value exists.

Same concern as in elevenlabs plugin: ensure getstream.video.rtc.track_util.AudioFormat.S16 exists, as the relevant code snippet shows PCM_S16 in the internal enum.
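
A quick way to settle it locally (one-liner sketch; assumes the track_util import path used elsewhere in this PR):

```python
# List the AudioFormat members to confirm whether S16 (vs. PCM_S16) exists.
from getstream.video.rtc.track_util import AudioFormat

print([member.name for member in AudioFormat])
```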

🧹 Nitpick comments (15)
plugins/fish/vision_agents/plugins/fish/stt.py (1)

107-109: Consider clarifying the error log message.

The log message includes pcm_data.duration but doesn't explicitly label what the value represents. For better debuggability, consider making it explicit.

-                logger.error(
-                    "No transcript returned from Fish Audio %s", pcm_data.duration
-                )
+                logger.error(
+                    "No transcript returned from Fish Audio (audio duration: %s)", pcm_data.duration
+                )
docs/ai/instructions/ai-stt.md (1)

20-41: Add missing imports and align PCM guidance in example.

The example uses PcmData and Participant without imports. Add a minimal prelude and point to the new track_util path.

 ```python
-from vision_agents.core import stt
-from vision_agents.core.stt.events import TranscriptResponse
+from typing import Optional
+from vision_agents.core import stt
+from vision_agents.core.stt.events import TranscriptResponse
+from vision_agents.core.edge.types import Participant
+from getstream.video.rtc.track_util import PcmData
 ...
 class MySTT(stt.STT):
 ```

The new “PCM / Audio management” note is great; consider linking to a short snippet:

```python
from getstream.video.rtc.track_util import PcmData, AudioFormat

pcm = PcmData(samples=my_np_int16, sample_rate=16000, format=AudioFormat.S16)
```

Also applies to: 46-50

plugins/openai/tests/test_openai_realtime.py (1)

68-79: Dedup imports; prefer resample_poly for 16k→48k.

  • Move PcmData/AudioFormat imports to module scope to avoid repeated per-test imports.
  • For 16k→48k, scipy.signal.resample_poly(up=3, down=1) is faster and with better fidelity than FFT resample for integer-rate conversion.
-        from scipy import signal
-        from getstream.video.rtc.track_util import PcmData, AudioFormat
+        from scipy import signal
@@
-        samples_48k = signal.resample(samples_16k, num_samples_48k).astype(np.int16)
+        samples_48k = signal.resample_poly(
+            samples_16k.astype(np.float32), up=3, down=1
+        ).astype(np.int16)
@@
-        audio_48khz = PcmData(
-            samples=samples_48k, sample_rate=48000, format=AudioFormat.S16
-        )
+        audio_48khz = PcmData(samples=samples_48k, sample_rate=48000, format=AudioFormat.S16)

At top-of-file:

-from dotenv import load_dotenv
+from dotenv import load_dotenv
+from getstream.video.rtc.track_util import PcmData, AudioFormat

Apply the same import cleanup to other tests constructing PcmData (lines ~242-249, ~299-307). If track_util exposes a resample helper, prefer that instead.

Also applies to: 242-249, 299-307

plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)

38-41: Set provider_name for consistency/telemetry.

Other plugins pass provider_name to the base TTS; do the same here.

-        super().__init__()
+        super().__init__(provider_name="cartesia")

The AudioFormat/S16 switch looks good.

Also applies to: 80-82

plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)

50-53: Use asyncio.to_thread instead of get_event_loop/run_in_executor.

Cleaner and future-proof in 3.11+.

-        loop = asyncio.get_event_loop()
-        chunks: List[bytes] = await loop.run_in_executor(
-            None, lambda: list(self._generate_chunks(text))
-        )
+        chunks: List[bytes] = await asyncio.to_thread(
+            lambda: list(self._generate_chunks(text))
+        )

The AudioFormat.S16 switch in PcmData.from_bytes looks correct.

Also applies to: 56-62

plugins/wizper/vision_agents/plugins/wizper/stt.py (1)

85-87: Centralized PCM→WAV conversion is solid; consider eliminating temp file for better serverless performance.

The refactoring from custom WAV code to pcm_data.to_wav_bytes() is a good move. Currently, the code writes bytes to disk and uploads via file path. Since fal-client's AsyncClient supports upload(data, content_type) with raw bytes, you can pass wav_data directly with content_type="audio/wav" and skip the temp file entirely—eliminating unnecessary I/O in serverless environments while keeping cleanup logic simpler.
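
A minimal sketch of that direct-bytes path (assumes AsyncClient.upload(data, content_type) accepts raw bytes and returns a URL, as described above; names are illustrative):

```python
# Sketch: upload WAV bytes directly instead of round-tripping through a temp file.
import fal_client

async def upload_wav(pcm_data) -> str:
    wav_data = pcm_data.to_wav_bytes()  # centralized PCM -> WAV conversion
    client = fal_client.AsyncClient()
    # Hypothetical call shape per the comment above: raw bytes + content type.
    return await client.upload(wav_data, content_type="audio/wav")
```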

.claude/agents/repo-workflow-guide.md (1)

1-6: Frontmatter readability: prefer YAML block scalar for multi-line description.

Replace embedded “\n” escapes with a block scalar for clarity and fewer parsing gotchas.

 ---
 name: repo-workflow-guide
-description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
+description: |
+  Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.
+
+  Examples:
+  - <example>
+    Context: User wants to add a new feature to the project.
+    user: "I need to implement a new authentication module"
+    assistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."
+    <Task tool call to repo-workflow-guide>
+    assistant: "Based on the project guidelines, here's how we should approach this..."
+    </example>
+
+  - <example>
+    Context: User asks a question about code organization.
+    user: "Where should I put the new utility functions?"
+    assistant: "Let me check the repository workflow guidelines to give you the correct answer."
+    <Task tool call to repo-workflow-guide>
+    assistant: "According to the project structure guidelines..."
+    </example>
+
+  - <example>
+    Context: Starting a new task that requires understanding project conventions.
+    user: "Can you help me refactor this component?"
+    assistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."
+    <Task tool call to repo-workflow-guide>
+    </example>
 ---
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)

107-121: ORT sessions: specify providers and threads for deterministic CPU behavior.

Standardize provider and thread config (as in SileroVAD) to avoid unexpected GPU selection and reduce contention.

-        # Load ONNX session in thread pool to avoid blocking event loop
-        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)
+        # Load ONNX session in thread pool to avoid blocking event loop
+        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)

And change build_session:

-def build_session(onnx_path):
-    so = ort.SessionOptions()
-    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
-    so.inter_op_num_threads = 1
-    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+def build_session(onnx_path):
+    so = ort.SessionOptions()
+    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
+    so.inter_op_num_threads = 1
+    so.intra_op_num_threads = 1
+    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )

268-310: Model download: avoid buffering entire file in memory and use logging instead of print.

Stream to disk as you download; replace prints with a module logger.

-async def ensure_model(path: str, url: str) -> str:
+import logging
+logger = logging.getLogger(__name__)
+
+async def ensure_model(path: str, url: str) -> str:
@@
-    if not os.path.exists(path):
+    if not os.path.exists(path):
         model_name = os.path.basename(path)
-        print(f"Downloading {model_name}...")
+        logger.info("Downloading %s ...", model_name)
@@
-            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
-                async with client.stream("GET", url) as response:
+            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+                async with client.stream("GET", url) as response:
                     response.raise_for_status()
-                    
-                    # Write file in chunks to avoid loading entire file in memory
-                    # Use asyncio.to_thread for blocking file I/O operations
-                    chunks = []
-                    async for chunk in response.aiter_bytes(chunk_size=8192):
-                        chunks.append(chunk)
-                    
-                    # Write all chunks to file in thread to avoid blocking event loop
-                    def write_file():
-                        with open(path, "wb") as f:
-                            for chunk in chunks:
-                                f.write(chunk)
-                    
-                    await asyncio.to_thread(write_file)
+                    # Stream directly to file to keep memory bounded
+                    def write_stream():
+                        with open(path, "wb") as f:
+                            for chunk in response.iter_raw():
+                                if chunk:
+                                    f.write(chunk)
+                    await asyncio.to_thread(write_stream)
@@
-            print(f"{model_name} downloaded.")
+            logger.info("%s downloaded.", model_name)

Note: If you prefer pure-async file I/O, switch to aiofiles and write chunks inside the async loop.

agents-core/vision_agents/core/turn_detection/turn_detection.py (1)

68-74: Lifecycle methods implement default behavior.

The start() and stop() methods provide sensible defaults by setting is_active appropriately. However, the docstrings suggest subclasses "want to" override these, but they're not marked as abstract.

Consider whether these should remain concrete with default implementations or be marked as abstract to enforce subclass implementation. The current approach (concrete with defaults) is reasonable for optional warmup/cleanup, but the docstrings could be clearer:

 async def start(self) -> None:
-    """Some turn detection systems want to run warmup etc here"""
+    """Start turn detection. Override to add warmup logic if needed."""
     self.is_active = True

 async def stop(self) -> None:
-    """Again, some turn detection systems want to run cleanup here"""
+    """Stop turn detection. Override to add cleanup logic if needed."""
     self.is_active = False
agents-core/vision_agents/core/edge/events.py (1)

12-12: Consider consistent string quote style.

The string literals for type fields were changed from single quotes to double quotes. While functionally equivalent, ensure this aligns with the project's style guidelines for consistency across the codebase.

Also applies to: 21-21, 31-31, 41-41

DEVELOPMENT.md (1)

326-338: Fix markdown formatting for headers.

The static analysis tool flagged that emphasis (bold text) is being used instead of proper markdown headings for "Audio Formats", "Using Cursor", and "Learning Roadmap".

Apply this diff to use proper markdown headings:

 ## Onboarding Plan for new contributors
 
-**Audio Formats**
+### Audio Formats
 
 You'll notice that audio comes in many formats. PCM, wav, mp3. 16khz, 48khz. 
 Encoded as i16 or f32. Note that webrtc by default is 48khz.
 
 A good first intro to audio formats can be found here:
 
-**Using Cursor**
+### Using Cursor
 
 You can ask cursor something like "read @ai-plugin and build me a plugin called fish"
 See the docs folder for other ai instruction files
 
-**Learning Roadmap**
+### Learning Roadmap
plugins/smart_turn/tests/test_turn2.py (1)

62-85: Reduce sleep duration and add integration marker.

Line 81 includes a 5-second sleep which significantly slows down test execution. Additionally, this test performs end-to-end turn detection with model inference and should be marked as an integration test.

Apply this diff:

+    @pytest.mark.integration
     async def test_turn_detection(self, td, mia_audio_16khz):
         participant = Participant(user_id="mia", original={})
         conversation = InMemoryConversation(instructions="be nice", messages=[])
         event_order = []

         # Subscribe to events
         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
             logger.info(f"Smart turn turn started on {event.session_id}")
             event_order.append("start")

         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
             logger.info(f"Smart turn turn ended on {event.session_id}")
             event_order.append("stop")

         await td.process_audio(mia_audio_16khz, participant, conversation)
         await asyncio.sleep(0.001)

-        await asyncio.sleep(5)
+        await asyncio.sleep(0.1)

         # Verify that turn detection is working - we should get at least some turn events
         # With continuous processing, we may get multiple start/stop cycles
         assert event_order == ["start", "stop"]

Based on coding guidelines.

conftest.py (1)

175-226: Consider reusing existing fixture to reduce duplication.

The audio loading logic in mia_audio_48khz_chunked duplicates the mia_audio_48khz fixture. You could reduce duplication by loading the full audio first, then chunking it.

 @pytest.fixture
 def mia_audio_48khz_chunked():
-    """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks."""
-    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
-
-    # Load audio file using PyAV
-    container = av.open(audio_file_path)
-    audio_stream = container.streams.audio[0]
-    original_sample_rate = audio_stream.sample_rate
-    target_rate = 48000
-
-    # Create resampler if needed
-    resampler = None
-    if original_sample_rate != target_rate:
-        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)
-
-    # Read all audio frames
-    samples = []
-    for frame in container.decode(audio_stream):
-        # Resample if needed
-        if resampler:
-            frame = resampler.resample(frame)[0]
-
-        # Convert to numpy array
-        frame_array = frame.to_ndarray()
-        if len(frame_array.shape) > 1:
-            # Convert stereo to mono
-            frame_array = np.mean(frame_array, axis=0)
-        samples.append(frame_array)
-
-    # Concatenate all samples
-    samples = np.concatenate(samples)
-
-    # Convert to int16
-    samples = samples.astype(np.int16)
-    container.close()
+    """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks."""
+    # Reuse the full audio fixture logic
+    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
+    container = av.open(audio_file_path)
+    audio_stream = container.streams.audio[0]
+    original_sample_rate = audio_stream.sample_rate
+    target_rate = 48000
+    
+    resampler = None
+    if original_sample_rate != target_rate:
+        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)
+    
+    samples = []
+    for frame in container.decode(audio_stream):
+        if resampler:
+            frame = resampler.resample(frame)[0]
+        frame_array = frame.to_ndarray()
+        if len(frame_array.shape) > 1:
+            frame_array = np.mean(frame_array, axis=0)
+        samples.append(frame_array)
+    
+    samples = np.concatenate(samples).astype(np.int16)
+    container.close()
 
     # Calculate chunk size for 20ms at 48kHz
     chunk_size = int(target_rate * 0.020)  # 960 samples per 20ms
agents-core/vision_agents/core/vad/vad.py (1)

408-416: Consider simplifying frame availability check.

The frame_data_available check is functionally correct but complex. Consider extracting to a helper method for readability.

+    def _has_buffered_audio(self) -> bool:
+        """Check if speech buffer contains audio data."""
+        if self.speech_buffer is None:
+            return False
+        if isinstance(self.speech_buffer.samples, np.ndarray):
+            return len(self.speech_buffer.samples) > 0
+        if isinstance(self.speech_buffer.samples, (bytes, bytearray)):
+            return len(self.speech_buffer.samples) > 0
+        return False
+
     def _emit_error_event(
         self,
         error: Exception,
         context: str = "",
         user_metadata: Optional[Dict[str, Any]] = None,
     ):
         """Emit a structured error event."""
         self.events.send(
             VADErrorEvent(
                 session_id=self.session_id,
                 plugin_name=self.provider_name,
                 error=error,
                 context=context,
                 user_metadata=user_metadata,
-                frame_data_available=(
-                    self.speech_buffer is not None
-                    and (
-                        (isinstance(self.speech_buffer.samples, np.ndarray) and len(self.speech_buffer.samples) > 0)
-                        or (isinstance(self.speech_buffer.samples, (bytes, bytearray)) and len(self.speech_buffer.samples) > 0)
-                    )
-                ),
+                frame_data_available=self._has_buffered_audio(),
             )
         )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1072e8 and f3aee97.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (74)
  • .claude/agents/repo-workflow-guide.md (1 hunks)
  • .gitignore (1 hunks)
  • DEVELOPMENT.md (3 hunks)
  • agents-core/pyproject.toml (2 hunks)
  • agents-core/vision_agents/core/agents/agents.py (4 hunks)
  • agents-core/vision_agents/core/edge/events.py (3 hunks)
  • agents-core/vision_agents/core/edge/types.py (1 hunks)
  • agents-core/vision_agents/core/llm/realtime.py (2 hunks)
  • agents-core/vision_agents/core/observability/__init__.py (0 hunks)
  • agents-core/vision_agents/core/observability/metrics.py (0 hunks)
  • agents-core/vision_agents/core/tts/manual_test.py (2 hunks)
  • agents-core/vision_agents/core/tts/tts.py (0 hunks)
  • agents-core/vision_agents/core/turn_detection/__init__.py (0 hunks)
  • agents-core/vision_agents/core/turn_detection/events.py (3 hunks)
  • agents-core/vision_agents/core/turn_detection/turn_detection.py (2 hunks)
  • agents-core/vision_agents/core/utils/utils.py (4 hunks)
  • agents-core/vision_agents/core/vad/vad.py (13 hunks)
  • aiortc (1 hunks)
  • conftest.py (3 hunks)
  • docs/ai/instructions/ai-plugin.md (2 hunks)
  • docs/ai/instructions/ai-realtime-llm.md (5 hunks)
  • docs/ai/instructions/ai-stt.md (2 hunks)
  • docs/ai/instructions/ai-tests.md (1 hunks)
  • docs/ai/instructions/ai-tts.md (2 hunks)
  • docs/ai/instructions/ai-turn-detector.md (1 hunks)
  • docs/vad-refactor-plan-extended.md (1 hunks)
  • docs/vad-refactor-plan.md (1 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (1 hunks)
  • plugins/aws/example/aws_qwen_example.py (1 hunks)
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py (8 hunks)
  • plugins/aws/vision_agents/plugins/aws/tts.py (2 hunks)
  • plugins/cartesia/vision_agents/plugins/cartesia/tts.py (2 hunks)
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (2 hunks)
  • plugins/fish/example/fish_example.py (1 hunks)
  • plugins/fish/vision_agents/plugins/fish/stt.py (2 hunks)
  • plugins/fish/vision_agents/plugins/fish/tts.py (2 hunks)
  • plugins/kokoro/vision_agents/plugins/kokoro/tts.py (2 hunks)
  • plugins/krisp/vision_agents/plugins/krisp/turn_detection.py (8 hunks)
  • plugins/moonshine/README.md (0 hunks)
  • plugins/moonshine/tests/test_moonshine_stt.py (0 hunks)
  • plugins/moonshine/vision_agents/plugins/moonshine/__init__.py (0 hunks)
  • plugins/moonshine/vision_agents/plugins/moonshine/stt.py (0 hunks)
  • plugins/openai/tests/test_openai_realtime.py (4 hunks)
  • plugins/openai/vision_agents/plugins/openai/tts.py (2 hunks)
  • plugins/openrouter/README.md (1 hunks)
  • plugins/openrouter/example/openrouter_example.py (1 hunks)
  • plugins/sample_plugin/example/my_example.py (1 hunks)
  • plugins/silero/README.md (0 hunks)
  • plugins/silero/pyproject.toml (0 hunks)
  • plugins/silero/tests/test_bench.py (0 hunks)
  • plugins/silero/tests/test_performance.py (0 hunks)
  • plugins/silero/tests/test_vad.py (0 hunks)
  • plugins/silero/vision_agents/plugins/silero/__init__.py (0 hunks)
  • plugins/silero/vision_agents/plugins/silero/vad.py (0 hunks)
  • plugins/smart_turn/README.md (1 hunks)
  • plugins/smart_turn/pyproject.toml (1 hunks)
  • plugins/smart_turn/tests/test_turn2.py (1 hunks)
  • plugins/smart_turn/tests/test_turn_detection.py (0 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py (1 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1 hunks)
  • plugins/vogent/README.md (1 hunks)
  • plugins/vogent/example/basic_usage.py (1 hunks)
  • plugins/vogent/pyproject.toml (2 hunks)
  • plugins/vogent/tests/test_vogent_td.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/__init__.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (7 hunks)
  • plugins/wizper/vision_agents/plugins/wizper/stt.py (1 hunks)
  • pyproject.toml (2 hunks)
  • tests/base_test.py (5 hunks)
  • tests/test_events.py (7 hunks)
  • tests/test_pcm_data.py (0 hunks)
  • tests/test_tts_base.py (2 hunks)
  • tests/test_utils.py (10 hunks)
  • tests/test_vad_base.py (1 hunks)
💤 Files with no reviewable changes (17)
  • agents-core/vision_agents/core/turn_detection/__init__.py
  • plugins/moonshine/tests/test_moonshine_stt.py
  • plugins/moonshine/README.md
  • plugins/silero/README.md
  • plugins/silero/vision_agents/plugins/silero/vad.py
  • agents-core/vision_agents/core/observability/metrics.py
  • agents-core/vision_agents/core/tts/tts.py
  • plugins/silero/tests/test_performance.py
  • plugins/moonshine/vision_agents/plugins/moonshine/__init__.py
  • plugins/moonshine/vision_agents/plugins/moonshine/stt.py
  • agents-core/vision_agents/core/observability/__init__.py
  • plugins/silero/tests/test_bench.py
  • plugins/smart_turn/tests/test_turn_detection.py
  • plugins/silero/tests/test_vad.py
  • tests/test_pcm_data.py
  • plugins/silero/pyproject.toml
  • plugins/silero/vision_agents/plugins/silero/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/openrouter/example/openrouter_example.py
  • plugins/aws/vision_agents/plugins/aws/tts.py
  • plugins/aws/example/aws_qwen_example.py
  • plugins/fish/vision_agents/plugins/fish/tts.py
  • examples/01_simple_agent_example/simple_agent_example.py
  • plugins/wizper/vision_agents/plugins/wizper/stt.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
  • plugins/fish/vision_agents/plugins/fish/stt.py
  • plugins/openai/tests/test_openai_realtime.py
  • agents-core/vision_agents/core/edge/types.py
  • tests/test_utils.py
  • plugins/kokoro/vision_agents/plugins/kokoro/tts.py
  • tests/test_vad_base.py
  • plugins/sample_plugin/example/my_example.py
  • agents-core/vision_agents/core/edge/events.py
  • tests/test_events.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py
  • plugins/vogent/example/basic_usage.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/fish/example/fish_example.py
  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • conftest.py
  • plugins/openai/vision_agents/plugins/openai/tts.py
  • plugins/vogent/tests/test_vogent_td.py
  • plugins/cartesia/vision_agents/plugins/cartesia/tts.py
  • plugins/krisp/vision_agents/plugins/krisp/turn_detection.py
  • agents-core/vision_agents/core/tts/manual_test.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py
  • agents-core/vision_agents/core/llm/realtime.py
  • agents-core/vision_agents/core/turn_detection/events.py
  • tests/base_test.py
  • plugins/vogent/vision_agents/plugins/vogent/__init__.py
  • tests/test_tts_base.py
  • agents-core/vision_agents/core/vad/vad.py
  • plugins/smart_turn/tests/test_turn2.py
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py
  • agents-core/vision_agents/core/utils/utils.py
  • plugins/vogent/vision_agents/plugins/vogent/turn_detection.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_utils.py
  • tests/test_vad_base.py
  • tests/test_events.py
  • tests/base_test.py
  • tests/test_tts_base.py
🧠 Learnings (2)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Use uv to run Python and tests; do not invoke python -m directly

Applied to files:

  • docs/ai/instructions/ai-tests.md
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Write tests using pytest (avoid unittest.TestCase or other frameworks)

Applied to files:

  • docs/ai/instructions/ai-tests.md
🧬 Code graph analysis (27)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/turn_detection/turn_detection.py (5)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
  • _emit_start_turn_event (37-42)
  • _emit_end_turn_event (44-49)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/openai/tests/test_openai_realtime.py (3)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
conftest.py (1)
  • mia_audio_16khz (90-129)
tests/base_test.py (1)
  • mia_audio_16khz (18-58)
tests/test_utils.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
tests/test_vad_base.py (3)
agents-core/vision_agents/core/vad/vad.py (5)
  • VAD (35-433)
  • is_speech (128-138)
  • process_audio (140-189)
  • flush (375-382)
  • close (418-433)
agents-core/vision_agents/core/vad/events.py (2)
  • VADAudioEvent (30-40)
  • VADPartialEvent (44-55)
conftest.py (1)
  • get_assets_dir (78-80)
agents-core/vision_agents/core/edge/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (50-52)
tests/test_events.py (2)
agents-core/vision_agents/core/events/manager.py (1)
  • send (426-468)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • level (496-500)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/vogent/example/basic_usage.py (2)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
  • TurnDetection (26-315)
  • start (37-38)
  • process_audio (101-159)
  • stop (300-315)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • start (68-70)
  • process_audio (52-64)
  • stop (72-74)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (97-105)
  • process_audio (122-160)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • start (68-70)
  • process_audio (52-64)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
  • start (37-38)
  • process_audio (101-159)
agents-core/vision_agents/core/turn_detection/turn_detection.py (4)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • process_audio (122-160)
  • start (97-105)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
  • process_audio (101-159)
  • start (37-38)
conftest.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
  • join (237-289)
plugins/openai/vision_agents/plugins/openai/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/vogent/tests/test_vogent_td.py (5)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
  • TurnDetection (26-315)
  • start (37-38)
  • stop (300-315)
  • process_audio (101-159)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/stt/events.py (2)
  • STTTranscriptEvent (16-47)
  • STTPartialTranscriptEvent (51-78)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnEndedEvent (29-43)
  • TurnStartedEvent (11-25)
conftest.py (1)
  • mia_audio_48khz_chunked (176-226)
plugins/krisp/vision_agents/plugins/krisp/turn_detection.py (2)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (29-31)
agents-core/vision_agents/core/tts/manual_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/edge/types.py (1)
  • play_pcm_with_ffplay (57-111)
plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1)
  • SmartTurnDetection (42-204)
agents-core/vision_agents/core/llm/realtime.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • Participant (229-270)
tests/base_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
conftest.py (3)
  • assets_dir (84-86)
  • bunny_video_track (241-285)
  • RealVideoTrack (247-279)
plugins/vogent/vision_agents/plugins/vogent/__init__.py (1)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (1)
  • TurnDetection (26-315)
tests/test_tts_base.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
  • stream_audio (61-94)
agents-core/vision_agents/core/vad/vad.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/vad/events.py (1)
  • VADAudioEvent (30-40)
plugins/smart_turn/tests/test_turn2.py (5)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6)
  • ensure_model (268-309)
  • SileroVAD (207-265)
  • SmartTurnDetection (42-204)
  • start (97-105)
  • prob (240-265)
  • process_audio (122-160)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • start (68-70)
  • stop (72-74)
  • process_audio (52-64)
agents-core/vision_agents/core/agents/agents.py (1)
  • subscribe (189-201)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (3)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/llm/events.py (1)
  • RealtimeAudioOutputEvent (37-45)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (29-31)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
🪛 LanguageTool
docs/vad-refactor-plan.md

[grammar] ~91-~91: Use a hyphen to join words.
Context: ...n counts via real assets. - Add base-VAD specific tests (future work): - Normal...

(QB_NEW_EN_HYPHEN)

DEVELOPMENT.md

[grammar] ~328-~328: Ensure spelling is correct
Context: ... in many formats. PCM, wav, mp3. 16khz, 48khz. Encoded as i16 or f32. Note that webr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
DEVELOPMENT.md

326-326: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


333-333: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


338-338: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

Comment on lines +639 to 644
     async def _setup_turn_detection(self):
         if self.turn_detection:
             self.logger.info("🎙️ Setting up turn detection listeners")
             self.events.subscribe(self._on_turn_event)
-            self.turn_detection.start()
+            await self.turn_detection.start()


⚠️ Potential issue | 🔴 Critical

Make start() invocation robust to sync/async implementations and set active state.

Awaiting unconditionally will raise if a detector exposes a synchronous start(). Also ensure detectors mark themselves active.

-    async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+    async def _setup_turn_detection(self):
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+        # Support both async and sync start() while standardizing to async over time
+        import inspect
+        maybe = self.turn_detection.start()
+        if inspect.iscoroutine(maybe):
+            await maybe

Also consider enforcing an async start() in the TurnDetector base and updating all implementations to match for consistency.

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 639 to 644, the
call to self.turn_detection.start() assumes an async implementation and
unconditionally awaits it; change this to handle both sync and async starts by
calling start(), checking if the return value is awaitable (using
inspect.isawaitable) and awaiting only if needed, wrap the start call in
try/except to log errors, and after a successful start set the detector's active
state (e.g., setattr(self.turn_detection, "active", True) or call its activation
method if one exists) so detectors mark themselves active; optionally note that
the TurnDetector base class should be updated to require an async start() for
consistency and update implementations accordingly.

Comment on lines 25 to 27
     turn_detection=smart_turn.TurnDetection(
-        buffer_duration=2.0, confidence_threshold=0.5
+        buffer_in_seconds=2.0, confidence_threshold=0.5
     ),  # Enable turn detection with FAL / Smart Turn

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

LGTM! Parameter rename improves clarity.

The rename from buffer_duration to buffer_in_seconds makes the API more explicit. The value and behavior are preserved.

Run the following script to verify all TurnDetection constructor calls have been updated:


🏁 Script executed:

#!/bin/bash
# Description: Verify that no usages of the old parameter name remain.

# Search for any remaining usage of buffer_duration in TurnDetection calls
rg -n 'TurnDetection' -A 3 -B 1 | rg 'buffer_duration'

Length of output: 428


Update other files to match the parameter rename for consistency.

The reviewed file correctly uses buffer_in_seconds, but the refactoring is incomplete across the codebase. Update:

  • plugins/aws/example/basic_usage.py:28 - change buffer_duration=2.0 to buffer_in_seconds=2.0
  • Documentation references in aws/README.md:25 and agents-core/vision_agents/core/turn_detection/README.md:24
🤖 Prompt for AI Agents
In examples/01_simple_agent_example/simple_agent_example.py around lines 25-27
and across the repo, the parameter was renamed to buffer_in_seconds but other
references still use buffer_duration; update plugins/aws/example/basic_usage.py
line ~28 to replace buffer_duration=2.0 with buffer_in_seconds=2.0, and update
the docs aws/README.md line ~25 and
agents-core/vision_agents/core/turn_detection/README.md line ~24 to reflect the
new parameter name (search for and replace any remaining buffer_duration
occurrences with buffer_in_seconds and adjust any example code snippets or
explanatory text to use the new name).

Comment on lines 44 to 49

        async def on_start(event: TurnStartedEvent):
            logger.info(f"Vogent turn started on {event.session_id}")

        @td.events.subscribe
        async def on_stop(event: TurnEndedEvent):
            logger.info(f"Vogent turn ended on {event.session_id}")

⚠️ Potential issue | 🔴 Critical

Fix incorrect event field access.

The event handlers attempt to access event.session_id, but according to the event definitions in agents-core/vision_agents/core/turn_detection/events.py, both TurnStartedEvent and TurnEndedEvent have a participant field, not session_id.

Apply this diff to fix the field access:

         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
-            logger.info(f"Vogent turn started on {event.session_id}")
+            logger.info(f"Vogent turn started for {event.participant}")
 
         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
-            logger.info(f"Vogent turn ended on {event.session_id}")
+            logger.info(f"Vogent turn ended for {event.participant}")
🤖 Prompt for AI Agents
In plugins/vogent/tests/test_vogent_td.py around lines 44 to 49, the event
handlers incorrectly access event.session_id; change both handlers to use
event.participant instead (e.g., logger.info(f"Vogent turn started on
{event.participant}") and logger.info(f"Vogent turn ended on
{event.participant}")) so they reference the correct field defined on
TurnStartedEvent and TurnEndedEvent.

Comment on lines +26 to +31
[tool.uv]
# Workspace-level override to resolve numpy version conflicts
# vogent-turn requires numpy<2.0, so we override to use 1.26.x
override-dependencies = [
"numpy>=1.26.0,<2.0"
]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Workspace-wide numpy override: verify blast radius or scope it.

The global override to numpy>=1.26,<2.0 may constrain unrelated plugins (e.g., CV libs) and cause resolution conflicts. Prefer scoping this to the minimal surface (plugin-level dependency) or confirm no other member requires numpy>=2.

Run to see who depends on numpy and potential conflicts:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Files referencing numpy constraints:"
rg -nP --glob '!**/.venv/**' 'numpy\s*[<>=]' -S

echo ""
echo "uv workspace members:"
rg -nP '^\s*"plugins/.*"|^\s*"agents-core"' pyproject.toml -n -C0

echo ""
echo "Check pyprojects that declare numpy>=2:"
fd -a pyproject.toml | xargs -I{} rg -nP 'numpy\s*>=\s*2' {}

Length of output: 1398


Workspace numpy override is incompatible with 6+ plugins—will cause dependency resolution failures.

The override numpy>=1.26.0,<2.0 forces numpy <2.0 workspace-wide, but plugins/wizper, plugins/kokoro, plugins/gemini, plugins/deepgram, plugins/ultralytics, and plugins/smart_turn all explicitly require numpy>=2.0+. This creates an unsolvable constraint conflict.

Fix options:

  • Remove the workspace override and scope the vogent-turn constraint to its plugin only (if vogent is a plugin)
  • Update all conflicting plugins to accept numpy<2.0 (likely breaking for audio/vision dependencies)
  • Scope override only to agents-core/vogent-turn if that's the sole consumer

Verify which package actually requires numpy<2.0 and confine the override there rather than forcing it workspace-wide.

🤖 Prompt for AI Agents
In pyproject.toml around lines 26 to 31, the workspace-level override forcing
"numpy>=1.26.0,<2.0" conflicts with multiple plugins (wizper, kokoro, gemini,
deepgram, ultralytics, smart_turn) that require numpy>=2.0 and will break
dependency resolution; remove the global override and instead apply the
numpy<2.0 constraint only to the package that truly needs it (e.g.,
agents-core/vogent-turn) by moving that constraint into that package's pyproject
or its package-specific dependencies, or alternatively update the conflicting
plugins to accept the older numpy if feasible—verify which package requires
numpy<2.0 and scope the override to that package rather than workspace-wide.

-            format='s16',
-            layout='mono',
-            rate=target_rate
+            format=AudioFormat.S16, layout="mono", rate=target_rate

⚠️ Potential issue | 🔴 Critical

Verify AudioFormat enum member name in PyAV resampler.

The code uses AudioFormat.S16 as the format parameter for av.AudioResampler. However, according to the relevant snippets, the enum is defined with member PCM_S16 = "s16", not S16. This would cause an AttributeError at runtime.

Verify the correct enum member name. Based on the enum definition in agents-core/vision_agents/core/events/base.py, it should likely be:

         resampler = av.AudioResampler(
-            format=AudioFormat.S16, layout="mono", rate=target_rate
+            format="s16", layout="mono", rate=target_rate
         )

Note: PyAV's AudioResampler expects a string format name, not an enum. The AudioFormat enum appears to be for internal use in PcmData, not for PyAV.
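
As a quick illustration of the point above (a sketch, assuming the enum's value is the lowercase format string):

```python
# Minimal sketch: PyAV's AudioResampler takes a string sample-format name.
import av

resampler = av.AudioResampler(format="s16", layout="mono", rate=16000)
# If AudioFormat is a str-valued enum (assumption), this would be equivalent:
# resampler = av.AudioResampler(format=AudioFormat.PCM_S16.value, layout="mono", rate=16000)
```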

🤖 Prompt for AI Agents
In tests/base_test.py around line 31, the resampler call uses AudioFormat.S16
which does not exist and will raise AttributeError; PyAV expects a string format
like "s16" (or the enum's value). Replace AudioFormat.S16 with the correct
member/value (e.g., AudioFormat.PCM_S16.value or the literal "s16") when passing
the format to av.AudioResampler, ensuring the import of AudioFormat remains
valid or use the string directly.

Comment on lines +92 to +156
def _decode_mia_to_pcm16k() -> PcmData:
    import av
    import os
    from conftest import get_assets_dir

    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
    container = av.open(audio_file_path)
    audio_stream = container.streams.audio[0]
    target_rate = 16000
    resampler = None
    if audio_stream.sample_rate != target_rate:
        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)

    samples_list = []
    for frame in container.decode(audio_stream):
        if resampler:
            frame = resampler.resample(frame)[0]
        arr = frame.to_ndarray()
        if arr.ndim > 1:
            arr = arr.mean(axis=0).astype(np.int16)
        samples_list.append(arr)
    container.close()
    if not samples_list:
        return PcmData(samples=np.zeros(0, dtype=np.int16), sample_rate=target_rate, format="s16")
    samples = np.concatenate(samples_list).astype(np.int16)
    return PcmData(samples=samples, sample_rate=target_rate, format="s16")


@pytest.mark.asyncio
async def test_base_vad_mia_detects_speech_segments():
    vad = EnergyVAD(
        sample_rate=16000,
        window_samples=512,
        energy_th=0.01,
        activation_th=0.5,
        deactivation_th=0.3,
        speech_pad_ms=200,
        min_speech_ms=100,
        max_speech_ms=30000,
        partial_frames=5,
    )

    pcm = _decode_mia_to_pcm16k()

    audio_events = []
    partial_events = []

    @vad.events.subscribe
    async def on_audio(ev: VADAudioEvent):
        audio_events.append(ev)

    @vad.events.subscribe
    async def on_partial(ev: VADPartialEvent):
        partial_events.append(ev)

    await vad.process_audio(pcm)
    await vad.flush()
    await asyncio.sleep(0.05)

    assert len(audio_events) > 0, "Expected at least one speech segment from mia"
    # Sanity: total bytes > 0 and duration present
    total_bytes = sum(len(ev.audio_data or b"") for ev in audio_events)
    assert total_bytes > 0
    assert any((ev.duration_ms or 0) > 0 for ev in audio_events)


🛠️ Refactor suggestion | 🟠 Major

Mark the mia audio test as an integration test.

The test at lines 121-155 decodes an MP3 file from disk using the av library, processes it through VAD, and validates speech detection. This involves file I/O and codec processing, making it an integration test.

Apply this diff:

+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_base_vad_mia_detects_speech_segments():
     vad = EnergyVAD(

Based on coding guidelines.
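
As an aside, when most tests in a module are integration tests, the module-level form is a standard pytest alternative to decorating each test (sketch):

```python
import pytest

# Applies the integration marker to every test in this module.
pytestmark = pytest.mark.integration
```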

🤖 Prompt for AI Agents
In tests/test_vad_base.py around lines 92 to 156, the mia audio test performs
file I/O and codec processing and should be marked as an integration test; add
the pytest integration marker by importing pytest if needed and decorating the
test function with @pytest.mark.integration (or set pytestmark =
pytest.mark.integration for the module) so the test is categorized as an
integration test per the coding guidelines.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)

299-314: stop(): async override + ensure deactivation and clean cancellation

Base TurnDetector.stop() is async and clears is_active. Current sync method neither awaits task cancellation nor deactivates the plugin.

Apply:

-    def stop(self) -> None:
-        # Cancel any running processing tasks
-        for task in self._processing_tasks.values():
-            if not task.done():
-                task.cancel()
-        self._processing_tasks.clear()
+    async def stop(self) -> None:
+        # Cancel any running processing tasks
+        pending = [t for t in self._processing_tasks.values() if not t.done()]
+        for t in pending:
+            t.cancel()
+        if pending:
+            # Drain cancellations
+            await asyncio.gather(*pending, return_exceptions=True)
+        self._processing_tasks.clear()
@@
-        self._current_speaker = None
-
-        self.logger.info("Vogent Turn detection stopped")
+        self._current_speaker = None
+        await super().stop()
+        self.logger.info("Vogent Turn detection stopped")

120-126: Make sample format check case‑insensitive

Avoid rejecting valid formats due to capitalization differences.

Apply:

-        valid_formats = ["int16", "s16", "pcm_s16le"]
-        if audio_data.format not in valid_formats:
+        valid_formats = {"int16", "s16", "pcm_s16le"}
+        fmt = (audio_data.format or "").lower()
+        if fmt not in valid_formats:
             self.logger.error(
-                f"Invalid sample format: {audio_data.format}. Expected one of {valid_formats}."
+                f"Invalid sample format: {audio_data.format}. Expected one of {sorted(valid_formats)}."
             )
             return
♻️ Duplicate comments (5)
agents-core/vision_agents/core/agents/agents.py (1)

667-671: Unconditional await breaks synchronous start() implementations.

The Vogent turn detector has a synchronous start() method while SmartTurn uses async. Awaiting unconditionally will raise TypeError: object NoneType can't be used in 'await' expression or similar when a detector returns None from a sync method.

Apply this diff to handle both sync and async start():

     async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+        
+        # Support both sync and async start() implementations
+        import inspect
+        result = self.turn_detection.start()
+        if inspect.iscoroutine(result):
+            await result

Alternatively, standardize the TurnDetector base class to require async start() and update Vogent's implementation to be async (even if it's just async def start(self) -> None: pass).

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

36-38: start(): async override required and must activate the detector

Current override is a no-op and sync; base TurnDetector.start() is async and sets is_active=True. Processing is gated on is_active, so nothing runs.

Apply:

-    def start(self) -> None:
-        pass
+    async def start(self) -> None:
+        # Mark active and run any future warmups here
+        await super().start()
plugins/smart_turn/tests/test_smart_turn.py (1)

36-43: Mark external model download tests as integration.

These perform network I/O; mark accordingly.

-    async def test_smart_turn_download(self):
+    @pytest.mark.integration
+    async def test_smart_turn_download(self):
@@
-    async def test_silero_download(self):
+    @pytest.mark.integration
+    async def test_silero_download(self):

Based on learnings

plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)

99-108: Honor base contract: mark detector active on startup.

Call the base start after model prep to set is_active and run base hooks.

 async def start(self):
     # Ensure model directory exists
     os.makedirs(MODEL_BASE_DIR, exist_ok=True)
     
     # Prepare both models in parallel
     await asyncio.gather(
         self._prepare_smart_turn(),
         self._prepare_silero_vad(),
     )
+    # Signal readiness
+    await super().start()

154-163: Use model signal to decide end-of-turn and respect confidence threshold.

You compute the prediction but ignore it; the end condition relies only on silence. Combine both signals and use self._confidence_threshold.

-        # Process all collected segments
-        for segment in segments_to_process:
-            prediction = await self._predict_turn_completed(segment, participant)
-            turn_completed = prediction > 0.5
-
-        # End turn if we're in a turn and collector is done (silence detected)
-        if self.turn_in_progress and not self.collector.is_collecting:
-            self._emit_end_turn_event(TurnEndedEvent(participant=participant))
-            self.turn_in_progress = False
+        # Aggregate model signal across segments
+        model_says_done = False
+        end_confidence: Optional[float] = None
+        for segment in segments_to_process:
+            prob = await self._predict_turn_completed(segment, participant)
+            if prob >= self._confidence_threshold:
+                model_says_done = True
+                end_confidence = prob
+                break
+
+        # End turn if we're in a turn and either silence or model indicates completion
+        if self.turn_in_progress and (model_says_done or not self.collector.is_collecting):
+            self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=end_confidence))
+            self.turn_in_progress = False
🧹 Nitpick comments (12)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (5)

210-219: Use get_running_loop() inside coroutines

get_event_loop() is legacy in async contexts.

Apply:

-            loop = asyncio.get_event_loop()
+            loop = asyncio.get_running_loop()

192-208: Comment says “sorted by timestamp” but code isn’t sorting

Either sort explicitly (if Message exposes a timestamp) or adjust the comment.

Example if Message has created_at:

-                # Get messages for this user, sorted by timestamp
-                user_messages = [
-                    m
-                    for m in conversation.messages
-                    if m.user_id == user_id and m.content
-                ]
+                # Get messages for this user (newest last)
+                user_messages = [m for m in conversation.messages if m.user_id == user_id and m.content]
+                # Optionally enforce by timestamp:
+                # user_messages.sort(key=lambda m: m.created_at)

75-83: Unused state fields

_user_last_audio is written but never read; _infer_channels is unused. Consider removing or using them for buffer GC/format inference.


68-74: Defer heavy model init to start() (optional)

Constructing/compiling and warmup in init can block. Consider lazy init in start().
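
A minimal sketch of that deferral, with illustrative names (`LazyDetector` and `_load_model` are not Vogent's actual API):

```python
# Sketch: construct the model in start() instead of __init__, off the event loop.
import asyncio


class LazyDetector:
    def __init__(self) -> None:
        self._model = None  # deferred; built on first start()

    async def start(self) -> None:
        if self._model is None:
            # Blocking construction/warmup runs in a worker thread.
            self._model = await asyncio.to_thread(self._load_model)

    def _load_model(self):
        # Placeholder for the real (blocking) model build + warmup.
        return object()
```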


100-113: Backpressure: large buffers may pile up per user

If prediction is slower than ingest, buffers grow without bounds. Consider a max buffer per user and drop/merge strategy or a queue with size limits.

Also applies to: 160-176
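
One possible shape for such a cap, sketched with illustrative names (`MAX_BUFFER_SAMPLES` and the plain-dict buffer layout are assumptions, not the plugin's actual attributes):

```python
# Sketch: bound the per-user buffer; keep only the newest audio on overflow.
import numpy as np

MAX_BUFFER_SAMPLES = 16_000 * 10  # ~10 s of 16 kHz mono


def append_bounded(buffers: dict, user_id: str, samples: np.ndarray) -> None:
    buf = buffers.get(user_id)
    buf = samples if buf is None else np.concatenate([buf, samples])
    if len(buf) > MAX_BUFFER_SAMPLES:
        buf = buf[-MAX_BUFFER_SAMPLES:]  # drop the oldest samples
    buffers[user_id] = buf
```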

plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (5)

124-133: Guard processing when inactive.

Avoid work before startup or after stop.

 async def process_audio(
     self,
     audio_data: PcmData,
     participant: Participant,
     conversation: Optional[Conversation],
 ) -> None:
+    if not self.is_active:
+        logger.debug("process_audio called while inactive; ignoring.")
+        return
     # Ensure audio is in the right format: 16kHz, float32

111-114: Initialize WhisperFeatureExtractor via from_pretrained for correct config.

Direct ctor may miss expected normalization/mel settings. Prefer the pretrained config used by Smart‑Turn (Whisper Tiny).

-        self._whisper_extractor = await asyncio.to_thread(WhisperFeatureExtractor, chunk_length=8)
+        # Load feature extractor with proper config
+        self._whisper_extractor = await asyncio.to_thread(
+            WhisperFeatureExtractor.from_pretrained, "openai/whisper-tiny"
+        )

326-331: Specify providers and add a brief docstring for build_session.

Ensures deterministic CPU execution and documents behavior.

-def build_session(onnx_path):
-    so = ort.SessionOptions()
+def build_session(onnx_path: str):
+    """Build an ONNXRuntime session (CPU, optimized)."""
+    so = ort.SessionOptions()
     so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
     so.inter_op_num_threads = 1
     so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )

42-60: Class doc: tiny nits.

“only look at audio features” → “only looks at audio features.” Consider minor grammar cleanups while you’re in here. No functional impact.


164-206: Minor: remove unused parameter or use it.

participant in _predict_turn_completed is unused. Either drop it and update callers, or use it for logging/telemetry. Defer if you plan to use it soon.

plugins/smart_turn/tests/test_smart_turn.py (2)

1-7: Import AudioFormat for consistent PCM construction.

 from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat

48-60: Use enum format and robust chunking; add assertions.

Avoid frombuffer reinterpretation; use provided utilities.

-    def test_silero_predict(self, mia_audio_16khz):
+    def test_silero_predict(self, mia_audio_16khz):
         vad = SileroVAD(SILERO_ONNX_PATH)
-        # TODO: chunk in 512
-        chunk = mia_audio_16khz.samples[:512]
-        iteration = 0
-        for i in range(100):
-            chunk = mia_audio_16khz.samples[i*512:(i+1)*512]
-            int16 = np.frombuffer(chunk, dtype=np.int16)
-            f32 = (int16.astype(np.float32)) / 32768.0
-            # 16khz, float32 on 512 chunk size
-            pcm_chunk = PcmData(samples=f32, format="f32", sample_rate=16000)
-            result = vad.prob(pcm_chunk.samples)
-            print(result)
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        seen = 0
+        for chunk in pcm.chunks(chunk_size=512):
+            prob = vad.prob(chunk.samples)
+            assert 0.0 <= prob <= 1.0
+            seen += 1
+        assert seen > 0
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f3aee97 and 63f820f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • agents-core/vision_agents/core/agents/agents.py (8 hunks)
  • docs/ai/instructions/ai-turn-detector.md (1 hunks)
  • plugins/smart_turn/pyproject.toml (1 hunks)
  • plugins/smart_turn/tests/test_smart_turn.py (1 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/smart_turn/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
  • plugins/smart_turn/tests/test_smart_turn.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
  • agents-core/vision_agents/core/agents/agents.py
🧠 Learnings (1)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Mark integration tests with pytest.mark.integration

Applied to files:

  • plugins/smart_turn/tests/test_smart_turn.py
🧬 Code graph analysis (4)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/turn_detection/turn_detection.py (5)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
  • _emit_start_turn_event (37-42)
  • _emit_end_turn_event (44-49)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/tests/test_smart_turn.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6)
  • ensure_model (269-310)
  • SileroVAD (209-266)
  • SmartTurnDetection (43-206)
  • start (99-107)
  • prob (241-266)
  • process_audio (124-162)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • TurnDetector (22-74)
  • TurnEvent (14-18)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • clear (49-52)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (99-107)
  • process_audio (124-162)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (99-107)
  • process_audio (124-162)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)
  • start (36-37)
  • process_audio (100-158)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • start (68-70)
  • process_audio (52-64)
🔇 Additional comments (4)
agents-core/vision_agents/core/agents/agents.py (3)

126-132: LGTM!

The options parameter handling correctly provides defaults and merges user-supplied options appropriately.


372-372: Proper async setup.

Awaiting the turn detection setup before establishing the connection is the correct approach.


688-688: LGTM!

Passing the full participant and conversation context to process_audio() correctly aligns with the updated turn detection API and enables richer decision-making.

plugins/smart_turn/tests/test_smart_turn.py (1)

44-47: Fix: async test and valid API call with assertions.

predict_endpoint doesn't exist; use the internal predictor for now.

-    def test_smart_turn_predict(self, td, mia_audio_16khz):
-        result = td.predict_endpoint(mia_audio_16khz)
-        print(result)
+    @pytest.mark.asyncio
+    async def test_smart_turn_predict(self, td, mia_audio_16khz):
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        prob = await td._predict_turn_completed(pcm, Participant(user_id="mia", original={}))
+        assert 0.0 <= prob <= 1.0

from ..vad.events import VADAudioEvent
from . import events
from .conversation import Conversation
from dataclasses import dataclass

⚠️ Potential issue | 🟡 Minor

Remove duplicate import.

dataclass is already imported at line 6. This duplicate import is unnecessary.

Apply this diff:

-from dataclasses import dataclass
-
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around line 42, there is a
duplicate import "from dataclasses import dataclass" which is already imported
at line 6; remove the redundant import statement at line 42 (delete that line)
so the file has a single dataclass import and run linters/tests to confirm no
further import duplication.

Comment on lines 59 to 75
@dataclass
class AgentOptions:
    model_dir: str

    def update(self, other: "AgentOptions") -> "AgentOptions":
        merged_dict = asdict(self)

        for key, value in asdict(other).items():
            if value is not None:
                merged_dict[key] = value

        return AgentOptions(**merged_dict)

def default_agent_options():
    return AgentOptions(
        model_dir=tempfile.gettempdir()
    )
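
For reference, the merge semantics above reduce to "non-None fields from `other` win"; a short illustrative usage building on the snippet (the path is made up):

```python
# Sketch of the override behavior implemented by update().
defaults = default_agent_options()  # model_dir = tempfile.gettempdir()
merged = defaults.update(AgentOptions(model_dir="/opt/models"))
assert merged.model_dir == "/opt/models"  # non-None value overrides the default
```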

🛠️ Refactor suggestion | 🟠 Major

Add Google-style docstrings for public API.

The new AgentOptions dataclass, its update() method, and the default_agent_options() factory function lack docstrings.

As per coding guidelines.

Apply this diff:

 @dataclass
 class AgentOptions:
+    """Configuration options for Agent initialization.
+    
+    Attributes:
+        model_dir: Directory path for storing downloaded models and temporary files.
+    """
     model_dir: str
 
     def update(self, other: "AgentOptions") -> "AgentOptions":
+        """Merge options from another AgentOptions instance.
+        
+        Args:
+            other: Another AgentOptions instance whose non-None values will override this instance's values.
+            
+        Returns:
+            A new AgentOptions instance with merged values.
+        """
         merged_dict = asdict(self)
 
         for key, value in asdict(other).items():
@@ -70,6 +80,11 @@
         return AgentOptions(**merged_dict)
 
 def default_agent_options():
+    """Create default agent options.
+    
+    Returns:
+        AgentOptions instance with model_dir set to the system's temporary directory.
+    """
     return AgentOptions(
         model_dir=tempfile.gettempdir()
     )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 59 to 75, the
AgentOptions dataclass, its update() method, and the default_agent_options()
factory function are missing Google-style docstrings; add concise Google-style
docstrings for the class (one-line summary plus Args describing model_dir), for
update() (one-line summary, Args: other: AgentOptions, Returns: AgentOptions,
and a note that None fields are ignored), and for default_agent_options()
(one-line summary and Returns: AgentOptions with default model_dir set to
tempfile.gettempdir()).

Comment on lines +5 to +26
```python

class MyTurnDetector(TurnDetector):
    async def process_audio(
        self,
        audio_data: PcmData,
        participant: Participant,
        conversation: Optional[Conversation],
    ) -> None:

        self._emit_start_turn_event(TurnStartedEvent(participant=participant))
        self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))

    def start(self):
        super().start()
        # Any custom model loading/ or other heavy prep steps go here

    def stop(self):
        super().stop()
        # cleanup time. start and stop are optional

```

🛠️ Refactor suggestion | 🟠 Major

Fix example: add imports and make start/stop async (await super()).

Current snippet omits required imports and uses sync start/stop, which violates the async base contract. Apply:

 ```python
+from typing import Optional
+from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import Participant, Conversation
+from vision_agents.core.turn_detection import (
+    TurnDetector,
+    TurnStartedEvent,
+    TurnEndedEvent,
+)
 
 class MyTurnDetector(TurnDetector):
     async def process_audio(
         self,
-        audio_data: PcmData,
+        audio_data: PcmData,
         participant: Participant,
         conversation: Optional[Conversation],
     ) -> None:
-    
         self._emit_start_turn_event(TurnStartedEvent(participant=participant))
         self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))
 
-    def start(self):
-        super().start()
+    async def start(self):
+        await super().start()
         # Any custom model loading/ or other heavy prep steps go here
-        
-    def stop(self):
-        super().stop()
+
+    async def stop(self):
+        await super().stop()
         # cleanup time. start and stop are optional
🤖 Prompt for AI Agents
In docs/ai/instructions/ai-turn-detector.md around lines 5 to 26, the example
omits required imports and defines start/stop as synchronous methods which
conflict with the async base class; add the listed imports (typing.Optional,
PcmData, Participant, Conversation, TurnDetector, TurnStartedEvent,
TurnEndedEvent) at the top of the snippet, change start() and stop() to async
def start/stop and call await super().start() and await super().stop(),
respectively, and keep process_audio signature as shown.

Comment on lines +28 to +30
### Testing turn detection

An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`.

⚠️ Potential issue | 🟡 Minor

Correct the test file path in docs.

Use the actual repo path:

-An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`
+An example test suite for turn detection can be found in `plugins/smart_turn/tests/test_smart_turn.py`
🤖 Prompt for AI Agents
In docs/ai/instructions/ai-turn-detector.md around lines 28 to 30, the example
test suite path is incorrect; update the referenced test file path to the actual
repository location (replace `smart_turn/tests/test_smart_turn.py` with the
correct path used in the repo) so the docs point to the real test file.

Comment on lines 269 to 311
async def ensure_model(path: str, url: str) -> str:
    """
    Download a model file asynchronously if it doesn't exist.

    Args:
        path: Local path where the model should be saved
        url: URL to download the model from

    Returns:
        The path to the model file
    """
    if not os.path.exists(path):
        model_name = os.path.basename(path)
        logger.info(f"Downloading {model_name}...")

        try:
            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
                async with client.stream("GET", url) as response:
                    response.raise_for_status()

                    # Write file in chunks to avoid loading entire file in memory
                    # Use asyncio.to_thread for blocking file I/O operations
                    chunks = []
                    async for chunk in response.aiter_bytes(chunk_size=8192):
                        chunks.append(chunk)

                    # Write all chunks to file in thread to avoid blocking event loop
                    def write_file():
                        with open(path, "wb") as f:
                            for chunk in chunks:
                                f.write(chunk)

                    await asyncio.to_thread(write_file)

            logger.info(f"{model_name} downloaded.")
        except httpx.HTTPError as e:
            # Clean up partial download on error
            if os.path.exists(path):
                os.remove(path)
            raise RuntimeError(f"Failed to download {model_name}: {e}")

    return path


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Stream model download to a temp file, add atomic rename and optional SHA256.

Avoid buffering entire files in RAM and ensure integrity. Also guard concurrent downloads per path.

-async def ensure_model(path: str, url: str) -> str:
+async def ensure_model(path: str, url: str, sha256: Optional[str] = None) -> str:
@@
-    if not os.path.exists(path):
+    if not os.path.exists(path):
         model_name = os.path.basename(path)
         logger.info(f"Downloading {model_name}...")
-        
-        try:
-            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
-                async with client.stream("GET", url) as response:
-                    response.raise_for_status()
-                    
-                    # Write file in chunks to avoid loading entire file in memory
-                    # Use asyncio.to_thread for blocking file I/O operations
-                    chunks = []
-                    async for chunk in response.aiter_bytes(chunk_size=8192):
-                        chunks.append(chunk)
-                    
-                    # Write all chunks to file in thread to avoid blocking event loop
-                    def write_file():
-                        with open(path, "wb") as f:
-                            for chunk in chunks:
-                                f.write(chunk)
-                    
-                    await asyncio.to_thread(write_file)
-            
-            logger.info(f"{model_name} downloaded.")
-        except httpx.HTTPError as e:
-            # Clean up partial download on error
-            if os.path.exists(path):
-                os.remove(path)
-            raise RuntimeError(f"Failed to download {model_name}: {e}")
+        try:
+            # Per-path async lock to avoid concurrent clobber
+            lock = _download_locks.setdefault(path, asyncio.Lock())
+            async with lock:
+                if os.path.exists(path):
+                    return path
+                tmp_path = f"{path}.tmp"
+                hasher = hashlib.sha256() if sha256 else None
+                async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+                    async with client.stream("GET", url) as response:
+                        response.raise_for_status()
+                        # Open once; write each chunk in a worker thread
+                        f = open(tmp_path, "wb")
+                        try:
+                            async for chunk in response.aiter_bytes(chunk_size=1 << 14):
+                                if hasher:
+                                    hasher.update(chunk)
+                                await asyncio.to_thread(f.write, chunk)
+                        finally:
+                            await asyncio.to_thread(f.close)
+                # Verify integrity if provided
+                if hasher and hasher.hexdigest().lower() != sha256.lower():
+                    os.remove(tmp_path)
+                    raise RuntimeError(f"Checksum mismatch for {model_name}")
+                # Atomic rename
+                os.replace(tmp_path, path)
+                logger.info(f"{model_name} downloaded.")
+        except Exception as e:
+            # Clean up partial download on error
+            if os.path.exists(f"{path}.tmp"):
+                os.remove(f"{path}.tmp")
+            raise RuntimeError(f"Failed to download {model_name}: {e}")
@@
     return path

Add these near the imports/top of file:

+import hashlib
+from typing import Optional, Dict
+_download_locks: Dict[str, asyncio.Lock] = {}

I'll verify this review comment by examining the codebase and the proposed changes.




Once I locate the file, I'll inspect the current code at lines 269-311 and verify:
- Whether the current code matches the "final state" snippet shown
- Whether the suggested imports are already present
- Whether the proposed changes address legitimate concerns (temp file streaming, atomic rename, SHA256, locks)
- Any Python/asyncio compatibility issues

Let me search for the file:


Then I need to:

  1. Verify the file contents match the snippet
  2. Check current import statements
  3. Validate the proposed refactoring suggestions
  4. Ensure no conflicts with existing code

Could you confirm the repository is available, or should I proceed with generating verification scripts that will clone the repo first?


🏁 Script executed:

#!/bin/bash
# Step 1: Locate the file
fd -p "turn_detection_2.py" -t f

Length of output: 71


🏁 Script executed:

#!/bin/bash
# Step 2: Verify file exists and inspect the target function
find . -name "turn_detection_2.py" -type f 2>/dev/null | head -5

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Search for the turn_detection_2.py file
find . -path "*smart_turn/turn_detection_2.py" -type f 2>/dev/null

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Inspect lines 269-311 to see current implementation
sed -n '269,311p' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1677


🏁 Script executed:

#!/bin/bash
# Check imports at top of file (first 50 lines)
head -50 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1622


🏁 Script executed:

#!/bin/bash
# Search for _download_locks, asyncio.Lock, or similar patterns in the file
grep -n "_download_locks\|asyncio.Lock\|tmp_path\|atomic" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if hashlib is already imported
grep -n "^import hashlib\|^from hashlib" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check for all calls to ensure_model to verify backward compatibility
rg -n 'ensure_model\s*\(' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py -A2 -B2

Length of output: 689


🏁 Script executed:

#!/bin/bash
# Verify file totality - check end of file to understand module structure
tail -20 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Check if Dict is imported anywhere in typing imports
grep -n "from typing import" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 29


Implement streaming download with atomic rename, checksum verification, and concurrency protection.

The current implementation buffers entire model files in memory and lacks protection against concurrent downloads of the same model, creating both RAM pressure and race conditions. The proposed changes address these issues legitimately:

  • Temp file + atomic rename: Prevents partial file corruption if download fails midway
  • SHA256 verification: Ensures model integrity (critical for ML model reproducibility)
  • Per-path asyncio.Lock: Guards against concurrent download races for the same model path
  • Direct streaming write: Avoids buffering entire file, reducing memory footprint for large ONNX models

Add the required imports at the top of the file:

import hashlib
from typing import Optional, Dict

Add this module-level state near the imports:

_download_locks: Dict[str, asyncio.Lock] = {}

Then implement the refactored ensure_model function as shown in the diff. The optional sha256 parameter is backward compatible with existing callers at lines 110 and 116.
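
A hypothetical call site for the extended signature (URL and digest below are placeholders; this assumes the refactored `ensure_model(path, url, sha256=None)`):

```python
import asyncio

# Assumes the refactored ensure_model from this plugin module.
from vision_agents.plugins.smart_turn.turn_detection_2 import ensure_model


async def main() -> None:
    path = await ensure_model(
        "/tmp/models/smart-turn-v3.onnx",                   # placeholder path
        "https://example.com/models/smart-turn-v3.onnx",    # placeholder URL
        sha256=None,  # pass the expected hex digest to enable verification
    )
    print(f"model ready at {path}")


asyncio.run(main())
```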

🤖 Prompt for AI Agents
In plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
around lines 269 to 311, replace the current buffered-download implementation
with a streaming, atomic, checksum-verified, and concurrency-safe downloader:
add imports hashlib and typing Optional/Dict; add module-level _download_locks:
Dict[str, asyncio.Lock] = {}; change ensure_model signature to accept optional
sha256: Optional[str] = None; acquire/await a per-path asyncio.Lock from
_download_locks to prevent concurrent downloads; stream
response.aiter_bytes(chunk_size=8192) and write chunks directly to a temporary
file (e.g., path + ".tmp") using asyncio.to_thread for blocking file writes to
avoid buffering whole file in memory; compute SHA256 incrementally while writing
and if sha256 is provided, compare it after download and raise on mismatch; on
successful completion atomically rename the temp file to the target path
(os.replace); ensure cleanup of the temp file and release the lock on any
exception and wrap HTTP errors with a clear runtime error including the original
exception.

Comment on lines 135 to 137
        # Resample to 16 kHz mono
        samples = audio_data.resample(16_000, 1).samples


⚠️ Potential issue | 🟠 Major

Align resample rate with buffer math

You hardcode 16 kHz here but compute required_bytes using self.sample_rate. Keep them consistent.

Apply:

-        # Resample to 16 kHz mono
-        samples = audio_data.resample(16_000, 1).samples
+        # Resample to target sample rate (mono)
+        samples = audio_data.resample(self.sample_rate, 1).samples
🤖 Prompt for AI Agents
In plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py around
lines 135 to 137, the code resamples audio to a hardcoded 16_000 Hz while buffer
size/required_bytes are computed using self.sample_rate; replace the hardcoded
16_000 with self.sample_rate (ensuring self.sample_rate is an int/Hz value) so
the resample rate and buffer math are consistent, i.e. call
audio_data.resample(self.sample_rate, 1).samples and update any related
constants/variables to use self.sample_rate.

@github-actions github-actions bot added the ci label Oct 30, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
plugins/vogent/tests/test_vogent_td.py (1)

36-42: Fix incorrect event field access (previously flagged).

The event handlers attempt to access event.session_id, but TurnStartedEvent and TurnEndedEvent have a participant field, not session_id.

Apply this diff to fix the field access:

         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
-            logger.info(f"Vogent turn started on {event.session_id}")
+            logger.info(f"Vogent turn started for {event.participant}")
             event_order.append("start")
 
         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
-            logger.info(f"Vogent turn ended on {event.session_id}")
+            logger.info(f"Vogent turn ended for {event.participant}")
             event_order.append("stop")
🧹 Nitpick comments (2)
plugins/vogent/tests/test_vogent_td.py (2)

45-50: Replace arbitrary sleeps with the purpose-built wait method.

Instead of using arbitrary sleep durations, leverage the wait_for_processing_complete method available on VogentTurnDetection for more robust synchronization.

Apply this diff:

         await td.process_audio(mia_audio_16khz, participant, conversation)
         await td.process_audio(silence_2s_48khz, participant, conversation)
 
-        await asyncio.sleep(0.001)
-
-        await asyncio.sleep(5)
+        # Wait for all queued audio to be processed
+        await td.wait_for_processing_complete(timeout=5.0)

53-53: Consider more precise assertions.

The current assertion allows two different event sequences. If the behavior is deterministic given the test inputs, consider asserting the exact expected sequence. If non-deterministic, document why both outcomes are valid.
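
If the pipeline is deterministic for these fixtures, an exact-order assertion is stronger (sketch):

```python
# Sketch: assert the exact sequence rather than accepting multiple orderings.
assert event_order == ["start", "stop"], f"unexpected event order: {event_order}"
```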

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 820e42d and 8fb6d82.

📒 Files selected for processing (1)
  • plugins/vogent/tests/test_vogent_td.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/vogent/tests/test_vogent_td.py
🧬 Code graph analysis (1)
plugins/vogent/tests/test_vogent_td.py (4)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnEndedEvent (29-45)
  • TurnStartedEvent (11-25)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
  • start (113-129)
  • stop (329-340)
  • process_audio (165-176)
conftest.py (2)
  • mia_audio_16khz (95-134)
  • silence_2s_48khz (181-195)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (3)
plugins/vogent/tests/test_vogent_td.py (3)

1-15: LGTM! Clean imports and module structure.

The imports are well-organized and include all necessary dependencies for testing Vogent turn detection.


19-26: LGTM! Proper async fixture with cleanup.

The try/finally pattern ensures proper cleanup of the VogentTurnDetection instance even if the test fails.


28-28: Verify re-enablement plan for skipped test.

Since the PR is marked as WIP and this test is temporarily disabled, ensure there's a plan to enable it before merging to main.

@tschellenbach tschellenbach merged commit b32c56d into main Oct 30, 2025
6 checks passed
@tschellenbach tschellenbach deleted the vogent-tommaso branch October 30, 2025 16:03