
Conversation

@tbarbugli (Member) commented Oct 27, 2025

  • Deepgram v2 / Flux
  • Vogent turn detection
  • Smart Turn v3
  • Various agent bugfixes
  • Audio utilities

Summary by CodeRabbit

  • New Features

    • Added Vogent and SmartTurn turn-detection systems, a Deepgram streaming STT implementation, AgentOptions, a telemetry example, and new Vogent/SmartTurn tests.
  • Refactor

    • Standardized audio on a shared PcmData/AudioFormat surface, switched event payloads to include participant context, introduced tracing spans across agent flows, and reworked VAD buffering/APIs.
  • Removals

    • Removed Moonshine and Silero plugins, their docs and many associated tests.
  • Documentation

    • Added PCM/audio management guidance, VAD refactor plans, onboarding content, and a repository workflow guide.

@coderabbitai bot commented Oct 27, 2025

Walkthrough

Refactors audio plumbing to use external PcmData/AudioFormat, replaces user_metadata with participant across events, adds AgentOptions and span-based tracing, migrates VAD/turn-detection to PcmData buffers, removes Moonshine/Silero plugins, and introduces SmartTurn/Vogent/Deepgram rewrites plus docs and tests updates.
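
For orientation, the shared audio surface everything now standardizes on looks roughly like this (a minimal sketch; the import path, `AudioFormat.S16`, and the `duration` property are taken from callsites elsewhere in this PR, the rest is illustrative):

```python
# Sketch: constructing audio on the shared PcmData/AudioFormat surface.
import numpy as np
from getstream.video.rtc.track_util import PcmData, AudioFormat

# 100 ms of 16 kHz mono int16 audio (silence as a stand-in signal)
samples = np.zeros(1600, dtype=np.int16)
pcm = PcmData(samples=samples, sample_rate=16000, format=AudioFormat.S16)

print(pcm.duration)  # duration is used for logging in e.g. the Fish STT plugin
```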

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| **Agent core & tracing**<br>`agents-core/vision_agents/core/agents/agents.py` | Add `AgentOptions` + `default_agent_options()`, accept `options` in `Agent.__init__`, introduce span-based tracing (`span` context manager, `start_tracing`/`end_tracing`), integrate tracing into join/teardown, make `_setup_turn_detection` async, and propagate spans across handlers. |
| **Event model & transport**<br>`agents-core/vision_agents/core/events/base.py`, `plugins/getstream/.../stream_edge_transport.py`, `agents-core/vision_agents/core/edge/sfu_events.py` | Rename `user_metadata` → `participant` on `BaseEvent` and emitted payloads (keep legacy `user_metadata` slot); add private `_participant` + setter on SFU event wrappers; include `participant` when emitting `TrackAdded`/`TrackRemoved`; add tracer usage in stream_edge. |
| **PcmData externalization**<br>`agents-core/vision_agents/core/edge/types.py`, `agents-core/vision_agents/core/edge/events.py`, `agents-core/vision_agents/core/llm/realtime.py`, `conftest.py`, `tests/...`, `plugins/*/*/tts.py`, `plugins/*/*/stt.py` | Remove the local `PcmData` implementation; import `PcmData` and `AudioFormat` from `getstream.video.rtc.track_util`; update callsites to use `AudioFormat.S16` and adapt constructors/usages. |
| **Turn-detection core & API**<br>`agents-core/vision_agents/core/turn_detection/*`, `agents-core/vision_agents/core/turn_detection/events.py` | Replace `speaker_id` with `participant`, remove the legacy `TurnEventData`/protocol, add async `start`/`stop`, change `process_audio` to accept a participant and an optional conversation, and emit `TurnStartedEvent`/`TurnEndedEvent` with updated fields (`trailing_silence_ms`, `duration_ms`). |
| **VAD refactor & base API**<br>`agents-core/vision_agents/core/vad/vad.py`, `docs/vad-refactor-plan*.md`, `tests/test_vad_base.py` | Switch buffering/windowing to `PcmData` rolling buffers; the constructor takes `window_samples`/`channels`/`audio_format` and derives `frame_size`; update frame processing, partial/full emission, and flush/reset semantics; add VAD base docs/tests. |
| **New turn-detection plugins & tests**<br>`plugins/smart_turn/.../smart_turn_detection.py`, `plugins/vogent/.../vogent_turn_detection.py`, `plugins/*/tests/*`, `plugins/vogent/example/basic_usage.py` | Add `SmartTurnDetection` and `VogentTurnDetection` (Silero/Whisper/ONNX/Vogent flows), async model download/start, transcription/completion prediction, tests, examples, and package exports. |
| **Deepgram STT rework**<br>`plugins/deepgram/vision_agents/plugins/deepgram/deepgram_stt.py`, `plugins/deepgram/vision_agents/plugins/deepgram/stt.py` (deleted), `plugins/deepgram/.../utils.py` (deleted) | Replace the legacy Deepgram STT with a new async Flux-style implementation (websocket/client lifecycle, participant-scoped emits); remove the old module and silence utility. |
| **Observability & TTS metrics**<br>`agents-core/vision_agents/core/observability/*`, `agents-core/vision_agents/core/tts/tts.py`, `agents-core/vision_agents/core/tts/manual_test.py` | Remove the `tts_first_byte_ms` and `tts_bytes_streamed` metrics and per-chunk metric emissions; rename `user_metadata` → `participant` on TTS events; adapt test/manual usage. |
| **Plugins removed / workspace updates**<br>`plugins/moonshine/**`, `plugins/silero/**`, `pyproject.toml`, `plugins/vogent/pyproject.toml`, `agents-core/pyproject.toml` | Remove Moonshine and Silero plugin code/tests/packaging; add the vogent plugin with its tests/examples; update workspace membership and dependencies (add torchvision, bump getstream, uv override for numpy). |
| **Utilities & docs**<br>`agents-core/vision_agents/core/utils/utils.py`, `.gitignore`, `DEVELOPMENT.md`, `docs/ai/instructions/*`, `.claude/agents/repo-workflow-guide.md` | Remove `to_mono`, add `input_text` to `Instructions`, add `*.onnx` to `.gitignore`, update audio docs to recommend `PcmData`/`track_util`, add a repo workflow guide and expanded onboarding docs. |
| **Examples & small plugin tweaks**<br>`examples/*`, `plugins/*/example/*`, `examples/other_examples/*` | Update examples to use `TurnDetection(buffer_in_seconds=...)`, add a telemetry helper, adapt transcript handlers to use `event.participant` for user info; minor formatting/annotation normalizations. |
| **Tests: migrate/add/remove**<br>`tests/test_events.py`, `tests/test_tts_base.py`, `tests/test_utils.py`, `tests/test_pcm_data.py` (deleted), `tests/test_vad_base.py` (added), `plugins/*/tests/*` | Migrate tests to `PcmData`/`AudioFormat`, tighten event assertions, add VAD base tests, remove legacy `PcmData` unit tests, and add/remove plugin tests to match the plugin changes. |
| **SFU/events wrappers**<br>`agents-core/vision_agents/core/edge/sfu_events.py` | Add private `_participant` fields and public `participant` setters on SFU event wrappers; preserve payload-derived getters. |
| **Misc & formatting**<br>many files | Numerous minor import/path, string-quote, annotation-style, and formatting normalizations across the codebase. |
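
One new surface from the first row worth calling out: `Agent` now accepts an `options` argument. A hypothetical wiring sketch (only the `AgentOptions`, `default_agent_options`, and `options` names come from this PR; the remaining constructor arguments are elided because they are unchanged):

```python
# Sketch: opting into the new AgentOptions surface.
from vision_agents.core.agents.agents import Agent, AgentOptions, default_agent_options

def build_agent(**agent_kwargs) -> Agent:
    options: AgentOptions = default_agent_options()  # start from defaults, then tweak
    return Agent(options=options, **agent_kwargs)
```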

Sequence Diagram(s)

```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#f3f4f6","actorBorder":"#cbd5e1"}}}%%
sequenceDiagram
    participant App as Application
    participant Agent as Agent
    participant TD as TurnDetection
    participant Audio as AudioSource
    participant Events as EventBus

    App->>+Agent: join(call)
    Agent->>Agent: merge AgentOptions
    Agent->>+TD: await _setup_turn_detection()
    TD->>TD: await start()
    TD-->>-Agent: ready
    Agent-->>-App: session created

    Audio->>+Agent: on_audio_received(pcm, participant)
    Agent->>+TD: process_audio(pcm, participant, conversation)
    TD->>TD: windowing (PcmData)
    alt speech starts
        TD->>Events: emit TurnStartedEvent(participant)
    else speech ends
        TD->>Events: emit TurnEndedEvent(participant)
    end
    TD-->>-Agent: processed
```
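
The PR's tests exercise this flow directly; condensed into a usage sketch (signatures and import paths are taken from the SmartTurn/Vogent tests below; the audio argument is a placeholder `PcmData`):

```python
# Sketch of the new turn-detection API, adapted from this PR's tests.
from vision_agents.core.edge.types import Participant
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.core.turn_detection.events import TurnStartedEvent, TurnEndedEvent

async def detect_turns(td, pcm) -> None:
    """Feed one PcmData buffer through a TurnDetector and log turn events."""
    participant = Participant(user_id="mia", original={})
    conversation = InMemoryConversation(instructions="be nice", messages=[])

    @td.events.subscribe
    async def on_start(event: TurnStartedEvent):
        print("turn started for", event.participant)

    @td.events.subscribe
    async def on_stop(event: TurnEndedEvent):
        print("turn ended for", event.participant)

    await td.start()
    await td.process_audio(pcm, participant, conversation)
    await td.stop()
```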
```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#fffaf0","actorBorder":"#e6e1d9"}}}%%
sequenceDiagram
    participant Smart as SmartTurnDetection
    participant Silero as SileroVAD
    participant Whisper as WhisperFeature
    participant ONNX as Turn ONNX
    participant Events as EventBus

    Smart->>+Smart: await start()
    Smart->>+Silero: prepare silero (async)
    Smart->>+Whisper: prepare whisper (thread)
    Smart->>+ONNX: prepare onnx (thread)
    Silero-->>-Smart: ready
    Whisper-->>-Smart: ready
    ONNX-->>-Smart: ready

    Note over Smart,Silero: per-chunk processing
    Smart->>+Silero: prob(chunk)
    Silero-->>-Smart: prob
    alt prob > threshold
        Smart->>Smart: buffer segment
        Smart->>+Whisper: transcribe segment
        Whisper-->>-Smart: transcript
        Smart->>+ONNX: predict completion(features)
        ONNX-->>-Smart: completion_prob
        alt completion_prob > vogent_threshold
            Smart->>Events: emit TurnEndedEvent(participant)
        end
    end
```
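
In code terms, the per-chunk branch in this diagram reduces to a small decision loop. A self-contained skeleton (all callables are placeholders for the real Silero/Whisper/ONNX sessions; thresholds are illustrative defaults):

```python
# Illustrative skeleton of the SmartTurn per-chunk flow shown above.
from typing import Awaitable, Callable, List

async def on_chunk(
    chunk: bytes,
    buffer: List[bytes],
    vad_prob: Callable[[bytes], Awaitable[float]],        # Silero VAD
    transcribe: Callable[[List[bytes]], Awaitable[str]],  # Whisper
    predict_completion: Callable[[str], float],           # turn ONNX head
    emit_turn_ended: Callable[[], None],
    vad_threshold: float = 0.5,
    completion_threshold: float = 0.5,
) -> None:
    prob = await vad_prob(chunk)
    if prob <= vad_threshold:
        return                              # no speech in this chunk
    buffer.append(chunk)                    # grow the current speech segment
    transcript = await transcribe(buffer)   # transcribe the buffered segment
    if predict_completion(transcript) > completion_threshold:
        emit_turn_ended()                   # the model judges the turn complete
```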

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus on:

  • agents async initialization, AgentOptions propagation, and tracing lifecycle (agents-core/vision_agents/core/agents/agents.py)
  • Turn-detection signature/event refactor and cross-repo callsites (core, plugins, tests)
  • New SmartTurnDetection and VogentTurnDetection modules (async model downloads, ONNX/Whisper integration)
  • VAD refactor to PcmData rolling buffers and duration/frame calculations
  • Global migration to external PcmData/AudioFormat ensuring no leftover local assumptions
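
On the last point, a quick repo audit can catch leftovers (a sketch; the old import path is inferred from the removed local module in agents-core/vision_agents/core/edge/types.py):

```bash
# Any import still pointing at the removed local PcmData is a migration miss.
rg -n "from vision_agents\.core\.edge\.types import .*PcmData" --glob '!**/.venv/**'

# Cross-check that callsites import from the external surface instead.
rg -n "from getstream\.video\.rtc\.track_util import" --glob '!**/.venv/**'
```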


Suggested reviewers

  • maxkahan
  • dangusev
  • d3xvn

Poem

I name the participant; the room goes black,
a thin wire of sound snaps, opening its mouth.
Buffers gather like ash, windows shuttering time—
a small, precise end is measured, boxed and signed.
The turn folds inward; a dark petal of silence falls.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 49.83%, below the required 80.00% threshold. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title references three real, significant components of the changeset: Vogent (new turn-detection plugin), Smart Turn (turn detection v3), and audio utilities (externalized PcmData usage), and these align with the stated PR objectives. However, it omits other substantial changes: the agent architecture refactoring (AgentOptions, tracing integration), removal of the Silero and Moonshine plugins, the user_metadata → participant event rename, and the wide-reaching PcmData externalization. The "WIP" prefix suggests incomplete work, though the changes appear comprehensive. The title captures highlights but not the full scope. |
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch vogent-tommaso

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 21

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
agents-core/vision_agents/core/utils/utils.py (1)

84-97: Add empty bytes validation in gemini_realtime.py:440-442.

The function now returns empty bytes on conversion failure instead of raising exceptions. The current try-except block in gemini_realtime.py will not catch this silent failure—empty bytes will be passed directly to Blob() and sent to the API, masking the underlying conversion error.

Add an explicit check after calling frame_to_png_bytes():

png_bytes = frame_to_png_bytes(frame)
if not png_bytes:
    self.logger.error("Failed to convert frame to PNG bytes")
    return
blob = Blob(data=png_bytes, mime_type="image/png")

The test in test_gemini_realtime.py correctly validates the empty bytes case (lines 96-97).

docs/ai/instructions/ai-tts.md (1)

18-25: Update example to use AudioFormat enum.

The example still uses format="s16" (string literal), but the codebase has migrated to format=AudioFormat.S16. Update the documentation to reflect current best practices.

- from getstream.video.rtc.track_util import PcmData
+ from getstream.video.rtc.track_util import PcmData, AudioFormat

  async def stream_audio(self, text: str, *_, **__) -> PcmData:
      audio_bytes = await my_sdk.tts.bytes(text=..., ...)
      # sample_rate, channels and format depend on what the TTS model returns
-     return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format="s16")
+     return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format=AudioFormat.S16)
agents-core/vision_agents/core/agents/agents.py (1)

825-879: Turn events use participant, not speaker_id → barge-in and responses won’t trigger.

The plugins emit TurnStartedEvent/TurnEndedEvent(participant=...), but this handler checks event.speaker_id. This prevents TTS interruption and LLM responses.

-        if isinstance(event, TurnStartedEvent):
-            # Interrupt TTS when user starts speaking (barge-in)
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+        if isinstance(event, TurnStartedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
+            # Interrupt TTS when user starts speaking (barge-in)
+            if speaker_id and speaker_id != self.agent_user.id:
                 if self.tts:
                     self.logger.info(
-                        f"👉 Turn started - interrupting TTS for participant {event.speaker_id}"
+                        f"👉 Turn started - interrupting TTS for participant {speaker_id}"
                     )
                     try:
                         await self.tts.stop_audio()
                     except Exception as e:
                         self.logger.error(f"Error stopping TTS: {e}")
                 else:
                     self.logger.info(
-                        f"👉 Turn started - participant speaking {event.speaker_id} : {event.confidence}"
+                        f"👉 Turn started - participant speaking {speaker_id} : {event.confidence}"
                     )
             else:
                 # Agent itself started speaking - this is normal
                 self.logger.debug(
-                    f"👉 Turn started - agent speaking {event.speaker_id}"
+                    f"👉 Turn started - agent speaking {speaker_id}"
                 )
-        elif isinstance(event, TurnEndedEvent):
+        elif isinstance(event, TurnEndedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
             self.logger.info(
-                f"👉 Turn ended - participant {event.speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
+                f"👉 Turn ended - participant {speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
             )
             # When turn detection is enabled, trigger LLM response when user's turn ends
             # This is the signal that the user has finished speaking and expects a response
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+            if speaker_id and speaker_id != self.agent_user.id:
                 # Get the accumulated transcript for this speaker
-                transcript = self._pending_user_transcripts.get(event.speaker_id, "")
+                transcript = self._pending_user_transcripts.get(speaker_id, "")
                 if transcript and transcript.strip():
                     self.logger.info(
-                        f"🤖 Triggering LLM response after turn ended for {event.speaker_id}"
+                        f"🤖 Triggering LLM response after turn ended for {speaker_id}"
                     )
-                    # Create participant object if we have metadata
-                    participant = None
-                    if hasattr(event, "custom") and event.custom:
-                        # Try to extract participant info from custom metadata
-                        participant = event.custom.get("participant")
+                    # Prefer the participant carried by the event
+                    participant = getattr(event, "participant", None)
                     # Trigger LLM response with the complete transcript
                     if self.llm:
                         await self.simple_response(transcript, participant)
                     # Clear the pending transcript for this speaker
-                    self._pending_user_transcripts[event.speaker_id] = ""
+                    self._pending_user_transcripts[speaker_id] = ""
agents-core/vision_agents/core/turn_detection/events.py (2)

12-25: Update the docstring to match the actual implementation.

The docstring references speaker_id and duration fields, but the actual implementation uses participant (line 23) and duration is not present. This creates confusion for developers reading the API documentation.

Apply this diff to update the docstring:

     """
     Event emitted when a speaker starts their turn.
     
     Attributes:
-        speaker_id: ID of the speaker who started speaking
+        participant: Participant who started speaking
         confidence: Confidence level of the turn detection (0.0-1.0)
-        duration: Duration of audio processed (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """

31-43: Update the docstring to match the actual implementation.

The docstring references speaker_id and duration fields, but the implementation uses participant (line 41) and duration is not present.

Apply this diff:

     """
     Event emitted when a speaker completes their turn.
     
     Attributes:
-        speaker_id: ID of the speaker who finished speaking
+        participant: Participant who finished speaking
         confidence: Confidence level of the turn completion detection (0.0-1.0)
-        duration: Duration of the turn (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """
♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/tts.py (1)

45-47: Verify AudioFormat.S16 enum value exists.

Same concern as in elevenlabs plugin: ensure getstream.video.rtc.track_util.AudioFormat.S16 exists, as the relevant code snippet shows PCM_S16 in the internal enum.
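
A quick way to settle it locally (one-liner sketch; assumes the track_util import path used elsewhere in this PR):

```python
# List the AudioFormat members to confirm whether S16 (vs. PCM_S16) exists.
from getstream.video.rtc.track_util import AudioFormat

print([member.name for member in AudioFormat])
```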

🧹 Nitpick comments (15)
plugins/fish/vision_agents/plugins/fish/stt.py (1)

107-109: Consider clarifying the error log message.

The log message includes pcm_data.duration but doesn't explicitly label what the value represents. For better debuggability, consider making it explicit.

-                logger.error(
-                    "No transcript returned from Fish Audio %s", pcm_data.duration
-                )
+                logger.error(
+                    "No transcript returned from Fish Audio (audio duration: %s)", pcm_data.duration
+                )
docs/ai/instructions/ai-stt.md (1)

20-41: Add missing imports and align PCM guidance in example.

The example uses PcmData and Participant without imports. Add a minimal prelude and point to the new track_util path.

 ```python
-from vision_agents.core import stt
-from vision_agents.core.stt.events import TranscriptResponse
+from typing import Optional
+from vision_agents.core import stt
+from vision_agents.core.stt.events import TranscriptResponse
+from vision_agents.core.edge.types import Participant
+from getstream.video.rtc.track_util import PcmData
 ...
 class MySTT(stt.STT):
 ```

The new “PCM / Audio management” note is great; consider linking to a short snippet:

```python
from getstream.video.rtc.track_util import PcmData, AudioFormat

pcm = PcmData(samples=my_np_int16, sample_rate=16000, format=AudioFormat.S16)
```

Also applies to: 46-50

plugins/openai/tests/test_openai_realtime.py (1)

68-79: Dedup imports; prefer resample_poly for 16k→48k.

  • Move PcmData/AudioFormat imports to module scope to avoid repeated per-test imports.
  • For 16k→48k, scipy.signal.resample_poly(up=3, down=1) is faster and with better fidelity than FFT resample for integer-rate conversion.
-        from scipy import signal
-        from getstream.video.rtc.track_util import PcmData, AudioFormat
+        from scipy import signal
@@
-        samples_48k = signal.resample(samples_16k, num_samples_48k).astype(np.int16)
+        samples_48k = signal.resample_poly(
+            samples_16k.astype(np.float32), up=3, down=1
+        ).astype(np.int16)
@@
-        audio_48khz = PcmData(
-            samples=samples_48k, sample_rate=48000, format=AudioFormat.S16
-        )
+        audio_48khz = PcmData(samples=samples_48k, sample_rate=48000, format=AudioFormat.S16)

At top-of-file:

-from dotenv import load_dotenv
+from dotenv import load_dotenv
+from getstream.video.rtc.track_util import PcmData, AudioFormat

Apply the same import cleanup to other tests constructing PcmData (lines ~242-249, ~299-307). If track_util exposes a resample helper, prefer that instead.

Also applies to: 242-249, 299-307

plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)

38-41: Set provider_name for consistency/telemetry.

Other plugins pass provider_name to the base TTS; do the same here.

-        super().__init__()
+        super().__init__(provider_name="cartesia")

The AudioFormat/S16 switch looks good.

Also applies to: 80-82

plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)

50-53: Use asyncio.to_thread instead of get_event_loop/run_in_executor.

Cleaner and future-proof in 3.11+.

-        loop = asyncio.get_event_loop()
-        chunks: List[bytes] = await loop.run_in_executor(
-            None, lambda: list(self._generate_chunks(text))
-        )
+        chunks: List[bytes] = await asyncio.to_thread(
+            lambda: list(self._generate_chunks(text))
+        )

The AudioFormat.S16 switch in PcmData.from_bytes looks correct.

Also applies to: 56-62

plugins/wizper/vision_agents/plugins/wizper/stt.py (1)

85-87: Centralized PCM→WAV conversion is solid; consider eliminating temp file for better serverless performance.

The refactoring from custom WAV code to pcm_data.to_wav_bytes() is a good move. Currently, the code writes bytes to disk and uploads via file path. Since fal-client's AsyncClient supports upload(data, content_type) with raw bytes, you can pass wav_data directly with content_type="audio/wav" and skip the temp file entirely—eliminating unnecessary I/O in serverless environments while keeping cleanup logic simpler.
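
A minimal sketch of that direct-bytes path (assumes AsyncClient.upload(data, content_type) accepts raw bytes and returns a URL, as described above; names are illustrative):

```python
# Sketch: upload WAV bytes directly instead of round-tripping through a temp file.
import fal_client

async def upload_wav(pcm_data) -> str:
    wav_data = pcm_data.to_wav_bytes()  # centralized PCM -> WAV conversion
    client = fal_client.AsyncClient()
    # Hypothetical call shape per the comment above: raw bytes + content type.
    return await client.upload(wav_data, content_type="audio/wav")
```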

.claude/agents/repo-workflow-guide.md (1)

1-6: Frontmatter readability: prefer YAML block scalar for multi-line description.

Replace embedded “\n” escapes with a block scalar for clarity and fewer parsing gotchas.

 ---
 name: repo-workflow-guide
-description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
+description: |
+  Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.
+
+  Examples:
+  - <example>
+    Context: User wants to add a new feature to the project.
+    user: "I need to implement a new authentication module"
+    assistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."
+    <Task tool call to repo-workflow-guide>
+    assistant: "Based on the project guidelines, here's how we should approach this..."
+    </example>
+
+  - <example>
+    Context: User asks a question about code organization.
+    user: "Where should I put the new utility functions?"
+    assistant: "Let me check the repository workflow guidelines to give you the correct answer."
+    <Task tool call to repo-workflow-guide>
+    assistant: "According to the project structure guidelines..."
+    </example>
+
+  - <example>
+    Context: Starting a new task that requires understanding project conventions.
+    user: "Can you help me refactor this component?"
+    assistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."
+    <Task tool call to repo-workflow-guide>
+    </example>
 ---
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)

107-121: ORT sessions: specify providers and threads for deterministic CPU behavior.

Standardize provider and thread config (as in SileroVAD) to avoid unexpected GPU selection and reduce contention.

-        # Load ONNX session in thread pool to avoid blocking event loop
-        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)
+        # Load ONNX session in thread pool to avoid blocking event loop
+        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)

And change build_session:

-def build_session(onnx_path):
-    so = ort.SessionOptions()
-    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
-    so.inter_op_num_threads = 1
-    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+def build_session(onnx_path):
+    so = ort.SessionOptions()
+    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
+    so.inter_op_num_threads = 1
+    so.intra_op_num_threads = 1
+    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )

268-310: Model download: avoid buffering entire file in memory and use logging instead of print.

Stream to disk as you download; replace prints with a module logger.

-async def ensure_model(path: str, url: str) -> str:
+import logging
+logger = logging.getLogger(__name__)
+
+async def ensure_model(path: str, url: str) -> str:
@@
-    if not os.path.exists(path):
+    if not os.path.exists(path):
         model_name = os.path.basename(path)
-        print(f"Downloading {model_name}...")
+        logger.info("Downloading %s ...", model_name)
@@
-            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
-                async with client.stream("GET", url) as response:
+            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+                async with client.stream("GET", url) as response:
                     response.raise_for_status()
-                    
-                    # Write file in chunks to avoid loading entire file in memory
-                    # Use asyncio.to_thread for blocking file I/O operations
-                    chunks = []
-                    async for chunk in response.aiter_bytes(chunk_size=8192):
-                        chunks.append(chunk)
-                    
-                    # Write all chunks to file in thread to avoid blocking event loop
-                    def write_file():
-                        with open(path, "wb") as f:
-                            for chunk in chunks:
-                                f.write(chunk)
-                    
-                    await asyncio.to_thread(write_file)
+                    # Stream directly to file to keep memory bounded
+                    def write_stream():
+                        with open(path, "wb") as f:
+                            for chunk in response.iter_raw():
+                                if chunk:
+                                    f.write(chunk)
+                    await asyncio.to_thread(write_stream)
@@
-            print(f"{model_name} downloaded.")
+            logger.info("%s downloaded.", model_name)

Note: If you prefer pure-async file I/O, switch to aiofiles and write chunks inside the async loop.

agents-core/vision_agents/core/turn_detection/turn_detection.py (1)

68-74: Lifecycle methods implement default behavior.

The start() and stop() methods provide sensible defaults by setting is_active appropriately. However, the docstrings suggest subclasses "want to" override these, but they're not marked as abstract.

Consider whether these should remain concrete with default implementations or be marked as abstract to enforce subclass implementation. The current approach (concrete with defaults) is reasonable for optional warmup/cleanup, but the docstrings could be clearer:

 async def start(self) -> None:
-    """Some turn detection systems want to run warmup etc here"""
+    """Start turn detection. Override to add warmup logic if needed."""
     self.is_active = True

 async def stop(self) -> None:
-    """Again, some turn detection systems want to run cleanup here"""
+    """Stop turn detection. Override to add cleanup logic if needed."""
     self.is_active = False
agents-core/vision_agents/core/edge/events.py (1)

12-12: Consider consistent string quote style.

The string literals for type fields were changed from single quotes to double quotes. While functionally equivalent, ensure this aligns with the project's style guidelines for consistency across the codebase.

Also applies to: 21-21, 31-31, 41-41

DEVELOPMENT.md (1)

326-338: Fix markdown formatting for headers.

The static analysis tool flagged that emphasis (bold text) is being used instead of proper markdown headings for "Audio Formats", "Using Cursor", and "Learning Roadmap".

Apply this diff to use proper markdown headings:

 ## Onboarding Plan for new contributors
 
-**Audio Formats**
+### Audio Formats
 
 You'll notice that audio comes in many formats. PCM, wav, mp3. 16khz, 48khz. 
 Encoded as i16 or f32. Note that webrtc by default is 48khz.
 
 A good first intro to audio formats can be found here:
 
-**Using Cursor**
+### Using Cursor
 
 You can ask cursor something like "read @ai-plugin and build me a plugin called fish"
 See the docs folder for other ai instruction files
 
-**Learning Roadmap**
+### Learning Roadmap
plugins/smart_turn/tests/test_turn2.py (1)

62-85: Reduce sleep duration and add integration marker.

Line 81 includes a 5-second sleep which significantly slows down test execution. Additionally, this test performs end-to-end turn detection with model inference and should be marked as an integration test.

Apply this diff:

+    @pytest.mark.integration
     async def test_turn_detection(self, td, mia_audio_16khz):
         participant = Participant(user_id="mia", original={})
         conversation = InMemoryConversation(instructions="be nice", messages=[])
         event_order = []

         # Subscribe to events
         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
             logger.info(f"Smart turn turn started on {event.session_id}")
             event_order.append("start")

         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
             logger.info(f"Smart turn turn ended on {event.session_id}")
             event_order.append("stop")

         await td.process_audio(mia_audio_16khz, participant, conversation)
         await asyncio.sleep(0.001)

-        await asyncio.sleep(5)
+        await asyncio.sleep(0.1)

         # Verify that turn detection is working - we should get at least some turn events
         # With continuous processing, we may get multiple start/stop cycles
         assert event_order == ["start", "stop"]

Based on coding guidelines.

conftest.py (1)

175-226: Consider reusing existing fixture to reduce duplication.

The audio loading logic in mia_audio_48khz_chunked duplicates the mia_audio_48khz fixture. You could reduce duplication by loading the full audio first, then chunking it.

 @pytest.fixture
 def mia_audio_48khz_chunked():
-    """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks."""
-    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
-
-    # Load audio file using PyAV
-    container = av.open(audio_file_path)
-    audio_stream = container.streams.audio[0]
-    original_sample_rate = audio_stream.sample_rate
-    target_rate = 48000
-
-    # Create resampler if needed
-    resampler = None
-    if original_sample_rate != target_rate:
-        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)
-
-    # Read all audio frames
-    samples = []
-    for frame in container.decode(audio_stream):
-        # Resample if needed
-        if resampler:
-            frame = resampler.resample(frame)[0]
-
-        # Convert to numpy array
-        frame_array = frame.to_ndarray()
-        if len(frame_array.shape) > 1:
-            # Convert stereo to mono
-            frame_array = np.mean(frame_array, axis=0)
-        samples.append(frame_array)
-
-    # Concatenate all samples
-    samples = np.concatenate(samples)
-
-    # Convert to int16
-    samples = samples.astype(np.int16)
-    container.close()
+    """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks."""
+    # Reuse the full audio fixture logic
+    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
+    container = av.open(audio_file_path)
+    audio_stream = container.streams.audio[0]
+    original_sample_rate = audio_stream.sample_rate
+    target_rate = 48000
+    
+    resampler = None
+    if original_sample_rate != target_rate:
+        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)
+    
+    samples = []
+    for frame in container.decode(audio_stream):
+        if resampler:
+            frame = resampler.resample(frame)[0]
+        frame_array = frame.to_ndarray()
+        if len(frame_array.shape) > 1:
+            frame_array = np.mean(frame_array, axis=0)
+        samples.append(frame_array)
+    
+    samples = np.concatenate(samples).astype(np.int16)
+    container.close()
 
     # Calculate chunk size for 20ms at 48kHz
     chunk_size = int(target_rate * 0.020)  # 960 samples per 20ms
agents-core/vision_agents/core/vad/vad.py (1)

408-416: Consider simplifying frame availability check.

The frame_data_available check is functionally correct but complex. Consider extracting to a helper method for readability.

+    def _has_buffered_audio(self) -> bool:
+        """Check if speech buffer contains audio data."""
+        if self.speech_buffer is None:
+            return False
+        if isinstance(self.speech_buffer.samples, np.ndarray):
+            return len(self.speech_buffer.samples) > 0
+        if isinstance(self.speech_buffer.samples, (bytes, bytearray)):
+            return len(self.speech_buffer.samples) > 0
+        return False
+
     def _emit_error_event(
         self,
         error: Exception,
         context: str = "",
         user_metadata: Optional[Dict[str, Any]] = None,
     ):
         """Emit a structured error event."""
         self.events.send(
             VADErrorEvent(
                 session_id=self.session_id,
                 plugin_name=self.provider_name,
                 error=error,
                 context=context,
                 user_metadata=user_metadata,
-                frame_data_available=(
-                    self.speech_buffer is not None
-                    and (
-                        (isinstance(self.speech_buffer.samples, np.ndarray) and len(self.speech_buffer.samples) > 0)
-                        or (isinstance(self.speech_buffer.samples, (bytes, bytearray)) and len(self.speech_buffer.samples) > 0)
-                    )
-                ),
+                frame_data_available=self._has_buffered_audio(),
             )
         )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e1072e8 and f3aee97.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (74)
  • .claude/agents/repo-workflow-guide.md (1 hunks)
  • .gitignore (1 hunks)
  • DEVELOPMENT.md (3 hunks)
  • agents-core/pyproject.toml (2 hunks)
  • agents-core/vision_agents/core/agents/agents.py (4 hunks)
  • agents-core/vision_agents/core/edge/events.py (3 hunks)
  • agents-core/vision_agents/core/edge/types.py (1 hunks)
  • agents-core/vision_agents/core/llm/realtime.py (2 hunks)
  • agents-core/vision_agents/core/observability/__init__.py (0 hunks)
  • agents-core/vision_agents/core/observability/metrics.py (0 hunks)
  • agents-core/vision_agents/core/tts/manual_test.py (2 hunks)
  • agents-core/vision_agents/core/tts/tts.py (0 hunks)
  • agents-core/vision_agents/core/turn_detection/__init__.py (0 hunks)
  • agents-core/vision_agents/core/turn_detection/events.py (3 hunks)
  • agents-core/vision_agents/core/turn_detection/turn_detection.py (2 hunks)
  • agents-core/vision_agents/core/utils/utils.py (4 hunks)
  • agents-core/vision_agents/core/vad/vad.py (13 hunks)
  • aiortc (1 hunks)
  • conftest.py (3 hunks)
  • docs/ai/instructions/ai-plugin.md (2 hunks)
  • docs/ai/instructions/ai-realtime-llm.md (5 hunks)
  • docs/ai/instructions/ai-stt.md (2 hunks)
  • docs/ai/instructions/ai-tests.md (1 hunks)
  • docs/ai/instructions/ai-tts.md (2 hunks)
  • docs/ai/instructions/ai-turn-detector.md (1 hunks)
  • docs/vad-refactor-plan-extended.md (1 hunks)
  • docs/vad-refactor-plan.md (1 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (1 hunks)
  • plugins/aws/example/aws_qwen_example.py (1 hunks)
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py (8 hunks)
  • plugins/aws/vision_agents/plugins/aws/tts.py (2 hunks)
  • plugins/cartesia/vision_agents/plugins/cartesia/tts.py (2 hunks)
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (2 hunks)
  • plugins/fish/example/fish_example.py (1 hunks)
  • plugins/fish/vision_agents/plugins/fish/stt.py (2 hunks)
  • plugins/fish/vision_agents/plugins/fish/tts.py (2 hunks)
  • plugins/kokoro/vision_agents/plugins/kokoro/tts.py (2 hunks)
  • plugins/krisp/vision_agents/plugins/krisp/turn_detection.py (8 hunks)
  • plugins/moonshine/README.md (0 hunks)
  • plugins/moonshine/tests/test_moonshine_stt.py (0 hunks)
  • plugins/moonshine/vision_agents/plugins/moonshine/__init__.py (0 hunks)
  • plugins/moonshine/vision_agents/plugins/moonshine/stt.py (0 hunks)
  • plugins/openai/tests/test_openai_realtime.py (4 hunks)
  • plugins/openai/vision_agents/plugins/openai/tts.py (2 hunks)
  • plugins/openrouter/README.md (1 hunks)
  • plugins/openrouter/example/openrouter_example.py (1 hunks)
  • plugins/sample_plugin/example/my_example.py (1 hunks)
  • plugins/silero/README.md (0 hunks)
  • plugins/silero/pyproject.toml (0 hunks)
  • plugins/silero/tests/test_bench.py (0 hunks)
  • plugins/silero/tests/test_performance.py (0 hunks)
  • plugins/silero/tests/test_vad.py (0 hunks)
  • plugins/silero/vision_agents/plugins/silero/__init__.py (0 hunks)
  • plugins/silero/vision_agents/plugins/silero/vad.py (0 hunks)
  • plugins/smart_turn/README.md (1 hunks)
  • plugins/smart_turn/pyproject.toml (1 hunks)
  • plugins/smart_turn/tests/test_turn2.py (1 hunks)
  • plugins/smart_turn/tests/test_turn_detection.py (0 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py (1 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1 hunks)
  • plugins/vogent/README.md (1 hunks)
  • plugins/vogent/example/basic_usage.py (1 hunks)
  • plugins/vogent/pyproject.toml (2 hunks)
  • plugins/vogent/tests/test_vogent_td.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/__init__.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (7 hunks)
  • plugins/wizper/vision_agents/plugins/wizper/stt.py (1 hunks)
  • pyproject.toml (2 hunks)
  • tests/base_test.py (5 hunks)
  • tests/test_events.py (7 hunks)
  • tests/test_pcm_data.py (0 hunks)
  • tests/test_tts_base.py (2 hunks)
  • tests/test_utils.py (10 hunks)
  • tests/test_vad_base.py (1 hunks)
💤 Files with no reviewable changes (17)
  • agents-core/vision_agents/core/turn_detection/__init__.py
  • plugins/moonshine/tests/test_moonshine_stt.py
  • plugins/moonshine/README.md
  • plugins/silero/README.md
  • plugins/silero/vision_agents/plugins/silero/vad.py
  • agents-core/vision_agents/core/observability/metrics.py
  • agents-core/vision_agents/core/tts/tts.py
  • plugins/silero/tests/test_performance.py
  • plugins/moonshine/vision_agents/plugins/moonshine/__init__.py
  • plugins/moonshine/vision_agents/plugins/moonshine/stt.py
  • agents-core/vision_agents/core/observability/__init__.py
  • plugins/silero/tests/test_bench.py
  • plugins/smart_turn/tests/test_turn_detection.py
  • plugins/silero/tests/test_vad.py
  • tests/test_pcm_data.py
  • plugins/silero/pyproject.toml
  • plugins/silero/vision_agents/plugins/silero/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/openrouter/example/openrouter_example.py
  • plugins/aws/vision_agents/plugins/aws/tts.py
  • plugins/aws/example/aws_qwen_example.py
  • plugins/fish/vision_agents/plugins/fish/tts.py
  • examples/01_simple_agent_example/simple_agent_example.py
  • plugins/wizper/vision_agents/plugins/wizper/stt.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
  • plugins/fish/vision_agents/plugins/fish/stt.py
  • plugins/openai/tests/test_openai_realtime.py
  • agents-core/vision_agents/core/edge/types.py
  • tests/test_utils.py
  • plugins/kokoro/vision_agents/plugins/kokoro/tts.py
  • tests/test_vad_base.py
  • plugins/sample_plugin/example/my_example.py
  • agents-core/vision_agents/core/edge/events.py
  • tests/test_events.py
  • plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py
  • plugins/vogent/example/basic_usage.py
  • agents-core/vision_agents/core/agents/agents.py
  • plugins/fish/example/fish_example.py
  • agents-core/vision_agents/core/turn_detection/turn_detection.py
  • conftest.py
  • plugins/openai/vision_agents/plugins/openai/tts.py
  • plugins/vogent/tests/test_vogent_td.py
  • plugins/cartesia/vision_agents/plugins/cartesia/tts.py
  • plugins/krisp/vision_agents/plugins/krisp/turn_detection.py
  • agents-core/vision_agents/core/tts/manual_test.py
  • plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py
  • agents-core/vision_agents/core/llm/realtime.py
  • agents-core/vision_agents/core/turn_detection/events.py
  • tests/base_test.py
  • plugins/vogent/vision_agents/plugins/vogent/__init__.py
  • tests/test_tts_base.py
  • agents-core/vision_agents/core/vad/vad.py
  • plugins/smart_turn/tests/test_turn2.py
  • plugins/aws/vision_agents/plugins/aws/aws_realtime.py
  • agents-core/vision_agents/core/utils/utils.py
  • plugins/vogent/vision_agents/plugins/vogent/turn_detection.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic

Files:

  • tests/test_utils.py
  • tests/test_vad_base.py
  • tests/test_events.py
  • tests/base_test.py
  • tests/test_tts_base.py
🧠 Learnings (2)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Use uv to run Python and tests; do not invoke python -m directly

Applied to files:

  • docs/ai/instructions/ai-tests.md
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Write tests using pytest (avoid unittest.TestCase or other frameworks)

Applied to files:

  • docs/ai/instructions/ai-tests.md
🧬 Code graph analysis (27)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/turn_detection/turn_detection.py (5)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
  • _emit_start_turn_event (37-42)
  • _emit_end_turn_event (44-49)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/openai/tests/test_openai_realtime.py (3)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
conftest.py (1)
  • mia_audio_16khz (90-129)
tests/base_test.py (1)
  • mia_audio_16khz (18-58)
tests/test_utils.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
tests/test_vad_base.py (3)
agents-core/vision_agents/core/vad/vad.py (5)
  • VAD (35-433)
  • is_speech (128-138)
  • process_audio (140-189)
  • flush (375-382)
  • close (418-433)
agents-core/vision_agents/core/vad/events.py (2)
  • VADAudioEvent (30-40)
  • VADPartialEvent (44-55)
conftest.py (1)
  • get_assets_dir (78-80)
agents-core/vision_agents/core/edge/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (50-52)
tests/test_events.py (2)
agents-core/vision_agents/core/events/manager.py (1)
  • send (426-468)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • level (496-500)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/vogent/example/basic_usage.py (2)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
  • TurnDetection (26-315)
  • start (37-38)
  • process_audio (101-159)
  • stop (300-315)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • start (68-70)
  • process_audio (52-64)
  • stop (72-74)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (97-105)
  • process_audio (122-160)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • start (68-70)
  • process_audio (52-64)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
  • start (37-38)
  • process_audio (101-159)
agents-core/vision_agents/core/turn_detection/turn_detection.py (4)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • process_audio (122-160)
  • start (97-105)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
  • process_audio (101-159)
  • start (37-38)
conftest.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
  • join (237-289)
plugins/openai/vision_agents/plugins/openai/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/vogent/tests/test_vogent_td.py (5)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
  • TurnDetection (26-315)
  • start (37-38)
  • stop (300-315)
  • process_audio (101-159)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/stt/events.py (2)
  • STTTranscriptEvent (16-47)
  • STTPartialTranscriptEvent (51-78)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnEndedEvent (29-43)
  • TurnStartedEvent (11-25)
conftest.py (1)
  • mia_audio_48khz_chunked (176-226)
plugins/krisp/vision_agents/plugins/krisp/turn_detection.py (2)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (29-31)
agents-core/vision_agents/core/tts/manual_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/edge/types.py (1)
  • play_pcm_with_ffplay (57-111)
plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1)
  • SmartTurnDetection (42-204)
agents-core/vision_agents/core/llm/realtime.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • Participant (229-270)
tests/base_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
conftest.py (3)
  • assets_dir (84-86)
  • bunny_video_track (241-285)
  • RealVideoTrack (247-279)
plugins/vogent/vision_agents/plugins/vogent/__init__.py (1)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (1)
  • TurnDetection (26-315)
tests/test_tts_base.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
  • stream_audio (61-94)
agents-core/vision_agents/core/vad/vad.py (2)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/vad/events.py (1)
  • VADAudioEvent (30-40)
plugins/smart_turn/tests/test_turn2.py (5)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6)
  • ensure_model (268-309)
  • SileroVAD (207-265)
  • SmartTurnDetection (42-204)
  • start (97-105)
  • prob (240-265)
  • process_audio (122-160)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • start (68-70)
  • stop (72-74)
  • process_audio (52-64)
agents-core/vision_agents/core/agents/agents.py (1)
  • subscribe (189-201)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (3)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (13-188)
agents-core/vision_agents/core/events/base.py (1)
  • AudioFormat (23-30)
agents-core/vision_agents/core/llm/events.py (1)
  • RealtimeAudioOutputEvent (37-45)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
agents-core/vision_agents/core/edge/types.py (1)
  • Participant (29-31)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
🪛 LanguageTool
docs/vad-refactor-plan.md

[grammar] ~91-~91: Use a hyphen to join words.
Context: ...n counts via real assets. - Add base-VAD specific tests (future work): - Normal...

(QB_NEW_EN_HYPHEN)

DEVELOPMENT.md

[grammar] ~328-~328: Ensure spelling is correct
Context: ... in many formats. PCM, wav, mp3. 16khz, 48khz. Encoded as i16 or f32. Note that webr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
DEVELOPMENT.md

326-326: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


333-333: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


338-338: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

Comment on lines +639 to 644
     async def _setup_turn_detection(self):
         if self.turn_detection:
             self.logger.info("🎙️ Setting up turn detection listeners")
             self.events.subscribe(self._on_turn_event)
-            self.turn_detection.start()
+            await self.turn_detection.start()


⚠️ Potential issue | 🔴 Critical

Make start() invocation robust to sync/async implementations and set active state.

Awaiting unconditionally will raise if a detector exposes a synchronous start(). Also ensure detectors mark themselves active.

-    async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+    async def _setup_turn_detection(self):
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+        # Support both async and sync start() while standardizing to async over time
+        import inspect
+        maybe = self.turn_detection.start()
+        if inspect.iscoroutine(maybe):
+            await maybe

Also consider enforcing an async start() in the TurnDetector base and updating all implementations to match for consistency.

🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 639 to 644, the
call to self.turn_detection.start() assumes an async implementation and
unconditionally awaits it; change this to handle both sync and async starts by
calling start(), checking if the return value is awaitable (using
inspect.isawaitable) and awaiting only if needed, wrap the start call in
try/except to log errors, and after a successful start set the detector's active
state (e.g., setattr(self.turn_detection, "active", True) or call its activation
method if one exists) so detectors mark themselves active; optionally note that
the TurnDetector base class should be updated to require an async start() for
consistency and update implementations accordingly.

Comment on lines 25 to 27
     turn_detection=smart_turn.TurnDetection(
-        buffer_duration=2.0, confidence_threshold=0.5
+        buffer_in_seconds=2.0, confidence_threshold=0.5
     ),  # Enable turn detection with FAL / Smart Turn

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

LGTM! Parameter rename improves clarity.

The rename from buffer_duration to buffer_in_seconds makes the API more explicit. The value and behavior are preserved.

Run the following script to verify all TurnDetection constructor calls have been updated:


🏁 Script executed:

#!/bin/bash
# Description: Verify that no usages of the old parameter name remain.

# Search for any remaining usage of buffer_duration in TurnDetection calls
rg -n 'TurnDetection' -A 3 -B 1 | rg 'buffer_duration'

Length of output: 428


Update other files to match the parameter rename for consistency.

The reviewed file correctly uses buffer_in_seconds, but the refactoring is incomplete across the codebase. Update:

  • plugins/aws/example/basic_usage.py:28 - change buffer_duration=2.0 to buffer_in_seconds=2.0
  • Documentation references in aws/README.md:25 and agents-core/vision_agents/core/turn_detection/README.md:24
🤖 Prompt for AI Agents
In examples/01_simple_agent_example/simple_agent_example.py around lines 25-27
and across the repo, the parameter was renamed to buffer_in_seconds but other
references still use buffer_duration; update plugins/aws/example/basic_usage.py
line ~28 to replace buffer_duration=2.0 with buffer_in_seconds=2.0, and update
the docs aws/README.md line ~25 and
agents-core/vision_agents/core/turn_detection/README.md line ~24 to reflect the
new parameter name (search for and replace any remaining buffer_duration
occurrences with buffer_in_seconds and adjust any example code snippets or
explanatory text to use the new name).

Comment on lines 44 to 49

        async def on_start(event: TurnStartedEvent):
            logger.info(f"Vogent turn started on {event.session_id}")

        @td.events.subscribe
        async def on_stop(event: TurnEndedEvent):
            logger.info(f"Vogent turn ended on {event.session_id}")

⚠️ Potential issue | 🔴 Critical

Fix incorrect event field access.

The event handlers attempt to access event.session_id, but according to the event definitions in agents-core/vision_agents/core/turn_detection/events.py, both TurnStartedEvent and TurnEndedEvent have a participant field, not session_id.

Apply this diff to fix the field access:

         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
-            logger.info(f"Vogent turn started on {event.session_id}")
+            logger.info(f"Vogent turn started for {event.participant}")
 
         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
-            logger.info(f"Vogent turn ended on {event.session_id}")
+            logger.info(f"Vogent turn ended for {event.participant}")
🤖 Prompt for AI Agents
In plugins/vogent/tests/test_vogent_td.py around lines 44 to 49, the event
handlers incorrectly access event.session_id; change both handlers to use
event.participant instead (e.g., logger.info(f"Vogent turn started on
{event.participant}") and logger.info(f"Vogent turn ended on
{event.participant}")) so they reference the correct field defined on
TurnStartedEvent and TurnEndedEvent.

Comment on lines +26 to +31
[tool.uv]
# Workspace-level override to resolve numpy version conflicts
# vogent-turn requires numpy<2.0, so we override to use 1.26.x
override-dependencies = [
"numpy>=1.26.0,<2.0"
]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Workspace-wide numpy override: verify blast radius or scope it.

The global override to numpy>=1.26,<2.0 may constrain unrelated plugins (e.g., CV libs) and cause resolution conflicts. Prefer scoping this to the minimal surface (plugin-level dependency) or confirm no other member requires numpy>=2.

Run to see who depends on numpy and potential conflicts:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Files referencing numpy constraints:"
rg -nP --glob '!**/.venv/**' 'numpy\s*[<>=]' -S

echo ""
echo "uv workspace members:"
rg -nP '^\s*"plugins/.*"|^\s*"agents-core"' pyproject.toml -n -C0

echo ""
echo "Check pyprojects that declare numpy>=2:"
fd -a pyproject.toml | xargs -I{} rg -nP 'numpy\s*>=\s*2' {}

Length of output: 1398


Workspace numpy override is incompatible with 6+ plugins—will cause dependency resolution failures.

The override numpy>=1.26.0,<2.0 forces numpy <2.0 workspace-wide, but plugins/wizper, plugins/kokoro, plugins/gemini, plugins/deepgram, plugins/ultralytics, and plugins/smart_turn all explicitly require numpy>=2.0+. This creates an unsolvable constraint conflict.

Fix options:

  • Remove the workspace override and scope the vogent-turn constraint to its plugin only (if vogent is a plugin)
  • Update all conflicting plugins to accept numpy<2.0 (likely breaking for audio/vision dependencies)
  • Scope override only to agents-core/vogent-turn if that's the sole consumer

Verify which package actually requires numpy<2.0 and confine the override there rather than forcing it workspace-wide.

🤖 Prompt for AI Agents
In pyproject.toml around lines 26 to 31, the workspace-level override forcing
"numpy>=1.26.0,<2.0" conflicts with multiple plugins (wizper, kokoro, gemini,
deepgram, ultralytics, smart_turn) that require numpy>=2.0 and will break
dependency resolution; remove the global override and instead apply the
numpy<2.0 constraint only to the package that truly needs it (e.g.,
agents-core/vogent-turn) by moving that constraint into that package's pyproject
or its package-specific dependencies, or alternatively update the conflicting
plugins to accept the older numpy if feasible—verify which package requires
numpy<2.0 and scope the override to that package rather than workspace-wide.

-            format='s16',
-            layout='mono',
-            rate=target_rate
+            format=AudioFormat.S16, layout="mono", rate=target_rate

⚠️ Potential issue | 🔴 Critical

Verify AudioFormat enum member name in PyAV resampler.

The code uses AudioFormat.S16 as the format parameter for av.AudioResampler. However, according to the relevant snippets, the enum is defined with member PCM_S16 = "s16", not S16. This would cause an AttributeError at runtime.

Verify the correct enum member name. Based on the enum definition in agents-core/vision_agents/core/events/base.py, it should likely be:

         resampler = av.AudioResampler(
-            format=AudioFormat.S16, layout="mono", rate=target_rate
+            format="s16", layout="mono", rate=target_rate
         )

Note: PyAV's AudioResampler expects a string format name, not an enum. The AudioFormat enum appears to be for internal use in PcmData, not for PyAV.
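
As a quick illustration of the point above (a sketch, assuming the enum's value is the lowercase format string):

```python
# Minimal sketch: PyAV's AudioResampler takes a string sample-format name.
import av

resampler = av.AudioResampler(format="s16", layout="mono", rate=16000)
# If AudioFormat is a str-valued enum (assumption), this would be equivalent:
# resampler = av.AudioResampler(format=AudioFormat.PCM_S16.value, layout="mono", rate=16000)
```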

🤖 Prompt for AI Agents
In tests/base_test.py around line 31, the resampler call uses AudioFormat.S16
which does not exist and will raise AttributeError; PyAV expects a string format
like "s16" (or the enum's value). Replace AudioFormat.S16 with the correct
member/value (e.g., AudioFormat.PCM_S16.value or the literal "s16") when passing
the format to av.AudioResampler, ensuring the import of AudioFormat remains
valid or use the string directly.

Comment on lines +92 to +156
def _decode_mia_to_pcm16k() -> PcmData:
    import av
    import os
    from conftest import get_assets_dir

    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
    container = av.open(audio_file_path)
    audio_stream = container.streams.audio[0]
    target_rate = 16000
    resampler = None
    if audio_stream.sample_rate != target_rate:
        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)

    samples_list = []
    for frame in container.decode(audio_stream):
        if resampler:
            frame = resampler.resample(frame)[0]
        arr = frame.to_ndarray()
        if arr.ndim > 1:
            arr = arr.mean(axis=0).astype(np.int16)
        samples_list.append(arr)
    container.close()
    if not samples_list:
        return PcmData(samples=np.zeros(0, dtype=np.int16), sample_rate=target_rate, format="s16")
    samples = np.concatenate(samples_list).astype(np.int16)
    return PcmData(samples=samples, sample_rate=target_rate, format="s16")


@pytest.mark.asyncio
async def test_base_vad_mia_detects_speech_segments():
    vad = EnergyVAD(
        sample_rate=16000,
        window_samples=512,
        energy_th=0.01,
        activation_th=0.5,
        deactivation_th=0.3,
        speech_pad_ms=200,
        min_speech_ms=100,
        max_speech_ms=30000,
        partial_frames=5,
    )

    pcm = _decode_mia_to_pcm16k()

    audio_events = []
    partial_events = []

    @vad.events.subscribe
    async def on_audio(ev: VADAudioEvent):
        audio_events.append(ev)

    @vad.events.subscribe
    async def on_partial(ev: VADPartialEvent):
        partial_events.append(ev)

    await vad.process_audio(pcm)
    await vad.flush()
    await asyncio.sleep(0.05)

    assert len(audio_events) > 0, "Expected at least one speech segment from mia"
    # Sanity: total bytes > 0 and duration present
    total_bytes = sum(len(ev.audio_data or b"") for ev in audio_events)
    assert total_bytes > 0
    assert any((ev.duration_ms or 0) > 0 for ev in audio_events)


🛠️ Refactor suggestion | 🟠 Major

Mark the mia audio test as an integration test.

The test at lines 121-155 decodes an MP3 file from disk using the av library, processes it through VAD, and validates speech detection. This involves file I/O and codec processing, making it an integration test.

Apply this diff:

+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_base_vad_mia_detects_speech_segments():
     vad = EnergyVAD(

Based on coding guidelines.
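
As an aside, when most tests in a module are integration tests, the module-level form is a standard pytest alternative to decorating each test (sketch):

```python
import pytest

# Applies the integration marker to every test in this module.
pytestmark = pytest.mark.integration
```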

🤖 Prompt for AI Agents
In tests/test_vad_base.py around lines 92 to 156, the mia audio test performs
file I/O and codec processing and should be marked as an integration test; add
the pytest integration marker by importing pytest if needed and decorating the
test function with @pytest.mark.integration (or set pytestmark =
pytest.mark.integration for the module) so the test is categorized as an
integration test per the coding guidelines.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)

299-314: stop(): async override + ensure deactivation and clean cancellation

Base TurnDetector.stop() is async and clears is_active. Current sync method neither awaits task cancellation nor deactivates the plugin.

Apply:

-    def stop(self) -> None:
-        # Cancel any running processing tasks
-        for task in self._processing_tasks.values():
-            if not task.done():
-                task.cancel()
-        self._processing_tasks.clear()
+    async def stop(self) -> None:
+        # Cancel any running processing tasks
+        pending = [t for t in self._processing_tasks.values() if not t.done()]
+        for t in pending:
+            t.cancel()
+        if pending:
+            # Drain cancellations
+            await asyncio.gather(*pending, return_exceptions=True)
+        self._processing_tasks.clear()
@@
-        self._current_speaker = None
-
-        self.logger.info("Vogent Turn detection stopped")
+        self._current_speaker = None
+        await super().stop()
+        self.logger.info("Vogent Turn detection stopped")

120-126: Make sample format check case‑insensitive

Avoid rejecting valid formats due to capitalization differences.

Apply:

-        valid_formats = ["int16", "s16", "pcm_s16le"]
-        if audio_data.format not in valid_formats:
+        valid_formats = {"int16", "s16", "pcm_s16le"}
+        fmt = (audio_data.format or "").lower()
+        if fmt not in valid_formats:
             self.logger.error(
-                f"Invalid sample format: {audio_data.format}. Expected one of {valid_formats}."
+                f"Invalid sample format: {audio_data.format}. Expected one of {sorted(valid_formats)}."
             )
             return
♻️ Duplicate comments (5)
agents-core/vision_agents/core/agents/agents.py (1)

667-671: Unconditional await breaks synchronous start() implementations.

The Vogent turn detector has a synchronous start() method while SmartTurn uses async. Awaiting unconditionally will raise TypeError: object NoneType can't be used in 'await' expression or similar when a detector returns None from a sync method.

Apply this diff to handle both sync and async start():

     async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+        
+        # Support both sync and async start() implementations
+        import inspect
+        result = self.turn_detection.start()
+        if inspect.iscoroutine(result):
+            await result

Alternatively, standardize the TurnDetector base class to require async start() and update Vogent's implementation to be async (even if it's just async def start(self) -> None: pass).

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)

36-38: start(): async override required and must activate the detector

Current override is a no-op and sync; base TurnDetector.start() is async and sets is_active=True. Processing is gated on is_active, so nothing runs.

Apply:

-    def start(self) -> None:
-        pass
+    async def start(self) -> None:
+        # Mark active and run any future warmups here
+        await super().start()
plugins/smart_turn/tests/test_smart_turn.py (1)

36-43: Mark external model download tests as integration.

These perform network I/O; mark accordingly.

-    async def test_smart_turn_download(self):
+    @pytest.mark.integration
+    async def test_smart_turn_download(self):
@@
-    async def test_silero_download(self):
+    @pytest.mark.integration
+    async def test_silero_download(self):

Based on learnings

plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)

99-108: Honor base contract: mark detector active on startup.

Call the base start after model prep to set is_active and run base hooks.

 async def start(self):
     # Ensure model directory exists
     os.makedirs(MODEL_BASE_DIR, exist_ok=True)
     
     # Prepare both models in parallel
     await asyncio.gather(
         self._prepare_smart_turn(),
         self._prepare_silero_vad(),
     )
+    # Signal readiness
+    await super().start()

154-163: Use model signal to decide end-of-turn and respect confidence threshold.

You compute the prediction but ignore it; the end condition relies only on silence. Combine both signals and use self._confidence_threshold.

-        # Process all collected segments
-        for segment in segments_to_process:
-            prediction = await self._predict_turn_completed(segment, participant)
-            turn_completed = prediction > 0.5
-
-        # End turn if we're in a turn and collector is done (silence detected)
-        if self.turn_in_progress and not self.collector.is_collecting:
-            self._emit_end_turn_event(TurnEndedEvent(participant=participant))
-            self.turn_in_progress = False
+        # Aggregate model signal across segments
+        model_says_done = False
+        end_confidence: Optional[float] = None
+        for segment in segments_to_process:
+            prob = await self._predict_turn_completed(segment, participant)
+            if prob >= self._confidence_threshold:
+                model_says_done = True
+                end_confidence = prob
+                break
+
+        # End turn if we're in a turn and either silence or model indicates completion
+        if self.turn_in_progress and (model_says_done or not self.collector.is_collecting):
+            self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=end_confidence))
+            self.turn_in_progress = False
🧹 Nitpick comments (12)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (5)

210-219: Use get_running_loop() inside coroutines

get_event_loop() is legacy in async contexts.

Apply:

-            loop = asyncio.get_event_loop()
+            loop = asyncio.get_running_loop()

192-208: Comment says “sorted by timestamp” but code isn’t sorting

Either sort explicitly (if Message exposes a timestamp) or adjust the comment.

Example if Message has created_at:

-                # Get messages for this user, sorted by timestamp
-                user_messages = [
-                    m
-                    for m in conversation.messages
-                    if m.user_id == user_id and m.content
-                ]
+                # Get messages for this user (newest last)
+                user_messages = [m for m in conversation.messages if m.user_id == user_id and m.content]
+                # Optionally enforce by timestamp:
+                # user_messages.sort(key=lambda m: m.created_at)

75-83: Unused state fields

_user_last_audio is written but never read; _infer_channels is unused. Consider removing or using them for buffer GC/format inference.


68-74: Defer heavy model init to start() (optional)

Constructing/compiling and warmup in init can block. Consider lazy init in start().
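
A minimal sketch of that deferral, with illustrative names (`LazyDetector` and `_load_model` are not Vogent's actual API):

```python
# Sketch: construct the model in start() instead of __init__, off the event loop.
import asyncio


class LazyDetector:
    def __init__(self) -> None:
        self._model = None  # deferred; built on first start()

    async def start(self) -> None:
        if self._model is None:
            # Blocking construction/warmup runs in a worker thread.
            self._model = await asyncio.to_thread(self._load_model)

    def _load_model(self):
        # Placeholder for the real (blocking) model build + warmup.
        return object()
```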


100-113: Backpressure: large buffers may pile up per user

If prediction is slower than ingest, buffers grow without bounds. Consider a max buffer per user and drop/merge strategy or a queue with size limits.

Also applies to: 160-176
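
One possible shape for such a cap, sketched with illustrative names (`MAX_BUFFER_SAMPLES` and the plain-dict buffer layout are assumptions, not the plugin's actual attributes):

```python
# Sketch: bound the per-user buffer; keep only the newest audio on overflow.
import numpy as np

MAX_BUFFER_SAMPLES = 16_000 * 10  # ~10 s of 16 kHz mono


def append_bounded(buffers: dict, user_id: str, samples: np.ndarray) -> None:
    buf = buffers.get(user_id)
    buf = samples if buf is None else np.concatenate([buf, samples])
    if len(buf) > MAX_BUFFER_SAMPLES:
        buf = buf[-MAX_BUFFER_SAMPLES:]  # drop the oldest samples
    buffers[user_id] = buf
```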

plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (5)

124-133: Guard processing when inactive.

Avoid work before startup or after stop.

 async def process_audio(
     self,
     audio_data: PcmData,
     participant: Participant,
     conversation: Optional[Conversation],
 ) -> None:
+    if not self.is_active:
+        logger.debug("process_audio called while inactive; ignoring.")
+        return
     # Ensure audio is in the right format: 16kHz, float32

111-114: Initialize WhisperFeatureExtractor via from_pretrained for correct config.

Direct ctor may miss expected normalization/mel settings. Prefer the pretrained config used by Smart‑Turn (Whisper Tiny).

-        self._whisper_extractor = await asyncio.to_thread(WhisperFeatureExtractor, chunk_length=8)
+        # Load feature extractor with proper config
+        self._whisper_extractor = await asyncio.to_thread(
+            WhisperFeatureExtractor.from_pretrained, "openai/whisper-tiny"
+        )

326-331: Specify providers and add a brief docstring for build_session.

Ensures deterministic CPU execution and documents behavior.

-def build_session(onnx_path):
-    so = ort.SessionOptions()
+def build_session(onnx_path: str):
+    """Build an ONNXRuntime session (CPU, optimized)."""
+    so = ort.SessionOptions()
     so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
     so.inter_op_num_threads = 1
     so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )

42-60: Class doc: tiny nits.

“only look at audio features” → “only looks at audio features.” Consider minor grammar cleanups while you’re in here. No functional impact.


164-206: Minor: remove unused parameter or use it.

participant in _predict_turn_completed is unused. Either drop it and update callers, or use it for logging/telemetry. Defer if you plan to use it soon.

plugins/smart_turn/tests/test_smart_turn.py (2)

1-7: Import AudioFormat for consistent PCM construction.

 from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat

48-60: Use enum format and robust chunking; add assertions.

Avoid frombuffer reinterpretation; use provided utilities.

-    def test_silero_predict(self, mia_audio_16khz):
+    def test_silero_predict(self, mia_audio_16khz):
         vad = SileroVAD(SILERO_ONNX_PATH)
-        # TODO: chunk in 512
-        chunk = mia_audio_16khz.samples[:512]
-        iteration = 0
-        for i in range(100):
-            chunk = mia_audio_16khz.samples[i*512:(i+1)*512]
-            int16 = np.frombuffer(chunk, dtype=np.int16)
-            f32 = (int16.astype(np.float32)) / 32768.0
-            # 16khz, float32 on 512 chunk size
-            pcm_chunk = PcmData(samples=f32, format="f32", sample_rate=16000)
-            result = vad.prob(pcm_chunk.samples)
-            print(result)
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        seen = 0
+        for chunk in pcm.chunks(chunk_size=512):
+            prob = vad.prob(chunk.samples)
+            assert 0.0 <= prob <= 1.0
+            seen += 1
+        assert seen > 0
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f3aee97 and 63f820f.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • agents-core/vision_agents/core/agents/agents.py (8 hunks)
  • docs/ai/instructions/ai-turn-detector.md (1 hunks)
  • plugins/smart_turn/pyproject.toml (1 hunks)
  • plugins/smart_turn/tests/test_smart_turn.py (1 hunks)
  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1 hunks)
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/smart_turn/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
  • plugins/smart_turn/tests/test_smart_turn.py
  • plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
  • agents-core/vision_agents/core/agents/agents.py
🧠 Learnings (1)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Mark integration tests with pytest.mark.integration

Applied to files:

  • plugins/smart_turn/tests/test_smart_turn.py
🧬 Code graph analysis (4)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
agents-core/vision_agents/core/turn_detection/turn_detection.py (5)
  • TurnDetector (22-74)
  • start (68-70)
  • process_audio (52-64)
  • _emit_start_turn_event (37-42)
  • _emit_end_turn_event (44-49)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/tests/test_smart_turn.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnStartedEvent (11-25)
  • TurnEndedEvent (29-43)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6)
  • ensure_model (269-310)
  • SileroVAD (209-266)
  • SmartTurnDetection (43-206)
  • start (99-107)
  • prob (241-266)
  • process_audio (124-162)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • TurnDetector (22-74)
  • TurnEvent (14-18)
agents-core/vision_agents/core/agents/conversation.py (2)
  • Conversation (67-227)
  • clear (49-52)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (99-107)
  • process_audio (124-162)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
  • start (99-107)
  • process_audio (124-162)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)
  • start (36-37)
  • process_audio (100-158)
agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
  • start (68-70)
  • process_audio (52-64)
🔇 Additional comments (4)
agents-core/vision_agents/core/agents/agents.py (3)

126-132: LGTM!

The options parameter handling correctly provides defaults and merges user-supplied options appropriately.


372-372: Proper async setup.

Awaiting the turn detection setup before establishing the connection is the correct approach.


688-688: LGTM!

Passing the full participant and conversation context to process_audio() correctly aligns with the updated turn detection API and enables richer decision-making.

plugins/smart_turn/tests/test_smart_turn.py (1)

44-47: Fix: async test and valid API call with assertions.

predict_endpoint doesn't exist; use the internal predictor for now.

-    def test_smart_turn_predict(self, td, mia_audio_16khz):
-        result = td.predict_endpoint(mia_audio_16khz)
-        print(result)
+    @pytest.mark.asyncio
+    async def test_smart_turn_predict(self, td, mia_audio_16khz):
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        prob = await td._predict_turn_completed(pcm, Participant(user_id="mia", original={}))
+        assert 0.0 <= prob <= 1.0

from ..vad.events import VADAudioEvent
from . import events
from .conversation import Conversation
from dataclasses import dataclass

⚠️ Potential issue | 🟡 Minor

Remove duplicate import.

dataclass is already imported at line 6. This duplicate import is unnecessary.

Apply this diff:

-from dataclasses import dataclass
-
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around line 42, there is a
duplicate import "from dataclasses import dataclass" which is already imported
at line 6; remove the redundant import statement at line 42 (delete that line)
so the file has a single dataclass import and run linters/tests to confirm no
further import duplication.

Comment on lines 59 to 75
@dataclass
class AgentOptions:
    model_dir: str

    def update(self, other: "AgentOptions") -> "AgentOptions":
        merged_dict = asdict(self)

        for key, value in asdict(other).items():
            if value is not None:
                merged_dict[key] = value

        return AgentOptions(**merged_dict)

def default_agent_options():
    return AgentOptions(
        model_dir=tempfile.gettempdir()
    )
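
For reference, the merge semantics above reduce to "non-None fields from `other` win"; a short illustrative usage building on the snippet (the path is made up):

```python
# Sketch of the override behavior implemented by update().
defaults = default_agent_options()  # model_dir = tempfile.gettempdir()
merged = defaults.update(AgentOptions(model_dir="/opt/models"))
assert merged.model_dir == "/opt/models"  # non-None value overrides the default
```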

🛠️ Refactor suggestion | 🟠 Major

Add Google-style docstrings for public API.

The new AgentOptions dataclass, its update() method, and the default_agent_options() factory function lack docstrings.

As per coding guidelines.

Apply this diff:

 @dataclass
 class AgentOptions:
+    """Configuration options for Agent initialization.
+    
+    Attributes:
+        model_dir: Directory path for storing downloaded models and temporary files.
+    """
     model_dir: str
 
     def update(self, other: "AgentOptions") -> "AgentOptions":
+        """Merge options from another AgentOptions instance.
+        
+        Args:
+            other: Another AgentOptions instance whose non-None values will override this instance's values.
+            
+        Returns:
+            A new AgentOptions instance with merged values.
+        """
         merged_dict = asdict(self)
 
         for key, value in asdict(other).items():
@@ -70,6 +80,11 @@
         return AgentOptions(**merged_dict)
 
 def default_agent_options():
+    """Create default agent options.
+    
+    Returns:
+        AgentOptions instance with model_dir set to the system's temporary directory.
+    """
     return AgentOptions(
         model_dir=tempfile.gettempdir()
     )
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 59 to 75, the
AgentOptions dataclass, its update() method, and the default_agent_options()
factory function are missing Google-style docstrings; add concise Google-style
docstrings for the class (one-line summary plus Args describing model_dir), for
update() (one-line summary, Args: other: AgentOptions, Returns: AgentOptions,
and a note that None fields are ignored), and for default_agent_options()
(one-line summary and Returns: AgentOptions with default model_dir set to
tempfile.gettempdir()).

Comment on lines +5 to +26
```python

class MyTurnDetector(TurnDetector):
    async def process_audio(
        self,
        audio_data: PcmData,
        participant: Participant,
        conversation: Optional[Conversation],
    ) -> None:

        self._emit_start_turn_event(TurnStartedEvent(participant=participant))
        self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))

    def start(self):
        super().start()
        # Any custom model loading/ or other heavy prep steps go here

    def stop(self):
        super().stop()
        # cleanup time. start and stop are optional

```

🛠️ Refactor suggestion | 🟠 Major

Fix example: add imports and make start/stop async (await super()).

Current snippet omits required imports and uses sync start/stop, which violates the async base contract. Apply:

 ```python
+from typing import Optional
+from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import Participant, Conversation
+from vision_agents.core.turn_detection import (
+    TurnDetector,
+    TurnStartedEvent,
+    TurnEndedEvent,
+)
 
 class MyTurnDetector(TurnDetector):
     async def process_audio(
         self,
-        audio_data: PcmData,
+        audio_data: PcmData,
         participant: Participant,
         conversation: Optional[Conversation],
     ) -> None:
-    
         self._emit_start_turn_event(TurnStartedEvent(participant=participant))
         self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))
 
-    def start(self):
-        super().start()
+    async def start(self):
+        await super().start()
         # Any custom model loading/ or other heavy prep steps go here
-        
-    def stop(self):
-        super().stop()
+
+    async def stop(self):
+        await super().stop()
         # cleanup time. start and stop are optional
🤖 Prompt for AI Agents
In docs/ai/instructions/ai-turn-detector.md around lines 5 to 26, the example
omits required imports and defines start/stop as synchronous methods which
conflict with the async base class; add the listed imports (typing.Optional,
PcmData, Participant, Conversation, TurnDetector, TurnStartedEvent,
TurnEndedEvent) at the top of the snippet, change start() and stop() to async
def start/stop and call await super().start() and await super().stop(),
respectively, and keep process_audio signature as shown.

Comment on lines +28 to +30
### Testing turn detection

An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`.

⚠️ Potential issue | 🟡 Minor

Correct the test file path in docs.

Use the actual repo path:

-An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`
+An example test suite for turn detection can be found in `plugins/smart_turn/tests/test_smart_turn.py`
🤖 Prompt for AI Agents
In docs/ai/instructions/ai-turn-detector.md around lines 28 to 30, the example
test suite path is incorrect; update the referenced test file path to the actual
repository location (replace `smart_turn/tests/test_smart_turn.py` with the
correct path used in the repo) so the docs point to the real test file.

Comment on lines 269 to 311
async def ensure_model(path: str, url: str) -> str:
    """
    Download a model file asynchronously if it doesn't exist.

    Args:
        path: Local path where the model should be saved
        url: URL to download the model from

    Returns:
        The path to the model file
    """
    if not os.path.exists(path):
        model_name = os.path.basename(path)
        logger.info(f"Downloading {model_name}...")

        try:
            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
                async with client.stream("GET", url) as response:
                    response.raise_for_status()

                    # Write file in chunks to avoid loading entire file in memory
                    # Use asyncio.to_thread for blocking file I/O operations
                    chunks = []
                    async for chunk in response.aiter_bytes(chunk_size=8192):
                        chunks.append(chunk)

                    # Write all chunks to file in thread to avoid blocking event loop
                    def write_file():
                        with open(path, "wb") as f:
                            for chunk in chunks:
                                f.write(chunk)

                    await asyncio.to_thread(write_file)

            logger.info(f"{model_name} downloaded.")
        except httpx.HTTPError as e:
            # Clean up partial download on error
            if os.path.exists(path):
                os.remove(path)
            raise RuntimeError(f"Failed to download {model_name}: {e}")

    return path


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Stream model download to a temp file, add atomic rename and optional SHA256.

Avoid buffering entire files in RAM and ensure integrity. Also guard concurrent downloads per path.

-async def ensure_model(path: str, url: str) -> str:
+async def ensure_model(path: str, url: str, sha256: Optional[str] = None) -> str:
@@
-    if not os.path.exists(path):
+    if not os.path.exists(path):
         model_name = os.path.basename(path)
         logger.info(f"Downloading {model_name}...")
-        
-        try:
-            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
-                async with client.stream("GET", url) as response:
-                    response.raise_for_status()
-                    
-                    # Write file in chunks to avoid loading entire file in memory
-                    # Use asyncio.to_thread for blocking file I/O operations
-                    chunks = []
-                    async for chunk in response.aiter_bytes(chunk_size=8192):
-                        chunks.append(chunk)
-                    
-                    # Write all chunks to file in thread to avoid blocking event loop
-                    def write_file():
-                        with open(path, "wb") as f:
-                            for chunk in chunks:
-                                f.write(chunk)
-                    
-                    await asyncio.to_thread(write_file)
-            
-            logger.info(f"{model_name} downloaded.")
-        except httpx.HTTPError as e:
-            # Clean up partial download on error
-            if os.path.exists(path):
-                os.remove(path)
-            raise RuntimeError(f"Failed to download {model_name}: {e}")
+        try:
+            # Per-path async lock to avoid concurrent clobber
+            lock = _download_locks.setdefault(path, asyncio.Lock())
+            async with lock:
+                if os.path.exists(path):
+                    return path
+                tmp_path = f"{path}.tmp"
+                hasher = hashlib.sha256() if sha256 else None
+                async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+                    async with client.stream("GET", url) as response:
+                        response.raise_for_status()
+                        # Open once; write each chunk in a worker thread
+                        f = open(tmp_path, "wb")
+                        try:
+                            async for chunk in response.aiter_bytes(chunk_size=1 << 14):
+                                if hasher:
+                                    hasher.update(chunk)
+                                await asyncio.to_thread(f.write, chunk)
+                        finally:
+                            await asyncio.to_thread(f.close)
+                # Verify integrity if provided
+                if hasher and hasher.hexdigest().lower() != sha256.lower():
+                    os.remove(tmp_path)
+                    raise RuntimeError(f"Checksum mismatch for {model_name}")
+                # Atomic rename
+                os.replace(tmp_path, path)
+                logger.info(f"{model_name} downloaded.")
+        except Exception as e:
+            # Clean up partial download on error
+            if os.path.exists(f"{path}.tmp"):
+                os.remove(f"{path}.tmp")
+            raise RuntimeError(f"Failed to download {model_name}: {e}")
@@
     return path

Add these near the imports/top of file:

+import hashlib
+from typing import Optional, Dict
+_download_locks: Dict[str, asyncio.Lock] = {}

I'll verify this review comment by examining the codebase and the proposed changes.




Once I locate the file, I'll inspect the current code at lines 269-311 and verify:
- Whether the current code matches the "final state" snippet shown
- Whether the suggested imports are already present
- Whether the proposed changes address legitimate concerns (temp file streaming, atomic rename, SHA256, locks)
- Any Python/asyncio compatibility issues

Let me search for the file:


Then I need to:

  1. Verify the file contents match the snippet
  2. Check current import statements
  3. Validate the proposed refactoring suggestions
  4. Ensure no conflicts with existing code

Could you confirm the repository is available, or should I proceed with generating verification scripts that will clone the repo first?


🏁 Script executed:

#!/bin/bash
# Step 1: Locate the file
fd -p "turn_detection_2.py" -t f

Length of output: 71


🏁 Script executed:

#!/bin/bash
# Step 2: Verify file exists and inspect the target function
find . -name "turn_detection_2.py" -type f 2>/dev/null | head -5

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Search for the turn_detection_2.py file
find . -path "*smart_turn/turn_detection_2.py" -type f 2>/dev/null

Length of output: 73


🏁 Script executed:

#!/bin/bash
# Inspect lines 269-311 to see current implementation
sed -n '269,311p' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1677


🏁 Script executed:

#!/bin/bash
# Check imports at top of file (first 50 lines)
head -50 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1622


🏁 Script executed:

#!/bin/bash
# Search for _download_locks, asyncio.Lock, or similar patterns in the file
grep -n "_download_locks\|asyncio.Lock\|tmp_path\|atomic" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if hashlib is already imported
grep -n "^import hashlib\|^from hashlib" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check for all calls to ensure_model to verify backward compatibility
rg -n 'ensure_model\s*\(' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py -A2 -B2

Length of output: 689


🏁 Script executed:

#!/bin/bash
# Verify file totality - check end of file to understand module structure
tail -20 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 763


🏁 Script executed:

#!/bin/bash
# Check if Dict is imported anywhere in typing imports
grep -n "from typing import" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 29


Implement streaming download with atomic rename, checksum verification, and concurrency protection.

The current implementation buffers entire model files in memory and lacks protection against concurrent downloads of the same model, creating both RAM pressure and race conditions. The proposed changes address these issues legitimately:

  • Temp file + atomic rename: Prevents partial file corruption if download fails midway
  • SHA256 verification: Ensures model integrity (critical for ML model reproducibility)
  • Per-path asyncio.Lock: Guards against concurrent download races for the same model path
  • Direct streaming write: Avoids buffering entire file, reducing memory footprint for large ONNX models

Add the required imports at the top of the file:

import hashlib
from typing import Optional, Dict

Add this module-level state near the imports:

_download_locks: Dict[str, asyncio.Lock] = {}

Then implement the refactored ensure_model function as shown in the diff. The optional sha256 parameter is backward compatible with existing callers at lines 110 and 116.
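
A hypothetical call site for the extended signature (URL and digest below are placeholders; this assumes the refactored `ensure_model(path, url, sha256=None)`):

```python
import asyncio

# Assumes the refactored ensure_model from this plugin module.
from vision_agents.plugins.smart_turn.turn_detection_2 import ensure_model


async def main() -> None:
    path = await ensure_model(
        "/tmp/models/smart-turn-v3.onnx",                   # placeholder path
        "https://example.com/models/smart-turn-v3.onnx",    # placeholder URL
        sha256=None,  # pass the expected hex digest to enable verification
    )
    print(f"model ready at {path}")


asyncio.run(main())
```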

🤖 Prompt for AI Agents
In plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
around lines 269 to 311, replace the current buffered-download implementation
with a streaming, atomic, checksum-verified, and concurrency-safe downloader:
add imports hashlib and typing Optional/Dict; add module-level _download_locks:
Dict[str, asyncio.Lock] = {}; change ensure_model signature to accept optional
sha256: Optional[str] = None; acquire/await a per-path asyncio.Lock from
_download_locks to prevent concurrent downloads; stream
response.aiter_bytes(chunk_size=8192) and write chunks directly to a temporary
file (e.g., path + ".tmp") using asyncio.to_thread for blocking file writes to
avoid buffering whole file in memory; compute SHA256 incrementally while writing
and if sha256 is provided, compare it after download and raise on mismatch; on
successful completion atomically rename the temp file to the target path
(os.replace); ensure cleanup of the temp file and release the lock on any
exception and wrap HTTP errors with a clear runtime error including the original
exception.

Comment on lines 135 to 137
        # Resample to 16 kHz mono
        samples = audio_data.resample(16_000, 1).samples


⚠️ Potential issue | 🟠 Major

Align resample rate with buffer math

You hardcode 16 kHz here but compute required_bytes using self.sample_rate. Keep them consistent.

Apply:

-        # Resample to 16 kHz mono
-        samples = audio_data.resample(16_000, 1).samples
+        # Resample to target sample rate (mono)
+        samples = audio_data.resample(self.sample_rate, 1).samples
🤖 Prompt for AI Agents
In plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py around
lines 135 to 137, the code resamples audio to a hardcoded 16_000 Hz while buffer
size/required_bytes are computed using self.sample_rate; replace the hardcoded
16_000 with self.sample_rate (ensuring self.sample_rate is an int/Hz value) so
the resample rate and buffer math are consistent, i.e. call
audio_data.resample(self.sample_rate, 1).samples and update any related
constants/variables to use self.sample_rate.

@github-actions github-actions bot added the ci label Oct 30, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
plugins/vogent/tests/test_vogent_td.py (1)

36-42: Fix incorrect event field access (previously flagged).

The event handlers attempt to access event.session_id, but TurnStartedEvent and TurnEndedEvent have a participant field, not session_id.

Apply this diff to fix the field access:

         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
-            logger.info(f"Vogent turn started on {event.session_id}")
+            logger.info(f"Vogent turn started for {event.participant}")
             event_order.append("start")
 
         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
-            logger.info(f"Vogent turn ended on {event.session_id}")
+            logger.info(f"Vogent turn ended for {event.participant}")
             event_order.append("stop")
🧹 Nitpick comments (2)
plugins/vogent/tests/test_vogent_td.py (2)

45-50: Replace arbitrary sleeps with the purpose-built wait method.

Instead of using arbitrary sleep durations, leverage the wait_for_processing_complete method available on VogentTurnDetection for more robust synchronization.

Apply this diff:

         await td.process_audio(mia_audio_16khz, participant, conversation)
         await td.process_audio(silence_2s_48khz, participant, conversation)
 
-        await asyncio.sleep(0.001)
-
-        await asyncio.sleep(5)
+        # Wait for all queued audio to be processed
+        await td.wait_for_processing_complete(timeout=5.0)

53-53: Consider more precise assertions.

The current assertion allows two different event sequences. If the behavior is deterministic given the test inputs, consider asserting the exact expected sequence. If non-deterministic, document why both outcomes are valid.
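
If the pipeline is deterministic for these fixtures, an exact-order assertion is stronger (sketch):

```python
# Sketch: assert the exact sequence rather than accepting multiple orderings.
assert event_order == ["start", "stop"], f"unexpected event order: {event_order}"
```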

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 820e42d and 8fb6d82.

📒 Files selected for processing (1)
  • plugins/vogent/tests/test_vogent_td.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/vogent/tests/test_vogent_td.py
🧬 Code graph analysis (1)
plugins/vogent/tests/test_vogent_td.py (4)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/turn_detection/events.py (2)
  • TurnEndedEvent (29-45)
  • TurnStartedEvent (11-25)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
  • start (113-129)
  • stop (329-340)
  • process_audio (165-176)
conftest.py (2)
  • mia_audio_16khz (95-134)
  • silence_2s_48khz (181-195)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (3)
plugins/vogent/tests/test_vogent_td.py (3)

1-15: LGTM! Clean imports and module structure.

The imports are well-organized and include all necessary dependencies for testing Vogent turn detection.


19-26: LGTM! Proper async fixture with cleanup.

The try/finally pattern ensures proper cleanup of the VogentTurnDetection instance even if the test fails.


28-28: Verify re-enablement plan for skipped test.

Since the PR is marked as WIP and this test is temporarily disabled, ensure there's a plan to enable it before merging to main.

@tschellenbach tschellenbach merged commit b32c56d into main Oct 30, 2025
6 checks passed
@tschellenbach tschellenbach deleted the vogent-tommaso branch October 30, 2025 16:03