WIP - Vogent + New Smart TURN + Audio utils usage #128
Conversation
…t Silero VAD; align docs; ensure silero mia and silence tests pass
…ial-event logging; all VAD tests green (uv run)
…nto vogent-tommaso
…nto vogent-tommaso
…nto vogent-tommaso
…nto vogent-tommaso
Walkthrough
Refactors audio plumbing to use external PcmData/AudioFormat, replaces user_metadata with participant across events, adds AgentOptions and span-based tracing, migrates VAD/turn-detection to PcmData buffers, removes Moonshine/Silero plugins, and introduces SmartTurn/Vogent/Deepgram rewrites plus docs and tests updates.

Changes
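The summary above leans on several renamed building blocks (PcmData, AudioFormat, participant). As a rough orientation only, the sketch below shows how they fit together; the constructor arguments and the Participant fields are assumptions pieced together from snippets quoted later in this review, and whether the enum member is `S16` or `PCM_S16` is itself questioned below.

```python
# Hedged sketch: build a PcmData buffer with the AudioFormat enum and hand it,
# together with a Participant (replacing user_metadata), to turn detection.
import numpy as np
from getstream.video.rtc.track_util import PcmData, AudioFormat
from vision_agents.core.edge.types import Participant

samples = np.zeros(320, dtype=np.int16)  # 20 ms of silence at 16 kHz
pcm = PcmData(samples=samples, sample_rate=16000, format=AudioFormat.S16)
participant = Participant(user_id="mia", original={})
# await turn_detection.process_audio(pcm, participant, conversation)
```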
Sequence Diagram(s)

```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#f3f4f6","actorBorder":"#cbd5e1"}}}%%
sequenceDiagram
participant App as Application
participant Agent as Agent
participant TD as TurnDetection
participant Audio as AudioSource
participant Events as EventBus
App->>+Agent: join(call)
Agent->>Agent: merge AgentOptions
Agent->>+TD: await _setup_turn_detection()
TD->>TD: await start()
TD-->>-Agent: ready
Agent-->>-App: session created
Audio->>+Agent: on_audio_received(pcm, participant)
Agent->>+TD: process_audio(pcm, participant, conversation)
TD->>TD: windowing (PcmData)
alt speech starts
TD->>Events: emit TurnStartedEvent(participant)
else speech ends
TD->>Events: emit TurnEndedEvent(participant)
end
TD-->>-Agent: processed
```

```mermaid
%%{init: {"theme":"neutral","themeVariables":{"actorBackground":"#fffaf0","actorBorder":"#e6e1d9"}}}%%
sequenceDiagram
participant Smart as SmartTurnDetection
participant Silero as SileroVAD
participant Whisper as WhisperFeature
participant ONNX as Turn ONNX
participant Events as EventBus
Smart->>+Smart: await start()
Smart->>+Silero: prepare silero (async)
Smart->>+Whisper: prepare whisper (thread)
Smart->>+ONNX: prepare onnx (thread)
Silero-->>-Smart: ready
Whisper-->>-Smart: ready
ONNX-->>-Smart: ready
Note over Smart,Silero: per-chunk processing
Smart->>+Silero: prob(chunk)
Silero-->>-Smart: prob
alt prob > threshold
Smart->>Smart: buffer segment
Smart->>+Whisper: transcribe segment
Whisper-->>-Smart: transcript
Smart->>+ONNX: predict completion(features)
ONNX-->>-Smart: completion_prob
alt completion_prob > vogent_threshold
Smart->>Events: emit TurnEndedEvent(participant)
end
end
```
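The second diagram compresses the per-chunk logic into a few arrows. A pseudocode rendering of that loop is below; the method and attribute names (`prob`, `transcribe`, `predict_completion`, `_segment`) are placeholders inferred from the diagram, not the plugin's actual API.

```python
# Hedged sketch of the SmartTurn per-chunk flow shown above (assumed names).
async def _process_chunk(self, chunk, participant):
    prob = await self.silero.prob(chunk)          # Silero VAD probability for this chunk
    if prob > self.vad_threshold:
        self._segment.append(chunk)               # keep buffering while speech continues
        return
    if not self._segment:
        return                                    # silence with nothing buffered: ignore
    transcript = await self.whisper.transcribe(self._segment)
    completion = await self.onnx.predict_completion(self._segment, transcript)
    if completion > self.vogent_threshold:
        self._emit_end_turn_event(participant)    # -> TurnEndedEvent(participant)
    self._segment.clear()
```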
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Areas to focus on:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 21
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
agents-core/vision_agents/core/utils/utils.py (1)
84-97: Add empty bytes validation in gemini_realtime.py:440-442. The function now returns empty bytes on conversion failure instead of raising exceptions. The current try-except block in gemini_realtime.py will not catch this silent failure: empty bytes will be passed directly to `Blob()` and sent to the API, masking the underlying conversion error. Add an explicit check after calling `frame_to_png_bytes()`:

```python
png_bytes = frame_to_png_bytes(frame)
if not png_bytes:
    self.logger.error("Failed to convert frame to PNG bytes")
    return
blob = Blob(data=png_bytes, mime_type="image/png")
```

The test in test_gemini_realtime.py correctly validates the empty bytes case (lines 96-97).
docs/ai/instructions/ai-tts.md (1)
18-25: Update example to use AudioFormat enum. The example still uses `format="s16"` (a string literal), but the codebase has migrated to `format=AudioFormat.S16`. Update the documentation to reflect current best practices.

```diff
-from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import PcmData, AudioFormat

 async def stream_audio(self, text: str, *_, **__) -> PcmData:
     audio_bytes = await my_sdk.tts.bytes(text=..., ...)
     # sample_rate, channels and format depend on what the TTS model returns
-    return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format="s16")
+    return PcmData.from_bytes(audio_bytes, sample_rate=16000, channels=1, format=AudioFormat.S16)
```

agents-core/vision_agents/core/agents/agents.py (1)
825-879: Turn events use participant, not speaker_id → barge-in and responses won't trigger. The plugins emit TurnStartedEvent/TurnEndedEvent(participant=...), but this handler checks event.speaker_id. This prevents TTS interruption and LLM responses.

```diff
-        if isinstance(event, TurnStartedEvent):
-            # Interrupt TTS when user starts speaking (barge-in)
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+        if isinstance(event, TurnStartedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
+            # Interrupt TTS when user starts speaking (barge-in)
+            if speaker_id and speaker_id != self.agent_user.id:
                 if self.tts:
                     self.logger.info(
-                        f"👉 Turn started - interrupting TTS for participant {event.speaker_id}"
+                        f"👉 Turn started - interrupting TTS for participant {speaker_id}"
                     )
                     try:
                         await self.tts.stop_audio()
                     except Exception as e:
                         self.logger.error(f"Error stopping TTS: {e}")
                 else:
                     self.logger.info(
-                        f"👉 Turn started - participant speaking {event.speaker_id} : {event.confidence}"
+                        f"👉 Turn started - participant speaking {speaker_id} : {event.confidence}"
                     )
             else:
                 # Agent itself started speaking - this is normal
                 self.logger.debug(
-                    f"👉 Turn started - agent speaking {event.speaker_id}"
+                    f"👉 Turn started - agent speaking {speaker_id}"
                 )
-        elif isinstance(event, TurnEndedEvent):
+        elif isinstance(event, TurnEndedEvent):
+            speaker_id = getattr(event, "speaker_id", None) or (
+                event.participant.user_id if getattr(event, "participant", None) else None
+            )
             self.logger.info(
-                f"👉 Turn ended - participant {event.speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
+                f"👉 Turn ended - participant {speaker_id} finished (duration: {event.duration}, confidence: {event.confidence})"
             )
             # When turn detection is enabled, trigger LLM response when user's turn ends
             # This is the signal that the user has finished speaking and expects a response
-            if event.speaker_id and event.speaker_id != self.agent_user.id:
+            if speaker_id and speaker_id != self.agent_user.id:
                 # Get the accumulated transcript for this speaker
-                transcript = self._pending_user_transcripts.get(event.speaker_id, "")
+                transcript = self._pending_user_transcripts.get(speaker_id, "")
                 if transcript and transcript.strip():
                     self.logger.info(
-                        f"🤖 Triggering LLM response after turn ended for {event.speaker_id}"
+                        f"🤖 Triggering LLM response after turn ended for {speaker_id}"
                     )
-                    # Create participant object if we have metadata
-                    participant = None
-                    if hasattr(event, "custom") and event.custom:
-                        # Try to extract participant info from custom metadata
-                        participant = event.custom.get("participant")
+                    # Prefer the participant carried by the event
+                    participant = getattr(event, "participant", None)
                     # Trigger LLM response with the complete transcript
                     if self.llm:
                         await self.simple_response(transcript, participant)
                     # Clear the pending transcript for this speaker
-                    self._pending_user_transcripts[event.speaker_id] = ""
+                    self._pending_user_transcripts[speaker_id] = ""
```

agents-core/vision_agents/core/turn_detection/events.py (2)
12-25: Update the docstring to match the actual implementation. The docstring references `speaker_id` and `duration` fields, but the actual implementation uses `participant` (line 23) and `duration` is not present. This creates confusion for developers reading the API documentation. Apply this diff to update the docstring:

```diff
     """
     Event emitted when a speaker starts their turn.

     Attributes:
-        speaker_id: ID of the speaker who started speaking
+        participant: Participant who started speaking
         confidence: Confidence level of the turn detection (0.0-1.0)
-        duration: Duration of audio processed (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """
```
31-43: Update the docstring to match the actual implementation. The docstring references `speaker_id` and `duration` fields, but the implementation uses `participant` (line 41) and `duration` is not present. Apply this diff:

```diff
     """
     Event emitted when a speaker completes their turn.

     Attributes:
-        speaker_id: ID of the speaker who finished speaking
+        participant: Participant who finished speaking
         confidence: Confidence level of the turn completion detection (0.0-1.0)
-        duration: Duration of the turn (seconds)
         custom: Additional metadata specific to the turn detection implementation
     """
```
♻️ Duplicate comments (1)
plugins/openai/vision_agents/plugins/openai/tts.py (1)
45-47: Verify AudioFormat.S16 enum value exists. Same concern as in the elevenlabs plugin: ensure `getstream.video.rtc.track_util.AudioFormat.S16` exists, as the relevant code snippet shows `PCM_S16` in the internal enum.
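A throwaway check settles this quickly; the import path below is the one used elsewhere in this PR and is an assumption, not a verified API.

```python
# Hedged check: print the members actually defined on AudioFormat before relying on S16.
from getstream.video.rtc.track_util import AudioFormat

print([(member.name, member.value) for member in AudioFormat])  # expect S16 or PCM_S16 = "s16"
```

Run it with `uv run python` per the repo guidelines.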
🧹 Nitpick comments (15)
plugins/fish/vision_agents/plugins/fish/stt.py (1)
107-109: Consider clarifying the error log message. The log message includes `pcm_data.duration` but doesn't explicitly label what the value represents. For better debuggability, consider making it explicit.

```diff
-        logger.error(
-            "No transcript returned from Fish Audio %s", pcm_data.duration
-        )
+        logger.error(
+            "No transcript returned from Fish Audio (audio duration: %s)", pcm_data.duration
+        )
```

docs/ai/instructions/ai-stt.md (1)
20-41: Add missing imports and align PCM guidance in example. The example uses PcmData and Participant without imports. Add a minimal prelude and point to the new track_util path.

```diff
-from vision_agents.core import stt
-from vision_agents.core.stt.events import TranscriptResponse
+from typing import Optional
+from vision_agents.core import stt
+from vision_agents.core.stt.events import TranscriptResponse
+from vision_agents.core.edge.types import Participant
+from getstream.video.rtc.track_util import PcmData
 ...
 class MySTT(stt.STT):
```

The new “PCM / Audio management” note is great; consider linking to a short snippet:

```python
from getstream.video.rtc.track_util import PcmData, AudioFormat

pcm = PcmData(samples=my_np_int16, sample_rate=16000, format=AudioFormat.S16)
```

Also applies to: 46-50
plugins/openai/tests/test_openai_realtime.py (1)
68-79: Dedup imports; prefer resample_poly for 16k→48k.
- Move PcmData/AudioFormat imports to module scope to avoid repeated per-test imports.
- For 16k→48k, scipy.signal.resample_poly(up=3, down=1) is faster and offers better fidelity than FFT resample for integer-ratio conversion.
```diff
-    from scipy import signal
-    from getstream.video.rtc.track_util import PcmData, AudioFormat
+    from scipy import signal
@@
-    samples_48k = signal.resample(samples_16k, num_samples_48k).astype(np.int16)
+    samples_48k = signal.resample_poly(
+        samples_16k.astype(np.float32), up=3, down=1
+    ).astype(np.int16)
@@
-    audio_48khz = PcmData(
-        samples=samples_48k, sample_rate=48000, format=AudioFormat.S16
-    )
+    audio_48khz = PcmData(samples=samples_48k, sample_rate=48000, format=AudioFormat.S16)
```

At top-of-file:

```diff
-from dotenv import load_dotenv
+from dotenv import load_dotenv
+from getstream.video.rtc.track_util import PcmData, AudioFormat
```

Apply the same import cleanup to other tests constructing PcmData (lines ~242-249, ~299-307). If track_util exposes a resample helper, prefer that instead.
Also applies to: 242-249, 299-307
plugins/cartesia/vision_agents/plugins/cartesia/tts.py (1)
38-41: Set provider_name for consistency/telemetry. Other plugins pass provider_name to the base TTS; do the same here.

```diff
-        super().__init__()
+        super().__init__(provider_name="cartesia")
```

The AudioFormat/S16 switch looks good.
Also applies to: 80-82
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
50-53: Use asyncio.to_thread instead of get_event_loop/run_in_executor. Cleaner and future-proof in 3.11+.

```diff
-        loop = asyncio.get_event_loop()
-        chunks: List[bytes] = await loop.run_in_executor(
-            None, lambda: list(self._generate_chunks(text))
-        )
+        chunks: List[bytes] = await asyncio.to_thread(
+            lambda: list(self._generate_chunks(text))
+        )
```

The AudioFormat.S16 switch in PcmData.from_bytes looks correct.
Also applies to: 56-62
plugins/wizper/vision_agents/plugins/wizper/stt.py (1)
85-87: Centralized PCM→WAV conversion is solid; consider eliminating the temp file for better serverless performance. The refactoring from custom WAV code to `pcm_data.to_wav_bytes()` is a good move. Currently, the code writes bytes to disk and uploads via file path. Since fal-client's AsyncClient supports `upload(data, content_type)` with raw bytes, you can pass `wav_data` directly with `content_type="audio/wav"` and skip the temp file entirely, eliminating unnecessary I/O in serverless environments while keeping cleanup logic simpler.
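A sketch of that direct upload, assuming the `upload(data, content_type)` coroutine mentioned above and a `self._client` AsyncClient attribute (both assumptions about this plugin's wiring):

```python
# Hedged sketch: hand the WAV bytes straight to fal instead of writing a temp file.
wav_data = pcm_data.to_wav_bytes()
audio_url = await self._client.upload(wav_data, content_type="audio/wav")
# ...pass audio_url to the existing Wizper transcription request as before.
```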
1-6: Frontmatter readability: prefer a YAML block scalar for the multi-line description. Replace embedded "\n" escapes with a block scalar for clarity and fewer parsing gotchas.
--- name: repo-workflow-guide -description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example> +description: | + Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase. + + Examples: + - <example> + Context: User wants to add a new feature to the project. + user: "I need to implement a new authentication module" + assistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines." + <Task tool call to repo-workflow-guide> + assistant: "Based on the project guidelines, here's how we should approach this..." + </example> + + - <example> + Context: User asks a question about code organization. + user: "Where should I put the new utility functions?" + assistant: "Let me check the repository workflow guidelines to give you the correct answer." + <Task tool call to repo-workflow-guide> + assistant: "According to the project structure guidelines..." + </example> + + - <example> + Context: Starting a new task that requires understanding project conventions. + user: "Can you help me refactor this component?" + assistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions." + <Task tool call to repo-workflow-guide> + </example> ---plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
107-121: ORT sessions: specify providers and threads for deterministic CPU behavior. Standardize provider and thread config (as in SileroVAD) to avoid unexpected GPU selection and reduce contention.

```diff
-        # Load ONNX session in thread pool to avoid blocking event loop
-        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)
+        # Load ONNX session in thread pool to avoid blocking event loop
+        self.smart_turn = await asyncio.to_thread(build_session, SMART_TURN_ONNX_PATH)
```

And change build_session:

```diff
-def build_session(onnx_path):
-    so = ort.SessionOptions()
-    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
-    so.inter_op_num_threads = 1
-    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+def build_session(onnx_path):
+    so = ort.SessionOptions()
+    so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
+    so.inter_op_num_threads = 1
+    so.intra_op_num_threads = 1
+    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )
```
268-310: Model download: avoid buffering the entire file in memory and use logging instead of print. Stream to disk as you download; replace prints with a module logger.

```diff
-async def ensure_model(path: str, url: str) -> str:
+import logging
+logger = logging.getLogger(__name__)
+
+async def ensure_model(path: str, url: str) -> str:
@@
-    if not os.path.exists(path):
+    if not os.path.exists(path):
         model_name = os.path.basename(path)
-        print(f"Downloading {model_name}...")
+        logger.info("Downloading %s ...", model_name)
@@
-        async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
-            async with client.stream("GET", url) as response:
+        async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+            async with client.stream("GET", url) as response:
                 response.raise_for_status()
-
-                # Write file in chunks to avoid loading entire file in memory
-                # Use asyncio.to_thread for blocking file I/O operations
-                chunks = []
-                async for chunk in response.aiter_bytes(chunk_size=8192):
-                    chunks.append(chunk)
-
-                # Write all chunks to file in thread to avoid blocking event loop
-                def write_file():
-                    with open(path, "wb") as f:
-                        for chunk in chunks:
-                            f.write(chunk)
-
-                await asyncio.to_thread(write_file)
+                # Stream directly to file to keep memory bounded
+                def write_stream():
+                    with open(path, "wb") as f:
+                        for chunk in response.iter_raw():
+                            if chunk:
+                                f.write(chunk)
+                await asyncio.to_thread(write_stream)
@@
-        print(f"{model_name} downloaded.")
+        logger.info("%s downloaded.", model_name)
```

Note: If you prefer pure-async file I/O, switch to aiofiles and write chunks inside the async loop.
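For the pure-async route, a sketch with aiofiles (assuming it is acceptable to add as a dependency) keeps the write inside the streaming loop:

```python
# Hedged alternative: fully async download with aiofiles instead of asyncio.to_thread.
import aiofiles
import httpx

async def download_model(url: str, path: str) -> None:
    async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
        async with client.stream("GET", url) as response:
            response.raise_for_status()
            async with aiofiles.open(path, "wb") as f:
                async for chunk in response.aiter_bytes(chunk_size=8192):
                    await f.write(chunk)  # bounded memory: one chunk at a time
```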
agents-core/vision_agents/core/turn_detection/turn_detection.py (1)
68-74: Lifecycle methods implement default behavior. The `start()` and `stop()` methods provide sensible defaults by setting `is_active` appropriately. However, the docstrings suggest subclasses "want to" override these, but they're not marked as abstract. Consider whether these should remain concrete with default implementations or be marked as abstract to enforce subclass implementation. The current approach (concrete with defaults) is reasonable for optional warmup/cleanup, but the docstrings could be clearer:

```diff
     async def start(self) -> None:
-        """Some turn detection systems want to run warmup etc here"""
+        """Start turn detection. Override to add warmup logic if needed."""
         self.is_active = True

     async def stop(self) -> None:
-        """Again, some turn detection systems want to run cleanup here"""
+        """Stop turn detection. Override to add cleanup logic if needed."""
         self.is_active = False
```

agents-core/vision_agents/core/edge/events.py (1)
12-12: Consider consistent string quote style. The string literals for `type` fields were changed from single quotes to double quotes. While functionally equivalent, ensure this aligns with the project's style guidelines for consistency across the codebase.

Also applies to: 21-21, 31-31, 41-41
DEVELOPMENT.md (1)
326-338: Fix markdown formatting for headers. The static analysis tool flagged that emphasis (bold text) is being used instead of proper markdown headings for "Audio Formats", "Using Cursor", and "Learning Roadmap".
Apply this diff to use proper markdown headings:
```diff
 ## Onboarding Plan for new contributors

-**Audio Formats**
+### Audio Formats

 You'll notice that audio comes in many formats. PCM, wav, mp3. 16khz, 48khz. Encoded as i16 or f32.
 Note that webrtc by default is 48khz. A good first intro to audio formats can be found here:

-**Using Cursor**
+### Using Cursor

 You can ask cursor something like "read @ai-plugin and build me a plugin called fish"
 See the docs folder for other ai instruction files

-**Learning Roadmap**
+### Learning Roadmap
```

plugins/smart_turn/tests/test_turn2.py (1)
62-85: Reduce sleep duration and add integration marker. Line 81 includes a 5-second sleep which significantly slows down test execution. Additionally, this test performs end-to-end turn detection with model inference and should be marked as an integration test.
Apply this diff:
```diff
+    @pytest.mark.integration
     async def test_turn_detection(self, td, mia_audio_16khz):
         participant = Participant(user_id="mia", original={})
         conversation = InMemoryConversation(instructions="be nice", messages=[])
         event_order = []

         # Subscribe to events
         @td.events.subscribe
         async def on_start(event: TurnStartedEvent):
             logger.info(f"Smart turn turn started on {event.session_id}")
             event_order.append("start")

         @td.events.subscribe
         async def on_stop(event: TurnEndedEvent):
             logger.info(f"Smart turn turn ended on {event.session_id}")
             event_order.append("stop")

         await td.process_audio(mia_audio_16khz, participant, conversation)
         await asyncio.sleep(0.001)
-        await asyncio.sleep(5)
+        await asyncio.sleep(0.1)

         # Verify that turn detection is working - we should get at least some turn events
         # With continuous processing, we may get multiple start/stop cycles
         assert event_order == ["start", "stop"]
```
conftest.py (1)
175-226: Consider reusing existing fixture to reduce duplication. The audio loading logic in
mia_audio_48khz_chunkedduplicates themia_audio_48khzfixture. You could reduce duplication by loading the full audio first, then chunking it.@pytest.fixture def mia_audio_48khz_chunked(): - """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks.""" - audio_file_path = os.path.join(get_assets_dir(), "mia.mp3") - - # Load audio file using PyAV - container = av.open(audio_file_path) - audio_stream = container.streams.audio[0] - original_sample_rate = audio_stream.sample_rate - target_rate = 48000 - - # Create resampler if needed - resampler = None - if original_sample_rate != target_rate: - resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate) - - # Read all audio frames - samples = [] - for frame in container.decode(audio_stream): - # Resample if needed - if resampler: - frame = resampler.resample(frame)[0] - - # Convert to numpy array - frame_array = frame.to_ndarray() - if len(frame_array.shape) > 1: - # Convert stereo to mono - frame_array = np.mean(frame_array, axis=0) - samples.append(frame_array) - - # Concatenate all samples - samples = np.concatenate(samples) - - # Convert to int16 - samples = samples.astype(np.int16) - container.close() + """Load mia.mp3 and yield 48kHz PCM data in 20ms chunks.""" + # Reuse the full audio fixture logic + audio_file_path = os.path.join(get_assets_dir(), "mia.mp3") + container = av.open(audio_file_path) + audio_stream = container.streams.audio[0] + original_sample_rate = audio_stream.sample_rate + target_rate = 48000 + + resampler = None + if original_sample_rate != target_rate: + resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate) + + samples = [] + for frame in container.decode(audio_stream): + if resampler: + frame = resampler.resample(frame)[0] + frame_array = frame.to_ndarray() + if len(frame_array.shape) > 1: + frame_array = np.mean(frame_array, axis=0) + samples.append(frame_array) + + samples = np.concatenate(samples).astype(np.int16) + container.close() # Calculate chunk size for 20ms at 48kHz chunk_size = int(target_rate * 0.020) # 960 samples per 20msagents-core/vision_agents/core/vad/vad.py (1)
408-416: Consider simplifying the frame availability check. The `frame_data_available` check is functionally correct but complex. Consider extracting it to a helper method for readability.

```diff
+    def _has_buffered_audio(self) -> bool:
+        """Check if speech buffer contains audio data."""
+        if self.speech_buffer is None:
+            return False
+        if isinstance(self.speech_buffer.samples, np.ndarray):
+            return len(self.speech_buffer.samples) > 0
+        if isinstance(self.speech_buffer.samples, (bytes, bytearray)):
+            return len(self.speech_buffer.samples) > 0
+        return False
+
     def _emit_error_event(
         self,
         error: Exception,
         context: str = "",
         user_metadata: Optional[Dict[str, Any]] = None,
     ):
         """Emit a structured error event."""
         self.events.send(
             VADErrorEvent(
                 session_id=self.session_id,
                 plugin_name=self.provider_name,
                 error=error,
                 context=context,
                 user_metadata=user_metadata,
-                frame_data_available=(
-                    self.speech_buffer is not None
-                    and (
-                        (isinstance(self.speech_buffer.samples, np.ndarray) and len(self.speech_buffer.samples) > 0)
-                        or (isinstance(self.speech_buffer.samples, (bytes, bytearray)) and len(self.speech_buffer.samples) > 0)
-                    )
-                ),
+                frame_data_available=self._has_buffered_audio(),
             )
         )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (74)
.claude/agents/repo-workflow-guide.md(1 hunks).gitignore(1 hunks)DEVELOPMENT.md(3 hunks)agents-core/pyproject.toml(2 hunks)agents-core/vision_agents/core/agents/agents.py(4 hunks)agents-core/vision_agents/core/edge/events.py(3 hunks)agents-core/vision_agents/core/edge/types.py(1 hunks)agents-core/vision_agents/core/llm/realtime.py(2 hunks)agents-core/vision_agents/core/observability/__init__.py(0 hunks)agents-core/vision_agents/core/observability/metrics.py(0 hunks)agents-core/vision_agents/core/tts/manual_test.py(2 hunks)agents-core/vision_agents/core/tts/tts.py(0 hunks)agents-core/vision_agents/core/turn_detection/__init__.py(0 hunks)agents-core/vision_agents/core/turn_detection/events.py(3 hunks)agents-core/vision_agents/core/turn_detection/turn_detection.py(2 hunks)agents-core/vision_agents/core/utils/utils.py(4 hunks)agents-core/vision_agents/core/vad/vad.py(13 hunks)aiortc(1 hunks)conftest.py(3 hunks)docs/ai/instructions/ai-plugin.md(2 hunks)docs/ai/instructions/ai-realtime-llm.md(5 hunks)docs/ai/instructions/ai-stt.md(2 hunks)docs/ai/instructions/ai-tests.md(1 hunks)docs/ai/instructions/ai-tts.md(2 hunks)docs/ai/instructions/ai-turn-detector.md(1 hunks)docs/vad-refactor-plan-extended.md(1 hunks)docs/vad-refactor-plan.md(1 hunks)examples/01_simple_agent_example/simple_agent_example.py(1 hunks)plugins/aws/example/aws_qwen_example.py(1 hunks)plugins/aws/vision_agents/plugins/aws/aws_realtime.py(8 hunks)plugins/aws/vision_agents/plugins/aws/tts.py(2 hunks)plugins/cartesia/vision_agents/plugins/cartesia/tts.py(2 hunks)plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py(2 hunks)plugins/fish/example/fish_example.py(1 hunks)plugins/fish/vision_agents/plugins/fish/stt.py(2 hunks)plugins/fish/vision_agents/plugins/fish/tts.py(2 hunks)plugins/kokoro/vision_agents/plugins/kokoro/tts.py(2 hunks)plugins/krisp/vision_agents/plugins/krisp/turn_detection.py(8 hunks)plugins/moonshine/README.md(0 hunks)plugins/moonshine/tests/test_moonshine_stt.py(0 hunks)plugins/moonshine/vision_agents/plugins/moonshine/__init__.py(0 hunks)plugins/moonshine/vision_agents/plugins/moonshine/stt.py(0 hunks)plugins/openai/tests/test_openai_realtime.py(4 hunks)plugins/openai/vision_agents/plugins/openai/tts.py(2 hunks)plugins/openrouter/README.md(1 hunks)plugins/openrouter/example/openrouter_example.py(1 hunks)plugins/sample_plugin/example/my_example.py(1 hunks)plugins/silero/README.md(0 hunks)plugins/silero/pyproject.toml(0 hunks)plugins/silero/tests/test_bench.py(0 hunks)plugins/silero/tests/test_performance.py(0 hunks)plugins/silero/tests/test_vad.py(0 hunks)plugins/silero/vision_agents/plugins/silero/__init__.py(0 hunks)plugins/silero/vision_agents/plugins/silero/vad.py(0 hunks)plugins/smart_turn/README.md(1 hunks)plugins/smart_turn/pyproject.toml(1 hunks)plugins/smart_turn/tests/test_turn2.py(1 hunks)plugins/smart_turn/tests/test_turn_detection.py(0 hunks)plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py(1 hunks)plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py(1 hunks)plugins/vogent/README.md(1 hunks)plugins/vogent/example/basic_usage.py(1 hunks)plugins/vogent/pyproject.toml(2 hunks)plugins/vogent/tests/test_vogent_td.py(1 hunks)plugins/vogent/vision_agents/plugins/vogent/__init__.py(1 hunks)plugins/vogent/vision_agents/plugins/vogent/turn_detection.py(7 hunks)plugins/wizper/vision_agents/plugins/wizper/stt.py(1 hunks)pyproject.toml(2 hunks)tests/base_test.py(5 hunks)tests/test_events.py(7 hunks)tests/test_pcm_data.py(0 hunks)tests/test_tts_base.py(2 
hunks)tests/test_utils.py(10 hunks)tests/test_vad_base.py(1 hunks)
💤 Files with no reviewable changes (17)
- agents-core/vision_agents/core/turn_detection/init.py
- plugins/moonshine/tests/test_moonshine_stt.py
- plugins/moonshine/README.md
- plugins/silero/README.md
- plugins/silero/vision_agents/plugins/silero/vad.py
- agents-core/vision_agents/core/observability/metrics.py
- agents-core/vision_agents/core/tts/tts.py
- plugins/silero/tests/test_performance.py
- plugins/moonshine/vision_agents/plugins/moonshine/init.py
- plugins/moonshine/vision_agents/plugins/moonshine/stt.py
- agents-core/vision_agents/core/observability/init.py
- plugins/silero/tests/test_bench.py
- plugins/smart_turn/tests/test_turn_detection.py
- plugins/silero/tests/test_vad.py
- tests/test_pcm_data.py
- plugins/silero/pyproject.toml
- plugins/silero/vision_agents/plugins/silero/init.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/openrouter/example/openrouter_example.pyplugins/aws/vision_agents/plugins/aws/tts.pyplugins/aws/example/aws_qwen_example.pyplugins/fish/vision_agents/plugins/fish/tts.pyexamples/01_simple_agent_example/simple_agent_example.pyplugins/wizper/vision_agents/plugins/wizper/stt.pyplugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.pyplugins/fish/vision_agents/plugins/fish/stt.pyplugins/openai/tests/test_openai_realtime.pyagents-core/vision_agents/core/edge/types.pytests/test_utils.pyplugins/kokoro/vision_agents/plugins/kokoro/tts.pytests/test_vad_base.pyplugins/sample_plugin/example/my_example.pyagents-core/vision_agents/core/edge/events.pytests/test_events.pyplugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.pyplugins/vogent/example/basic_usage.pyagents-core/vision_agents/core/agents/agents.pyplugins/fish/example/fish_example.pyagents-core/vision_agents/core/turn_detection/turn_detection.pyconftest.pyplugins/openai/vision_agents/plugins/openai/tts.pyplugins/vogent/tests/test_vogent_td.pyplugins/cartesia/vision_agents/plugins/cartesia/tts.pyplugins/krisp/vision_agents/plugins/krisp/turn_detection.pyagents-core/vision_agents/core/tts/manual_test.pyplugins/smart_turn/vision_agents/plugins/smart_turn/__init__.pyagents-core/vision_agents/core/llm/realtime.pyagents-core/vision_agents/core/turn_detection/events.pytests/base_test.pyplugins/vogent/vision_agents/plugins/vogent/__init__.pytests/test_tts_base.pyagents-core/vision_agents/core/vad/vad.pyplugins/smart_turn/tests/test_turn2.pyplugins/aws/vision_agents/plugins/aws/aws_realtime.pyagents-core/vision_agents/core/utils/utils.pyplugins/vogent/vision_agents/plugins/vogent/turn_detection.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
tests/**/*.py: Never use mocking utilities (e.g., unittest.mock, pytest-mock) in test files
Write tests using pytest (avoid unittest.TestCase or other frameworks)
Mark integration tests with @pytest.mark.integration
Do not use @pytest.mark.asyncio; async support is automatic
Files:
tests/test_utils.pytests/test_vad_base.pytests/test_events.pytests/base_test.pytests/test_tts_base.py
🧠 Learnings (2)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Use uv to run Python and tests; do not invoke python -m directly
Applied to files:
docs/ai/instructions/ai-tests.md
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Write tests using pytest (avoid unittest.TestCase or other frameworks)
Applied to files:
docs/ai/instructions/ai-tests.md
🧬 Code graph analysis (27)
plugins/aws/vision_agents/plugins/aws/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
plugins/fish/vision_agents/plugins/fish/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
agents-core/vision_agents/core/agents/conversation.py (1)
Conversation(67-227)agents-core/vision_agents/core/turn_detection/turn_detection.py (5)
TurnDetector(22-74)start(68-70)process_audio(52-64)_emit_start_turn_event(37-42)_emit_end_turn_event(44-49)agents-core/vision_agents/core/turn_detection/events.py (2)
TurnStartedEvent(11-25)TurnEndedEvent(29-43)
plugins/openai/tests/test_openai_realtime.py (3)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)conftest.py (1)
mia_audio_16khz(90-129)tests/base_test.py (1)
mia_audio_16khz(18-58)
tests/test_utils.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
plugins/kokoro/vision_agents/plugins/kokoro/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
tests/test_vad_base.py (3)
agents-core/vision_agents/core/vad/vad.py (5)
VAD(35-433)is_speech(128-138)process_audio(140-189)flush(375-382)close(418-433)agents-core/vision_agents/core/vad/events.py (2)
VADAudioEvent(30-40)VADPartialEvent(44-55)conftest.py (1)
get_assets_dir(78-80)
agents-core/vision_agents/core/edge/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
PluginBaseEvent(50-52)
tests/test_events.py (2)
agents-core/vision_agents/core/events/manager.py (1)
send(426-468)agents-core/vision_agents/core/edge/sfu_events.py (1)
level(496-500)
plugins/elevenlabs/vision_agents/plugins/elevenlabs/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
plugins/vogent/example/basic_usage.py (2)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
TurnDetection(26-315)start(37-38)process_audio(101-159)stop(300-315)agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
start(68-70)process_audio(52-64)stop(72-74)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
start(97-105)process_audio(122-160)agents-core/vision_agents/core/turn_detection/turn_detection.py (2)
start(68-70)process_audio(52-64)plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
start(37-38)process_audio(101-159)
agents-core/vision_agents/core/turn_detection/turn_detection.py (4)
agents-core/vision_agents/core/turn_detection/events.py (2)
TurnStartedEvent(11-25)TurnEndedEvent(29-43)agents-core/vision_agents/core/agents/conversation.py (1)
Conversation(67-227)plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
process_audio(122-160)start(97-105)plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (2)
process_audio(101-159)start(37-38)
conftest.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
join(237-289)
plugins/openai/vision_agents/plugins/openai/tts.py (1)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)
plugins/vogent/tests/test_vogent_td.py (5)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (4)
TurnDetection(26-315)start(37-38)stop(300-315)process_audio(101-159)agents-core/vision_agents/core/agents/conversation.py (1)
InMemoryConversation(230-237)agents-core/vision_agents/core/stt/events.py (2)
STTTranscriptEvent(16-47)STTPartialTranscriptEvent(51-78)agents-core/vision_agents/core/turn_detection/events.py (2)
TurnEndedEvent(29-43)TurnStartedEvent(11-25)conftest.py (1)
mia_audio_48khz_chunked(176-226)
plugins/krisp/vision_agents/plugins/krisp/turn_detection.py (2)
agents-core/vision_agents/core/agents/conversation.py (1)
Conversation(67-227)agents-core/vision_agents/core/edge/types.py (1)
Participant(29-31)
agents-core/vision_agents/core/tts/manual_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)agents-core/vision_agents/core/edge/types.py (1)
play_pcm_with_ffplay(57-111)
plugins/smart_turn/vision_agents/plugins/smart_turn/__init__.py (1)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1)
SmartTurnDetection(42-204)
agents-core/vision_agents/core/llm/realtime.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (1)
Participant(229-270)
tests/base_test.py (2)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)conftest.py (3)
assets_dir(84-86)bunny_video_track(241-285)RealVideoTrack(247-279)
plugins/vogent/vision_agents/plugins/vogent/__init__.py (1)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (1)
TurnDetection(26-315)
tests/test_tts_base.py (2)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)plugins/aws/vision_agents/plugins/aws/tts.py (1)
stream_audio(61-94)
agents-core/vision_agents/core/vad/vad.py (2)
agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)agents-core/vision_agents/core/vad/events.py (1)
VADAudioEvent(30-40)
plugins/smart_turn/tests/test_turn2.py (5)
agents-core/vision_agents/core/agents/conversation.py (1)
InMemoryConversation(230-237)agents-core/vision_agents/core/turn_detection/events.py (2)
TurnStartedEvent(11-25)TurnEndedEvent(29-43)plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6)
ensure_model(268-309)SileroVAD(207-265)SmartTurnDetection(42-204)start(97-105)prob(240-265)process_audio(122-160)agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
start(68-70)stop(72-74)process_audio(52-64)agents-core/vision_agents/core/agents/agents.py (1)
subscribe(189-201)
plugins/aws/vision_agents/plugins/aws/aws_realtime.py (3)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
VideoForwarder(13-188)agents-core/vision_agents/core/events/base.py (1)
AudioFormat(23-30)agents-core/vision_agents/core/llm/events.py (1)
RealtimeAudioOutputEvent(37-45)
plugins/vogent/vision_agents/plugins/vogent/turn_detection.py (3)
agents-core/vision_agents/core/turn_detection/turn_detection.py (3)
TurnDetector(22-74)start(68-70)process_audio(52-64)agents-core/vision_agents/core/edge/types.py (1)
Participant(29-31)agents-core/vision_agents/core/agents/conversation.py (1)
Conversation(67-227)
🪛 LanguageTool
docs/vad-refactor-plan.md
[grammar] ~91-~91: Use a hyphen to join words.
Context: ...n counts via real assets. - Add base-VAD specific tests (future work): - Normal...
(QB_NEW_EN_HYPHEN)
DEVELOPMENT.md
[grammar] ~328-~328: Ensure spelling is correct
Context: ... in many formats. PCM, wav, mp3. 16khz, 48khz. Encoded as i16 or f32. Note that webr...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
DEVELOPMENT.md
326-326: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
333-333: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
338-338: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
```diff
     async def _setup_turn_detection(self):
         if self.turn_detection:
             self.logger.info("🎙️ Setting up turn detection listeners")
             self.events.subscribe(self._on_turn_event)
-            self.turn_detection.start()
+            await self.turn_detection.start()
```
Make start() invocation robust to sync/async implementations and set active state.
Awaiting unconditionally will raise if a detector exposes a synchronous start(). Also ensure detectors mark themselves active.
```diff
-    async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+    async def _setup_turn_detection(self):
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+        # Support both async and sync start() while standardizing to async over time
+        import inspect
+        maybe = self.turn_detection.start()
+        if inspect.iscoroutine(maybe):
+            await maybe
```

Also consider enforcing an async start() in the TurnDetector base and updating all implementations to match for consistency.
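If the base-class route is taken instead, here is a sketch of what enforcing an async start() could look like; the class and attribute names are assumed from the snippets quoted in this review.

```python
# Hedged sketch: make start() an abstract coroutine on the TurnDetector base,
# so every detector must provide an awaitable start and mark itself active.
import abc

class TurnDetector(abc.ABC):
    def __init__(self) -> None:
        self.is_active = False

    @abc.abstractmethod
    async def start(self) -> None:
        """Run warmup; implementations should set self.is_active = True."""

    async def stop(self) -> None:
        self.is_active = False
```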
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 639 to 644, the
call to self.turn_detection.start() assumes an async implementation and
unconditionally awaits it; change this to handle both sync and async starts by
calling start(), checking if the return value is awaitable (using
inspect.isawaitable) and awaiting only if needed, wrap the start call in
try/except to log errors, and after a successful start set the detector's active
state (e.g., setattr(self.turn_detection, "active", True) or call its activation
method if one exists) so detectors mark themselves active; optionally note that
the TurnDetector base class should be updated to require an async start() for
consistency and update implementations accordingly.
```diff
         turn_detection=smart_turn.TurnDetection(
-            buffer_duration=2.0, confidence_threshold=0.5
+            buffer_in_seconds=2.0, confidence_threshold=0.5
         ),  # Enable turn detection with FAL/ Smart turn
```
🧩 Analysis chain
LGTM! Parameter rename improves clarity.
The rename from buffer_duration to buffer_in_seconds makes the API more explicit. The value and behavior are preserved.
Run the following script to verify all TurnDetection constructor calls have been updated:
🏁 Script executed:

```bash
#!/bin/bash
# Description: Verify that no usages of the old parameter name remain.
# Search for any remaining usage of buffer_duration in TurnDetection calls
rg -n 'TurnDetection' -A 3 -B 1 | rg 'buffer_duration'
```

Length of output: 428
Update other files to match the parameter rename for consistency.
The reviewed file correctly uses buffer_in_seconds, but the refactoring is incomplete across the codebase. Update:
- plugins/aws/example/basic_usage.py:28 - change buffer_duration=2.0 to buffer_in_seconds=2.0
- Documentation references in aws/README.md:25 and agents-core/vision_agents/core/turn_detection/README.md:24
🤖 Prompt for AI Agents
In examples/01_simple_agent_example/simple_agent_example.py around lines 25-27
and across the repo, the parameter was renamed to buffer_in_seconds but other
references still use buffer_duration; update plugins/aws/example/basic_usage.py
line ~28 to replace buffer_duration=2.0 with buffer_in_seconds=2.0, and update
the docs aws/README.md line ~25 and
agents-core/vision_agents/core/turn_detection/README.md line ~24 to reflect the
new parameter name (search for and replace any remaining buffer_duration
occurrences with buffer_in_seconds and adjust any example code snippets or
explanatory text to use the new name).
```python
    async def on_start(event: TurnStartedEvent):
        logger.info(f"Vogent turn started on {event.session_id}")

    @td.events.subscribe
    async def on_stop(event: TurnEndedEvent):
        logger.info(f"Vogent turn ended on {event.session_id}")
```
Fix incorrect event field access.
The event handlers attempt to access event.session_id, but according to the event definitions in agents-core/vision_agents/core/turn_detection/events.py, both TurnStartedEvent and TurnEndedEvent have a participant field, not session_id.
Apply this diff to fix the field access:
```diff
     @td.events.subscribe
     async def on_start(event: TurnStartedEvent):
-        logger.info(f"Vogent turn started on {event.session_id}")
+        logger.info(f"Vogent turn started for {event.participant}")

     @td.events.subscribe
     async def on_stop(event: TurnEndedEvent):
-        logger.info(f"Vogent turn ended on {event.session_id}")
+        logger.info(f"Vogent turn ended for {event.participant}")
```

🤖 Prompt for AI Agents
In plugins/vogent/tests/test_vogent_td.py around lines 44 to 49, the event
handlers incorrectly access event.session_id; change both handlers to use
event.participant instead (e.g., logger.info(f"Vogent turn started on
{event.participant}") and logger.info(f"Vogent turn ended on
{event.participant}")) so they reference the correct field defined on
TurnStartedEvent and TurnEndedEvent.
```toml
[tool.uv]
# Workspace-level override to resolve numpy version conflicts
# vogent-turn requires numpy<2.0, so we override to use 1.26.x
override-dependencies = [
    "numpy>=1.26.0,<2.0"
]
```
🧩 Analysis chain
Workspace-wide numpy override: verify blast radius or scope it.
The global override to numpy>=1.26,<2.0 may constrain unrelated plugins (e.g., CV libs) and cause resolution conflicts. Prefer scoping this to the minimal surface (plugin-level dependency) or confirm no other member requires numpy>=2.
Run to see who depends on numpy and potential conflicts:
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "Files referencing numpy constraints:"
rg -nP --glob '!**/.venv/**' 'numpy\s*[<>=]' -S
echo ""
echo "uv workspace members:"
rg -nP '^\s*"plugins/.*"|^\s*"agents-core"' pyproject.toml -n -C0
echo ""
echo "Check pyprojects that declare numpy>=2:"
fd -a pyproject.toml | xargs -I{} rg -nP 'numpy\s*>=\s*2' {}
```

Length of output: 1398
Workspace numpy override is incompatible with 6+ plugins—will cause dependency resolution failures.
The override numpy>=1.26.0,<2.0 forces numpy <2.0 workspace-wide, but plugins/wizper, plugins/kokoro, plugins/gemini, plugins/deepgram, plugins/ultralytics, and plugins/smart_turn all explicitly require numpy>=2.0+. This creates an unsolvable constraint conflict.
Fix options:
- Remove the workspace override and scope the vogent-turn constraint to its plugin only (if vogent is a plugin)
- Update all conflicting plugins to accept numpy<2.0 (likely breaking for audio/vision dependencies)
- Scope override only to agents-core/vogent-turn if that's the sole consumer
Verify which package actually requires numpy<2.0 and confine the override there rather than forcing it workspace-wide.
🤖 Prompt for AI Agents
In pyproject.toml around lines 26 to 31, the workspace-level override forcing
"numpy>=1.26.0,<2.0" conflicts with multiple plugins (wizper, kokoro, gemini,
deepgram, ultralytics, smart_turn) that require numpy>=2.0 and will break
dependency resolution; remove the global override and instead apply the
numpy<2.0 constraint only to the package that truly needs it (e.g.,
agents-core/vogent-turn) by moving that constraint into that package's pyproject
or its package-specific dependencies, or alternatively update the conflicting
plugins to accept the older numpy if feasible—verify which package requires
numpy<2.0 and scope the override to that package rather than workspace-wide.
```diff
-            format='s16',
-            layout='mono',
-            rate=target_rate
+            format=AudioFormat.S16, layout="mono", rate=target_rate
```
Verify AudioFormat enum member name in PyAV resampler.
The code uses AudioFormat.S16 as the format parameter for av.AudioResampler. However, according to the relevant snippets, the enum is defined with member PCM_S16 = "s16", not S16. This would cause an AttributeError at runtime.
Verify the correct enum member name. Based on the enum definition in agents-core/vision_agents/core/events/base.py, it should likely be:
```diff
     resampler = av.AudioResampler(
-        format=AudioFormat.S16, layout="mono", rate=target_rate
+        format="s16", layout="mono", rate=target_rate
     )
```

Note: PyAV's AudioResampler expects a string format name, not an enum. The AudioFormat enum appears to be for internal use in PcmData, not for PyAV.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        format=AudioFormat.S16, layout="mono", rate=target_rate
+        format="s16", layout="mono", rate=target_rate
```
🤖 Prompt for AI Agents
In tests/base_test.py around line 31, the resampler call uses AudioFormat.S16
which does not exist and will raise AttributeError; PyAV expects a string format
like "s16" (or the enum's value). Replace AudioFormat.S16 with the correct
member/value (e.g., AudioFormat.PCM_S16.value or the literal "s16") when passing
the format to av.AudioResampler, ensuring the import of AudioFormat remains
valid or use the string directly.
```python
def _decode_mia_to_pcm16k() -> PcmData:
    import av
    import os
    from conftest import get_assets_dir

    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
    container = av.open(audio_file_path)
    audio_stream = container.streams.audio[0]
    target_rate = 16000
    resampler = None
    if audio_stream.sample_rate != target_rate:
        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)

    samples_list = []
    for frame in container.decode(audio_stream):
        if resampler:
            frame = resampler.resample(frame)[0]
        arr = frame.to_ndarray()
        if arr.ndim > 1:
            arr = arr.mean(axis=0).astype(np.int16)
        samples_list.append(arr)
    container.close()
    if not samples_list:
        return PcmData(samples=np.zeros(0, dtype=np.int16), sample_rate=target_rate, format="s16")
    samples = np.concatenate(samples_list).astype(np.int16)
    return PcmData(samples=samples, sample_rate=target_rate, format="s16")


@pytest.mark.asyncio
async def test_base_vad_mia_detects_speech_segments():
    vad = EnergyVAD(
        sample_rate=16000,
        window_samples=512,
        energy_th=0.01,
        activation_th=0.5,
        deactivation_th=0.3,
        speech_pad_ms=200,
        min_speech_ms=100,
        max_speech_ms=30000,
        partial_frames=5,
    )

    pcm = _decode_mia_to_pcm16k()

    audio_events = []
    partial_events = []

    @vad.events.subscribe
    async def on_audio(ev: VADAudioEvent):
        audio_events.append(ev)

    @vad.events.subscribe
    async def on_partial(ev: VADPartialEvent):
        partial_events.append(ev)

    await vad.process_audio(pcm)
    await vad.flush()
    await asyncio.sleep(0.05)

    assert len(audio_events) > 0, "Expected at least one speech segment from mia"
    # Sanity: total bytes > 0 and duration present
    total_bytes = sum(len(ev.audio_data or b"") for ev in audio_events)
    assert total_bytes > 0
    assert any((ev.duration_ms or 0) > 0 for ev in audio_events)
```
🛠️ Refactor suggestion | 🟠 Major
Mark the mia audio test as an integration test.
The test at lines 121-155 decodes an MP3 file from disk using the av library, processes it through VAD, and validates speech detection. This involves file I/O and codec processing, making it an integration test.
Apply this diff:
```diff
+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_base_vad_mia_detects_speech_segments():
     vad = EnergyVAD(
```

Based on coding guidelines.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _decode_mia_to_pcm16k() -> PcmData:
    import av
    import os
    from conftest import get_assets_dir

    audio_file_path = os.path.join(get_assets_dir(), "mia.mp3")
    container = av.open(audio_file_path)
    audio_stream = container.streams.audio[0]
    target_rate = 16000
    resampler = None
    if audio_stream.sample_rate != target_rate:
        resampler = av.AudioResampler(format="s16", layout="mono", rate=target_rate)

    samples_list = []
    for frame in container.decode(audio_stream):
        if resampler:
            frame = resampler.resample(frame)[0]
        arr = frame.to_ndarray()
        if arr.ndim > 1:
            arr = arr.mean(axis=0).astype(np.int16)
        samples_list.append(arr)
    container.close()
    if not samples_list:
        return PcmData(samples=np.zeros(0, dtype=np.int16), sample_rate=target_rate, format="s16")
    samples = np.concatenate(samples_list).astype(np.int16)
    return PcmData(samples=samples, sample_rate=target_rate, format="s16")


@pytest.mark.integration
@pytest.mark.asyncio
async def test_base_vad_mia_detects_speech_segments():
    vad = EnergyVAD(
        sample_rate=16000,
        window_samples=512,
        energy_th=0.01,
        activation_th=0.5,
        deactivation_th=0.3,
        speech_pad_ms=200,
        min_speech_ms=100,
        max_speech_ms=30000,
        partial_frames=5,
    )

    pcm = _decode_mia_to_pcm16k()

    audio_events = []
    partial_events = []

    @vad.events.subscribe
    async def on_audio(ev: VADAudioEvent):
        audio_events.append(ev)

    @vad.events.subscribe
    async def on_partial(ev: VADPartialEvent):
        partial_events.append(ev)

    await vad.process_audio(pcm)
    await vad.flush()
    await asyncio.sleep(0.05)

    assert len(audio_events) > 0, "Expected at least one speech segment from mia"
    # Sanity: total bytes > 0 and duration present
    total_bytes = sum(len(ev.audio_data or b"") for ev in audio_events)
    assert total_bytes > 0
    assert any((ev.duration_ms or 0) > 0 for ev in audio_events)
```
🤖 Prompt for AI Agents
In tests/test_vad_base.py around lines 92 to 156, the mia audio test performs
file I/O and codec processing and should be marked as an integration test; add
the pytest integration marker by importing pytest if needed and decorating the
test function with @pytest.mark.integration (or set pytestmark =
pytest.mark.integration for the module) so the test is categorized as an
integration test per the coding guidelines.
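For reference, a minimal sketch of the two marking styles mentioned above; the test name and body are placeholders, not the real suite:

```python
import pytest

# Option A: mark every test in this module as integration
pytestmark = pytest.mark.integration


# Option B: mark a single test
@pytest.mark.integration
@pytest.mark.asyncio
async def test_mia_speech_segments():
    ...  # placeholder body; the real test decodes mia.mp3 and runs it through the VAD
```

Either form lets `pytest -m "not integration"` deselect the test in the unit-only CI job.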
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2)
299-314: stop(): async override + ensure deactivation and clean cancellation. Base TurnDetector.stop() is async and clears is_active. Current sync method neither awaits task cancellation nor deactivates the plugin.

Apply:

```diff
-    def stop(self) -> None:
-        # Cancel any running processing tasks
-        for task in self._processing_tasks.values():
-            if not task.done():
-                task.cancel()
-        self._processing_tasks.clear()
+    async def stop(self) -> None:
+        # Cancel any running processing tasks
+        pending = [t for t in self._processing_tasks.values() if not t.done()]
+        for t in pending:
+            t.cancel()
+        if pending:
+            # Drain cancellations
+            await asyncio.gather(*pending, return_exceptions=True)
+        self._processing_tasks.clear()
@@
-        self._current_speaker = None
-
-        self.logger.info("Vogent Turn detection stopped")
+        self._current_speaker = None
+        await super().stop()
+        self.logger.info("Vogent Turn detection stopped")
```
120-126: Make sample format check case-insensitive. Avoid rejecting valid formats due to capitalization differences.

Apply:

```diff
-        valid_formats = ["int16", "s16", "pcm_s16le"]
-        if audio_data.format not in valid_formats:
+        valid_formats = {"int16", "s16", "pcm_s16le"}
+        fmt = (audio_data.format or "").lower()
+        if fmt not in valid_formats:
             self.logger.error(
-                f"Invalid sample format: {audio_data.format}. Expected one of {valid_formats}."
+                f"Invalid sample format: {audio_data.format}. Expected one of {sorted(valid_formats)}."
             )
             return
```
♻️ Duplicate comments (5)
agents-core/vision_agents/core/agents/agents.py (1)
667-671: Unconditional await breaks synchronous start() implementations. The Vogent turn detector has a synchronous `start()` method while SmartTurn uses async. Awaiting unconditionally will raise `TypeError: object NoneType can't be used in 'await' expression` or similar when a detector returns None from a sync method. Apply this diff to handle both sync and async start():

```diff
     async def _setup_turn_detection(self):
-        if self.turn_detection:
-            self.logger.info("🎙️ Setting up turn detection listeners")
-            self.events.subscribe(self._on_turn_event)
-            await self.turn_detection.start()
+        if not self.turn_detection:
+            return
+        self.logger.info("🎙️ Setting up turn detection listeners")
+        self.events.subscribe(self._on_turn_event)
+
+        # Support both sync and async start() implementations
+        import inspect
+        result = self.turn_detection.start()
+        if inspect.iscoroutine(result):
+            await result
```

Alternatively, standardize the TurnDetector base class to require async start() and update Vogent's implementation to be async (even if it's just `async def start(self) -> None: pass`).

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (1)
36-38: start(): async override required and must activate the detector. Current override is a no-op and sync; base TurnDetector.start() is async and sets is_active=True. Processing is gated on is_active, so nothing runs.

Apply:

```diff
-    def start(self) -> None:
-        pass
+    async def start(self) -> None:
+        # Mark active and run any future warmups here
+        await super().start()
```

plugins/smart_turn/tests/test_smart_turn.py (1)
36-43: Mark external model download tests as integration. These perform network I/O; mark accordingly.

```diff
-    async def test_smart_turn_download(self):
+    @pytest.mark.integration
+    async def test_smart_turn_download(self):
@@
-    async def test_silero_download(self):
+    @pytest.mark.integration
+    async def test_silero_download(self):
```

Based on learnings
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2)
99-108: Honor base contract: mark detector active on startup. Call the base start after model prep to set is_active and run base hooks.

```diff
     async def start(self):
         # Ensure model directory exists
         os.makedirs(MODEL_BASE_DIR, exist_ok=True)

         # Prepare both models in parallel
         await asyncio.gather(
             self._prepare_smart_turn(),
             self._prepare_silero_vad(),
         )
+        # Signal readiness
+        await super().start()
```

154-163: Use model signal to decide end-of-turn and respect confidence threshold. You compute prediction but ignore it; end condition relies only on silence. Combine both and use self._confidence_threshold.

```diff
-        # Process all collected segments
-        for segment in segments_to_process:
-            prediction = await self._predict_turn_completed(segment, participant)
-            turn_completed = prediction > 0.5
-
-        # End turn if we're in a turn and collector is done (silence detected)
-        if self.turn_in_progress and not self.collector.is_collecting:
-            self._emit_end_turn_event(TurnEndedEvent(participant=participant))
-            self.turn_in_progress = False
+        # Aggregate model signal across segments
+        model_says_done = False
+        end_confidence: Optional[float] = None
+        for segment in segments_to_process:
+            prob = await self._predict_turn_completed(segment, participant)
+            if prob >= self._confidence_threshold:
+                model_says_done = True
+                end_confidence = prob
+                break
+
+        # End turn if we're in a turn and either silence or the model indicates completion
+        if self.turn_in_progress and (model_says_done or not self.collector.is_collecting):
+            self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=end_confidence))
+            self.turn_in_progress = False
```
🧹 Nitpick comments (12)
plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (5)
210-219: Use get_running_loop() inside coroutines. get_event_loop() is legacy in async contexts.

Apply:

```diff
-        loop = asyncio.get_event_loop()
+        loop = asyncio.get_running_loop()
```
192-208: Comment says “sorted by timestamp” but code isn’t sorting. Either sort explicitly (if Message exposes a timestamp) or adjust the comment.

Example if Message has created_at:

```diff
-        # Get messages for this user, sorted by timestamp
-        user_messages = [
-            m
-            for m in conversation.messages
-            if m.user_id == user_id and m.content
-        ]
+        # Get messages for this user (newest last)
+        user_messages = [m for m in conversation.messages if m.user_id == user_id and m.content]
+        # Optionally enforce by timestamp:
+        # user_messages.sort(key=lambda m: m.created_at)
```
75-83: Unused state fields. _user_last_audio is written but never read; _infer_channels is unused. Consider removing or using them for buffer GC/format inference.

68-74: Defer heavy model init to start() (optional). Constructing/compiling and warmup in init can block. Consider lazy init in start().

100-113: Backpressure: large buffers may pile up per user. If prediction is slower than ingest, buffers grow without bounds. Consider a max buffer per user and drop/merge strategy or a queue with size limits.

Also applies to: 160-176
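A minimal sketch of one way to cap the per-user buffer described in the backpressure note above; the class, attribute names, and cap are illustrative, not taken from the plugin:

```python
from collections import defaultdict, deque

MAX_BUFFERED_CHUNKS = 50  # illustrative cap; tune to ingest rate vs. inference latency


class BoundedAudioBuffers:
    """Per-user audio buffers that evict the oldest chunks once a cap is hit."""

    def __init__(self, max_chunks: int = MAX_BUFFERED_CHUNKS):
        self._buffers = defaultdict(lambda: deque(maxlen=max_chunks))

    def append(self, user_id: str, chunk) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full,
        # so a slow predictor can no longer grow memory without bound
        self._buffers[user_id].append(chunk)

    def drain(self, user_id: str) -> list:
        buf = self._buffers[user_id]
        chunks = list(buf)
        buf.clear()
        return chunks
```

An `asyncio.Queue(maxsize=...)` per user is the alternative if applying backpressure upstream is preferable to dropping old audio.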
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (5)
124-133: Guard processing when inactive. Avoid work before startup or after stop.

```diff
     async def process_audio(
         self,
         audio_data: PcmData,
         participant: Participant,
         conversation: Optional[Conversation],
     ) -> None:
+        if not self.is_active:
+            logger.debug("process_audio called while inactive; ignoring.")
+            return
         # Ensure audio is in the right format: 16kHz, float32
```
111-114: Initialize WhisperFeatureExtractor via from_pretrained for correct config. Direct ctor may miss expected normalization/mel settings. Prefer the pretrained config used by Smart-Turn (Whisper Tiny).

```diff
-        self._whisper_extractor = await asyncio.to_thread(WhisperFeatureExtractor, chunk_length=8)
+        # Load feature extractor with proper config
+        self._whisper_extractor = await asyncio.to_thread(
+            WhisperFeatureExtractor.from_pretrained, "openai/whisper-tiny"
+        )
```
326-331: Specify providers and add a brief docstring for build_session. Ensures deterministic CPU execution and documents behavior.

```diff
-def build_session(onnx_path):
-    so = ort.SessionOptions()
+def build_session(onnx_path: str):
+    """Build an ONNXRuntime session (CPU, optimized)."""
+    so = ort.SessionOptions()
     so.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
     so.inter_op_num_threads = 1
     so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
-    return ort.InferenceSession(onnx_path, sess_options=so)
+    return ort.InferenceSession(
+        onnx_path,
+        sess_options=so,
+        providers=["CPUExecutionProvider"],
+    )
```
42-60: Class doc: tiny nits. “only look at audio features” → “only looks at audio features.” Consider minor grammar cleanups while you’re in here. No functional impact.
164-206: Minor: remove unused parameter or use it. participant in _predict_turn_completed is unused. Either drop it and update callers, or use it for logging/telemetry. Defer if you plan to use it soon.
plugins/smart_turn/tests/test_smart_turn.py (2)
1-7: Import AudioFormat for consistent PCM construction.

```diff
 from getstream.video.rtc.track_util import PcmData
+from getstream.video.rtc.track_util import AudioFormat
```
48-60: Use enum format and robust chunking; add assertions. Avoid frombuffer reinterpretation; use provided utilities.

```diff
-    def test_silero_predict(self, mia_audio_16khz):
+    def test_silero_predict(self, mia_audio_16khz):
         vad = SileroVAD(SILERO_ONNX_PATH)
-        # TODO: chunk in 512
-        chunk = mia_audio_16khz.samples[:512]
-        iteration = 0
-        for i in range(100):
-            chunk = mia_audio_16khz.samples[i*512:(i+1)*512]
-            int16 = np.frombuffer(chunk, dtype=np.int16)
-            f32 = (int16.astype(np.float32)) / 32768.0
-            # 16khz, float32 on 512 chunk size
-            pcm_chunk = PcmData(samples=f32, format="f32", sample_rate=16000)
-            result = vad.prob(pcm_chunk.samples)
-            print(result)
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        seen = 0
+        for chunk in pcm.chunks(chunk_size=512):
+            prob = vad.prob(chunk.samples)
+            assert 0.0 <= prob <= 1.0
+            seen += 1
+        assert seen > 0
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
- agents-core/vision_agents/core/agents/agents.py (8 hunks)
- docs/ai/instructions/ai-turn-detector.md (1 hunks)
- plugins/smart_turn/pyproject.toml (1 hunks)
- plugins/smart_turn/tests/test_smart_turn.py (1 hunks)
- plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (1 hunks)
- plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/smart_turn/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
- plugins/smart_turn/tests/test_smart_turn.py
- plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py
- agents-core/vision_agents/core/agents/agents.py
🧠 Learnings (1)
📚 Learning: 2025-10-20T19:23:41.259Z
Learnt from: CR
PR: GetStream/Vision-Agents#0
File: .cursor/rules/python.mdc:0-0
Timestamp: 2025-10-20T19:23:41.259Z
Learning: Applies to tests/**/*.py : Mark integration tests with pytest.mark.integration
Applied to files:
plugins/smart_turn/tests/test_smart_turn.py
🧬 Code graph analysis (4)
plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (3)
- agents-core/vision_agents/core/agents/conversation.py (1): Conversation (67-227)
- agents-core/vision_agents/core/turn_detection/turn_detection.py (5): TurnDetector (22-74), start (68-70), process_audio (52-64), _emit_start_turn_event (37-42), _emit_end_turn_event (44-49)
- agents-core/vision_agents/core/turn_detection/events.py (2): TurnStartedEvent (11-25), TurnEndedEvent (29-43)

plugins/smart_turn/tests/test_smart_turn.py (3)
- agents-core/vision_agents/core/agents/conversation.py (1): InMemoryConversation (230-237)
- agents-core/vision_agents/core/turn_detection/events.py (2): TurnStartedEvent (11-25), TurnEndedEvent (29-43)
- plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (6): ensure_model (269-310), SileroVAD (209-266), SmartTurnDetection (43-206), start (99-107), prob (241-266), process_audio (124-162)

plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3)
- agents-core/vision_agents/core/turn_detection/turn_detection.py (2): TurnDetector (22-74), TurnEvent (14-18)
- agents-core/vision_agents/core/agents/conversation.py (2): Conversation (67-227), clear (49-52)
- plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2): start (99-107), process_audio (124-162)

agents-core/vision_agents/core/agents/agents.py (3)
- plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py (2): start (99-107), process_audio (124-162)
- plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (2): start (36-37), process_audio (100-158)
- agents-core/vision_agents/core/turn_detection/turn_detection.py (2): start (68-70), process_audio (52-64)
🔇 Additional comments (4)
agents-core/vision_agents/core/agents/agents.py (3)
126-132: LGTM! The options parameter handling correctly provides defaults and merges user-supplied options appropriately.

372-372: Proper async setup. Awaiting the turn detection setup before establishing the connection is the correct approach.

688-688: LGTM! Passing the full participant and conversation context to process_audio() correctly aligns with the updated turn detection API and enables richer decision-making.

plugins/smart_turn/tests/test_smart_turn.py (1)

44-47: Fix: async test and valid API call with assertions. predict_endpoint doesn't exist; use the internal predictor for now.

```diff
-    def test_smart_turn_predict(self, td, mia_audio_16khz):
-        result = td.predict_endpoint(mia_audio_16khz)
-        print(result)
+    @pytest.mark.asyncio
+    async def test_smart_turn_predict(self, td, mia_audio_16khz):
+        pcm = mia_audio_16khz.resample(16000).to_float32()
+        prob = await td._predict_turn_completed(pcm, Participant(user_id="mia", original={}))
+        assert 0.0 <= prob <= 1.0
```
```python
from ..vad.events import VADAudioEvent
from . import events
from .conversation import Conversation
from dataclasses import dataclass
```
Remove duplicate import.
dataclass is already imported at line 6. This duplicate import is unnecessary.
Apply this diff:
```diff
-from dataclasses import dataclass
-
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around line 42, there is a
duplicate import "from dataclasses import dataclass" which is already imported
at line 6; remove the redundant import statement at line 42 (delete that line)
so the file has a single dataclass import and run linters/tests to confirm no
further import duplication.
```python
@dataclass
class AgentOptions:
    model_dir: str

    def update(self, other: "AgentOptions") -> "AgentOptions":
        merged_dict = asdict(self)

        for key, value in asdict(other).items():
            if value is not None:
                merged_dict[key] = value

        return AgentOptions(**merged_dict)


def default_agent_options():
    return AgentOptions(
        model_dir=tempfile.gettempdir()
    )
```
🛠️ Refactor suggestion | 🟠 Major
Add Google-style docstrings for public API.
The new AgentOptions dataclass, its update() method, and the default_agent_options() factory function lack docstrings.
As per coding guidelines.
Apply this diff:
@dataclass
class AgentOptions:
+ """Configuration options for Agent initialization.
+
+ Attributes:
+ model_dir: Directory path for storing downloaded models and temporary files.
+ """
model_dir: str
def update(self, other: "AgentOptions") -> "AgentOptions":
+ """Merge options from another AgentOptions instance.
+
+ Args:
+ other: Another AgentOptions instance whose non-None values will override this instance's values.
+
+ Returns:
+ A new AgentOptions instance with merged values.
+ """
merged_dict = asdict(self)
for key, value in asdict(other).items():
@@ -70,6 +80,11 @@
return AgentOptions(**merged_dict)
def default_agent_options():
+ """Create default agent options.
+
+ Returns:
+ AgentOptions instance with model_dir set to the system's temporary directory.
+ """
return AgentOptions(
model_dir=tempfile.gettempdir()
    )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
@dataclass
class AgentOptions:
    """Configuration options for Agent initialization.

    Attributes:
        model_dir: Directory path for storing downloaded models and temporary files.
    """
    model_dir: str

    def update(self, other: "AgentOptions") -> "AgentOptions":
        """Merge options from another AgentOptions instance.

        Args:
            other: Another AgentOptions instance whose non-None values will override this instance's values.

        Returns:
            A new AgentOptions instance with merged values.
        """
        merged_dict = asdict(self)

        for key, value in asdict(other).items():
            if value is not None:
                merged_dict[key] = value

        return AgentOptions(**merged_dict)


def default_agent_options():
    """Create default agent options.

    Returns:
        AgentOptions instance with model_dir set to the system's temporary directory.
    """
    return AgentOptions(
        model_dir=tempfile.gettempdir()
    )
```
🤖 Prompt for AI Agents
In agents-core/vision_agents/core/agents/agents.py around lines 59 to 75, the
AgentOptions dataclass, its update() method, and the default_agent_options()
factory function are missing Google-style docstrings; add concise Google-style
docstrings for the class (one-line summary plus Args describing model_dir), for
update() (one-line summary, Args: other: AgentOptions, Returns: AgentOptions,
and a note that None fields are ignored), and for default_agent_options()
(one-line summary and Returns: AgentOptions with default model_dir set to
tempfile.gettempdir()).
```python
class MyTurnDetector(TurnDetector):
    async def process_audio(
        self,
        audio_data: PcmData,
        participant: Participant,
        conversation: Optional[Conversation],
    ) -> None:

        self._emit_start_turn_event(TurnStartedEvent(participant=participant))
        self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))

    def start(self):
        super().start()
        # Any custom model loading/ or other heavy prep steps go here

    def stop(self):
        super().stop()
        # cleanup time. start and stop are optional
```
🛠️ Refactor suggestion | 🟠 Major
Fix example: add imports and make start/stop async (await super()).
Current snippet omits required imports and uses sync start/stop, which violates the async base contract. Apply:
```python
+from typing import Optional
+from getstream.video.rtc.track_util import PcmData
+from vision_agents.core.edge.types import Participant, Conversation
+from vision_agents.core.turn_detection import (
+ TurnDetector,
+ TurnStartedEvent,
+ TurnEndedEvent,
+)
class MyTurnDetector(TurnDetector):
async def process_audio(
self,
- audio_data: PcmData,
+ audio_data: PcmData,
participant: Participant,
conversation: Optional[Conversation],
) -> None:
-
self._emit_start_turn_event(TurnStartedEvent(participant=participant))
self._emit_end_turn_event(TurnEndedEvent(participant=participant, confidence=0.7))
- def start(self):
- super().start()
+ async def start(self):
+ await super().start()
# Any custom model loading/ or other heavy prep steps go here
-
- def stop(self):
- super().stop()
+
+ async def stop(self):
+ await super().stop()
         # cleanup time. start and stop are optional
```
In docs/ai/instructions/ai-turn-detector.md around lines 5 to 26, the example
omits required imports and defines start/stop as synchronous methods which
conflict with the async base class; add the listed imports (typing.Optional,
PcmData, Participant, Conversation, TurnDetector, TurnStartedEvent,
TurnEndedEvent) at the top of the snippet, change start() and stop() to async
def start/stop and call await super().start() and await super().stop(),
respectively, and keep process_audio signature as shown.
```markdown
### Testing turn detection

An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`
```
Correct the test file path in docs.
Use the actual repo path:
-An example test suite for turn detection can be found in `smart_turn/tests/test_smart_turn.py`
+An example test suite for turn detection can be found in `plugins/smart_turn/tests/test_smart_turn.py`📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
### Testing turn detection

An example test suite for turn detection can be found in `plugins/smart_turn/tests/test_smart_turn.py`
```
🤖 Prompt for AI Agents
In docs/ai/instructions/ai-turn-detector.md around lines 28 to 30, the example
test suite path is incorrect; update the referenced test file path to the actual
repository location (replace `smart_turn/tests/test_smart_turn.py` with the
correct path used in the repo) so the docs point to the real test file.
```python
async def ensure_model(path: str, url: str) -> str:
    """
    Download a model file asynchronously if it doesn't exist.

    Args:
        path: Local path where the model should be saved
        url: URL to download the model from

    Returns:
        The path to the model file
    """
    if not os.path.exists(path):
        model_name = os.path.basename(path)
        logger.info(f"Downloading {model_name}...")

        try:
            async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
                async with client.stream("GET", url) as response:
                    response.raise_for_status()

                    # Write file in chunks to avoid loading entire file in memory
                    # Use asyncio.to_thread for blocking file I/O operations
                    chunks = []
                    async for chunk in response.aiter_bytes(chunk_size=8192):
                        chunks.append(chunk)

                    # Write all chunks to file in thread to avoid blocking event loop
                    def write_file():
                        with open(path, "wb") as f:
                            for chunk in chunks:
                                f.write(chunk)

                    await asyncio.to_thread(write_file)

                    logger.info(f"{model_name} downloaded.")
        except httpx.HTTPError as e:
            # Clean up partial download on error
            if os.path.exists(path):
                os.remove(path)
            raise RuntimeError(f"Failed to download {model_name}: {e}")

    return path
```
🧩 Analysis chain
Stream model download to a temp file, add atomic rename and optional SHA256.
Avoid buffering entire files in RAM and ensure integrity. Also guard concurrent downloads per path.
-async def ensure_model(path: str, url: str) -> str:
+async def ensure_model(path: str, url: str, sha256: Optional[str] = None) -> str:
@@
- if not os.path.exists(path):
+ if not os.path.exists(path):
model_name = os.path.basename(path)
logger.info(f"Downloading {model_name}...")
-
- try:
- async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
- async with client.stream("GET", url) as response:
- response.raise_for_status()
-
- # Write file in chunks to avoid loading entire file in memory
- # Use asyncio.to_thread for blocking file I/O operations
- chunks = []
- async for chunk in response.aiter_bytes(chunk_size=8192):
- chunks.append(chunk)
-
- # Write all chunks to file in thread to avoid blocking event loop
- def write_file():
- with open(path, "wb") as f:
- for chunk in chunks:
- f.write(chunk)
-
- await asyncio.to_thread(write_file)
-
- logger.info(f"{model_name} downloaded.")
- except httpx.HTTPError as e:
- # Clean up partial download on error
- if os.path.exists(path):
- os.remove(path)
- raise RuntimeError(f"Failed to download {model_name}: {e}")
+ try:
+ # Per-path async lock to avoid concurrent clobber
+ lock = _download_locks.setdefault(path, asyncio.Lock())
+ async with lock:
+ if os.path.exists(path):
+ return path
+ tmp_path = f"{path}.tmp"
+ hasher = hashlib.sha256() if sha256 else None
+ async with httpx.AsyncClient(timeout=300.0, follow_redirects=True) as client:
+ async with client.stream("GET", url) as response:
+ response.raise_for_status()
+ # Open once; write each chunk in a worker thread
+ f = open(tmp_path, "wb")
+ try:
+ async for chunk in response.aiter_bytes(chunk_size=1 << 14):
+ if hasher:
+ hasher.update(chunk)
+ await asyncio.to_thread(f.write, chunk)
+ finally:
+ await asyncio.to_thread(f.close)
+ # Verify integrity if provided
+ if hasher and hasher.hexdigest().lower() != sha256.lower():
+ os.remove(tmp_path)
+ raise RuntimeError(f"Checksum mismatch for {model_name}")
+ # Atomic rename
+ os.replace(tmp_path, path)
+ logger.info(f"{model_name} downloaded.")
+ except Exception as e:
+ # Clean up partial download on error
+ if os.path.exists(f"{path}.tmp"):
+ os.remove(f"{path}.tmp")
+ raise RuntimeError(f"Failed to download {model_name}: {e}")
@@
     return path

Add these near the imports/top of file:
+import hashlib
+from typing import Optional, Dict
+_download_locks: Dict[str, asyncio.Lock] = {}

I'll verify this review comment by examining the codebase and the proposed changes.
Once I locate the file, I'll inspect the current code at lines 269-311 and verify:
- Whether the current code matches the "final state" snippet shown
- Whether the suggested imports are already present
- Whether the proposed changes address legitimate concerns (temp file streaming, atomic rename, SHA256, locks)
- Any Python/asyncio compatibility issues
Let me search for the file:
Then I need to:
- Verify the file contents match the snippet
- Check current import statements
- Validate the proposed refactoring suggestions
- Ensure no conflicts with existing code
Could you confirm the repository is available, or should I proceed with generating verification scripts that will clone the repo first?
🏁 Script executed:
#!/bin/bash
# Step 1: Locate the file
fd -p "turn_detection_2.py" -t f

Length of output: 71
🏁 Script executed:
#!/bin/bash
# Step 2: Verify file exists and inspect the target function
find . -name "turn_detection_2.py" -type f 2>/dev/null | head -5

Length of output: 73
🏁 Script executed:
#!/bin/bash
# Search for the turn_detection_2.py file
find . -path "*smart_turn/turn_detection_2.py" -type f 2>/dev/null

Length of output: 73
🏁 Script executed:
#!/bin/bash
# Inspect lines 269-311 to see current implementation
sed -n '269,311p' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1677
🏁 Script executed:
#!/bin/bash
# Check imports at top of file (first 50 lines)
head -50 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 1622
🏁 Script executed:
#!/bin/bash
# Search for _download_locks, asyncio.Lock, or similar patterns in the file
grep -n "_download_locks\|asyncio.Lock\|tmp_path\|atomic" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check if hashlib is already imported
grep -n "^import hashlib\|^from hashlib" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.pyLength of output: 0
🏁 Script executed:
#!/bin/bash
# Check for all calls to ensure_model to verify backward compatibility
rg -n 'ensure_model\s*\(' ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py -A2 -B2

Length of output: 689
🏁 Script executed:
#!/bin/bash
# Verify file totality - check end of file to understand module structure
tail -20 ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py

Length of output: 763
🏁 Script executed:
#!/bin/bash
# Check if Dict is imported anywhere in typing imports
grep -n "from typing import" ./plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.pyLength of output: 29
Implement streaming download with atomic rename, checksum verification, and concurrency protection.
The current implementation buffers entire model files in memory and lacks protection against concurrent downloads of the same model, creating both RAM pressure and race conditions. The proposed changes address these issues legitimately:
- Temp file + atomic rename: Prevents partial file corruption if download fails midway
- SHA256 verification: Ensures model integrity (critical for ML model reproducibility)
- Per-path asyncio.Lock: Guards against concurrent download races for the same model path
- Direct streaming write: Avoids buffering entire file, reducing memory footprint for large ONNX models
Add the required imports at the top of the file:
import hashlib
from typing import Optional, Dict

Add this module-level state near the imports:

_download_locks: Dict[str, asyncio.Lock] = {}

Then implement the refactored ensure_model function as shown in the diff. The optional sha256 parameter is backward compatible with existing callers at lines 110 and 116.
🤖 Prompt for AI Agents
In plugins/smart_turn/vision_agents/plugins/smart_turn/turn_detection_2.py
around lines 269 to 311, replace the current buffered-download implementation
with a streaming, atomic, checksum-verified, and concurrency-safe downloader:
add imports hashlib and typing Optional/Dict; add module-level _download_locks:
Dict[str, asyncio.Lock] = {}; change ensure_model signature to accept optional
sha256: Optional[str] = None; acquire/await a per-path asyncio.Lock from
_download_locks to prevent concurrent downloads; stream
response.aiter_bytes(chunk_size=8192) and write chunks directly to a temporary
file (e.g., path + ".tmp") using asyncio.to_thread for blocking file writes to
avoid buffering whole file in memory; compute SHA256 incrementally while writing
and if sha256 is provided, compare it after download and raise on mismatch; on
successful completion atomically rename the temp file to the target path
(os.replace); ensure cleanup of the temp file and release the lock on any
exception and wrap HTTP errors with a clear runtime error including the original
exception.
```python
# Resample to 16 kHz mono
samples = audio_data.resample(16_000, 1).samples
```
Align resample rate with buffer math
You hardcode 16 kHz here but compute required_bytes using self.sample_rate. Keep them consistent.
Apply:
- # Resample to 16 kHz mono
- samples = audio_data.resample(16_000, 1).samples
+ # Resample to target sample rate (mono)
+ samples = audio_data.resample(self.sample_rate, 1).samples📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Resample to target sample rate (mono)
samples = audio_data.resample(self.sample_rate, 1).samples
```
🤖 Prompt for AI Agents
In plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py around
lines 135 to 137, the code resamples audio to a hardcoded 16_000 Hz while buffer
size/required_bytes are computed using self.sample_rate; replace the hardcoded
16_000 with self.sample_rate (ensuring self.sample_rate is an int/Hz value) so
the resample rate and buffer math are consistent, i.e. call
audio_data.resample(self.sample_rate, 1).samples and update any related
constants/variables to use self.sample_rate.
Actionable comments posted: 0
♻️ Duplicate comments (1)
plugins/vogent/tests/test_vogent_td.py (1)
36-42: Fix incorrect event field access (previously flagged). The event handlers attempt to access event.session_id, but TurnStartedEvent and TurnEndedEvent have a participant field, not session_id. Apply this diff to fix the field access:

```diff
 @td.events.subscribe
 async def on_start(event: TurnStartedEvent):
-    logger.info(f"Vogent turn started on {event.session_id}")
+    logger.info(f"Vogent turn started for {event.participant}")
     event_order.append("start")

 @td.events.subscribe
 async def on_stop(event: TurnEndedEvent):
-    logger.info(f"Vogent turn ended on {event.session_id}")
+    logger.info(f"Vogent turn ended for {event.participant}")
     event_order.append("stop")
```
🧹 Nitpick comments (2)
plugins/vogent/tests/test_vogent_td.py (2)
45-50: Replace arbitrary sleeps with purpose-built wait method. Instead of using arbitrary sleep durations, leverage the wait_for_processing_complete method available on VogentTurnDetection for more robust synchronization. Apply this diff:

```diff
 await td.process_audio(mia_audio_16khz, participant, conversation)
 await td.process_audio(silence_2s_48khz, participant, conversation)
-await asyncio.sleep(0.001)
-
-await asyncio.sleep(5)
+# Wait for all queued audio to be processed
+await td.wait_for_processing_complete(timeout=5.0)
```
53-53: Consider more precise assertions. The current assertion allows two different event sequences. If the behavior is deterministic given the test inputs, consider asserting the exact expected sequence. If non-deterministic, document why both outcomes are valid.
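If the sequence is deterministic, a stricter assertion could look like the sketch below; the expected ordering is an assumption about the mia-then-silence input, not something verified here:

```python
# One spoken segment followed by silence should yield exactly one start/stop pair
assert event_order == ["start", "stop"], f"Unexpected event order: {event_order}"
```

Otherwise, a short comment in the test explaining which orderings are acceptable keeps the looser assertion honest.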
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
plugins/vogent/tests/test_vogent_td.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
plugins/vogent/tests/test_vogent_td.py
🧬 Code graph analysis (1)
plugins/vogent/tests/test_vogent_td.py (4)
- agents-core/vision_agents/core/agents/conversation.py (1): InMemoryConversation (230-237)
- agents-core/vision_agents/core/turn_detection/events.py (2): TurnEndedEvent (29-45), TurnStartedEvent (11-25)
- plugins/vogent/vision_agents/plugins/vogent/vogent_turn_detection.py (3): start (113-129), stop (329-340), process_audio (165-176)
- conftest.py (2): mia_audio_16khz (95-134), silence_2s_48khz (181-195)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (3)
plugins/vogent/tests/test_vogent_td.py (3)
1-15: LGTM! Clean imports and module structure. The imports are well-organized and include all necessary dependencies for testing Vogent turn detection.

19-26: LGTM! Proper async fixture with cleanup. The try/finally pattern ensures proper cleanup of the VogentTurnDetection instance even if the test fails.

28-28: Verify re-enablement plan for skipped test. Since the PR is marked as WIP and this test is temporarily disabled, ensure there's a plan to enable it before merging to main.
Summary by CodeRabbit
New Features
Refactor
Removals
Documentation