
Conversation

@dangusev (Contributor) commented Nov 7, 2025

What's changed

  • New Features

    • OpenAI Chat Completions LLM and Vision LLM plugins for Chat Completions API integration.
      • openai.ChatCompletionsVLM buffers video frames, converts them to JPEG, and streams responses via an OpenAI-compatible client.
    • Qwen3-VL video understanding example
    • Public set_conversation() method for improved conversation handling
    • Video frame JPEG encoding utility
  • Documentation

    • Qwen3-VL example documentation with setup, configuration, and troubleshooting guide

Tests:

  • Update Anthropic, AWS, Gemini, and OpenRouter tests to use set_conversation instead of direct _conversation assignment.
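
For illustration, a minimal sketch of that test migration; the fixture name and the InMemoryConversation arguments are assumptions, not code copied from the suites:

```python
from vision_agents.core.agents.conversation import InMemoryConversation

def test_uses_public_conversation_setter(llm):  # `llm`: plugin LLM fixture (assumed)
    conversation = InMemoryConversation()  # constructor arguments assumed

    # Before: llm._conversation = conversation  (direct private assignment)
    # After: the public setter introduced by this PR.
    llm.set_conversation(conversation)
```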

Summary by CodeRabbit

  • New Features

    • Added OpenAI ChatCompletions support (text and vision) and Qwen3‑VL example with streaming video.
    • New JPEG frame encoder utility for optimized video-to-JPEG conversion.
    • Public API to attach conversation state to LLMs (set conversation via setter).
  • Refactor

    • Tests and internal wiring updated to use the public conversation setter instead of direct private assignment.
  • Documentation

    • Added Qwen3‑VL integration guide and usage examples.
  • Tests

    • Expanded ChatCompletions tests; added streaming and video test stubs.

@coderabbitai bot commented Nov 7, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Moves conversation and instruction state into LLM instance fields; adds a public set_conversation(conversation) API and wires Agent.join to call it; replaces tests' direct _conversation assignments with the setter; adds OpenAI ChatCompletions LLM/VLM plugins (streaming + vision); adds a frame_to_jpeg_bytes utility; and introduces an example and tests for the ChatCompletions VLM.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| **LLM Core API**<br>`agents-core/vision_agents/core/llm/llm.py` | Import `Instructions`; move conversation/instruction state to instance fields (`instructions`, `parsed_instructions`, `_conversation`); add `set_conversation(self, conversation)`; update `_set_instructions()` to parse and store instructions. |
| **Agent Integration**<br>`agents-core/vision_agents/core/agents/agents.py` | After creating the conversation in the join flow, call `self.llm.set_conversation(self.conversation)`. |
| **Plugin Tests (API usage)**<br>`plugins/anthropic/tests/test_anthropic_llm.py`, `plugins/aws/tests/test_aws.py`, `plugins/gemini/tests/test_gemini_llm.py`, `plugins/openrouter/tests/test_openrouter_llm.py` | Replace direct `llm._conversation = ...` assignments with `llm.set_conversation(...)`; minor formatting tweaks. |
| **Video Utilities**<br>`agents-core/vision_agents/core/utils/video_utils.py` | Add `frame_to_jpeg_bytes(frame, target_width, target_height, quality=85)` to convert/rescale an `av.VideoFrame` to optimized JPEG bytes (uses PIL LANCZOS resampling). |
| **OpenAI Chat Completions LLM**<br>`plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py`, `plugins/openai/vision_agents/plugins/openai/__init__.py` | Add `ChatCompletionsLLM` (streaming chat-completions integration) and export it via the plugin `__all__`. |
| **OpenAI Chat Completions VLM**<br>`plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py`, `plugins/openai/vision_agents/plugins/openai/__init__.py` | Add `ChatCompletionsVLM` with frame buffering, JPEG encoding, video forwarder handling, and streaming response events; export via the plugin `__all__`. |
| **OpenAI Tests**<br>`plugins/openai/tests/test_chat_completions.py` | Add tests and stubs (`AsyncStreamStub`, `VideoStreamTrackStub`) for `ChatCompletionsLLM` and `ChatCompletionsVLM` covering streaming success and failure. |
| **OpenAI Example (Qwen3-VL)**<br>`plugins/openai/examples/qwen_vl_example/*` | Add example project (README, pyproject.toml, example script) demonstrating Qwen3-VL Baseten integration with streaming video processing. |
| **Docs**<br>`plugins/openai/README.md` | Rebrand/readjust examples to reference Vision Agents and add usage for ChatCompletions LLM/VLM and Realtime modes. |
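
To make the new surface concrete, here is a minimal construction sketch of ChatCompletionsVLM using the parameters documented in the Qwen3-VL example README below; the model id and base URL are placeholders, not defaults:

```python
import asyncio

from vision_agents.plugins.openai import ChatCompletionsVLM

# Placeholders: any OpenAI-compatible endpoint and vision-capable model id.
vlm = ChatCompletionsVLM(
    model="qwen3-vl",                   # placeholder model name
    api_key="your_api_key",             # README: falls back to OPENAI_API_KEY
    base_url="https://example.com/v1",  # README: falls back to OPENAI_BASE_URL
    fps=1,                              # frames per second sent to the model
    frame_buffer_seconds=10,            # seconds of video kept buffered
)

async def main() -> None:
    # Streams the answer, emitting LLMResponseChunkEvent per chunk and a
    # final LLMResponseCompletedEvent; buffered frames ride along as JPEG.
    await vlm.simple_response("Describe what you currently see")

asyncio.run(main())
```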

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent
    participant LLM
    participant Conversation
    participant ExternalModel

    Agent->>Agent: create/join call
    Agent->>Conversation: create conversation
    Conversation-->>Agent: conversation

    Note over Agent,LLM: Provide conversation via public API
    Agent->>LLM: set_conversation(conversation)
    LLM-->>LLM: store conversation & parsed instructions

    Agent->>LLM: simple_response(text) / VLM request
    LLM->>Conversation: read history (if present)
    alt VLM includes frames
        LLM->>LLM: _get_frames_bytes -> JPEG/base64 frames
        LLM->>ExternalModel: stream/request (includes frames)
    else LLM only
        LLM->>ExternalModel: stream/request
    end
    ExternalModel-->>LLM: streaming chunks / final
    LLM->>Agent: emit LLMResponseChunkEvent / LLMResponseCompletedEvent
```
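
In code, the wiring shown in the diagram reduces to a single call in the join flow; the sketch below paraphrases the change summary above, and the conversation-creation line is schematic:

```python
# Inside Agent.join, once the conversation has been created (schematic):
self.conversation = conversation  # produced earlier in the join flow
self.llm.set_conversation(self.conversation)  # public API replaces private access
```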

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas to inspect closely:

  • Agent join wiring and potential timing/race conditions when calling set_conversation.
  • ChatCompletionsVLM: frame buffering, frame resizing/encoding (frame_to_jpeg_bytes), and FPS/window calculations.
  • LLM instruction lifecycle: parsing, None-safety for parsed_instructions, and backward compatibility with existing LLM implementations.
  • New tests/stubs: ensure AsyncStreamStub and VideoStreamTrackStub behavior matches production streaming semantics.

Possibly related PRs

Suggested labels

plugin-openai, tests, core-agents

Suggested reviewers

  • maxkahan
  • yarikdevcom
  • d3xvn

Poem

The instructions hang like knives within the throat,
a hush of frames rescaled, sharpened, swallowed whole.
I set the conversation — an organ of thin light —
and watch the model murmur back in stitched small mouths.
A quiet harvest of words, compressed and kept.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.55%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main addition of the PR: OpenAI Chat Completions support for OSS models, which aligns with the primary changes across the changeset. |
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/baseten

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 290849e and 16c53c8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • .env.example (1 hunks)
  • agents-core/vision_agents/core/llm/llm.py (2 hunks)
  • plugins/baseten/README.md (1 hunks)
  • plugins/baseten/pyproject.toml (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/__init__.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/events.py (1 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/baseten/vision_agents/plugins/baseten/__init__.py
  • plugins/baseten/vision_agents/plugins/baseten/events.py
  • agents-core/vision_agents/core/llm/llm.py
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (4)
plugins/baseten/vision_agents/plugins/baseten/__init__.py (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
  • BasetenVLM (32-274)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
agents-core/vision_agents/core/events/base.py (1)
  • PluginBaseEvent (52-54)
agents-core/vision_agents/core/llm/llm.py (1)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/llm.py (3)
  • LLMResponseEvent (38-42)
  • VideoLLM (443-464)
  • _conversation (83-86)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"

@coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)

89-158: Critical: User text is never sent to the model.

The text parameter containing the new user prompt is never added to the messages payload. The model only receives conversation history and frames, but cannot respond to the new input. This is a correctness bug that breaks the core functionality.

Apply this diff to include the user text:

```diff
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )

         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```
🧹 Nitpick comments (5)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)

86-87: Consider making frame dimensions configurable.

The frame dimensions are hardcoded to 800x600. Different models or use cases might benefit from different resolutions. Consider adding frame_width and frame_height as constructor parameters.

Apply this diff to add configurable dimensions:

```diff
     def __init__(
         self,
         model: str,
         api_key: Optional[str] = None,
         base_url: Optional[str] = None,
         fps: int = 1,
         frame_buffer_seconds: int = 10,
+        frame_width: int = 800,
+        frame_height: int = 600,
         client: Optional[AsyncOpenAI] = None,
     ):
```

Then update the initialization:

```diff
-        self._frame_width = 800
-        self._frame_height = 600
+        self._frame_width = frame_width
+        self._frame_height = frame_height
```

92-93: Unused parameter: processors.

The processors parameter is declared but never used in the method. Either utilize it or remove it from the signature.


110-110: Address or remove TODO comment.

The TODO comment references _build_enhanced_instructions, but this method is not present or used. Clarify the intended implementation or remove the comment.


129-129: Consider limiting conversation history size.

The TODO comment raises a valid concern about message volume. Sending unbounded conversation history could lead to token limit errors or increased latency. Consider implementing a sliding window or token-based truncation strategy.
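
A minimal sliding-window sketch of that suggestion, assuming OpenAI-style message dicts; the window size is arbitrary:

```python
MAX_HISTORY_MESSAGES = 40  # illustrative cap; tune per model/token budget

def truncate_history(messages: list[dict]) -> list[dict]:
    """Keep the leading system message (if any) plus the most recent turns."""
    if not messages:
        return messages
    head = messages[:1] if messages[0].get("role") == "system" else []
    tail = messages[len(head):][-MAX_HISTORY_MESSAGES:]
    return head + tail
```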


276-308: Well-implemented frame conversion utility.

The function correctly handles aspect ratio preservation and uses appropriate resampling quality (LANCZOS). The TODO comment about moving to core utils is valid—this utility could benefit other plugins.

Would you like me to open an issue to track moving this utility to a shared location?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a9361a8 and 93cf015.

📒 Files selected for processing (1)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py
🧬 Code graph analysis (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (2)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (2)

1-28: LGTM!

Imports are well-organized and all appear necessary for the implementation. No sys.path modifications present, adhering to coding guidelines.


263-273: LGTM!

The method correctly iterates over buffered frames and converts them to JPEG bytes. Implementation is clean and well-documented.

@cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review a limited number of PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (3)

31-39: Complete the TODO in the class docstring.

The docstring still contains a TODO placeholder. Please add a brief description of the class purpose (e.g., "A video language model backed by Baseten-hosted models that processes video frames alongside text prompts"), document key parameters, and provide usage guidance.
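
For reference, one possible Google-style docstring consistent with the constructor parameters reviewed above; the wording is a suggestion, not shipped text:

```python
class BasetenVLM(VideoLLM):
    """Video language model backed by a Baseten-hosted, OpenAI-compatible endpoint.

    Buffers incoming video frames, encodes them to JPEG, and attaches the most
    recent frames to each chat-completions request alongside the text prompt.

    Args:
        model: Name of the Baseten-hosted, vision-capable model.
        api_key: Baseten API key; falls back to the BASETEN_API_KEY env var.
        base_url: OpenAI-compatible base URL; falls back to BASETEN_BASE_URL.
        fps: Video frames per second forwarded to the model.
        frame_buffer_seconds: Seconds of video kept in the frame buffer.
        client: Optional pre-configured AsyncOpenAI client.
    """
```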


88-157: CRITICAL: User prompt is never sent to the model.

The text parameter (Line 90) is never added to the messages payload. Only video frames are included in the final user message (Lines 152-157). This is a correctness bug that breaks the core functionality—the model cannot respond to the user's actual question.

Apply this diff to fix:

```diff
         # Attach the latest bufferred frames to the request
-        frames_data = []
+        frames_data: list[dict[str, object]] = []
         for frame_bytes in self._get_frames_bytes():
             frame_b64 = base64.b64encode(frame_bytes).decode("utf-8")
             frame_msg = {
                 "type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"},
             }
             frames_data.append(frame_msg)

+        if text:
+            frames_data.insert(0, {"type": "text", "text": text})
+
+        if not frames_data:
+            logger.warning(
+                "Cannot create an LLM response - no prompt text or frames available."
+            )
+            return LLMResponseEvent(original=None, text="")
+
         logger.debug(
             f'Forwarding {len(frames_data)} to the Baseten model "{self.model}"'
         )

         messages.append(
             {
                 "role": "user",
                 "content": frames_data,
             }
         )
```

247-257: Fix redundant condition and avoid starting an already-running forwarder.

The condition if not shared_forwarder: followed by shared_forwarder or VideoForwarder(...) contains dead code—the shared_forwarder or part can never be reached. Additionally, calling await self._video_forwarder.start() when shared_forwarder is provided may attempt to start an already-running forwarder.

Apply this diff:

```diff
         logger.info("🎥 BasetenVLM subscribing to VideoForwarder")
-        if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+        if shared_forwarder is None:
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name="baseten_vlm_forwarder",
             )
             await self._video_forwarder.start()
         else:
             self._video_forwarder = shared_forwarder
```
🧹 Nitpick comments (1)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)

70-73: Enhance credential error messages.

The error messages for missing credentials could be more helpful by mentioning the environment variable names.

Apply this diff:

```diff
         elif not api_key:
-            raise ValueError("api_key must be provided")
+            raise ValueError("api_key must be provided or set via BASETEN_API_KEY environment variable")
         elif not base_url:
-            raise ValueError("base_url must be provided")
+            raise ValueError("base_url must be provided or set via BASETEN_BASE_URL environment variable")
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 029852f and 2851eeb.

📒 Files selected for processing (4)
  • plugins/baseten/example/README.md (1 hunks)
  • plugins/baseten/example/pyproject.toml (1 hunks)
  • plugins/baseten/example/qwen_vl_example.py (1 hunks)
  • plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • plugins/baseten/example/README.md
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/baseten/example/qwen_vl_example.py (4)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (125-1357)
  • create_call (698-703)
  • finish (558-591)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-117)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (1)
  • simple_response (88-223)
plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-43)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (14-195)
  • start_event_consumer (109-195)
plugins/baseten/vision_agents/plugins/baseten/events.py (1)
  • LLMErrorEvent (7-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (10)
plugins/baseten/example/pyproject.toml (2)

1-13: LGTM!

Project metadata and dependencies align correctly with the example script's imports.


15-20: LGTM!

Workspace source configuration correctly references all plugin dependencies.

plugins/baseten/example/qwen_vl_example.py (4)

1-11: LGTM!

Imports and environment loading are correct.


14-28: LGTM!

Agent construction correctly wires Baseten VLM with edge, STT, TTS, and appropriate instructions.


30-43: Verify that the critical bug in baseten_vlm.py was addressed.

The join flow is correctly structured. However, Line 38 calls agent.simple_response("Describe what you currently see"), which relies on BasetenVLM's simple_response method. Past review comments identified a critical bug where the text parameter is never added to the messages payload (lines 88-157 in baseten_vlm.py), meaning the prompt won't reach the model. Please ensure this bug was fixed before merging.


45-46: LGTM!

CLI entry point correctly wires the AgentLauncher.

plugins/baseten/vision_agents/plugins/baseten/baseten_vlm.py (4)

1-28: LGTM!

Imports and constants are correctly defined.


159-223: LGTM!

Streaming response logic correctly handles API calls, error reporting via LLMErrorEvent, and emits appropriate chunk and completion events.


262-272: LGTM!

The frame iterator correctly processes buffered frames.


275-307: LGTM!

Frame-to-JPEG conversion correctly maintains aspect ratio and uses appropriate resampling. The TODO comment about moving to core utils is a valid future refactoring consideration.

@coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/openai/README.md (1)

19-31: Align the class name in the quickstart snippet
The snippet now imports Realtime, but it still instantiates OpenAIRealtime, which no longer exists under that import path. Please update the example so the constructor matches the imported symbol; otherwise, readers will copy an import/class combination that raises NameError.

```diff
-from vision_agents.plugins.openai import Realtime
-
-# Initialize with API key
-sts = OpenAIRealtime(api_key="your_openai_api_key", voice="alloy")
+from vision_agents.plugins.openai import Realtime
+
+# Initialize with API key
+sts = Realtime(api_key="your_openai_api_key", voice="alloy")
```
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

324-335: Reset duplicate guard per LLM response
Hi! Once _all_sent_texts learns a sentence it never forgets, so every identical sentence in future responses is dropped (no HeyGen speech) and the set grows without bound in long sessions. A fresh response should get a clean slate.

```diff
         if item_id != self._current_response_id:
             if self._text_buffer:
                 text_to_send = self._text_buffer.strip()
                 if text_to_send and text_to_send not in self._all_sent_texts:
                     await self._send_text_to_heygen(text_to_send)
                     self._all_sent_texts.add(text_to_send)
             self._text_buffer = ""
             self._current_response_id = item_id
+            self._all_sent_texts.clear()
```
🧹 Nitpick comments (6)
examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py (1)

51-51: Minor: Log message timing could be more precise.

The message "is now joining" suggests an action in progress, but at this point the agent has already completed joining (the await agent.join(call) has resolved). Consider "Agent has joined the call" or "Agent joined the call successfully" for clarity.

Apply this diff to improve clarity:

-        logger.info("Agent is now joining the call")
+        logger.info("Agent has joined the call")
agents-core/vision_agents/core/utils/video_utils.py (2)

32-34: Validate JPEG quality parameter.

The quality parameter lacks bounds checking. JPEG quality should typically be in the range 1-100. Invalid values may cause unexpected behavior or errors during encoding.

Consider adding validation:

```diff
 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.

     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
-        quality: JPEG quality. Default is 85.
+        quality: JPEG quality (1-100). Default is 85.

     Returns: frame as JPEG bytes.

     """
+    if not 1 <= quality <= 100:
+        raise ValueError("JPEG quality must be between 1 and 100")
+
     # Convert frame to a PIL image
     img = frame.to_image()
```
Also applies to: 42-42, 62-62


50-58: Consider whether upscaling is intended behavior.

The current implementation will upscale images when the source dimensions are smaller than the target dimensions (scale > 1). Upscaling can degrade image quality and may not be the intended behavior for a video frame processing utility. Consider clamping the scale factor to prevent upscaling:

```diff
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)
+    # Optional: prevent upscaling by clamping scale to 1.0
+    scale = min(scale, 1.0)
+
     new_width = int(src_width * scale)
     new_height = int(src_height * scale)
```

If upscaling is intentional, consider documenting this behavior in the docstring.
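
Putting both suggestions together, a sketch of the utility with quality validation and the upscaling clamp; it mirrors the reviewed signature, but the body is a reconstruction rather than the shipped code:

```python
import io

import av
from PIL import Image

def frame_to_jpeg_bytes(
    frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
) -> bytes:
    """Convert an av.VideoFrame to JPEG bytes, fitting within the target box."""
    if not 1 <= quality <= 100:
        raise ValueError("JPEG quality must be between 1 and 100")

    img = frame.to_image()  # av.VideoFrame -> PIL.Image
    scale = min(target_width / img.width, target_height / img.height)
    scale = min(scale, 1.0)  # never upscale, per the note above
    if scale < 1.0:
        img = img.resize(
            (int(img.width * scale), int(img.height * scale)), Image.LANCZOS
        )

    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```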

plugins/openai/vision_agents/plugins/openai/__init__.py (1)

4-7: Export ChatCompletionsVLM alongside the LLM variant
You import ChatCompletionsVLM, but it’s missing from __all__, so from vision_agents.plugins.openai import * (used in docs/examples) won’t pick it up. Please add it to the export list for consistency with the other public classes.

```diff
-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
```
plugins/openai/examples/qwen_vl_example/README.md (1)

4-4: Clarify video processing direction.

The phrase "accepts text and video and responds with text vocalised" could mislead readers into thinking users send video to the agent. Based on the example code, the agent processes video frames internally and sends them to the VLM—users interact via voice/text only.

Consider revising to: "The model processes video frames from the call and responds with text vocalized with the TTS service of your choice."

agents-core/vision_agents/core/cli/cli_runner.py (1)

181-184: Consider using a more robust pattern for capability detection.

The nested hasattr checks work but are somewhat fragile. If the edge interface is expected to have open_demo_for_agent, consider using a protocol or abstract base class to make this contract explicit.

For example, you could define a protocol:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable  # required for isinstance() checks against a Protocol
class DemoCapableEdge(Protocol):
    async def open_demo_for_agent(self, agent: "Agent", call_type: str, call_id: str) -> str:
        ...
```

Then use isinstance checking or type narrowing instead of hasattr.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2851eeb and 4f85895.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (71)
  • .cursor/rules/python.mdc (1 hunks)
  • README.md (1 hunks)
  • agents-core/pyproject.toml (3 hunks)
  • agents-core/vision_agents/core/agents/agent_launcher.py (1 hunks)
  • agents-core/vision_agents/core/agents/agent_options.py (1 hunks)
  • agents-core/vision_agents/core/agents/agents.py (17 hunks)
  • agents-core/vision_agents/core/cli/cli_runner.py (2 hunks)
  • agents-core/vision_agents/core/processors/base_processor.py (1 hunks)
  • agents-core/vision_agents/core/utils/audio_queue.py (1 hunks)
  • agents-core/vision_agents/core/utils/video_forwarder.py (1 hunks)
  • agents-core/vision_agents/core/utils/video_queue.py (1 hunks)
  • agents-core/vision_agents/core/utils/video_track.py (2 hunks)
  • agents-core/vision_agents/core/utils/video_utils.py (1 hunks)
  • examples/01_simple_agent_example/README.md (1 hunks)
  • examples/01_simple_agent_example/simple_agent_example.py (3 hunks)
  • examples/02_golf_coach_example/golf_coach_example.py (0 hunks)
  • examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py (0 hunks)
  • examples/other_examples/09_github_mcp_demo/github_mcp_demo.py (0 hunks)
  • examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py (0 hunks)
  • examples/other_examples/gemini_live_realtime/gemini_live_example.py (0 hunks)
  • examples/other_examples/openai_realtime_webrtc/openai_realtime_example.py (1 hunks)
  • examples/other_examples/plugins_examples/audio_moderation/main.py (0 hunks)
  • examples/other_examples/plugins_examples/mcp/main.py (0 hunks)
  • examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py (0 hunks)
  • examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py (0 hunks)
  • examples/other_examples/plugins_examples/tts_cartesia/main.py (0 hunks)
  • examples/other_examples/plugins_examples/tts_elevenlabs/main.py (0 hunks)
  • examples/other_examples/plugins_examples/tts_kokoro/main.py (0 hunks)
  • examples/other_examples/plugins_examples/vad_silero/main.py (0 hunks)
  • examples/other_examples/plugins_examples/video_moderation/main.py (0 hunks)
  • examples/other_examples/plugins_examples/wizper_stt_translate/main.py (0 hunks)
  • plugins/aws/example/aws_llm_function_calling_example.py (0 hunks)
  • plugins/aws/example/aws_qwen_example.py (0 hunks)
  • plugins/aws/example/aws_realtime_function_calling_example.py (0 hunks)
  • plugins/aws/example/aws_realtime_nova_example.py (0 hunks)
  • plugins/fish/example/fish_example.py (0 hunks)
  • plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2 hunks)
  • plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1 hunks)
  • plugins/heygen/README.md (0 hunks)
  • plugins/heygen/example/avatar_example.py (0 hunks)
  • plugins/heygen/example/avatar_realtime_example.py (0 hunks)
  • plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (11 hunks)
  • plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2 hunks)
  • plugins/moondream/README.md (5 hunks)
  • plugins/moondream/example/README.md (1 hunks)
  • plugins/moondream/example/moondream_vlm_example.py (1 hunks)
  • plugins/moondream/example/pyproject.toml (1 hunks)
  • plugins/moondream/tests/test_moondream_local.py (4 hunks)
  • plugins/moondream/tests/test_moondream_local_vlm.py (1 hunks)
  • plugins/moondream/tests/test_moondream_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/__init__.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (6 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (2 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1 hunks)
  • plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1 hunks)
  • plugins/openai/README.md (1 hunks)
  • plugins/openai/examples/qwen_vl_example/README.md (1 hunks)
  • plugins/openai/examples/qwen_vl_example/pyproject.toml (1 hunks)
  • plugins/openai/examples/qwen_vl_example/qwen_vl_example.py (1 hunks)
  • plugins/openai/tests/test_chat_completions.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/__init__.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1 hunks)
  • plugins/openrouter/example/openrouter_example.py (0 hunks)
  • plugins/sample_plugin/example/my_example.py (0 hunks)
  • plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (2 hunks)
  • tests/test_audio_queue.py (1 hunks)
  • tests/test_queue_and_video_forwarder.py (9 hunks)
💤 Files with no reviewable changes (25)
  • examples/other_examples/plugins_examples/tts_elevenlabs/main.py
  • examples/other_examples/plugins_examples/vad_silero/main.py
  • plugins/heygen/example/avatar_realtime_example.py
  • examples/02_golf_coach_example/golf_coach_example.py
  • examples/other_examples/plugins_examples/tts_cartesia/main.py
  • examples/other_examples/09_github_mcp_demo/github_mcp_demo.py
  • plugins/heygen/example/avatar_example.py
  • plugins/aws/example/aws_qwen_example.py
  • examples/other_examples/gemini_live_realtime/gemini_live_example.py
  • plugins/aws/example/aws_realtime_function_calling_example.py
  • examples/other_examples/plugins_examples/audio_moderation/main.py
  • plugins/aws/example/aws_realtime_nova_example.py
  • examples/other_examples/plugins_examples/wizper_stt_translate/main.py
  • examples/other_examples/plugins_examples/video_moderation/main.py
  • examples/other_examples/plugins_examples/tts_kokoro/main.py
  • examples/other_examples/09_github_mcp_demo/gemini_realtime_github_mcp_demo.py
  • plugins/aws/example/aws_llm_function_calling_example.py
  • examples/other_examples/plugins_examples/stt_deepgram_transcription/main.py
  • examples/other_examples/09_github_mcp_demo/openai_realtime_github_mcp_demo.py
  • plugins/openrouter/example/openrouter_example.py
  • examples/other_examples/plugins_examples/mcp/main.py
  • plugins/heygen/README.md
  • plugins/sample_plugin/example/my_example.py
  • examples/other_examples/plugins_examples/stt_moonshine_transcription/main.py
  • plugins/fish/example/fish_example.py
🧰 Additional context used
🧬 Code graph analysis (30)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
  • warmup (118-120)
agents-core/vision_agents/core/utils/video_track.py (1)
agents-core/vision_agents/core/utils/video_queue.py (1)
  • VideoLatestNQueue (6-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (2)
plugins/moondream/tests/test_moondream_local.py (3)
  • is_available (188-189)
  • is_available (216-217)
  • is_available (244-245)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
  • device (114-116)
plugins/moondream/tests/test_moondream_local_vlm.py (3)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (4)
  • LocalVLM (31-349)
  • warmup (96-99)
  • close (343-349)
  • simple_response (313-334)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (2)
  • warmup (118-120)
  • close (310-318)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
  • close (241-246)
  • simple_response (197-218)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1)
agents-core/vision_agents/core/utils/video_queue.py (1)
  • VideoLatestNQueue (6-28)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (2)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • add_frame_handler (48-74)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1)
  • _send_video_frame (435-447)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
agents-core/vision_agents/core/utils/video_queue.py (1)
  • VideoLatestNQueue (6-28)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (6)
plugins/openai/tests/test_chat_completions.py (1)
  • llm (37-40)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLM (49-418)
  • LLMResponseEvent (38-42)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-44)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
  • simple_response (90-185)
  • _build_model_request (238-284)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
plugins/moondream/example/moondream_vlm_example.py (2)
agents-core/vision_agents/core/agents/agents.py (7)
  • Agent (93-1262)
  • create_user (741-753)
  • create_call (755-760)
  • subscribe (452-464)
  • simple_response (428-441)
  • join (466-549)
  • finish (578-611)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (2)
  • CloudVLM (27-246)
  • simple_response (197-218)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (2)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • add_frame_handler (48-74)
plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)
  • _send_video_frame (268-274)
agents-core/vision_agents/core/cli/cli_runner.py (1)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (1)
  • open_demo_for_agent (350-354)
plugins/moondream/tests/test_moondream_vlm.py (1)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (3)
  • CloudVLM (27-246)
  • close (241-246)
  • simple_response (197-218)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (3)
  • HeyGenRTCManager (19-267)
  • connect (60-145)
  • close (256-267)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)
  • HeyGenVideoTrack (14-187)
  • stop (178-187)
plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (3)
  • connect (187-200)
  • Realtime (53-679)
  • close (372-386)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (3)
  • connect (80-106)
  • Realtime (40-487)
  • close (153-154)
agents-core/vision_agents/core/llm/events.py (3)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
  • RealtimeAgentSpeechTranscriptionEvent (148-153)
agents-core/vision_agents/core/edge/types.py (1)
  • write (45-45)
tests/test_audio_queue.py (1)
agents-core/vision_agents/core/utils/audio_queue.py (11)
  • AudioQueue (12-274)
  • empty (36-38)
  • put (50-83)
  • qsize (40-42)
  • get (119-136)
  • put_nowait (85-117)
  • get_nowait (138-152)
  • get_samples (154-237)
  • get_duration (239-258)
  • get_buffer_info (260-274)
  • _current_duration_ms (44-48)
plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (1)
  • CloudVLM (27-246)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (1)
  • LocalVLM (31-349)
plugins/openai/vision_agents/plugins/openai/__init__.py (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
  • ChatCompletionsLLM (23-180)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
  • ChatCompletionsVLM (31-284)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (3)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-79)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • add_frame_handler (48-74)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
  • _process_and_add_frame (283-308)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
agents-core/vision_agents/core/agents/agents.py (2)
  • create_user (741-753)
  • create_call (755-760)
agents-core/vision_agents/core/edge/edge_transport.py (2)
  • create_user (30-31)
  • open_demo (42-43)
plugins/openai/tests/test_chat_completions.py (6)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • ChatCompletionsLLM (23-180)
  • simple_response (65-160)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3)
  • ChatCompletionsVLM (31-284)
  • watch_video_track (187-224)
  • simple_response (90-185)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (474-487)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (5)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_jpeg_bytes (32-63)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • simple_response (65-160)
  • _build_model_request (162-180)
plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (2)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • name (2197-2201)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • add_frame_handler (48-74)
plugins/moondream/tests/test_moondream_local.py (1)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (1)
  • LocalDetectionProcessor (28-318)
tests/test_queue_and_video_forwarder.py (3)
agents-core/vision_agents/core/utils/video_queue.py (1)
  • VideoLatestNQueue (6-28)
conftest.py (1)
  • bunny_video_track (300-344)
agents-core/vision_agents/core/utils/video_forwarder.py (4)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
  • stop (102-112)
  • remove_frame_handler (76-92)
plugins/openai/examples/qwen_vl_example/qwen_vl_example.py (3)
agents-core/vision_agents/core/agents/agents.py (3)
  • Agent (93-1262)
  • create_call (755-760)
  • finish (578-611)
agents-core/vision_agents/core/edge/types.py (1)
  • User (15-18)
agents-core/vision_agents/core/agents/agent_launcher.py (1)
  • AgentLauncher (18-125)
agents-core/vision_agents/core/processors/base_processor.py (1)
agents-core/vision_agents/core/edge/sfu_events.py (1)
  • name (2197-2201)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (6)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (3)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
  • stop (102-112)
agents-core/vision_agents/core/utils/video_queue.py (2)
  • VideoLatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (10)
  • watch_video_track (168-200)
  • _stop_watching_video_track (336-341)
  • _on_frame_received (202-208)
  • _setup_stt_subscription (210-217)
  • on_stt_transcript (216-217)
  • _on_stt_transcript (306-311)
  • _consume_stream (219-230)
  • _process_frame (232-304)
  • simple_response (313-334)
  • close (343-349)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
agents-core/vision_agents/core/utils/video_queue.py (2)
  • VideoLatestNQueue (6-28)
  • put_latest (14-20)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (3)
  • parse_detection_bbox (13-31)
  • annotate_detections (48-111)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (1)
  • MoondreamVideoTrack (16-79)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (3)
  • warmup (96-99)
  • _prepare_moondream (101-109)
  • _load_model_sync (111-166)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_cloud_processor.py (4)
  • process_video (105-146)
  • _process_and_add_frame (208-237)
  • _run_inference (166-178)
  • _run_detection_sync (180-206)
agents-core/vision_agents/core/agents/agents.py (9)
agents-core/vision_agents/core/agents/agent_options.py (3)
  • AgentOptions (6-16)
  • default_agent_options (23-24)
  • update (9-16)
agents-core/vision_agents/core/edge/sfu_events.py (22)
  • ParticipantJoinedEvent (1481-1526)
  • participant (1496-1501)
  • participant (1504-1507)
  • participant (1545-1550)
  • participant (1553-1556)
  • participant (1625-1630)
  • participant (1633-1636)
  • participant (2100-2105)
  • participant (2108-2111)
  • participant (2156-2161)
  • participant (2164-2167)
  • Participant (229-270)
  • track_type (579-583)
  • track_type (1193-1197)
  • track_type (2289-2293)
  • user_id (489-493)
  • user_id (856-860)
  • user_id (901-905)
  • user_id (1186-1190)
  • user_id (2093-2097)
  • user_id (2142-2146)
  • name (2197-2201)
agents-core/vision_agents/core/utils/audio_queue.py (4)
  • AudioQueue (12-274)
  • put (50-83)
  • get_duration (239-258)
  • get (119-136)
agents-core/vision_agents/core/edge/types.py (4)
  • Participant (22-24)
  • Connection (27-35)
  • OutputAudioTrack (39-47)
  • write (45-45)
agents-core/vision_agents/core/utils/video_forwarder.py (1)
  • VideoForwarder (24-147)
agents-core/vision_agents/core/events/manager.py (4)
  • send (428-472)
  • subscribe (301-370)
  • wait (474-487)
  • unsubscribe (274-299)
agents-core/vision_agents/core/edge/events.py (3)
  • TrackAddedEvent (18-24)
  • TrackRemovedEvent (28-34)
  • AudioReceivedEvent (9-14)
plugins/getstream/vision_agents/plugins/getstream/stream_edge_transport.py (2)
  • join (256-307)
  • add_track_subscriber (319-322)
agents-core/vision_agents/core/llm/llm.py (4)
  • simple_audio_response (428-440)
  • set_conversation (194-204)
  • watch_video_track (458-471)
  • LLM (49-418)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_local_vlm.py (9)
agents-core/vision_agents/core/agents/agent_options.py (2)
  • AgentOptions (6-16)
  • default_agent_options (23-24)
agents-core/vision_agents/core/stt/events.py (1)
  • STTTranscriptEvent (16-47)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (3)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
  • stop (102-112)
agents-core/vision_agents/core/utils/video_queue.py (2)
  • VideoLatestNQueue (6-28)
  • put_latest_nowait (22-28)
plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)
  • handle_device (7-11)
plugins/moondream/vision_agents/plugins/moondream/detection/moondream_local_processor.py (4)
  • device (114-116)
  • warmup (118-120)
  • _prepare_moondream (122-132)
  • close (310-318)
plugins/moondream/vision_agents/plugins/moondream/vlm/moondream_cloud_vlm.py (10)
  • watch_video_track (66-98)
  • _stop_watching_video_track (220-225)
  • _on_frame_received (100-106)
  • _setup_stt_subscription (108-115)
  • on_stt_transcript (114-115)
  • _on_stt_transcript (190-195)
  • _consume_stream (117-130)
  • _process_frame (132-188)
  • simple_response (197-218)
  • close (241-246)
🪛 LanguageTool
plugins/moondream/example/README.md

[typographical] ~1-~1: Consider adding a comma here.
Context: ## Moondream example Please see root readme for details.

(PLEASE_COMMA)

plugins/moondream/README.md

[uncategorized] ~8-~8: Possible missing comma found.
Context: ...s Choose between cloud-hosted or local processing depending on your needs. When running l...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~164-~164: Possible missing article found.
Context: ... the model from HuggingFace and runs on device. It supports both VQA and captioning mo...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~233-~233: Possible missing comma found.
Context: ...ry configuration. If not provided, uses default which defaults to tempfile.gettempdir()...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~239-~239: Loose punctuation mark.
Context: ...e. ### CloudVLM Parameters - api_key: str - API key for Moondream Cloud API. ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~240-~240: Loose punctuation mark.
Context: ..._API_KEYenvironment variable. -mode`: Literal["vqa", "caption"] - "vqa" for v...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~247-~247: Loose punctuation mark.
Context: ...mits. ### LocalVLM Parameters - mode: Literal["vqa", "caption"] - "vqa" for v...

(UNLIKELY_OPENING_PUNCTUATION)

plugins/openai/examples/qwen_vl_example/README.md

[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...

(UNLIKELY_OPENING_PUNCTUATION)


[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...

(TO_DO_HYPHEN)


[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...

(COMMA_COMPOUND_SENTENCE_2)

🪛 markdownlint-cli2 (0.18.1)
plugins/moondream/README.md

167-167: Bare URL used

(MD034, no-bare-urls)

README.md

165-165: Images should have alternate text (alt text)

(MD045, no-alt-text)


165-165: Images should have alternate text (alt text)

(MD045, no-alt-text)


165-165: Images should have alternate text (alt text)

(MD045, no-alt-text)


169-169: Images should have alternate text (alt text)

(MD045, no-alt-text)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
🔇 Additional comments (25)
agents-core/pyproject.toml (2)

24-24: Verify getstream 2.5.9 requirement and ensure changelog alignment.

The bump from ≥2.5.8 to ≥2.5.9 suggests upstream changes are needed for the conversation management wiring. Ensure that the 2.5.9 release contains the necessary changes to support the LLM.set_conversation(…) pattern introduced in this PR.


85-95: Consolidate commented sources section and verify git revision.

The [tool.uv.sources] section and its contents are all commented out, creating redundant commenting. Clarify the intent:

  • If these configurations are legacy/unused, remove them entirely.
  • If they document alternate configurations, consolidate under a single comment block explaining their purpose.
  • If line 94 is an active development reference, uncomment the section header and activate that line.

Additionally, verify that the git revision 85bd8ef00859ef6ed5ef4ffe7b7f40ae12d12973 exists in the GetStream/stream-py repository and is the correct commit for supporting the conversation management changes in this PR.

plugins/moondream/example/pyproject.toml (2)

16-22: All workspace packages verified and properly configured.

The verification confirms that all seven workspace dependencies referenced in the configuration exist in the monorepo with correct pyproject.toml definitions:

  • vision-agents (agents-core/pyproject.toml)
  • vision-agents-plugins-moondream (plugins/moondream/pyproject.toml)
  • vision-agents-plugins-getstream (plugins/getstream/pyproject.toml)
  • vision-agents-plugins-deepgram (plugins/deepgram/pyproject.toml)
  • vision-agents-plugins-elevenlabs (plugins/elevenlabs/pyproject.toml)
  • vision-agents-plugins-vogent (plugins/vogent/pyproject.toml)

The workspace configuration is valid.


1-5: Python version requirement is consistent — no action needed.

The vision-agents package requires Python 3.10 or newer, which aligns precisely with the requires-python = ">=3.10" constraint specified in the project metadata. No conflicts or misalignment.

plugins/openai/examples/qwen_vl_example/pyproject.toml (2)

5-5: No changes needed — Python version requirement is correctly aligned.

The verification confirms that requires-python = ">=3.10" in plugins/openai/examples/qwen_vl_example/pyproject.toml matches the core dependency requirement in agents-core/pyproject.toml (also >=3.10) and aligns with the vast majority of the workspace. The file is consistent with the project baseline.


1-20: Dependencies are correct; Baseten is a service provider, not a separate plugin.

Baseten is an OpenAI-compatible VLM provider that integrates through the existing vision-agents-plugins-openai dependency by configuring the API endpoint and credentials. No separate vision-agents-plugins-baseten package exists in the repository. The example's dependencies are appropriately configured.

Likely an incorrect or invalid review comment.
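To make the wiring concrete, an OpenAI-compatible client pointed at Baseten looks roughly like this (sketch; env var names taken from the example README, not a definitive implementation):

```python
import os
from openai import AsyncOpenAI

# Point the standard OpenAI client at Baseten's OpenAI-compatible endpoint.
client = AsyncOpenAI(
    api_key=os.environ["OPENAI_API_KEY"],    # your Baseten API key
    base_url=os.environ["OPENAI_BASE_URL"],  # Baseten's OpenAI-compatible base URL
)
```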

plugins/ultralytics/vision_agents/plugins/ultralytics/yolo_pose_processor.py (1)

60-60: LGTM: Processor name attribute added.

The name class attribute properly identifies this processor instance.

agents-core/vision_agents/core/utils/audio_queue.py (1)

154-237: LGTM on get_samples complexity.

The splitting logic correctly handles partial chunk consumption and maintains sample accounting. The timeout-based waiting and metadata preservation are well implemented.
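For readers skimming, the partial-consumption pattern this comment praises boils down to something like the following (illustrative sketch; names are hypothetical, not the actual audio_queue.py code):

```python
from collections import deque

def take_samples(chunks: deque, n: int) -> list:
    """Pop exactly n samples, splitting the head chunk when it is larger than needed."""
    out: list = []
    while chunks and len(out) < n:
        chunk = chunks.popleft()
        need = n - len(out)
        if len(chunk) > need:
            out.extend(chunk[:need])
            chunks.appendleft(chunk[need:])  # remainder stays queued for the next call
        else:
            out.extend(chunk)
    return out
```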

examples/01_simple_agent_example/README.md (1)

88-91: LGTM on documentation update.

The updated flow correctly reflects the new CLI behavior where the demo UI opens automatically. The note about the --no-demo flag is helpful.

plugins/openai/vision_agents/plugins/openai/rtc_manager.py (1)

293-296: LGTM on video forwarding refactor.

The shift from start_event_consumer to add_frame_handler aligns with the new frame-handler-based architecture. The handler registration with fps and name parameters is clean and consistent with the VideoForwarder API.
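As a usage sketch (the callback and handler name are hypothetical; see VideoForwarder.add_frame_handler for the real signature):

```python
async def on_frame(frame):  # receives av.VideoFrame instances
    ...

forwarder.add_frame_handler(on_frame, fps=1.0, name="rtc_manager_frames")
```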

plugins/moondream/vision_agents/plugins/moondream/moondream_utils.py (1)

7-11: LGTM on device handling utility.

The handle_device() function provides a clean, centralized way to select compute device and precision. The CUDA detection with CPU fallback is appropriate.
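A minimal sketch of what such a helper can look like, assuming torch is the backend (the actual moondream_utils.py implementation may differ):

```python
import torch

def handle_device() -> tuple[str, torch.dtype]:
    """Select a compute device and matching precision, falling back to CPU."""
    if torch.cuda.is_available():
        return "cuda", torch.float16  # half precision is typical on GPU
    return "cpu", torch.float32      # full precision on CPU
```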

plugins/moondream/vision_agents/plugins/moondream/detection/moondream_video_track.py (2)

8-8: LGTM! Import updated correctly.

The import path and class name have been updated to use the renamed VideoLatestNQueue.


30-30: LGTM! Type annotation updated correctly.

The frame queue type annotation correctly uses VideoLatestNQueue.

examples/01_simple_agent_example/simple_agent_example.py (3)

7-7: LGTM! Imports simplified.

Removed vogent from imports, consistent with the switch to Deepgram's built-in turn detection.


29-29: LGTM! Clarified turn detection behavior.

The comment helpfully explains that turn detection is not needed with Deepgram. This simplifies the agent configuration.


39-40: LGTM! Simplified join_call flow.

The removal of explicit user creation and demo opening aligns with the new CLI-controlled demo opening pattern (via --no-demo flag).

plugins/moondream/vision_agents/plugins/moondream/__init__.py (2)

5-6: LGTM! Docstring updated.

The capability description now accurately reflects visual question answering and captioning without mentioning counting.


8-13: LGTM! Public API expanded with VLM support.

The addition of CloudVLM and LocalVLM imports and exports expands the plugin's capabilities with vision-language model support. The absolute import paths are clear and maintainable.

agents-core/vision_agents/core/utils/video_queue.py (1)

6-6: LGTM! Class renamed for clarity.

The rename from LatestNQueue to VideoLatestNQueue makes the purpose more explicit and better reflects its use in video frame buffering contexts.
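Conceptually, a latest-N queue is a bounded buffer that silently evicts the oldest frame on overflow; a minimal sketch (not the actual VideoLatestNQueue, which layers asyncio semantics on top):

```python
from collections import deque

class LatestN:
    """Keep only the most recent N items (sketch)."""

    def __init__(self, maxlen: int = 10):
        self._items = deque(maxlen=maxlen)  # deque drops the oldest item automatically

    def put_latest(self, item) -> None:
        self._items.append(item)

    def get_latest(self):
        return self._items[-1] if self._items else None
```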

agents-core/vision_agents/core/utils/video_track.py (2)

7-7: LGTM! Import updated correctly.

The import path and class name have been updated to use the renamed VideoLatestNQueue.


20-20: LGTM! Type annotation updated correctly.

The frame queue type annotation correctly uses VideoLatestNQueue.

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)

9-9: LGTM! Import updated correctly.

The import path and class name have been updated to use the renamed VideoLatestNQueue.


35-35: LGTM! Type annotation updated correctly.

The frame queue type annotation correctly uses VideoLatestNQueue. The maxlen=2 is appropriately small for HeyGen's low-latency requirements.

agents-core/vision_agents/core/cli/cli_runner.py (2)

153-158: LGTM! New CLI flag added.

The --no-demo flag provides users control over whether the demo UI opens automatically. Good UX improvement.
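In argparse terms the flag amounts to something like this (sketch; the actual CLI runner may use a different framework):

```python
parser.add_argument(
    "--no-demo",
    action="store_true",
    help="Do not open the demo UI automatically after the agent joins",
)
```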


159-159: LGTM! Function signature extended correctly.

The no_demo parameter is properly added to the function signature.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
agents-core/vision_agents/core/utils/video_utils.py (1)

32-63: Validate dimensions to prevent division by zero.

The scale calculation on line 53 divides by src_width and src_height without validation. If either dimension is zero (or if target_width/target_height are zero or negative), this will raise a runtime error. Add defensive checks before the division.

Apply this diff to add validation:

 def frame_to_jpeg_bytes(
     frame: av.VideoFrame, target_width: int, target_height: int, quality: int = 85
 ) -> bytes:
     """
     Convert a video frame to JPEG bytes with resizing.
 
     Args:
         frame: an instance of `av.VideoFrame`.
         target_width: target width in pixels.
         target_height: target height in pixels.
         quality: JPEG quality. Default is 85.
 
     Returns: frame as JPEG bytes.
 
     """
+    if target_width <= 0 or target_height <= 0:
+        raise ValueError("Target dimensions must be positive")
+    
     # Convert frame to a PIL image
     img = frame.to_image()
 
     # Calculate scaling to maintain aspect ratio
     src_width, src_height = img.size
+    if src_width == 0 or src_height == 0:
+        raise ValueError(f"Source frame has invalid dimensions: {src_width}x{src_height}")
+    
     # Calculate scale factor (fit within target dimensions)
     scale = min(target_width / src_width, target_height / src_height)
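Once those guards land, typical call-site usage stays a one-liner (sketch; the import path is assumed from the repo layout):

```python
from vision_agents.core.utils.video_utils import frame_to_jpeg_bytes

jpeg = frame_to_jpeg_bytes(frame, target_width=800, target_height=600, quality=85)
```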
plugins/openai/tests/test_chat_completions.py (1)

36-47: Use set_conversation instead of direct assignment.

Both fixtures (lines 39 and 46) directly assign to the private _conversation attribute, bypassing the public set_conversation method introduced in this PR. Tests should exercise the real API that agents use.

Apply this diff:

 @pytest.fixture()
 async def llm(openai_client_mock, conversation):
     llm_ = ChatCompletionsLLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_


 @pytest.fixture()
 async def vlm(openai_client_mock, conversation):
     llm_ = ChatCompletionsVLM(client=openai_client_mock, model="test")
-    llm_._conversation = conversation
+    llm_.set_conversation(conversation)
     return llm_
🧹 Nitpick comments (3)
plugins/openai/examples/qwen_vl_example/README.md (1)

56-57: Resolve past review comment: clarify environment variable naming convention.

Baseten officially uses BASETEN_API_KEY as the standard environment variable, yet this README uses OPENAI_API_KEY and OPENAI_BASE_URL. While this pattern is valid for the OpenAI-compatible client approach, it creates confusion for developers who might expect Baseten's standard naming.

Recommend one of these approaches:

  1. Document the mapping (preferred): Add a note explaining that OPENAI_* variables are used because the OpenAI client is instantiated with Baseten's OpenAI-compatible endpoint. Consider showing both naming conventions:

    - **`OPENAI_API_KEY`**: Your Baseten API key (set this to your value from `BASETEN_API_KEY`)
    - **`OPENAI_BASE_URL`**: The base URL for your Baseten API endpoint (set this to your value from `BASETEN_BASE_URL`)
    

    And add: "See .env.example for the canonical BASETEN_* variable names if you prefer to use those for clarity."

  2. Align with Baseten's convention: Update the code example and README to explicitly use BASETEN_API_KEY and BASETEN_BASE_URL environment variables, remapping them when creating the OpenAI client.

agents-core/vision_agents/core/agents/agents.py (1)

540-541: Remove duplicate set_conversation call.

The conversation is set twice in the join flow—once immediately after creation (line 541) and again after the optional wait_for_participant (line 548). This duplication is unnecessary; the LLM only needs the conversation set once. Consider removing the first call and keeping only the second one after all participant logic has completed.

Apply this diff to remove the duplicate:

         # wait for conversation creation coro at the very end of the join flow
         self.conversation = await create_conversation_coro
-        # Provide conversation to the LLM so it can access the chat history.
-        self.llm.set_conversation(self.conversation)
 
         if wait_for_participant:
             self.logger.info("Agent is ready, waiting for participant to join")

Also applies to: 547-548

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)

210-220: Remove redundant expression in forwarder creation.

On line 211, within the if not shared_forwarder: block, the expression shared_forwarder or VideoForwarder(...) is redundant. Since shared_forwarder is guaranteed to be falsy inside this branch, the shared_forwarder or part is dead code and can be removed for clarity.

Apply this diff:

         if not shared_forwarder:
-            self._video_forwarder = shared_forwarder or VideoForwarder(
+            self._video_forwarder = VideoForwarder(
                 cast(VideoStreamTrack, track),
                 max_buffer=10,
                 fps=1.0,  # Low FPS for VLM
                 name=f"{PLUGIN_NAME}_forwarder",
             )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4f85895 and f3d5b11.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • agents-core/vision_agents/core/agents/agents.py (1 hunks)
  • agents-core/vision_agents/core/llm/llm.py (3 hunks)
  • agents-core/vision_agents/core/utils/video_utils.py (1 hunks)
  • plugins/anthropic/tests/test_anthropic_llm.py (2 hunks)
  • plugins/aws/tests/test_aws.py (1 hunks)
  • plugins/gemini/tests/test_gemini_llm.py (5 hunks)
  • plugins/openai/README.md (1 hunks)
  • plugins/openai/examples/qwen_vl_example/README.md (1 hunks)
  • plugins/openai/examples/qwen_vl_example/pyproject.toml (1 hunks)
  • plugins/openai/examples/qwen_vl_example/qwen_vl_example.py (1 hunks)
  • plugins/openai/tests/test_chat_completions.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/__init__.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1 hunks)
  • plugins/openrouter/tests/test_openrouter_llm.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/anthropic/tests/test_anthropic_llm.py
  • plugins/aws/tests/test_aws.py
  • plugins/openai/examples/qwen_vl_example/qwen_vl_example.py
🧰 Additional context used
🧬 Code graph analysis (8)
plugins/gemini/tests/test_gemini_llm.py (4)
plugins/anthropic/tests/test_anthropic_llm.py (1)
  • llm (18-22)
agents-core/vision_agents/core/llm/llm.py (2)
  • set_conversation (194-204)
  • simple_response (75-81)
agents-core/vision_agents/core/llm/events.py (1)
  • LLMResponseChunkEvent (87-102)
plugins/gemini/vision_agents/plugins/gemini/gemini_llm.py (1)
  • simple_response (68-85)
plugins/openrouter/tests/test_openrouter_llm.py (3)
plugins/anthropic/tests/test_anthropic_llm.py (1)
  • llm (18-22)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLM (49-418)
  • set_conversation (194-204)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
plugins/openai/tests/test_chat_completions.py (5)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • ChatCompletionsLLM (23-180)
  • simple_response (65-160)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3)
  • ChatCompletionsVLM (31-284)
  • watch_video_track (187-224)
  • simple_response (90-185)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
plugins/openai/tests/test_chat_completions.py (1)
  • llm (37-40)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLM (49-418)
  • LLMResponseEvent (38-42)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-44)
agents-core/vision_agents/core/events/manager.py (1)
  • register_events_from_module (219-256)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
  • simple_response (90-185)
  • _build_model_request (238-284)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/agents/agents.py (7)
plugins/anthropic/tests/test_anthropic_llm.py (1)
  • llm (18-22)
plugins/aws/tests/test_aws.py (1)
  • llm (35-39)
plugins/gemini/tests/test_gemini_llm.py (1)
  • llm (31-34)
plugins/openrouter/tests/test_openrouter_llm.py (1)
  • llm (61-68)
agents-core/vision_agents/core/llm/llm.py (1)
  • set_conversation (194-204)
plugins/getstream/tests/test_message_chunking.py (2)
  • conversation (15-27)
  • conversation (244-251)
tests/test_conversation.py (1)
  • conversation (66-73)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_jpeg_bytes (32-63)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • simple_response (65-160)
  • _build_model_request (162-180)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
plugins/openai/vision_agents/plugins/openai/__init__.py (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)
  • ChatCompletionsLLM (23-180)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)
  • ChatCompletionsVLM (31-284)
agents-core/vision_agents/core/llm/llm.py (2)
agents-core/vision_agents/core/utils/utils.py (2)
  • Instructions (35-40)
  • parse_instructions (89-127)
agents-core/vision_agents/core/agents/conversation.py (1)
  • Conversation (67-227)
🪛 LanguageTool
plugins/openai/examples/qwen_vl_example/README.md

[uncategorized] ~56-~56: Loose punctuation mark.
Context: ...onment Variables - OPENAI_API_KEY: Your Baseten API key (required) - **`OP...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~74-~74: Loose punctuation mark.
Context: ...al) ) ``` ### Parameters - model: The name of the Baseten-hosted model to...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~75-~75: Loose punctuation mark.
Context: ... a vision-capable model. - api_key: Your Baseten API key. If not provided, ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~76-~76: Loose punctuation mark.
Context: ... environment variable. - **base_url`**: The base URL for Baseten API. If not pr...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Loose punctuation mark.
Context: ...E_URL environment variable. - **fps`**: Number of video frames per second to ca...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~77-~77: Possible missing comma found.
Context: ...the model. Lower values reduce API costs but may miss fast-moving content. Default i...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~78-~78: Loose punctuation mark.
Context: ...t is 1 fps. - frame_buffer_seconds: How many seconds of video to buffer. To...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~79-~79: Loose punctuation mark.
Context: .... Default is 10 seconds. - **client**: Optional pre-configured AsyncOpenAI` c...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~98-~98: Loose punctuation mark.
Context: ...g events: - LLMResponseChunkEvent: Emitted for each text chunk in the stre...

(UNLIKELY_OPENING_PUNCTUATION)


[grammar] ~114-~114: It appears that a hyphen is missing in the plural noun “to-dos”?
Context: ...ing support is not yet implemented (see TODOs in code). ## Troubleshooting - **No v...

(TO_DO_HYPHEN)


[uncategorized] ~119-~119: Use a comma before “and” if it connects two independent clauses (unless they are closely connected and short).
Context: ... and OPENAI_BASE_URL are set correctly and the model name is valid. - **High laten...

(COMMA_COMPOUND_SENTENCE_2)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (12)
plugins/openai/examples/qwen_vl_example/pyproject.toml (1)

1-21: LGTM! Appropriate dependency structure for the example project.

The pyproject.toml correctly includes the core vision-agents framework and relevant plugins (OpenAI for VLM, GetStream for Edge, Deepgram for STT, ElevenLabs for TTS), with all workspace mappings properly configured. The Python requirement (>=3.10) is consistent with the broader project.

plugins/openai/examples/qwen_vl_example/README.md (1)

74-79: Fix grammar and punctuation issues flagged by static analysis.

Multiple punctuation and grammar corrections are needed to improve documentation clarity:

  • Line 77: Add a comma before "but": "...reduce API costs**,** but may miss fast-moving content. Default is 1 fps."
  • Line 114: Hyphenate "to-dos": change TODOs to to-dos.
  • Line 119: Add comma in compound sentence: "...are set correctly and the model name is valid." should be "...are set correctly**,** and the model name is valid."

The repeated "Loose punctuation" warnings (lines 56–79, 98) appear to relate to the Markdown list formatting with backticks and dashes; confirm these are false positives by ensuring your Markdown renders correctly.

Also applies to: 98-98, 114-114, 119-119

plugins/openrouter/tests/test_openrouter_llm.py (1)

66-67: Good refactoring to use the public API.

The migration from direct _conversation attribute assignment to the public set_conversation() method properly encapsulates the conversation setup and aligns with the new LLM interface.

plugins/gemini/tests/test_gemini_llm.py (2)

32-33: Good refactoring to use the public API.

The migration from direct _conversation attribute assignment to the public set_conversation() method properly encapsulates the conversation setup and aligns with the new LLM interface.


84-85: Consistent API usage.

Correctly applies the same public set_conversation() pattern to the locally instantiated LLM in this test.

agents-core/vision_agents/core/llm/llm.py (1)

61-63: LGTM! Conversation management API is well designed.

The new set_conversation method provides a clean public API for conversation wiring. The instruction parsing with Instructions type is properly typed, and the separation of concerns between conversation management and instruction handling is clear.

Also applies to: 194-204, 206-210

plugins/openai/vision_agents/plugins/openai/__init__.py (1)

4-7: Verify ChatCompletionsVLM export is intentionally omitted.

ChatCompletionsVLM is imported on line 5 but not included in __all__ on line 7, meaning users cannot import it via from vision_agents.plugins.openai import ChatCompletionsVLM. If this plugin should be publicly available, add it to the exports list.

If the VLM should be exported, apply this diff:

-__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM"]
+__all__ = ["Realtime", "LLM", "TTS", "ChatCompletionsLLM", "ChatCompletionsVLM"]
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)

39-64: LGTM! Clean plugin initialization.

The constructor properly initializes the AsyncOpenAI client with sensible defaults and registers events. The flexibility to pass either credentials or a pre-configured client is a good design.


65-160: LGTM! Streaming implementation with proper event emission.

The streaming response handling is well structured. The defensive check for an uninitialized conversation prevents errors, and the event emission pattern (chunk events for deltas, a completion event at finish) aligns with the framework's event-driven architecture. Error handling properly emits LLMErrorEvent on failures.
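For orientation, the streaming loop reduces to roughly the following (simplified sketch; event emission and error handling elided):

```python
stream = await client.chat.completions.create(
    model=model, messages=messages, stream=True
)
text = ""
async for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    text += delta
    # emit LLMResponseChunkEvent(delta) here
# emit LLMResponseCompletedEvent(text) after the stream is exhausted
```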


162-180: LGTM! Message construction correctly handles conversation context.

The method properly constructs the messages array with system instructions and conversation history. The pattern of sending a system message when participant is None (lines 173-175) is a reasonable way to distinguish direct LLM calls from participant-triggered responses.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)

267-284: LGTM! Frame encoding and message construction is well designed.

The frame-to-JPEG encoding and base64 conversion properly prepares video frames for the model API. Logging the frame count on line 277 aids debugging, and the message structure with image_url content type aligns with OpenAI's multimodal API format.
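The message shape in question is OpenAI's standard multimodal format; a sketch of wrapping the JPEG bytes produced by frame_to_jpeg_bytes:

```python
import base64

def image_part(jpeg_bytes: bytes) -> dict:
    """Wrap JPEG bytes (e.g. from frame_to_jpeg_bytes) as an image_url content part."""
    b64 = base64.b64encode(jpeg_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    }

user_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the scene."},
        image_part(b"..."),  # placeholder JPEG bytes
    ],
}
```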

plugins/openai/tests/test_chat_completions.py (1)

196-256: LGTM! Well-designed test stubs.

The AsyncStreamStub and VideoStreamTrackStub classes provide clean mocks for streaming responses and video frame generation. The use of numpy for random frame data and proper timing metadata (pts, time_base) makes the video stub realistic.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f3d5b11 and 1bc042e.

📒 Files selected for processing (3)
  • agents-core/vision_agents/core/agents/agents.py (1 hunks)
  • plugins/openai/tests/test_chat_completions.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/openai/vision_agents/plugins/openai/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
agents-core/vision_agents/core/agents/agents.py (5)
plugins/anthropic/tests/test_anthropic_llm.py (1)
  • llm (18-22)
plugins/aws/tests/test_aws.py (1)
  • llm (35-39)
plugins/openrouter/tests/test_openrouter_llm.py (1)
  • llm (61-68)
plugins/gemini/tests/test_gemini_llm.py (1)
  • llm (31-34)
agents-core/vision_agents/core/llm/llm.py (1)
  • set_conversation (194-204)
plugins/openai/tests/test_chat_completions.py (7)
agents-core/vision_agents/core/agents/conversation.py (1)
  • InMemoryConversation (230-237)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • ChatCompletionsLLM (23-180)
  • simple_response (65-160)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (3)
  • ChatCompletionsVLM (31-284)
  • watch_video_track (187-224)
  • simple_response (90-185)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
agents-core/vision_agents/core/llm/llm.py (1)
  • set_conversation (194-204)
agents-core/vision_agents/core/events/manager.py (1)
  • wait (474-487)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (7)
agents-core/vision_agents/core/agents/agents.py (1)

540-541: Clean refactor to public API.

The change wires the LLM into the conversation context using the new public set_conversation method, which is a cleaner approach than internal mutation. The placement immediately after conversation creation is logical and the comment clearly explains the intent.

As a minor defensive check, you may want to verify that edge.create_conversation never returns None, since set_conversation expects a non-None Conversation instance. If it can return None, consider adding a guard:

self.conversation = await create_conversation_coro
if self.conversation is not None:
    self.llm.set_conversation(self.conversation)
plugins/openai/tests/test_chat_completions.py (6)

36-47: Past review comment addressed correctly.

Both fixtures now call set_conversation, which ensures the full initialization logic (wiring instructions/parsed state) is exercised. This aligns with the real code path used by agents.


104-124: Error handling test looks solid.

The test properly verifies that model failures emit an LLMErrorEvent with the correct error message, ensuring the error path is covered.


127-171: LLM success test validates streaming and event emission correctly.

The test comprehensively checks that:

  • Streaming chunks are accumulated into the final response text
  • Events are emitted in the expected order (2 chunks + 1 completed)
  • Messages sent to the model include conversation history and system prompt

173-193: Error handling path covered.

The failure test ensures LLMErrorEvent is emitted when the underlying client raises an exception.


196-231: Mock stream implementation is well-structured.

AsyncStreamStub correctly implements the async iterator protocol and generates properly formatted ChatCompletionChunk objects, making it suitable for testing streaming behavior.
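A minimal async-iterator stub of this shape (sketch, not the test's exact code):

```python
class AsyncStreamStub:
    """Replay a fixed sequence of chunks through the async iterator protocol."""

    def __init__(self, chunks):
        self._it = iter(chunks)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._it)
        except StopIteration:
            raise StopAsyncIteration
```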


233-256: Video stub generates synthetic frames appropriately.

The stub creates random VideoFrame objects with timing metadata, providing sufficient fidelity for testing video track consumption. The minimal sleep in recv prevents tight loops while keeping tests fast.
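Generating such frames takes only a few lines with av and numpy (sketch; pts/time_base handling simplified, not the test's exact code):

```python
import asyncio
import av
import numpy as np

class VideoStreamTrackStub:
    """Yield random RGB frames with monotonically increasing pts."""

    def __init__(self, width: int = 640, height: int = 480):
        self._w, self._h, self._pts = width, height, 0

    async def recv(self) -> av.VideoFrame:
        await asyncio.sleep(0.001)  # keep the loop from spinning hot
        rgb = np.random.randint(0, 256, (self._h, self._w, 3), dtype=np.uint8)
        frame = av.VideoFrame.from_ndarray(rgb, format="rgb24")
        frame.pts = self._pts
        self._pts += 1
        return frame
```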

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (3)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)

28-28: Track or remove the TODO comment.

The TODO suggests updating documentation for the legacy openai.LLM API. Consider creating an issue to track this task or addressing it directly if documentation updates are straightforward.

Would you like me to open an issue to track this documentation update?


87-88: Consider making frame dimensions configurable.

The frame width and height are hardcoded to 800x600. While this provides sensible defaults, different models or use cases might benefit from different resolutions.

If needed, you could add optional frame_width and frame_height parameters to the constructor:

     def __init__(
         self,
         model: str,
         api_key: Optional[str] = None,
         base_url: Optional[str] = None,
         fps: int = 1,
         frame_buffer_seconds: int = 10,
+        frame_width: int = 800,
+        frame_height: int = 600,
         client: Optional[AsyncOpenAI] = None,
     ):
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1)

64-166: Consider extracting shared streaming logic.

The simple_response method in this file is nearly identical to the one in chat_completions_vlm.py (lines 90-192). The primary difference is that the VLM version doesn't have video-specific logic in this method. This duplication could be reduced by extracting the streaming loop logic into a shared helper method in a base class or utility module.

This is acceptable for the initial implementation, but future refactoring could improve maintainability.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1bc042e and 7d33615.

📒 Files selected for processing (2)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_jpeg_bytes (32-63)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • simple_response (64-166)
  • _build_model_request (168-178)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
plugins/openai/tests/test_chat_completions.py (1)
  • llm (37-40)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLM (49-418)
  • LLMResponseEvent (38-42)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-44)
agents-core/vision_agents/core/events/manager.py (1)
  • register_events_from_module (219-256)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
  • simple_response (90-192)
  • _build_model_request (245-283)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy
  • GitHub Check: unit / Test "not integration"
🔇 Additional comments (4)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1)

173-177: Review the finish_reason warning condition.

The code warns when finish_reason is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.

plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (3)

22-36: Well-documented plugin class.

The class docstring clearly describes the purpose and features of the ChatCompletionsLLM plugin, with a helpful usage example.


147-151: Review the finish_reason warning condition.

The code warns when finish_reason is "length" or "content". While "length" typically indicates truncation (a legitimate warning), "content" may indicate content filtering that successfully completed but with filtered output. Consider whether both deserve the same warning level, or if "content" should be handled differently.


168-178: Clean request builder implementation.

The _build_model_request method correctly assembles the prompt from instructions and conversation history. The implementation is appropriately simpler than the VLM variant, which also includes video frame handling.

@dangusev dangusev changed the title Baseten integration Add openai.chat_completions package to support OSS models Nov 10, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
plugins/openai/README.md (2)

20-24: Clarify the distinction between Chat Completions and Responses APIs.

The section describes "gpt-4.1" as using the Responses API "under the hood" (line 22), then immediately suggests Chat Completions as an alternative for "legacy" usage (line 24). This framing may confuse readers about which API to use and when—particularly since Chat Completions is still actively maintained, not legacy.

Consider restructuring to clarify:

  • When to use openai.LLM() (Responses API)
  • When to use openai.ChatCompletionsLLM() (Chat Completions API)
  • Why one might be preferred over the other (e.g., model availability, cost, features)

18-40: Verify the model name "gpt-4.1" is current and available.

The example uses openai.LLM("gpt-4.1") as the model identifier, but this does not match any current OpenAI model naming convention. Recent models use names like gpt-4o, gpt-4-turbo, etc.

Update the example to use a valid, currently available model name so it can be executed successfully.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7d33615 and 4508013.

📒 Files selected for processing (3)
  • plugins/openai/README.md (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (1 hunks)
  • plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (6)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLMResponseEvent (38-42)
  • VideoLLM (450-471)
agents-core/vision_agents/core/utils/video_forwarder.py (2)
  • VideoForwarder (24-147)
  • add_frame_handler (48-74)
agents-core/vision_agents/core/utils/video_utils.py (1)
  • frame_to_jpeg_bytes (32-63)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (2)
  • simple_response (63-165)
  • _build_model_request (167-177)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_llm.py (7)
plugins/openai/tests/test_chat_completions.py (1)
  • llm (37-40)
agents-core/vision_agents/core/llm/events.py (2)
  • LLMResponseChunkEvent (87-102)
  • LLMResponseCompletedEvent (106-112)
agents-core/vision_agents/core/llm/llm.py (2)
  • LLM (49-418)
  • LLMResponseEvent (38-42)
agents-core/vision_agents/core/processors/base_processor.py (1)
  • Processor (35-44)
agents-core/vision_agents/core/events/manager.py (1)
  • register_events_from_module (219-256)
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py (2)
  • simple_response (90-192)
  • _build_model_request (245-283)
plugins/openai/vision_agents/plugins/openai/events.py (1)
  • LLMErrorEvent (15-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit / Test "not integration"
  • GitHub Check: unit / Ruff & mypy

Comment on lines +59 to +86
### Chat Completions models
The `openai.ChatCompletionsLLM` and `openai.ChatCompletionsVLM` classes provide APIs for text and vision models that use the [Chat Completions API](https://platform.openai.com/docs/api-reference/chat).

They are compatible with popular inference backends such as vLLM, TGI, and Ollama.

For example, you can use them to interact with the Qwen 3 VL visual model hosted on [Baseten](https://www.baseten.co/):

```python
from vision_agents.core import User, Agent
from vision_agents.plugins import deepgram, getstream, elevenlabs, vogent, openai

# Instantiate the visual model wrapper
llm = openai.ChatCompletionsVLM(model="qwen3vl")

# Create an agent with video understanding capabilities
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Video Assistant", id="agent"),
    instructions="You're a helpful video AI assistant. Analyze the video frames and respond to user questions about what you see.",
    llm=llm,
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection(),
    processors=[],
)
```

For full code, see [examples/qwen_vl_example](examples/qwen_vl_example/README.md).

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the example directory path exists.

Line 86 references [examples/qwen_vl_example](examples/qwen_vl_example/README.md) as the location of the full Qwen3-VL example code. Confirm that this path and documentation file exist in the repository; otherwise, the link will be broken and users will not find the referenced example.


🏁 Script executed:

#!/bin/bash
# Verify that the Qwen3-VL example directory and README exist
fd -t d -t f "qwen_vl_example" --max-depth 2
fd -t f "README.md" examples/

Length of output: 867


Fix the broken example link or create the missing directory.

The path examples/qwen_vl_example/README.md referenced on line 86 does not exist in the repository. Either create the Qwen3-VL example directory with documentation, or update the link to point to an existing example file. The broken link will lead users to a 404 when they try to access the full code reference.

🤖 Prompt for AI Agents
plugins/openai/README.md lines 59-86: the README links to
examples/qwen_vl_example/README.md which doesn't exist; either create that
directory and add the example README (and any referenced assets/code) matching
the described Qwen3-VL usage, or update the link to point to an existing example
(e.g., correct path to an equivalent example) and ensure the filename and path
are accurate; commit the new directory or the corrected link so the reference no
longer 404s.

Comment on lines +220 to +231
            self._video_forwarder = VideoForwarder(
                cast(VideoStreamTrack, track),
                max_buffer=10,
                fps=1.0,  # Low FPS for VLM
                name=f"{PLUGIN_NAME}_forwarder",
            )
            self._video_forwarder.start()

            # Start buffering video frames
            self._video_forwarder.add_frame_handler(
                self._frame_buffer.append, fps=self._fps
            )

⚠️ Potential issue | 🔴 Critical

Fix VideoForwarder fps mismatch.
Line 224 hardcodes the forwarder to fps=1.0, but the handler added on Line 230 passes fps=self._fps. The VideoForwarder.add_frame_handler implementation rejects handler FPS values greater than the forwarder FPS, so any valid configuration where self._fps > 1 immediately raises ValueError and breaks video streaming. Please align the forwarder’s FPS with the configured handler FPS (or allow unlimited FPS) so the advertised constructor parameter works.

-            self._video_forwarder = VideoForwarder(
-                cast(VideoStreamTrack, track),
-                max_buffer=10,
-                fps=1.0,  # Low FPS for VLM
-                name=f"{PLUGIN_NAME}_forwarder",
-            )
+            self._video_forwarder = VideoForwarder(
+                cast(VideoStreamTrack, track),
+                max_buffer=10,
+                fps=float(self._fps) if self._fps > 0 else None,
+                name=f"{PLUGIN_NAME}_forwarder",
+            )
🤖 Prompt for AI Agents
In
plugins/openai/vision_agents/plugins/openai/chat_completions/chat_completions_vlm.py
around lines 220 to 231, the VideoForwarder is constructed with fps=1.0 while
the frame handler is registered with fps=self._fps, which causes ValueError when
self._fps > 1; change the forwarder construction to use fps=self._fps (or pass
None/unlimited if VideoForwarder supports it) so the forwarder FPS equals the
handler FPS, and ensure any validation logic remains consistent (i.e., only
register handlers with fps <= forwarder fps).
