feat: implemented heygen avatars #126

d3xvn · 2025-10-27T14:46:41Z

Summary by CodeRabbit

New Features
- HeyGen Avatar plugin: realistic avatars with automatic lip‑sync, WebRTC streaming, video quality options, AvatarPublisher, RTC/session manager, and runtime video/audio tracks.
Behavior Changes
- Agents: improved audio/video publisher handling and media track selection priority for avatar output.
Documentation
- Full plugin docs, quick‑start examples, and troubleshooting guides added.
Tests
- Unit tests added for HeyGen components.
Chores
- HeyGen plugin added to workspace and optional-dependency lists.

Note

Introduces a HeyGen avatar plugin (AvatarPublisher with WebRTC session/RTC/video track), adds examples/tests and workspace wiring, and updates Agent to attach processors and publish audio via audio publishers.

Plugins/HeyGen:
- Add new plugin with AvatarPublisher, HeyGenRTCManager, HeyGenSession, HeyGenVideoTrack, and VideoQuality in plugins/heygen/vision_agents/plugins/heygen/*.
- Include docs and examples (plugins/heygen/README.md, plugins/heygen/example/*) and unit tests (plugins/heygen/tests/*).
- Add package config (plugins/heygen/pyproject.toml, example pyproject.toml).
Agents Core:
- In vision_agents/core/agents/agents.py: attach processors via _attach_agent; treat audio_publishers as a reason to publish_audio; in _prepare_rtc, initialize audio track from first audio_publisher when present.
Ecosystem/Config:
- Wire plugin into optional deps/extras (agents-core/pyproject.toml), workspace members (pyproject.toml, uv.lock), and AWS example lock.
Minor Fixes:
- plugins/gemini/.../gemini_realtime.py: normalize MIME string.
- plugins/openai/.../openai_realtime.py: remove unused imports and minor event handling cleanup.

Written by Cursor Bugbot for commit 12cad15. This will update automatically on new commits.

coderabbitai · 2025-10-27T14:47:09Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a HeyGen Avatar plugin (session, RTC manager, video track, AvatarPublisher), examples, tests, and workspace/manifest updates; modifies Agent init to call processors' _attach_agent and prefer publisher-provided audio/video tracks when preparing RTC.
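The "prefer publisher-provided tracks" rule from the walkthrough can be pictured as a small priority function. This is a minimal sketch under stated assumptions: `pick_audio_track`, `FakePublisher`, and `default_factory` are hypothetical names used only for illustration, and the real logic in `agents.py` is not shown in this thread. Only the `publish_audio_track()` method name comes from the PR.

```python
from typing import Any, Callable, Sequence


class FakePublisher:
    """Hypothetical stand-in for a processor exposing publish_audio_track()."""

    def __init__(self, track: Any) -> None:
        self._track = track

    def publish_audio_track(self) -> Any:
        return self._track


def pick_audio_track(
    audio_publishers: Sequence[Any],
    default_factory: Callable[[], Any],
) -> Any:
    """Return the first publisher's track, else build the default track."""
    if audio_publishers:
        # The first audio publisher wins; later publishers are ignored.
        return audio_publishers[0].publish_audio_track()
    # No publisher present: fall back to whatever the agent would create itself.
    return default_factory()
```

With an avatar publisher in the list, its track is chosen; with an empty list, the fallback factory is called instead.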

Changes

| Cohort / File(s) | Summary |
|---|---|
| Workspace & Config `agents-core/pyproject.toml`, `pyproject.toml`, `plugins/heygen/pyproject.toml`, `plugins/heygen/example/pyproject.toml` | Register HeyGen plugin in workspace; add `heygen` optional-dependency and include `vision-agents-plugins-heygen`; add plugin and example project configuration and dependencies. |
| Agent Core `agents-core/vision_agents/core/agents/agents.py` | Call processors' `_attach_agent(self)` during Agent.init; expand publish_audio to consider `audio_publishers`; prefer audio/video tracks from publishers when preparing RTC and log sources. |
| HeyGen Session & RTC `plugins/heygen/vision_agents/plugins/heygen/heygen_session.py`, `.../heygen_rtc_manager.py` | New HeyGenSession for HTTP session lifecycle (create/start/send_task/stop/close); HeyGenRTCManager manages SDP/ICE, RTCPeerConnection lifecycle, track callbacks, send_text, is_connected, and close. |
| HeyGen Media & Publisher `plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py`, `.../heygen_avatar_publisher.py`, `.../__init__.py`, `.../heygen_types.py` | Add HeyGenVideoTrack (frame queue, resizing, recv/stop), AvatarPublisher (attach to agent, subscribe to LLM events, buffer/dedupe text, forward audio/video, lifecycle/close/state), exports, and VideoQuality enum. |
| Examples & Example Config `plugins/heygen/example/avatar_example.py`, `plugins/heygen/example/avatar_realtime_example.py`, `plugins/heygen/example/README.md` | New example scripts demonstrating streaming and realtime avatar usage, plus example README and example pyproject. |
| Docs & README `plugins/heygen/README.md` | Add plugin README documenting features, installation, configuration, API usage, and troubleshooting. |
| Tests `plugins/heygen/tests/test_heygen_plugin.py` | Add unit tests for HeyGenSession, HeyGenVideoTrack, HeyGenRTCManager, and AvatarPublisher using mocks. |
| Submodule metadata `aiortc` | Removed a single submodule gitlink line (metadata-only change). |

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant AvatarPublisher
    participant LLM
    participant HeyGenRTCMgr
    participant HeyGenAPI
    participant VideoCall

    Agent->>AvatarPublisher: _attach_agent(agent)
    AvatarPublisher->>AvatarPublisher: _subscribe_to_text_events()
    Agent->>LLM: produce text (streaming/realtime)
    LLM-->>AvatarPublisher: text_chunk / completion / realtime_transcript
    AvatarPublisher->>AvatarPublisher: buffer & dedupe text
    AvatarPublisher->>HeyGenRTCMgr: send_text(text)
    HeyGenRTCMgr->>HeyGenAPI: HTTP /streaming.* (SDP/ICE/task)
    HeyGenAPI-->>HeyGenRTCMgr: media tracks (video/audio)
    HeyGenRTCMgr->>AvatarPublisher: on_video_track / on_audio_track
    AvatarPublisher->>HeyGenVideoTrack: start_receiving(track)
    AvatarPublisher->>VideoCall: publish_video_track()/publish_audio_track()
    VideoCall-->>User: deliver avatar media

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Focus areas:
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py — event subscription, buffering/deduplication, audio routing, lifecycle.
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py — ICE parsing, SDP/peer-connection flow, timeouts, callbacks.
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py — frame resizing, queue behavior, recv timing.
- agents-core/vision_agents/core/agents/agents.py — verify _attach_agent semantics and audio/video-track selection/fallbacks.

Possibly related PRs

Simplify TTS plugin and audio utils #123 — overlapping changes to Agent RTC/audio-track initialization and publisher wiring.
New conversation API #102 — related edits to Agent setup and processor attachment flow.
WIP - Vogent + New Smart TURN + Audio utils usage #128 — related Agent initialization and plugin/workspace registration changes.

Suggested labels

core-agents, examples, tests

Suggested reviewers

yarikdevcom

Poem

The avatar learns my voice like a small, black lesson,
each syllable hammered into a plaster skull of glass,
I feed it sentences and watch the mouth stitch shut and open,
the room swells with rehearsed breaths, a synthetic tide,
and somewhere, a private face remembers how to be human.

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately summarizes the main change: implementing HeyGen avatars as a new plugin feature with WebRTC support. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f5e203 and 12cad15.

📒 Files selected for processing (1)

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

plugins/openai/vision_agents/plugins/openai/openai_realtime.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Cursor Bugbot
GitHub Check: unit / Ruff & mypy
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Ruff & mypy

- Removed obsolete heygen_audio_track.py (from old audio-based approach)
- Removed unused _audio_sender field and transceiver logic
- Removed unused _original_audio_write field
- Simplified audio track management
- Moved imports to top of file
- Updated docstrings to reflect text-based lip-sync approach

Fixed duplicate text sending issue:
- Added deduplication tracking with _sent_texts set
- Added minimum length filter (>15 chars) to prevent tiny fragments
- Simplified event handling to avoid duplicate subscriptions
- Proper buffer management between chunk and complete events

Known limitation: ~3-4 second audio delay is inherent to HeyGen platform

- Add processor._attach_agent() lifecycle hook to Agent.__init__
- Rename HeyGen set_agent() -> _attach_agent() for consistency with LLM
- Remove manual agent attachment from examples and docs
- HeyGen now works like YOLO - just add to processors list

Examples are now much cleaner:

    agent = Agent(processors=[heygen.AvatarPublisher()])
    # That's it! No manual wiring needed.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

plugins/heygen/example/avatar_realtime_example.py (1)
55-57: Consider using agent.simple_response() instead of agent.llm.simple_response().

Calling agent.llm.simple_response() directly bypasses the agent's wrapper method, which may skip tracing and logging functionality. The agent provides a simple_response() method that forwards to the LLM while adding instrumentation.

Apply this diff to use the agent's method:
-        await agent.llm.simple_response(
+        await agent.simple_response(
             text="Hello! I'm your AI assistant. How can I help you today?"
         )

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 5059336 and f03c81d.

⛔ Files ignored due to path filters (2)

plugins/aws/example/uv.lock is excluded by !**/*.lock
uv.lock is excluded by !**/*.lock

📒 Files selected for processing (16)

agents-core/pyproject.toml (2 hunks)
agents-core/vision_agents/core/agents/agents.py (3 hunks)
aiortc (0 hunks)
plugins/heygen/README.md (1 hunks)
plugins/heygen/example/README.md (1 hunks)
plugins/heygen/example/avatar_example.py (1 hunks)
plugins/heygen/example/avatar_realtime_example.py (1 hunks)
plugins/heygen/example/pyproject.toml (1 hunks)
plugins/heygen/pyproject.toml (1 hunks)
plugins/heygen/tests/test_heygen_plugin.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/__init__.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1 hunks)
pyproject.toml (2 hunks)

💤 Files with no reviewable changes (1)

aiortc

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

plugins/heygen/example/avatar_realtime_example.py
agents-core/vision_agents/core/agents/agents.py
plugins/heygen/vision_agents/plugins/heygen/__init__.py
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
plugins/heygen/example/avatar_example.py
plugins/heygen/tests/test_heygen_plugin.py
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py

🧬 Code graph analysis (9)

plugins/heygen/example/avatar_realtime_example.py (3)

agents-core/vision_agents/core/edge/types.py (1)

User (22-25)

agents-core/vision_agents/core/agents/agents.py (2)

Agent (107-1340)

finish (534-565)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

AvatarPublisher (19-391)

agents-core/vision_agents/core/agents/agents.py (2)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

_attach_agent (116-128)

publish_audio_track (108-114)

agents-core/vision_agents/core/processors/base_processor.py (1)

publish_audio_track (84-85)

plugins/heygen/vision_agents/plugins/heygen/__init__.py (1)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

AvatarPublisher (19-391)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)

agents-core/vision_agents/core/processors/base_processor.py (3)

AudioVideoProcessor (111-140)

VideoPublisherMixin (78-80)

AudioPublisherMixin (83-85)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7)

HeyGenRTCManager (18-260)

set_video_callback (216-222)

set_audio_callback (224-230)

connect (55-138)

send_text (232-242)

is_connected (245-247)

close (249-260)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4)

HeyGenVideoTrack (14-180)

start_receiving (48-60)

recv (134-169)

stop (171-180)

agents-core/vision_agents/core/llm/events.py (3)

LLMResponseChunkEvent (90-105)

LLMResponseCompletedEvent (109-115)

RealtimeAgentSpeechTranscriptionEvent (151-156)

agents-core/vision_agents/core/agents/agents.py (3)

subscribe (286-298)

recv (946-947)

close (567-639)

agents-core/vision_agents/core/edge/types.py (1)

write (52-52)

plugins/heygen/example/avatar_example.py (3)

agents-core/vision_agents/core/edge/types.py (1)

User (22-25)

agents-core/vision_agents/core/agents/agents.py (1)

Agent (107-1340)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

AvatarPublisher (19-391)

plugins/heygen/tests/test_heygen_plugin.py (4)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (3)

AvatarPublisher (19-391)

publish_video_track (340-354)

state (356-368)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)

HeyGenVideoTrack (14-180)

stop (171-180)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)

HeyGenRTCManager (18-260)

is_connected (245-247)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1)

HeyGenSession (9-232)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (3)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

close (370-391)

agents-core/vision_agents/core/agents/agents.py (1)

close (567-639)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1)

close (249-260)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5)

HeyGenSession (9-232)

create_session (44-84)

start_session (86-135)

send_task (137-187)

close (222-232)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)

close (370-391)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)

agents-core/vision_agents/core/utils/queue.py (2)

LatestNQueue (6-28)

put_latest_nowait (22-28)

agents-core/vision_agents/core/agents/agents.py (1)

recv (946-947)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: unit / Ruff & mypy
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Ruff & mypy

🔇 Additional comments (15)

plugins/heygen/vision_agents/plugins/heygen/__init__.py (1)

1-12: LGTM!

Clean package initialization with proper docstring and clear public API surface. The single export pattern is appropriate for this plugin entry point.

pyproject.toml (2)

24-24: LGTM!

Proper workspace source registration following the established pattern for other plugins.

54-54: LGTM!

Correct workspace member registration for the new HeyGen plugin package.

plugins/heygen/pyproject.toml (1)

1-41: LGTM!

Well-structured plugin package configuration with appropriate dependencies for WebRTC avatar streaming. The VCS versioning configuration correctly searches parent directories for the git repository root in this monorepo setup.

plugins/heygen/example/pyproject.toml (1)

1-21: LGTM!

Appropriate example package configuration with workspace dependencies for local development and testing. The inclusion of python-dotenv and multiple plugin dependencies aligns well with the documented example use cases.

agents-core/pyproject.toml (2)

45-45: LGTM!

Proper optional dependency group registration following the established pattern for plugin integration.

61-61: LGTM!

Correct addition to the all-plugins group, maintaining alphabetical ordering.

plugins/heygen/README.md (1)

1-186: LGTM!

Comprehensive and well-structured documentation. The code examples demonstrate proper usage patterns, configuration options are clearly documented, and the troubleshooting section addresses common issues. The documentation aligns well with the plugin's implementation.

plugins/heygen/example/README.md (1)

1-188: LGTM!

Excellent example documentation that clearly distinguishes between standard and realtime LLM modes. The detailed flow explanations and proper coverage of API key requirements make this helpful for developers. The note about audio handling differences between modes accurately reflects the implementation.

plugins/heygen/example/avatar_realtime_example.py (3)

1-16: LGTM!

Proper imports and environment setup. Loading dotenv before agent creation ensures API keys are available.

18-44: LGTM!

Well-structured function with proper docstring following Google style guide. The agent configuration correctly uses Realtime LLM without separate STT/TTS components, and the AvatarPublisher is appropriately configured.

63-65: LGTM!

Standard and correct main entry point pattern for async code.

agents-core/vision_agents/core/agents/agents.py (1)

218-222: Processor attach hook fits plugin needs.

Connecting processors that expose _attach_agent ensures publishers like HeyGen can subscribe to LLM events the moment the agent spins up. Looks solid.
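As a rough illustration of the attach hook, the sketch below shows a toy agent calling `_attach_agent` on every processor that defines it and skipping the rest. `ToyAgent`, `RecordingPublisher`, and `PlainProcessor` are hypothetical names invented for this example; only the `_attach_agent` hook name comes from the PR, and the real `Agent.__init__` may differ in detail.

```python
from typing import Any, List


class ToyAgent:
    """Toy agent: only the processor-attachment step is modelled."""

    def __init__(self, processors: List[Any]) -> None:
        self.processors = processors
        for processor in self.processors:
            # Look the hook up dynamically so processors without it still work.
            attach = getattr(processor, "_attach_agent", None)
            if callable(attach):
                attach(self)


class RecordingPublisher:
    """Hypothetical publisher that remembers which agent attached it."""

    def __init__(self) -> None:
        self.agent = None

    def _attach_agent(self, agent: Any) -> None:
        self.agent = agent


class PlainProcessor:
    """Processor without the hook; the agent should skip it silently."""
```

Because the hook runs in the constructor, a publisher can subscribe to events before the agent joins a call, which is the behaviour the comment above approves of.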

plugins/heygen/example/avatar_example.py (1)

22-63: Example flow is clear.

The example stitches together the Edge, Gemini LLM, Deepgram STT, and the avatar publisher cleanly—handy for integrators to copy-paste and start experimenting.

plugins/heygen/tests/test_heygen_plugin.py (1)

94-123: Good coverage of publisher surface.

These tests assert the publisher exposes a video track and reports state without needing live network calls—nice guardrails for future regressions.

coderabbitai · 2025-11-04T09:38:00Z

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py

+        self._all_sent_texts: set = set()  # Track all sent texts to prevent duplicates
+
+        logger.info(
+            f"HeyGen AvatarPublisher initialized "
+            f"(avatar: {avatar_id}, quality: {quality}, resolution: {resolution})"
+        )
+
+    def publish_audio_track(self):
+        """Return the audio track for publishing HeyGen's audio.
+
+        This method is called by the Agent to get the audio track that will
+        be published to the call. HeyGen's audio will be forwarded to this track.
+        """
+        return self._audio_track
+
+    def _attach_agent(self, agent: Any) -> None:
+        """Attach the agent reference for event subscription.
+
+        This is called automatically by the Agent during initialization.
+
+        Args:
+            agent: The agent instance.
+        """
+        self._agent = agent
+        logger.info("Agent reference set for HeyGen avatar publisher")
+
+        # Subscribe to text events immediately when agent is set
+        self._subscribe_to_text_events()
+
+    async def _connect_to_heygen(self) -> None:
+        """Establish connection to HeyGen and start receiving video and audio."""
+        try:
+            # Set up video and audio callbacks before connecting
+            self.rtc_manager.set_video_callback(self._on_video_track)
+            self.rtc_manager.set_audio_callback(self._on_audio_track)
+
+            # Connect to HeyGen
+            await self.rtc_manager.connect()
+
+            self._connected = True
+            logger.info("Connected to HeyGen, avatar streaming active")
+
+        except Exception as e:
+            logger.error(f"Failed to connect to HeyGen: {e}")
+            self._connected = False
+            raise
+
+    def _subscribe_to_text_events(self) -> None:
+        """Subscribe to text output events from the LLM.
+
+        HeyGen requires text input (not audio) for proper lip-sync.
+        We listen to the LLM's text output and send it to HeyGen's task API.
+        """
+        try:
+            # Import the event types
+            from vision_agents.core.llm.events import (
+                LLMResponseChunkEvent,
+                LLMResponseCompletedEvent,
+                RealtimeAgentSpeechTranscriptionEvent,
+            )
+
+            # Get the LLM's event manager (events are emitted by the LLM, not the agent)
+            if hasattr(self, '_agent') and self._agent and hasattr(self._agent, 'llm'):
+                @self._agent.llm.events.subscribe
+                async def on_text_chunk(event: LLMResponseChunkEvent):
+                    """Handle streaming text chunks from the LLM."""
+                    logger.debug(f"HeyGen received text chunk: delta='{event.delta}'")
+                    if event.delta:
+                        await self._on_text_chunk(event.delta, event.item_id)
+
+                @self._agent.llm.events.subscribe
+                async def on_text_complete(event: LLMResponseCompletedEvent):
+                    """Handle end of LLM response - split into sentences and send each once."""
+                    if not self._text_buffer.strip():
+                        return
+
+                    # Split the complete response into sentences
+                    import re
+                    text = self._text_buffer.strip()
+                    # Split on sentence boundaries but keep the punctuation
+                    sentences = re.split(r'([.!?]+\s*)', text)
+                    # Recombine sentences with their punctuation
+                    full_sentences = []
+                    for i in range(0, len(sentences)-1, 2):
+                        if sentences[i].strip():
+                            sentence = (sentences[i] + sentences[i+1] if i+1 < len(sentences) else sentences[i]).strip()
+                            full_sentences.append(sentence)
+                    # Handle last part if no punctuation
+                    if sentences and sentences[-1].strip() and not any(sentences[-1].strip().endswith(p) for p in ['.', '!', '?']):
+                        full_sentences.append(sentences[-1].strip())
+
+                    # Send each sentence once if not already sent
+                    for sentence in full_sentences:
+                        if sentence and len(sentence) > 5:
+                            if sentence not in self._all_sent_texts:
+                                await self._send_text_to_heygen(sentence)
+                                self._all_sent_texts.add(sentence)
+                            else:
+                                logger.debug(f"Skipping duplicate: '{sentence[:30]}...'")
+
+                    # Reset for next response
+                    self._text_buffer = ""
+                    self._current_response_id = None
+


⚠️ Potential issue | 🟠 Major

Allow repeated sentences after each response.

self._all_sent_texts is never cleared once a sentence goes out, so any later LLM turn that reuses the same sentence gets silently dropped and the avatar never speaks it. That breaks normal conversation—think “Hi there!” uttered twice in a demo. Please scope the de‑duplication to the active response (e.g., clear the set when item_id changes or after completion) so only intra-response duplicates are suppressed.

Apply something like:

         if item_id != self._current_response_id:
             if self._text_buffer:
                 # Send any accumulated text from previous response
                 text_to_send = self._text_buffer.strip()
                 if text_to_send and text_to_send not in self._all_sent_texts:
                     await self._send_text_to_heygen(text_to_send)
                     self._all_sent_texts.add(text_to_send)
             self._text_buffer = ""
             self._current_response_id = item_id
+            self._all_sent_texts.clear()

…and optionally clear again once the completion handler finishes dispatching sentences.
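To make the intended behaviour concrete, here is a minimal, self-contained sketch of response-scoped de-duplication. `ResponseScopedDeduper` and its `dispatched` list are hypothetical; the list merely stands in for the call to `_send_text_to_heygen`, and the class is not part of the plugin.

```python
from typing import List, Optional, Set


class ResponseScopedDeduper:
    """Suppress repeats within one response, allow them across responses."""

    def __init__(self) -> None:
        self._current_response_id: Optional[str] = None
        self._sent: Set[str] = set()
        self.dispatched: List[str] = []  # stands in for _send_text_to_heygen

    def send(self, text: str, item_id: str) -> bool:
        """Dispatch text unless it was already sent for this item_id."""
        if item_id != self._current_response_id:
            # New response: forget what earlier responses already said.
            self._current_response_id = item_id
            self._sent.clear()
        if text in self._sent:
            return False
        self._sent.add(text)
        self.dispatched.append(text)
        return True
```

Sending "Hi there!" twice under the same `item_id` dispatches it once, while sending it again under a new `item_id` dispatches it a second time.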

Committable suggestion skipped: line range outside the PR's diff.

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py

cursor · 2025-11-04T22:01:55Z

plugins/heygen/example/avatar_example.py

+                avatar_id="default",  # Use your HeyGen avatar ID
+                quality=VideoQuality.HIGH,  # Video quality: VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH
+                resolution=(1920, 1080),  # Output resolution
+                mute_llm_audio=False,  # Not needed for streaming LLM


Bug: Hidden Mute LLM Audio Parameter Misleading Docs

The heygen.AvatarPublisher constructor receives a mute_llm_audio parameter in the example and its own docstring, but this parameter isn't explicitly defined. It's absorbed by **kwargs and ignored, which makes the example misleading, especially with the accompanying comment suggesting it's a functional setting.

Additional Locations (1)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py#L48-L49
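The failure mode described here is easy to reproduce in isolation. The sketch below contrasts a constructor that swallows unknown keywords through `**kwargs` with one that declares the flag explicitly. Both classes are hypothetical illustrations, not the actual `AvatarPublisher` signature, which is not shown in full in this thread.

```python
from typing import Any


class LenientPublisher:
    """Mimics the reported behaviour: unknown keywords vanish into **kwargs."""

    def __init__(self, avatar_id: str = "default", **kwargs: Any) -> None:
        self.avatar_id = avatar_id
        # Nothing ever reads these, so the caller's setting has no effect.
        self.ignored = dict(kwargs)


class StrictPublisher:
    """Declares the flag explicitly, so typos raise TypeError at call time."""

    def __init__(self, avatar_id: str = "default", mute_llm_audio: bool = False) -> None:
        self.avatar_id = avatar_id
        self.mute_llm_audio = mute_llm_audio
```

With the strict form, `StrictPublisher(mute_audio=True)` fails immediately instead of being silently ignored, so examples and docstrings cannot drift from the real parameter list unnoticed.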

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
306-329: Clear _all_sent_texts per response to allow repeated phrases across turns.

The deduplication set _all_sent_texts grows indefinitely and is never cleared, preventing legitimate repetition of phrases across different conversation turns. For example, saying "Hello!" in the first response will block "Hello!" in all subsequent responses forever.

As noted in a past review comment, the set should be scoped to the active response. Clear it when the item_id changes to limit deduplication to intra-response duplicates only.

Apply this diff:
         if item_id != self._current_response_id:
             if self._text_buffer:
                 # Send any accumulated text from previous response
                 text_to_send = self._text_buffer.strip()
                 if text_to_send and text_to_send not in self._all_sent_texts:
                     await self._send_text_to_heygen(text_to_send)
                     self._all_sent_texts.add(text_to_send)
             self._text_buffer = ""
             self._current_response_id = item_id
+            self._all_sent_texts.clear()
Also consider clearing the set in the on_text_complete handler (around line 212-213) after dispatching all sentences.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f03c81d and f7a2f37.

📒 Files selected for processing (10)

plugins/heygen/README.md (1 hunks)
plugins/heygen/example/README.md (1 hunks)
plugins/heygen/example/avatar_example.py (1 hunks)
plugins/heygen/example/avatar_realtime_example.py (1 hunks)
plugins/heygen/tests/test_heygen_plugin.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/__init__.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

plugins/heygen/example/README.md

🚧 Files skipped from review as they are similar to previous changes (2)

plugins/heygen/README.md
plugins/heygen/vision_agents/plugins/heygen/__init__.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

plugins/heygen/example/avatar_example.py
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py
plugins/heygen/example/avatar_realtime_example.py
plugins/heygen/tests/test_heygen_plugin.py
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py

🧬 Code graph analysis (7)

plugins/heygen/example/avatar_example.py (2)

agents-core/vision_agents/core/edge/types.py (1)

User (22-25)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

VideoQuality (6-11)

AvatarPublisher (29-401)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)

agents-core/vision_agents/core/utils/queue.py (2)

LatestNQueue (6-28)

put_latest_nowait (22-28)

agents-core/vision_agents/core/agents/agents.py (1)

recv (946-947)

plugins/heygen/example/avatar_realtime_example.py (4)

agents-core/vision_agents/core/edge/types.py (1)

User (22-25)

agents-core/vision_agents/core/agents/agents.py (2)

Agent (107-1340)

finish (534-565)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

VideoQuality (6-11)

AvatarPublisher (29-401)

plugins/heygen/example/avatar_example.py (1)

start_avatar_agent (12-63)

plugins/heygen/tests/test_heygen_plugin.py (4)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (4)

AvatarPublisher (29-401)

VideoQuality (6-11)

publish_video_track (350-364)

state (366-378)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)

HeyGenVideoTrack (14-187)

stop (178-187)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)

HeyGenRTCManager (21-269)

is_connected (254-256)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1)

HeyGenSession (11-238)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (4)

agents-core/vision_agents/core/processors/base_processor.py (3)

AudioVideoProcessor (111-140)

VideoPublisherMixin (78-80)

AudioPublisherMixin (83-85)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7)

HeyGenRTCManager (21-269)

set_video_callback (225-231)

set_audio_callback (233-239)

connect (62-147)

send_text (241-251)

is_connected (254-256)

close (258-269)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4)

HeyGenVideoTrack (14-187)

start_receiving (48-66)

recv (141-176)

stop (178-187)

agents-core/vision_agents/core/llm/events.py (3)

LLMResponseChunkEvent (90-105)

LLMResponseCompletedEvent (109-115)

RealtimeAgentSpeechTranscriptionEvent (151-156)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (2)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

VideoQuality (6-11)

close (380-401)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1)

close (258-269)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5)

HeyGenSession (11-238)

create_session (50-90)

start_session (92-141)

send_task (143-193)

close (228-238)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

VideoQuality (6-11)

close (380-401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Cursor Bugbot

🔇 Additional comments (29)

plugins/heygen/example/avatar_realtime_example.py (2)

1-17: LGTM!

The imports and logging configuration are well-structured. The use of load_dotenv() for API key management is appropriate for an example script.

56-58: The simple_response API is a valid, standardized method across all LLM implementations in the codebase, including Gemini Realtime. The method is documented in the base LLM class and all plugin implementations. Your usage is correct.

Likely an incorrect or invalid review comment.

plugins/heygen/tests/test_heygen_plugin.py (4)

9-28: LGTM!

The tests for HeyGenSession appropriately cover initialization with and without an API key, including proper error handling verification.

31-47: LGTM!

The video track tests correctly verify initialization parameters and the stop lifecycle method.

50-73: LGTM!

The RTC manager tests appropriately mock dependencies and verify connection state management.

76-122: LGTM!

The AvatarPublisher tests effectively use mocking to verify initialization, video track publishing, and state reporting without requiring real connections.

plugins/heygen/example/avatar_example.py (1)

1-10: LGTM!

The imports are clean and follow best practices.

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (5)

1-46: LGTM!

The class initialization is well-structured with appropriate placeholder handling and a small frame queue for low-latency streaming. Docstrings follow Google style guidelines.

48-66: LGTM! Reattachment logic correctly implemented.

The fix for allowing new source tracks is properly implemented. When a new track arrives, the existing receiver is cancelled and awaited (handling CancelledError), then a fresh task is created with the new source.
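
The cancel-then-restart pattern described above can be sketched as follows. This is a minimal illustration using plain `asyncio` queues as stand-ins for source tracks; the class `ReattachableReceiver` and its attributes are hypothetical and are not the plugin's actual implementation.

```python
import asyncio
from typing import Optional


class ReattachableReceiver:
    """Hypothetical stand-in for a track that can switch to a new source."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None
        self.received: list[str] = []

    async def _receive_loop(self, source: asyncio.Queue) -> None:
        # Pull items from the current source until the task is cancelled.
        while True:
            self.received.append(await source.get())

    async def _cancel_current(self) -> None:
        # Cancel the running receiver and wait for it to finish, so two
        # receive loops never run at the same time.
        if self._task is not None and not self._task.done():
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        self._task = None

    async def start_receiving(self, source: asyncio.Queue) -> None:
        await self._cancel_current()
        self._task = asyncio.create_task(self._receive_loop(source))

    async def stop(self) -> None:
        await self._cancel_current()


async def demo_reattach() -> list[str]:
    receiver = ReattachableReceiver()
    first: asyncio.Queue = asyncio.Queue()
    second: asyncio.Queue = asyncio.Queue()

    await receiver.start_receiving(first)
    await first.put("frame-from-first")
    await asyncio.sleep(0.01)  # give the receive loop time to run

    await receiver.start_receiving(second)  # simulated renegotiation
    await first.put("stale-frame")  # nobody reads the old source any more
    await second.put("frame-from-second")
    await asyncio.sleep(0.01)

    await receiver.stop()
    return receiver.received
```

Awaiting the cancelled task before creating the new one is the step that guarantees the old loop has fully exited.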

68-101: LGTM!

The frame receiving loop is well-structured with proper error handling, type checking, and automatic resizing when needed.

103-139: LGTM!

The resize logic correctly maintains aspect ratio with letterboxing and uses high-quality LANCZOS resampling. The fallback to the original frame on error prevents crashes.
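
As a rough illustration of the letterboxing arithmetic, the sketch below computes the scaled size and padding offsets only. The function name `letterbox_geometry` is hypothetical, and the resampling step itself (LANCZOS in the reviewed code) is left out.

```python
def letterbox_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Return (new_w, new_h, x_offset, y_offset) for an aspect-preserving fit.

    The source is scaled by the largest factor that still fits inside the
    destination, then centered; the remaining area is the letterbox padding.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")

    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))

    # Center the scaled image inside the destination canvas.
    x_offset = (dst_w - new_w) // 2
    y_offset = (dst_h - new_h) // 2
    return new_w, new_h, x_offset, y_offset
```

For example, a 640x480 source placed on a 1920x1080 canvas is scaled to 1440x1080 and padded with 240-pixel bars on the left and right.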

141-187: LGTM!

The recv and stop methods correctly implement the VideoStreamTrack interface with proper timestamp management and cleanup.

plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5)

50-90: LGTM!

The session creation method has proper error handling, informative error messages, and correctly stores session state.

92-141: LGTM!

The start_session method correctly validates prerequisites and handles the optional SDP answer parameter.

143-193: LGTM!

The send_task method appropriately uses non-fatal error handling, allowing the avatar to continue functioning even if individual task submissions fail.
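
A minimal sketch of non-fatal error handling, assuming the submission is an awaitable callable. The names `send_non_fatal` and `flaky_post` are hypothetical and stand in for whatever HTTP call the session actually makes.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("heygen_sketch")


async def send_non_fatal(post: Callable[[str], Awaitable[None]], text: str) -> bool:
    """Try to submit one task; log and report failure instead of raising."""
    try:
        await post(text)
    except Exception as exc:  # deliberately broad: one failed task must not stop the stream
        logger.warning("Task submission failed, continuing: %s", exc)
        return False
    return True


async def demo_send() -> list[bool]:
    async def flaky_post(text: str) -> None:
        # Simulated transport that fails for some inputs.
        if "fail" in text:
            raise ConnectionError("simulated network error")

    return [await send_non_fatal(flaky_post, t) for t in ("hello", "fail here", "still alive")]
```

The caller gets a boolean per submission, so a failed sentence can be logged or retried without tearing down the avatar session.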

195-226: LGTM!

The stop_session method gracefully handles missing sessions and uses appropriate non-fatal error handling for cleanup operations.

228-238: LGTM!

The close method properly cleans up all resources in the correct order.

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)

1-27: LGTM!

The VideoQuality enum is correctly placed before other imports to prevent circular dependencies, though a dedicated types module would be more maintainable long-term.

29-116: LGTM!

The initialization is well-structured. Creating the audio track immediately (lines 98-101) is necessary for the Agent to detect publishing capabilities during initialization, as explained in the comment.

118-138: LGTM!

The audio track publishing and agent attachment methods are straightforward and well-documented.

140-156: LGTM!

The connection method properly sets up callbacks and handles errors.

233-304: LGTM!

The media track handlers correctly differentiate between Realtime and standard LLMs, forwarding HeyGen audio only when appropriate.

350-401: LGTM!

The video track publishing, state reporting, and cleanup methods are properly implemented with correct resource management.

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (6)

21-60: LGTM!

The RTC manager initialization is well-structured with proper dependency injection and state management. Docstrings follow Google style guidelines.

62-147: LGTM!

The WebRTC connection setup follows the correct offer/answer flow with robust SDP format handling and appropriate timeouts. The comprehensive docstring clearly explains the HeyGen-specific connection sequence.

149-202: LGTM!

The ICE server parsing robustly handles multiple response formats and provides a sensible fallback. The comment about LiveKit embedding servers in SDP (lines 194-197) is helpful context.

204-223: LGTM!

The track handler correctly dispatches incoming media to registered callbacks with appropriate warnings when callbacks are missing.

225-251: LGTM!

The callback setters and text sending method provide a clean API for managing HeyGen interactions.

253-269: LGTM!

The connection status property and cleanup method are correctly implemented with proper resource management order.

coderabbitai · 2025-11-04T22:08:55Z

plugins/heygen/example/avatar_example.py

+        # Note: mute_llm_audio is not needed since streaming LLM doesn't produce audio
+        processors=[
+            heygen.AvatarPublisher(
+                avatar_id="default",  # Use your HeyGen avatar ID
+                quality=VideoQuality.HIGH,  # Video quality: VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH
+                resolution=(1920, 1080),  # Output resolution
+                mute_llm_audio=False,  # Not needed for streaming LLM
+            )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Remove or verify the mute_llm_audio parameter.

The AvatarPublisher.__init__ signature in the provided code does not include a mute_llm_audio parameter. The comment states it's not needed for streaming LLMs, yet it's explicitly set to False at line 49. Either this parameter doesn't exist (and should be removed), or the comment is incorrect.

Run the following script to verify the parameter:

🏁 Script executed:

#!/bin/bash
# Description: Check if mute_llm_audio parameter exists in AvatarPublisher

# Search for mute_llm_audio in HeyGen plugin files
rg -n "mute_llm_audio" --type=py

Length of output: 244

I'll search the repository directly for the AvatarPublisher class definition to verify the parameter:

🏁 Script executed:

#!/bin/bash
# Find AvatarPublisher class definition in heygen plugin

# Search for class definition
rg -n "class AvatarPublisher" --type=py -A 20

echo "---"
echo "Also check __init__ parameters:"
rg -n "def __init__" --type=py -B 2 -A 10 plugins/heygen

Length of output: 7445

I need to see the complete __init__ method implementation and docstring:

🏁 Script executed:

#!/bin/bash
# Get the full __init__ method implementation for AvatarPublisher
cat plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py | sed -n '53,120p'

Length of output: 2509

Remove the mute_llm_audio parameter from the AvatarPublisher instantiation.

The AvatarPublisher.__init__ method does not document or explicitly handle the mute_llm_audio parameter. While **kwargs allows it to be passed, the parameter is not processed and will be silently ignored. Line 43 states it's "not needed," yet line 49 explicitly sets it—this inconsistency should be resolved by removing it from the example.

        processors=[
            heygen.AvatarPublisher(
                avatar_id="default",
                quality=VideoQuality.HIGH,
                resolution=(1920, 1080),
            )
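
To illustrate why an unhandled keyword goes unnoticed, here is a small sketch of how `**kwargs` swallows unknown arguments. Both classes are hypothetical stand-ins and do not reproduce the real `AvatarPublisher` signature.

```python
class LenientPublisherSketch:
    """Hypothetical constructor that accepts, and then ignores, extra keywords."""

    def __init__(self, avatar_id="default", resolution=(1920, 1080), **kwargs):
        self.avatar_id = avatar_id
        self.resolution = resolution
        # kwargs is never read, so mute_llm_audio=False has no effect at all.


class StrictPublisherSketch:
    """Same constructor without **kwargs: unknown keywords raise TypeError."""

    def __init__(self, avatar_id="default", resolution=(1920, 1080)):
        self.avatar_id = avatar_id
        self.resolution = resolution


lenient = LenientPublisherSketch(mute_llm_audio=False)  # accepted silently

try:
    StrictPublisherSketch(mute_llm_audio=False)
    strict_rejected = False
except TypeError:
    strict_rejected = True  # the typo or stale option is surfaced immediately
```

The lenient form hides stale options in examples, which is why removing the argument from the example is the safer fix.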

🤖 Prompt for AI Agents

In plugins/heygen/example/avatar_example.py around lines 43 to 50, the AvatarPublisher example passes an undocumented mute_llm_audio keyword that is silently ignored; remove the mute_llm_audio=False argument and its inline comment from the Processor instantiation and update the surrounding comment to reflect that no mute parameter is required for streaming LLMs so the example only includes avatar_id, quality, and resolution.

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py

cursor · 2025-11-04T22:11:09Z

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py

+        output_frame.pts = pts
+        output_frame.time_base = time_base
+
+        return output_frame


Bug: Reference bug: last_frame mutated by output_frame copy

In recv, the code assigns self.last_frame to output_frame, creating a reference instead of a copy as the comment suggests. This means updating output_frame.pts and time_base directly modifies self.last_frame, which can lead to timestamp corruption and timing issues for the cached frame.
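
The aliasing problem can be reproduced with a small sketch. `FrameSketch` is a hypothetical stand-in for the real frame class, and `copy.copy` is used only to show the difference; how to copy the actual frame type is not shown in this thread.

```python
import copy
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class FrameSketch:
    """Hypothetical stand-in for a video frame with timing metadata."""

    data: bytes
    pts: Optional[int] = None
    time_base: Optional[Fraction] = None


def stamp_aliased(last_frame: FrameSketch, pts: int, time_base: Fraction) -> FrameSketch:
    output_frame = last_frame  # same object: no copy is made
    output_frame.pts = pts  # this also changes last_frame.pts
    output_frame.time_base = time_base
    return output_frame


def stamp_copied(last_frame: FrameSketch, pts: int, time_base: Fraction) -> FrameSketch:
    output_frame = copy.copy(last_frame)  # new object; pixel data is still shared
    output_frame.pts = pts  # the cached frame keeps its own timestamp
    output_frame.time_base = time_base
    return output_frame
```

With the aliased version the cached frame's `pts` changes on every call; with the copied version it stays untouched.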

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

agents-core/vision_agents/core/agents/agents.py (1)
316-320: Audio track initialization from publishers follows the established pattern.

The logic correctly initializes the audio track from the first audio publisher when present, maintaining a clear priority: realtime output → audio publisher → default edge track. This mirrors the video publisher approach.

Two refinements to consider:

Multiple publishers: When multiple audio publishers exist, only the first is used silently. Consider logging a warning if len(self.audio_publishers) > 1 to help users debug unexpected behavior. (Note: video publishers have the same pattern at line 339.)

Return value validation: The code assumes publish_audio_track() returns a valid audio track. Consider adding a basic check:
         elif self.audio_publishers:
             # Get the first audio publisher to create the track
             audio_publisher = self.audio_publishers[0]
+            if len(self.audio_publishers) > 1:
+                self.logger.warning(
+                    f"Multiple audio publishers found ({len(self.audio_publishers)}), using first: {type(audio_publisher).__name__}"
+                )
             self._audio_track = audio_publisher.publish_audio_track()
+            if not self._audio_track:
+                raise ValueError(f"Audio publisher {type(audio_publisher).__name__} returned None from publish_audio_track()")
             self.logger.info("🎵 Audio track initialized from audio publisher")

These are optional improvements since the current implementation follows existing conventions.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0b4894a and 30322b2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (3)

agents-core/pyproject.toml (2 hunks)
agents-core/vision_agents/core/agents/agents.py (3 hunks)
pyproject.toml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

agents-core/pyproject.toml

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

agents-core/vision_agents/core/agents/agents.py

🧬 Code graph analysis (1)

agents-core/vision_agents/core/agents/agents.py (3)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)

_attach_agent (117-129)

publish_audio_track (109-115)

agents-core/vision_agents/core/llm/llm.py (1)

_attach_agent (185-191)

agents-core/vision_agents/core/processors/base_processor.py (1)

publish_audio_track (84-85)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Cursor Bugbot
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Ruff & mypy
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Ruff & mypy

🔇 Additional comments (3)

pyproject.toml (1)

24-24: Workspace integration verified and correct.

The HeyGen plugin entries are correctly added to both [tool.uv.sources] (line 24) and [tool.uv.workspace] members (line 54), maintaining alphabetical ordering and following established patterns. The plugin directory structure is properly configured with a valid [project] section in its pyproject.toml.

agents-core/vision_agents/core/agents/agents.py (2)

184-190: Clean extension of audio publishing logic.

The expanded publish_audio property correctly considers audio publishers (like the HeyGen avatar) alongside TTS and realtime mode. The docstring update accurately reflects this change and follows the Google style guide.

223-226: Processor attachment is correct and follows established patterns.

The loop properly uses hasattr to guard against processors lacking the hook, and _attach_agent is safely called after LLM attachment. The HeyGenAvatarPublisher implementation is idempotent—multiple calls simply re-assign the agent and re-subscribe, causing no issues. The _subscribe_to_text_events method already includes defensive error handling.

The fail-fast behavior (exceptions in _attach_agent crash agent initialization) is a reasonable design choice that surfaces configuration errors early rather than silently masking them.
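
A minimal sketch of the guarded attachment loop described above. Only the hook name `_attach_agent` comes from the review; the processor classes and the helper `attach_processors` are hypothetical.

```python
class RecordingProcessor:
    """Hypothetical processor that implements the optional attach hook."""

    def __init__(self) -> None:
        self.agent = None
        self.attach_calls = 0

    def _attach_agent(self, agent) -> None:
        # Re-attaching simply overwrites the reference, so repeated calls are harmless.
        self.agent = agent
        self.attach_calls += 1


class PlainProcessor:
    """Hypothetical processor without the hook."""


def attach_processors(agent, processors) -> list:
    """Call _attach_agent on every processor that defines it.

    Exceptions raised by a hook propagate to the caller (fail-fast).
    """
    attached = []
    for processor in processors:
        if hasattr(processor, "_attach_agent"):
            processor._attach_agent(agent)
            attached.append(processor)
    return attached
```

Processors without the hook are skipped rather than treated as errors, which keeps the hook optional for existing plugins.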

cursor · 2025-11-04T22:54:50Z

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py

+                    if hasattr(frame, 'to_ndarray'):
+                        audio_array = frame.to_ndarray()
+                        audio_bytes = audio_array.tobytes()
+                        await dest_track.write(audio_bytes)


Bug: Type Mismatch: Bytes Instead of PCM Data

The code calls await dest_track.write(audio_bytes) where audio_bytes is raw bytes from audio_array.tobytes(). However, according to the OutputAudioTrack protocol (agents-core/vision_agents/core/edge/types.py line 45), the write method expects PcmData, not raw bytes. This type mismatch will cause a runtime error when HeyGen audio forwarding is attempted with standard (non-Realtime) LLMs.
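
The mismatch can be illustrated with stand-in types. `PcmDataSketch` and `StrictTrackSketch` are hypothetical; the real `PcmData` fields and the real track's behavior on bad input are not shown in this thread, so the explicit `isinstance` check is an assumption made for the demonstration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PcmDataSketch:
    """Hypothetical stand-in for PcmData (fields are assumptions)."""

    samples: bytes
    sample_rate: int
    channels: int


class StrictTrackSketch:
    """Hypothetical output track whose write() accepts only PcmDataSketch."""

    def __init__(self) -> None:
        self.written: list[PcmDataSketch] = []

    async def write(self, data: PcmDataSketch) -> None:
        if not isinstance(data, PcmDataSketch):
            raise TypeError(f"expected PcmDataSketch, got {type(data).__name__}")
        self.written.append(data)


async def forward(track: StrictTrackSketch, raw: bytes, sample_rate: int, channels: int) -> None:
    # Wrap the raw bytes with their format metadata before writing,
    # so the call matches the declared parameter type.
    await track.write(PcmDataSketch(samples=raw, sample_rate=sample_rate, channels=channels))
```

Passing `audio_array.tobytes()` straight to `write` corresponds to calling `track.write(b"...")` here, which fails, while `forward` succeeds.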

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
251-253: Remove commented code.

The commented line #e = SessionUpdatedEvent(**event) should be removed entirely rather than left as dead code.

Apply this diff to clean up the commented code:
         elif et == "session.updated":
             pass
-            #e = SessionUpdatedEvent(**event)

♻️ Duplicate comments (1)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
149-222: Past issue not addressed: _all_sent_texts prevents repeated sentences across conversation turns.

The past review flagged that self._all_sent_texts is never cleared, causing any sentence repeated in a later LLM turn to be silently dropped. This issue persists in lines 202-204 where _text_buffer and _current_response_id are reset but _all_sent_texts is not. Normal conversation patterns—like greeting with "Hi there!" multiple times—will break.

Apply the suggested fix from the past review:
         if item_id != self._current_response_id:
             if self._text_buffer:
                 # Send any accumulated text from previous response
                 text_to_send = self._text_buffer.strip()
                 if text_to_send and text_to_send not in self._all_sent_texts:
                     await self._send_text_to_heygen(text_to_send)
                     self._all_sent_texts.add(text_to_send)
             self._text_buffer = ""
             self._current_response_id = item_id
+            self._all_sent_texts.clear()

Also consider clearing _all_sent_texts in the on_text_complete handler after sending all sentences (after line 200).
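
The effect of the suggested fix can be sketched in isolation. `SentenceDeduper` is a hypothetical reduction of the publisher's bookkeeping, with a flag that toggles whether the sent-text set is cleared when a new response starts.

```python
class SentenceDeduper:
    """Hypothetical model of per-response de-duplication of sent text."""

    def __init__(self, clear_on_new_response: bool) -> None:
        self._clear_on_new_response = clear_on_new_response
        self._current_response_id = None
        self._sent: set[str] = set()
        self.delivered: list[str] = []

    def send(self, response_id: str, text: str) -> bool:
        """Deliver text unless it was already sent; return True if delivered."""
        if response_id != self._current_response_id:
            self._current_response_id = response_id
            if self._clear_on_new_response:
                # A new turn starts with a clean slate, so repeated phrases still go out.
                self._sent.clear()

        text = text.strip()
        if text and text not in self._sent:
            self._sent.add(text)
            self.delivered.append(text)
            return True
        return False
```

Without clearing, a greeting repeated in a later turn is dropped; with clearing, it is delivered again, while duplicates inside one response are still suppressed.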

🧹 Nitpick comments (4)

plugins/openai/vision_agents/plugins/openai/openai_realtime.py (2)

231-234: Clean up or document the commented flush call.

The commented await self.output_track.flush() should either be removed if no longer needed or uncommented with documentation explaining why flushing the output track on speech start is necessary.

243-244: Removal of response.created handler verified as safe—no other code depends on it.

The grep search found no other references to this event handler in the codebase. However, remove the commented code at lines 234 and 251–253; they're code smells that should be cleaned up rather than left commented out.

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)
264-295: Clarify error handling for frames without to_ndarray().

Line 285 logs a warning when a frame lacks the to_ndarray() method, but the code continues to the next iteration without explicitly breaking or continuing. Consider whether this case should continue to the next frame or break the loop entirely.

Apply this diff to make the intent explicit:
                     if hasattr(frame, 'to_ndarray'):
                         audio_array = frame.to_ndarray()
                         audio_bytes = audio_array.tobytes()
                         await dest_track.write(audio_bytes)
                     else:
                         logger.warning("Received frame without to_ndarray() method")
+                        continue

371-392: Consider stopping the audio track for consistency.

The close() method stops the video track but does not stop the audio track (_audio_track). For symmetry and complete cleanup, consider stopping the audio track as well.

Apply this diff to stop the audio track:
     async def close(self) -> None:
         """Clean up resources and close connections."""
         logger.info("Closing HeyGen avatar publisher")
         
         # Stop video track
         if self._video_track:
             self._video_track.stop()
+        
+        # Stop audio track
+        if self._audio_track:
+            self._audio_track.stop()
         
         # Close RTC connection
         if self.rtc_manager:
             await self.rtc_manager.close()

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30322b2 and 3f5e203.

📒 Files selected for processing (3)

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1 hunks)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/python.mdc)

**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide

Files:

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
plugins/openai/vision_agents/plugins/openai/openai_realtime.py

🧬 Code graph analysis (1)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)

agents-core/vision_agents/core/processors/base_processor.py (3)

AudioVideoProcessor (111-140)

VideoPublisherMixin (78-80)

AudioPublisherMixin (83-85)

plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7)

HeyGenRTCManager (19-267)

set_video_callback (223-229)

set_audio_callback (231-237)

connect (60-145)

send_text (239-249)

is_connected (252-254)

close (256-267)

plugins/heygen/vision_agents/plugins/heygen/heygen_types.py (1)

VideoQuality (6-11)

plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4)

HeyGenVideoTrack (14-187)

start_receiving (48-66)

recv (141-176)

stop (178-187)

agents-core/vision_agents/core/llm/events.py (3)

LLMResponseChunkEvent (87-102)

LLMResponseCompletedEvent (106-112)

RealtimeAgentSpeechTranscriptionEvent (148-153)

agents-core/vision_agents/core/agents/agents.py (3)

subscribe (291-303)

recv (951-952)

close (572-644)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Cursor Bugbot
GitHub Check: unit / Test "not integration"
GitHub Check: unit / Test "not integration"

🔇 Additional comments (11)

plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (11)

1-18: LGTM!

Imports are well-organized and appropriate for the module's functionality.

20-107: Well-structured initialization.

The class properly initializes all components including RTC manager, video track, and audio track. The docstring follows Google style guide conventions.

109-115: LGTM!

Simple accessor method correctly returns the audio track for agent consumption.

117-129: LGTM!

Agent attachment and event subscription are handled correctly.

131-147: LGTM!

Connection setup with callbacks and error handling is well-implemented.

224-231: LGTM!

Video track callback correctly delegates frame receiving to the HeyGenVideoTrack.

233-262: LGTM!

Audio track handling correctly distinguishes between Realtime LLMs (video-only) and standard LLMs (video + audio forwarding).

297-320: LGTM!

Text chunk accumulation logic correctly handles response boundaries and buffers text for sentence-based sending.
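
A rough sketch of sentence-based buffering of streamed chunks. The splitting rule (whitespace after `.`, `!`, or `?`) and the class `SentenceBuffer` are assumptions made for illustration; the plugin's actual boundary detection is not shown in this thread.

```python
import re

# Split on whitespace that follows sentence-ending punctuation (assumed rule).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Accumulate streamed text chunks and emit complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list[str]:
        """Add a chunk and return any sentences that are now complete."""
        self._buffer += chunk
        parts = _SENTENCE_BOUNDARY.split(self._buffer)
        # The last part may still be growing, so keep it buffered.
        self._buffer = parts.pop()
        return [part.strip() for part in parts if part.strip()]

    def flush(self) -> list[str]:
        """Return whatever is left, for use when the response completes."""
        rest = self._buffer.strip()
        self._buffer = ""
        return [rest] if rest else []
```

Calling `flush` when the response completes ensures a final sentence without trailing whitespace is not left behind in the buffer.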

322-339: LGTM!

Text sending with connection state validation and error handling is well-implemented.

341-355: LGTM!

Lazy connection initialization and video track publishing are correctly implemented.

357-369: LGTM!

State reporting method provides appropriate visibility into publisher status.

implemented heygen avatars

9675a14

d3xvn added 4 commits October 28, 2025 10:46

add lip-sync support by forwarding agent audio to heygen

4f3a6e4

switch avatar example to use gemini realtime for better lip-sync testing

680f5d7

WIP: audio track approach for lip-sync (audio flows but no lip movement)

6eb638f

github-actions bot added the dependencies, agents-core, plugins, config, docs, and project-info labels on Oct 30, 2025

d3xvn added 7 commits November 3, 2025 17:04

Merge main into feat/heygen - add vogent plugin alongside heygen

aefdeda

PR cleanup

96f1cc9

fixed audio duplication and sluggishness

6188ed3

Fix video aspect ratio stretching - add letterboxing

74aa6ff

fixed and simplified both implementations

f54c372

Merge main into feat/heygen - added moondream plugin

a94b181

d3xvn marked this pull request as ready for review November 4, 2025 09:28

d3xvn added 2 commits November 4, 2025 10:30

Fix ruff linting - remove unused imports

fad9f49

Fix HeyGen plugin tests - import paths and mocking

f03c81d

coderabbitai bot reviewed Nov 4, 2025

d3xvn and others added 3 commits November 4, 2025 10:48

Fix mypy type errors in HeyGen plugin

a5be206

Allow reattaching to new HeyGen video tracks on renegotiation

d6d66bf

Migrate quality to enum

f7a2f37

cursor bot reviewed Nov 4, 2025

coderabbitai bot reviewed Nov 4, 2025

Ruff and Mypy

0b4894a

cursor bot reviewed Nov 4, 2025

Merge branch 'main' into feat/heygen

30322b2

coderabbitai bot reviewed Nov 4, 2025

Nash0x7E2 added 3 commits November 4, 2025 15:21

More ruff issues

4bafa66

Fix broken method sigs

f5a1aaa

Unused var

3f5e203

cursor bot reviewed Nov 4, 2025

coderabbitai bot reviewed Nov 4, 2025

final ruff error

12cad15

Nash0x7E2 merged commit aaa7d21 into main Nov 4, 2025
7 checks passed

Nash0x7E2 deleted the feat/heygen branch November 4, 2025 23:25

This was referenced Nov 5, 2025

Optimize delays - realtime, waiting logic and error handling #132

Merged

Add AudioLLM and VideoLLM base classes #151

Merged

cursor bot Nov 4, 2025

Bug: Hidden Mute LLM Audio Parameter Misleading Docs
