feat: implemented heygen avatars #126
Note: Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds a HeyGen Avatar plugin (session, RTC manager, video track, AvatarPublisher), examples, tests, and workspace/manifest updates; modifies Agent init to call processors' _attach_agent and prefer publisher-provided audio/video tracks when preparing RTC.
Sequence Diagram(s)
sequenceDiagram
participant Agent
participant AvatarPublisher
participant LLM
participant HeyGenRTCMgr
participant HeyGenAPI
participant VideoCall
Agent->>AvatarPublisher: _attach_agent(agent)
AvatarPublisher->>AvatarPublisher: _subscribe_to_text_events()
Agent->>LLM: produce text (streaming/realtime)
LLM-->>AvatarPublisher: text_chunk / completion / realtime_transcript
AvatarPublisher->>AvatarPublisher: buffer & dedupe text
AvatarPublisher->>HeyGenRTCMgr: send_text(text)
HeyGenRTCMgr->>HeyGenAPI: HTTP /streaming.* (SDP/ICE/task)
HeyGenAPI-->>HeyGenRTCMgr: media tracks (video/audio)
HeyGenRTCMgr->>AvatarPublisher: on_video_track / on_audio_track
AvatarPublisher->>HeyGenVideoTrack: start_receiving(track)
AvatarPublisher->>VideoCall: publish_video_track()/publish_audio_track()
VideoCall-->>User: deliver avatar media
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ Passed checks (2 passed)
- Removed obsolete heygen_audio_track.py (from old audio-based approach)
- Removed unused _audio_sender field and transceiver logic
- Removed unused _original_audio_write field
- Simplified audio track management
- Moved imports to top of file
- Updated docstrings to reflect text-based lip-sync approach

Fixed duplicate text sending issue:
- Added deduplication tracking with _sent_texts set
- Added minimum length filter (>15 chars) to prevent tiny fragments
- Simplified event handling to avoid duplicate subscriptions
- Proper buffer management between chunk and complete events

Known limitation: ~3-4 second audio delay is inherent to HeyGen platform
- Add processor._attach_agent() lifecycle hook to Agent.__init__
- Rename HeyGen set_agent() -> _attach_agent() for consistency with LLM
- Remove manual agent attachment from examples and docs
- HeyGen now works like YOLO - just add to processors list

Examples are now much cleaner:
    agent = Agent(processors=[heygen.AvatarPublisher()])
    # That's it! No manual wiring needed.
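The lifecycle hook described in this commit can be sketched in isolation. This is a minimal illustration, not the code from agents.py: `MiniAgent`, `EchoProcessor`, and `PlainProcessor` are hypothetical stand-ins, and the only behavior assumed from the PR is that the agent calls `_attach_agent` on processors that expose it.

```python
from typing import Any, List


class EchoProcessor:
    """Hypothetical processor that wants a reference to its agent."""

    def __init__(self) -> None:
        self.agent: Any = None

    def _attach_agent(self, agent: Any) -> None:
        # Called once by the agent while it is being constructed.
        self.agent = agent


class PlainProcessor:
    """Hypothetical processor without the hook; the agent leaves it alone."""


class MiniAgent:
    """Stand-in for Agent; only the attach loop is modeled here."""

    def __init__(self, processors: List[Any]) -> None:
        self.processors = processors
        for processor in processors:
            # Only processors that define the hook are attached.
            attach = getattr(processor, "_attach_agent", None)
            if callable(attach):
                attach(self)


echo, plain = EchoProcessor(), PlainProcessor()
agent = MiniAgent(processors=[echo, plain])
```

Because the hook is looked up with `getattr`, processors that do not need the agent reference require no changes, which matches the "just add to processors list" usage in the commit message.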
Actionable comments posted: 2
🧹 Nitpick comments (1)
plugins/heygen/example/avatar_realtime_example.py (1)
55-57: Consider using agent.simple_response() instead of agent.llm.simple_response().
Calling agent.llm.simple_response() directly bypasses the agent's wrapper method, which may skip tracing and logging functionality. The agent provides a simple_response() method that forwards to the LLM while adding instrumentation.
Apply this diff to use the agent's method:
- await agent.llm.simple_response(
+ await agent.simple_response(
      text="Hello! I'm your AI assistant. How can I help you today?"
  )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
- plugins/aws/example/uv.lock is excluded by !**/*.lock
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
- agents-core/pyproject.toml (2 hunks)
- agents-core/vision_agents/core/agents/agents.py (3 hunks)
- aiortc (0 hunks)
- plugins/heygen/README.md (1 hunks)
- plugins/heygen/example/README.md (1 hunks)
- plugins/heygen/example/avatar_example.py (1 hunks)
- plugins/heygen/example/avatar_realtime_example.py (1 hunks)
- plugins/heygen/example/pyproject.toml (1 hunks)
- plugins/heygen/pyproject.toml (1 hunks)
- plugins/heygen/tests/test_heygen_plugin.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/__init__.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1 hunks)
- pyproject.toml (2 hunks)
💤 Files with no reviewable changes (1)
- aiortc
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/heygen/example/avatar_realtime_example.py
- agents-core/vision_agents/core/agents/agents.py
- plugins/heygen/vision_agents/plugins/heygen/__init__.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
- plugins/heygen/example/avatar_example.py
- plugins/heygen/tests/test_heygen_plugin.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py
🧬 Code graph analysis (9)
plugins/heygen/example/avatar_realtime_example.py (3)
- agents-core/vision_agents/core/edge/types.py (1): User(22-25)
- agents-core/vision_agents/core/agents/agents.py (2): Agent(107-1340), finish(534-565)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1): AvatarPublisher(19-391)
agents-core/vision_agents/core/agents/agents.py (2)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2): _attach_agent(116-128), publish_audio_track(108-114)
- agents-core/vision_agents/core/processors/base_processor.py (1): publish_audio_track(84-85)
plugins/heygen/vision_agents/plugins/heygen/__init__.py (1)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1): AvatarPublisher(19-391)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
- agents-core/vision_agents/core/processors/base_processor.py (3): AudioVideoProcessor(111-140), VideoPublisherMixin(78-80), AudioPublisherMixin(83-85)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7): HeyGenRTCManager(18-260), set_video_callback(216-222), set_audio_callback(224-230), connect(55-138), send_text(232-242), is_connected(245-247), close(249-260)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4): HeyGenVideoTrack(14-180), start_receiving(48-60), recv(134-169), stop(171-180)
- agents-core/vision_agents/core/llm/events.py (3): LLMResponseChunkEvent(90-105), LLMResponseCompletedEvent(109-115), RealtimeAgentSpeechTranscriptionEvent(151-156)
- agents-core/vision_agents/core/agents/agents.py (3): subscribe(286-298), recv(946-947), close(567-639)
- agents-core/vision_agents/core/edge/types.py (1): write(52-52)
plugins/heygen/example/avatar_example.py (3)
- agents-core/vision_agents/core/edge/types.py (1): User(22-25)
- agents-core/vision_agents/core/agents/agents.py (1): Agent(107-1340)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1): AvatarPublisher(19-391)
plugins/heygen/tests/test_heygen_plugin.py (4)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (3): AvatarPublisher(19-391), publish_video_track(340-354), state(356-368)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2): HeyGenVideoTrack(14-180), stop(171-180)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2): HeyGenRTCManager(18-260), is_connected(245-247)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1): HeyGenSession(9-232)
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (3)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1): close(370-391)
- agents-core/vision_agents/core/agents/agents.py (1): close(567-639)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1): close(249-260)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5): HeyGenSession(9-232), create_session(44-84), start_session(86-135), send_task(137-187), close(222-232)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1): close(370-391)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)
- agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue(6-28), put_latest_nowait(22-28)
- agents-core/vision_agents/core/agents/agents.py (1): recv(946-947)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (15)
plugins/heygen/vision_agents/plugins/heygen/__init__.py (1)
1-12: LGTM! Clean package initialization with proper docstring and clear public API surface. The single export pattern is appropriate for this plugin entry point.
pyproject.toml (2)
24-24: LGTM! Proper workspace source registration following the established pattern for other plugins.
54-54: LGTM! Correct workspace member registration for the new HeyGen plugin package.
plugins/heygen/pyproject.toml (1)
1-41: LGTM! Well-structured plugin package configuration with appropriate dependencies for WebRTC avatar streaming. The VCS versioning configuration correctly searches parent directories for the git repository root in this monorepo setup.
plugins/heygen/example/pyproject.toml (1)
1-21: LGTM! Appropriate example package configuration with workspace dependencies for local development and testing. The inclusion of python-dotenv and multiple plugin dependencies aligns well with the documented example use cases.
agents-core/pyproject.toml (2)
45-45: LGTM! Proper optional dependency group registration following the established pattern for plugin integration.
61-61: LGTM! Correct addition to the all-plugins group, maintaining alphabetical ordering.
plugins/heygen/README.md (1)
1-186: LGTM! Comprehensive and well-structured documentation. The code examples demonstrate proper usage patterns, configuration options are clearly documented, and the troubleshooting section addresses common issues. The documentation aligns well with the plugin's implementation.
plugins/heygen/example/README.md (1)
1-188: LGTM! Excellent example documentation that clearly distinguishes between standard and realtime LLM modes. The detailed flow explanations and proper coverage of API key requirements make this helpful for developers. The note about audio handling differences between modes accurately reflects the implementation.
plugins/heygen/example/avatar_realtime_example.py (3)
1-16: LGTM! Proper imports and environment setup. Loading dotenv before agent creation ensures API keys are available.
18-44: LGTM! Well-structured function with proper docstring following Google style guide. The agent configuration correctly uses Realtime LLM without separate STT/TTS components, and the AvatarPublisher is appropriately configured.
63-65: LGTM! Standard and correct main entry point pattern for async code.
agents-core/vision_agents/core/agents/agents.py (1)
218-222: Processor attach hook fits plugin needs. Connecting processors that expose _attach_agent ensures publishers like HeyGen can subscribe to LLM events the moment the agent spins up. Looks solid.
plugins/heygen/example/avatar_example.py (1)
22-63: Example flow is clear. The example stitches together the Edge, Gemini LLM, Deepgram STT, and the avatar publisher cleanly, which is handy for integrators to copy-paste and start experimenting.
plugins/heygen/tests/test_heygen_plugin.py (1)
94-123: Good coverage of publisher surface. These tests assert the publisher exposes a video track and reports state without needing live network calls, which are nice guardrails for future regressions.
        self._all_sent_texts: set = set()  # Track all sent texts to prevent duplicates

        logger.info(
            f"HeyGen AvatarPublisher initialized "
            f"(avatar: {avatar_id}, quality: {quality}, resolution: {resolution})"
        )

    def publish_audio_track(self):
        """Return the audio track for publishing HeyGen's audio.

        This method is called by the Agent to get the audio track that will
        be published to the call. HeyGen's audio will be forwarded to this track.
        """
        return self._audio_track

    def _attach_agent(self, agent: Any) -> None:
        """Attach the agent reference for event subscription.

        This is called automatically by the Agent during initialization.

        Args:
            agent: The agent instance.
        """
        self._agent = agent
        logger.info("Agent reference set for HeyGen avatar publisher")

        # Subscribe to text events immediately when agent is set
        self._subscribe_to_text_events()

    async def _connect_to_heygen(self) -> None:
        """Establish connection to HeyGen and start receiving video and audio."""
        try:
            # Set up video and audio callbacks before connecting
            self.rtc_manager.set_video_callback(self._on_video_track)
            self.rtc_manager.set_audio_callback(self._on_audio_track)

            # Connect to HeyGen
            await self.rtc_manager.connect()

            self._connected = True
            logger.info("Connected to HeyGen, avatar streaming active")

        except Exception as e:
            logger.error(f"Failed to connect to HeyGen: {e}")
            self._connected = False
            raise

    def _subscribe_to_text_events(self) -> None:
        """Subscribe to text output events from the LLM.

        HeyGen requires text input (not audio) for proper lip-sync.
        We listen to the LLM's text output and send it to HeyGen's task API.
        """
        try:
            # Import the event types
            from vision_agents.core.llm.events import (
                LLMResponseChunkEvent,
                LLMResponseCompletedEvent,
                RealtimeAgentSpeechTranscriptionEvent,
            )

            # Get the LLM's event manager (events are emitted by the LLM, not the agent)
            if hasattr(self, '_agent') and self._agent and hasattr(self._agent, 'llm'):
                @self._agent.llm.events.subscribe
                async def on_text_chunk(event: LLMResponseChunkEvent):
                    """Handle streaming text chunks from the LLM."""
                    logger.debug(f"HeyGen received text chunk: delta='{event.delta}'")
                    if event.delta:
                        await self._on_text_chunk(event.delta, event.item_id)

                @self._agent.llm.events.subscribe
                async def on_text_complete(event: LLMResponseCompletedEvent):
                    """Handle end of LLM response - split into sentences and send each once."""
                    if not self._text_buffer.strip():
                        return

                    # Split the complete response into sentences
                    import re
                    text = self._text_buffer.strip()
                    # Split on sentence boundaries but keep the punctuation
                    sentences = re.split(r'([.!?]+\s*)', text)
                    # Recombine sentences with their punctuation
                    full_sentences = []
                    for i in range(0, len(sentences)-1, 2):
                        if sentences[i].strip():
                            sentence = (sentences[i] + sentences[i+1] if i+1 < len(sentences) else sentences[i]).strip()
                            full_sentences.append(sentence)
                    # Handle last part if no punctuation
                    if sentences and sentences[-1].strip() and not any(sentences[-1].strip().endswith(p) for p in ['.', '!', '?']):
                        full_sentences.append(sentences[-1].strip())

                    # Send each sentence once if not already sent
                    for sentence in full_sentences:
                        if sentence and len(sentence) > 5:
                            if sentence not in self._all_sent_texts:
                                await self._send_text_to_heygen(sentence)
                                self._all_sent_texts.add(sentence)
                            else:
                                logger.debug(f"Skipping duplicate: '{sentence[:30]}...'")

                    # Reset for next response
                    self._text_buffer = ""
                    self._current_response_id = None
Allow repeated sentences after each response.
self._all_sent_texts is never cleared once a sentence goes out, so any later LLM turn that reuses the same sentence gets silently dropped and the avatar never speaks it. That breaks normal conversation—think “Hi there!” uttered twice in a demo. Please scope the de‑duplication to the active response (e.g., clear the set when item_id changes or after completion) so only intra-response duplicates are suppressed.
Apply something like:
  if item_id != self._current_response_id:
      if self._text_buffer:
          # Send any accumulated text from previous response
          text_to_send = self._text_buffer.strip()
          if text_to_send and text_to_send not in self._all_sent_texts:
              await self._send_text_to_heygen(text_to_send)
              self._all_sent_texts.add(text_to_send)
          self._text_buffer = ""
      self._current_response_id = item_id
+     self._all_sent_texts.clear()
…and optionally clear again once the completion handler finishes dispatching sentences.
Committable suggestion skipped: line range outside the PR's diff.
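The response-scoped de-duplication suggested here can be tried out separately from the plugin. The sketch below is an illustration under assumptions: `ResponseScopedDeduper` and `should_send` are hypothetical names, not part of the HeyGen plugin, and the only idea taken from the review is clearing the set when `item_id` changes.

```python
from typing import List, Optional, Set


class ResponseScopedDeduper:
    """Hypothetical helper: suppress repeats within one response only."""

    def __init__(self) -> None:
        self._current_response_id: Optional[str] = None
        self._sent: Set[str] = set()

    def should_send(self, text: str, item_id: str) -> bool:
        if item_id != self._current_response_id:
            # A new response started: forget what the previous one sent.
            self._current_response_id = item_id
            self._sent.clear()
        if text in self._sent:
            return False
        self._sent.add(text)
        return True


deduper = ResponseScopedDeduper()
sent: List[str] = []
for item_id, sentence in [
    ("r1", "Hi there!"),
    ("r1", "Hi there!"),  # duplicate inside the same response: dropped
    ("r2", "Hi there!"),  # same sentence in a later response: sent again
]:
    if deduper.should_send(sentence, item_id):
        sent.append(f"{item_id}:{sentence}")
```

With this scoping the set also stays small, since it never holds more than the sentences of one response.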
            avatar_id="default",  # Use your HeyGen avatar ID
            quality=VideoQuality.HIGH,  # Video quality: VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH
            resolution=(1920, 1080),  # Output resolution
            mute_llm_audio=False,  # Not needed for streaming LLM
Bug: Hidden Mute LLM Audio Parameter Misleading Docs
The heygen.AvatarPublisher constructor receives a mute_llm_audio parameter in the example and its own docstring, but this parameter isn't explicitly defined. It's absorbed by **kwargs and ignored, which makes the example misleading, especially with the accompanying comment suggesting it's a functional setting.
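To make the failure mode concrete, here is a small sketch of how `**kwargs` can swallow a keyword silently, plus one possible guard. Both classes are hypothetical stand-ins rather than the real AvatarPublisher, and the strict variant assumes the constructor does not need to forward extra keywords to a base class.

```python
from typing import Any


class LenientPublisher:
    """Hypothetical stand-in: unknown keywords vanish into **kwargs."""

    def __init__(self, avatar_id: str = "default", **kwargs: Any) -> None:
        self.avatar_id = avatar_id  # kwargs is never read


class StrictPublisher:
    """Hypothetical variant that rejects keywords it does not understand."""

    def __init__(self, avatar_id: str = "default", **kwargs: Any) -> None:
        if kwargs:
            unknown = ", ".join(sorted(kwargs))
            raise TypeError(f"unexpected keyword argument(s): {unknown}")
        self.avatar_id = avatar_id


# Accepted without complaint, but the setting has no effect.
lenient = LenientPublisher(avatar_id="default", mute_llm_audio=False)

try:
    StrictPublisher(avatar_id="default", mute_llm_audio=False)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

Either removing the argument from the example or rejecting unknown keywords would keep the documentation and the behavior in agreement.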
Actionable comments posted: 2
♻️ Duplicate comments (1)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
306-329: Clear _all_sent_texts per response to allow repeated phrases across turns.
The deduplication set _all_sent_texts grows indefinitely and is never cleared, preventing legitimate repetition of phrases across different conversation turns. For example, saying "Hello!" in the first response will block "Hello!" in all subsequent responses forever.
As noted in a past review comment, the set should be scoped to the active response. Clear it when the item_id changes to limit deduplication to intra-response duplicates only.
Apply this diff:
  if item_id != self._current_response_id:
      if self._text_buffer:
          # Send any accumulated text from previous response
          text_to_send = self._text_buffer.strip()
          if text_to_send and text_to_send not in self._all_sent_texts:
              await self._send_text_to_heygen(text_to_send)
              self._all_sent_texts.add(text_to_send)
          self._text_buffer = ""
      self._current_response_id = item_id
+     self._all_sent_texts.clear()
Also consider clearing the set in the on_text_complete handler (around line 212-213) after dispatching all sentences.
📒 Files selected for processing (10)
- plugins/heygen/README.md (1 hunks)
- plugins/heygen/example/README.md (1 hunks)
- plugins/heygen/example/avatar_example.py (1 hunks)
- plugins/heygen/example/avatar_realtime_example.py (1 hunks)
- plugins/heygen/tests/test_heygen_plugin.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/__init__.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/heygen/example/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- plugins/heygen/README.md
- plugins/heygen/vision_agents/plugins/heygen/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/heygen/example/avatar_example.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py
- plugins/heygen/example/avatar_realtime_example.py
- plugins/heygen/tests/test_heygen_plugin.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py
🧬 Code graph analysis (7)
plugins/heygen/example/avatar_example.py (2)
- agents-core/vision_agents/core/edge/types.py (1): User(22-25)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2): VideoQuality(6-11), AvatarPublisher(29-401)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2)
- agents-core/vision_agents/core/utils/queue.py (2): LatestNQueue(6-28), put_latest_nowait(22-28)
- agents-core/vision_agents/core/agents/agents.py (1): recv(946-947)
plugins/heygen/example/avatar_realtime_example.py (4)
- agents-core/vision_agents/core/edge/types.py (1): User(22-25)
- agents-core/vision_agents/core/agents/agents.py (2): Agent(107-1340), finish(534-565)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2): VideoQuality(6-11), AvatarPublisher(29-401)
- plugins/heygen/example/avatar_example.py (1): start_avatar_agent(12-63)
plugins/heygen/tests/test_heygen_plugin.py (4)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (4): AvatarPublisher(29-401), VideoQuality(6-11), publish_video_track(350-364), state(366-378)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (2): HeyGenVideoTrack(14-187), stop(178-187)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2): HeyGenRTCManager(21-269), is_connected(254-256)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (1): HeyGenSession(11-238)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (4)
- agents-core/vision_agents/core/processors/base_processor.py (3): AudioVideoProcessor(111-140), VideoPublisherMixin(78-80), AudioPublisherMixin(83-85)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7): HeyGenRTCManager(21-269), set_video_callback(225-231), set_audio_callback(233-239), connect(62-147), send_text(241-251), is_connected(254-256), close(258-269)
- plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4): HeyGenVideoTrack(14-187), start_receiving(48-66), recv(141-176), stop(178-187)
- agents-core/vision_agents/core/llm/events.py (3): LLMResponseChunkEvent(90-105), LLMResponseCompletedEvent(109-115), RealtimeAgentSpeechTranscriptionEvent(151-156)
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (2)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2): VideoQuality(6-11), close(380-401)
- plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (1): close(258-269)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (2)
- plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5): HeyGenSession(11-238), create_session(50-90), start_session(92-141), send_task(143-193), close(228-238)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2): VideoQuality(6-11), close(380-401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (29)
plugins/heygen/example/avatar_realtime_example.py (2)
1-17: LGTM! The imports and logging configuration are well-structured. The use of load_dotenv() for API key management is appropriate for an example script.
56-58: The simple_response API is a valid, standardized method across all LLM implementations in the codebase, including Gemini Realtime. The method is documented in the base LLM class and all plugin implementations. Your usage is correct. Likely an incorrect or invalid review comment.
plugins/heygen/tests/test_heygen_plugin.py (4)
9-28: LGTM! The tests for HeyGenSession appropriately cover initialization with and without an API key, including proper error handling verification.
31-47: LGTM! The video track tests correctly verify initialization parameters and the stop lifecycle method.
50-73: LGTM! The RTC manager tests appropriately mock dependencies and verify connection state management.
76-122: LGTM! The AvatarPublisher tests effectively use mocking to verify initialization, video track publishing, and state reporting without requiring real connections.
plugins/heygen/example/avatar_example.py (1)
1-10: LGTM! The imports are clean and follow best practices.
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (5)
1-46: LGTM! The class initialization is well-structured with appropriate placeholder handling and a small frame queue for low-latency streaming. Docstrings follow Google style guidelines.
48-66: LGTM! Reattachment logic correctly implemented. The fix for allowing new source tracks is properly implemented. When a new track arrives, the existing receiver is cancelled and awaited (handling CancelledError), then a fresh task is created with the new source.
68-101: LGTM! The frame receiving loop is well-structured with proper error handling, type checking, and automatic resizing when needed.
103-139: LGTM! The resize logic correctly maintains aspect ratio with letterboxing and uses high-quality LANCZOS resampling. The fallback to the original frame on error prevents crashes.
141-187: LGTM! The recv and stop methods correctly implement the VideoStreamTrack interface with proper timestamp management and cleanup.
plugins/heygen/vision_agents/plugins/heygen/heygen_session.py (5)
50-90: LGTM! The session creation method has proper error handling, informative error messages, and correctly stores session state.
92-141: LGTM! The start_session method correctly validates prerequisites and handles the optional SDP answer parameter.
143-193: LGTM! The send_task method appropriately uses non-fatal error handling, allowing the avatar to continue functioning even if individual task submissions fail.
195-226: LGTM! The stop_session method gracefully handles missing sessions and uses appropriate non-fatal error handling for cleanup operations.
228-238: LGTM! The close method properly cleans up all resources in the correct order.
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
1-27: LGTM! The VideoQuality enum is correctly placed before other imports to prevent circular dependencies, though a dedicated types module would be more maintainable long-term.
29-116: LGTM! The initialization is well-structured. Creating the audio track immediately (lines 98-101) is necessary for the Agent to detect publishing capabilities during initialization, as explained in the comment.
118-138: LGTM! The audio track publishing and agent attachment methods are straightforward and well-documented.
140-156: LGTM! The connection method properly sets up callbacks and handles errors.
233-304: LGTM! The media track handlers correctly differentiate between Realtime and standard LLMs, forwarding HeyGen audio only when appropriate.
350-401: LGTM! The video track publishing, state reporting, and cleanup methods are properly implemented with correct resource management.
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (6)
21-60: LGTM! The RTC manager initialization is well-structured with proper dependency injection and state management. Docstrings follow Google style guidelines.
62-147: LGTM! The WebRTC connection setup follows the correct offer/answer flow with robust SDP format handling and appropriate timeouts. The comprehensive docstring clearly explains the HeyGen-specific connection sequence.
149-202: LGTM! The ICE server parsing robustly handles multiple response formats and provides a sensible fallback. The comment about LiveKit embedding servers in SDP (lines 194-197) is helpful context.
204-223: LGTM! The track handler correctly dispatches incoming media to registered callbacks with appropriate warnings when callbacks are missing.
225-251: LGTM! The callback setters and text sending method provide a clean API for managing HeyGen interactions.
253-269: LGTM! The connection status property and cleanup method are correctly implemented with proper resource management order.
    # Note: mute_llm_audio is not needed since streaming LLM doesn't produce audio
    processors=[
        heygen.AvatarPublisher(
            avatar_id="default",  # Use your HeyGen avatar ID
            quality=VideoQuality.HIGH,  # Video quality: VideoQuality.LOW, VideoQuality.MEDIUM, or VideoQuality.HIGH
            resolution=(1920, 1080),  # Output resolution
            mute_llm_audio=False,  # Not needed for streaming LLM
        )
🧩 Analysis chain
Remove or verify the mute_llm_audio parameter.
The AvatarPublisher.__init__ signature in the provided code does not include a mute_llm_audio parameter. The comment states it's not needed for streaming LLMs, yet it's explicitly set to False at line 49. Either this parameter doesn't exist (and should be removed), or the comment is incorrect.
Run the following script to verify the parameter:
🏁 Script executed:
#!/bin/bash
# Description: Check if mute_llm_audio parameter exists in AvatarPublisher
# Search for mute_llm_audio in HeyGen plugin files
rg -n "mute_llm_audio" --type=pyLength of output: 244
I'll search the repository directly for the AvatarPublisher class definition to verify the parameter:
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Find AvatarPublisher class definition in heygen plugin
# Search for class definition
rg -n "class AvatarPublisher" --type=py -A 20
echo "---"
echo "Also check __init__ parameters:"
rg -n "def __init__" --type=py -B 2 -A 10 plugins/heygen
Length of output: 7445
I need to see the complete __init__ method implementation and docstring:
🏁 Script executed:
#!/bin/bash
# Get the full __init__ method implementation for AvatarPublisher
cat plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py | sed -n '53,120p'
Length of output: 2509
Remove the mute_llm_audio parameter from the AvatarPublisher instantiation.
The AvatarPublisher.__init__ method does not document or explicitly handle the mute_llm_audio parameter. While **kwargs allows it to be passed, the parameter is not processed and will be silently ignored. Line 43 states it's "not needed," yet line 49 explicitly sets it—this inconsistency should be resolved by removing it from the example.
processors=[
    heygen.AvatarPublisher(
        avatar_id="default",
        quality=VideoQuality.HIGH,
        resolution=(1920, 1080),
    )
🤖 Prompt for AI Agents
In plugins/heygen/example/avatar_example.py around lines 43 to 50, the
AvatarPublisher example passes an undocumented mute_llm_audio keyword that is
silently ignored; remove the mute_llm_audio=False argument and its inline
comment from the Processor instantiation and update the surrounding comment to
reflect that no mute parameter is required for streaming LLMs so the example
only includes avatar_id, quality, and resolution.
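The "silently ignored" behavior described in this comment can be reproduced with a minimal sketch. `LenientPublisher` and `StrictPublisher` are hypothetical stand-ins, not the plugin's classes; the sketch only assumes, as the review states, that the real constructor accepts `**kwargs` without processing `mute_llm_audio`.

```python
class LenientPublisher:
    """Hypothetical stand-in: accepts unknown keyword arguments and never uses them."""

    def __init__(self, avatar_id="default", **kwargs):
        self.avatar_id = avatar_id
        # Nothing reads these values, so a misspelled or obsolete option goes unnoticed.
        self.ignored = dict(kwargs)


class StrictPublisher:
    """Hypothetical variant that rejects keyword arguments it does not understand."""

    def __init__(self, avatar_id="default", **kwargs):
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        self.avatar_id = avatar_id


# The lenient constructor accepts the stray option without complaint.
lenient = LenientPublisher(avatar_id="default", mute_llm_audio=False)

# The strict constructor surfaces the same mistake immediately.
try:
    StrictPublisher(avatar_id="default", mute_llm_audio=False)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

Whether the plugin should reject unknown options is a design choice for its maintainers; the sketch only shows why the example's extra argument has no effect.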
output_frame.pts = pts
output_frame.time_base = time_base

return output_frame
Bug: last_frame mutated through the output_frame reference
In recv, the code assigns self.last_frame to output_frame, which binds a second reference to the same frame rather than making the copy the comment describes. Setting output_frame.pts and output_frame.time_base therefore also modifies self.last_frame, which can corrupt the cached frame's timestamps and cause timing issues.
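The aliasing problem can be illustrated with a small sketch. `FakeFrame` is a hypothetical stand-in with only the two timestamp fields the comment mentions; it is not the video frame type the plugin uses, and how a real frame should be duplicated depends on that library and is not shown here.

```python
import copy
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class FakeFrame:
    """Hypothetical frame carrying only the fields discussed in the review."""
    pts: int
    time_base: Fraction


def restamp_alias(cached, pts, time_base):
    # Plain assignment binds a second name to the same object,
    # so the cached frame is modified as well.
    output = cached
    output.pts = pts
    output.time_base = time_base
    return output


def restamp_copy(cached, pts, time_base):
    # A shallow copy gives the output its own timestamp fields.
    output = copy.copy(cached)
    output.pts = pts
    output.time_base = time_base
    return output


tb = Fraction(1, 90000)

cached_a = FakeFrame(pts=0, time_base=tb)
out_a = restamp_copy(cached_a, 3000, tb)   # cached_a.pts stays 0

cached_b = FakeFrame(pts=0, time_base=tb)
out_b = restamp_alias(cached_b, 3000, tb)  # cached_b.pts becomes 3000
```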
Actionable comments posted: 0
🧹 Nitpick comments (1)
agents-core/vision_agents/core/agents/agents.py (1)
316-320: Audio track initialization from publishers follows the established pattern. The logic correctly initializes the audio track from the first audio publisher when present, maintaining a clear priority: realtime output → audio publisher → default edge track. This mirrors the video publisher approach.
Two refinements to consider:
- Multiple publishers: When multiple audio publishers exist, only the first is used silently. Consider logging a warning if len(self.audio_publishers) > 1 to help users debug unexpected behavior. (Note: video publishers have the same pattern at line 339.)
- Return value validation: The code assumes publish_audio_track() returns a valid audio track. Consider adding a basic check:
 elif self.audio_publishers:
     # Get the first audio publisher to create the track
     audio_publisher = self.audio_publishers[0]
+    if len(self.audio_publishers) > 1:
+        self.logger.warning(
+            f"Multiple audio publishers found ({len(self.audio_publishers)}), using first: {type(audio_publisher).__name__}"
+        )
     self._audio_track = audio_publisher.publish_audio_track()
+    if not self._audio_track:
+        raise ValueError(f"Audio publisher {type(audio_publisher).__name__} returned None from publish_audio_track()")
     self.logger.info("🎵 Audio track initialized from audio publisher")
These are optional improvements since the current implementation follows existing conventions.
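The two refinements can be tried in isolation with a standalone sketch. `select_audio_track` and `FakePublisher` are hypothetical names introduced for illustration; the realtime-output branch is omitted, and the sketch checks `is None` rather than truthiness, which is a small deviation from the suggested diff.

```python
import logging

logger = logging.getLogger("audio_track_selection")


def select_audio_track(publishers, default_factory):
    """Pick a track from the first publisher, else fall back to the default factory."""
    if not publishers:
        return default_factory()
    first = publishers[0]
    if len(publishers) > 1:
        # Make the "first one wins" rule visible instead of silent.
        logger.warning(
            "Multiple audio publishers found (%d), using first: %s",
            len(publishers),
            type(first).__name__,
        )
    track = first.publish_audio_track()
    if track is None:
        raise ValueError(
            f"Audio publisher {type(first).__name__} returned None from publish_audio_track()"
        )
    return track


class FakePublisher:
    """Hypothetical publisher that returns whatever track it was given."""

    def __init__(self, track):
        self._track = track

    def publish_audio_track(self):
        return self._track


fallback = select_audio_track([], lambda: "edge-track")
chosen = select_audio_track(
    [FakePublisher("avatar-track"), FakePublisher("other-track")],
    lambda: "edge-track",
)
```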
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
- uv.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
- agents-core/pyproject.toml (2 hunks)
- agents-core/vision_agents/core/agents/agents.py (3 hunks)
- pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- agents-core/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
agents-core/vision_agents/core/agents/agents.py
🧬 Code graph analysis (1)
agents-core/vision_agents/core/agents/agents.py (3)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)
- _attach_agent (117-129)
- publish_audio_track (109-115)
agents-core/vision_agents/core/llm/llm.py (1)
- _attach_agent (185-191)
agents-core/vision_agents/core/processors/base_processor.py (1)
- publish_audio_track (84-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Cursor Bugbot
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Ruff & mypy
🔇 Additional comments (3)
pyproject.toml (1)
24-24: Workspace integration verified and correct. The HeyGen plugin entries are correctly added to both [tool.uv.sources] (line 24) and [tool.uv.workspace] members (line 54), maintaining alphabetical ordering and following established patterns. The plugin directory structure is properly configured with a valid [project] section in its pyproject.toml.
agents-core/vision_agents/core/agents/agents.py (2)
184-190: Clean extension of audio publishing logic. The expanded publish_audio property correctly considers audio publishers (like the HeyGen avatar) alongside TTS and realtime mode. The docstring update accurately reflects this change and follows the Google style guide.
223-226: Processor attachment is correct and follows established patterns. The loop properly uses hasattr to guard against processors lacking the hook, and _attach_agent is safely called after LLM attachment. The HeyGen AvatarPublisher implementation is idempotent: multiple calls simply re-assign the agent and re-subscribe, causing no issues. The _subscribe_to_text_events method already includes defensive error handling.
The fail-fast behavior (exceptions in _attach_agent crash agent initialization) is a reasonable design choice that surfaces configuration errors early rather than silently masking them.
if hasattr(frame, 'to_ndarray'):
    audio_array = frame.to_ndarray()
    audio_bytes = audio_array.tobytes()
    await dest_track.write(audio_bytes)
Bug: Type Mismatch: Bytes Instead of PCM Data
The code calls await dest_track.write(audio_bytes) where audio_bytes is raw bytes from audio_array.tobytes(). However, according to the OutputAudioTrack protocol (agents-core/vision_agents/core/edge/types.py line 45), the write method expects PcmData, not raw bytes. This type mismatch will cause a runtime error when HeyGen audio forwarding is attempted with standard (non-Realtime) LLMs.
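A rough sketch of this kind of mismatch follows. `FakePcmData` and `FakeOutputTrack` are hypothetical stand-ins: the real PcmData fields and the real track's write() implementation are not shown in this review, so the sample rate, channel count, and 16-bit sample width are assumptions made only for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakePcmData:
    """Hypothetical stand-in for PcmData (real fields are not shown in the review)."""
    samples: bytes
    sample_rate: int
    channels: int


class FakeOutputTrack:
    """Hypothetical track whose write() needs PCM metadata, not just raw bytes."""

    def __init__(self):
        self.written = []

    async def write(self, data):
        # Raw bytes have no .channels or .sample_rate, so this raises AttributeError.
        bytes_per_frame = 2 * data.channels  # assumes 16-bit samples
        self.written.append((data.sample_rate, len(data.samples) // bytes_per_frame))


async def forward(track, raw, sample_rate=48000, channels=1, wrap=True):
    # Wrapping attaches the metadata the track expects; wrap=False mimics the reported bug.
    payload = FakePcmData(raw, sample_rate, channels) if wrap else raw
    await track.write(payload)


track = FakeOutputTrack()
asyncio.run(forward(track, b"\x00\x00" * 480))

try:
    asyncio.run(forward(track, b"\x00\x00" * 480, wrap=False))
    raw_error = None
except AttributeError as exc:
    raw_error = exc
```

The actual fix would construct PcmData the way the core package defines it; the sketch only shows why passing unwrapped bytes to a consumer that expects structured PCM data fails at call time.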
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1)
251-253: Remove commented code. The commented line #e = SessionUpdatedEvent(**event) should be removed entirely rather than left as dead code.
Apply this diff to clean up the commented code:
 elif et == "session.updated":
     pass
-    #e = SessionUpdatedEvent(**event)
♻️ Duplicate comments (1)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1)
149-222: Past issue not addressed: _all_sent_texts prevents repeated sentences across conversation turns. The past review flagged that self._all_sent_texts is never cleared, causing any sentence repeated in a later LLM turn to be silently dropped. This issue persists in lines 202-204, where _text_buffer and _current_response_id are reset but _all_sent_texts is not. Normal conversation patterns, like greeting with "Hi there!" multiple times, will break.
Apply the suggested fix from the past review:
 if item_id != self._current_response_id:
     if self._text_buffer:
         # Send any accumulated text from previous response
         text_to_send = self._text_buffer.strip()
         if text_to_send and text_to_send not in self._all_sent_texts:
             await self._send_text_to_heygen(text_to_send)
             self._all_sent_texts.add(text_to_send)
     self._text_buffer = ""
     self._current_response_id = item_id
+    self._all_sent_texts.clear()
Also consider clearing _all_sent_texts in the on_text_complete handler after sending all sentences (after line 200).
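The effect of clearing the set on a response boundary can be sketched in isolation. `SentenceDeduper` is a hypothetical helper written for illustration, not code from the plugin; it keeps the within-response deduplication while letting a later response repeat a sentence.

```python
class SentenceDeduper:
    """Hypothetical helper: drops duplicate sentences only within one response."""

    def __init__(self):
        self._current_response_id = None
        self._sent = set()
        self.delivered = []

    def submit(self, response_id, sentence):
        # A new response id starts a fresh dedupe window.
        if response_id != self._current_response_id:
            self._current_response_id = response_id
            self._sent.clear()
        text = sentence.strip()
        if not text or text in self._sent:
            return False
        self._sent.add(text)
        self.delivered.append(text)
        return True


deduper = SentenceDeduper()
first = deduper.submit("resp-1", "Hi there!")        # delivered
duplicate = deduper.submit("resp-1", "Hi there!")    # dropped: same response
next_turn = deduper.submit("resp-2", "Hi there!")    # delivered: new response
```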
🧹 Nitpick comments (4)
plugins/openai/vision_agents/plugins/openai/openai_realtime.py (2)
231-234: Clean up or document the commented flush call. The commented await self.output_track.flush() should either be removed if no longer needed or uncommented with documentation explaining why flushing the output track on speech start is necessary.
243-244: Removal of response.created handler verified as safe; no other code depends on it. The grep search found no other references to this event handler in the codebase. However, remove the commented code at lines 234 and 251-253; they're code smells that should be cleaned up rather than left commented out.
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (2)
264-295: Clarify error handling for frames without to_ndarray(). Line 285 logs a warning when a frame lacks the to_ndarray() method, but the code continues to the next iteration without explicitly breaking or continuing. Consider whether this case should continue to the next frame or break the loop entirely.
Apply this diff to make the intent explicit:
 if hasattr(frame, 'to_ndarray'):
     audio_array = frame.to_ndarray()
     audio_bytes = audio_array.tobytes()
     await dest_track.write(audio_bytes)
 else:
     logger.warning("Received frame without to_ndarray() method")
+    continue
371-392: Consider stopping the audio track for consistency. The close() method stops the video track but does not stop the audio track (_audio_track). For symmetry and complete cleanup, consider stopping the audio track as well.
Apply this diff to stop the audio track:
 async def close(self) -> None:
     """Clean up resources and close connections."""
     logger.info("Closing HeyGen avatar publisher")

     # Stop video track
     if self._video_track:
         self._video_track.stop()
+
+    # Stop audio track
+    if self._audio_track:
+        self._audio_track.stop()

     # Close RTC connection
     if self.rtc_manager:
         await self.rtc_manager.close()
📜 Review details
📒 Files selected for processing (3)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py (1 hunks)
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (1 hunks)
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- plugins/gemini/vision_agents/plugins/gemini/gemini_realtime.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/python.mdc)
**/*.py: Do not modify sys.path in Python code
Docstrings must follow the Google style guide
Files:
- plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py
- plugins/openai/vision_agents/plugins/openai/openai_realtime.py
🧬 Code graph analysis (1)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (6)
agents-core/vision_agents/core/processors/base_processor.py (3)
- AudioVideoProcessor (111-140)
- VideoPublisherMixin (78-80)
- AudioPublisherMixin (83-85)
plugins/heygen/vision_agents/plugins/heygen/heygen_rtc_manager.py (7)
- HeyGenRTCManager (19-267)
- set_video_callback (223-229)
- set_audio_callback (231-237)
- connect (60-145)
- send_text (239-249)
- is_connected (252-254)
- close (256-267)
plugins/heygen/vision_agents/plugins/heygen/heygen_types.py (1)
- VideoQuality (6-11)
plugins/heygen/vision_agents/plugins/heygen/heygen_video_track.py (4)
- HeyGenVideoTrack (14-187)
- start_receiving (48-66)
- recv (141-176)
- stop (178-187)
agents-core/vision_agents/core/llm/events.py (3)
- LLMResponseChunkEvent (87-102)
- LLMResponseCompletedEvent (106-112)
- RealtimeAgentSpeechTranscriptionEvent (148-153)
agents-core/vision_agents/core/agents/agents.py (3)
- subscribe (291-303)
- recv (951-952)
- close (572-644)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Cursor Bugbot
- GitHub Check: unit / Test "not integration"
- GitHub Check: unit / Test "not integration"
🔇 Additional comments (11)
plugins/heygen/vision_agents/plugins/heygen/heygen_avatar_publisher.py (11)
1-18: LGTM! Imports are well-organized and appropriate for the module's functionality.
20-107: Well-structured initialization. The class properly initializes all components including RTC manager, video track, and audio track. The docstring follows Google style guide conventions.
109-115: LGTM! Simple accessor method correctly returns the audio track for agent consumption.
117-129: LGTM! Agent attachment and event subscription are handled correctly.
131-147: LGTM! Connection setup with callbacks and error handling is well-implemented.
224-231: LGTM! Video track callback correctly delegates frame receiving to the HeyGenVideoTrack.
233-262: LGTM! Audio track handling correctly distinguishes between Realtime LLMs (video-only) and standard LLMs (video + audio forwarding).
297-320: LGTM! Text chunk accumulation logic correctly handles response boundaries and buffers text for sentence-based sending.
322-339: LGTM! Text sending with connection state validation and error handling is well-implemented.
341-355: LGTM! Lazy connection initialization and video track publishing are correctly implemented.
357-369: LGTM! State reporting method provides appropriate visibility into publisher status.
Summary by CodeRabbit
Note
Introduces a HeyGen avatar plugin (AvatarPublisher with WebRTC session/RTC/video track), adds examples/tests and workspace wiring, and updates Agent to attach processors and publish audio via audio publishers.
- AvatarPublisher, HeyGenRTCManager, HeyGenSession, HeyGenVideoTrack, and VideoQuality in plugins/heygen/vision_agents/plugins/heygen/*.
- plugins/heygen/README.md, plugins/heygen/example/*) and unit tests (plugins/heygen/tests/*).
- plugins/heygen/pyproject.toml, example pyproject.toml).
- vision_agents/core/agents/agents.py: attach processors via _attach_agent; treat audio_publishers as a reason to publish_audio; in _prepare_rtc, initialize audio track from first audio_publisher when present.
- agents-core/pyproject.toml), workspace members (pyproject.toml, uv.lock), and AWS example lock.
- plugins/gemini/.../gemini_realtime.py: normalize MIME string.
- plugins/openai/.../openai_realtime.py: remove unused imports and minor event handling cleanup.
Written by Cursor Bugbot for commit 12cad15. This will update automatically on new commits.