feat: add Discussion TTS with per-agent voice assignment by wyuc · Pull Request #211 · THU-MAIC/OpenMAIC

wyuc · 2026-03-22T15:18:45Z

Summary

Add TTS (text-to-speech) support to the discussion phase, so that each agent speaks with a distinct voice during classroom discussions.

Per-agent voice config: Each agent can have a different TTS provider + voice, configured via the AgentBar voice picker with preview
Discussion TTS playback: New useDiscussionTTS hook manages per-segment audio queue with ordered playback
Bubble hold: StreamBuffer waits for TTS audio to finish before advancing to next segment/agent
Audio indicator: Equalizer bar animation on roundtable bubble (amber = generating, agent color = playing)
Cross-provider: Agents can use different TTS providers (e.g., teacher on Qwen, student on OpenAI)
LLM voice selection: Auto-generated agents get voice matching their persona via LLM
Settings simplification: TTS settings reduced to toggle + provider config; voice config moved to AgentBar
Playback speed: Discussion TTS respects speed setting, switchable in real-time
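
The ordered per-segment queue described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `useDiscussionTTS` code: `SegmentAudioQueue`, `Synthesize`, `Play`, and `busy` are hypothetical names, and synthesis and playback are injected as plain async functions so the sketch runs outside a browser.

```typescript
// Sketch only: names and signatures here are hypothetical, not the PR's code.
type Segment = { id: number; agentId: string; text: string };
type Synthesize = (seg: Segment) => Promise<string>; // resolves to an audio URL or handle
type Play = (audio: string) => Promise<void>; // resolves when playback ends

class SegmentAudioQueue {
  private queue: Promise<string>[] = [];
  private playing = false;

  constructor(
    private readonly synthesize: Synthesize,
    private readonly play: Play,
  ) {}

  // Start synthesis right away, but keep playback in enqueue order.
  enqueue(seg: Segment): void {
    const audio = this.synthesize(seg);
    audio.catch(() => {}); // avoid an unhandled rejection while the item waits its turn
    this.queue.push(audio);
    void this.drain();
  }

  // True while anything is playing or waiting; a buffer could hold the bubble on this.
  get busy(): boolean {
    return this.playing || this.queue.length > 0;
  }

  private async drain(): Promise<void> {
    if (this.playing) return;
    this.playing = true;
    try {
      while (this.queue.length > 0) {
        const audio = this.queue.shift() as Promise<string>;
        try {
          await this.play(await audio);
        } catch {
          // Skip a failed segment instead of stalling the rest of the queue.
        }
      }
    } finally {
      this.playing = false;
    }
  }
}
```

Checking both the playing flag and the queue length in `busy` mirrors the later review fix where `shouldHold` checks queue length in addition to `isPlayingRef`.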

Files changed (17 files, +1027 -597)

Area	Files
Core	`lib/hooks/use-discussion-tts.ts` (new), `lib/audio/voice-resolver.ts` (new), `lib/buffer/stream-buffer.ts`
UI	`components/agent/agent-bar.tsx`, `components/roundtable/audio-indicator.tsx` (new), `components/roundtable/index.tsx`
Integration	`components/stage.tsx`, `components/chat/chat-area.tsx`, `components/chat/use-chat-sessions.ts`
Settings	`components/settings/audio-settings.tsx`, `components/generation/media-popover.tsx`, `components/canvas/canvas-toolbar.tsx`
Data	`lib/orchestration/registry/types.ts`, `lib/orchestration/registry/store.ts`
Generation	`app/api/generate/agent-profiles/route.ts`, `app/generation-preview/page.tsx`

Tracking: #39, #27, #109

Test plan

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…dtable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace native select styling with a compact rounded-full pill that blends into the agent row. Remove border, use muted bg, smaller text, and a custom chevron icon. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace native <select> with shadcn Select component for consistent UI
- Hide voice dropdown when TTS is muted (ttsMuted)
- Compact pill-style trigger with rounded-full, no border, muted bg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Show "音色: Alloy" instead of plain "Alloy" in the voice pill. Always show dropdown regardless of mute state (voice config is independent of playback). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Show a small Volume2 icon in the collapsed pill when voice config is available, hinting that voice settings are inside. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Move voice pill below agent name (second line) to prevent horizontal overflow in English
- Wrap Select in div with onPointerDown stopPropagation to fix Radix click-through to parent row
- Add line-clamp-1 to descriptions for consistent row height
- Use items-start instead of items-center for better multi-line alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Single-line layout: checkbox · avatar · name · role · voice pill
- Remove descriptions from agent rows (saves vertical space)
- Extract AgentVoicePill component to isolate Select event handling
- Smaller avatars (size-7), tighter row padding (py-1.5)
- Voice pill uses Volume2 icon + voice name (no prefix text)
- Works in both Chinese and English without overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Change voiceConfig from per-provider lookup to explicit { providerId, voiceId } per agent
- Each agent can use a different TTS provider's voice
- Voice picker dropdown groups voices by provider
- useDiscussionTTS routes TTS requests per agent's provider
- resolveAgentVoice falls back to global provider if no config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Give role badge fixed width (w-14 text-right) so role text and voice pills align vertically across all rows regardless of agent name length. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Wrap role badge + voice pill in a fixed-width container (w-[9.5rem] justify-end) so both align vertically across all agent rows regardless of name or role text length. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add min-w-[52px] text-right to role badge so it starts at a consistent position regardless of agent name length. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tal is open

- Replace Radix Select with Popover + button list (fixes click issue)
- Fix getAvailableProvidersWithVoices to always include global provider
- Widen panel from w-80 to w-96 (prevents name truncation)
- Voice pill uses primary color instead of gray (more visible)
- Extract renderAgentRow helper to reduce duplication
- Popover shows voices grouped by provider with active state
- Add findVoiceDisplayName utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…as no voices

Voice resolution now only depends on available providers (those with API keys or server-configured). No more globalProviderId parameter. Fallback is first available provider, then browser-native-tts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
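
A rough sketch of the fallback chain this commit describes: saved per-agent choice, then the first available provider, then browser-native TTS. The type shapes, the function name `resolveVoiceSketch`, the `"default"` voice id, and the index-modulo pick are assumptions made for illustration; only the `browser-native-tts` identifier and the fallback order come from the PR text.

```typescript
// Hypothetical shapes; the real registry types live in the PR's source.
type VoiceConfig = { providerId: string; voiceId: string };
type ProviderVoices = { providerId: string; voiceIds: string[] };

const BROWSER_NATIVE_TTS = "browser-native-tts";

function resolveVoiceSketch(
  saved: VoiceConfig | undefined,
  agentIndex: number,
  available: ProviderVoices[],
): VoiceConfig {
  // 1. Keep an explicit per-agent choice if its provider still offers that voice.
  if (saved) {
    const provider = available.find((p) => p.providerId === saved.providerId);
    if (provider && provider.voiceIds.includes(saved.voiceId)) return saved;
  }
  // 2. Otherwise use the first available provider that has voices.
  //    Picking by agent index keeps the result deterministic (assumed scheme).
  const first = available.find((p) => p.voiceIds.length > 0);
  if (first) {
    const voiceId = first.voiceIds[agentIndex % first.voiceIds.length];
    return { providerId: first.providerId, voiceId };
  }
  // 3. Last resort: browser-native TTS ("default" is a placeholder voice id).
  return { providerId: BROWSER_NATIVE_TTS, voiceId: "default" };
}
```

Because the function takes the available providers as an argument, there is no separate global provider parameter to keep in sync, which matches the intent stated in the commit message.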

Load speechSynthesis.getVoices() in AgentBar and include as a "Browser Native" provider group in the voice popover. No API key needed - always available if browser supports it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Toolbar:
- Replace volume slider with simple TTS on/off toggle button
- Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
- Toggle now controls ttsEnabled (not ttsMuted)

AgentBar:
- Collapsed: show VolumeX icon when TTS disabled
- Voice pills show disabled state (gray, cursor-not-allowed, no popover)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove voice selection, speed slider, preview/test, Azure locale filter from Settings TTS tab. Voice is now per-agent in AgentBar. Keep: on/off toggle, provider selector, API key + base URL config. Add hint text pointing to AgentBar for voice configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When buffer drains (text=null) but audio indicator is still active, don't clear liveSpeech. Clear it only when audio state goes idle. This keeps the speech bubble visible until TTS finishes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When StreamBuffer fires the done signal (onLiveSpeech null), Stage now checks if TTS is still playing. If so, it defers clearing the bubble state. The bubble stays visible until onAllAudioEnd fires from the TTS hook (queue empty + nothing playing), then clears. This prevents the jarring UX where the bubble disappears while the agent's voice is still audible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Root cause: bubble disappears because doSessionCleanup fires via onStopSession when the agent loop ends naturally, NOT because of onLiveSpeech(null, null). Fix: when onStopSession fires and TTS is still playing, defer doSessionCleanup to onAllAudioEnd callback. Manual stop (user presses button) still cleans up immediately via handleStopDiscussion. Use doSessionCleanupRef to avoid circular dependency between discussionTTS hook and doSessionCleanup useCallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two paths clear the bubble:
1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
2. onStopSession → doSessionCleanup → clears all state

Both fire when agent loop ends. Path 1 fires first (tick loop), path 2 fires after (waitUntilDrained resolves). Both must be guarded when TTS is still playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
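
One way to picture guarding both clear paths is a small helper that defers any clear request while TTS is active and flushes it when all audio ends. This is a sketch under assumptions: `DeferredClear` and its method names are hypothetical, and the PR itself wires the equivalent logic into Stage through refs and the `onAllAudioEnd` callback.

```typescript
// Hypothetical helper; the PR implements this inside Stage with refs and callbacks.
class DeferredClear {
  private pending: Array<() => void> = [];

  constructor(private readonly isTtsActive: () => boolean) {}

  // Both clear paths (buffer done, session stop) go through here.
  request(clear: () => void): void {
    if (this.isTtsActive()) {
      this.pending.push(clear); // defer until audio ends
    } else {
      clear();
    }
  }

  // Call when the TTS queue is empty and nothing is playing.
  onAllAudioEnd(): void {
    const toRun = this.pending;
    this.pending = [];
    for (const clear of toRun) clear();
  }

  // Manual stop: drop deferred work and clean up immediately.
  forceStop(cleanup: () => void): void {
    this.pending = [];
    cleanup();
  }
}
```

Routing both paths through one gate means the order in which they fire (path 1 first, path 2 after) no longer matters: neither can clear the bubble while audio is still playing.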

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Client sends available voices (providerId + voiceId + name) to /api/generate/agent-profiles
- LLM prompt asks to pick a voice matching each agent's personality
- Parse "providerId::voiceId" from response, save as voiceConfig
- Fallback to index-based assignment if LLM doesn't pick
- Browser native voices hidden when server providers are available
- saveGeneratedAgents accepts and persists voiceConfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
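
The parse-then-fallback step could look something like the sketch below. The `providerId::voiceId` format and the index-based fallback come from the commit message; the function name `pickVoice`, the type shapes, and the choice to validate the LLM's answer against the offered list are assumptions for illustration.

```typescript
// Hypothetical shapes and names; only the "providerId::voiceId" format is from the PR.
type VoiceOption = { providerId: string; voiceId: string; name: string };
type VoiceConfig = { providerId: string; voiceId: string };

function pickVoice(
  raw: string | undefined,
  voices: VoiceOption[],
  agentIndex: number,
): VoiceConfig | undefined {
  if (voices.length === 0) return undefined;

  if (raw) {
    const sep = raw.indexOf("::");
    if (sep > 0) {
      const providerId = raw.slice(0, sep).trim();
      const voiceId = raw.slice(sep + 2).trim();
      // Only accept a choice that was actually offered to the LLM.
      const offered = voices.some(
        (v) => v.providerId === providerId && v.voiceId === voiceId,
      );
      if (offered) return { providerId, voiceId };
    }
  }

  // Fallback: spread agents across the offered voices by index.
  const fallback = voices[agentIndex % voices.length];
  return { providerId: fallback.providerId, voiceId: fallback.voiceId };
}
```

Validating against the offered list guards against the model inventing a voice id, in which case the agent still gets a usable voice from the index-based fallback.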

Revert the toolbar simplification from 36e3997 that replaced the volume slider with a TTS on/off toggle. The volume control with hover slider is a core classroom UX. TTS on/off is controlled via Settings and Media popover instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ceConfig override

- Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
- Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
- Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
- Issue 6: shouldHold now checks queue length in addition to isPlayingRef
- Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cosarah

Code Review — feat: add Discussion TTS with per-agent voice assignment

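Overall assessment

A very nice PR with a clear architecture and complete functionality. The per-agent voice design is sound: voiceConfig is persisted to the agent registry, with a fallback to deterministic index-based selection. The useDiscussionTTS hook integrates cleanly with StreamBuffer's shouldHoldAfterReveal. Slimming down the Settings page and moving voice configuration to AgentBar is the right direction.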

Strengths

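Clear architectural layering: voice-resolver.ts encapsulates voice resolution, useDiscussionTTS encapsulates the queue and playback, and audio-indicator.tsx encapsulates the visualization. Each layer has a well-defined responsibility.
StreamBuffer hold mechanism: waiting for TTS is implemented through the shouldHoldAfterReveal callback, which does not intrude on the buffer's core logic and adds only one optional callback. Very clean.
Real-time playback speed sync: a useEffect watches playbackSpeed and syncs it directly to audioRef.current.playbackRate, which makes for a good user experience.
Complete i18n coverage: all new strings have both Chinese and English versions.
Settings simplification: audio-settings.tsx drops ~400 lines of redundant voice picker UI and points users to AgentBar instead, reducing the maintenance surface.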

Issues

Important

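The server TTS request in handlePreview lacks an abort mechanism
- components/agent/agent-bar.tsx:98-124
- When the user switches voice previews quickly, stopPreview() only stops browser TTS and any Audio already created; the server fetch request is not aborted. On a slow network, multiple fetches can run concurrently, and an older response may overwrite newer state
- Suggestion: add an AbortController and abort it in stopPreview
Recursion risk in the processQueue error handler
- lib/hooks/use-discussion-tts.ts:150-163
- Both audio.error and the catch block call processQueueRef.current(). If several consecutive queue items trigger errors (for example, an invalid API key), this forms a rapid chain of recursive calls
- Suggestion: defer the processQueueRef.current() call with queueMicrotask or setTimeout(…, 0) to avoid a stack overflow from synchronous recursion
voiceConfig validation in resolveAgentVoice is too strict
- lib/audio/voice-resolver.ts:22-26
- getServerVoiceList returns an empty array for browser-native-tts, so an agent configured with a browser-native-tts voice falls back to deterministic selection and loses the user's configuration
- Suggestion: for browser-native-tts, return voiceConfig directly without going through getServerVoiceList validation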
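
The abort suggestion in the first issue can be sketched as below. This is an illustration only: `PreviewController` and `FetchAudio` are hypothetical names, and the fetcher is injected so the sketch runs without a network. In the browser, the fetcher would typically pass the signal through to `fetch(url, { signal })`.

```typescript
// Hypothetical fetcher signature; in a browser this would wrap fetch(url, { signal }).
type FetchAudio = (voiceId: string, signal: AbortSignal) => Promise<string>;

class PreviewController {
  private current: AbortController | null = null;

  constructor(private readonly fetchAudio: FetchAudio) {}

  // Resolves to the audio for this request, or null if it was superseded or stopped.
  async preview(voiceId: string): Promise<string | null> {
    this.stop(); // abort any in-flight request first
    const controller = new AbortController();
    this.current = controller;
    try {
      const audio = await this.fetchAudio(voiceId, controller.signal);
      // Even if the fetcher ignored the signal, never surface a stale response.
      return controller.signal.aborted ? null : audio;
    } catch (err) {
      if (controller.signal.aborted) return null; // expected: request was cancelled
      throw err;
    } finally {
      if (this.current === controller) this.current = null;
    }
  }

  stop(): void {
    this.current?.abort();
    this.current = null;
  }
}
```

Checking `signal.aborted` after the await covers the case where a slow response arrives after a newer preview has started, which is the stale-state scenario the review describes.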

Minor

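Roundtable calls useAgentRegistry.getState() directly during render
- components/roundtable/index.tsx:1000-1004, components/roundtable/index.tsx:504-506
- Several places call useAgentRegistry.getState().getAgent(…) directly inside JSX render functions, which does not trigger a re-render. It happens to work today because props passed in from the parent component trigger re-renders, but it is not robust
- Suggestion: use useAgentRegistry((s) => s.getAgent(id)) or resolve the agent ahead of time at the top of the component
agentIndexMap may hold a stale reference
- lib/hooks/use-discussion-tts.ts:57-62
- agentIndexMap is a ref that is updated via useEffect. If resolveVoiceForAgent is called in the same render cycle after agents changes, it may read the old map
- The impact is small (the agents list rarely changes), but consider replacing it with useMemo
Preview text in AgentVoicePill is hardcoded
- components/agent/agent-bar.tsx:82-83
- 'Welcome to AI Classroom' and '欢迎来到AI课堂' are hardcoded and bypass i18n
- Suggestion: move them into the i18n strings
The onSegmentSealed callback in sealLastText uses this.currentAgentId
- lib/buffer/stream-buffer.ts:420
- If the seal happens after agent_end (a push ordering issue), currentAgentId may already have changed. In the current push flow, sealLastText is called before pushAgentEnd, so this is fine, but the implicit dependency is not obvious
- Suggestion: add a comment documenting the seal ordering invariant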

Assessment

Ready to merge: With fixes

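The core architecture is solid. Important #1 (preview abort) and #2 (recursion risk) should be fixed before merging. #3 can be handled as a follow-up. The minor issues do not affect functional correctness.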

1. Add AbortController to voice preview server TTS fetch, abort on stopPreview to prevent stale responses on rapid switching
2. Use queueMicrotask for processQueue calls in error/ended handlers to prevent synchronous recursion if multiple items fail consecutively
3. Add ordering invariant comment on sealLastText's onSegmentSealed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah

Re-review after latest updates

Good improvements since the last round — preview abort controller, queueMicrotask for error recovery, volume sync, teacher voice pill, buffer-level pause with spacebar shortcut, and the sealLastText ordering comment. Here's where things stand.

Fixed since last review

Preview abort: previewAbortRef added to AgentVoicePill, stopPreview aborts in-flight fetch.
Recursive queue drain: error/ended handlers use queueMicrotask() instead of direct calls.
Teacher voice pill: Teacher row in AgentBar now renders AgentVoicePill.
sealLastText ordering: Comment explains why this.currentAgentId is safe.
Volume sync: useDiscussionTTS respects ttsVolume/ttsMuted changes in real-time.
Buffer-level pause: pauseActiveLiveBuffer/resumeActiveLiveBuffer with livePausedRef sticky intent that survives buffer recreation across turns. Spacebar shortcut in Roundtable is a nice UX touch.
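
To illustrate why the queue drain fix matters, the sketch below simulates a queue in which every item fails synchronously and measures how deeply the handler nests. It is not the hook's code: `drainFailingQueue`, `callDirectly`, and `callDeferred` are hypothetical names, and the depth counter exists only for demonstration.

```typescript
// Hypothetical failing queue: every item "errors" synchronously, as with an invalid API key.
function drainFailingQueue(
  itemCount: number,
  schedule: (fn: () => void) => void,
  onDone: (maxDepth: number) => void,
): void {
  let remaining = itemCount;
  let depth = 0;
  let maxDepth = 0;

  const processNext = (): void => {
    depth += 1;
    maxDepth = Math.max(maxDepth, depth);
    if (remaining === 0) {
      onDone(maxDepth);
    } else {
      remaining -= 1;
      schedule(processNext); // the error handler moves on to the next item
    }
    depth -= 1;
  };

  processNext();
}

// Direct call: each failure nests one frame deeper.
const callDirectly = (fn: () => void): void => fn();
// Deferred call: the current frame unwinds before the next item starts.
const callDeferred = (fn: () => void): void => queueMicrotask(fn);
```

With `callDirectly`, the reported depth grows with the number of failing items; with `callDeferred`, each call returns before the next begins, so the depth stays flat regardless of queue length.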

Remaining issues

Worth documenting / deciding on

Browser-native TTS is invisible when any server provider is configured

Two things going on here:

In agent-bar.tsx:266-279, availableProviders is built with an either/or approach — if getAvailableProvidersWithVoices() returns any server providers, browser-native voices are excluded entirely. Users can't pick a browser voice for any agent as long as they have at least one server TTS provider configured with an API key. Browser voices only appear as a fallback when zero server providers are available.

Separately, in voice-resolver.ts:21-26, resolveAgentVoice validates a saved voiceConfig by checking getServerVoiceList(providerId), which returns [] for browser-native-tts (browser voices are dynamic, not in the static registry). So if a user previously selected a browser voice (while no server providers were configured), then later adds a server provider, the saved browser voiceConfig silently fails validation and falls through to the deterministic server-voice fallback.

Not necessarily a bug if the intent is "browser-native is purely a degraded fallback", but worth calling out since the behavior is non-obvious. If mixed mode (some agents on server TTS, some on browser) should be supported in the future, both places need changes.
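
If mixed mode were ever supported, the validation side might look roughly like this sketch, which follows the earlier review suggestion of skipping the static-registry check for browser-native configs. `isSavedVoiceUsable`, `getServerVoiceListStub`, and the sample registry contents are hypothetical stand-ins, not the code in voice-resolver.ts.

```typescript
type VoiceConfig = { providerId: string; voiceId: string };

const BROWSER_NATIVE_TTS = "browser-native-tts";

// Stand-in for the static registry lookup; returns [] for providers it does not list,
// which includes browser-native-tts because browser voices are dynamic.
function getServerVoiceListStub(providerId: string): string[] {
  const registry: Record<string, string[]> = {
    "provider-a": ["voice-1", "voice-2"], // sample data for illustration
  };
  return registry[providerId] ?? [];
}

function isSavedVoiceUsable(config: VoiceConfig): boolean {
  // Browser voices are not in the static registry, so trust the saved choice
  // instead of letting it fail validation silently.
  if (config.providerId === BROWSER_NATIVE_TTS) return true;
  return getServerVoiceListStub(config.providerId).includes(config.voiceId);
}
```

This addresses only the resolver half of the behavior described above; the either/or construction of availableProviders in the picker would need a matching change.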

Minor

useAgentRegistry.getState() called inside render bodies
- components/roundtable/index.tsx — multiple places (AudioIndicator color, HoverCard content, student loop, ProactiveCard)
- getState() reads imperatively without subscribing — works today because parent prop changes trigger re-renders, but would break if those subtrees get memoized later. Not urgent since agent config rarely changes mid-session.
Preview text not i18n'd
- components/agent/agent-bar.tsx:83-86 — hardcoded 'Welcome to AI Classroom' / '欢迎来到AI课堂' with a direct localStorage read for generationLanguage. Bypasses the i18n system.
agentIndexMap ref could go stale within a render
- lib/hooks/use-discussion-tts.ts:57-62 — ref updated via useEffect (runs after render). If agents changes and resolveVoiceForAgent fires in the same render cycle, it reads the old map. Unlikely in practice since agents rarely change, but useMemo would be strictly correct.

Verdict

Ready to merge. The browser-native TTS behavior (#1) is worth a design decision but isn't blocking — it works fine as a fallback-only mode, just needs to be an intentional choice rather than an accident. The rest are minor cleanup items for follow-ups.

… loop

Teacher voice pill now reads/writes global ttsProviderId + ttsVoice (same settings used by lecture TTS). This ensures lecture and discussion always use the same teacher voice. Student agents still use per-agent voiceConfig. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each avatar now has a one-line description (appearance, vibe) sent to the agent-profiles generation API. LLM picks avatars matching agent personality instead of guessing from file paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah

LGTM

* feat(tts): add resolveVoice() and getServerVoiceList() utilities
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tts): add AudioIndicator equalizer bars component
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tts): add onSegmentSealed callback to StreamBuffer
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tts): add voiceOverrides field to AgentConfig and AgentTemplate
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tts): add useDiscussionTTS hook with audio queue and cleanup
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): add audio state indicator to Roundtable bubble
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(tts): wire onSegmentSealed callback through chat sessions
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): add per-agent voice dropdown to AgentBar
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): integrate useDiscussionTTS in Stage and pass state to Roundtable
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(tts): refine voice dropdown to pill-style selector
  Replace native select styling with a compact rounded-full pill that blends into the agent row. Remove border, use muted bg, smaller text, and a custom chevron icon.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(tts): use shadcn Select for voice dropdown, link with TTS toggle
  - Replace native <select> with shadcn Select component for consistent UI
  - Hide voice dropdown when TTS is muted (ttsMuted)
  - Compact pill-style trigger with rounded-full, no border, muted bg
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(tts): add voice label prefix and always show dropdown
  Show "音色: Alloy" instead of plain "Alloy" in the voice pill. Always show dropdown regardless of mute state (voice config is independent of playback).
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(tts): add volume icon hint in collapsed AgentBar
  Show a small Volume2 icon in the collapsed pill when voice config is available, hinting that voice settings are inside.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): fix voice dropdown layout and click handling
  - Move voice pill below agent name (second line) to prevent horizontal overflow in English
  - Wrap Select in div with onPointerDown stopPropagation to fix Radix click-through to parent row
  - Add line-clamp-1 to descriptions for consistent row height
  - Use items-start instead of items-center for better multi-line alignment
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(tts): redesign AgentBar voice layout for compactness
  - Single-line layout: checkbox · avatar · name · role · voice pill
  - Remove descriptions from agent rows (saves vertical space)
  - Extract AgentVoicePill component to isolate Select event handling
  - Smaller avatars (size-7), tighter row padding (py-1.5)
  - Voice pill uses Volume2 icon + voice name (no prefix text)
  - Works in both Chinese and English without overflow
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): cross-provider voice selection per agent
  - Change voiceConfig from per-provider lookup to explicit { providerId, voiceId } per agent
  - Each agent can use a different TTS provider's voice
  - Voice picker dropdown groups voices by provider
  - useDiscussionTTS routes TTS requests per agent's provider
  - resolveAgentVoice falls back to global provider if no config
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): align role badge and voice pill across agent rows
  Give role badge fixed width (w-14 text-right) so role text and voice pills align vertically across all rows regardless of agent name length.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): fix role badge and voice pill alignment
  Wrap role badge + voice pill in a fixed-width container (w-[9.5rem] justify-end) so both align vertically across all agent rows regardless of name or role text length.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style(tts): align role badge and voice pill across agent rows
  Add min-w-[52px] text-right to role badge so it starts at a consistent position regardless of agent name length.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): use fixed w-[60px] for role badge alignment
* fix(tts): use fixed w-[88px] for voice pill alignment
* fix(tts): prevent click-outside from closing AgentBar when Select portal is open
* fix(tts): comprehensive voice picker rewrite
  - Replace Radix Select with Popover + button list (fixes click issue)
  - Fix getAvailableProvidersWithVoices to always include global provider
  - Widen panel from w-80 to w-96 (prevents name truncation)
  - Voice pill uses primary color instead of gray (more visible)
  - Extract renderAgentRow helper to reduce duplication
  - Popover shows voices grouped by provider with active state
  - Add findVoiceDisplayName utility
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): align voice provider availability with toolbar logic
* fix(tts): fallback to first available provider when global provider has no voices
* refactor(tts): remove global provider fallback from voice resolution
  Voice resolution now only depends on available providers (those with API keys or server-configured). No more globalProviderId parameter. Fallback is first available provider, then browser-native-tts.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): add browser native TTS voices to agent voice picker
  Load speechSynthesis.getVoices() in AgentBar and include as a "Browser Native" provider group in the voice popover. No API key needed - always available if browser supports it.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): simplify toolbar TTS to on/off toggle, add disabled state
  Toolbar:
  - Replace volume slider with simple TTS on/off toggle button
  - Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
  - Toggle now controls ttsEnabled (not ttsMuted)
  AgentBar:
  - Collapsed: show VolumeX icon when TTS disabled
  - Voice pills show disabled state (gray, cursor-not-allowed, no popover)
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(tts): simplify Settings TTS tab to toggle + provider config
  Remove voice selection, speed slider, preview/test, Azure locale filter from Settings TTS tab. Voice is now per-agent in AgentBar. Keep: on/off toggle, provider selector, API key + base URL config. Add hint text pointing to AgentBar for voice configuration.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(tts): simplify media popover TTS tab to toggle only
* fix(tts): add voice config hint to media popover TTS tab
* feat(tts): add per-voice preview button in voice picker
  Each voice row in the popover has a small speaker icon button. Click to preview the voice with "欢迎来到AI课堂" / "Welcome to AI Classroom" (follows i18n). Browser native uses Web Speech API, server TTS calls /api/generate/tts. Click again or close popover to stop. Shows spinner while generating.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): preview text follows course language instead of UI language
* refactor(tts): redesign AgentBar expanded panel layout
  - Teacher always at top with voice pill (works in both modes)
  - Mode tabs moved below teacher
  - Auto mode: single compact row with shuffle icon + description
  - Max turns: compact inline row with smaller input
  - Preset mode: only student agents listed (teacher already above)
  - Remove large shuffle animation from auto mode
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(tts): merge max turns into teacher row
* refactor(tts): separate teacher row and max turns, use stepper UI
  - Teacher row: avatar + name + voice pill only
  - Max turns: bottom row with MessageSquare icon + compact stepper (minus/number/plus in a rounded pill)
  - Remove Input component dependency
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): increase voice pill contrast in dark mode
* fix(tts): make max turns input editable, tighten panel padding
* fix(tts): restore shuffle animation in auto mode (compact version)
* fix(tts): adjust auto mode text spacing and add voice auto-assign hint
* fix(tts): auto-close voice popover after selecting a voice
* fix(tts): increase auto mode vertical padding for better balance
* fix(tts): push auto mode text toward bottom with flex spacer
* fix(tts): reduce auto mode bottom padding
* feat(tts): wait for TTS audio to finish before next agent turn
  Add waitForDrain() to useDiscussionTTS that returns a promise resolving when the audio queue is empty. The agent loop in useChatSessions now awaits this after buffer drain, so the next agent's turn doesn't start until the current agent's TTS audio finishes playing.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): keep bubble visible while TTS audio is still playing
  When buffer drains (text=null) but audio indicator is still active, don't clear liveSpeech. Clear it only when audio state goes idle. This keeps the speech bubble visible until TTS finishes.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): hold discussion bubble until TTS audio finishes
  When StreamBuffer fires the done signal (onLiveSpeech null), Stage now checks if TTS is still playing. If so, it defers clearing the bubble state. The bubble stays visible until onAllAudioEnd fires from the TTS hook (queue empty + nothing playing), then clears. This prevents the jarring UX where the bubble disappears while the agent's voice is still audible.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): fix bubble hold - guard onStopSession instead of onLiveSpeech
  Root cause: bubble disappears because doSessionCleanup fires via onStopSession when the agent loop ends naturally, NOT because of onLiveSpeech(null, null). Fix: when onStopSession fires and TTS is still playing, defer doSessionCleanup to onAllAudioEnd callback. Manual stop (user presses button) still cleans up immediately via handleStopDiscussion. Use doSessionCleanupRef to avoid circular dependency between discussionTTS hook and doSessionCleanup useCallback.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): guard BOTH onLiveSpeech and onStopSession for bubble hold
  Two paths clear the bubble:
  1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
  2. onStopSession → doSessionCleanup → clears all state
  Both fire when agent loop ends. Path 1 fires first (tick loop), path 2 fires after (waitUntilDrained resolves). Both must be guarded when TTS is still playing.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): hold bubble during TTS playback and respect playback speed
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(tts): LLM picks voice matching agent persona during generation
  - Client sends available voices (providerId + voiceId + name) to /api/generate/agent-profiles
  - LLM prompt asks to pick a voice matching each agent's personality
  - Parse "providerId::voiceId" from response, save as voiceConfig
  - Fallback to index-based assignment if LLM doesn't pick
  - Browser native voices hidden when server providers are available
  - saveGeneratedAgents accepts and persists voiceConfig
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): restore volume slider in classroom toolbar
  Revert the toolbar simplification from 36e3997 that replaced the volume slider with a TTS on/off toggle. The volume control with hover slider is a core classroom UX. TTS on/off is controlled via Settings and Media popover instead.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): teacher uses global lecture voice in discussion when no voiceConfig override
* fix(tts): teacher always uses global lecture voice, no overrides
* fix(tts): sync playback speed to currently playing audio in real-time
* fix(tts): address code review issues
  - Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
  - Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
  - Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
  - Issue 6: shouldHold now checks queue length in addition to isPlayingRef
  - Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(tts): address PR review — abort preview fetch, defer error recovery
  1. Add AbortController to voice preview server TTS fetch, abort on stopPreview to prevent stale responses on rapid switching
  2. Use queueMicrotask for processQueue calls in error/ended handlers to prevent synchronous recursion if multiple items fail consecutively
  3. Add ordering invariant comment on sealLastText's onSegmentSealed
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tts): restore teacher voice pill, respect voiceConfig override
* fix(tts): sync volume and mute to discussion TTS audio in real-time
* fix(tts): allow browser-native TTS alongside server providers
* fix(tts): remove top padding from voice popover content
* fix(tts): make selectedAgents reactive to voiceConfig changes
* fix(tts): use agents record instead of listAgents() to avoid infinite loop
* fix(tts): single source of truth for teacher voice
  Teacher voice pill now reads/writes global ttsProviderId + ttsVoice (same settings used by lecture TTS). This ensures lecture and discussion always use the same teacher voice. Student agents still use per-agent voiceConfig.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add avatar descriptions for smarter LLM avatar selection
  Each avatar now has a one-line description (appearance, vibe) sent to the agent-profiles generation API. LLM picks avatars matching agent personality instead of guessing from file paths.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: 杨慎 <117187635+cosarah@users.noreply.github.com>

wyuc and others added 30 commits March 21, 2026 13:47

feat(tts): add resolveVoice() and getServerVoiceList() utilities

27ac0fc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tts): add AudioIndicator equalizer bars component

eabaad2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tts): add onSegmentSealed callback to StreamBuffer

9053b38

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tts): add voiceOverrides field to AgentConfig and AgentTemplate

ea8c189

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tts): add useDiscussionTTS hook with audio queue and cleanup

984bdb3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(tts): add audio state indicator to Roundtable bubble

93e9542

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tts): wire onSegmentSealed callback through chat sessions

38ef134

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(tts): add per-agent voice dropdown to AgentBar

266e976

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
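A later review fix on this PR adds an `AbortController` to the voice preview fetch so that rapid voice switching cannot surface a stale response. The sketch below shows that general pattern under stated assumptions: `VoicePreview`, `PreviewFetcher`, and `startPreview` are hypothetical names (only `stopPreview` appears in the commit message), and the fetcher is injected rather than calling the real server TTS endpoint, which is not shown here.

```typescript
// Hypothetical stand-in for the server TTS preview request.
type PreviewFetcher = (voiceId: string, signal: AbortSignal) => Promise<string>;

class VoicePreview {
  private controller: AbortController | null = null;
  private fetchPreview: PreviewFetcher;

  constructor(fetchPreview: PreviewFetcher) {
    this.fetchPreview = fetchPreview;
  }

  // Resolves to an audio URL, or null if this preview was superseded or stopped.
  async startPreview(voiceId: string): Promise<string | null> {
    this.stopPreview(); // cancel any in-flight preview first
    const controller = new AbortController();
    this.controller = controller;
    try {
      const audioUrl = await this.fetchPreview(voiceId, controller.signal);
      // A fetcher that ignores the signal may still resolve late; drop that result.
      return controller.signal.aborted ? null : audioUrl;
    } catch (err) {
      if (controller.signal.aborted) return null; // expected cancellation
      throw err;
    } finally {
      if (this.controller === controller) this.controller = null;
    }
  }

  stopPreview(): void {
    this.controller?.abort();
    this.controller = null;
  }
}
```

In a browser the injected fetcher would typically pass `signal` through to `fetch(url, { signal })`, which rejects with an `AbortError` once the controller aborts.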

feat(tts): integrate useDiscussionTTS in Stage and pass state to Roundtable

8e4e964

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(tts): add voice label prefix and always show dropdown

a2e5124

Show "音色: Alloy" ("Voice: Alloy") instead of plain "Alloy" in the voice pill. Always show dropdown regardless of mute state (voice config is independent of playback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(tts): add volume icon hint in collapsed AgentBar

0ffdf49

Show a small Volume2 icon in the collapsed pill when voice config is available, hinting that voice settings are inside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(tts): align role badge and voice pill across agent rows

f6cc8b8

Give role badge fixed width (w-14 text-right) so role text and voice pills align vertically across all rows regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(tts): fix role badge and voice pill alignment

cbad4df

Wrap role badge + voice pill in a fixed-width container (w-[9.5rem] justify-end) so both align vertically across all agent rows regardless of name or role text length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style(tts): align role badge and voice pill across agent rows

0431589

Add min-w-[52px] text-right to role badge so it starts at a consistent position regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(tts): use fixed w-[60px] for role badge alignment

e10d0b0

fix(tts): use fixed w-[88px] for voice pill alignment

82bc273

fix(tts): prevent click-outside from closing AgentBar when Select portal is open

13cf0a2

fix(tts): align voice provider availability with toolbar logic

b0b9ba3

fix(tts): fallback to first available provider when global provider has no voices

77b2b5b
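The fallback described in this commit title can be sketched as follows. This is an illustration only, not the logic in `lib/audio/voice-resolver.ts`; `pickProvider` and the `ProviderVoices` shape are hypothetical, and "first available" is assumed here to mean the first provider, in insertion order, that has at least one voice.

```typescript
// Hypothetical map from provider id to the voice ids it currently offers.
type ProviderVoices = Record<string, string[]>;

function pickProvider(globalProviderId: string, voices: ProviderVoices): string | null {
  // Prefer the globally configured provider when it actually has voices.
  if ((voices[globalProviderId] ?? []).length > 0) return globalProviderId;
  // Otherwise fall back to the first provider that has at least one voice.
  for (const [providerId, list] of Object.entries(voices)) {
    if (list.length > 0) return providerId;
  }
  return null; // no provider has voices
}
```

Returning `null` rather than the empty global provider lets the caller decide how to handle the case where no provider is usable.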

refactor(tts): simplify media popover TTS tab to toggle only

27370b0

wyuc and others added 11 commits March 22, 2026 14:47

feat(tts): hold bubble during TTS playback and respect playback speed

963b2f2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
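The hold condition can be sketched from the two review fixes recorded in the squash message: the enabled flag checks `ttsEnabled && !ttsMuted`, and `shouldHold` checks the queue length in addition to the playing flag. The function signatures and the `TtsHoldState` shape below are assumptions, as is gating the hold on `enabled`; the real code keeps this state in refs inside the hook.

```typescript
// Hypothetical snapshot of the discussion TTS state.
interface TtsHoldState {
  enabled: boolean;     // see computeEnabled below
  isPlaying: boolean;   // an audio segment is currently playing
  queueLength: number;  // segments still waiting to play
}

// Discussion TTS is active only when it is switched on and not muted.
function computeEnabled(ttsEnabled: boolean, ttsMuted: boolean): boolean {
  return ttsEnabled && !ttsMuted;
}

// The bubble should wait while audio is playing or segments are still queued.
// Checking isPlaying alone would release the bubble in the gap between two segments.
function shouldHold(state: TtsHoldState): boolean {
  return state.enabled && (state.isPlaying || state.queueLength > 0);
}
```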

fix(tts): teacher uses global lecture voice in discussion when no voiceConfig override

0e61845

fix(tts): teacher always uses global lecture voice, no overrides

db18945

fix(tts): sync playback speed to currently playing audio in real-time

880190e
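One way to apply a speed change to audio that is already playing is to keep a reference to the current element and write its `playbackRate` whenever the setting changes. The sketch below assumes that approach; `SpeedSync` and `RateControllable` are hypothetical names, and the interface mirrors only the `playbackRate` property that `HTMLAudioElement` exposes.

```typescript
// Minimal shape shared with HTMLAudioElement; lets the sketch run outside a browser.
interface RateControllable {
  playbackRate: number;
}

class SpeedSync {
  private current: RateControllable | null = null;
  private speed = 1;

  // Call when a segment starts playing: it picks up the latest speed.
  attach(audio: RateControllable): void {
    audio.playbackRate = this.speed;
    this.current = audio;
  }

  // Call when the segment ends or is stopped.
  detach(): void {
    this.current = null;
  }

  // Call when the user changes the speed setting: applies to the playing audio at once.
  setSpeed(speed: number): void {
    this.speed = speed;
    if (this.current) this.current.playbackRate = speed;
  }
}
```

The same pattern would extend to volume and mute by also writing `volume` and `muted` on the current element.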

cosarah reviewed Mar 23, 2026


wyuc and others added 4 commits March 23, 2026 14:38

fix(tts): restore teacher voice pill, respect voiceConfig override

0e756f4

fix(tts): sync volume and mute to discussion TTS audio in real-time

f489d5f

Merge branch 'main' into feat/discussion-tts

0dfe405

cosarah reviewed Mar 23, 2026


wyuc and others added 7 commits March 23, 2026 16:33

fix(tts): allow browser-native TTS alongside server providers

6f84b5f

fix(tts): remove top padding from voice popover content

7292dc7

fix(tts): make selectedAgents reactive to voiceConfig changes

460cbf2

fix(tts): use agents record instead of listAgents() to avoid infinite loop

88fae23

Merge branch 'main' into feat/discussion-tts

a451872

cosarah approved these changes Mar 23, 2026


cosarah merged commit ddb5224 into main Mar 23, 2026
2 checks passed

cosarah merged 67 commits into main from feat/discussion-tts


2 participants


cosarah left a comment


Code Review — feat: add Discussion TTS with per-agent voice assignment

Overall assessment

Strengths

Issues

Important

Minor

Assessment


cosarah left a comment


Re-review after latest updates

Fixed since last review

Remaining issues

Worth documenting / deciding on

Minor

Verdict
