
feat: add Discussion TTS with per-agent voice assignment #211

Merged
cosarah merged 67 commits into main from feat/discussion-tts
Mar 23, 2026

@wyuc (Contributor) commented Mar 22, 2026

Summary

Add TTS (text-to-speech) support to the discussion phase so that each agent speaks with a distinct voice during classroom discussions.

  • Per-agent voice config: Each agent can have a different TTS provider + voice, configured via the AgentBar voice picker with preview
  • Discussion TTS playback: New useDiscussionTTS hook manages per-segment audio queue with ordered playback
  • Bubble hold: StreamBuffer waits for TTS audio to finish before advancing to next segment/agent
  • Audio indicator: Equalizer bar animation on roundtable bubble (amber = generating, agent color = playing)
  • Cross-provider: Agents can use different TTS providers (e.g., teacher on Qwen, student on OpenAI)
  • LLM voice selection: Auto-generated agents get voice matching their persona via LLM
  • Settings simplification: TTS settings reduced to toggle + provider config; voice config moved to AgentBar
  • Playback speed: Discussion TTS respects speed setting, switchable in real-time
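
As a rough illustration of the "ordered playback" and "bubble hold" bullets above, here is a minimal sketch of a per-segment queue. It is not the code in use-discussion-tts.ts: `SegmentQueue`, `Segment`, and the injected `play` callback are hypothetical names chosen for this example, and real audio playback is replaced by whatever `play` does.

```typescript
// Hypothetical sketch: a FIFO queue that plays one TTS segment at a time.
// None of these names come from the PR; `play` stands in for real audio playback.
interface Segment {
  agentId: string;
  text: string;
}

type PlayFn = (segment: Segment, onEnded: () => void) => void;

class SegmentQueue {
  private readonly pending: Segment[] = [];
  private playing = false;
  private readonly play: PlayFn;

  constructor(play: PlayFn) {
    this.play = play;
  }

  // Add a sealed segment and start playback if nothing is playing.
  enqueue(segment: Segment): void {
    this.pending.push(segment);
    this.processNext();
  }

  // True while audio is playing or still queued. A text buffer could
  // consult this before advancing to the next segment or agent.
  shouldHold(): boolean {
    return this.playing || this.pending.length > 0;
  }

  private processNext(): void {
    if (this.playing) return;
    const next = this.pending.shift();
    if (next === undefined) return;
    this.playing = true;
    this.play(next, () => {
      this.playing = false;
      this.processNext();
    });
  }
}
```

Checking both the playing flag and the queue length in `shouldHold` mirrors the later review fix in this PR ("shouldHold now checks queue length in addition to isPlayingRef").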

Files changed (17 files, +1027 -597)

| Area | Files |
| --- | --- |
| Core | lib/hooks/use-discussion-tts.ts (new), lib/audio/voice-resolver.ts (new), lib/buffer/stream-buffer.ts |
| UI | components/agent/agent-bar.tsx, components/roundtable/audio-indicator.tsx (new), components/roundtable/index.tsx |
| Integration | components/stage.tsx, components/chat/chat-area.tsx, components/chat/use-chat-sessions.ts |
| Settings | components/settings/audio-settings.tsx, components/generation/media-popover.tsx, components/canvas/canvas-toolbar.tsx |
| Data | lib/orchestration/registry/types.ts, lib/orchestration/registry/store.ts |
| Generation | app/api/generate/agent-profiles/route.ts, app/generation-preview/page.tsx |

Tracking: #39, #27, #109

Test plan

  • Preset mode: configure per-agent voices in AgentBar, verify each agent speaks with selected voice in discussion
  • Auto mode: generate a course, verify LLM assigns voices matching agent personas
  • Teacher voice: verify teacher uses the same voice in discussion as in lecture
  • Browser native TTS: verify it works when no server provider is configured
  • Audio indicator: verify amber bars during generation, agent-color bars during playback
  • Bubble hold: verify bubble stays until TTS finishes before switching to next segment
  • Speed control: change playback speed during discussion, verify audio speed changes immediately
  • TTS toggle: disable TTS in settings, verify no audio plays and voice pills show disabled state
  • Volume control: verify toolbar volume slider works during discussion playback
  • Preview: click speaker icon in voice picker, verify preview plays with course language text

🤖 Generated with Claude Code

wyuc and others added 30 commits March 21, 2026 13:47
…dtable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace native select styling with a compact rounded-full pill
that blends into the agent row. Remove border, use muted bg,
smaller text, and a custom chevron icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace native <select> with shadcn Select component for consistent UI
- Hide voice dropdown when TTS is muted (ttsMuted)
- Compact pill-style trigger with rounded-full, no border, muted bg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show "音色: Alloy" instead of plain "Alloy" in the voice pill.
Always show dropdown regardless of mute state (voice config is
independent of playback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Show a small Volume2 icon in the collapsed pill when voice config
is available, hinting that voice settings are inside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move voice pill below agent name (second line) to prevent
  horizontal overflow in English
- Wrap Select in div with onPointerDown stopPropagation to fix
  Radix click-through to parent row
- Add line-clamp-1 to descriptions for consistent row height
- Use items-start instead of items-center for better multi-line alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Single-line layout: checkbox · avatar · name · role · voice pill
- Remove descriptions from agent rows (saves vertical space)
- Extract AgentVoicePill component to isolate Select event handling
- Smaller avatars (size-7), tighter row padding (py-1.5)
- Voice pill uses Volume2 icon + voice name (no prefix text)
- Works in both Chinese and English without overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change voiceConfig from per-provider lookup to explicit
  { providerId, voiceId } per agent
- Each agent can use a different TTS provider's voice
- Voice picker dropdown groups voices by provider
- useDiscussionTTS routes TTS requests per agent's provider
- resolveAgentVoice falls back to global provider if no config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Give role badge fixed width (w-14 text-right) so role text
and voice pills align vertically across all rows regardless
of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wrap role badge + voice pill in a fixed-width container
(w-[9.5rem] justify-end) so both align vertically across
all agent rows regardless of name or role text length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add min-w-[52px] text-right to role badge so it starts at a
consistent position regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Radix Select with Popover + button list (fixes click issue)
- Fix getAvailableProvidersWithVoices to always include global provider
- Widen panel from w-80 to w-96 (prevents name truncation)
- Voice pill uses primary color instead of gray (more visible)
- Extract renderAgentRow helper to reduce duplication
- Popover shows voices grouped by provider with active state
- Add findVoiceDisplayName utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Voice resolution now only depends on available providers (those with
API keys or server-configured). No more globalProviderId parameter.
Fallback is first available provider, then browser-native-tts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Load speechSynthesis.getVoices() in AgentBar and include as a
"Browser Native" provider group in the voice popover. No API key
needed - always available if browser supports it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Toolbar:
- Replace volume slider with simple TTS on/off toggle button
- Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
- Toggle now controls ttsEnabled (not ttsMuted)

AgentBar:
- Collapsed: show VolumeX icon when TTS disabled
- Voice pills show disabled state (gray, cursor-not-allowed, no popover)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove voice selection, speed slider, preview/test, Azure locale
filter from Settings TTS tab. Voice is now per-agent in AgentBar.
Keep: on/off toggle, provider selector, API key + base URL config.
Add hint text pointing to AgentBar for voice configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
wyuc and others added 11 commits March 22, 2026 14:47
When buffer drains (text=null) but audio indicator is still active,
don't clear liveSpeech. Clear it only when audio state goes idle.
This keeps the speech bubble visible until TTS finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When StreamBuffer fires the done signal (onLiveSpeech null), Stage
now checks if TTS is still playing. If so, it defers clearing the
bubble state. The bubble stays visible until onAllAudioEnd fires
from the TTS hook (queue empty + nothing playing), then clears.

This prevents the jarring UX where the bubble disappears while
the agent's voice is still audible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: bubble disappears because doSessionCleanup fires via
onStopSession when the agent loop ends naturally, NOT because of
onLiveSpeech(null, null).

Fix: when onStopSession fires and TTS is still playing, defer
doSessionCleanup to onAllAudioEnd callback. Manual stop (user
presses button) still cleans up immediately via handleStopDiscussion.

Use doSessionCleanupRef to avoid circular dependency between
discussionTTS hook and doSessionCleanup useCallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two paths clear the bubble:
1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
2. onStopSession → doSessionCleanup → clears all state

Both fire when agent loop ends. Path 1 fires first (tick loop),
path 2 fires after (waitUntilDrained resolves). Both must be
guarded when TTS is still playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
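
The two-path guard described in the commit message above could look roughly like the sketch below. This is an assumption-laden illustration, not the code in stage.tsx: `BubbleHoldGuard` and its methods are hypothetical, and only `onAllAudioEnd` borrows its name from the commit messages.

```typescript
// Hypothetical sketch: defer bubble cleanup while TTS audio is still active.
class BubbleHoldGuard {
  private audioActive = false;
  private pending: Array<() => void> = [];

  setAudioActive(active: boolean): void {
    this.audioActive = active;
  }

  // Called by either clear path (buffer done, or natural session stop).
  requestCleanup(cleanup: () => void): void {
    if (this.audioActive) {
      this.pending.push(cleanup); // hold the bubble until audio ends
      return;
    }
    cleanup();
  }

  // Called when the queue is empty and nothing is playing.
  onAllAudioEnd(): void {
    this.audioActive = false;
    const toRun = this.pending;
    this.pending = [];
    for (const cleanup of toRun) cleanup();
  }

  // Manual stop by the user: drop deferred work and clean up immediately.
  forceCleanup(cleanup: () => void): void {
    this.pending = [];
    cleanup();
  }
}
```

Deferred cleanups run in the order they were requested, which matches the description above: path 1 (clear liveSpeech) fires before path 2 (full session cleanup).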
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Client sends available voices (providerId + voiceId + name) to
  /api/generate/agent-profiles
- LLM prompt asks to pick a voice matching each agent's personality
- Parse "providerId::voiceId" from response, save as voiceConfig
- Fallback to index-based assignment if LLM doesn't pick
- Browser native voices hidden when server providers are available
- saveGeneratedAgents accepts and persists voiceConfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
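
A minimal sketch of the parsing and fallback steps listed in the commit message above, assuming the LLM returns a plain "providerId::voiceId" string per agent. `parseVoicePick` and `assignVoice` are hypothetical helpers, not functions from the PR, and the provider and voice ids in the usage lines are made-up sample values.

```typescript
interface VoiceConfig {
  providerId: string;
  voiceId: string;
}

// Parse "providerId::voiceId". Returns null when the string is malformed
// or names a voice that was not in the list offered to the LLM.
function parseVoicePick(raw: string | undefined, offered: VoiceConfig[]): VoiceConfig | null {
  if (!raw) return null;
  const sep = raw.indexOf("::");
  if (sep <= 0) return null;
  const providerId = raw.slice(0, sep).trim();
  const voiceId = raw.slice(sep + 2).trim();
  if (!voiceId) return null;
  const match = offered.find((v) => v.providerId === providerId && v.voiceId === voiceId);
  return match ?? null;
}

// Use the LLM's pick when valid, otherwise fall back to index-based assignment.
function assignVoice(raw: string | undefined, agentIndex: number, offered: VoiceConfig[]): VoiceConfig | null {
  if (offered.length === 0) return null;
  return parseVoicePick(raw, offered) ?? offered[agentIndex % offered.length] ?? null;
}

// Sample data (hypothetical ids).
const sampleVoices: VoiceConfig[] = [
  { providerId: "provider-a", voiceId: "voice-1" },
  { providerId: "provider-b", voiceId: "voice-2" },
];
const picked = assignVoice("provider-b::voice-2", 0, sampleVoices);
const fallback = assignVoice("not a valid pick", 1, sampleVoices);
```

Validating the pick against the offered list guards against the LLM inventing a voice id that no provider serves.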
Revert the toolbar simplification from 36e3997 that replaced the
volume slider with a TTS on/off toggle. The volume control with
hover slider is a core classroom UX. TTS on/off is controlled via
Settings and Media popover instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
- Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
- Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
- Issue 6: shouldHold now checks queue length in addition to isPlayingRef
- Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cosarah (Collaborator) left a comment

Code Review — feat: add Discussion TTS with per-agent voice assignment

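Overall assessment

A very good PR with a clear architecture and complete functionality. The per-agent voice design is sound: voiceConfig is persisted to the agent registry, with a fallback to deterministic index-based selection. The useDiscussionTTS hook integrates cleanly with StreamBuffer's shouldHoldAfterReveal. Slimming down the Settings page and moving voice configuration into AgentBar is the right direction.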

Strengths

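  • Clear architectural layering: voice-resolver.ts encapsulates voice resolution, useDiscussionTTS encapsulates queueing and playback, and audio-indicator.tsx encapsulates the visualization. Each layer has a well-defined responsibility
  • StreamBuffer hold mechanism: TTS waiting is implemented through the shouldHoldAfterReveal callback, which does not intrude on the buffer's core logic and adds only one optional callback. Very clean
  • Real-time playback speed sync: a useEffect watches playbackSpeed and syncs it directly to audioRef.current.playbackRate, which makes for a good user experience
  • Complete i18n coverage: all new strings have both Chinese and English versions
  • Settings simplification: audio-settings.tsx drops ~400 lines of redundant voice picker UI in favor of a pointer to AgentBar, which reduces the maintenance surface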

Issues

Important

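  1. Server TTS requests in handlePreview have no abort mechanism

    • components/agent/agent-bar.tsx:98-124
    • When the user switches voice previews quickly, stopPreview() only stops browser TTS and any Audio already created; the server fetch request is not aborted. On a slow network, several fetches can run concurrently, and an older response may overwrite newer state
    • Suggestion: add an AbortController and abort it in stopPreview
  2. Recursion risk in the processQueue error handlers

    • lib/hooks/use-discussion-tts.ts:150-163
    • Both audio.error and the catch block call processQueueRef.current(). If several consecutive items in the queue fail (for example, because the API key is invalid), this forms a rapid chain of recursive calls
    • Suggestion: defer the processQueueRef.current() call with queueMicrotask or setTimeout(…, 0) to avoid a synchronous recursion stack overflow
  3. voiceConfig validation in resolveAgentVoice is too strict

    • lib/audio/voice-resolver.ts:22-26
    • getServerVoiceList returns an empty array for browser-native-tts, so an agent configured with a browser-native-tts voice falls back to deterministic selection and the user's configuration is lost
    • Suggestion: for browser-native-tts, return voiceConfig directly instead of validating it against getServerVoiceList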
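
To make the recursion concern in Important #2 concrete, the sketch below drains a queue in which every item fails and records how deep the call stack gets. It is an illustration under assumptions, not the hook's code: `drainFailingQueue` and the injectable `schedule` parameter are hypothetical, and a manual task list stands in for `queueMicrotask` so that the run is deterministic.

```typescript
type Schedule = (task: () => void) => void;

interface DrainStats {
  processed: number;
  maxDepth: number;
}

// Drain a queue in which every item fails, as it would with an invalid API key.
function drainFailingQueue(items: string[], schedule: Schedule): DrainStats {
  const queue = [...items];
  const stats: DrainStats = { processed: 0, maxDepth: 0 };
  let depth = 0;

  const processQueue = (): void => {
    const item = queue.shift();
    if (item === undefined) return;
    depth += 1;
    stats.maxDepth = Math.max(stats.maxDepth, depth);
    try {
      stats.processed += 1;
      throw new Error(`TTS failed for ${item}`);
    } catch {
      // Error recovery: move on to the next item through the scheduler.
      schedule(processQueue);
    } finally {
      depth -= 1;
    }
  };

  processQueue();
  return stats;
}

const failing = ["a", "b", "c", "d"];

// Direct call: each failure recurses into the next item on the same stack.
const directStats = drainFailingQueue(failing, (task) => task());

// Deferred call: tasks are collected and run after the current call returns.
// In application code, the standard global queueMicrotask could be passed instead.
const tasks: Array<() => void> = [];
const deferredStats = drainFailingQueue(failing, (task) => {
  tasks.push(task);
});
while (tasks.length > 0) {
  const task = tasks.shift();
  if (task) task();
}
```

With the direct scheduler the depth grows with the number of failing items; with the deferred one it stays at 1. Note that `queueMicrotask` bounds stack depth but still runs before the event loop regains control, whereas `setTimeout(…, 0)` also yields to rendering and input between items.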

Minor

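  1. Roundtable calls useAgentRegistry.getState() directly during render

    • components/roundtable/index.tsx:1000-1004, components/roundtable/index.tsx:504-506
    • useAgentRegistry.getState().getAgent(…) is called directly inside JSX render functions in several places, which does not trigger a re-render. It currently happens to work because props passed from the parent trigger re-renders, but it is not robust
    • Suggestion: use useAgentRegistry((s) => s.getAgent(id)) or resolve the agent up front at the top of the component
  2. agentIndexMap may hold a stale reference

    • lib/hooks/use-discussion-tts.ts:57-62
    • agentIndexMap is a ref updated via useEffect. If resolveVoiceForAgent is called in the same render cycle in which agents changes, it may read the old map
    • Low impact (the agents list changes infrequently), but consider replacing it with useMemo
  3. Preview text in AgentVoicePill is hardcoded

    • components/agent/agent-bar.tsx:82-83
    • 'Welcome to AI Classroom' and '欢迎来到AI课堂' are hardcoded and do not go through i18n
    • Suggestion: move them into the i18n strings
  4. The onSegmentSealed callback in sealLastText uses this.currentAgentId

    • lib/buffer/stream-buffer.ts:420
    • If a seal happens after agent_end (a push ordering problem), currentAgentId may already have changed. In the current push flow sealLastText is called before pushAgentEnd, so this is fine, but the implicit dependency is not obvious
    • Suggestion: add a comment documenting the seal ordering invariant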
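
The ordering invariant in Minor #4 can be shown with a toy buffer. This is a simplified sketch, not stream-buffer.ts: `MiniBuffer`, `pushAgentStart`, and `pushText` are hypothetical, while `sealLastText`, `pushAgentEnd`, `onSegmentSealed`, and `currentAgentId` reuse the names from the review. Here the seal is performed inside `pushAgentEnd` for brevity; the invariant is the same, namely that the seal runs before `currentAgentId` changes.

```typescript
type OnSegmentSealed = (agentId: string, text: string) => void;

class MiniBuffer {
  private currentAgentId: string | null = null;
  private lastText: string | null = null;
  private readonly onSegmentSealed: OnSegmentSealed;

  constructor(onSegmentSealed: OnSegmentSealed) {
    this.onSegmentSealed = onSegmentSealed;
  }

  pushAgentStart(agentId: string): void {
    this.currentAgentId = agentId;
  }

  pushText(text: string): void {
    this.lastText = (this.lastText ?? "") + text;
  }

  pushAgentEnd(): void {
    // Ordering invariant: seal first, while currentAgentId still names
    // the agent that produced the text, and only then clear it.
    this.sealLastText();
    this.currentAgentId = null;
  }

  private sealLastText(): void {
    if (this.lastText === null || this.currentAgentId === null) return;
    this.onSegmentSealed(this.currentAgentId, this.lastText);
    this.lastText = null;
  }
}

const sealed: string[] = [];
const buffer = new MiniBuffer((agentId, text) => {
  sealed.push(`${agentId}:${text}`);
});
buffer.pushAgentStart("teacher");
buffer.pushText("Hi ");
buffer.pushText("all");
buffer.pushAgentEnd();
buffer.pushAgentStart("student");
buffer.pushText("Question");
buffer.pushAgentEnd();
```

If the two statements in `pushAgentEnd` were swapped, `sealLastText` would see a null agent id and silently skip the segment, which is the kind of failure the suggested comment is meant to prevent.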

Assessment

Ready to merge: With fixes

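The core architecture is solid. Important #1 (preview abort) and #2 (recursion risk) should be fixed before merging. #3 can be handled as a follow-up. The Minor issues do not affect functional correctness.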

wyuc and others added 4 commits March 23, 2026 14:38
1. Add AbortController to voice preview server TTS fetch, abort on
   stopPreview to prevent stale responses on rapid switching
2. Use queueMicrotask for processQueue calls in error/ended handlers
   to prevent synchronous recursion if multiple items fail consecutively
3. Add ordering invariant comment on sealLastText's onSegmentSealed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cosarah (Collaborator) left a comment

Re-review after latest updates

Good improvements since the last round — preview abort controller, queueMicrotask for error recovery, volume sync, teacher voice pill, buffer-level pause with spacebar shortcut, and the sealLastText ordering comment. Here's where things stand.

Fixed since last review

  • Preview abort: previewAbortRef added to AgentVoicePill, stopPreview aborts in-flight fetch.
  • Recursive queue drain: error/ended handlers use queueMicrotask() instead of direct calls.
  • Teacher voice pill: Teacher row in AgentBar now renders AgentVoicePill.
  • sealLastText ordering: Comment explains why this.currentAgentId is safe.
  • Volume sync: useDiscussionTTS respects ttsVolume/ttsMuted changes in real-time.
  • Buffer-level pause: pauseActiveLiveBuffer/resumeActiveLiveBuffer with livePausedRef sticky intent that survives buffer recreation across turns. Spacebar shortcut in Roundtable is a nice UX touch.
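
For readers unfamiliar with the preview-abort fix, the pattern is roughly the following. This is a sketch under assumptions rather than the AgentVoicePill code: `PreviewRequests` is a hypothetical class, and only the idea of keeping the current `AbortController` and aborting it on stop comes from the review.

```typescript
// Hypothetical sketch: at most one preview request is live at any time.
class PreviewRequests {
  private current: AbortController | null = null;

  // Begin a new preview. Any in-flight preview is aborted first, so a slow
  // older response can no longer overwrite the state of a newer one.
  start(): AbortSignal {
    this.stop();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Abort the in-flight preview, if any.
  stop(): void {
    if (this.current) {
      this.current.abort();
      this.current = null;
    }
  }
}

const previews = new PreviewRequests();
const firstSignal = previews.start();
const secondSignal = previews.start(); // rapid switch: the first preview is aborted
```

The returned signal would be passed as the `signal` option of `fetch`. An aborted fetch rejects with an `AbortError`, which the caller should treat as a cancellation rather than as a failure to report.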

Remaining issues

Worth documenting / deciding on

  1. Browser-native TTS is invisible when any server provider is configured

    Two things going on here:

    In agent-bar.tsx:266-279, availableProviders is built with an either/or approach — if getAvailableProvidersWithVoices() returns any server providers, browser-native voices are excluded entirely. Users can't pick a browser voice for any agent as long as they have at least one server TTS provider configured with an API key. Browser voices only appear as a fallback when zero server providers are available.

    Separately, in voice-resolver.ts:21-26, resolveAgentVoice validates a saved voiceConfig by checking getServerVoiceList(providerId), which returns [] for browser-native-tts (browser voices are dynamic, not in the static registry). So if a user previously selected a browser voice (while no server providers were configured), then later adds a server provider, the saved browser voiceConfig silently fails validation and falls through to the deterministic server-voice fallback.

    Not necessarily a bug if the intent is "browser-native is purely a degraded fallback", but worth calling out since the behavior is non-obvious. If mixed mode (some agents on server TTS, some on browser) should be supported in the future, both places need changes.
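
    If mixed mode were supported, the validation step might look something like the sketch below. This is one possible shape, not the code in voice-resolver.ts: `resolveVoice`, the `ServerVoiceLookup` type, the `"default"` voice id, and the sample provider data are all hypothetical. Only the `browser-native-tts` id, the static-list validation, and the deterministic fallback come from the discussion above.

```typescript
interface VoiceConfig {
  providerId: string;
  voiceId: string;
}

const BROWSER_NATIVE = "browser-native-tts";

// Stand-in for the static registry lookup (hypothetical signature).
type ServerVoiceLookup = (providerId: string) => string[];

function resolveVoice(
  saved: VoiceConfig | undefined,
  agentIndex: number,
  availableProviders: string[],
  serverVoices: ServerVoiceLookup,
): VoiceConfig {
  if (saved) {
    // Browser voices are dynamic, so they cannot be checked against the
    // static registry; trust the saved config instead of discarding it.
    if (saved.providerId === BROWSER_NATIVE) return saved;
    const known = serverVoices(saved.providerId);
    if (availableProviders.includes(saved.providerId) && known.includes(saved.voiceId)) {
      return saved;
    }
  }
  // Deterministic fallback: first available provider that has voices,
  // with the voice chosen by agent index.
  for (const providerId of availableProviders) {
    const voices = serverVoices(providerId);
    if (voices.length === 0) continue;
    const voiceId = voices[agentIndex % voices.length];
    if (voiceId !== undefined) return { providerId, voiceId };
  }
  // Last resort: browser-native TTS with a placeholder voice id.
  return { providerId: BROWSER_NATIVE, voiceId: "default" };
}

// Sample lookup with made-up ids.
const sampleLookup: ServerVoiceLookup = (providerId) =>
  providerId === "provider-a" ? ["voice-1", "voice-2"] : [];
```

    The picker in agent-bar.tsx would need a matching change so that browser voices remain selectable when server providers exist; this sketch covers only the resolution side.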

Minor

  1. useAgentRegistry.getState() called inside render bodies

    • components/roundtable/index.tsx — multiple places (AudioIndicator color, HoverCard content, student loop, ProactiveCard)
    • getState() reads imperatively without subscribing — works today because parent prop changes trigger re-renders, but would break if those subtrees get memoized later. Not urgent since agent config rarely changes mid-session.
  2. Preview text not i18n'd

    • components/agent/agent-bar.tsx:83-86 — hardcoded 'Welcome to AI Classroom' / '欢迎来到AI课堂' with a direct localStorage read for generationLanguage. Bypasses the i18n system.
  3. agentIndexMap ref could go stale within a render

    • lib/hooks/use-discussion-tts.ts:57-62 — ref updated via useEffect (runs after render). If agents changes and resolveVoiceForAgent fires in the same render cycle, it reads the old map. Unlikely in practice since agents rarely change, but useMemo would be strictly correct.

Verdict

Ready to merge. The browser-native TTS behavior (#1) is worth a design decision but isn't blocking — it works fine as a fallback-only mode, just needs to be an intentional choice rather than an accident. The rest are minor cleanup items for follow-ups.

wyuc and others added 7 commits March 23, 2026 16:33
Teacher voice pill now reads/writes global ttsProviderId + ttsVoice
(same settings used by lecture TTS). This ensures lecture and
discussion always use the same teacher voice. Student agents still
use per-agent voiceConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each avatar now has a one-line description (appearance, vibe) sent
to the agent-profiles generation API. LLM picks avatars matching
agent personality instead of guessing from file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cosarah (Collaborator) left a comment

LGTM

@cosarah cosarah merged commit ddb5224 into main Mar 23, 2026
2 checks passed
ifishcool pushed a commit to ifishcool/Linksy that referenced this pull request Mar 24, 2026
* feat(tts): add resolveVoice() and getServerVoiceList() utilities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add AudioIndicator equalizer bars component

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add onSegmentSealed callback to StreamBuffer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add voiceOverrides field to AgentConfig and AgentTemplate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add useDiscussionTTS hook with audio queue and cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add audio state indicator to Roundtable bubble

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): wire onSegmentSealed callback through chat sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add per-agent voice dropdown to AgentBar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): integrate useDiscussionTTS in Stage and pass state to Roundtable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): refine voice dropdown to pill-style selector

Replace native select styling with a compact rounded-full pill
that blends into the agent row. Remove border, use muted bg,
smaller text, and a custom chevron icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): use shadcn Select for voice dropdown, link with TTS toggle

- Replace native <select> with shadcn Select component for consistent UI
- Hide voice dropdown when TTS is muted (ttsMuted)
- Compact pill-style trigger with rounded-full, no border, muted bg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add voice label prefix and always show dropdown

Show "音色: Alloy" instead of plain "Alloy" in the voice pill.
Always show dropdown regardless of mute state (voice config is
independent of playback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add volume icon hint in collapsed AgentBar

Show a small Volume2 icon in the collapsed pill when voice config
is available, hinting that voice settings are inside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix voice dropdown layout and click handling

- Move voice pill below agent name (second line) to prevent
  horizontal overflow in English
- Wrap Select in div with onPointerDown stopPropagation to fix
  Radix click-through to parent row
- Add line-clamp-1 to descriptions for consistent row height
- Use items-start instead of items-center for better multi-line alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): redesign AgentBar voice layout for compactness

- Single-line layout: checkbox · avatar · name · role · voice pill
- Remove descriptions from agent rows (saves vertical space)
- Extract AgentVoicePill component to isolate Select event handling
- Smaller avatars (size-7), tighter row padding (py-1.5)
- Voice pill uses Volume2 icon + voice name (no prefix text)
- Works in both Chinese and English without overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): cross-provider voice selection per agent

- Change voiceConfig from per-provider lookup to explicit
  { providerId, voiceId } per agent
- Each agent can use a different TTS provider's voice
- Voice picker dropdown groups voices by provider
- useDiscussionTTS routes TTS requests per agent's provider
- resolveAgentVoice falls back to global provider if no config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): align role badge and voice pill across agent rows

Give role badge fixed width (w-14 text-right) so role text
and voice pills align vertically across all rows regardless
of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix role badge and voice pill alignment

Wrap role badge + voice pill in a fixed-width container
(w-[9.5rem] justify-end) so both align vertically across
all agent rows regardless of name or role text length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): align role badge and voice pill across agent rows

Add min-w-[52px] text-right to role badge so it starts at a
consistent position regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): use fixed w-[60px] for role badge alignment

* fix(tts): use fixed w-[88px] for voice pill alignment

* fix(tts): prevent click-outside from closing AgentBar when Select portal is open

* fix(tts): comprehensive voice picker rewrite

- Replace Radix Select with Popover + button list (fixes click issue)
- Fix getAvailableProvidersWithVoices to always include global provider
- Widen panel from w-80 to w-96 (prevents name truncation)
- Voice pill uses primary color instead of gray (more visible)
- Extract renderAgentRow helper to reduce duplication
- Popover shows voices grouped by provider with active state
- Add findVoiceDisplayName utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): align voice provider availability with toolbar logic

* fix(tts): fallback to first available provider when global provider has no voices

* refactor(tts): remove global provider fallback from voice resolution

Voice resolution now only depends on available providers (those with
API keys or server-configured). No more globalProviderId parameter.
Fallback is first available provider, then browser-native-tts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add browser native TTS voices to agent voice picker

Load speechSynthesis.getVoices() in AgentBar and include as a
"Browser Native" provider group in the voice popover. No API key
needed - always available if browser supports it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): simplify toolbar TTS to on/off toggle, add disabled state

Toolbar:
- Replace volume slider with simple TTS on/off toggle button
- Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
- Toggle now controls ttsEnabled (not ttsMuted)

AgentBar:
- Collapsed: show VolumeX icon when TTS disabled
- Voice pills show disabled state (gray, cursor-not-allowed, no popover)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify Settings TTS tab to toggle + provider config

Remove voice selection, speed slider, preview/test, Azure locale
filter from Settings TTS tab. Voice is now per-agent in AgentBar.
Keep: on/off toggle, provider selector, API key + base URL config.
Add hint text pointing to AgentBar for voice configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify media popover TTS tab to toggle only

* fix(tts): add voice config hint to media popover TTS tab

* feat(tts): add per-voice preview button in voice picker

Each voice row in the popover has a small speaker icon button.
Click to preview the voice with "欢迎来到AI课堂" / "Welcome to
AI Classroom" (follows i18n). Browser native uses Web Speech API,
server TTS calls /api/generate/tts. Click again or close popover
to stop. Shows spinner while generating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): preview text follows course language instead of UI language

* refactor(tts): redesign AgentBar expanded panel layout

- Teacher always at top with voice pill (works in both modes)
- Mode tabs moved below teacher
- Auto mode: single compact row with shuffle icon + description
- Max turns: compact inline row with smaller input
- Preset mode: only student agents listed (teacher already above)
- Remove large shuffle animation from auto mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): merge max turns into teacher row

* refactor(tts): separate teacher row and max turns, use stepper UI

- Teacher row: avatar + name + voice pill only
- Max turns: bottom row with MessageSquare icon + compact stepper
  (minus/number/plus in a rounded pill)
- Remove Input component dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): increase voice pill contrast in dark mode

* fix(tts): make max turns input editable, tighten panel padding

* fix(tts): restore shuffle animation in auto mode (compact version)

* fix(tts): adjust auto mode text spacing and add voice auto-assign hint

* fix(tts): auto-close voice popover after selecting a voice

* fix(tts): increase auto mode vertical padding for better balance

* fix(tts): push auto mode text toward bottom with flex spacer

* fix(tts): reduce auto mode bottom padding

* feat(tts): wait for TTS audio to finish before next agent turn

Add waitForDrain() to useDiscussionTTS that returns a promise
resolving when the audio queue is empty. The agent loop in
useChatSessions now awaits this after buffer drain, so the next
agent's turn doesn't start until the current agent's TTS audio
finishes playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): keep bubble visible while TTS audio is still playing

When buffer drains (text=null) but audio indicator is still active,
don't clear liveSpeech. Clear it only when audio state goes idle.
This keeps the speech bubble visible until TTS finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): hold discussion bubble until TTS audio finishes

When StreamBuffer fires the done signal (onLiveSpeech null), Stage
now checks if TTS is still playing. If so, it defers clearing the
bubble state. The bubble stays visible until onAllAudioEnd fires
from the TTS hook (queue empty + nothing playing), then clears.

This prevents the jarring UX where the bubble disappears while
the agent's voice is still audible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix bubble hold - guard onStopSession instead of onLiveSpeech

Root cause: bubble disappears because doSessionCleanup fires via
onStopSession when the agent loop ends naturally, NOT because of
onLiveSpeech(null, null).

Fix: when onStopSession fires and TTS is still playing, defer
doSessionCleanup to onAllAudioEnd callback. Manual stop (user
presses button) still cleans up immediately via handleStopDiscussion.

Use doSessionCleanupRef to avoid circular dependency between
discussionTTS hook and doSessionCleanup useCallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): guard BOTH onLiveSpeech and onStopSession for bubble hold

Two paths clear the bubble:
1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
2. onStopSession → doSessionCleanup → clears all state

Both fire when agent loop ends. Path 1 fires first (tick loop),
path 2 fires after (waitUntilDrained resolves). Both must be
guarded when TTS is still playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): hold bubble during TTS playback and respect playback speed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): LLM picks voice matching agent persona during generation

- Client sends available voices (providerId + voiceId + name) to
  /api/generate/agent-profiles
- LLM prompt asks to pick a voice matching each agent's personality
- Parse "providerId::voiceId" from response, save as voiceConfig
- Fallback to index-based assignment if LLM doesn't pick
- Browser native voices hidden when server providers are available
- saveGeneratedAgents accepts and persists voiceConfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): restore volume slider in classroom toolbar

Revert the toolbar simplification from 36e3997 that replaced the
volume slider with a TTS on/off toggle. The volume control with
hover slider is a core classroom UX. TTS on/off is controlled via
Settings and Media popover instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): teacher uses global lecture voice in discussion when no voiceConfig override

* fix(tts): teacher always uses global lecture voice, no overrides

* fix(tts): sync playback speed to currently playing audio in real-time

* fix(tts): address code review issues

- Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
- Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
- Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
- Issue 6: shouldHold now checks queue length in addition to isPlayingRef
- Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tts): address PR review — abort preview fetch, defer error recovery

1. Add AbortController to voice preview server TTS fetch, abort on
   stopPreview to prevent stale responses on rapid switching
2. Use queueMicrotask for processQueue calls in error/ended handlers
   to prevent synchronous recursion if multiple items fail consecutively
3. Add ordering invariant comment on sealLastText's onSegmentSealed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): restore teacher voice pill, respect voiceConfig override

* fix(tts): sync volume and mute to discussion TTS audio in real-time

* fix(tts): allow browser-native TTS alongside server providers

* fix(tts): remove top padding from voice popover content

* fix(tts): make selectedAgents reactive to voiceConfig changes

* fix(tts): use agents record instead of listAgents() to avoid infinite loop

* fix(tts): single source of truth for teacher voice

Teacher voice pill now reads/writes global ttsProviderId + ttsVoice
(same settings used by lecture TTS). This ensures lecture and
discussion always use the same teacher voice. Student agents still
use per-agent voiceConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add avatar descriptions for smarter LLM avatar selection

Each avatar now has a one-line description (appearance, vibe) sent
to the agent-profiles generation API. LLM picks avatars matching
agent personality instead of guessing from file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: 杨慎 <117187635+cosarah@users.noreply.github.com>
ifishcool pushed a commit to ifishcool/Linksy that referenced this pull request Mar 24, 2026
* feat(tts): add resolveVoice() and getServerVoiceList() utilities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add AudioIndicator equalizer bars component

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add onSegmentSealed callback to StreamBuffer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add voiceOverrides field to AgentConfig and AgentTemplate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add useDiscussionTTS hook with audio queue and cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add audio state indicator to Roundtable bubble

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): wire onSegmentSealed callback through chat sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add per-agent voice dropdown to AgentBar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): integrate useDiscussionTTS in Stage and pass state to Roundtable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): refine voice dropdown to pill-style selector

Replace native select styling with a compact rounded-full pill
that blends into the agent row. Remove border, use muted bg,
smaller text, and a custom chevron icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): use shadcn Select for voice dropdown, link with TTS toggle

- Replace native <select> with shadcn Select component for consistent UI
- Hide voice dropdown when TTS is muted (ttsMuted)
- Compact pill-style trigger with rounded-full, no border, muted bg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add voice label prefix and always show dropdown

Show "音色: Alloy" ("Voice: Alloy") instead of plain "Alloy" in the voice pill.
Always show dropdown regardless of mute state (voice config is
independent of playback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add volume icon hint in collapsed AgentBar

Show a small Volume2 icon in the collapsed pill when voice config
is available, hinting that voice settings are inside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix voice dropdown layout and click handling

- Move voice pill below agent name (second line) to prevent
  horizontal overflow in English
- Wrap Select in div with onPointerDown stopPropagation to fix
  Radix click-through to parent row
- Add line-clamp-1 to descriptions for consistent row height
- Use items-start instead of items-center for better multi-line alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): redesign AgentBar voice layout for compactness

- Single-line layout: checkbox · avatar · name · role · voice pill
- Remove descriptions from agent rows (saves vertical space)
- Extract AgentVoicePill component to isolate Select event handling
- Smaller avatars (size-7), tighter row padding (py-1.5)
- Voice pill uses Volume2 icon + voice name (no prefix text)
- Works in both Chinese and English without overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): cross-provider voice selection per agent

- Change voiceConfig from per-provider lookup to explicit
  { providerId, voiceId } per agent
- Each agent can use a different TTS provider's voice
- Voice picker dropdown groups voices by provider
- useDiscussionTTS routes TTS requests per agent's provider
- resolveAgentVoice falls back to global provider if no config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): align role badge and voice pill across agent rows

Give role badge fixed width (w-14 text-right) so role text
and voice pills align vertically across all rows regardless
of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix role badge and voice pill alignment

Wrap role badge + voice pill in a fixed-width container
(w-[9.5rem] justify-end) so both align vertically across
all agent rows regardless of name or role text length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): align role badge and voice pill across agent rows

Add min-w-[52px] text-right to role badge so it starts at a
consistent position regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): use fixed w-[60px] for role badge alignment

* fix(tts): use fixed w-[88px] for voice pill alignment

* fix(tts): prevent click-outside from closing AgentBar when Select portal is open

* fix(tts): comprehensive voice picker rewrite

- Replace Radix Select with Popover + button list (fixes click issue)
- Fix getAvailableProvidersWithVoices to always include global provider
- Widen panel from w-80 to w-96 (prevents name truncation)
- Voice pill uses primary color instead of gray (more visible)
- Extract renderAgentRow helper to reduce duplication
- Popover shows voices grouped by provider with active state
- Add findVoiceDisplayName utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): align voice provider availability with toolbar logic

* fix(tts): fall back to first available provider when global provider has no voices

* refactor(tts): remove global provider fallback from voice resolution

Voice resolution now only depends on available providers (those with
API keys or server-configured). No more globalProviderId parameter.
Fallback is first available provider, then browser-native-tts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
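
The fallback order described in the commit above could look roughly like the following sketch. Only the `{ providerId, voiceId }` shape and the `browser-native-tts` provider id come from the commit messages; the function name `resolveAgentVoiceSketch`, the `ProviderVoices` type, the `"default"` voice id, and the rule that a stored config is honored only while its provider still offers that voice are assumptions made for illustration.

```typescript
// Hypothetical types: only the { providerId, voiceId } shape is taken from the commits.
interface VoiceConfig {
  providerId: string;
  voiceId: string;
}

interface ProviderVoices {
  providerId: string;
  voices: string[]; // voice ids offered by this provider
}

const BROWSER_NATIVE_PROVIDER = "browser-native-tts";

function resolveAgentVoiceSketch(
  config: VoiceConfig | undefined,
  available: ProviderVoices[],
): VoiceConfig {
  // 1. Use the agent's own config if its provider is available and has the voice (assumption).
  if (config) {
    const provider = available.find((p) => p.providerId === config.providerId);
    if (provider && provider.voices.includes(config.voiceId)) {
      return config;
    }
  }
  // 2. Otherwise fall back to the first available provider that has any voice.
  const first = available.find((p) => p.voices.length > 0);
  if (first) {
    return { providerId: first.providerId, voiceId: first.voices[0] };
  }
  // 3. Last resort: browser-native TTS ("default" is a placeholder voice id).
  return { providerId: BROWSER_NATIVE_PROVIDER, voiceId: "default" };
}
```

Note that no global provider id appears in the signature, which matches the commit's statement that resolution depends only on the list of available providers.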

* feat(tts): add browser native TTS voices to agent voice picker

Load speechSynthesis.getVoices() in AgentBar and include as a
"Browser Native" provider group in the voice popover. No API key
needed - always available if browser supports it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
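
As a rough illustration of the commit above, the browser's voices can be mapped into the same grouped shape as a server provider. The `VoiceGroup` shape, the function name, and the use of `voiceURI` as the voice id are assumptions; `getVoices()` and the `name`, `lang`, and `voiceURI` fields are part of the Web Speech API.

```typescript
// Minimal structural types so the sketch also type-checks outside a browser.
interface NativeVoiceLike {
  name: string;
  lang: string;
  voiceURI: string;
}

interface SynthLike {
  getVoices(): NativeVoiceLike[];
}

// Hypothetical group shape for the voice popover.
interface VoiceGroup {
  providerId: string;
  label: string;
  voices: { id: string; name: string }[];
}

function browserNativeGroupSketch(synth: SynthLike | undefined): VoiceGroup | null {
  // No Web Speech API in this environment: no "Browser Native" group.
  if (!synth) return null;
  const voices = synth.getVoices().map((v) => ({
    id: v.voiceURI, // assumption: voiceURI is used as the stable id
    name: `${v.name} (${v.lang})`,
  }));
  return { providerId: "browser-native-tts", label: "Browser Native", voices };
}
```

In a browser this would be called with `window.speechSynthesis`. Some browsers return an empty list from `getVoices()` until the `voiceschanged` event has fired, so a real component would likely re-run this when that event arrives.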

* feat(tts): simplify toolbar TTS to on/off toggle, add disabled state

Toolbar:
- Replace volume slider with simple TTS on/off toggle button
- Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
- Toggle now controls ttsEnabled (not ttsMuted)

AgentBar:
- Collapsed: show VolumeX icon when TTS disabled
- Voice pills show disabled state (gray, cursor-not-allowed, no popover)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify Settings TTS tab to toggle + provider config

Remove voice selection, speed slider, preview/test, Azure locale
filter from Settings TTS tab. Voice is now per-agent in AgentBar.
Keep: on/off toggle, provider selector, API key + base URL config.
Add hint text pointing to AgentBar for voice configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify media popover TTS tab to toggle only

* fix(tts): add voice config hint to media popover TTS tab

* feat(tts): add per-voice preview button in voice picker

Each voice row in the popover has a small speaker icon button.
Click to preview the voice with "欢迎来到AI课堂" / "Welcome to
AI Classroom" (follows i18n). Browser native uses Web Speech API,
server TTS calls /api/generate/tts. Click again or close popover
to stop. Shows spinner while generating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): preview text follows course language instead of UI language

* refactor(tts): redesign AgentBar expanded panel layout

- Teacher always at top with voice pill (works in both modes)
- Mode tabs moved below teacher
- Auto mode: single compact row with shuffle icon + description
- Max turns: compact inline row with smaller input
- Preset mode: only student agents listed (teacher already above)
- Remove large shuffle animation from auto mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): merge max turns into teacher row

* refactor(tts): separate teacher row and max turns, use stepper UI

- Teacher row: avatar + name + voice pill only
- Max turns: bottom row with MessageSquare icon + compact stepper
  (minus/number/plus in a rounded pill)
- Remove Input component dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): increase voice pill contrast in dark mode

* fix(tts): make max turns input editable, tighten panel padding

* fix(tts): restore shuffle animation in auto mode (compact version)

* fix(tts): adjust auto mode text spacing and add voice auto-assign hint

* fix(tts): auto-close voice popover after selecting a voice

* fix(tts): increase auto mode vertical padding for better balance

* fix(tts): push auto mode text toward bottom with flex spacer

* fix(tts): reduce auto mode bottom padding

* feat(tts): wait for TTS audio to finish before next agent turn

Add waitForDrain() to useDiscussionTTS that returns a promise
resolving when the audio queue is empty. The agent loop in
useChatSessions now awaits this after buffer drain, so the next
agent's turn doesn't start until the current agent's TTS audio
finishes playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
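
A minimal sketch of the `waitForDrain()` idea from the commit above, assuming a simple in-memory queue. The class name `AudioQueueSketch` and the methods `enqueue`, `finishCurrent`, and `isIdle` are hypothetical stand-ins; the real hook drives actual audio playback, which is omitted here.

```typescript
class AudioQueueSketch {
  private pending: string[] = [];
  private playing = false;
  private waiters: Array<() => void> = [];

  // Queue a segment; start "playback" if nothing is playing.
  enqueue(segment: string): void {
    this.pending.push(segment);
    if (!this.playing) this.startNext();
  }

  // Call when the current segment ends (or fails).
  finishCurrent(): void {
    this.playing = false;
    this.startNext();
  }

  // Resolves once nothing is playing and nothing is queued.
  waitForDrain(): Promise<void> {
    if (this.isIdle()) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  isIdle(): boolean {
    return !this.playing && this.pending.length === 0;
  }

  get waiterCount(): number {
    return this.waiters.length;
  }

  private startNext(): void {
    const next = this.pending.shift();
    if (next !== undefined) {
      this.playing = true; // real code would start audio for `next` here
      return;
    }
    // Queue drained: release everyone awaiting waitForDrain().
    const waiters = this.waiters;
    this.waiters = [];
    for (const resolve of waiters) resolve();
  }
}
```

An agent loop would then do something like `await queue.waitForDrain()` after the text buffer drains and before starting the next agent's turn.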

* fix(tts): keep bubble visible while TTS audio is still playing

When buffer drains (text=null) but audio indicator is still active,
don't clear liveSpeech. Clear it only when audio state goes idle.
This keeps the speech bubble visible until TTS finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): hold discussion bubble until TTS audio finishes

When StreamBuffer fires the done signal (onLiveSpeech null), Stage
now checks if TTS is still playing. If so, it defers clearing the
bubble state. The bubble stays visible until onAllAudioEnd fires
from the TTS hook (queue empty + nothing playing), then clears.

This prevents the jarring UX where the bubble disappears while
the agent's voice is still audible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix bubble hold - guard onStopSession instead of onLiveSpeech

Root cause: bubble disappears because doSessionCleanup fires via
onStopSession when the agent loop ends naturally, NOT because of
onLiveSpeech(null, null).

Fix: when onStopSession fires and TTS is still playing, defer
doSessionCleanup to onAllAudioEnd callback. Manual stop (user
presses button) still cleans up immediately via handleStopDiscussion.

Use doSessionCleanupRef to avoid circular dependency between
discussionTTS hook and doSessionCleanup useCallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): guard BOTH onLiveSpeech and onStopSession for bubble hold

Two paths clear the bubble:
1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
2. onStopSession → doSessionCleanup → clears all state

Both fire when agent loop ends. Path 1 fires first (tick loop),
path 2 fires after (waitUntilDrained resolves). Both must be
guarded when TTS is still playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
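
The two guarded paths above can be sketched as a small state holder. This is an illustration under assumptions, not the code in `stage.tsx`: the class name and fields are hypothetical, while `onStopSession`, `onAllAudioEnd`, `doSessionCleanup`, `handleStopDiscussion`, and `shouldHold` are names that appear in the commit messages.

```typescript
class BubbleHoldSketch {
  bubbleVisible = true;
  sessionCleaned = false;
  private clearDeferred = false;
  private cleanupDeferred = false;
  private readonly shouldHold: () => boolean;

  // shouldHold reports whether TTS is still playing or has queued audio.
  constructor(shouldHold: () => boolean) {
    this.shouldHold = shouldHold;
  }

  // Path 1: StreamBuffer done signal, i.e. onLiveSpeech(null, null).
  onLiveSpeechDone(): void {
    if (this.shouldHold()) {
      this.clearDeferred = true;
      return;
    }
    this.bubbleVisible = false;
  }

  // Path 2: the agent loop ended naturally.
  onStopSession(): void {
    if (this.shouldHold()) {
      this.cleanupDeferred = true;
      return;
    }
    this.doSessionCleanup();
  }

  // Fired by the TTS hook when the queue is empty and nothing is playing.
  onAllAudioEnd(): void {
    if (this.clearDeferred) {
      this.clearDeferred = false;
      this.bubbleVisible = false;
    }
    if (this.cleanupDeferred) {
      this.cleanupDeferred = false;
      this.doSessionCleanup();
    }
  }

  // Manual stop: clean up immediately, ignoring any hold.
  handleStopDiscussion(): void {
    this.clearDeferred = false;
    this.cleanupDeferred = false;
    this.doSessionCleanup();
  }

  private doSessionCleanup(): void {
    this.bubbleVisible = false;
    this.sessionCleaned = true;
  }
}
```

Guarding only one of the two paths would still let the other one hide the bubble while audio is audible, which is the failure the preceding commits ran into.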

* feat(tts): hold bubble during TTS playback and respect playback speed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): LLM picks voice matching agent persona during generation

- Client sends available voices (providerId + voiceId + name) to
  /api/generate/agent-profiles
- LLM prompt asks to pick a voice matching each agent's personality
- Parse "providerId::voiceId" from response, save as voiceConfig
- Fallback to index-based assignment if LLM doesn't pick
- Browser native voices hidden when server providers are available
- saveGeneratedAgents accepts and persists voiceConfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
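
The parse-and-fallback step in the list above might be sketched as follows. The `"providerId::voiceId"` format comes from the commit message; the function name, the validation against the list of offered voices, and the modulo form of "index-based assignment" are assumptions.

```typescript
interface VoiceConfig {
  providerId: string;
  voiceId: string;
}

function pickVoiceSketch(
  raw: string | undefined, // the LLM's answer, e.g. "providerId::voiceId"
  available: VoiceConfig[], // voices the client offered to the LLM
  agentIndex: number,
): VoiceConfig | undefined {
  if (available.length === 0) return undefined;

  const sep = raw ? raw.indexOf("::") : -1;
  if (raw && sep > 0) {
    const providerId = raw.slice(0, sep).trim();
    const voiceId = raw.slice(sep + 2).trim();
    // Accept the LLM's pick only if it names a voice that was actually offered.
    const match = available.find(
      (v) => v.providerId === providerId && v.voiceId === voiceId,
    );
    if (match) return match;
  }

  // Fallback: spread agents across the available voices by position (assumption).
  return available[agentIndex % available.length];
}
```

Validating the pick against the offered list guards against the LLM inventing a voice id; whether the actual route does this is not stated in the commit.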

* fix(tts): restore volume slider in classroom toolbar

Revert the toolbar simplification from 36e3997 that replaced the
volume slider with a TTS on/off toggle. The volume control with
hover slider is a core classroom UX. TTS on/off is controlled via
Settings and Media popover instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): teacher uses global lecture voice in discussion when no voiceConfig override

* fix(tts): teacher always uses global lecture voice, no overrides

* fix(tts): sync playback speed to currently playing audio in real-time

* fix(tts): address code review issues

- Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
- Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
- Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
- Issue 6: shouldHold now checks queue length in addition to isPlayingRef
- Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tts): address PR review — abort preview fetch, defer error recovery

1. Add AbortController to voice preview server TTS fetch, abort on
   stopPreview to prevent stale responses on rapid switching
2. Use queueMicrotask for processQueue calls in error/ended handlers
   to prevent synchronous recursion if multiple items fail consecutively
3. Add ordering invariant comment on sealLastText's onSegmentSealed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
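
Items 1 and 2 above rely on two standard web platform APIs, `AbortController` and `queueMicrotask`. The sketch below shows the general pattern only; `PreviewControllerSketch` and `deferProcessQueue` are hypothetical names, and the actual fetch call and audio handlers are left out.

```typescript
class PreviewControllerSketch {
  private controller: AbortController | null = null;

  // Start a new preview; any in-flight request is aborted first, so a slow
  // response for a previously clicked voice cannot arrive late and play.
  startPreview(): AbortSignal {
    this.stopPreview();
    this.controller = new AbortController();
    // The caller would pass this as fetch(url, { signal }).
    return this.controller.signal;
  }

  stopPreview(): void {
    this.controller?.abort();
    this.controller = null;
  }
}

// Wrap processQueue so that error/ended handlers schedule it instead of
// calling it directly. Each retry then runs on a fresh call stack.
function deferProcessQueue(processQueue: () => void): () => void {
  return () => queueMicrotask(processQueue);
}
```

With a direct call, several consecutive failing items would nest `processQueue` inside its own error handler once per failure; scheduling through a microtask lets each handler return before the next item is attempted.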

* fix(tts): restore teacher voice pill, respect voiceConfig override

* fix(tts): sync volume and mute to discussion TTS audio in real-time

* fix(tts): allow browser-native TTS alongside server providers

* fix(tts): remove top padding from voice popover content

* fix(tts): make selectedAgents reactive to voiceConfig changes

* fix(tts): use agents record instead of listAgents() to avoid infinite loop

* fix(tts): single source of truth for teacher voice

Teacher voice pill now reads/writes global ttsProviderId + ttsVoice
(same settings used by lecture TTS). This ensures lecture and
discussion always use the same teacher voice. Student agents still
use per-agent voiceConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add avatar descriptions for smarter LLM avatar selection

Each avatar now has a one-line description (appearance, vibe) sent
to the agent-profiles generation API. LLM picks avatars matching
agent personality instead of guessing from file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: 杨慎 <117187635+cosarah@users.noreply.github.com>
jaumemir pushed a commit to jaumemir/OpenMAIC that referenced this pull request Apr 8, 2026
* feat(tts): add resolveVoice() and getServerVoiceList() utilities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add AudioIndicator equalizer bars component

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add onSegmentSealed callback to StreamBuffer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add voiceOverrides field to AgentConfig and AgentTemplate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): add useDiscussionTTS hook with audio queue and cleanup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add audio state indicator to Roundtable bubble

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tts): wire onSegmentSealed callback through chat sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): add per-agent voice dropdown to AgentBar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): integrate useDiscussionTTS in Stage and pass state to Roundtable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): refine voice dropdown to pill-style selector

Replace native select styling with a compact rounded-full pill
that blends into the agent row. Remove border, use muted bg,
smaller text, and a custom chevron icon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): use shadcn Select for voice dropdown, link with TTS toggle

- Replace native <select> with shadcn Select component for consistent UI
- Hide voice dropdown when TTS is muted (ttsMuted)
- Compact pill-style trigger with rounded-full, no border, muted bg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add voice label prefix and always show dropdown

Show "音色: Alloy" ("Voice: Alloy") instead of plain "Alloy" in the voice pill.
Always show dropdown regardless of mute state (voice config is
independent of playback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): add volume icon hint in collapsed AgentBar

Show a small Volume2 icon in the collapsed pill when voice config
is available, hinting that voice settings are inside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix voice dropdown layout and click handling

- Move voice pill below agent name (second line) to prevent
  horizontal overflow in English
- Wrap Select in div with onPointerDown stopPropagation to fix
  Radix click-through to parent row
- Add line-clamp-1 to descriptions for consistent row height
- Use items-start instead of items-center for better multi-line alignment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): redesign AgentBar voice layout for compactness

- Single-line layout: checkbox · avatar · name · role · voice pill
- Remove descriptions from agent rows (saves vertical space)
- Extract AgentVoicePill component to isolate Select event handling
- Smaller avatars (size-7), tighter row padding (py-1.5)
- Voice pill uses Volume2 icon + voice name (no prefix text)
- Works in both Chinese and English without overflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): cross-provider voice selection per agent

- Change voiceConfig from per-provider lookup to explicit
  { providerId, voiceId } per agent
- Each agent can use a different TTS provider's voice
- Voice picker dropdown groups voices by provider
- useDiscussionTTS routes TTS requests per agent's provider
- resolveAgentVoice falls back to global provider if no config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
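
As a rough illustration of the picker grouping described above, a flat voice list can be bucketed by provider before rendering. `VoiceOption` and `groupVoicesByProvider` are hypothetical names; the PR's actual helper may differ.

```typescript
// Hypothetical flat entry as a voice picker might receive it.
interface VoiceOption { providerId: string; voiceId: string; name: string }

// Bucket voices by provider, preserving the input order inside each group.
function groupVoicesByProvider(voices: VoiceOption[]): Record<string, VoiceOption[]> {
  const groups: Record<string, VoiceOption[]> = {};
  for (const voice of voices) {
    if (!groups[voice.providerId]) groups[voice.providerId] = [];
    groups[voice.providerId].push(voice);
  }
  return groups;
}
```

Each key of the result maps to one provider section in the dropdown, and a selected entry already carries the explicit `{ providerId, voiceId }` pair stored on the agent.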

* fix(tts): align role badge and voice pill across agent rows

Give role badge fixed width (w-14 text-right) so role text
and voice pills align vertically across all rows regardless
of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix role badge and voice pill alignment

Wrap role badge + voice pill in a fixed-width container
(w-[9.5rem] justify-end) so both align vertically across
all agent rows regardless of name or role text length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(tts): align role badge and voice pill across agent rows

Add min-w-[52px] text-right to role badge so it starts at a
consistent position regardless of agent name length.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): use fixed w-[60px] for role badge alignment

* fix(tts): use fixed w-[88px] for voice pill alignment

* fix(tts): prevent click-outside from closing AgentBar when Select portal is open

* fix(tts): comprehensive voice picker rewrite

- Replace Radix Select with Popover + button list (fixes click issue)
- Fix getAvailableProvidersWithVoices to always include global provider
- Widen panel from w-80 to w-96 (prevents name truncation)
- Voice pill uses primary color instead of gray (more visible)
- Extract renderAgentRow helper to reduce duplication
- Popover shows voices grouped by provider with active state
- Add findVoiceDisplayName utility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): align voice provider availability with toolbar logic

* fix(tts): fallback to first available provider when global provider has no voices

* refactor(tts): remove global provider fallback from voice resolution

Voice resolution now only depends on available providers (those with
API keys or server-configured). No more globalProviderId parameter.
Fallback is first available provider, then browser-native-tts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
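
The fallback order described above might be expressed as below. This is a sketch under assumptions: the `resolveAgentVoice` signature, the `ProviderVoices` shape, and the `"default"` voice id are invented for illustration. Only the `browser-native-tts` provider id and the order of the fallbacks come from the commit message.

```typescript
interface VoiceConfig { providerId: string; voiceId: string }
// Assumed shape: one entry per provider that has an API key or is server-configured.
interface ProviderVoices { providerId: string; voiceIds: string[] }

const BROWSER_NATIVE = "browser-native-tts";

function resolveAgentVoice(
  config: VoiceConfig | undefined,
  available: ProviderVoices[],
): VoiceConfig {
  // 1. Keep the agent's explicit choice if its provider is still available.
  if (config) {
    for (const provider of available) {
      if (provider.providerId === config.providerId) return config;
    }
  }
  // 2. Otherwise take the first available provider that offers a voice.
  for (const provider of available) {
    if (provider.voiceIds.length > 0) {
      return { providerId: provider.providerId, voiceId: provider.voiceIds[0] };
    }
  }
  // 3. Last resort: browser-native TTS ("default" is a placeholder voice id).
  return { providerId: BROWSER_NATIVE, voiceId: "default" };
}
```

Note that no global provider id appears in the parameter list, which matches the intent of removing `globalProviderId` from resolution.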

* feat(tts): add browser native TTS voices to agent voice picker

Load speechSynthesis.getVoices() in AgentBar and include as a
"Browser Native" provider group in the voice popover. No API key
needed - always available if browser supports it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): simplify toolbar TTS to on/off toggle, add disabled state

Toolbar:
- Replace volume slider with simple TTS on/off toggle button
- Remove ttsMuted/ttsVolume/onVolumeChange props from CanvasToolbar
- Toggle now controls ttsEnabled (not ttsMuted)

AgentBar:
- Collapsed: show VolumeX icon when TTS disabled
- Voice pills show disabled state (gray, cursor-not-allowed, no popover)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify Settings TTS tab to toggle + provider config

Remove voice selection, speed slider, preview/test, Azure locale
filter from Settings TTS tab. Voice is now per-agent in AgentBar.
Keep: on/off toggle, provider selector, API key + base URL config.
Add hint text pointing to AgentBar for voice configuration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): simplify media popover TTS tab to toggle only

* fix(tts): add voice config hint to media popover TTS tab

* feat(tts): add per-voice preview button in voice picker

Each voice row in the popover has a small speaker icon button.
Click to preview the voice with "欢迎来到AI课堂" / "Welcome to
AI Classroom" (follows i18n). Browser native uses Web Speech API,
server TTS calls /api/generate/tts. Click again or close popover
to stop. Shows spinner while generating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): preview text follows course language instead of UI language

* refactor(tts): redesign AgentBar expanded panel layout

- Teacher always at top with voice pill (works in both modes)
- Mode tabs moved below teacher
- Auto mode: single compact row with shuffle icon + description
- Max turns: compact inline row with smaller input
- Preset mode: only student agents listed (teacher already above)
- Remove large shuffle animation from auto mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(tts): merge max turns into teacher row

* refactor(tts): separate teacher row and max turns, use stepper UI

- Teacher row: avatar + name + voice pill only
- Max turns: bottom row with MessageSquare icon + compact stepper
  (minus/number/plus in a rounded pill)
- Remove Input component dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): increase voice pill contrast in dark mode

* fix(tts): make max turns input editable, tighten panel padding

* fix(tts): restore shuffle animation in auto mode (compact version)

* fix(tts): adjust auto mode text spacing and add voice auto-assign hint

* fix(tts): auto-close voice popover after selecting a voice

* fix(tts): increase auto mode vertical padding for better balance

* fix(tts): push auto mode text toward bottom with flex spacer

* fix(tts): reduce auto mode bottom padding

* feat(tts): wait for TTS audio to finish before next agent turn

Add waitForDrain() to useDiscussionTTS that returns a promise
resolving when the audio queue is empty. The agent loop in
useChatSessions now awaits this after buffer drain, so the next
agent's turn doesn't start until the current agent's TTS audio
finishes playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
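
One way to build a `waitForDrain()` of this kind is to keep a list of resolver callbacks and flush it when the last segment ends. The sketch below assumes that approach. `AudioDrainTracker`, `segmentQueued`, `segmentFinished`, and `waiterCount` are hypothetical names, and the real hook may track its queue differently.

```typescript
class AudioDrainTracker {
  private pending = 0; // segments that are queued or currently playing
  private waiters: Array<() => void> = [];

  /** Number of callers currently waiting for the queue to drain. */
  get waiterCount(): number {
    return this.waiters.length;
  }

  segmentQueued(): void {
    this.pending += 1;
  }

  segmentFinished(): void {
    this.pending = Math.max(0, this.pending - 1);
    if (this.pending === 0) {
      // Swap the list first so a waiter that re-registers is not resolved twice.
      const toResolve = this.waiters;
      this.waiters = [];
      for (const resolve of toResolve) resolve();
    }
  }

  waitForDrain(): Promise<void> {
    if (this.pending === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}
```

In an agent loop this would be used as `await tracker.waitForDrain()` right after the text buffer drains, so the next turn starts only once playback has finished.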

* fix(tts): keep bubble visible while TTS audio is still playing

When buffer drains (text=null) but audio indicator is still active,
don't clear liveSpeech. Clear it only when audio state goes idle.
This keeps the speech bubble visible until TTS finishes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): hold discussion bubble until TTS audio finishes

When StreamBuffer fires the done signal (onLiveSpeech null), Stage
now checks if TTS is still playing. If so, it defers clearing the
bubble state. The bubble stays visible until onAllAudioEnd fires
from the TTS hook (queue empty + nothing playing), then clears.

This prevents the jarring UX where the bubble disappears while
the agent's voice is still audible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): fix bubble hold - guard onStopSession instead of onLiveSpeech

Root cause: bubble disappears because doSessionCleanup fires via
onStopSession when the agent loop ends naturally, NOT because of
onLiveSpeech(null, null).

Fix: when onStopSession fires and TTS is still playing, defer
doSessionCleanup to onAllAudioEnd callback. Manual stop (user
presses button) still cleans up immediately via handleStopDiscussion.

Use doSessionCleanupRef to avoid circular dependency between
discussionTTS hook and doSessionCleanup useCallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): guard BOTH onLiveSpeech and onStopSession for bubble hold

Two paths clear the bubble:
1. onLiveSpeech(null, null) from StreamBuffer done → clears liveSpeech
2. onStopSession → doSessionCleanup → clears all state

Both fire when agent loop ends. Path 1 fires first (tick loop),
path 2 fires after (waitUntilDrained resolves). Both must be
guarded when TTS is still playing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
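
The two guarded paths can be modeled as a small state holder. The sketch below is a simplification under assumptions: `BubbleHold` and its option names are hypothetical, and the busy check combines the playing flag with the queue length, as a later review fix in this PR describes for `shouldHold`.

```typescript
interface BubbleHoldOptions {
  isPlaying: () => boolean;   // true while a TTS segment is audible
  queueLength: () => number;  // segments still waiting to play
  clearBubble: () => void;    // path 1 effect: clear liveSpeech
  cleanupSession: () => void; // path 2 effect: doSessionCleanup
}

class BubbleHold {
  private opts: BubbleHoldOptions;
  private pendingClear = false;
  private pendingCleanup = false;

  constructor(opts: BubbleHoldOptions) {
    this.opts = opts;
  }

  private shouldHold(): boolean {
    return this.opts.isPlaying() || this.opts.queueLength() > 0;
  }

  // Path 1: StreamBuffer signals done via onLiveSpeech(null, null).
  onLiveSpeechDone(): void {
    if (this.shouldHold()) { this.pendingClear = true; return; }
    this.opts.clearBubble();
  }

  // Path 2: the agent loop ended naturally and onStopSession fired.
  onStopSession(): void {
    if (this.shouldHold()) { this.pendingCleanup = true; return; }
    this.opts.cleanupSession();
  }

  // Fired by the TTS hook when the queue is empty and nothing is playing.
  onAllAudioEnd(): void {
    if (this.pendingClear) { this.pendingClear = false; this.opts.clearBubble(); }
    if (this.pendingCleanup) { this.pendingCleanup = false; this.opts.cleanupSession(); }
  }
}
```

A manual stop would bypass this holder and run the cleanup directly, which keeps the stop button responsive while audio is still playing.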

* feat(tts): hold bubble during TTS playback and respect playback speed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(tts): LLM picks voice matching agent persona during generation

- Client sends available voices (providerId + voiceId + name) to
  /api/generate/agent-profiles
- LLM prompt asks to pick a voice matching each agent's personality
- Parse "providerId::voiceId" from response, save as voiceConfig
- Fallback to index-based assignment if LLM doesn't pick
- Browser native voices hidden when server providers are available
- saveGeneratedAgents accepts and persists voiceConfig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
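
The parsing and fallback steps above can be sketched as a pure function. Assumptions are labeled: `pickVoice` is a hypothetical name, validating the LLM's answer against the offered list is a choice made for this sketch, and "index-based assignment" is interpreted here as round-robin over the available voices.

```typescript
interface VoiceOption { providerId: string; voiceId: string; name: string }
interface VoiceConfig { providerId: string; voiceId: string }

function pickVoice(
  llmChoice: string | undefined,
  agentIndex: number,
  voices: VoiceOption[],
): VoiceConfig | undefined {
  if (voices.length === 0) return undefined;

  if (llmChoice) {
    const sep = llmChoice.indexOf("::");
    if (sep > 0) {
      const providerId = llmChoice.slice(0, sep).trim();
      const voiceId = llmChoice.slice(sep + 2).trim();
      // Accept the LLM's answer only if it names a voice that was actually offered.
      for (const voice of voices) {
        if (voice.providerId === providerId && voice.voiceId === voiceId) {
          return { providerId, voiceId };
        }
      }
    }
  }

  // Fallback (assumed round-robin): spread agents across the available voices.
  const fallback = voices[agentIndex % voices.length];
  return { providerId: fallback.providerId, voiceId: fallback.voiceId };
}
```

Checking the parsed pair against the offered list guards against a model inventing a voice id that no provider serves.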

* fix(tts): restore volume slider in classroom toolbar

Revert the toolbar simplification from 36e3997 that replaced the
volume slider with a TTS on/off toggle. The volume control with
hover slider is a core classroom UX. TTS on/off is controlled via
Settings and Media popover instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tts): teacher uses global lecture voice in discussion when no voiceConfig override

* fix(tts): teacher always uses global lecture voice, no overrides

* fix(tts): sync playback speed to currently playing audio in real-time

* fix(tts): address code review issues

- Issue 2: enabled flag now checks ttsEnabled && !ttsMuted in stage.tsx
- Issue 4: remove unused browserAvailableVoices from useDiscussionTTS
- Issue 5: remove dead code in audio-settings.tsx (Slider, Loader2, handleTTSVoiceChange, handleTTSSpeedChange, handleTestTTS, testingTTS, ttsTestStatus, ttsTestMessage, testText, ttsSpeed, setTTSSpeed, and unused browser-tts-preview imports)
- Issue 6: shouldHold now checks queue length in addition to isPlayingRef
- Issue 8: hide AgentVoicePill for teacher row in agent-bar.tsx (teacher voice is controlled in Settings)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tts): address PR review — abort preview fetch, defer error recovery

1. Add AbortController to voice preview server TTS fetch, abort on
   stopPreview to prevent stale responses on rapid switching
2. Use queueMicrotask for processQueue calls in error/ended handlers
   to prevent synchronous recursion if multiple items fail consecutively
3. Add ordering invariant comment on sealLastText's onSegmentSealed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
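
Item 2 in the list above can be illustrated with a simplified queue. This is a sketch, not the hook's code: `SegmentQueue`, `Segment`, and the injectable `defer` parameter are hypothetical, and a synchronous `play()` that throws stands in for an audio error event. `processQueue` is the name used in the commit message and `queueMicrotask` is a standard global.

```typescript
type Defer = (fn: () => void) => void;
// In this sketch play() throws to signal a failed segment.
interface Segment { id: string; play: () => void }

class SegmentQueue {
  private items: Segment[] = [];
  private busy = false;
  private defer: Defer;
  readonly played: string[] = [];
  readonly failed: string[] = [];

  // The scheduler is injectable for testing; by default it is queueMicrotask.
  constructor(defer: Defer = (fn) => queueMicrotask(fn)) {
    this.defer = defer;
  }

  enqueue(segment: Segment): void {
    this.items.push(segment);
    if (!this.busy) this.processQueue();
  }

  private processQueue(): void {
    const next = this.items.shift();
    if (!next) { this.busy = false; return; }
    this.busy = true;
    try {
      next.play();
      this.played.push(next.id);
    } catch {
      this.failed.push(next.id);
    }
    // Deferred rather than called directly: a run of failing items then
    // advances one microtask at a time instead of nesting calls on the stack.
    this.defer(() => this.processQueue());
  }
}
```

With a direct call in place of `defer`, every consecutive failure would add a stack frame. Deferring keeps the depth constant regardless of how many items fail in a row.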

* fix(tts): restore teacher voice pill, respect voiceConfig override

* fix(tts): sync volume and mute to discussion TTS audio in real-time

* fix(tts): allow browser-native TTS alongside server providers

* fix(tts): remove top padding from voice popover content

* fix(tts): make selectedAgents reactive to voiceConfig changes

* fix(tts): use agents record instead of listAgents() to avoid infinite loop

* fix(tts): single source of truth for teacher voice

Teacher voice pill now reads/writes global ttsProviderId + ttsVoice
(same settings used by lecture TTS). This ensures lecture and
discussion always use the same teacher voice. Student agents still
use per-agent voiceConfig.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add avatar descriptions for smarter LLM avatar selection

Each avatar now has a one-line description (appearance, vibe) sent
to the agent-profiles generation API. LLM picks avatars matching
agent personality instead of guessing from file paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: 杨慎 <117187635+cosarah@users.noreply.github.com>