fix: use browser speechSynthesis for playback when browser-native-tts is selected #28

Merged
cosarah merged 5 commits into THU-MAIC:main from YizukiAme:fix/browser-native-tts-playback
Mar 18, 2026

@YizukiAme
Contributor

Summary

Fix browser-native TTS producing no sound during classroom playback, while the settings test plays sound correctly.

Fixes #25, fixes #12, fixes #5

Root Cause

When browser-native-tts is selected as the TTS provider:

  1. Scene generation (use-scene-generator.ts:214,450) correctly skips pre-generating audio — browser TTS runs client-side via Web Speech API, not via server API
  2. Playback (engine.ts:436-444) calls audioPlayer.play() which finds no pre-generated audio in IndexedDB → returns false → falls back to scheduleReadingTimer() — a silent timer that estimates reading time but never calls speechSynthesis

Fix

Add Web Speech API integration directly in PlaybackEngine (lib/playback/engine.ts):

  • playBrowserTTS() — speaks text via window.speechSynthesis, respecting user's voice, speed, volume, and mute settings
  • cancelBrowserTTS() — cancels active browser TTS
  • pause() — calls speechSynthesis.pause() when browser TTS is active
  • resume() — calls speechSynthesis.resume() when browser TTS is paused
  • stop() / handleUserInterrupt() — calls speechSynthesis.cancel() to stop browser TTS

The fix is self-contained in one file. When audioPlayer.play() returns false (no pre-generated audio), the engine now checks if browser-native-tts is the selected provider and calls speechSynthesis.speak() instead of falling back to the silent reading timer.
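The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the code from lib/playback/engine.ts: `handlePlayResult`, `FallbackDeps`, `speakInBrowser`, and the `SpeechPath` labels are hypothetical names, and the real engine may structure this differently.

```typescript
// Hypothetical labels for the three ways a speech action can be played.
type SpeechPath = "pregenerated" | "browser-tts" | "reading-timer";

// Hypothetical dependencies, injected so the decision logic runs without a browser.
interface FallbackDeps {
  /** Speaks text via the Web Speech API (the role playBrowserTTS() has in the PR). */
  speakInBrowser(text: string): void;
  /** Silent fallback that only waits for an estimated reading time. */
  scheduleReadingTimer(text: string): void;
}

/**
 * Decide what to do once audioPlayer.play() has reported whether
 * pre-generated audio was found (`played`).
 */
function handlePlayResult(
  played: boolean,
  providerId: string,
  text: string,
  deps: FallbackDeps,
): SpeechPath {
  if (played) return "pregenerated";

  // No pre-generated audio: use the browser voice if that provider is selected.
  if (providerId === "browser-native-tts") {
    deps.speakInBrowser(text);
    return "browser-tts";
  }

  // Any other provider keeps the previous behavior: a silent reading timer.
  deps.scheduleReadingTimer(text);
  return "reading-timer";
}
```

Keeping the decision separate from the Web Speech calls is only a choice made for this sketch; it lets the branch order be checked with plain fakes.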

Changes

| File | Change |
| --- | --- |
| lib/playback/engine.ts | +83 lines, -4 lines |

Testing

  • Set TTS Provider to "Browser Native" → settings test plays sound ✅
  • Generate classroom → play → sound plays
  • Pause/resume works correctly
  • No regression for other TTS providers (they still use pre-generated audio path)

@cosarah
Collaborator

cosarah commented Mar 17, 2026

Code Review

Clean, focused change with correct edge case handling. Overall LGTM. Two items to consider as follow-up improvements:

1. Chrome long utterance cutoff

Chrome has a known bug where SpeechSynthesisUtterance longer than ~15 seconds gets silently cut off. If a speechAction.text is long, playback may stop mid-sentence and onend won't fire, causing the playback engine to hang. A follow-up PR could add a workaround (e.g., chunking text or adding a timeout watchdog).

2. speechSynthesis.pause()/resume() cross-browser compatibility

Firefox has incomplete support for speechSynthesis.pause() / speechSynthesis.resume(), which may cause playback to not recover after pausing. This is a Web Speech API platform limitation and doesn't affect the correctness of this PR, but worth noting.

@YizukiAme
Contributor Author

YizukiAme commented Mar 17, 2026

Thanks for the review!!!! I'll address the Chrome 15s cutoff and Firefox pause/resume issues in a follow-up PR by implementing an utterance queue with text chunking. This will elegantly handle both issues while keeping this PR focused on the basic fallback.

… is selected

Previously, selecting browser-native-tts as the TTS provider would
produce sound in the settings test but remain silent during classroom
playback. This happened because:

1. The scene generator correctly skipped pre-generation for browser TTS
   (it runs client-side, not via API)
2. The playback engine fell back to a silent reading timer when no
   pre-generated audio was found, instead of calling speechSynthesis

This commit adds Web Speech API integration directly in the
PlaybackEngine:
- New playBrowserTTS() method speaks text via speechSynthesis
- Properly wires onend/onerror to advance to the next action
- pause()/resume() now handle speechSynthesis.pause()/resume()
- stop() and handleUserInterrupt() cancel browser TTS

Fixes THU-MAIC#25, fixes THU-MAIC#12, fixes THU-MAIC#5
@YizukiAme YizukiAme force-pushed the fix/browser-native-tts-playback branch from 49b470f to 496b5d9 Compare March 17, 2026 04:40
- When ttsVoice is "default" (set by Browser Native TTS which has no
  voice picker), the voiceURI lookup silently fails and no lang is set,
  causing Chinese text to be spoken with an English voice.

- Extract the 0.3 CJK detection threshold as a named constant
  CJK_LANG_THRESHOLD with JSDoc explaining the rationale.

- Fall through to language auto-detection when voice lookup fails,
  regardless of the reason (missing voice, "default" sentinel, etc.).
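A rough sketch of the fall-through this commit describes: try the configured voice first, and if the lookup fails for any reason, pick a language from the text. `VoiceLike`, `detectLang`, and `resolveVoice` are hypothetical names. The character ranges and the `>=` comparison are assumptions. Only the constant name CJK_LANG_THRESHOLD and its value 0.3 come from the commit message.

```typescript
/** Minimal subset of SpeechSynthesisVoice needed for this sketch. */
interface VoiceLike {
  voiceURI: string;
  lang: string;
}

/** Share of CJK characters at or above which text is treated as Chinese. */
const CJK_LANG_THRESHOLD = 0.3;

// CJK Unified Ideographs plus Extension A (assumed ranges, not taken from the PR).
const CJK_CHAR = /[\u3400-\u4dbf\u4e00-\u9fff]/g;

function detectLang(text: string): "zh-CN" | "en-US" {
  const chars = text.replace(/\s/g, "");
  if (chars.length === 0) return "en-US";
  const cjkCount = (chars.match(CJK_CHAR) ?? []).length;
  return cjkCount / chars.length >= CJK_LANG_THRESHOLD ? "zh-CN" : "en-US";
}

function resolveVoice(
  voices: VoiceLike[],
  ttsVoice: string,
  text: string,
): { voice?: VoiceLike; lang: string } {
  // A "default" sentinel or an unknown URI simply finds nothing here.
  for (const v of voices) {
    if (v.voiceURI === ttsVoice) return { voice: v, lang: v.lang };
  }
  // Lookup failed for whatever reason: fall through to language detection.
  return { lang: detectLang(text) };
}
```

With only `lang` set on the utterance, the browser is left to choose a matching voice on its own.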
@YizukiAme
Contributor Author

YizukiAme commented Mar 17, 2026

Update: Both follow-up items have been addressed in the latest commits on this branch.

1. Chrome long utterance cutoff → ✅ Fixed

Added splitIntoChunks() that splits text on sentence-ending punctuation (Latin + CJK) and newlines, plus playBrowserTTSChunk() that sequentially speaks each chunk. This avoids Chrome's ~15s silent cutoff entirely.
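As an illustration of the chunking idea, here is a minimal sketch of a sentence splitter. It reuses the name splitIntoChunks from the comment above, but the regular expression and the trimming are assumptions. The PR's actual implementation is not shown in this conversation and may differ.

```typescript
/**
 * Split text into sentence-sized chunks on Latin and CJK sentence-ending
 * punctuation and on newlines, so each utterance stays short.
 */
function splitIntoChunks(text: string): string[] {
  // A run of non-terminators followed by any trailing terminators.
  // Newlines act as separators and are dropped.
  const pieces = text.match(/[^.!?。！？\n]+[.!?。！？]*/g) ?? [];
  const chunks: string[] = [];
  for (const piece of pieces) {
    const trimmed = piece.trim();
    if (trimmed.length > 0) chunks.push(trimmed);
  }
  return chunks;
}
```

A splitter this simple also breaks on decimal points and abbreviations ("3.14" becomes "3." and "14"), and a single very long sentence still forms one long chunk, so it only reduces the chance of hitting the cutoff.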

2. Firefox pause()/resume() incompatibility → ✅ Fixed

Implemented a cancel+re-speak pattern:

  • On pause: saves remaining chunks, then calls speechSynthesis.cancel()
  • On resume: re-speaks from the saved chunk onward

This bypasses Firefox's broken speechSynthesis.pause()/resume() completely.
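The cancel + re-speak pattern can be sketched as a small queue over the chunks. `ChunkedSpeaker` and `SpeechBackend` are hypothetical names for this illustration. In a browser, the backend would wrap speechSynthesis.speak() (with the utterance's onend wired to `onEnd`) and speechSynthesis.cancel().

```typescript
/** Hypothetical abstraction over the Web Speech API so the queue runs anywhere. */
interface SpeechBackend {
  /** Speak one chunk and call onEnd when it finishes. */
  speak(text: string, onEnd: () => void): void;
  /** Stop the current utterance immediately. */
  cancel(): void;
}

class ChunkedSpeaker {
  private chunks: string[];
  private backend: SpeechBackend;
  private onDone: () => void;
  private index = 0;
  private paused = false;
  // Bumped on every speak and pause so callbacks from cancelled utterances are ignored.
  private generation = 0;

  constructor(chunks: string[], backend: SpeechBackend, onDone: () => void) {
    this.chunks = chunks;
    this.backend = backend;
    this.onDone = onDone;
  }

  start(): void {
    this.index = 0;
    this.paused = false;
    this.speakCurrent();
  }

  /** Pause by cancelling; `index` still points at the interrupted chunk. */
  pause(): void {
    if (this.paused) return;
    this.paused = true;
    this.generation++;
    this.backend.cancel();
  }

  /** Resume by re-speaking from the interrupted chunk onward. */
  resume(): void {
    if (!this.paused) return;
    this.paused = false;
    this.speakCurrent();
  }

  private speakCurrent(): void {
    if (this.index >= this.chunks.length) {
      this.onDone();
      return;
    }
    const gen = ++this.generation;
    this.backend.speak(this.chunks[this.index], () => {
      if (gen !== this.generation) return; // stale callback from a cancelled chunk
      this.index++;
      this.speakCurrent();
    });
  }
}
```

The generation counter is a design choice of this sketch: a cancelled utterance may still deliver an end or error event, and the counter keeps such late callbacks from advancing the queue. The trade-off of the pattern is that the interrupted chunk is spoken again from its beginning after a resume.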

Additional improvements in latest commits:

  • Async voice loading (ensureVoicesLoaded): Chrome loads speechSynthesis.getVoices() asynchronously — we now wait for the voiceschanged event (with 2s timeout) before selecting a voice.
  • Language auto-detection: When no specific voice is configured (Browser Native TTS has no voice picker), we detect text language via CJK character ratio (CJK_LANG_THRESHOLD = 0.3) and set utterance.lang to zh-CN or en-US so the browser auto-selects an appropriate voice.
  • "default" voice handling: Fixed edge case where ttsVoice === "default" caused voice lookup to silently fail, falling through to language detection instead.
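The asynchronous voice loading mentioned in the first bullet could look roughly like this. The function name ensureVoicesLoaded and the 2s timeout come from the comment above. The `VoiceSource` interface is a hypothetical stand-in for window.speechSynthesis so the sketch runs without a browser, and the exact behavior of the PR's version is not shown here.

```typescript
/** Hypothetical minimal shape of the parts of speechSynthesis used here. */
interface VoiceSource<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

/**
 * Resolve with the voice list, waiting for `voiceschanged` if the list is
 * still empty, and giving up after `timeoutMs` with whatever is available.
 */
function ensureVoicesLoaded<V>(synth: VoiceSource<V>, timeoutMs = 2000): Promise<V[]> {
  const ready = synth.getVoices();
  if (ready.length > 0) return Promise.resolve(ready);

  return new Promise<V[]>((resolve) => {
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener("voiceschanged", finish);
      resolve(synth.getVoices()); // may still be empty after a timeout
    };
    const timer = setTimeout(finish, timeoutMs);
    synth.addEventListener("voiceschanged", finish);
  });
}
```

Because the timeout path can resolve with an empty list, callers still need a fallback such as the language auto-detection described in the second bullet.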

@cosarah
Collaborator

cosarah commented Mar 18, 2026

For both Firefox and Chrome, the browser's native TTS works correctly inside the classroom. However, the TTS preview under the input box on the homepage cannot play the browser's native TTS. Additionally, on Firefox, previewing the browser's native TTS in the Settings panel shows a success status, but no sound is actually heard.

@YizukiAme
Contributor Author

YizukiAme commented Mar 18, 2026

> For both Firefox and Chrome, the browser's native TTS works correctly inside the classroom. However, the TTS preview under the input box on the homepage cannot play the browser's native TTS. Additionally, on Firefox, previewing the browser's native TTS in the Settings panel shows a success status, but no sound is actually heard.

Thanks for your review! Both issues fixed in latest push (4350a8e):

  • Homepage/toolbar TTS preview now works with browser-native-tts (was calling server API which rejects it)
  • Firefox settings test now correctly detects no-voice/silent-success scenarios

The browser-native and API-based TTS preview code was duplicated across
tts-config-popover, media-popover, and tts-settings. Extract it into a
reusable useTTSPreview hook that handles refs, cancellation, audio
lifecycle, and staleness checks in one place.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cosarah cosarah force-pushed the fix/browser-native-tts-playback branch from 49a94e6 to 1a582f3 Compare March 18, 2026 14:20
Collaborator

@cosarah cosarah left a comment

Review Summary

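Overall the changes are of good quality; approve with minor suggestions.

Change analysis

Core feature: browser-native TTS preview

  • lib/audio/browser-tts-preview.ts is well encapsulated, with cancellation, timeout, and race-condition handling in place
  • ensureVoicesLoaded correctly handles the known issue of Chrome loading voices asynchronously
  • resolveBrowserVoice supports matching by voiceURI, name, or lang, plus automatic CJK language detection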

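useTTSPreview hook extraction

  • The duplicated preview logic in three components (tts-config-popover, media-popover, tts-settings) is consolidated into the hook
  • Net reduction of about 130 lines of duplicated code
  • The hook interface is clear: { previewing, startPreview, stopPreview }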

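setASRProvider language reset

  • Switching the ASR provider automatically resets incompatible language codes (BCP-47 vs ISO 639-1)

Settings input autofill isolation

  • Six settings components now consistently set attributes such as name and autoComplete="new-password" to keep the browser from autofilling one field with another's value

pptxgenjs comment typo fix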

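Minor suggestions (non-blocking)

  1. The 'use client' directive at the top of browser-tts-preview.ts can be removed: the file contains only utility functions and no React components
  2. inferPreviewLang currently detects only Chinese (CJK basic plane + Extension A); supporting Japanese or Korean in the future may require widening the matched ranges
  3. The Chinese translation of browserTTSNoVoices, "当前浏览器没有可用的 TTS voice", could be changed to "当前浏览器没有可用的语音" to keep the Chinese wording consistent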

@cosarah cosarah merged commit db5ac82 into THU-MAIC:main Mar 18, 2026
1 check passed
ShaojieLiu added a commit to ShaojieLiu/OpenMAIC that referenced this pull request Mar 19, 2026
* main:
  feat: whiteboard history and auto-save (THU-MAIC#40)
  fix: use browser speechSynthesis for playback when browser-native-tts is selected (THU-MAIC#28)
  chore: fix some minor issues in the comments (THU-MAIC#71)
  fix: reset ASR language when changing provider (THU-MAIC#67)
  fix: isolate settings API key autofill fields (THU-MAIC#48)

# Conflicts:
#	components/audio/tts-config-popover.tsx
#	components/generation/media-popover.tsx
#	components/settings/tts-settings.tsx
#	lib/store/settings.ts
jaumemir pushed a commit to jaumemir/OpenMAIC that referenced this pull request Apr 8, 2026
… is selected (THU-MAIC#28)

* fix: use browser speechSynthesis for playback when browser-native-tts is selected

* fix: handle "default" ttsVoice, extract CJK_LANG_THRESHOLD constant

* fix: support browser-native tts previews

* refactor: extract shared TTS preview logic into useTTSPreview hook

---------

Co-authored-by: yangshen <1322568757@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
smalldeer1982 pushed a commit to smalldeer1982/OpenMAIC that referenced this pull request Apr 20, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: 浏览器原生 TTS (browser-native-tts) 播放时没有声音 (browser-native TTS produces no sound during playback)
课程播放过程中没有声音 (no sound during course playback)
ask about the sound pronblems

2 participants