fix: use browser speechSynthesis for playback when browser-native-tts is selected by YizukiAme · Pull Request #28 · THU-MAIC/OpenMAIC

YizukiAme · 2026-03-16T15:40:56Z

Summary

Fix browser-native TTS producing no sound during classroom playback, while the settings test plays sound correctly.

Fixes #25, fixes #12, fixes #5

Root Cause

When browser-native-tts is selected as the TTS provider:

Scene generation (use-scene-generator.ts:214,450) correctly skips pre-generating audio — browser TTS runs client-side via Web Speech API, not via server API
Playback (engine.ts:436-444) calls audioPlayer.play() which finds no pre-generated audio in IndexedDB → returns false → falls back to scheduleReadingTimer() — a silent timer that estimates reading time but never calls speechSynthesis

Fix

Add Web Speech API integration directly in PlaybackEngine (lib/playback/engine.ts):

playBrowserTTS() — speaks text via window.speechSynthesis, respecting user's voice, speed, volume, and mute settings
cancelBrowserTTS() — cancels active browser TTS
pause() — calls speechSynthesis.pause() when browser TTS is active
resume() — calls speechSynthesis.resume() when browser TTS is paused
stop() / handleUserInterrupt() — calls speechSynthesis.cancel() to stop browser TTS

The fix is self-contained in one file. When audioPlayer.play() returns false (no pre-generated audio), the engine now checks if browser-native-tts is the selected provider and calls speechSynthesis.speak() instead of falling back to the silent reading timer.
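
As a rough illustration of the fallback order described above, here is a minimal sketch. It is not the code in engine.ts: `choosePlaybackPath`, `utteranceParams`, the `TTSSettings` shape, and its field names are hypothetical; only the provider id `browser-native-tts` and the three possible outcomes come from the description.

```typescript
// Hypothetical sketch of the fallback order; not the actual engine.ts code.
type PlaybackPath = "pregenerated-audio" | "browser-tts" | "reading-timer";

// Assumed settings shape, for illustration only.
interface TTSSettings {
  ttsProvider: string; // e.g. "browser-native-tts"
  ttsSpeed: number; // playback speed multiplier chosen by the user
  ttsVolume: number; // expected in the range 0..1
  ttsMuted: boolean;
}

function choosePlaybackPath(
  playedPregenerated: boolean, // stands in for the result of audioPlayer.play()
  settings: TTSSettings,
  speechSynthesisAvailable: boolean,
): PlaybackPath {
  if (playedPregenerated) return "pregenerated-audio";
  if (settings.ttsProvider === "browser-native-tts" && speechSynthesisAvailable) {
    return "browser-tts";
  }
  // Last resort: the silent timer that only estimates reading time.
  return "reading-timer";
}

// Maps user settings onto the rate/volume fields of an utterance.
// The Web Speech API accepts rate in [0.1, 10] and volume in [0, 1].
function utteranceParams(settings: TTSSettings): { rate: number; volume: number } {
  const clamp = (x: number, lo: number, hi: number): number =>
    Math.min(hi, Math.max(lo, x));
  return {
    rate: clamp(settings.ttsSpeed, 0.1, 10),
    volume: settings.ttsMuted ? 0 : clamp(settings.ttsVolume, 0, 1),
  };
}
```

Keeping the decision in a pure function like this makes the three paths easy to unit-test without a browser; the real engine makes the same decision inline.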

Changes

File	Change
`lib/playback/engine.ts`	+83 lines, -4 lines

Testing

Set TTS Provider to "Browser Native" → settings test plays sound ✅
Generate classroom → play → sound plays ✅
Pause/resume works correctly
No regression for other TTS providers (they still use pre-generated audio path)

cosarah · 2026-03-17T01:25:57Z

Code Review

Clean, focused change with correct edge case handling. Overall LGTM. Two items to consider as follow-up improvements:

1. Chrome long utterance cutoff

Chrome has a known bug where SpeechSynthesisUtterance longer than ~15 seconds gets silently cut off. If a speechAction.text is long, playback may stop mid-sentence and onend won't fire, causing the playback engine to hang. A follow-up PR could add a workaround (e.g., chunking text or adding a timeout watchdog).

2. `speechSynthesis.pause()/resume()` cross-browser compatibility

Firefox has incomplete support for speechSynthesis.pause() / speechSynthesis.resume(), which may cause playback to not recover after pausing. This is a Web Speech API platform limitation and doesn't affect the correctness of this PR, but worth noting.
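
For the first item, a timeout watchdog could look roughly like the sketch below. This is only an illustration of the idea: `withWatchdog` and its parameters are hypothetical names, and the `start` callback stands in for whatever code calls speechSynthesis.speak() and wires up onend.

```typescript
// Hypothetical watchdog: calls onDone exactly once, either when the utterance
// reports completion or when the deadline passes without an end event.
function withWatchdog(
  start: (finished: () => void) => void,
  onDone: (reason: "ended" | "timeout") => void,
  timeoutMs: number,
): void {
  let settled = false;

  // If no end event arrives in time, unblock the caller anyway.
  const timer = setTimeout(() => {
    if (settled) return;
    settled = true;
    onDone("timeout");
  }, timeoutMs);

  start(() => {
    if (settled) return; // ignore duplicate or late end events
    settled = true;
    clearTimeout(timer);
    onDone("ended");
  });
}
```

The `settled` flag is what keeps the engine from advancing twice if onend fires just after the timeout.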

YizukiAme · 2026-03-17T04:23:43Z

Thanks for the review!!!! I'll address the Chrome 15s cutoff and Firefox pause/resume issues in a follow-up PR by implementing an utterance queue with text chunking. This will elegantly handle both issues while keeping this PR focused on the basic fallback.

… is selected

Previously, selecting browser-native-tts as the TTS provider would produce sound in the settings test but remain silent during classroom playback. This happened because:

1. The scene generator correctly skipped pre-generation for browser TTS (it runs client-side, not via API)
2. The playback engine fell back to a silent reading timer when no pre-generated audio was found, instead of calling speechSynthesis

This commit adds Web Speech API integration directly in the PlaybackEngine:

- New playBrowserTTS() method speaks text via speechSynthesis
- Properly wires onend/onerror to advance to the next action
- pause()/resume() now handle speechSynthesis.pause()/resume()
- stop() and handleUserInterrupt() cancel browser TTS

Fixes THU-MAIC#25, fixes THU-MAIC#12, fixes THU-MAIC#5

- When ttsVoice is "default" (set by Browser Native TTS which has no voice picker), the voiceURI lookup silently fails and no lang is set, causing Chinese text to be spoken with an English voice.
- Extract the 0.3 CJK detection threshold as a named constant CJK_LANG_THRESHOLD with JSDoc explaining the rationale.
- Fall through to language auto-detection when voice lookup fails, regardless of the reason (missing voice, "default" sentinel, etc.).

YizukiAme · 2026-03-17T12:51:18Z

Update: Both follow-up items have been addressed in the latest commits on this branch.

1. Chrome long utterance cutoff → ✅ Fixed

Added splitIntoChunks() that splits text on sentence-ending punctuation (Latin + CJK) and newlines, plus playBrowserTTSChunk() that sequentially speaks each chunk. This avoids Chrome's ~15s silent cutoff entirely.
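
For readers who want a feel for the splitting step, here is a minimal sketch. It reuses the name splitIntoChunks but is an assumption about its behavior, not the code in this branch; the exact punctuation set and trimming rules may differ.

```typescript
// Hypothetical sketch; the real splitIntoChunks in the PR may differ.
function splitIntoChunks(text: string): string[] {
  // Each match is a run of text plus its trailing sentence-ending punctuation
  // (Latin . ! ? and CJK 。！？). Newlines are never matched, so they also
  // act as chunk boundaries.
  const pieces = text.match(/[^.!?。！？\n]+[.!?。！？]*|[.!?。！？]+/g) ?? [];
  return pieces.map((piece) => piece.trim()).filter((piece) => piece.length > 0);
}

// Known limitation of this sketch: a "." inside a number such as 3.14
// is treated as a sentence end.
const chunks = splitIntoChunks("Hello world. How are you? 你好。今天天气不错！\nLast line");
// chunks: ["Hello world.", "How are you?", "你好。", "今天天气不错！", "Last line"]
```

Short chunks keep each utterance well under the duration at which Chrome cuts off, at the cost of a small pause between sentences.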

2. Firefox `pause()`/`resume()` incompatibility → ✅ Fixed

Implemented a cancel+re-speak pattern:

On pause: saves remaining chunks, then calls speechSynthesis.cancel()
On resume: re-speaks from the saved chunk onward

This bypasses Firefox's broken speechSynthesis.pause()/resume() completely.
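
The cancel + re-speak pattern can be sketched as a small state machine. This is an illustration under assumptions, not the engine code: `SpeechBackend` and `ChunkedSpeaker` are hypothetical names, and the backend is injected so that in a browser it would wrap speechSynthesis.speak() and speechSynthesis.cancel().

```typescript
// Minimal surface the sketch needs (hypothetical interface).
interface SpeechBackend {
  speak(text: string, onEnd: () => void): void;
  cancel(): void;
}

class ChunkedSpeaker {
  private index = 0; // next chunk to speak
  private paused = false;
  private generation = 0; // invalidates end callbacks from cancelled utterances
  private readonly backend: SpeechBackend;
  private readonly chunks: string[];
  private readonly onFinished: () => void;

  constructor(backend: SpeechBackend, chunks: string[], onFinished: () => void) {
    this.backend = backend;
    this.chunks = chunks;
    this.onFinished = onFinished;
  }

  start(): void {
    this.index = 0;
    this.paused = false;
    this.speakCurrent();
  }

  pause(): void {
    if (this.paused) return;
    this.paused = true;
    this.generation++; // any late onEnd from the cancelled chunk is ignored
    this.backend.cancel();
  }

  resume(): void {
    if (!this.paused) return;
    this.paused = false;
    this.speakCurrent(); // re-speak the interrupted chunk from its start
  }

  private speakCurrent(): void {
    const chunk = this.chunks[this.index];
    if (chunk === undefined) {
      this.onFinished();
      return;
    }
    const gen = ++this.generation;
    this.backend.speak(chunk, () => {
      if (gen !== this.generation || this.paused) return;
      this.index++;
      this.speakCurrent();
    });
  }
}
```

The trade-off is that resuming restarts the interrupted chunk rather than continuing mid-sentence, which is usually acceptable when chunks are single sentences.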

Additional improvements in latest commits:

Async voice loading (ensureVoicesLoaded): Chrome loads speechSynthesis.getVoices() asynchronously — we now wait for the voiceschanged event (with 2s timeout) before selecting a voice.
Language auto-detection: When no specific voice is configured (Browser Native TTS has no voice picker), we detect text language via CJK character ratio (CJK_LANG_THRESHOLD = 0.3) and set utterance.lang to zh-CN or en-US so the browser auto-selects an appropriate voice.
"default" voice handling: Fixed edge case where ttsVoice === "default" caused voice lookup to silently fail, falling through to language detection instead.
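
A minimal sketch of the voice lookup and language fallback described in the last two items. The threshold value and the zh-CN / en-US outcomes come from the description above; the function names, the `VoiceLike` shape, the exact character ranges, and the choice to ignore whitespace in the ratio are assumptions.

```typescript
const CJK_LANG_THRESHOLD = 0.3; // value taken from the PR description

// Assumed subset of a browser voice object.
interface VoiceLike {
  voiceURI: string;
  lang: string;
}

// Returns undefined for the "default" sentinel or an unknown voiceURI,
// so the caller can fall through to language auto-detection.
function pickVoice<V extends VoiceLike>(voices: V[], ttsVoice: string): V | undefined {
  if (ttsVoice === "default") return undefined;
  return voices.filter((voice) => voice.voiceURI === ttsVoice)[0];
}

// Guesses the utterance language from the share of CJK ideographs
// among the non-whitespace characters.
function detectLang(text: string): "zh-CN" | "en-US" {
  const visible = text.replace(/\s+/g, "");
  if (visible.length === 0) return "en-US";
  // CJK Unified Ideographs Extension A (U+3400..U+4DBF) and the basic block (U+4E00..U+9FFF)
  const cjkCount = (visible.match(/[\u3400-\u4DBF\u4E00-\u9FFF]/g) ?? []).length;
  return cjkCount / visible.length >= CJK_LANG_THRESHOLD ? "zh-CN" : "en-US";
}
```

When `pickVoice` returns undefined, only utterance.lang is set from `detectLang`, leaving the actual voice choice to the browser.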

cosarah · 2026-03-18T07:02:40Z

For both Firefox and Chrome, the browser's native TTS works correctly inside the classroom. However, the TTS preview under the input box on the homepage cannot play the browser's native TTS. Additionally, on Firefox, previewing the browser's native TTS in the Settings panel shows a success status, but no sound is actually heard.

YizukiAme · 2026-03-18T13:12:46Z

> For both Firefox and Chrome, the browser's native TTS works correctly inside the classroom. However, the TTS preview under the input box on the homepage cannot play the browser's native TTS. Additionally, on Firefox, previewing the browser's native TTS in the Settings panel shows a success status, but no sound is actually heard.

Thanks for your review! Both issues fixed in latest push (4350a8e):

Homepage/toolbar TTS preview now works with browser-native-tts (was calling server API which rejects it)
Firefox settings test now correctly detects no-voice/silent-success scenarios

The browser-native and API-based TTS preview code was duplicated across tts-config-popover, media-popover, and tts-settings. Extract it into a reusable useTTSPreview hook that handles refs, cancellation, audio lifecycle, and staleness checks in one place.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cosarah

Review Summary

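The overall quality of the changes is good; approve with minor suggestions.

Change Analysis

Core feature: browser-native TTS preview ✅

lib/audio/browser-tts-preview.ts is well encapsulated, with cancellation, timeout, and race-condition handling in place
ensureVoicesLoaded correctly handles the known issue of Chrome loading voices asynchronously
resolveBrowserVoice supports matching by voiceURI, name, or lang, plus automatic CJK language detection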
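
For context, waiting for voices with a timeout can be sketched as below. This is not the code in browser-tts-preview.ts: `whenVoicesLoaded` and the `VoiceSource` interface are hypothetical, and the interface only mirrors the getVoices() method and voiceschanged event of the Web Speech API so the sketch can run against a fake outside the browser. The 2000 ms default follows the 2s timeout mentioned earlier in this thread.

```typescript
// Assumed minimal subset of the browser's speech synthesis object.
interface VoiceSource<V> {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
}

// Calls onReady once: immediately if voices are already available,
// otherwise on the first voiceschanged event or after timeoutMs.
function whenVoicesLoaded<V>(
  synth: VoiceSource<V>,
  onReady: (voices: V[]) => void,
  timeoutMs: number = 2000,
): void {
  const initial = synth.getVoices();
  if (initial.length > 0) {
    onReady(initial);
    return;
  }
  let done = false;
  const finish = (): void => {
    if (done) return;
    done = true;
    clearTimeout(timer);
    synth.removeEventListener("voiceschanged", finish);
    onReady(synth.getVoices()); // may still be empty after a timeout
  };
  const timer = setTimeout(finish, timeoutMs);
  synth.addEventListener("voiceschanged", finish);
}
```

A caller that prefers async/await can wrap this in a Promise; an empty list after the timeout is the case where a "no voices" message should be shown instead of reporting success.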

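useTTSPreview hook extraction ✅

The duplicated preview logic in three components (tts-config-popover, media-popover, tts-settings) is consolidated into the hook
Net reduction of about 130 lines of duplicated code
The hook interface is clear: { previewing, startPreview, stopPreview }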

setASRProvider language reset ✅

Automatically resets incompatible language codes (BCP-47 vs ISO 639-1) when switching ASR providers
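
To illustrate the kind of reset meant here, a minimal sketch follows. Everything in it is hypothetical: the `ASRProviderInfo` shape, the function name, and the example language lists are assumptions, not the store code in lib/store/settings.ts.

```typescript
// Hypothetical shape; the real store and provider registry may differ.
interface ASRProviderInfo {
  supportedLanguages: string[];
  defaultLanguage: string;
}

// Keeps the current language only if the new provider accepts it,
// otherwise falls back to that provider's default.
function languageAfterProviderSwitch(current: string, next: ASRProviderInfo): string {
  const stillValid = next.supportedLanguages.some((code) => code === current);
  return stillValid ? current : next.defaultLanguage;
}

// Example: one provider expects region-qualified tags, another two-letter codes.
const regionTagged: ASRProviderInfo = { supportedLanguages: ["zh-CN", "en-US"], defaultLanguage: "zh-CN" };
const twoLetter: ASRProviderInfo = { supportedLanguages: ["zh", "en"], defaultLanguage: "zh" };
const afterSwitch = languageAfterProviderSwitch("en-US", twoLetter); // "zh"
```

A smarter variant could map "en-US" to "en" instead of resetting, but a plain reset avoids sending a code the new provider would reject.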

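Settings input autofill isolation ✅

Six settings components now consistently set attributes such as name and autoComplete="new-password" to keep the browser from autofilling values across fields

pptxgenjs comment typo fix ✅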

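Minor suggestions (not blocking the merge)

The 'use client' directive at the top of browser-tts-preview.ts can be removed: the file contains only utility functions and no React components
inferPreviewLang currently detects only Chinese (CJK basic plane + Extension A); supporting Japanese or Korean in the future may require widening the matched ranges
The Chinese translation of browserTTSNoVoices, "当前浏览器没有可用的 TTS voice", should be changed to "当前浏览器没有可用的语音" to keep the Chinese text consistent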
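
On the second suggestion, one possible direction is sketched below. It is not the existing inferPreviewLang: the function name and the presence-based checks are assumptions. The Unicode ranges are the standard Hiragana, Katakana, and Hangul Syllables blocks.

```typescript
// Hypothetical extension of language inference to Japanese and Korean.
function inferLangWithKanaHangul(text: string): "ja-JP" | "ko-KR" | "zh-CN" | "en-US" {
  // Kana is checked first because Japanese text usually also contains Han characters.
  if (/[\u3040-\u309F\u30A0-\u30FF]/.test(text)) return "ja-JP"; // Hiragana, Katakana
  if (/[\uAC00-\uD7A3]/.test(text)) return "ko-KR"; // Hangul syllables
  if (/[\u3400-\u4DBF\u4E00-\u9FFF]/.test(text)) return "zh-CN"; // Han: Extension A + basic block
  return "en-US";
}
```

This version is presence-based rather than ratio-based, so a single kana character decides the result, and Japanese written entirely in kanji would still be classified as zh-CN.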

* main:
  feat: whiteboard history and auto-save (THU-MAIC#40)
  fix: use browser speechSynthesis for playback when browser-native-tts is selected (THU-MAIC#28)
  chore: fix some minor issues in the comments (THU-MAIC#71)
  fix: reset ASR language when changing provider (THU-MAIC#67)
  fix: isolate settings API key autofill fields (THU-MAIC#48)

# Conflicts:
#	components/audio/tts-config-popover.tsx
#	components/generation/media-popover.tsx
#	components/settings/tts-settings.tsx
#	lib/store/settings.ts

… is selected (THU-MAIC#28)

* fix: use browser speechSynthesis for playback when browser-native-tts is selected

  Previously, selecting browser-native-tts as the TTS provider would produce sound in the settings test but remain silent during classroom playback. This happened because:

  1. The scene generator correctly skipped pre-generation for browser TTS (it runs client-side, not via API)
  2. The playback engine fell back to a silent reading timer when no pre-generated audio was found, instead of calling speechSynthesis

  This commit adds Web Speech API integration directly in the PlaybackEngine:

  - New playBrowserTTS() method speaks text via speechSynthesis
  - Properly wires onend/onerror to advance to the next action
  - pause()/resume() now handle speechSynthesis.pause()/resume()
  - stop() and handleUserInterrupt() cancel browser TTS

  Fixes THU-MAIC#25, fixes THU-MAIC#12, fixes THU-MAIC#5

* fix: handle "default" ttsVoice, extract CJK_LANG_THRESHOLD constant

  - When ttsVoice is "default" (set by Browser Native TTS which has no voice picker), the voiceURI lookup silently fails and no lang is set, causing Chinese text to be spoken with an English voice.
  - Extract the 0.3 CJK detection threshold as a named constant CJK_LANG_THRESHOLD with JSDoc explaining the rationale.
  - Fall through to language auto-detection when voice lookup fails, regardless of the reason (missing voice, "default" sentinel, etc.).

* fix: support browser-native tts previews

* refactor: extract shared TTS preview logic into useTTSPreview hook

  The browser-native and API-based TTS preview code was duplicated across tts-config-popover, media-popover, and tts-settings. Extract it into a reusable useTTSPreview hook that handles refs, cancellation, audio lifecycle, and staleness checks in one place.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: yangshen <1322568757@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


YizukiAme force-pushed the fix/browser-native-tts-playback branch from 49b470f to 496b5d9 Compare March 17, 2026 04:40

Merge branch 'main' into fix/browser-native-tts-playback

2dbdaa4

YizukiAme mentioned this pull request Mar 18, 2026

[Bug]: openclaw插件生成的课堂没有声音 #84

Closed

fix: support browser-native tts previews

4350a8e

cosarah force-pushed the fix/browser-native-tts-playback branch from 49a94e6 to 1a582f3 Compare March 18, 2026 14:20

cosarah approved these changes Mar 18, 2026


cosarah merged commit db5ac82 into THU-MAIC:main Mar 18, 2026
1 check passed

YizukiAme mentioned this pull request Mar 18, 2026

使用托管OpenMAIC方式生成了两个课件均无声音 #86

Closed

wyuc mentioned this pull request Mar 20, 2026

fix: support browser-native TTS in classroom playback #62

Closed

13 tasks

smalldeer1982 pushed a commit to smalldeer1982/OpenMAIC that referenced this pull request Apr 20, 2026

fix: use browser speechSynthesis for playback when browser-native-tts…

c6f4c07

… is selected (THU-MAIC#28)

cosarah merged 5 commits into THU-MAIC:main from YizukiAme:fix/browser-native-tts-playback
