adds configurable model support for TTS and ASR #108
ShaojieLiu wants to merge 23 commits into THU-MAIC:main
* main:
  feat: whiteboard history and auto-save (THU-MAIC#40)
  fix: use browser speechSynthesis for playback when browser-native-tts is selected (THU-MAIC#28)
  chore: fix some minor issues in the comments (THU-MAIC#71)
  fix: reset ASR language when changing provider (THU-MAIC#67)
  fix: isolate settings API key autofill fields (THU-MAIC#48)

# Conflicts:
#   components/audio/tts-config-popover.tsx
#   components/generation/media-popover.tsx
#   components/settings/tts-settings.tsx
#   lib/store/settings.ts
…to lsj/fix-configurable-tts-asr
@wyuc Hi, could you help me review this PR? THX
wyuc left a comment
Thanks for this — the overall approach is solid and the model selection threads through the full stack correctly.
Two things to address:
1. Azure TTS / Browser Native don't actually use model IDs
Azure TTS uses SSML with voice selection, and Browser Native uses the Web Speech API — neither has a "model" concept. But the PR adds dummy model entries for them (azure-neural-tts, browser-native-tts, browser-native-asr), which means the UI shows a model selector that does nothing. This is confusing for users.
Suggestion: skip the model section for providers where model selection has no effect, or add a flag like supportsModelSelection to conditionally render it.
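A minimal sketch of what such a flag could look like. The `AudioProviderMeta` shape, the `TTS_PROVIDERS` table, the `openai-tts` and `azure-tts` provider IDs, the listed model names, and the `shouldShowModelSelector` helper are all assumptions for illustration; the actual metadata in this repository may be structured differently.

```typescript
// Hypothetical provider metadata; field names other than
// supportsModelSelection are assumptions, not the PR's real types.
interface AudioProviderMeta {
  id: string;
  name: string;
  // false for providers where choosing a model has no effect
  supportsModelSelection: boolean;
  models: string[];
}

// Illustrative entries only.
const TTS_PROVIDERS: Record<string, AudioProviderMeta> = {
  "openai-tts": {
    id: "openai-tts",
    name: "OpenAI TTS",
    supportsModelSelection: true,
    models: ["tts-1", "tts-1-hd"],
  },
  "azure-tts": {
    id: "azure-tts",
    name: "Azure TTS",
    supportsModelSelection: false, // voice is chosen via SSML, not a model ID
    models: [],
  },
  "browser-native-tts": {
    id: "browser-native-tts",
    name: "Browser Native TTS",
    supportsModelSelection: false, // Web Speech API has no model concept
    models: [],
  },
};

// The settings UI would call this before rendering the model section.
// Unknown provider IDs hide the selector rather than throwing.
function shouldShowModelSelector(providerId: string): boolean {
  const provider: AudioProviderMeta | undefined = TTS_PROVIDERS[providerId];
  return provider?.supportsModelSelection ?? false;
}
```

With this shape, the component can guard the whole model section with `shouldShowModelSelector(providerId)`, so no dummy model entries are needed for Azure or Browser Native.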
2. Dead code
getTTSModels() and getASRModels() in lib/audio/constants.ts are defined but never called anywhere. Either wire them up or remove them.
Minor nits (non-blocking):
- Some section comments were removed from tts-settings.tsx ({/* API Key & Base URL */}, etc.). This looks unintentional; you might want to restore them.
- Custom model list uses key={`custom-${index}`}. Using the model ID as the key would be more robust.
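To illustrate the key nit, here is a small sketch comparing the two key strategies outside of React. The helper names `indexKeys` and `idKeys` and the sample model IDs are hypothetical, and the ID-based approach assumes model IDs are unique within the custom list.

```typescript
// Index-based keys: the key is tied to the position in the list.
function indexKeys(models: string[]): string[] {
  return models.map((_, index) => `custom-${index}`);
}

// ID-based keys: the key is tied to the model itself.
function idKeys(models: string[]): string[] {
  return models.map((id) => `custom-${id}`);
}

// Hypothetical custom model list, before and after removing the first entry.
const before = ["model-a", "model-b", "model-c"];
const after = before.filter((id) => id !== "model-a");

// With index keys, "model-b" inherits the key that "model-a" had,
// so a keyed renderer would treat it as the same element.
const indexKeyOfBAfter = indexKeys(after)[after.indexOf("model-b")];
const indexKeyOfABefore = indexKeys(before)[before.indexOf("model-a")];

// With ID keys, "model-b" keeps the same key across the removal.
const idKeyOfBBefore = idKeys(before)[before.indexOf("model-b")];
const idKeyOfBAfter = idKeys(after)[after.indexOf("model-b")];
```

Because index keys shift when an item is removed or reordered, any per-row state (for example an input being edited) can end up attached to the wrong model; ID keys avoid that as long as IDs do not repeat.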
…tts-asr
# Conflicts:
#   lib/store/settings.ts
…to lsj/fix-configurable-tts-asr
Thanks, addressed all of these. I split the follow-up into three commits for clarity:

5c852b2 fix: hide audio model selectors for unsupported providers
Added supportsModelSelection to TTS/ASR provider metadata and set it to false for Azure TTS and Browser Native TTS/ASR
Thanks for the follow-up commits. The … One thing I missed in the first round: … Fix should be straightforward: …
Summary
This PR adds configurable model support for TTS and ASR.
Previously, TTS and ASR provider implementations used hardcoded server-side models, so the settings UI could not choose which model to use. This change introduces persisted ttsModelId and asrModelId settings, updates the TTS/ASR configuration pages to follow the image-generation model-management pattern, and propagates the selected model through preview, generation, and transcription flows.
Related Issues
Fixes the issue where TTS and ASR could not select models independently in settings.
closed #14
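The propagation described in the Summary can be sketched as a server-side resolution step: prefer the model ID sent by the client and fall back to a per-provider default. Everything here other than the `ttsModelId` setting name is an assumption for illustration, including the request body shape, the `DEFAULT_TTS_MODELS` table, and the `resolveTTSModel` helper; the real handler in app/api/generate/tts/route.ts may differ.

```typescript
// Hypothetical request payload sent by the client to the TTS route.
interface TTSRequestBody {
  text: string;
  providerId: string;
  // Persisted setting selected in the UI; may be absent for older clients.
  ttsModelId?: string;
}

// Hypothetical per-provider defaults, standing in for the previously
// hardcoded server-side model.
const DEFAULT_TTS_MODELS: Record<string, string> = {
  "openai-tts": "tts-1",
};

// Prefer the client-selected model; otherwise use the provider default.
// Returns undefined for providers that have no model concept.
function resolveTTSModel(body: TTSRequestBody): string | undefined {
  const requested = body.ttsModelId?.trim();
  if (requested) {
    return requested;
  }
  const fallback: string | undefined = DEFAULT_TTS_MODELS[body.providerId];
  return fallback;
}
```

Keeping the fallback on the server means requests that omit the model ID still behave as they did before this change.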
Changes
- Add ttsModelId and asrModelId to persisted settings state
Type of Change
Verification
Steps to reproduce / test
What you personally verified
- Ran targeted eslint checks on modified files
- Ran pnpm exec tsc -p tsconfig.json --noEmit
- Verified the TTS/ASR settings UI structure now mirrors the image-generation model section
- Verified selected TTS/ASR model IDs are threaded through client requests into server handlers
- Did not run full manual browser interaction testing or full CI suite in this session

pnpm exec eslint lib/audio/types.ts lib/audio/constants.ts lib/store/settings.ts lib/audio/tts-providers.ts lib/audio/asr-providers.ts app/api/generate/tts/route.ts app/api/transcription/route.ts components/settings/tts-settings.tsx components/settings/asr-settings.tsx components/generation/media-popover.tsx components/audio/tts-config-popover.tsx lib/hooks/use-audio-recorder.ts lib/hooks/use-scene-generator.ts app/generation-preview/page.tsx
pnpm exec tsc -p tsconfig.json --noEmit

Evidence
(pnpm check && pnpm lint && npx tsc --noEmit)
Checklist