adds configurable model support for TTS and ASR by ShaojieLiu · Pull Request #108 · THU-MAIC/OpenMAIC

ShaojieLiu · 2026-03-19T06:40:02Z

Summary

This PR adds configurable model support for TTS and ASR.

Previously, TTS and ASR provider implementations used hardcoded server-side models, so the settings UI could not choose which model to use. This change introduces persisted ttsModelId and asrModelId settings, updates the TTS/ASR configuration pages to follow the image-generation model-management pattern, and propagates the selected model through preview, generation, and transcription flows.

Related Issues

Fixes the issue where users could not select TTS and ASR models independently in settings.
Closes #14

Changes

Added ttsModelId and asrModelId to persisted settings state
Added TTS/ASR provider model definitions to audio provider metadata
Reworked TTS and ASR settings pages to match the image-generation model section pattern
Added bottom-positioned model lists for TTS/ASR with selectable active model
Added create/edit/delete support for custom TTS/ASR models
Updated TTS preview, scene generation, generation preview, and ASR recording/transcription flows to send selected model IDs
Replaced hardcoded server-side TTS/ASR model selection with configurable model IDs from settings
Added migration/defaulting behavior so existing users get valid default TTS/ASR models
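
The migration/defaulting item could be sketched roughly as follows. Only ttsModelId and asrModelId come from the description above; the provider IDs, model IDs, and the resolveModelId / withAudioModelDefaults helpers are hypothetical names chosen for illustration, not the code in this PR.

```ts
// Hypothetical sketch: fill in missing model IDs when loading persisted settings.
type ProviderModels = Record<string, string[]>;

// Illustrative provider -> built-in model lists (not the PR's real metadata).
const TTS_MODELS: ProviderModels = { 'example-tts': ['example-tts-standard', 'example-tts-hd'] };
const ASR_MODELS: ProviderModels = { 'example-asr': ['example-asr-base'] };

interface AudioSettings {
  ttsProviderId: string;
  asrProviderId: string;
  ttsModelId?: string; // absent in settings persisted before this change
  asrModelId?: string;
}

// Keep a persisted model ID if present; otherwise use the provider's first
// built-in model, or '' when the provider has no models at all.
function resolveModelId(models: ProviderModels, providerId: string, current?: string): string {
  if (current) return current;
  return models[providerId]?.[0] ?? '';
}

function withAudioModelDefaults(persisted: AudioSettings): Required<AudioSettings> {
  return {
    ...persisted,
    ttsModelId: resolveModelId(TTS_MODELS, persisted.ttsProviderId, persisted.ttsModelId),
    asrModelId: resolveModelId(ASR_MODELS, persisted.asrProviderId, persisted.asrModelId),
  };
}
```

Under these assumptions, settings saved before the change load with the first built-in model of the active provider, while an explicitly saved model ID is left untouched.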

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Refactoring (no functional changes)
CI/CD or build changes

Verification

Steps to reproduce / test

Open TTS settings and confirm the model section appears at the bottom of the page
Select a built-in TTS model, add a custom TTS model, switch selection, and verify preview requests use the selected model
Open ASR settings and confirm the model section appears at the bottom of the page
Select a built-in ASR model, add a custom ASR model, switch selection, and verify transcription requests use the selected model
Delete a selected custom TTS/ASR model and confirm the UI falls back to another available model
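
The fallback in the last step might behave like the sketch below. The ModelState shape and the deleteCustomModel helper are assumptions made for illustration; the PR's actual store logic may differ.

```ts
// Hypothetical sketch of the fallback after deleting the selected custom model.
interface ModelState {
  builtInModelIds: string[];
  customModelIds: string[];
  selectedModelId: string;
}

function deleteCustomModel(state: ModelState, modelId: string): ModelState {
  // Built-in models are never removed; only the custom list is filtered.
  const customModelIds = state.customModelIds.filter((id) => id !== modelId);
  const available = [...state.builtInModelIds, ...customModelIds];
  // Keep the selection if it still exists; otherwise pick the first remaining
  // model, or '' when nothing is left.
  const selectedModelId = available.includes(state.selectedModelId)
    ? state.selectedModelId
    : (available[0] ?? '');
  return { ...state, customModelIds, selectedModelId };
}
```

Deleting a model that is not selected leaves the selection alone, so the fallback only triggers in the case the test step describes.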

What you personally verified

Ran targeted eslint checks on modified files
Ran pnpm exec tsc -p tsconfig.json --noEmit
Verified the TTS/ASR settings UI structure now mirrors the image-generation model section
Verified selected TTS/ASR model IDs are threaded through client requests into server handlers
Did not run full manual browser interaction testing or full CI suite in this session
pnpm exec eslint lib/audio/types.ts lib/audio/constants.ts lib/store/settings.ts lib/audio/tts-providers.ts lib/audio/asr-providers.ts app/api/generate/tts/route.ts app/api/transcription/route.ts components/settings/tts-settings.tsx components/settings/asr-settings.tsx components/generation/media-popover.tsx components/audio/tts-config-popover.tsx lib/hooks/use-audio-recorder.ts lib/hooks/use-scene-generator.ts app/generation-preview/page.tsx
pnpm exec tsc -p tsconfig.json --noEmit

Evidence

CI passes (pnpm check && pnpm lint && npx tsc --noEmit)
Manually tested locally
Screenshots / recordings attached (if UI changes)

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have added/updated documentation as needed
My changes do not introduce new warnings

ShaojieLiu · 2026-03-19T16:36:38Z

@wyuc Hi, could you help me review this PR? THX

wyuc

Thanks for this — the overall approach is solid and the model selection threads through the full stack correctly.

Two things to address:

1. Azure TTS / Browser Native don't actually use model IDs

Azure TTS uses SSML with voice selection, and Browser Native uses the Web Speech API — neither has a "model" concept. But the PR adds dummy model entries for them (azure-neural-tts, browser-native-tts, browser-native-asr), which means the UI shows a model selector that does nothing. This is confusing for users.

Suggestion: skip the model section for providers where model selection has no effect, or add a flag like supportsModelSelection to conditionally render it.

2. Dead code

getTTSModels() and getASRModels() in lib/audio/constants.ts are defined but never called anywhere. Either wire them up or remove them.

Minor nits (non-blocking):

Some section comments were removed from tts-settings.tsx ({/* API Key & Base URL */}, etc.) — looks unintentional, might want to restore them
Custom model list uses key={`custom-${index}`}; using the model ID as the key would be more robust

ShaojieLiu · 2026-03-23T03:06:13Z

Thanks, addressed all of these.

I split the follow-up into three commits for clarity:

5c852b2 fix: hide audio model selectors for unsupported providers
5193da9 chore: remove unused audio model helpers
bcab3e3 chore: restore settings comments and stable model keys

What changed:

Added supportsModelSelection to TTS/ASR provider metadata and set it to false for Azure TTS and Browser Native TTS/ASR
Removed the dummy model entries for those providers
Updated the TTS/ASR settings UIs to only render model management when model selection actually matters
Updated default/store fallback logic so unsupported providers keep an empty model id instead of a fake one
Removed unused getTTSModels() / getASRModels()
Restored the dropped section comments in tts-settings.tsx
Switched custom model list keys to use model.id
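
A minimal sketch of how such a flag could gate both the UI and the default model ID is shown below. Only supportsModelSelection is taken from the thread; the AudioProviderMeta shape, the provider entries, and both helper functions are hypothetical.

```ts
// Hypothetical sketch of provider metadata carrying the supportsModelSelection flag.
interface AudioProviderMeta {
  id: string;
  supportsModelSelection: boolean;
  models: string[];
}

// Illustrative entries; the real provider IDs and model lists may differ.
const PROVIDERS: AudioProviderMeta[] = [
  { id: 'browser-native-tts', supportsModelSelection: false, models: [] },
  { id: 'example-tts', supportsModelSelection: true, models: ['example-model-a', 'example-model-b'] },
];

// The settings page would render the model-management section only when this is true.
function shouldRenderModelSection(provider: AudioProviderMeta): boolean {
  return provider.supportsModelSelection;
}

// Providers without model selection keep an empty model ID instead of a placeholder.
function defaultModelIdFor(provider: AudioProviderMeta): string {
  if (!provider.supportsModelSelection) return '';
  return provider.models[0] ?? '';
}
```

Keeping the flag in provider metadata, rather than checking provider IDs in each component, would let the TTS and ASR pages share the same rendering condition.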

wyuc · 2026-03-27T14:10:42Z

Thanks for the follow-up commits. The supportsModelSelection flag and the cleanup all look correct.

One thing I missed in the first round: generateElevenLabsTTS() in lib/audio/tts-providers.ts still has model_id hardcoded to 'eleven_multilingual_v2' (line 353). The other providers were updated to use config.modelId || 'default', but ElevenLabs was not. Since ElevenLabs is marked supportsModelSelection: true, users can select or add a custom model in the UI, but the actual API request will always send eleven_multilingual_v2.

Fix should be straightforward:

```ts
model_id: config.modelId || 'eleven_multilingual_v2',
```
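
For context, the suggested fallback can be seen in isolation in the sketch below. The TTSConfig shape and the buildElevenLabsBody helper are hypothetical stand-ins; the real generateElevenLabsTTS() builds its request differently and sends additional fields.

```ts
// Hypothetical sketch: how the fallback behaves for different config values.
interface TTSConfig {
  modelId?: string;
}

function buildElevenLabsBody(text: string, config: TTSConfig): { text: string; model_id: string } {
  return {
    text,
    // `||` rather than `??`, so an empty-string model ID also falls back to the default.
    model_id: config.modelId || 'eleven_multilingual_v2',
  };
}

// A user-selected model is passed through unchanged ('my-custom-model' is a made-up ID).
const custom = buildElevenLabsBody('Hello', { modelId: 'my-custom-model' });
// A missing or empty model ID yields the previous hardcoded default.
const fallback = buildElevenLabsBody('Hello', { modelId: '' });
```

The choice of `||` matters because the store may hold an empty string for the model ID, which `??` would pass through to the API as-is.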

ShaojieLiu added 3 commits March 17, 2026 15:58

fix: add configurable models for tts and asr

38f74fb

style: format audio model settings changes

568ac52

ShaojieLiu mentioned this pull request Mar 19, 2026

fix: add configurable models for tts and asr #50

Closed

13 tasks

ShaojieLiu added 4 commits March 19, 2026 17:22

Merge branch 'main' into lsj/fix-configurable-tts-asr

3fdded0

Merge branch 'main' into lsj/fix-configurable-tts-asr

4b5e11a

Merge remote-tracking branch 'origin/main' into lsj/fix-configurable-tts-asr

3d4d65e

Merge remote-tracking branch 'github/lsj/fix-configurable-tts-asr' into lsj/fix-configurable-tts-asr

eacfc5c

Merge branch 'main' into lsj/fix-configurable-tts-asr

a34884d

wyuc requested changes Mar 21, 2026

ShaojieLiu added 5 commits March 23, 2026 10:22

Merge remote-tracking branch 'origin/main' into lsj/fix-configurable-tts-asr (conflicts: lib/store/settings.ts)

ef3adcb

Merge remote-tracking branch 'github/lsj/fix-configurable-tts-asr' into lsj/fix-configurable-tts-asr

091b8e6

fix: hide audio model selectors for unsupported providers

5c852b2

chore: remove unused audio model helpers

5193da9

chore: restore settings comments and stable model keys

bcab3e3

ShaojieLiu requested a review from wyuc March 23, 2026 03:06

ShaojieLiu added 10 commits March 24, 2026 10:10

Merge branch 'main' into lsj/fix-configurable-tts-asr

76751d1

Merge remote-tracking branch 'origin/main' into lsj/fix-configurable-tts-asr

71e3255

test: update settings sync audio provider mocks

f181a6e

Merge remote-tracking branch 'github/lsj/fix-configurable-tts-asr' into lsj/fix-configurable-tts-asr

ea05f44

Merge remote-tracking branch 'origin/main' into lsj/fix-configurable-tts-asr

20d8421

Merge branch 'main' into lsj/fix-configurable-tts-asr

56eb2ef

Merge branch 'main' into lsj/fix-configurable-tts-asr

3feb22a

Merge remote-tracking branch 'origin/main' into lsj/fix-configurable-tts-asr

7251b79

Merge remote-tracking branch 'github/lsj/fix-configurable-tts-asr' into lsj/fix-configurable-tts-asr

af9f7e6

Merge branch 'main' into lsj/fix-configurable-tts-asr

919c702

ShaojieLiu wants to merge 23 commits into THU-MAIC:main from ShaojieLiu:lsj/fix-configurable-tts-asr
