[codex] Add PocketTTS language selection #1118
.appendingPathComponent(".cache", isDirectory: true)
.appendingPathComponent("fluidaudio", isDirectory: true)
.appendingPathComponent("Models", isDirectory: true)
.appendingPathComponent("pocket-tts", isDirectory: true)
FluidAudio's Repo.pocketTts.folderName is "pocket-tts-coreml" (see FluidAudio source). The downloader writes to ~/.cache/fluidaudio/Models/pocket-tts-coreml/v2/<lang>/, so this probe always returns false. That means even after downloading a pack and restarting (or in any state where modelState isn't already .ready), clicking the speaker icon on a message routes the user to the settings window instead of lazy-loading the cached pack.
The fix is just to rename "pocket-tts" → "pocket-tts-coreml".
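A minimal sketch of what the corrected probe could look like, assuming the on-disk layout described in the comment above. The helper names `pocketTTSPackDirectory` and `isPackCached` are hypothetical, and using `french_24l` as the `<lang>` directory name is an assumption; neither is taken from the Osaurus or FluidAudio source.

```swift
import Foundation

// Hypothetical helper. The "pocket-tts-coreml" folder name and the v2/<lang>
// layout are taken from the review comment, not verified against FluidAudio.
func pocketTTSPackDirectory(home: URL, languageID: String) -> URL {
    home
        .appendingPathComponent(".cache", isDirectory: true)
        .appendingPathComponent("fluidaudio", isDirectory: true)
        .appendingPathComponent("Models", isDirectory: true)
        .appendingPathComponent("pocket-tts-coreml", isDirectory: true)
        .appendingPathComponent("v2", isDirectory: true)
        .appendingPathComponent(languageID, isDirectory: true)
}

// Returns true only when the pack path exists and is a directory.
func isPackCached(home: URL, languageID: String) -> Bool {
    var isDirectory: ObjCBool = false
    let path = pocketTTSPackDirectory(home: home, languageID: languageID).path
    return FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
        && isDirectory.boolValue
}
```

A caller would pass `FileManager.default.homeDirectoryForCurrentUser` as `home`. Taking the home directory as a parameter keeps the probe testable against a temporary directory.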
…codex/t-h-voice-reliability
@tpae quick batch nudge on the older important PRs: what is needed to get #1118, #1120, #1124, #1110, and #1058 into main? Current read: #1118/#1120/#1124/#1110 are green and mergeable, #1058 is draft but green/mergeable; #1134/#1135/#1136 are also green/mergeable, while #1133 is green/mergeable but has the non-fast-forward publication ambiguity from the local refresh attempt. Should these older PRs get review, rebase, scope changes, or supersession notes?
Business rationale
PocketTTS already supports multiple language packs upstream, but Osaurus exposed TTS as English-only and always loaded the default PocketTTS manager. That leaves users in #1002 unable to make spoken replies match non-English assistant output, with French specifically called out as a missing supported path. This PR makes the selected language an explicit user setting, shows that choice in the Voice settings UI, and treats the selected pack as part of model readiness so the observable fix is simple: choose French (french_24l) or another PocketTTS pack, download/load that pack, and synthesis uses the matching FluidAudio manager instead of silently staying on English.
Coding rationale
The implementation keeps language identity in an Osaurus-owned TTSLanguage enum instead of persisting FluidAudio symbol names directly, so future FluidAudio API changes do not become a settings migration trap. Auto-detecting language per message and routing per conversation were considered but deliberately left out because FluidAudio binds language at PocketTtsManager construction and does not provide automatic language detection; switching languages now invalidates the cached manager and reloads the correct pack. The trust boundary crossed here is local persisted TTS configuration plus FluidAudio's pinned package/download API: Osaurus stores only stable raw language IDs, maps them to FluidAudio's typed enum, and never accepts a user-supplied model path.
Acceptance criteria
french_24l).
Quality gate
git fetch --all --prune before the gate and again before push; branch remained based on current origin/main)
Debug evidence
Full-gate tail also ended cleanly:
Rollback
Run git revert ad7fc8ea; previously persisted language keys are ignored by the old decoder after revert, so TTS returns to the English-only default behavior without a data migration.
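The stable raw IDs from the coding rationale and the rollback claim can be illustrated together. This is a sketch under stated assumptions rather than the PR's actual types: `NewTTSConfig`, `OldTTSConfig`, the `voice` field, and the `english` raw value are all hypothetical, and only `french_24l` comes from the PR text. It relies on the fact that Swift's synthesized `Decodable` conformance ignores JSON keys the type does not declare.

```swift
import Foundation

// Osaurus-side language identity persisted by raw ID.
// Only "french_24l" appears in the PR text; "english" is a placeholder assumption.
enum TTSLanguage: String, Codable {
    case english = "english"
    case french = "french_24l"
}

// Hypothetical settings shape after this PR.
struct NewTTSConfig: Codable {
    var voice: String
    var language: TTSLanguage
}

// Hypothetical settings shape before this PR (and after a revert): no language field.
struct OldTTSConfig: Codable {
    var voice: String
}

// Encodes settings written by the new build, then decodes them the way the
// reverted build would. The extra "language" key is skipped without an error.
func roundTripThroughOldDecoder(_ config: NewTTSConfig) throws -> OldTTSConfig {
    let data = try JSONEncoder().encode(config)
    return try JSONDecoder().decode(OldTTSConfig.self, from: data)
}
```

Calling `try roundTripThroughOldDecoder(NewTTSConfig(voice: "default", language: .french))` decodes successfully and keeps `voice`, which is the behavior the rollback note depends on. A decoder with a hand-written `init(from:)` that rejects unknown keys would behave differently.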