[codex] Add PocketTTS language selection#1118

Open
mimeding wants to merge 9 commits into
osaurus-ai:mainfrom
mimeding:codex/t-h-voice-reliability
@mimeding
Contributor

Business rationale

PocketTTS already supports multiple language packs upstream, but Osaurus exposed TTS as English-only and always loaded the default PocketTTS manager. That leaves users in #1002 unable to make spoken replies match non-English assistant output, with French specifically called out as a missing supported path. This PR makes the selected language an explicit user setting, shows that choice in the Voice settings UI, and treats the selected pack as part of model readiness. The observable fix is simple: choose French (french_24l) or another PocketTTS pack, download and load that pack, and synthesis uses the matching FluidAudio manager instead of silently staying on English.

Coding rationale

The implementation keeps language identity in an Osaurus-owned TTSLanguage enum instead of persisting FluidAudio symbol names directly, so future FluidAudio API changes do not become a settings migration trap. Auto-detecting language per message and routing per conversation were considered but deliberately left out because FluidAudio binds language at PocketTtsManager construction and does not provide automatic language detection; switching languages now invalidates the cached manager and reloads the correct pack. The trust boundary crossed here is local persisted TTS configuration plus FluidAudio's pinned package/download API: Osaurus stores only stable raw language IDs, maps them to FluidAudio's typed enum, and never accepts a user-supplied model path.
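The caching behavior described above can be illustrated with a minimal sketch. It is not the PR's code: `StubManager` and `ManagerCache` are hypothetical stand-ins (no FluidAudio types are used), and `english` is an assumed raw ID, since only `french_24l` appears in this description.

```swift
import Foundation

/// Osaurus-owned language identity; raw values are the stable persisted IDs.
/// Assumption: "english" is illustrative, only "french_24l" is named in the PR text.
enum TTSLanguage: String, CaseIterable {
    case english = "english"
    case french = "french_24l"
}

/// Hypothetical stand-in for a manager that is bound to one language at construction.
final class StubManager {
    let packID: String
    init(packID: String) { self.packID = packID }
}

/// Keeps one cached manager and rebuilds it whenever the selected language changes.
final class ManagerCache {
    private var cached: (language: TTSLanguage, manager: StubManager)?
    private(set) var buildCount = 0

    func manager(for language: TTSLanguage) -> StubManager {
        // Reuse the cached manager only if it was built for the same language.
        if let cached, cached.language == language {
            return cached.manager
        }
        // Otherwise drop the old manager and build one for the new pack.
        buildCount += 1
        let manager = StubManager(packID: language.rawValue)
        cached = (language: language, manager: manager)
        return manager
    }
}
```

Because the manager is keyed on the Osaurus-owned enum rather than an upstream symbol, a change in the upstream API would only touch the mapping step, not the persisted setting.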

Acceptance criteria

  • Existing TTS configs without a language decode as English.
  • Unknown persisted language values fail closed to English instead of crashing settings load.
  • Voice settings expose a PocketTTS language picker including French (french_24l).
  • Model readiness, download checks, and synthesis use the selected language pack.
  • FluidAudio resolves to the 0.14.x language-bound PocketTTS API and release-builds successfully.
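The first two criteria can be sketched as a tolerant decoder. This is an illustration under assumptions, not the code in this PR: the `TTSConfiguration` shape, the `language` key name, and the `english` raw ID are hypothetical; only `french_24l` comes from the description.

```swift
import Foundation

/// Stable raw IDs are what gets persisted ("english" is an assumed ID).
enum TTSLanguage: String, Codable {
    case english = "english"
    case french = "french_24l"
}

/// Hypothetical, reduced configuration holding only the language field.
struct TTSConfiguration: Codable, Equatable {
    var language: TTSLanguage = .english

    enum CodingKeys: String, CodingKey { case language }

    init(language: TTSLanguage = .english) { self.language = language }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // A missing key (legacy config) or a non-string value both yield nil here.
        let raw = try? container.decodeIfPresent(String.self, forKey: .language)
        // Unknown raw IDs fail closed to English instead of throwing.
        language = raw.flatMap(TTSLanguage.init(rawValue:)) ?? .english
    }
}
```

Decoding the raw string first and mapping it afterwards is what keeps an unrecognized value from aborting the whole settings load; the synthesized `Codable` decoder for the enum would throw instead.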

Quality gate

  • swift-format / swiftlint / shellcheck — clean (all commands exited 0; SwiftLint reported existing warning backlog with 0 serious violations)
  • swift test --package-path Packages/OsaurusCLI --parallel — green
  • make ci-test — green
  • swift build --package-path Packages/OsaurusCore -c release — green
  • Archive freshness — confirmed (git fetch --all --prune before the gate and again before push; branch remained based on current origin/main)

Debug evidence

swift test --package-path Packages/OsaurusCore --filter TTSConfigurationTests
✔ Test languageCatalogIncludesPocketTTSLanguagePacks() passed after 0.001 seconds.
✔ Test decodeLegacyConfigDefaultsToEnglish() passed after 0.001 seconds.
✔ Test decodeUnknownLanguageFallsBackToEnglish() passed after 0.001 seconds.
✔ Test roundTripPersistsSelectedLanguageRawValue() passed after 0.001 seconds.
✔ Suite TTSConfigurationTests passed after 0.001 seconds.
✔ Test run with 4 tests in 1 suite passed after 0.001 seconds.

Full-gate tail also ended cleanly:

swift test --package-path Packages/OsaurusCLI --parallel
Build complete! (22.22s)
✔ Test run with 0 tests in 0 suites passed after 0.001 seconds.

make ci-test
Done. Inspect failures with: open build/Tests.xcresult

swift build --package-path Packages/OsaurusCore -c release
Build complete! (322.19s)

Rollback

Run git revert ad7fc8ea; previously persisted language keys are ignored by the old decoder after revert, so TTS returns to the English-only default behavior without a data migration.

@tpae tpae requested a review from RaajeevChandran May 17, 2026 02:32
.appendingPathComponent(".cache", isDirectory: true)
.appendingPathComponent("fluidaudio", isDirectory: true)
.appendingPathComponent("Models", isDirectory: true)
.appendingPathComponent("pocket-tts", isDirectory: true)
Contributor

FluidAudio's Repo.pocketTts.folderName is "pocket-tts-coreml" (see FluidAudio source). The downloader writes to ~/.cache/fluidaudio/Models/pocket-tts-coreml/v2/<lang>/, so this probe always returns false. As a result, even after a user downloads a pack and restarts (or in any state where modelState isn't already .ready), clicking the speaker icon on a message routes them to the settings window instead of lazy-loading the cached pack.

The fix is just to rename "pocket-tts" → "pocket-tts-coreml".
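For illustration, a probe matching the layout described in this comment might look like the sketch below. The function names and the injected `home` parameter are hypothetical, and the `pocket-tts-coreml/v2/<lang>` layout is taken from the comment above rather than verified against FluidAudio.

```swift
import Foundation

/// Builds the expected pack directory, assuming the layout
/// ~/.cache/fluidaudio/Models/pocket-tts-coreml/v2/<lang>/ from the review comment.
func pocketTTSPackDirectory(home: URL, languageID: String) -> URL {
    home
        .appendingPathComponent(".cache", isDirectory: true)
        .appendingPathComponent("fluidaudio", isDirectory: true)
        .appendingPathComponent("Models", isDirectory: true)
        .appendingPathComponent("pocket-tts-coreml", isDirectory: true)
        .appendingPathComponent("v2", isDirectory: true)
        .appendingPathComponent(languageID, isDirectory: true)
}

/// Returns true only if the pack path exists and is a directory.
func isPackCached(home: URL, languageID: String) -> Bool {
    var isDirectory: ObjCBool = false
    let path = pocketTTSPackDirectory(home: home, languageID: languageID).path
    let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
    return exists && isDirectory.boolValue
}
```

Passing `home` in as a parameter keeps the probe testable against a temporary directory; reading the folder name from FluidAudio itself, if it is publicly accessible, would avoid the hard-coded string drifting again.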

@mimeding
Contributor Author

@tpae quick batch nudge on the older important PRs: what is needed to get #1118, #1120, #1124, #1110, and #1058 into main? Current read: #1118/#1120/#1124/#1110 are green and mergeable, #1058 is draft but green/mergeable; #1134/#1135/#1136 are also green/mergeable, while #1133 is green/mergeable but has the non-fast-forward publication ambiguity from the local refresh attempt. Should these older PRs get review, rebase, scope changes, or supersession notes?


2 participants