[Feature]: Add cross-provider TTS voice management and voice cloning #503

## Problem or Motivation

OpenMAIC currently has a complete custom voice workflow for VoxCPM2. Users can create prompt voices, clone voices from reference audio, generate auto voices from an agent persona, preview voices, delete saved voices, and reuse those voices from the shared Agent Bar voice pool.

Other TTS providers now expose similar capabilities, but OpenMAIC treats most of them only as static voice lists:

- ElevenLabs supports voice listing, Instant Voice Cloning, Voice Design previews, designed voice creation, voice settings, and deletion.
- MiniMax supports voice cloning, account voice listing, and voice deletion.
- Qwen / Alibaba Bailian supports voice cloning and voice design through dedicated VC/VD model families, but OpenMAIC currently exposes only the regular TTS models.
- GLM supports `GLM-TTS-Clone`, which can create a custom voice ID from short reference audio.
- Doubao, OpenAI, and Azure have custom voice capabilities, but these come with additional account, consent, or training/deployment constraints.

This creates an inconsistent user experience: VoxCPM users get reusable managed voices, while users of other providers must either use built-in voices or manually manage provider-side voice IDs outside OpenMAIC.

## Proposed Solution

Add a cross-provider TTS voice management capability layer, then implement provider-specific adapters incrementally.

### 1. Add a provider voice capability abstraction

Introduce a small adapter interface for providers that support voice management:

```ts
// How a voice was obtained or created.
type VoiceKind = 'system' | 'library' | 'prompt' | 'clone' | 'designed';
type VoiceStatus = 'ready' | 'pending' | 'failed' | 'unsupported';

interface ProviderVoice {
  providerId: TTSProviderId;
  id: string;
  name: string;
  kind: VoiceKind;
  status?: VoiceStatus;
  // Set when the voice is bound to a specific TTS model; used for compatibility filtering.
  targetModelId?: string;
  language?: string;
  gender?: 'male' | 'female' | 'neutral';
  previewUrl?: string;
  description?: string;
  // Provider-specific extras that do not fit the shared shape.
  metadata?: Record<string, unknown>;
}

// Every method is optional: a provider adapter implements only the operations it supports.
interface TTSVoiceManagementAdapter {
  listVoices?(config: TTSProviderConfigInput): Promise<ProviderVoice[]>;
  createCloneVoice?(input: CloneVoiceInput): Promise<ProviderVoice>;
  createDesignedVoice?(input: DesignedVoiceInput): Promise<ProviderVoice | ProviderVoice[]>;
  deleteVoice?(voiceId: string, input: DeleteVoiceInput): Promise<void>;
  resolveVoiceForSynthesis?(voice: ProviderVoice): Promise<TTSSynthesisVoiceRef>;
}
```

The adapter should preserve the current VoxCPM2 behavior while allowing cloud providers to return provider-managed voices.
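
Because every adapter method is optional, the UI could derive capability flags from the adapter itself rather than branching on provider IDs. The following is a minimal sketch under that assumption. The `VoiceAdapter` stand-in simplifies the interface above, and `VoiceCapabilities` and `describeCapabilities` are hypothetical names, not existing OpenMAIC APIs.

```ts
// Simplified stand-ins for the proposed types; the real definitions would
// live in the OpenMAIC codebase and carry more fields.
interface ProviderVoice {
  providerId: string;
  id: string;
  name: string;
  kind: 'system' | 'library' | 'prompt' | 'clone' | 'designed';
}

interface VoiceAdapter {
  listVoices?(): Promise<ProviderVoice[]>;
  createCloneVoice?(input: { name: string; audio: Uint8Array }): Promise<ProviderVoice>;
  createDesignedVoice?(input: { prompt: string }): Promise<ProviderVoice>;
  deleteVoice?(voiceId: string): Promise<void>;
}

// Hypothetical flags a settings panel could read to decide which controls to render.
interface VoiceCapabilities {
  canList: boolean;
  canClone: boolean;
  canDesign: boolean;
  canDelete: boolean;
}

// A capability is present exactly when the adapter implements the matching method.
// A missing adapter (for example, a provider with no voice management) yields all false.
function describeCapabilities(adapter?: VoiceAdapter): VoiceCapabilities {
  return {
    canList: typeof adapter?.listVoices === 'function',
    canClone: typeof adapter?.createCloneVoice === 'function',
    canDesign: typeof adapter?.createDesignedVoice === 'function',
    canDelete: typeof adapter?.deleteVoice === 'function',
  };
}

// Example: an adapter that can only create clones (no list or delete yet).
const cloneOnlyAdapter: VoiceAdapter = {
  async createCloneVoice(input) {
    return { providerId: 'example', id: `clone-${input.name}`, name: input.name, kind: 'clone' };
  },
};

const caps = describeCapabilities(cloneOnlyAdapter);
// caps.canClone is true; canList, canDesign, and canDelete are false.
const noAdapterCaps = describeCapabilities(undefined);
// All flags are false when no adapter is registered.
```

Deriving flags from method presence keeps the capability declaration and the implementation in one place, at the cost of not expressing finer-grained limits such as eligibility gating.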

### 2. Build a shared voice pool UI

Generalize the current VoxCPM voice manager into a shared provider-aware voice pool:

- Show system, library, cloned, designed, prompt, and pending voices with clear labels.
- Support preview where the provider allows it.
- Support provider-side delete separately from local removal.
- Filter incompatible voices by provider/model when a voice is bound to a target model.
- Keep browser-local metadata only as a cache or UX helper for cloud voices; refresh from provider list APIs where possible.
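
The last two points can be sketched as two small pure functions: one that hides model-bound voices for incompatible models, and one that treats the provider's list as the source of truth while carrying over browser-local metadata. This is an illustrative sketch only. `PoolVoice`, `localLabel`, `filterCompatibleVoices`, `refreshFromProvider`, and the provider and model IDs are all hypothetical.

```ts
interface PoolVoice {
  providerId: string;
  id: string;
  name: string;
  targetModelId?: string; // set only for model-bound voices
  localLabel?: string; // browser-local UX metadata, never sent to the provider
}

// Keep voices for the active provider; hide model-bound voices whose target model differs.
function filterCompatibleVoices(voices: PoolVoice[], providerId: string, modelId: string): PoolVoice[] {
  return voices.filter(
    (v) => v.providerId === providerId && (v.targetModelId === undefined || v.targetModelId === modelId),
  );
}

// The provider list wins: cached entries only contribute their local label,
// and cached voices that no longer exist on the provider side are dropped.
function refreshFromProvider(remote: PoolVoice[], cached: PoolVoice[]): PoolVoice[] {
  const key = (v: PoolVoice) => `${v.providerId}:${v.id}`;
  const cachedByKey = new Map<string, PoolVoice>();
  for (const v of cached) cachedByKey.set(key(v), v);
  return remote.map((v) => {
    const label = cachedByKey.get(key(v))?.localLabel;
    return label === undefined ? { ...v } : { ...v, localLabel: label };
  });
}

const remote: PoolVoice[] = [
  { providerId: 'example-cloud', id: 'v1', name: 'Narrator' },
  { providerId: 'example-cloud', id: 'v2', name: 'Cloned A', targetModelId: 'model-a' },
  { providerId: 'example-cloud', id: 'v3', name: 'Cloned B', targetModelId: 'model-b' },
];
const cached: PoolVoice[] = [
  { providerId: 'example-cloud', id: 'v2', name: 'stale name', localLabel: 'Teacher voice' },
  { providerId: 'example-cloud', id: 'gone', name: 'Deleted remotely' },
];

const pool = refreshFromProvider(remote, cached);
const visible = filterCompatibleVoices(pool, 'example-cloud', 'model-a');
// visible holds v1 (unbound) and v2 (bound to model-a); v3 is hidden and 'gone' was dropped.
```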

### 3. Implement providers in priority order

P0:

- **ElevenLabs**: list/get voices, create Instant Voice Clone, optionally support Voice Design previews, create designed voices, delete voices.
- **MiniMax**: upload reference audio, clone voice, list account voices, delete cloned/generated voices.

P1:

- **Qwen / Alibaba Bailian**: add `qwen3-tts-vc-*` and `qwen3-tts-vd-*` model families, enforce `target_model` compatibility, then support clone/design voice creation and listing.
- **GLM**: support creating a clone voice and saving the returned voice ID; postpone full cloud sync until list/delete APIs are confirmed.
- **Doubao**: support only after the current recommended voice clone / TTS V3 account flow is confirmed.

P2:

- **OpenAI custom voices**: gate behind eligibility and implement the required consent-aware workflow.
- **Azure custom voice**: treat as an advanced provider-specific training/deployment workflow, not a simple one-click clone flow.

Browser native TTS and generic custom OpenAI-compatible TTS should remain unsupported for voice cloning unless a provider-specific adapter is configured.
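
One way to express this rollout in code is a lookup that enables cloning only for providers with a planned adapter whose phase has shipped. The sketch below is an assumption about how such gating might look. The provider ID strings are placeholders and may not match OpenMAIC's actual `TTSProviderId` values, and `isCloningEnabled` is a hypothetical helper.

```ts
type RolloutPhase = 'P0' | 'P1' | 'P2';

const phaseOrder: RolloutPhase[] = ['P0', 'P1', 'P2'];

// Placeholder provider IDs mapped to the phase in which their adapter is planned.
const cloningRollout = new Map<string, RolloutPhase>([
  ['elevenlabs', 'P0'],
  ['minimax', 'P0'],
  ['qwen', 'P1'],
  ['glm', 'P1'],
  ['doubao', 'P1'],
  ['openai', 'P2'],
  ['azure', 'P2'],
]);

// Cloning is enabled only when the provider has a planned adapter and that
// adapter's phase is at or before the latest phase that has shipped.
function isCloningEnabled(providerId: string, shippedThrough: RolloutPhase): boolean {
  const phase = cloningRollout.get(providerId);
  if (phase === undefined) {
    // Providers without an adapter (e.g. browser-native TTS) stay unsupported.
    return false;
  }
  return phaseOrder.indexOf(phase) <= phaseOrder.indexOf(shippedThrough);
}

const elevenLabsAtP0 = isCloningEnabled('elevenlabs', 'P0'); // true
const qwenAtP0 = isCloningEnabled('qwen', 'P0'); // false, Qwen is planned for P1
const qwenAtP1 = isCloningEnabled('qwen', 'P1'); // true
const browserNative = isCloningEnabled('browser-native', 'P2'); // false, no adapter
```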

## Acceptance Criteria

- Existing VoxCPM2 prompt voice, clone voice, auto voice, preview, delete, and Agent Bar voice selection behavior continues to work.
- TTS providers can declare voice management capabilities without provider-specific UI branching scattered through settings components.
- ElevenLabs and MiniMax voice management can be implemented through the new abstraction without changing the TTS synthesis API shape for other providers.
- Provider-managed voices and local voices have a common `ProviderVoice` shape and can appear in the shared voice picker.
- The UI distinguishes provider-side deletion from local metadata removal.
- Model-bound voices are not shown for incompatible TTS models.
- Unit tests cover voice resolution, capability gating, and at least one cloud adapter with mocked API responses.
- Documentation is updated with provider capability status and official references.
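
For the mocked-response test criterion, one testable seam is the pure function that maps a provider's list response onto `ProviderVoice`. The sketch below uses an invented response shape for an imaginary provider; it does not describe any real provider's API, and `ExampleVoiceListResponse` and `toProviderVoices` are hypothetical names. The HTTP call is deliberately left out so the mapping can be tested without network access.

```ts
interface ProviderVoice {
  providerId: string;
  id: string;
  name: string;
  kind: 'library' | 'clone' | 'designed';
}

// Invented response shape for illustration only.
interface ExampleVoiceListResponse {
  voices: Array<{ voice_id: string; display_name?: string; origin: string }>;
}

// Map the raw payload onto the shared shape. Unknown origins fall back to
// 'library', and a missing display name falls back to the voice ID.
function toProviderVoices(providerId: string, res: ExampleVoiceListResponse): ProviderVoice[] {
  return res.voices.map((raw): ProviderVoice => ({
    providerId,
    id: raw.voice_id,
    name: raw.display_name ?? raw.voice_id,
    kind: raw.origin === 'cloned' ? 'clone' : raw.origin === 'designed' ? 'designed' : 'library',
  }));
}

// Mocked API response, as a unit test might supply it.
const mockResponse: ExampleVoiceListResponse = {
  voices: [
    { voice_id: 'abc', display_name: 'Warm Narrator', origin: 'premade' },
    { voice_id: 'def', origin: 'cloned' },
    { voice_id: 'ghi', display_name: 'Robot', origin: 'designed' },
  ],
};

const mapped = toProviderVoices('example-cloud', mockResponse);
// mapped kinds are 'library', 'clone', 'designed'; the second voice is named 'def'.
```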

## Alternatives Considered

- Keep provider-specific voice managers under each TTS settings panel. This is faster for one provider, but it duplicates the VoxCPM UI and makes the Agent Bar voice pool harder to keep consistent.
- Keep accepting manual voice IDs. This is useful as a fallback, but it does not solve clone creation, account voice listing, preview, or deletion.
- Only support VoxCPM2 custom voices. This leaves strong existing provider capabilities unused, especially ElevenLabs and MiniMax.

## Additional Context

Research document:

- `docs/tts-voice-capabilities-research.md`

High-priority provider references:

- ElevenLabs voices: https://elevenlabs.io/docs/capabilities/voices
- ElevenLabs create voice: https://elevenlabs.io/docs/api-reference/add-voice
- ElevenLabs voice design: https://elevenlabs.io/docs/api-reference/ttv-create-previews
- MiniMax voice clone: https://platform.minimaxi.com/docs/api-reference/voice-cloning-clone
- MiniMax voice management: https://platform.minimaxi.com/docs/api-reference/voice-management-get
- Qwen voice clone: https://help.aliyun.com/zh/model-studio/qwen-tts-voice-replica
- Qwen voice design: https://help.aliyun.com/zh/model-studio/qwen-tts-voice-design
- GLM-TTS-Clone: https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-tts-clone
