Build a reliable language tutor loop:
- Tutor speaks.
- Learner holds to talk.
- Learner releases.
- Agent commits turn and replies.
If this loop is stable, everything else is optional.
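The loop above can be sketched as a small state machine. This is only an illustration: the state and event names (`tutor_speaking`, `learner_holding`, `committing`, `ptt_press`, `reply_started`) are hypothetical, and only `ptt_release` comes from this document. The sketch also assumes that events arriving out of order are ignored rather than treated as errors.

```typescript
// Hypothetical states for the tutor loop; only "ptt_release" is named in this document.
type LoopState = "tutor_speaking" | "learner_holding" | "committing";
type LoopEvent = "ptt_press" | "ptt_release" | "reply_started";

// Allowed transitions. Anything not listed leaves the state unchanged.
const transitions: Record<LoopState, Partial<Record<LoopEvent, LoopState>>> = {
  tutor_speaking: { ptt_press: "learner_holding" },
  learner_holding: { ptt_release: "committing" },
  committing: { reply_started: "tutor_speaking" },
};

// Advance the loop by one event; out-of-order events are ignored (an assumption).
function step(state: LoopState, event: LoopEvent): LoopState {
  return transitions[state][event] ?? state;
}
```

Keeping the table this small makes it easy to see that every release has exactly one path back to a tutor turn.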
- One LiveKit room per session.
- One learner identity per session (`learner-{userId}-{sessionId}`).
- One worker dispatch per room (`decipher-agent`).
- One STT provider (Deepgram, language-locked to target language).
- One TTS provider (Deepgram, language-specific voice model).
- One LLM provider (Claude via OpenAI-compatible client).
- One explicit turn boundary (`ptt_release` -> `commitUserTurn()`).
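Two of these invariants are easy to show in code: the per-session learner identity and the single explicit turn boundary. The sketch below is a minimal illustration, not the project's actual worker code. `TurnCommitter` is a hypothetical local interface that only mirrors the `commitUserTurn()` name used above, and `handleClientMessage` is a made-up helper name.

```typescript
// Hypothetical stand-in for whatever session object exposes commitUserTurn().
interface TurnCommitter {
  commitUserTurn(): void;
}

// Build the identity in the learner-{userId}-{sessionId} format.
// A fresh sessionId per session keeps identities from colliding across restarts.
function learnerIdentity(userId: string, sessionId: string): string {
  return `learner-${userId}-${sessionId}`;
}

// The only turn boundary: a "ptt_release" message commits the user turn.
// Returns true if a turn was committed, false for any other message type.
function handleClientMessage(type: string, session: TurnCommitter): boolean {
  if (type !== "ptt_release") {
    return false;
  }
  session.commitUserTurn();
  return true;
}
```

Routing every commit through one function means there is a single place to check when a release fails to produce a reply.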
- Removed auto-reconnect behavior in the web client.
- Removed mixed turn-finalization dependency on VAD only.
- Removed sticky session identity collisions across restarts.
- Kept session disconnect behavior strict to avoid zombie worker jobs.
- Client -> Worker:
  - `ptt_release`
- Worker -> Client:
  - `user_transcribed`
  - `user_utterance`
  - `agent_utterance`
  - `agent_error`
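A typed view of these messages can help keep the client and worker in agreement. The sketch below assumes each message is a JSON object with a `type` field and an optional `text` field. That wire format is an assumption for illustration, since this document lists only the message names (`ptt_release`, `user_transcribed`, `user_utterance`, `agent_utterance`, `agent_error`).

```typescript
// Message names from this document; the JSON shape around them is assumed.
const WORKER_EVENTS = [
  "user_transcribed",
  "user_utterance",
  "agent_utterance",
  "agent_error",
] as const;

type WorkerEventType = (typeof WORKER_EVENTS)[number];

interface ClientMessage {
  type: "ptt_release";
}

interface WorkerMessage {
  type: WorkerEventType;
  text?: string; // hypothetical payload field
}

// Parse a raw JSON string from the worker; return null for anything unrecognized.
function parseWorkerMessage(raw: string): WorkerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (_err) {
    return null;
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const record = data as { type?: unknown; text?: unknown };
  const known = WORKER_EVENTS as readonly string[];
  if (typeof record.type !== "string" || known.indexOf(record.type) === -1) {
    return null;
  }
  const message: WorkerMessage = { type: record.type as WorkerEventType };
  if (typeof record.text === "string") {
    message.text = record.text;
  }
  return message;
}
```

Rejecting unknown types at the boundary keeps the client from silently reacting to messages outside this contract.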
- Do not add new providers until loop reliability is proven.
- Do not add auth complexity to the voice path.
- Do not add reconnect logic before core loop is stable in production.
- Every new feature must preserve: hold-to-talk release always produces a tutor turn.
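The last rule can be checked mechanically against a recorded log of message types. The sketch below is one possible check, assuming that an `agent_utterance` message is what counts as a tutor turn and that the log is a flat, ordered list of type strings. The helper name `everyReleaseGetsReply` is hypothetical.

```typescript
// Returns true when each "ptt_release" in the log is followed by an
// "agent_utterance" before the next "ptt_release" (or the end of the log).
function everyReleaseGetsReply(log: readonly string[]): boolean {
  let pending = false; // true while a release is still waiting for a tutor turn
  for (const type of log) {
    if (type === "ptt_release") {
      if (pending) {
        return false; // the previous release never got a reply
      }
      pending = true;
    } else if (type === "agent_utterance") {
      pending = false;
    }
  }
  return !pending;
}
```

A check like this could run in a regression test over session logs whenever a new feature touches the voice path.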
- Confirm tutor always starts on new session.
- Confirm `Heard:` appears after learner turn.
- Confirm tutor replies within one turn cycle after release.
- Confirm refresh creates a clean new session without stale room state.