feat: bilingual practice agent with helping phrases #4

Merged
darthmolen merged 13 commits into main from feature/agent_practice_with_site_language on Jan 29, 2026

@darthmolen

Summary

  • Implement bilingual practice agent that gives instructions in the user's chosen site language while practicing in English
  • Add helping phrases feature with database-backed phrases displayed as chat bubbles
  • Fix various bugs in the Realtime API integration

Changes

Features

  • Bilingual practice agent with instruction language support (Spanish/English)
  • Helping phrases displayed as scrollable chat bubbles in both desktop and mobile views
  • End Session button to disconnect realtime WebSocket sessions
  • Console logging for debugging agent conversations

Bug Fixes

  • Handle content filter errors from Azure OpenAI Realtime API with retry logic
  • Handle null status_details in response.done events
  • Check correct code field for empty audio buffer detection
  • Add auto-scroll to mobile chat overlay for new messages
  • Move /helping-phrases route before /{lesson_number} to fix routing

Technical

  • Add detailed realtime event logging for debugging
  • Add system prompt logging to realtime router

Test plan

  • Test on mobile: verify chat auto-scrolls when new messages arrive
  • Test on desktop: verify helping phrases appear as a chat bubble
  • Test content filter handling: agent should retry on filter errors
  • Test End Session button: should disconnect and close overlay

🤖 Generated with Claude Code

EnglishConnect Dev and others added 12 commits January 28, 2026 11:19
Add support for the practice agent to use the site language for instructions
and explanations while practicing in English.

Backend:
- Create helping_phrases database table with migration
- Add HelpingPhraseService with fuzzy matching for STT errors
- Update UnifiedTeachingAgent with bilingual formatting methods
- Update mode_practice.md prompt with two-phase flow and help recovery
- Add /api/lessons/helping-phrases endpoint
- Update conversation router to load and inject phrases

Frontend:
- Add HelpingPhrase type
- Create HelpingPhrasesPanel component
- Integrate into PracticeView with language-aware fetching
- Add i18n translations for help phrases UI

Testing:
- Add baseline evaluation script
- Add final evaluation with comparison
- Add 25 unit tests for bilingual features

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
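
The fuzzy matching that this commit adds to HelpingPhraseService is not shown in the conversation. As a rough sketch of the idea only, assuming normalized text compared against a similarity threshold (the function names, the 0.8 threshold, and the use of `difflib` are illustrative assumptions, not the service's actual code):

```python
import difflib
import string
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def match_phrase(heard: str, phrases: Sequence[str],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the known phrase closest to the transcript, or None if none is close enough."""
    target = normalize(heard)
    best, best_score = None, 0.0
    for phrase in phrases:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = difflib.SequenceMatcher(None, target, normalize(phrase)).ratio()
        if score > best_score:
            best, best_score = phrase, score
    return best if best_score >= threshold else None
```

Normalizing before comparing means a transcript that drops punctuation or casing still scores as an exact match, while the threshold absorbs small speech-to-text slips.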
…console logging

- Update mode_practice.md with explicit Language Rules section
- Clarify that ALL support (encouragement, corrections, clarifications) must be
  in instruction_language, not just intro and help recovery
- Add concrete examples for Spanish/English language switching
- Add conversation logging to console for debugging (shows user input, agent
  response with language flags, and tool calls)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add print-based conversation logging to realtime WebSocket handler
- Accumulate assistant transcript deltas and log on response_done
- Load helping phrases for instruction_language in realtime mode
- Now both REST and realtime paths have bilingual support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
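
The delta accumulation described in this commit could look roughly like the sketch below. The class name is hypothetical, and the event shape (a `delta` string on `response.audio_transcript.delta`, flushed on `response.done`) is an assumption based on the event names that appear later in this review, not the handler's actual code:

```python
from typing import Any, Dict, List, Optional


class TranscriptAccumulator:
    """Collects assistant transcript deltas and returns the full text when the response ends."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def handle(self, event: Dict[str, Any]) -> Optional[str]:
        event_type = event.get("type")
        if event_type == "response.audio_transcript.delta":
            # Buffer the fragment; nothing to log yet.
            self._parts.append(event.get("delta", ""))
            return None
        if event_type == "response.done":
            # Join the buffered fragments and reset for the next response.
            text = "".join(self._parts)
            self._parts.clear()
            return text
        return None
```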
FastAPI routes are matched in order. The dynamic /{lesson_number} route
was capturing /helping-phrases requests, trying to parse "helping-phrases"
as an integer. Moved static route before dynamic to fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
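
The ordering problem in this commit can be reproduced without FastAPI. The sketch below is a toy first-match router written for illustration (the `dispatch` helper and the handler names are hypothetical stand-ins, not the project's code); it shows why a dynamic segment registered first shadows a static path:

```python
import re
from typing import Callable, List, Tuple

Route = Tuple[re.Pattern, Callable[..., str]]


def get_lesson(lesson_number: str) -> str:
    # Mirrors an integer path parameter: non-numeric input is rejected.
    if not lesson_number.isdigit():
        return "422: lesson_number is not an integer"
    return f"lesson {int(lesson_number)}"


def get_helping_phrases() -> str:
    return "helping phrases"


def dispatch(routes: List[Route], path: str) -> str:
    """Call the handler of the first route whose pattern matches the whole path."""
    for pattern, handler in routes:
        match = pattern.fullmatch(path)
        if match:
            return handler(**match.groupdict())
    return "404"


dynamic = (re.compile(r"/(?P<lesson_number>[^/]+)"), get_lesson)
static = (re.compile(r"/helping-phrases"), get_helping_phrases)

broken_order = [dynamic, static]  # the dynamic route shadows the static one
fixed_order = [static, dynamic]   # the static route is tried first
```

With `broken_order`, a request for `/helping-phrases` reaches `get_lesson` and fails validation; with `fixed_order`, it reaches `get_helping_phrases`, and numeric paths such as `/3` still resolve to the lesson handler.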
- Add disconnect() function to useConversation hook to properly cleanup
  WebSocket, audio context, media stream, and reset state
- Replace X button with "End Session" in MobileChatOverlay
- Replace X button with "End Session" in ConversationDrawer
- Call disconnect() when closing chat on both mobile and desktop
- Prevents 30-minute idle sessions when user closes chat overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prints the full system prompt at session start to help diagnose
agent behavior issues like:
- Cut off introductions
- Wrong turn order
- Off-pattern responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Logs all Realtime API events (except high-frequency audio deltas):
- session.created, session.updated
- response.done with status
- response.cancelled (interruption detection)
- VAD speech_started/stopped
- Client commit/cancel/text messages

This helps diagnose truncated responses and unexpected interruptions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Detect incomplete responses due to content_filter and auto-retry
- Send content_filtered event to client while waiting for retry
- Handle empty audio buffer error gracefully (PTT without speech)
- Add response_incomplete event for non-filter incomplete responses
- Frontend handles new event types appropriately

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create HelpingPhrasesBubble component that appears at top of chat
- Add helpingPhrases to conversation store with fetch on language change
- Pass helpingPhrases to ConversationView (desktop) and MobileChatOverlay
- Remove HelpingPhrasesPanel from PracticeView main page
- Bubble scrolls with conversation instead of staying fixed

Helps users see help phrases in context during practice.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
status_details can be null when response completes normally,
causing AttributeError when calling .get() on NoneType.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
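
A minimal sketch of the null-safe lookup this commit describes (the helper name is hypothetical). Note that `dict.get("status_details", {})` is not enough on its own, because the default is only used when the key is absent, not when its value is `None`:

```python
from typing import Any, Dict, Optional


def incomplete_reason(response: Dict[str, Any]) -> Optional[str]:
    """Return status_details["reason"], tolerating a missing or null status_details."""
    # "or {}" covers both a missing key and an explicit None value.
    details = response.get("status_details") or {}
    return details.get("reason")
```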
The error structure has type="invalid_request_error" and
code="input_audio_buffer_commit_empty". Was checking wrong field.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
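
As an illustration of the corrected check, the sketch below compares the `code` field rather than `type`. The helper name is hypothetical, and nesting the details under an `error` key is an assumption about the event shape:

```python
from typing import Any, Dict

EMPTY_BUFFER_CODE = "input_audio_buffer_commit_empty"


def is_empty_audio_commit(event: Dict[str, Any]) -> bool:
    """True when an error event reports a commit on an empty input audio buffer."""
    error = event.get("error") or {}
    # "type" is the broad category ("invalid_request_error"); "code" is the specific cause.
    return error.get("code") == EMPTY_BUFFER_CODE
```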
Add useRef and useEffect to auto-scroll to the bottom when new messages
arrive, matching the behavior of the desktop ConversationView component.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings January 29, 2026 02:19

Copilot AI left a comment

Pull request overview

Adds bilingual “practice” behavior to the unified teaching agent (instructions in the user’s site language while practicing in English), introduces database-backed “helping phrases” surfaced in the chat UI, and improves Realtime API session handling (disconnect + additional event/error handling).

Changes:

  • Add “helping phrases” end-to-end: DB model + migration/seed, service + API route, frontend types/store/API fetch, and chat bubble UI (desktop + mobile).
  • Extend UnifiedTeachingAgent to include helping phrases + bilingual pattern/intro formatting in the practice prompt.
  • Enhance Realtime integration: new client event handling (content filter/incomplete/empty audio), explicit disconnect, and additional backend event processing/logging.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/unit/test_unified_teaching_agent_bilingual.py | Adds unit tests around bilingual formatting and helping-phrases prompt inclusion. |
| tests/unit/test_helping_phrase_service.py | Adds unit tests for phrase retrieval and fuzzy matching/normalization behavior. |
| tests/unit/__init__.py | Marks unit test package. |
| tests/harness/agent_harness.py | Loads helping phrases + supports instruction_language in the agent test harness. |
| tests/evaluation/run_final_evaluation.py | Adds a script to run a “final” evaluation and optionally compare vs baseline. |
| tests/evaluation/run_baseline.py | Adds a script to capture baseline evaluation results for later comparison. |
| tests/evaluation/final_results.json | Stores captured “final” evaluation output. |
| tests/evaluation/comparison_report.md | Stores a generated baseline vs final comparison report. |
| tests/evaluation/baseline_results.json | Stores captured baseline evaluation output. |
| src/frontend/src/types/index.ts | Adds HelpingPhrase type to match backend schema. |
| src/frontend/src/stores/conversationStore.ts | Adds store state + setter for helpingPhrases. |
| src/frontend/src/services/api.ts | Adds fetchHelpingPhrases() client API method. |
| src/frontend/src/locales/es.json | Adds translations for End Session and helping phrases UI text (ES). |
| src/frontend/src/locales/en.json | Adds translations for End Session and helping phrases UI text (EN). |
| src/frontend/src/hooks/useConversation.ts | Adds handling for new realtime events and exposes a disconnect() cleanup API. |
| src/frontend/src/components/mobile/MobileChatOverlay.tsx | Adds helping-phrases bubble + auto-scroll + “End Session” UX on mobile overlay. |
| src/frontend/src/components/content/HelpingPhrasesPanel.tsx | Adds a panel component to display helping phrases in content UI. |
| src/frontend/src/components/HelpingPhrasesBubble.tsx | Adds a chat-bubble component that renders helping phrases in the transcript. |
| src/frontend/src/components/ConversationView.tsx | Displays helping-phrases bubble and adjusts empty-state logic accordingly. |
| src/frontend/src/components/ConversationDrawer.tsx | Adds “End Session” action and passes helping phrases into the conversation view. |
| src/frontend/src/MobileApp.tsx | Fetches helping phrases on language change; calls disconnect() when closing chat. |
| src/frontend/src/App.tsx | Fetches helping phrases on language change; calls disconnect() when ending session. |
| src/backend/app/services/realtime_session.py | Adds realtime event logging + content-filter retry + empty-audio handling. |
| src/backend/app/services/helping_phrase_service.py | Implements DB retrieval + fuzzy match for help phrases. |
| src/backend/app/schemas/helping_phrase.py | Adds HelpingPhraseSchema for API/service/agent use. |
| src/backend/app/routers/realtime.py | Loads helping phrases for realtime practice; adds extensive session/prompt/transcript logging. |
| src/backend/app/routers/lessons.py | Adds /api/lessons/helping-phrases route and ensures routing order. |
| src/backend/app/routers/conversation.py | Loads helping phrases for practice; adds detailed conversation logging to stdout. |
| src/backend/app/prompts/agent/mode_practice.md | Expands practice prompt with bilingual rules + helping-phrases intro + help recovery protocol. |
| src/backend/app/models/content.py | Adds HelpingPhrase ORM model. |
| src/backend/app/agents/unified_teaching_agent.py | Adds helping-phrases support and new formatting helpers for the practice prompt. |
| src/backend/alembic/versions/005_add_helping_phrases.py | Adds migration for helping phrases table + seeds ES/EN phrase rows. |
Comments suppressed due to low confidence (1)

tests/unit/test_unified_teaching_agent_bilingual.py:347

  • This test builds prompt_en but never asserts anything about it, so it doesn’t actually verify the English-path behavior. Add assertions that (a) prompt_es and prompt_en differ, and (b) prompt_en contains English instruction-language markers and does not contain Spanish markers (and/or vice versa).

Comment on lines +256 to +260
# Log ALL events for debugging (except high-frequency audio deltas)
if event_type not in ["response.audio.delta", "response.audio_transcript.delta"]:
    print(f"[REALTIME EVENT] {event_type}")
    if event_type in ["error", "response.done", "session.created", "session.updated"]:
        print(f" -> {json.dumps(event, indent=2)[:500]}")
Copilot AI Jan 29, 2026

These debug prints log every Realtime event (and sometimes the full event payload). This will produce very high-volume stdout in production and may leak sensitive content (prompts, transcripts, tool args). Prefer structured logging via the module logger at DEBUG level and gate it behind a config flag (e.g., settings.debug_realtime_logging).

Copilot uses AI. Check for mistakes.
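
One way to act on the comment above is sketched below: route event logs through the module logger at DEBUG level and gate them behind a flag. The function name and the `debug_enabled` parameter are hypothetical; the flag stands in for a setting such as the `settings.debug_realtime_logging` example the comment mentions, and this is not the code that was eventually merged:

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("realtime_session")

# High-frequency events that are never logged.
NOISY_EVENTS = {"response.audio.delta", "response.audio_transcript.delta"}
# Events whose payload gets a truncated dump.
DETAILED_EVENTS = {"error", "response.done", "session.created", "session.updated"}


def log_realtime_event(event: Dict[str, Any], debug_enabled: bool) -> bool:
    """Log an event at DEBUG level when the flag is on; return True if it was logged."""
    event_type = event.get("type", "")
    if not debug_enabled or event_type in NOISY_EVENTS:
        return False
    if not logger.isEnabledFor(logging.DEBUG):
        return False
    logger.debug("realtime event: %s", event_type)
    if event_type in DETAILED_EVENTS:
        # Cap the payload so a large event cannot flood the log.
        logger.debug("payload: %s", json.dumps(event)[:500])
    return True
```

Checking both the flag and the logger level keeps production output quiet unless both are deliberately switched on.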
Comment on lines +316 to +329
if status == "incomplete":
    if reason == "content_filter":
        print("[REALTIME] WARNING: Response blocked by content filter!")
        # Notify client about the filter
        yield {
            "type": "content_filtered",
            "message": "Response was filtered. Retrying..."
        }
        # Retry by requesting a new response
        if self.ws:
            print("[REALTIME] Retrying response after content filter...")
            await self.ws.send(json.dumps({"type": "response.create"}))
        # Don't yield response_done - wait for retry
        continue
Copilot AI Jan 29, 2026

The content-filter retry path can loop indefinitely if the model keeps getting filtered (it immediately sends another response.create with no retry limit/backoff). Add a capped retry counter (per response / per session) and a backoff; after the limit, surface an error to the client so the UI can recover.
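
A capped retry counter with backoff, as the comment above recommends, might be sketched as follows. The class name and the 0.5-second base delay are assumptions for illustration; the limit of 3 matches the follow-up commit in this PR, but this is not the merged implementation:

```python
from typing import Optional

MAX_FILTER_RETRIES = 3
BASE_DELAY_SECONDS = 0.5


class FilterRetryPolicy:
    """Tracks content-filter retries for one response and hands out backoff delays."""

    def __init__(self, max_retries: int = MAX_FILTER_RETRIES,
                 base_delay: float = BASE_DELAY_SECONDS) -> None:
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.attempts = 0

    def next_delay(self) -> Optional[float]:
        """Return the seconds to wait before the next retry, or None once the budget is spent."""
        if self.attempts >= self.max_retries:
            return None
        # Exponential backoff: base, 2x base, 4x base, ...
        delay = self.base_delay * (2 ** self.attempts)
        self.attempts += 1
        return delay

    def reset(self) -> None:
        """Call after a response completes normally."""
        self.attempts = 0
```

In the receive loop, a `None` result would be the point to stop retrying and surface an error event to the client; otherwise the loop would sleep for the returned delay before sending another `response.create`.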

Comment on lines +176 to +185
# Log system prompt for debugging
print("\n" + "=" * 80)
print("REALTIME SESSION STARTING")
print("=" * 80)
print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
print("-" * 80)
print("SYSTEM PROMPT:")
print("-" * 80)
print(system_prompt)
print("=" * 80 + "\n")
Copilot AI Jan 29, 2026

This prints the full system prompt (and other conversation details) to stdout for every realtime session. System prompts often contain sensitive instructions and can include user/context data, so this is a security/privacy risk and noisy operationally. Use structured logger calls and only emit this behind an explicit debug flag; avoid logging the full prompt by default (log a hash or truncated summary instead).

Suggested change
- # Log system prompt for debugging
- print("\n" + "=" * 80)
- print("REALTIME SESSION STARTING")
- print("=" * 80)
- print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
- print("-" * 80)
- print("SYSTEM PROMPT:")
- print("-" * 80)
- print(system_prompt)
- print("=" * 80 + "\n")
+ # Safely log session start with a truncated system prompt preview for debugging
+ max_preview_len = 200
+ preview = system_prompt[:max_preview_len]
+ if len(system_prompt) > max_preview_len:
+     preview += "... [truncated]"
+ logger.debug(
+     "Realtime session starting",
+     extra={
+         "lesson_number": lesson_number,
+         "mode": mode,
+         "instruction_language": instruction_language,
+         "system_prompt_preview": preview,
+     },
+ )
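
The comment also mentions logging a hash instead of the prompt text. A minimal sketch of that alternative, using a hypothetical helper name and an arbitrary 12-character prefix length:

```python
import hashlib


def prompt_fingerprint(system_prompt: str, length: int = 12) -> str:
    """Return a short, stable SHA-256 prefix that identifies a prompt without revealing it."""
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return digest[:length]
```

Two sessions that log the same fingerprint were started with the same prompt, which is often enough to correlate behavior reports without writing the prompt itself to the logs.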

Comment on lines +212 to +233
print("\n" + "=" * 60)
print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
print("=" * 60)
print(f"USER: {request.message if request.message else '(session start)'}")
print("-" * 60)

# Show all speak() calls with their languages
if audio_chunks:
    for chunk in audio_chunks:
        lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
        print(f"AGENT [{lang_label}]: {chunk.text}")
else:
    print(f"AGENT: {response_text}")

# Show tool calls summary
tool_calls = agent_result.get("tool_results", [])
if tool_calls:
    tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
    if tools_used:
        print(f"TOOLS: {', '.join(tools_used)}")
print("=" * 60 + "\n")

Copilot AI Jan 29, 2026

The router currently prints full user messages, agent outputs, and tool usage to stdout on every request. This can leak PII and makes logs noisy/unstructured. Replace with logger.debug/info and guard detailed transcript/tool logging behind an explicit debug setting; consider redacting user content in non-debug environments.

Suggested change
- print("\n" + "=" * 60)
- print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
- print("=" * 60)
- print(f"USER: {request.message if request.message else '(session start)'}")
- print("-" * 60)
- # Show all speak() calls with their languages
- if audio_chunks:
-     for chunk in audio_chunks:
-         lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
-         print(f"AGENT [{lang_label}]: {chunk.text}")
- else:
-     print(f"AGENT: {response_text}")
- # Show tool calls summary
- tool_calls = agent_result.get("tool_results", [])
- if tool_calls:
-     tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
-     if tools_used:
-         print(f"TOOLS: {', '.join(tools_used)}")
- print("=" * 60 + "\n")
+ settings = get_settings()
+ debug_conversation_logging = getattr(settings, "debug_conversation_logging", False)
+ if debug_conversation_logging and logger.isEnabledFor(logging.DEBUG):
+     convo_lines = []
+     convo_lines.append("=" * 60)
+     convo_lines.append(
+         f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}"
+     )
+     convo_lines.append("=" * 60)
+     user_text = request.message if request.message else "(session start)"
+     convo_lines.append(f"USER: {user_text}")
+     convo_lines.append("-" * 60)
+     # Show all speak() calls with their languages
+     if audio_chunks:
+         for chunk in audio_chunks:
+             lang_label = "ES" if chunk.language == "es" else "EN"
+             convo_lines.append(f"AGENT [{lang_label}]: {chunk.text}")
+     else:
+         convo_lines.append(f"AGENT: {response_text}")
+     # Show tool calls summary (excluding speak)
+     tool_calls = agent_result.get("tool_results", [])
+     if tool_calls:
+         tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
+         if tools_used:
+             convo_lines.append(f"TOOLS: {', '.join(tools_used)}")
+     convo_lines.append("=" * 60)
+     logger.debug("\n" + "\n".join(convo_lines))

"""Add helping phrases table for bilingual practice support.

Revision ID: 005_add_helping_phrases
Revises: 004_add_jwt_auth
Copilot AI Jan 29, 2026

Migration header comment says "Revises: 004_add_jwt_auth" but down_revision points to "ef47914574a5". Update the docstring metadata so it matches the actual Alembic dependency chain to avoid confusion during maintenance/review.

Suggested change
- Revises: 004_add_jwt_auth
+ Revises: ef47914574a5

@@ -0,0 +1,346 @@
"""Unit tests for bilingual features in UnifiedTeachingAgent."""

import pytest
Copilot AI Jan 29, 2026

Import of 'pytest' is not used.

"""Unit tests for bilingual features in UnifiedTeachingAgent."""

import pytest
from unittest.mock import MagicMock, patch
Copilot AI Jan 29, 2026

Import of 'patch' is not used.

- Replace verbose print statements with logger.debug in realtime_session.py,
  realtime.py, and conversation.py
- Add retry limit (max 3) for content filter retries to prevent infinite loops
- Truncate system prompt preview in logs for security/privacy
- Fix migration docstring to match actual down_revision
- Remove unused pytest and patch imports from test file

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@darthmolen darthmolen merged commit 3f24c8f into main Jan 29, 2026
3 checks passed
@darthmolen darthmolen deleted the feature/agent_practice_with_site_language branch January 29, 2026 02:35