feat: bilingual practice agent with helping phrases #4
Add support for the practice agent to use the site language for instructions and explanations while practicing in English.

Backend:
- Create helping_phrases database table with migration
- Add HelpingPhraseService with fuzzy matching for STT errors
- Update UnifiedTeachingAgent with bilingual formatting methods
- Update mode_practice.md prompt with two-phase flow and help recovery
- Add /api/lessons/helping-phrases endpoint
- Update conversation router to load and inject phrases

Frontend:
- Add HelpingPhrase type
- Create HelpingPhrasesPanel component
- Integrate into PracticeView with language-aware fetching
- Add i18n translations for help phrases UI

Testing:
- Add baseline evaluation script
- Add final evaluation with comparison
- Add 25 unit tests for bilingual features

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
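The commit says HelpingPhraseService uses fuzzy matching to tolerate STT errors, but the algorithm itself is not shown in this PR page. A minimal sketch of one way to do it, assuming accent/punctuation normalization followed by `difflib.SequenceMatcher` similarity against a fixed threshold; `PHRASES`, `normalize`, `match_phrase`, and the 0.75 threshold are hypothetical, not the service's actual code:

```python
import difflib
import re
import unicodedata

# Hypothetical stand-in for rows from the helping_phrases table: (phrase, meaning).
PHRASES = [
    ("¿Puedes repetir?", "Can you repeat?"),
    ("No entiendo", "I don't understand"),
    ("Más despacio, por favor", "Slower, please"),
]


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop combining marks so "más" and "mas" compare equal
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation such as "¿" and "?" with spaces
    no_punct = re.sub(r"[^\w\s]", " ", no_accents)
    return " ".join(no_punct.split())


def match_phrase(transcript: str, threshold: float = 0.75):
    """Return the best (phrase, meaning) pair, or None if nothing clears the threshold."""
    target = normalize(transcript)
    best, best_score = None, 0.0
    for phrase, meaning in PHRASES:
        score = difflib.SequenceMatcher(None, target, normalize(phrase)).ratio()
        if score > best_score:
            best, best_score = (phrase, meaning), score
    return best if best_score >= threshold else None
```

Normalizing first means "¿Puedes repetir?" and an STT transcript like "puedes repetir" compare as identical, while the ratio threshold absorbs single-character misrecognitions such as "no entiende" for "No entiendo".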
…console logging
- Update mode_practice.md with explicit Language Rules section
- Clarify that ALL support (encouragement, corrections, clarifications) must be in instruction_language, not just intro and help recovery
- Add concrete examples for Spanish/English language switching
- Add conversation logging to console for debugging (shows user input, agent response with language flags, and tool calls)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add print-based conversation logging to realtime WebSocket handler
- Accumulate assistant transcript deltas and log on response_done
- Load helping phrases for instruction_language in realtime mode
- Now both REST and realtime paths have bilingual support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
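One way to accumulate transcript deltas and emit a single log entry per response, sketched under the assumption that delta events carry `response_id` and `delta` fields and that `response.done` carries the id under `response.id` (field names taken from the public Realtime API event format; verify against the version in use). `TranscriptAccumulator` is a hypothetical helper:

```python
from collections import defaultdict


class TranscriptAccumulator:
    """Buffer streamed transcript deltas per response; return the full text on response.done."""

    def __init__(self):
        # response_id -> list of delta strings, in arrival order
        self._parts = defaultdict(list)

    def handle(self, event: dict):
        """Feed one Realtime event; return the finished transcript, or None."""
        event_type = event.get("type")
        if event_type == "response.audio_transcript.delta":
            self._parts[event.get("response_id")].append(event.get("delta", ""))
        elif event_type == "response.done":
            # "response" may be missing or null, so fall back to an empty dict
            response_id = (event.get("response") or {}).get("id")
            return "".join(self._parts.pop(response_id, []))
        return None
```

The caller logs whatever `handle()` returns, so a multi-second answer produces one transcript line instead of hundreds of fragments.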
FastAPI routes are matched in order. The dynamic /{lesson_number} route
was capturing /helping-phrases requests, trying to parse "helping-phrases"
as an integer. Moved the static route before the dynamic one to fix this.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
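A stdlib-only toy model of first-match routing (not FastAPI's actual router) that reproduces the bug: with the dynamic route registered first, `/helping-phrases` binds to `lesson_number` and fails integer conversion, which FastAPI reports as a 422 validation error. All names here are hypothetical:

```python
import re


def compile_path(path: str) -> re.Pattern:
    """Turn '{name}' segments into named regex groups, roughly like path params."""
    return re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path) + "$")


def dispatch(routes, path):
    """Call the handler of the FIRST route whose pattern matches (in-order matching)."""
    for template, handler in routes:
        match = compile_path(template).match(path)
        if match:
            return handler(**match.groupdict())
    return "404"


def get_lesson(lesson_number):
    # Mimics an `int` path parameter: non-numeric values fail validation
    try:
        return f"lesson {int(lesson_number)}"
    except ValueError:
        return "422: lesson_number is not an integer"


def get_helping_phrases():
    return "helping phrases"


broken = [("/{lesson_number}", get_lesson), ("/helping-phrases", get_helping_phrases)]
fixed = [("/helping-phrases", get_helping_phrases), ("/{lesson_number}", get_lesson)]
```

In the real router the fix is purely declaration order: the `/helping-phrases` handler is registered above the `/{lesson_number}` handler, so the static path wins before the catch-all pattern is tried.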
- Add disconnect() function to useConversation hook to properly clean up WebSocket, audio context, media stream, and reset state
- Replace X button with "End Session" in MobileChatOverlay
- Replace X button with "End Session" in ConversationDrawer
- Call disconnect() when closing chat on both mobile and desktop
- Prevents 30-minute idle sessions when user closes chat overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prints the full system prompt at session start to help diagnose agent behavior issues like:
- Cut-off introductions
- Wrong turn order
- Off-pattern responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Logs all Realtime API events (except high-frequency audio deltas):
- session.created, session.updated
- response.done with status
- response.cancelled (interruption detection)
- VAD speech_started/stopped
- Client commit/cancel/text messages

This helps diagnose truncated responses and unexpected interruptions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Detect incomplete responses due to content_filter and auto-retry
- Send content_filtered event to client while waiting for retry
- Handle empty audio buffer error gracefully (PTT without speech)
- Add response_incomplete event for non-filter incomplete responses
- Frontend handles new event types appropriately

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create HelpingPhrasesBubble component that appears at top of chat
- Add helpingPhrases to conversation store with fetch on language change
- Pass helpingPhrases to ConversationView (desktop) and MobileChatOverlay
- Remove HelpingPhrasesPanel from PracticeView main page
- Bubble scrolls with conversation instead of staying fixed

Helps users see help phrases in context during practice.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
status_details can be null when response completes normally, causing AttributeError when calling .get() on NoneType.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The error structure has type="invalid_request_error" and code="input_audio_buffer_commit_empty". The check was reading the wrong field.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
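Both fixes amount to defensive access on the event payload. A sketch, assuming event shapes like `{"type": "response.done", "response": {"status": ..., "status_details": ...}}` and `{"type": "error", "error": {"type": ..., "code": ...}}`; `classify_event` and its return labels are hypothetical (the labels mirror the `content_filtered` and `response_incomplete` client events named earlier):

```python
def classify_event(event: dict) -> str:
    """Map a Realtime event to a coarse outcome label."""
    event_type = event.get("type")
    if event_type == "response.done":
        response = event.get("response") or {}
        # status_details can be present but null on normal completion, so `or {}`
        # avoids calling .get() on None
        details = response.get("status_details") or {}
        if response.get("status") == "incomplete":
            if details.get("reason") == "content_filter":
                return "content_filtered"
            return "response_incomplete"
        return "response_done"
    if event_type == "error":
        error = event.get("error") or {}
        # "type" is the broad category (invalid_request_error); "code" names the cause
        if error.get("code") == "input_audio_buffer_commit_empty":
            return "empty_audio"
        return "error"
    return "other"
```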
Add useRef and useEffect to auto-scroll to the bottom when new messages arrive, matching the behavior of the desktop ConversationView component.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
Adds bilingual “practice” behavior to the unified teaching agent (instructions in the user’s site language while practicing in English), introduces database-backed “helping phrases” surfaced in the chat UI, and improves Realtime API session handling (disconnect + additional event/error handling).
Changes:
- Add “helping phrases” end-to-end: DB model + migration/seed, service + API route, frontend types/store/API fetch, and chat bubble UI (desktop + mobile).
- Extend `UnifiedTeachingAgent` to include helping phrases + bilingual pattern/intro formatting in the practice prompt.
- Enhance Realtime integration: new client event handling (content filter/incomplete/empty audio), explicit disconnect, and additional backend event processing/logging.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/unit/test_unified_teaching_agent_bilingual.py | Adds unit tests around bilingual formatting and helping-phrases prompt inclusion. |
| tests/unit/test_helping_phrase_service.py | Adds unit tests for phrase retrieval and fuzzy matching/normalization behavior. |
| tests/unit/__init__.py | Marks unit test package. |
| tests/harness/agent_harness.py | Loads helping phrases + supports instruction_language in the agent test harness. |
| tests/evaluation/run_final_evaluation.py | Adds a script to run a “final” evaluation and optionally compare vs baseline. |
| tests/evaluation/run_baseline.py | Adds a script to capture baseline evaluation results for later comparison. |
| tests/evaluation/final_results.json | Stores captured “final” evaluation output. |
| tests/evaluation/comparison_report.md | Stores a generated baseline vs final comparison report. |
| tests/evaluation/baseline_results.json | Stores captured baseline evaluation output. |
| src/frontend/src/types/index.ts | Adds HelpingPhrase type to match backend schema. |
| src/frontend/src/stores/conversationStore.ts | Adds store state + setter for helpingPhrases. |
| src/frontend/src/services/api.ts | Adds fetchHelpingPhrases() client API method. |
| src/frontend/src/locales/es.json | Adds translations for End Session and helping phrases UI text (ES). |
| src/frontend/src/locales/en.json | Adds translations for End Session and helping phrases UI text (EN). |
| src/frontend/src/hooks/useConversation.ts | Adds handling for new realtime events and exposes a disconnect() cleanup API. |
| src/frontend/src/components/mobile/MobileChatOverlay.tsx | Adds helping-phrases bubble + auto-scroll + “End Session” UX on mobile overlay. |
| src/frontend/src/components/content/HelpingPhrasesPanel.tsx | Adds a panel component to display helping phrases in content UI. |
| src/frontend/src/components/HelpingPhrasesBubble.tsx | Adds a chat-bubble component that renders helping phrases in the transcript. |
| src/frontend/src/components/ConversationView.tsx | Displays helping-phrases bubble and adjusts empty-state logic accordingly. |
| src/frontend/src/components/ConversationDrawer.tsx | Adds “End Session” action and passes helping phrases into the conversation view. |
| src/frontend/src/MobileApp.tsx | Fetches helping phrases on language change; calls disconnect() when closing chat. |
| src/frontend/src/App.tsx | Fetches helping phrases on language change; calls disconnect() when ending session. |
| src/backend/app/services/realtime_session.py | Adds realtime event logging + content-filter retry + empty-audio handling. |
| src/backend/app/services/helping_phrase_service.py | Implements DB retrieval + fuzzy match for help phrases. |
| src/backend/app/schemas/helping_phrase.py | Adds HelpingPhraseSchema for API/service/agent use. |
| src/backend/app/routers/realtime.py | Loads helping phrases for realtime practice; adds extensive session/prompt/transcript logging. |
| src/backend/app/routers/lessons.py | Adds /api/lessons/helping-phrases route and ensures routing order. |
| src/backend/app/routers/conversation.py | Loads helping phrases for practice; adds detailed conversation logging to stdout. |
| src/backend/app/prompts/agent/mode_practice.md | Expands practice prompt with bilingual rules + helping-phrases intro + help recovery protocol. |
| src/backend/app/models/content.py | Adds HelpingPhrase ORM model. |
| src/backend/app/agents/unified_teaching_agent.py | Adds helping-phrases support and new formatting helpers for the practice prompt. |
| src/backend/alembic/versions/005_add_helping_phrases.py | Adds migration for helping phrases table + seeds ES/EN phrase rows. |
Comments suppressed due to low confidence (1)
tests/unit/test_unified_teaching_agent_bilingual.py:347
- This test builds prompt_en but never asserts anything about it, so it doesn’t actually verify the English-path behavior. Add assertions that (a) prompt_es and prompt_en differ, and (b) prompt_en contains English instruction-language markers and does not contain Spanish markers (and/or vice versa).
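A sketch of the missing assertions as a reusable helper. In the real test, `prompt_es` and `prompt_en` would come from the agent's prompt builder, which is not shown here; the helper name and the marker strings in the example are hypothetical:

```python
def assert_prompts_localized(prompt_es, prompt_en, es_markers, en_markers):
    """Each prompt must contain its own language markers and none of the other's."""
    assert prompt_es != prompt_en, "prompts should differ by instruction_language"
    for marker in es_markers:
        assert marker in prompt_es, f"missing ES marker {marker!r}"
        assert marker not in prompt_en, f"ES marker {marker!r} leaked into EN prompt"
    for marker in en_markers:
        assert marker in prompt_en, f"missing EN marker {marker!r}"
        assert marker not in prompt_es, f"EN marker {marker!r} leaked into ES prompt"
```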
```python
# Log ALL events for debugging (except high-frequency audio deltas)
if event_type not in ["response.audio.delta", "response.audio_transcript.delta"]:
    print(f"[REALTIME EVENT] {event_type}")
    if event_type in ["error", "response.done", "session.created", "session.updated"]:
        print(f" -> {json.dumps(event, indent=2)[:500]}")
```
These debug prints log every Realtime event (and sometimes the full event payload). This will produce very high-volume stdout in production and may leak sensitive content (prompts, transcripts, tool args). Prefer structured logging via the module logger at DEBUG level and gate it behind a config flag (e.g., settings.debug_realtime_logging).
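A minimal sketch of the suggested approach: module logger at DEBUG, gated by a flag, with payloads truncated. `DEBUG_REALTIME_LOGGING` stands in for the reviewer's hypothetical `settings.debug_realtime_logging`, and `log_realtime_event` is illustrative, not code from this PR:

```python
import json
import logging

logger = logging.getLogger("app.services.realtime_session")

# Hypothetical flag; in the app this would come from settings
DEBUG_REALTIME_LOGGING = False

HIGH_FREQUENCY_EVENTS = {"response.audio.delta", "response.audio_transcript.delta"}
PAYLOAD_EVENTS = {"error", "response.done", "session.created", "session.updated"}


def log_realtime_event(event: dict, enabled: bool = DEBUG_REALTIME_LOGGING) -> bool:
    """Log an event at DEBUG level when enabled; return whether anything was logged."""
    event_type = event.get("type", "")
    if not enabled or event_type in HIGH_FREQUENCY_EVENTS:
        return False
    if not logger.isEnabledFor(logging.DEBUG):
        return False
    logger.debug("realtime event", extra={"event_type": event_type})
    if event_type in PAYLOAD_EVENTS:
        # Payloads may contain prompts or transcripts: keep them gated and truncated
        logger.debug("realtime payload: %s", json.dumps(event)[:500])
    return True
```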
```python
if status == "incomplete":
    if reason == "content_filter":
        print("[REALTIME] WARNING: Response blocked by content filter!")
        # Notify client about the filter
        yield {
            "type": "content_filtered",
            "message": "Response was filtered. Retrying..."
        }
        # Retry by requesting a new response
        if self.ws:
            print("[REALTIME] Retrying response after content filter...")
            await self.ws.send(json.dumps({"type": "response.create"}))
        # Don't yield response_done - wait for retry
        continue
```
The content-filter retry path can loop indefinitely if the model keeps getting filtered (it immediately sends another response.create with no retry limit/backoff). Add a capped retry counter (per response / per session) and a backoff; after the limit, surface an error to the client so the UI can recover.
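A sketch of a capped retry with exponential backoff, assuming a session-scoped policy object and a websocket-like `ws` with an async `send()`. The cap of 3 matches the follow-up commit; `FilterRetryPolicy`, the 0.5 s base delay, and the `content_filter_failed` client event are hypothetical:

```python
import asyncio
import json

MAX_FILTER_RETRIES = 3


class FilterRetryPolicy:
    """Track content-filter retries for a session and compute backoff delays."""

    def __init__(self, max_retries=MAX_FILTER_RETRIES, base_delay=0.5):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.attempts = 0

    def next_delay(self):
        """Return the delay before the next retry, or None once the cap is reached."""
        if self.attempts >= self.max_retries:
            return None
        delay = self.base_delay * (2 ** self.attempts)  # 0.5, 1.0, 2.0, ...
        self.attempts += 1
        return delay

    def reset(self):
        """Call after a response completes normally."""
        self.attempts = 0


async def handle_filtered_response(ws, policy):
    """Return the client event to emit; resend response.create only while under the cap."""
    delay = policy.next_delay()
    if delay is None:
        # Give the UI a terminal event so it can recover instead of waiting forever
        return {"type": "content_filter_failed", "message": "Response was filtered repeatedly."}
    await asyncio.sleep(delay)
    await ws.send(json.dumps({"type": "response.create"}))
    return {"type": "content_filtered", "message": "Response was filtered. Retrying..."}
```

Calling `reset()` once a response completes normally keeps one unlucky turn from exhausting the budget for the rest of the session.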
src/backend/app/routers/realtime.py
```python
# Log system prompt for debugging
print("\n" + "=" * 80)
print("REALTIME SESSION STARTING")
print("=" * 80)
print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
print("-" * 80)
print("SYSTEM PROMPT:")
print("-" * 80)
print(system_prompt)
print("=" * 80 + "\n")
```
This prints the full system prompt (and other conversation details) to stdout for every realtime session. System prompts often contain sensitive instructions and can include user/context data, so this is a security/privacy risk and noisy operationally. Use structured logger calls and only emit this behind an explicit debug flag; avoid logging the full prompt by default (log a hash or truncated summary instead).
Suggested change
```diff
-# Log system prompt for debugging
-print("\n" + "=" * 80)
-print("REALTIME SESSION STARTING")
-print("=" * 80)
-print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
-print("-" * 80)
-print("SYSTEM PROMPT:")
-print("-" * 80)
-print(system_prompt)
-print("=" * 80 + "\n")
+# Safely log session start with a truncated system prompt preview for debugging
+max_preview_len = 200
+preview = system_prompt[:max_preview_len]
+if len(system_prompt) > max_preview_len:
+    preview += "... [truncated]"
+logger.debug(
+    "Realtime session starting",
+    extra={
+        "lesson_number": lesson_number,
+        "mode": mode,
+        "instruction_language": instruction_language,
+        "system_prompt_preview": preview,
+    },
+)
```
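The review also mentions logging a hash instead of the prompt. A sketch of that variant using `hashlib.sha256`; `prompt_fingerprint` is a hypothetical helper, and truncating to 12 hex characters is an arbitrary choice:

```python
import hashlib


def prompt_fingerprint(system_prompt: str, length: int = 12) -> dict:
    """Summarize a prompt for logs without revealing its contents."""
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return {"prompt_sha256": digest[:length], "prompt_chars": len(system_prompt)}
```

The result can be merged into the same `extra` dict, e.g. `extra={"lesson_number": lesson_number, **prompt_fingerprint(system_prompt)}`, so sessions with identical prompts log identical fingerprints without exposing the text.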
```python
print("\n" + "=" * 60)
print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
print("=" * 60)
print(f"USER: {request.message if request.message else '(session start)'}")
print("-" * 60)

# Show all speak() calls with their languages
if audio_chunks:
    for chunk in audio_chunks:
        lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
        print(f"AGENT [{lang_label}]: {chunk.text}")
else:
    print(f"AGENT: {response_text}")

# Show tool calls summary
tool_calls = agent_result.get("tool_results", [])
if tool_calls:
    tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
    if tools_used:
        print(f"TOOLS: {', '.join(tools_used)}")
print("=" * 60 + "\n")
```
The router currently prints full user messages, agent outputs, and tool usage to stdout on every request. This can leak PII and makes logs noisy/unstructured. Replace with logger.debug/info and guard detailed transcript/tool logging behind an explicit debug setting; consider redacting user content in non-debug environments.
Suggested change
```diff
-print("\n" + "=" * 60)
-print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
-print("=" * 60)
-print(f"USER: {request.message if request.message else '(session start)'}")
-print("-" * 60)
-# Show all speak() calls with their languages
-if audio_chunks:
-    for chunk in audio_chunks:
-        lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
-        print(f"AGENT [{lang_label}]: {chunk.text}")
-else:
-    print(f"AGENT: {response_text}")
-# Show tool calls summary
-tool_calls = agent_result.get("tool_results", [])
-if tool_calls:
-    tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
-    if tools_used:
-        print(f"TOOLS: {', '.join(tools_used)}")
-print("=" * 60 + "\n")
+settings = get_settings()
+debug_conversation_logging = getattr(settings, "debug_conversation_logging", False)
+if debug_conversation_logging and logger.isEnabledFor(logging.DEBUG):
+    convo_lines = []
+    convo_lines.append("=" * 60)
+    convo_lines.append(
+        f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}"
+    )
+    convo_lines.append("=" * 60)
+    user_text = request.message if request.message else "(session start)"
+    convo_lines.append(f"USER: {user_text}")
+    convo_lines.append("-" * 60)
+    # Show all speak() calls with their languages
+    if audio_chunks:
+        for chunk in audio_chunks:
+            lang_label = "ES" if chunk.language == "es" else "EN"
+            convo_lines.append(f"AGENT [{lang_label}]: {chunk.text}")
+    else:
+        convo_lines.append(f"AGENT: {response_text}")
+    # Show tool calls summary (excluding speak)
+    tool_calls = agent_result.get("tool_results", [])
+    if tool_calls:
+        tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
+        if tools_used:
+            convo_lines.append(f"TOOLS: {', '.join(tools_used)}")
+    convo_lines.append("=" * 60)
+    logger.debug("\n" + "\n".join(convo_lines))
```
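The suggestion gates the transcript behind a flag but does not redact user content when that flag is enabled outside debug environments. A rough sketch of such a redaction step; `redact_user_text`, the two regexes, and the 40-character cap are hypothetical and catch only obvious patterns, so this is not a complete PII scrubber:

```python
import re

# Illustrative patterns only: email-like tokens and runs of 4+ digits
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DIGITS = re.compile(r"\d{4,}")


def redact_user_text(text: str, debug: bool, max_len: int = 40) -> str:
    """Return text unchanged in debug mode; otherwise mask obvious PII and cap the length."""
    if debug:
        return text
    masked = EMAIL.sub("[email]", text)
    masked = DIGITS.sub("[number]", masked)
    return masked[:max_len] + ("..." if len(masked) > max_len else "")
```

In the router this would wrap `user_text` before it is appended to `convo_lines`, with `debug` tied to the environment rather than to the logging flag itself.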
```python
"""Add helping phrases table for bilingual practice support.

Revision ID: 005_add_helping_phrases
Revises: 004_add_jwt_auth
```
Migration header comment says "Revises: 004_add_jwt_auth" but down_revision points to "ef47914574a5". Update the docstring metadata so it matches the actual Alembic dependency chain to avoid confusion during maintenance/review.
Suggested change
```diff
-Revises: 004_add_jwt_auth
+Revises: ef47914574a5
```
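Mismatches like this are easy to catch mechanically. A sketch of a check that parses a migration file with `ast` and compares the docstring's `Revises:` line to the `down_revision` assignment (plain or annotated); `check_revises_docstring` is a hypothetical helper, not part of Alembic:

```python
import ast
import re


def check_revises_docstring(source: str) -> bool:
    """True if the module docstring's 'Revises:' value equals down_revision."""
    module = ast.parse(source)
    docstring = ast.get_docstring(module) or ""
    match = re.search(r"^Revises:\s*(\S+)", docstring, re.MULTILINE)
    down_revision = None
    for node in module.body:
        # Handle both `down_revision = "..."` and `down_revision: str = "..."`
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign):
            targets, value = [node.target], node.value
        else:
            continue
        for target in targets:
            if (
                isinstance(target, ast.Name)
                and target.id == "down_revision"
                and isinstance(value, ast.Constant)
            ):
                down_revision = value.value
    return bool(match) and match.group(1) == down_revision
```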
```diff
@@ -0,0 +1,346 @@
+"""Unit tests for bilingual features in UnifiedTeachingAgent."""
+
+import pytest
```
Import of 'pytest' is not used.
```python
"""Unit tests for bilingual features in UnifiedTeachingAgent."""

import pytest
from unittest.mock import MagicMock, patch
```
Import of 'patch' is not used.
- Replace verbose print statements with logger.debug in realtime_session.py, realtime.py, and conversation.py
- Add retry limit (max 3) for content filter retries to prevent infinite loops
- Truncate system prompt preview in logs for security/privacy
- Fix migration docstring to match actual down_revision
- Remove unused pytest and patch imports from test file

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Changes
Features
Bug Fixes
- Handle null `status_details` in `response.done` events
- Check the `code` field for empty audio buffer detection
- Move the `/helping-phrases` route before `/{lesson_number}` to fix routing

Technical
Test plan
🤖 Generated with Claude Code