feat: bilingual practice agent with helping phrases by darthmolen · Pull Request #4 · darthmolen/EnglishConnect

darthmolen · 2026-01-29T02:19:16Z

Summary

Implement bilingual practice agent that gives instructions in the user's chosen site language while practicing in English
Add helping phrases feature with database-backed phrases displayed as chat bubbles
Fix various bugs in the Realtime API integration

Changes

Features

Bilingual practice agent with instruction language support (Spanish/English)
Helping phrases displayed as scrollable chat bubbles in both desktop and mobile views
End Session button to disconnect realtime WebSocket sessions
Console logging for debugging agent conversations

Bug Fixes

Handle content filter errors from Azure OpenAI Realtime API with retry logic
Handle null status_details in response.done events
Check correct code field for empty audio buffer detection
Add auto-scroll to mobile chat overlay for new messages
Move /helping-phrases route before /{lesson_number} to fix routing

Technical

Add detailed realtime event logging for debugging
Add system prompt logging to realtime router

Test plan

Test on mobile: verify chat auto-scrolls when new messages arrive
Test on desktop: verify helping phrases appear as a chat bubble
Test content filter handling: agent should retry on filter errors
Test End Session button: should disconnect and close overlay

🤖 Generated with Claude Code

Add support for the practice agent to use the site language for instructions and explanations while practicing in English.

Backend:
- Create helping_phrases database table with migration
- Add HelpingPhraseService with fuzzy matching for STT errors
- Update UnifiedTeachingAgent with bilingual formatting methods
- Update mode_practice.md prompt with two-phase flow and help recovery
- Add /api/lessons/helping-phrases endpoint
- Update conversation router to load and inject phrases

Frontend:
- Add HelpingPhrase type
- Create HelpingPhrasesPanel component
- Integrate into PracticeView with language-aware fetching
- Add i18n translations for help phrases UI

Testing:
- Add baseline evaluation script
- Add final evaluation with comparison
- Add 25 unit tests for bilingual features

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
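The fuzzy matching mentioned for HelpingPhraseService is not shown on this page. A minimal sketch of one way to tolerate speech-to-text errors, assuming a normalize-then-compare approach built on the standard library's `difflib`. The function names, the phrase list, and the 0.8 cutoff are illustrative assumptions, not the service's actual code:

```python
import difflib
import string
from typing import Optional

# Hypothetical phrase list; per the PR, the real phrases live in the
# helping_phrases database table.
PHRASES = ["can you repeat that", "i don't understand", "how do you say"]


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def match_phrase(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known phrase, or None if nothing is similar enough."""
    normalized = {normalize(p): p for p in PHRASES}
    hits = difflib.get_close_matches(
        normalize(transcript), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None
```

A transcript such as "Can you repeat dat?" would still resolve to the stored phrase, while an unrelated utterance returns `None` and can fall through to the normal conversation path.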

…console logging

- Update mode_practice.md with explicit Language Rules section
- Clarify that ALL support (encouragement, corrections, clarifications) must be in instruction_language, not just intro and help recovery
- Add concrete examples for Spanish/English language switching
- Add conversation logging to console for debugging (shows user input, agent response with language flags, and tool calls)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add print-based conversation logging to realtime WebSocket handler
- Accumulate assistant transcript deltas and log on response_done
- Load helping phrases for instruction_language in realtime mode
- Now both REST and realtime paths have bilingual support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
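The delta accumulation described above can be sketched as follows. This is an illustration only: the class name is hypothetical, and it assumes each `response.audio_transcript.delta` event carries its text in a `delta` field and that `response.done` marks the end of a response.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("realtime")


class TranscriptAccumulator:
    """Collects assistant transcript deltas until the response finishes."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def handle_event(self, event: Dict[str, Any]) -> Optional[str]:
        """Return the full transcript on response.done, otherwise None."""
        event_type = event.get("type")
        if event_type == "response.audio_transcript.delta":
            # Assumed field name: the text fragment is under "delta".
            self._parts.append(event.get("delta", ""))
            return None
        if event_type == "response.done":
            transcript = "".join(self._parts)
            self._parts.clear()  # start fresh for the next response
            logger.debug("assistant transcript: %s", transcript)
            return transcript
        return None
```

Logging once per response instead of once per delta keeps the log readable and avoids interleaving partial text with other events.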

FastAPI routes are matched in order. The dynamic /{lesson_number} route was capturing /helping-phrases requests, trying to parse "helping-phrases" as an integer. Moved static route before dynamic to fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
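To illustrate why declaration order matters, here is a simplified first-match router in plain Python. It is a model of the behavior described above, not FastAPI's implementation; the `/lessons` prefix, the handler names, and the "422 validation error" string standing in for the integer-parse failure are assumptions made for the sketch.

```python
import re
from typing import Callable, List, Tuple

Route = Tuple[re.Pattern, Callable[[str], str]]


def get_lesson(raw: str) -> str:
    # Mirrors a handler whose path parameter is declared as an int.
    return f"lesson {int(raw)}"


def get_helping_phrases(_: str) -> str:
    return "helping phrases"


def dispatch(routes: List[Route], path: str) -> str:
    """Call the handler of the first route whose pattern matches the path."""
    for pattern, handler in routes:
        match = pattern.fullmatch(path)
        if match:
            try:
                return handler(match.group(1) if match.groups() else "")
            except ValueError:
                # The first match wins even when its parameter fails to parse.
                return "422 validation error"
    return "404 not found"


dynamic = (re.compile(r"/lessons/([^/]+)"), get_lesson)
static = (re.compile(r"/lessons/helping-phrases"), get_helping_phrases)

broken_order = [dynamic, static]  # dynamic route shadows the static one
fixed_order = [static, dynamic]   # static route is tried first
```

With `broken_order`, a request for `/lessons/helping-phrases` never reaches the static handler; with `fixed_order`, both the static path and numeric lesson paths resolve as intended.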

- Add disconnect() function to useConversation hook to properly cleanup WebSocket, audio context, media stream, and reset state
- Replace X button with "End Session" in MobileChatOverlay
- Replace X button with "End Session" in ConversationDrawer
- Call disconnect() when closing chat on both mobile and desktop
- Prevents 30-minute idle sessions when user closes chat overlay

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Prints the full system prompt at session start to help diagnose agent behavior issues like:
- Cut off introductions
- Wrong turn order
- Off-pattern responses

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Logs all Realtime API events (except high-frequency audio deltas):
- session.created, session.updated
- response.done with status
- response.cancelled (interruption detection)
- VAD speech_started/stopped
- Client commit/cancel/text messages

This helps diagnose truncated responses and unexpected interruptions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Detect incomplete responses due to content_filter and auto-retry
- Send content_filtered event to client while waiting for retry
- Handle empty audio buffer error gracefully (PTT without speech)
- Add response_incomplete event for non-filter incomplete responses
- Frontend handles new event types appropriately

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create HelpingPhrasesBubble component that appears at top of chat
- Add helpingPhrases to conversation store with fetch on language change
- Pass helpingPhrases to ConversationView (desktop) and MobileChatOverlay
- Remove HelpingPhrasesPanel from PracticeView main page
- Bubble scrolls with conversation instead of staying fixed

Helps users see help phrases in context during practice.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

status_details can be null when response completes normally, causing AttributeError when calling .get() on NoneType. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
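A minimal sketch of the null-safe lookup, assuming the response payload is a plain dict with an optional `status_details` object that holds a `reason` key. The function name is hypothetical:

```python
from typing import Any, Dict, Optional


def incomplete_reason(response: Dict[str, Any]) -> Optional[str]:
    """Return status_details.reason, tolerating a missing or null status_details."""
    # `or {}` covers both an absent key and an explicit null (None) value,
    # which a plain .get("status_details", {}) would not.
    details = response.get("status_details") or {}
    return details.get("reason")
```

The distinction matters because `dict.get(key, default)` only applies the default when the key is absent; a key that is present with value `None` still returns `None`.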

The error structure has type="invalid_request_error" and code="input_audio_buffer_commit_empty". Was checking wrong field. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
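As a sketch of the corrected check, assuming the error details are nested under an `error` key of the event (the nesting and the function name are assumptions; the `type` and `code` values are the ones quoted above):

```python
from typing import Any, Dict

EMPTY_BUFFER_CODE = "input_audio_buffer_commit_empty"


def is_empty_audio_commit(event: Dict[str, Any]) -> bool:
    """True when an error event reports a commit on an empty audio buffer."""
    error = event.get("error") or {}
    # "type" holds the generic category (invalid_request_error);
    # the specific condition is in "code", so compare against that field.
    return error.get("code") == EMPTY_BUFFER_CODE
```

Comparing `type` against the buffer code can never match, which is why push-to-talk without speech was not being recognized as a benign error.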

Add useRef and useEffect to auto-scroll to the bottom when new messages arrive, matching the behavior of the desktop ConversationView component. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

Adds bilingual “practice” behavior to the unified teaching agent (instructions in the user’s site language while practicing in English), introduces database-backed “helping phrases” surfaced in the chat UI, and improves Realtime API session handling (disconnect + additional event/error handling).

Changes:

Add “helping phrases” end-to-end: DB model + migration/seed, service + API route, frontend types/store/API fetch, and chat bubble UI (desktop + mobile).
Extend UnifiedTeachingAgent to include helping phrases + bilingual pattern/intro formatting in the practice prompt.
Enhance Realtime integration: new client event handling (content filter/incomplete/empty audio), explicit disconnect, and additional backend event processing/logging.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 7 comments.


File	Description
tests/unit/test_unified_teaching_agent_bilingual.py	Adds unit tests around bilingual formatting and helping-phrases prompt inclusion.
tests/unit/test_helping_phrase_service.py	Adds unit tests for phrase retrieval and fuzzy matching/normalization behavior.
tests/unit/__init__.py	Marks unit test package.
tests/harness/agent_harness.py	Loads helping phrases + supports `instruction_language` in the agent test harness.
tests/evaluation/run_final_evaluation.py	Adds a script to run a “final” evaluation and optionally compare vs baseline.
tests/evaluation/run_baseline.py	Adds a script to capture baseline evaluation results for later comparison.
tests/evaluation/final_results.json	Stores captured “final” evaluation output.
tests/evaluation/comparison_report.md	Stores a generated baseline vs final comparison report.
tests/evaluation/baseline_results.json	Stores captured baseline evaluation output.
src/frontend/src/types/index.ts	Adds `HelpingPhrase` type to match backend schema.
src/frontend/src/stores/conversationStore.ts	Adds store state + setter for `helpingPhrases`.
src/frontend/src/services/api.ts	Adds `fetchHelpingPhrases()` client API method.
src/frontend/src/locales/es.json	Adds translations for End Session and helping phrases UI text (ES).
src/frontend/src/locales/en.json	Adds translations for End Session and helping phrases UI text (EN).
src/frontend/src/hooks/useConversation.ts	Adds handling for new realtime events and exposes a `disconnect()` cleanup API.
src/frontend/src/components/mobile/MobileChatOverlay.tsx	Adds helping-phrases bubble + auto-scroll + “End Session” UX on mobile overlay.
src/frontend/src/components/content/HelpingPhrasesPanel.tsx	Adds a panel component to display helping phrases in content UI.
src/frontend/src/components/HelpingPhrasesBubble.tsx	Adds a chat-bubble component that renders helping phrases in the transcript.
src/frontend/src/components/ConversationView.tsx	Displays helping-phrases bubble and adjusts empty-state logic accordingly.
src/frontend/src/components/ConversationDrawer.tsx	Adds “End Session” action and passes helping phrases into the conversation view.
src/frontend/src/MobileApp.tsx	Fetches helping phrases on language change; calls `disconnect()` when closing chat.
src/frontend/src/App.tsx	Fetches helping phrases on language change; calls `disconnect()` when ending session.
src/backend/app/services/realtime_session.py	Adds realtime event logging + content-filter retry + empty-audio handling.
src/backend/app/services/helping_phrase_service.py	Implements DB retrieval + fuzzy match for help phrases.
src/backend/app/schemas/helping_phrase.py	Adds `HelpingPhraseSchema` for API/service/agent use.
src/backend/app/routers/realtime.py	Loads helping phrases for realtime practice; adds extensive session/prompt/transcript logging.
src/backend/app/routers/lessons.py	Adds `/api/lessons/helping-phrases` route and ensures routing order.
src/backend/app/routers/conversation.py	Loads helping phrases for practice; adds detailed conversation logging to stdout.
src/backend/app/prompts/agent/mode_practice.md	Expands practice prompt with bilingual rules + helping-phrases intro + help recovery protocol.
src/backend/app/models/content.py	Adds `HelpingPhrase` ORM model.
src/backend/app/agents/unified_teaching_agent.py	Adds helping-phrases support and new formatting helpers for the practice prompt.
src/backend/alembic/versions/005_add_helping_phrases.py	Adds migration for helping phrases table + seeds ES/EN phrase rows.

Comments suppressed due to low confidence (1)

tests/unit/test_unified_teaching_agent_bilingual.py:347

This test builds prompt_en but never asserts anything about it, so it doesn’t actually verify the English-path behavior. Add assertions that (a) prompt_es and prompt_en differ, and (b) prompt_en contains English instruction-language markers and does not contain Spanish markers (and/or vice versa).


Copilot · 2026-01-29T02:27:56Z

src/backend/app/services/realtime_session.py

+                # Log ALL events for debugging (except high-frequency audio deltas)
+                if event_type not in ["response.audio.delta", "response.audio_transcript.delta"]:
+                    print(f"[REALTIME EVENT] {event_type}")
+                    if event_type in ["error", "response.done", "session.created", "session.updated"]:
+                        print(f"  -> {json.dumps(event, indent=2)[:500]}")


These debug prints log every Realtime event (and sometimes the full event payload). This will produce very high-volume stdout in production and may leak sensitive content (prompts, transcripts, tool args). Prefer structured logging via the module logger at DEBUG level and gate it behind a config flag (e.g., settings.debug_realtime_logging).
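One possible shape for the gated logging suggested here, using the standard `logging` module. The `debug_realtime_logging` argument stands in for the hypothetical settings flag named in the comment, and the function name and logger name are illustrative:

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("realtime_session")

HIGH_FREQUENCY = {"response.audio.delta", "response.audio_transcript.delta"}
DETAILED = {"error", "response.done", "session.created", "session.updated"}


def log_event(event: Dict[str, Any], debug_realtime_logging: bool) -> bool:
    """Log one event at DEBUG level; return True if anything was logged."""
    event_type = event.get("type", "")
    if not debug_realtime_logging or event_type in HIGH_FREQUENCY:
        return False
    if not logger.isEnabledFor(logging.DEBUG):
        return False
    logger.debug("realtime event: %s", event_type)
    if event_type in DETAILED:
        # Truncate the serialized payload to bound log volume.
        logger.debug("payload: %s", json.dumps(event)[:500])
    return True
```

Requiring both the flag and the DEBUG level means payloads are emitted only when someone has explicitly opted in, and the `json.dumps` cost is skipped otherwise.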

Copilot · 2026-01-29T02:27:56Z

src/backend/app/services/realtime_session.py

+                    if status == "incomplete":
+                        if reason == "content_filter":
+                            print("[REALTIME] WARNING: Response blocked by content filter!")
+                            # Notify client about the filter
+                            yield {
+                                "type": "content_filtered",
+                                "message": "Response was filtered. Retrying..."
+                            }
+                            # Retry by requesting a new response
+                            if self.ws:
+                                print("[REALTIME] Retrying response after content filter...")
+                                await self.ws.send(json.dumps({"type": "response.create"}))
+                            # Don't yield response_done - wait for retry
+                            continue


The content-filter retry path can loop indefinitely if the model keeps getting filtered (it immediately sends another response.create with no retry limit/backoff). Add a capped retry counter (per response / per session) and a backoff; after the limit, surface an error to the client so the UI can recover.
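A capped retry with exponential backoff could look roughly like the sketch below. The class, its parameters, and the shape of the final error event are assumptions for illustration; only the `response.create` message and the `content_filtered` event come from the snippet above.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict

MAX_FILTER_RETRIES = 3  # assumed cap


class FilterRetryPolicy:
    """Tracks content-filter retries for one response."""

    def __init__(self, max_retries: int = MAX_FILTER_RETRIES, base_delay: float = 0.5) -> None:
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.attempts = 0

    async def on_filtered(self, send: Callable[[str], Awaitable[None]]) -> Dict[str, Any]:
        """Retry with exponential backoff, or report an error once the cap is hit."""
        if self.attempts >= self.max_retries:
            # Hypothetical event shape so the client can recover.
            return {"type": "error", "message": "Response was filtered repeatedly."}
        delay = self.base_delay * (2 ** self.attempts)
        self.attempts += 1
        await asyncio.sleep(delay)
        await send(json.dumps({"type": "response.create"}))
        return {"type": "content_filtered", "message": "Response was filtered. Retrying..."}

    def reset(self) -> None:
        """Call after a response completes normally."""
        self.attempts = 0
```

In the event loop, the handler would yield whatever `on_filtered` returns and pass `self.ws.send` as the `send` callable, calling `reset()` when a response finishes without being filtered.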

Copilot · 2026-01-29T02:27:56Z

src/backend/app/routers/realtime.py

+    # Log system prompt for debugging
+    print("\n" + "=" * 80)
+    print("REALTIME SESSION STARTING")
+    print("=" * 80)
+    print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
+    print("-" * 80)
+    print("SYSTEM PROMPT:")
+    print("-" * 80)
+    print(system_prompt)
+    print("=" * 80 + "\n")


This prints the full system prompt (and other conversation details) to stdout for every realtime session. System prompts often contain sensitive instructions and can include user/context data, so this is a security/privacy risk and noisy operationally. Use structured logger calls and only emit this behind an explicit debug flag; avoid logging the full prompt by default (log a hash or truncated summary instead).

Suggested change

-    # Log system prompt for debugging
-    print("\n" + "=" * 80)
-    print("REALTIME SESSION STARTING")
-    print("=" * 80)
-    print(f"Lesson: {lesson_number}, Mode: {mode}, Language: {instruction_language}")
-    print("-" * 80)
-    print("SYSTEM PROMPT:")
-    print("-" * 80)
-    print(system_prompt)
-    print("=" * 80 + "\n")
+    # Safely log session start with a truncated system prompt preview for debugging
+    max_preview_len = 200
+    preview = system_prompt[:max_preview_len]
+    if len(system_prompt) > max_preview_len:
+        preview += "... [truncated]"
+    logger.debug(
+        "Realtime session starting",
+        extra={
+            "lesson_number": lesson_number,
+            "mode": mode,
+            "instruction_language": instruction_language,
+            "system_prompt_preview": preview,
+        },
+    )

Copilot · 2026-01-29T02:27:57Z

src/backend/app/routers/conversation.py

+    print("\n" + "=" * 60)
+    print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
+    print("=" * 60)
+    print(f"USER: {request.message if request.message else '(session start)'}")
+    print("-" * 60)
+
+    # Show all speak() calls with their languages
+    if audio_chunks:
+        for chunk in audio_chunks:
+            lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
+            print(f"AGENT [{lang_label}]: {chunk.text}")
+    else:
+        print(f"AGENT: {response_text}")
+
+    # Show tool calls summary
+    tool_calls = agent_result.get("tool_results", [])
+    if tool_calls:
+        tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
+        if tools_used:
+            print(f"TOOLS: {', '.join(tools_used)}")
+    print("=" * 60 + "\n")
+


The router currently prints full user messages, agent outputs, and tool usage to stdout on every request. This can leak PII and makes logs noisy/unstructured. Replace with logger.debug/info and guard detailed transcript/tool logging behind an explicit debug setting; consider redacting user content in non-debug environments.

Suggested change

-    print("\n" + "=" * 60)
-    print(f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}")
-    print("=" * 60)
-    print(f"USER: {request.message if request.message else '(session start)'}")
-    print("-" * 60)
-
-    # Show all speak() calls with their languages
-    if audio_chunks:
-        for chunk in audio_chunks:
-            lang_label = "🇪🇸 ES" if chunk.language == "es" else "🇺🇸 EN"
-            print(f"AGENT [{lang_label}]: {chunk.text}")
-    else:
-        print(f"AGENT: {response_text}")
-
-    # Show tool calls summary
-    tool_calls = agent_result.get("tool_results", [])
-    if tool_calls:
-        tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
-        if tools_used:
-            print(f"TOOLS: {', '.join(tools_used)}")
-    print("=" * 60 + "\n")
+    settings = get_settings()
+    debug_conversation_logging = getattr(settings, "debug_conversation_logging", False)
+    if debug_conversation_logging and logger.isEnabledFor(logging.DEBUG):
+        convo_lines = []
+        convo_lines.append("=" * 60)
+        convo_lines.append(
+            f"CONVERSATION [{request.mode.upper()}] Exchange #{request.exchange_count}"
+        )
+        convo_lines.append("=" * 60)
+        user_text = request.message if request.message else "(session start)"
+        convo_lines.append(f"USER: {user_text}")
+        convo_lines.append("-" * 60)
+        # Show all speak() calls with their languages
+        if audio_chunks:
+            for chunk in audio_chunks:
+                lang_label = "ES" if chunk.language == "es" else "EN"
+                convo_lines.append(f"AGENT [{lang_label}]: {chunk.text}")
+        else:
+            convo_lines.append(f"AGENT: {response_text}")
+        # Show tool calls summary (excluding speak)
+        tool_calls = agent_result.get("tool_results", [])
+        if tool_calls:
+            tools_used = [t.get("tool") for t in tool_calls if t.get("tool") != "speak"]
+            if tools_used:
+                convo_lines.append(f"TOOLS: {', '.join(tools_used)}")
+        convo_lines.append("=" * 60)
+        logger.debug("\n" + "\n".join(convo_lines))

Copilot · 2026-01-29T02:27:57Z

src/backend/alembic/versions/005_add_helping_phrases.py

+"""Add helping phrases table for bilingual practice support.
+
+Revision ID: 005_add_helping_phrases
+Revises: 004_add_jwt_auth


Migration header comment says "Revises: 004_add_jwt_auth" but down_revision points to "ef47914574a5". Update the docstring metadata so it matches the actual Alembic dependency chain to avoid confusion during maintenance/review.

Suggested change

-Revises: 004_add_jwt_auth
+Revises: ef47914574a5

Copilot · 2026-01-29T02:27:57Z

tests/unit/test_unified_teaching_agent_bilingual.py

@@ -0,0 +1,346 @@
+"""Unit tests for bilingual features in UnifiedTeachingAgent."""
+
+import pytest


Import of 'pytest' is not used.

Copilot · 2026-01-29T02:27:57Z

tests/unit/test_unified_teaching_agent_bilingual.py

+"""Unit tests for bilingual features in UnifiedTeachingAgent."""
+
+import pytest
+from unittest.mock import MagicMock, patch


Import of 'patch' is not used.

- Replace verbose print statements with logger.debug in realtime_session.py, realtime.py, and conversation.py
- Add retry limit (max 3) for content filter retries to prevent infinite loops
- Truncate system prompt preview in logs for security/privacy
- Fix migration docstring to match actual down_revision
- Remove unused pytest and patch imports from test file

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

EnglishConnect Dev and others added 12 commits January 28, 2026 11:19

chore: add system prompt logging to realtime router for debugging

e1e0a20


fix: handle null status_details in response.done event

61c09e6


fix: check error code field for empty audio buffer detection

20d89c2


fix: add auto-scroll to mobile chat overlay

fdc09ea


Copilot AI review requested due to automatic review settings January 29, 2026 02:19

Copilot started reviewing on behalf of darthmolen January 29, 2026 02:19

Copilot AI reviewed Jan 29, 2026

darthmolen merged commit 3f24c8f into main Jan 29, 2026
3 checks passed

darthmolen deleted the feature/agent_practice_with_site_language branch January 29, 2026 02:35

darthmolen merged 13 commits into main from feature/agent_practice_with_site_language


2 participants
