fix: fix Gemini runner — Vertex auth, system prompt, custom commands, feedback MCP#803
Gkrumbach07 merged 10 commits into main
…nd feedback MCP server
## Fixes
- Map ANTHROPIC_VERTEX_PROJECT_ID/CLOUD_ML_REGION to GOOGLE_CLOUD_PROJECT/GOOGLE_CLOUD_LOCATION
in subprocess env so Gemini CLI can connect to Vertex AI
- Validate GOOGLE_APPLICATION_CREDENTIALS file existence at startup (fail fast with clear error)
- Wire add_dirs from resolve_workspace_paths into include_directories (was silently dropped)
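The first two fixes can be illustrated with a minimal sketch. The helper names `build_gemini_env` and `validate_credentials_file` are hypothetical (the PR names only `validate_vertex_credentials_file()`), and the rule that an already-set target variable takes precedence is an assumption, not something the PR states.

```python
from pathlib import Path

# Source variable -> target variable, as listed in the fix description.
VERTEX_ENV_MAP = {
    "ANTHROPIC_VERTEX_PROJECT_ID": "GOOGLE_CLOUD_PROJECT",
    "CLOUD_ML_REGION": "GOOGLE_CLOUD_LOCATION",
}


def build_gemini_env(base_env):
    """Return a copy of base_env with the Vertex variables mapped (hypothetical helper)."""
    env = dict(base_env)
    for source, target in VERTEX_ENV_MAP.items():
        value = env.get(source)
        # Assumption: an explicitly set target wins over the mapped value.
        if value and not env.get(target):
            env[target] = value
    return env


def validate_credentials_file(env):
    """Fail fast if GOOGLE_APPLICATION_CREDENTIALS is unset or does not point to a file."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not Path(path).is_file():
        raise RuntimeError(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
        )
    return path
```

Running the validation once at startup, before the subprocess is spawned, is what turns a confusing downstream auth failure into a clear early error.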
## New Gemini features
- System prompt: write .gemini/system.md using ${SubAgents}/${AgentSkills}/${AvailableTools}
variable substitutions to preserve Gemini's default instructions, then append platform context
(workspace paths, repos, git push instructions, workflow, rubric/correction hints)
- Custom commands: /ambient:evaluate-rubric and /ambient:log-correction written to
.gemini/commands/ambient/ at session setup
- Feedback MCP server: minimal stdlib stdio MCP server exposing evaluate_rubric and
log_correction tools backed by existing Langfuse logging; injected into .gemini/settings.json
with Langfuse credentials in env block (bypasses Gemini CLI blocklist)
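A rough sketch of how an MCP server entry might be merged into `.gemini/settings.json` with an `env` block. The function name, the server name `ambient-feedback`, and the `python3` launch command are illustrative assumptions; only the file path, the `env` block, and the stdio server come from the description above.

```python
import json
from pathlib import Path


def register_feedback_server(workspace, server_script, langfuse_env):
    """Merge a stdio MCP server entry into <workspace>/.gemini/settings.json."""
    settings_path = Path(workspace) / ".gemini" / "settings.json"
    settings_path.parent.mkdir(parents=True, exist_ok=True)

    # Preserve any settings that already exist in the file.
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())

    servers = settings.setdefault("mcpServers", {})
    servers["ambient-feedback"] = {  # hypothetical server name
        "command": "python3",        # assumed interpreter
        "args": [str(server_script)],
        # Values placed here are passed to the server process directly,
        # but they are also stored in plaintext in the settings file.
        "env": dict(langfuse_env),
    }
    settings_path.write_text(json.dumps(settings, indent=2))
    return settings_path
```

Because the `env` values are written to disk as-is, any secret placed there is readable by whatever can read the workspace.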
## Platform refactoring
- Extract validate_vertex_credentials_file() to platform/auth.py (shared by both bridges)
- Extract set_context(), _ensure_ready(), _setup_platform(), _refresh_credentials_if_stale()
to PlatformBridge base class — bridges now only contain framework-specific code
- Extract _async_safe_manager_shutdown() to bridge.py as shared helper for mark_dirty()
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replaces mutable :latest tags with immutable SHA digests so operator pods always pull a known-good image regardless of pull policy. :latest with IfNotPresent is a known source of stale-image bugs (as seen with the Gemini hydrate.sh fix earlier in this branch).

Digests pinned to current main build:
- vteam_claude_runner@sha256:5363f4bb...
- vteam_state_sync@sha256:4541a831...

CI should update these digests in the ConfigMap after each successful image build on main.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 8a60755.
After building and pushing runner/state-sync images, patch the ambient-agent-registry ConfigMap with the same SHA tag used for the operator env vars. This ensures new sessions always use the image that was just built rather than whatever :latest happened to be cached on the node.

- On push to main: uses github.sha tag (or 'stage' if image unchanged)
- On workflow_dispatch: uses 'stage' tag

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
:stage is a mutable tag, so nodes with imagePullPolicy: IfNotPresent won't re-pull it even when a new image is pushed. Use the immutable git SHA tag instead so the operator always spawns pods with the exact image built in that CI run.

Also replace the :stage fallback (for unchanged images) with :latest, which is semantically correct: if an image wasn't rebuilt, the node already has the right content and doesn't need to pull.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Mirrors the same fix from components-build-deploy.yml: patch the ambient-agent-registry ConfigMap with the release version tag so new sessions use the exact release image rather than whatever :latest is cached on the node.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…th :latest

Previous approach fetched the ConfigMap from the repo file and applied the full thing, which reset unchanged images back to :latest.

New approach:
1. Fetch the live JSON from the cluster (which has the last good SHA)
2. Only replace images that were actually rebuilt in this run using a greedy sed pattern ([^"]*) that matches any current tag/digest
3. oc patch just the data key; unchanged images keep their current tag

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Use jq -Rs . instead of python3 for JSON escaping (simpler, already on ubuntu-latest, no extra process)
- Fix sed pattern [@:][^"]* to handle both :tag and @sha256:digest refs (previous [^"]* only matched after : so digest refs were skipped)
- Fix dispatch job operator env vars to use github.sha not :stage (ConfigMap used sha but operator still pointed to mutable :stage tag)
- Add concurrency: group to prod-release-deploy.yaml to prevent parallel releases racing on the ConfigMap read-modify-write

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
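To illustrate why the `[@:][^"]*` pattern covers both reference styles, here is a small Python equivalent of the substitution. The workflow itself uses sed; the function name `pin_image` and the `quay.io/example/...` registry path are made up for the example.

```python
import re


def pin_image(config_text, image, new_ref):
    """Replace the tag or digest of `image` inside JSON text.

    `new_ref` includes its separator, e.g. ":abc123" or "@sha256:...".
    Only the named image is rewritten; other images keep their current ref.
    """
    # [@:] matches the separator of either a :tag or an @sha256:digest ref,
    # and [^"]* consumes the rest of the ref up to the closing JSON quote.
    pattern = re.compile(re.escape(image) + r'[@:][^"]*')
    return pattern.sub(lambda match: image + new_ref, config_text)


registry_json = (
    '{"runner": "quay.io/example/vteam_claude_runner@sha256:5363f4bb", '
    '"sync": "quay.io/example/vteam_state_sync:latest"}'
)
updated = pin_image(registry_json, "quay.io/example/vteam_claude_runner", ":abc123")
```

Requiring the separator immediately after the image name also keeps the pattern from touching a different image whose name merely starts with the same prefix.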
Base class methods (_ensure_ready, _refresh_credentials_if_stale, set_context) were accessing self._context, self._ready, and self._last_creds_refresh via # type: ignore[attr-defined] because those attributes were only declared in subclasses. Any future bridge that forgot to initialise them would crash with AttributeError.

- Add PlatformBridge.__init__ declaring the three shared attributes with correct types
- Both bridges call super().__init__() and drop their own redundant declarations of the same three attrs
- Remove all seven # type: ignore[attr-defined] annotations

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
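The shape of this change can be sketched roughly as below. The attribute and method names come from the commit message; the attribute types, the synchronous method bodies, and the `setup_calls` counter are simplifying assumptions for illustration, and the real methods may well be async.

```python
from typing import Any, Optional


class PlatformBridge:
    """Base class that owns the state shared by every bridge."""

    def __init__(self) -> None:
        # Declared here so base-class methods can rely on them without
        # "# type: ignore[attr-defined]". The types are assumptions.
        self._context: Optional[Any] = None
        self._ready: bool = False
        self._last_creds_refresh: float = 0.0

    def set_context(self, context: Any) -> None:
        self._context = context

    def _ensure_ready(self) -> None:
        # Lazy one-time setup: later calls return immediately.
        if self._ready:
            return
        if self._context is None:
            raise RuntimeError("set_context() must be called first")
        self._setup_platform()
        self._ready = True

    def _setup_platform(self) -> None:
        raise NotImplementedError


class GeminiBridge(PlatformBridge):
    """Subclass keeps only framework-specific state."""

    def __init__(self) -> None:
        super().__init__()  # initialises the three shared attributes
        self.setup_calls = 0  # illustrative counter, not in the PR

    def _setup_platform(self) -> None:
        self.setup_calls += 1
```

A new bridge that calls `super().__init__()` gets the shared attributes for free, so forgetting to declare them can no longer cause an `AttributeError` in the base-class methods.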
… exception

The Gemini feedback_server was importing private functions (_log_to_langfuse, _log_correction_to_langfuse, _get_session_context) directly from bridges/claude/, creating an illegal cross-bridge dependency on internals.

- Create platform/feedback.py with public API: log_rubric_score(), log_correction(), get_session_context(), backed by the existing Claude bridge implementations but accessible to any bridge
- feedback_server.py now imports only from platform.feedback (no bridges.claude.* imports remain)
- Session context caching moved to platform/feedback.py so it's shared regardless of which caller uses get_session_context()

Also fix silent exception in system_prompt.py: bare `except Exception: pass` replaced with `logger.warning(...)` so filesystem errors are visible.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
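The silent-exception fix follows a common pattern, sketched here under assumptions: `write_system_prompt` is a hypothetical function, and the actual code in system_prompt.py may guard a different filesystem operation.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def write_system_prompt(path, text):
    """Write the prompt file; return True on success, False after logging a warning."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
        return True
    except Exception as exc:
        # Previously this would have been "except Exception: pass",
        # which hid filesystem errors entirely.
        logger.warning("Failed to write system prompt to %s: %s", path, exc)
        return False
```

The failure stays non-fatal, as before, but it now leaves a trace in the logs.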
## Claude Code Review

### Summary

PR #803 fixes Gemini CLI Vertex auth, adds system prompt support, custom slash commands, and a feedback MCP server. It also includes a solid refactor moving shared bridge code (credential refresh, lazy setup, async shutdown helper) into the `PlatformBridge` base class. The changes are well-structured and the logic is correct. Two issues stand out: an inverted dependency in the new platform layer and Langfuse credentials written to a file accessible to the agent.

### Issues by Severity

#### Blocker Issues

None.

#### Critical Issues

None.

#### Major Issues

1. Inverted dependency: `platform/feedback.py` imports private symbols from a bridge-specific module
2. `LANGFUSE_SECRET_KEY` written in plaintext to workspace-accessible `.gemini/settings.json`

#### Minor Issues

3. Return type annotation mismatch on `setup_gemini_mcp`
4. Deferred import inconsistency in `_build_system_prompt`
5. `deploy-with-dispatch` always pins both images to `github.sha`

### Positive Highlights

### Recommendations
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.
## Summary

- Vertex auth: map env vars (`ANTHROPIC_VERTEX_PROJECT_ID` → `GOOGLE_CLOUD_PROJECT`, `CLOUD_ML_REGION` → `GOOGLE_CLOUD_LOCATION`) in Gemini CLI subprocess env; validate credentials file exists at startup with a clear error
- System prompt: write `.gemini/system.md` using `${SubAgents}`/`${AgentSkills}`/`${AvailableTools}` variable substitutions to preserve Gemini's default instructions, then append platform context (workspace paths, repos, git, workflow, rubric/correction command hints)
- Custom commands: `/ambient:evaluate-rubric` and `/ambient:log-correction` written to `.gemini/commands/ambient/` at session setup
- Feedback MCP server: minimal stdlib stdio MCP server (`feedback_server.py`) exposing `evaluate_rubric` and `log_correction` tools backed by existing Langfuse logging; registered in `.gemini/settings.json` with Langfuse credentials injected via `env` block (bypasses the Gemini CLI subprocess env blocklist)
- `add_dirs` bug fixed: `resolve_workspace_paths` return value was silently dropped; now seeded into `include_directories`

## Platform refactoring

Moved shared code out of individual bridges into the base layer:

- `platform/auth.py`: `validate_vertex_credentials_file()`, shared Vertex credential validation used by both Claude and Gemini auth modules
- `bridge.py` (base class): `set_context()`, `_ensure_ready()`, `_setup_platform()`, `_refresh_credentials_if_stale()`, identical in both bridges and now inherited; bridges only contain framework-specific code
- `bridge.py`: `_async_safe_manager_shutdown()`, the async-safe fire-and-forget manager shutdown pattern shared by both `mark_dirty()` implementations

## Test plan

- `/ambient:evaluate-rubric` and `/ambient:log-correction` commands appear in Gemini CLI
- `/ambient:log-correction`: should log to Langfuse without "credentials missing" error

🤖 Generated with Claude Code