fix: fix Gemini runner — Vertex auth, system prompt, custom commands, feedback MCP by Gkrumbach07 · Pull Request #803 · ambient-code/platform

Gkrumbach07 · 2026-03-04T20:24:19Z

Summary

Vertex AI auth fixed: map platform env vars (ANTHROPIC_VERTEX_PROJECT_ID → GOOGLE_CLOUD_PROJECT, CLOUD_ML_REGION → GOOGLE_CLOUD_LOCATION) in Gemini CLI subprocess env; validate credentials file exists at startup with a clear error
System prompt: write .gemini/system.md using ${SubAgents}/${AgentSkills}/${AvailableTools} variable substitutions to preserve Gemini's default instructions, then append platform context (workspace paths, repos, git, workflow, rubric/correction command hints)
Custom commands: /ambient:evaluate-rubric and /ambient:log-correction written to .gemini/commands/ambient/ at session setup
Feedback MCP server: minimal stdlib stdio MCP server (feedback_server.py) exposing evaluate_rubric and log_correction tools backed by existing Langfuse logging; registered in .gemini/settings.json with Langfuse credentials injected via env block (bypasses the Gemini CLI subprocess env blocklist)
add_dirs bug fixed: resolve_workspace_paths return value was silently dropped; now seeded into include_directories

Platform refactoring

Moved shared code out of individual bridges into the base layer:

platform/auth.py: validate_vertex_credentials_file() — shared Vertex credential validation used by both Claude and Gemini auth modules
bridge.py (base class): set_context(), _ensure_ready(), _setup_platform(), _refresh_credentials_if_stale() — identical in both bridges, now inherited; bridges only contain framework-specific code
bridge.py: _async_safe_manager_shutdown() — the async-safe fire-and-forget manager shutdown pattern shared by both mark_dirty() implementations

Test plan

Create a Gemini session with Vertex AI enabled — should connect and respond
Verify /ambient:evaluate-rubric and /ambient:log-correction commands appear in Gemini CLI
Run /ambient:log-correction — should log to Langfuse without "credentials missing" error
Verify Claude sessions still work (no regression from base class refactor)

🤖 Generated with Claude Code

…nd feedback MCP server ## Fixes - Map ANTHROPIC_VERTEX_PROJECT_ID/CLOUD_ML_REGION to GOOGLE_CLOUD_PROJECT/GOOGLE_CLOUD_LOCATION in subprocess env so Gemini CLI can connect to Vertex AI - Validate GOOGLE_APPLICATION_CREDENTIALS file existence at startup (fail fast with clear error) - Wire add_dirs from resolve_workspace_paths into include_directories (was silently dropped) ## New Gemini features - System prompt: write .gemini/system.md using ${SubAgents}/${AgentSkills}/${AvailableTools} variable substitutions to preserve Gemini's default instructions, then append platform context (workspace paths, repos, git push instructions, workflow, rubric/correction hints) - Custom commands: /ambient:evaluate-rubric and /ambient:log-correction written to .gemini/commands/ambient/ at session setup - Feedback MCP server: minimal stdlib stdio MCP server exposing evaluate_rubric and log_correction tools backed by existing Langfuse logging; injected into .gemini/settings.json with Langfuse credentials in env block (bypasses Gemini CLI blocklist) ## Platform refactoring - Extract validate_vertex_credentials_file() to platform/auth.py (shared by both bridges) - Extract set_context(), _ensure_ready(), _setup_platform(), _refresh_credentials_if_stale() to PlatformBridge base class — bridges now only contain framework-specific code - Extract _async_safe_manager_shutdown() to bridge.py as shared helper for mark_dirty() Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Replaces mutable :latest tags with immutable SHA digests so operator pods always pull a known-good image regardless of pull policy. :latest with IfNotPresent is a known source of stale-image bugs (as seen with the Gemini hydrate.sh fix earlier in this branch). Digests pinned to current main build: - vteam_claude_runner@sha256:5363f4bb... - vteam_state_sync@sha256:4541a831... CI should update these digests in the ConfigMap after each successful image build on main. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

This reverts commit 8a60755.

After building and pushing runner/state-sync images, patch the ambient-agent-registry ConfigMap with the same SHA tag used for the operator env vars. This ensures new sessions always use the image that was just built rather than whatever :latest happened to be cached on the node. - On push to main: uses github.sha tag (or 'stage' if image unchanged) - On workflow_dispatch: uses 'stage' tag Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

:stage is a mutable tag — nodes with imagePullPolicy: IfNotPresent won't re-pull it even when a new image is pushed. Use the immutable git SHA tag instead so the operator always spawns pods with the exact image built in that CI run. Also replace the :stage fallback (for unchanged images) with :latest, which is semantically correct: if an image wasn't rebuilt, the node already has the right content and doesn't need to pull. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Mirrors the same fix from components-build-deploy.yml — patch the ambient-agent-registry ConfigMap with the release version tag so new sessions use the exact release image rather than whatever :latest is cached on the node. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…th :latest Previous approach fetched the ConfigMap from the repo file and applied the full thing, which reset unchanged images back to :latest. New approach: 1. Fetch the live JSON from the cluster (which has the last good SHA) 2. Only replace images that were actually rebuilt in this run using a greedy sed pattern ([^"]*) that matches any current tag/digest 3. oc patch just the data key — unchanged images keep their current tag Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@sha256

- Use jq -Rs . instead of python3 for JSON escaping (simpler, already on ubuntu-latest, no extra process) - Fix sed pattern [@:][^"]* to handle both :tag and @sha256:digest refs (previous [^"]* only matched after : so digest refs were skipped) - Fix dispatch job operator env vars to use github.sha not :stage (ConfigMap used sha but operator still pointed to mutable :stage tag) - Add concurrency: group to prod-release-deploy.yaml to prevent parallel releases racing on the ConfigMap read-modify-write Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Base class methods (_ensure_ready, _refresh_credentials_if_stale, set_context) were accessing self._context, self._ready, and self._last_creds_refresh via # type: ignore[attr-defined] because those attributes were only declared in subclasses. Any future bridge that forgot to initialise them would crash with AttributeError. - Add PlatformBridge.__init__ declaring the three shared attributes with correct types - Both bridges call super().__init__() and drop their own redundant declarations of the same three attrs - Remove all seven # type: ignore[attr-defined] annotations Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

… exception The Gemini feedback_server was importing private functions (_log_to_langfuse, _log_correction_to_langfuse, _get_session_context) directly from bridges/claude/, creating an illegal cross-bridge dependency on internals. - Create platform/feedback.py with public API: log_rubric_score(), log_correction(), get_session_context() — backed by the existing Claude bridge implementations but accessible to any bridge - feedback_server.py now imports only from platform.feedback (no bridges.claude.* imports remain) - Session context caching moved to platform/feedback.py so it's shared regardless of which caller uses get_session_context() Also fix silent exception in system_prompt.py: bare `except Exception: pass` replaced with `logger.warning(...)` so filesystem errors are visible. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-04T21:15:11Z

Claude Code Review

Summary

PR #803 fixes Gemini CLI Vertex auth, adds system prompt support, custom slash commands, and a feedback MCP server. It also includes a solid refactor moving shared bridge code (credential refresh, lazy setup, async shutdown helper) into the `PlatformBridge` base class. The changes are well-structured and the logic is correct. Two issues stand out: an inverted dependency in the new platform layer and Langfuse credentials written to a file accessible to the agent.

Issues by Severity

Blocker Issues

None.

Critical Issues

None.

Major Issues

1. Inverted dependency: `platform/feedback.py` imports private symbols from a bridge-specific module

File: `components/runners/ambient-runner/ambient_runner/platform/feedback.py:747-755`
What: The platform layer (`platform/`) imports private implementation functions from `bridges/claude/`:
```python
from ambient_runner.bridges.claude.corrections import (
_get_session_context, # private
_log_correction_to_langfuse, # private
...
)
from ambient_runner.bridges.claude.tools import _log_to_langfuse # private
```
Which standard: Architecture violation — the platform layer should not depend on bridge-specific modules. `platform/` is the shared base that bridges build on top of; the dependency arrow should never point the other way.
Impact: Any non-Claude bridge that uses `platform/feedback.py` silently pulls in Claude-specific code. Adding a third bridge without Claude installed would break feedback logging entirely.
Fix: Move `_get_session_context`, `_log_correction_to_langfuse`, `_log_to_langfuse`, `CORRECTION_SOURCES`, and `CORRECTION_TYPES` into `platform/` (e.g. `platform/corrections.py`). Have the Claude bridge import from there instead of the reverse.

2. `LANGFUSE_SECRET_KEY` written in plaintext to workspace-accessible `.gemini/settings.json`

File: `components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/mcp.py:364-396`
What: `_build_feedback_server_entry()` injects `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` into the MCP server `env` block, which lands in `{cwd_path}/.gemini/settings.json`. The Gemini CLI agent has file-reading tools and can access the entire workspace including this file.
Which standard: Security standards — minimal credential exposure; the Gemini CLI's own subprocess env blocklist exists specifically to prevent the agent from seeing `LANGFUSE_*` keys. This workaround circumvents that intent by writing them to a file the agent can read.
Impact: An adversarial or prompt-injected interaction could read `.gemini/settings.json` and exfiltrate `LANGFUSE_SECRET_KEY`.
Fix (short-term): After `write_gemini_settings` returns, set `chmod 0o600` on the settings file:
```python
settings_path = write_gemini_settings(cwd_path, settings)
try:
Path(settings_path).chmod(0o600)
except OSError:
pass
```
Fix (long-term): Move the credentials file outside the agent-accessible workspace (e.g. `/run/secrets/ambient-feedback/`) and have the feedback server read from there, or inject only `LANGFUSE_ENABLED` into the settings env block and pass secrets via a separate mechanism.

Minor Issues

3. Return type annotation mismatch on `setup_gemini_mcp`

File: `components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/mcp.py` (function signature)
What: The function now always registers `ambient-feedback` and calls `write_gemini_settings`, so it always returns a non-`None` path. The annotation `-> Optional[str]` is misleading.
Fix: Change to `-> str`.

4. Deferred import inconsistency in `_build_system_prompt`

File: `components/runners/ambient-runner/ambient_runner/bridges/gemini_cli/system_prompt.py:688`
What: `from ambient_runner.observability import is_langfuse_enabled` is a lazy import at the bottom of the function while all other platform imports in the same function are grouped at the top.
Fix: Hoist it to the function-level import block alongside the other lazy imports.

5. `deploy-with-dispatch` always pins both images to `github.sha`

File: `.github/workflows/components-build-deploy.yml` (new ConfigMap step in `deploy-with-dispatch`)
What: Unlike the conditional step earlier in the workflow (which guards each sed substitution with `detect-changes` output checks), the new ConfigMap patch in `deploy-with-dispatch` always replaces both `vteam_claude_runner` and `vteam_state_sync` with the current SHA — even if only one was actually rebuilt. This mirrors the pre-existing `oc set env` pattern above it.
Fix: Either guard each sed substitution with the same `detect-changes` conditionals, or add a comment documenting why unconditional patching is acceptable for the dispatch flow.

Positive Highlights

Excellent base-class refactor: Consolidating `set_context`, `_ensure_ready`, `_setup_platform`, `_refresh_credentials_if_stale`, and shared state into `PlatformBridge.init` eliminates ~80 lines of duplicated code across two bridges and gives any future bridge a correct lifecycle for free.
`_async_safe_manager_shutdown` consolidation: Single, well-documented implementation replacing identical copy-paste in both `mark_dirty()` methods. Using `create_task` + done-callback is the correct pattern for the within-loop case.
`validate_vertex_credentials_file` consolidation: Consistent, early credential validation with the same error messages shared between Claude and Gemini auth modules — good for operators debugging misconfigured credentials.
Env-var fallback in `session.py` is safe: The `if not env.get("GOOGLE_CLOUD_PROJECT")` guard correctly respects explicitly-set values and only fills in the alias when the var is absent.
CI concurrency guard on prod deployment: `cancel-in-progress: false` in `prod-release-deploy.yaml` is the right setting — prod deployments must never be cancelled mid-flight.
MCP server correctly routes logs to `stderr`: `logging.basicConfig(stream=sys.stderr)` in `feedback_server.py` is easy to get wrong and correctly done here since `stdout` is the JSON-RPC channel.

Recommendations

(Major — before merge) Refactor `platform/feedback.py` to eliminate the reverse-dependency on `bridges/claude/`. Move shared correction/logging internals to `platform/corrections.py` and have the Claude bridge import from there.
(Major — before merge or tracked as follow-up) Address the Langfuse credentials in `.gemini/settings.json`. At minimum, `chmod 0o600` the file immediately after writing it.
(Minor) Fix the `Optional[str]` return type on `setup_gemini_mcp`.
(Minor) Hoist the `is_langfuse_enabled` import to the function-level import block in `_build_system_prompt`.
(Minor) Document or conditionally guard the unconditional image pinning in the `deploy-with-dispatch` ConfigMap step.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

CLAUDE.md - Master project instructions, development standards
backend-development.md - Go backend, K8s integration patterns
frontend-development.md - NextJS, Shadcn UI, React Query patterns
security-standards.md - Auth, RBAC, token handling
k8s-client-usage.md - User token vs service account patterns
error-handling.md - Consistent error patterns
react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Gkrumbach07 and others added 2 commits March 4, 2026 14:23

This comment has been minimized.

Sign in to view

Gkrumbach07 and others added 2 commits March 4, 2026 14:35

Revert "fix: pin agent registry images to digest instead of :latest"

64f86e3

This reverts commit 8a60755.

This comment has been minimized.

Sign in to view

Gkrumbach07 merged commit 047314a into main Mar 4, 2026
13 checks passed

Gkrumbach07 deleted the bug/fix-gemini-runner branch March 4, 2026 21:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: fix Gemini runner — Vertex auth, system prompt, custom commands, feedback MCP#803

fix: fix Gemini runner — Vertex auth, system prompt, custom commands, feedback MCP#803
Gkrumbach07 merged 10 commits intomainfrom
bug/fix-gemini-runner

Gkrumbach07 commented Mar 4, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Mar 4, 2026 •

edited

Loading

What Amber Loaded for Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Gkrumbach07 commented Mar 4, 2026

Summary

Platform refactoring

Test plan

Uh oh!

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Summary

Issues by Severity

Blocker Issues

Critical Issues

Major Issues

Minor Issues

Positive Highlights

Recommendations

What Amber Loaded for Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Mar 4, 2026 •

edited

Loading