Skip to content

fix(backend,runner): track GitHub App token expiry for proactive refresh#889

Merged
Gkrumbach07 merged 4 commits intomainfrom
fix/github-token-refresh-52858
Mar 13, 2026
Merged

fix(backend,runner): track GitHub App token expiry for proactive refresh#889
Gkrumbach07 merged 4 commits intomainfrom
fix/github-token-refresh-52858

Conversation

@Gkrumbach07
Copy link
Contributor

@Gkrumbach07 Gkrumbach07 commented Mar 12, 2026

Summary

Fixes RHOAIENG-52858: GitHub App installation tokens expire after ~1 hour, causing silent auth failures in long-running sessions.

Changes

Backend (components/backend/)

  • Updated GetGitHubToken() in git/operations.go to return (string, time.Time, error), propagating the expiresAt from MintInstallationTokenForHost
  • Updated GetGitHubTokenForSession handler to include expiresAt in the JSON response when non-zero
  • Updated WrapGitHubTokenForRepo wrapper to accept the new 3-return signature and discard expiresAt

Runner (components/runners/ambient-runner/)

  • Added _credential_expiry tracking dict and _EXPIRY_BUFFER_SEC (5 min) constant in auth.py
  • fetch_github_credentials() now parses expiresAt from backend response and stores expiry timestamp
  • Added github_token_expiring_soon() helper for proactive refresh checks
  • _refresh_credentials_if_stale() in bridge.py now also triggers refresh when GitHub token is expiring soon

How it works

  1. Backend returns expiresAt (RFC 3339) for GitHub App tokens in the credential response
  2. Runner tracks the expiry timestamp in _credential_expiry["github"]
  3. Before each run(), the bridge checks both the 60s refresh interval AND token expiry (with 5-min buffer)
  4. If the token will expire within 5 minutes, a proactive refresh is triggered regardless of the interval timer

PAT tokens (which don't expire) are unaffected — they return no expiresAt and skip expiry tracking.

Test plan

  • Go build passes (cd components/backend && go build ./...)
  • Go lint passes (gofmt, go vet)
  • Python lint passes (ruff check, ruff format)
  • Python tests pass (14/14 bridge registry tests)
  • Integration test: verify GitHub App token refresh during a long session (>1hr)

Fixes: RHOAIENG-52858

Jira: RHOAIENG-52858

GitHub App installation tokens expire after ~1 hour but the runner
had no expiry tracking. The backend now returns expiresAt in the
credential response, and the runner proactively refreshes tokens
before they expire (5 min buffer).

Backend changes:
- GetGitHubToken returns (token, expiresAt, error) instead of (token, error)
- runtime_credentials.go includes expiresAt in GitHub credential response
- WrapGitHubTokenForRepo updated for new signature

Runner changes:
- auth.py tracks GitHub token expiry from backend response
- bridge.py checks token expiry in _refresh_credentials_if_stale()

Fixes: RHOAIENG-52858

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link

coderabbitai bot commented Mar 12, 2026

Warning

Rate limit exceeded

@Gkrumbach07 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 34 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 632e15f2-4138-458b-b75a-c47171f8b408

📥 Commits

Reviewing files that changed from the base of the PR and between 5584cff and f421fc4.

📒 Files selected for processing (2)
  • components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py
  • components/runners/ambient-runner/ambient_runner/bridges/claude/tools.py

Walkthrough

GetGitHubToken now returns (string, time.Time, error); callers and wrappers updated to handle/discard the expiry. The runner stores and checks token expiry to trigger credential refreshes. TokenManager cache was removed. Claude bridge tools now pass a bridge reference to resolve adapters at call time.

Changes

Cohort / File(s) Summary
Backend Git Operations
components/backend/git/operations.go
GetGitHubToken signature changed to return (string, time.Time, error); all internal return paths updated to supply expiry (zero or real).
Backend Handler Wrappers & Calls
components/backend/handlers/github_auth.go, components/backend/handlers/runtime_credentials.go, components/backend/handlers/operations_test.go
Wrappers and call sites updated to accept the extra time.Time return; wrappers typically discard expiry; GetGitHubTokenForSession conditionally includes expiresAt in JSON responses; tests updated to ignore extra value.
Token manager: caching removed
components/backend/github/token.go
Removed TokenManager caching fields and cachedInstallationToken type; eliminated cache lookups/sets in MintInstallationTokenForHost.
Runner: credential expiry tracking & refresh
components/runners/ambient-runner/ambient_runner/platform/auth.py, components/runners/ambient-runner/ambient_runner/bridge.py
Added expiry storage, expiry buffer constant, and github_token_expiring_soon() helper; runner refresh logic now triggers on interval or imminent token expiry and updates last-refresh timestamp; minor callback logging guarded.
Claude bridge: bridge ref propagation
components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py, components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py, components/runners/ambient-runner/ambient_runner/bridges/claude/tools.py
build_mcp_servers gains optional bridge_ref; callers pass bridge_ref=self; create_restart_session_tool now accepts bridge_ref, resolves adapter at call time, and returns a user-facing error if adapter unavailable.
Misc formatting / minor updates
.github/workflows/e2e.yml, components/ambient-api-server/secrets/db.port, components/ambient-cli/.../cmd.go, components/runners/.../*
Various non-functional formatting, docstring, test whitespace, and import reflow changes across tests, scripts, and bridge modules; no logic changes beyond those above.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as HTTP Handler
    participant GitOps as Git Operations
    participant KubeSecret as Kubernetes Secret
    participant Runner as Ambient Runner
    participant Auth as Auth Module

    rect rgba(100,150,200,0.5)
    Note over Client,Handler: Token retrieval with expiry
    Client->>Handler: Request GitHub credentials
    Handler->>GitOps: GetGitHubToken(ctx,...)
    GitOps->>KubeSecret: Fetch token & metadata
    KubeSecret-->>GitOps: Return token [+ expiresAt]
    GitOps-->>Handler: (token, expiresAt, error)
    Handler->>Client: Response includes expiresAt if non-zero
    end

    rect rgba(150,100,200,0.5)
    Note over Runner,Auth: Runner refresh decision
    Runner->>Auth: github_token_expiring_soon()?
    alt expiring soon
        Auth-->>Runner: true
    else not expiring
        Runner->>Runner: check interval since last refresh
    end

    opt Refresh needed
        Runner->>Handler: Fetch fresh credentials
        Handler->>GitOps: GetGitHubToken(...)
        GitOps-->>Handler: (token, expiresAt, error)
        Handler-->>Runner: credentials with expiresAt
        Runner->>Auth: update cached token & expiry
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: tracking GitHub App token expiry for proactive refresh, which is the core objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing backend and runner changes, the implementation approach, and test status for the token expiry tracking feature.
Docstring Coverage ✅ Passed Docstring coverage is 82.86% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/github-token-refresh-52858
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/runners/ambient-runner/ambient_runner/bridge.py`:
- Around line 58-65: The done-callback for the shutdown task calls f.exception()
even when the Future was cancelled, which raises CancelledError; update the
callback attached in task.add_done_callback (the lambda receiving f) to first
check f.cancelled() (or use f.done() and not f.cancelled()) and only call
f.exception() when not cancelled, or wrap the call in a try/except and log
exceptions only when present and not a cancellation; ensure the callback uses
_bridge_logger.warning with the exception details only for real exceptions and
ignores cancelled tasks.

In `@components/runners/ambient-runner/ambient_runner/platform/auth.py`:
- Around line 146-152: The current datetime.fromisoformat call in the GitHub
token expiry block fails on RFC3339 strings that end with a trailing "Z" (Python
3.10); update the parsing in the block that sets _credential_expiry["github"] so
it normalizes a trailing "Z" to "+00:00" (or otherwise convert "Z" to an
RFC3339-compatible offset) before calling datetime.fromisoformat, then set
_credential_expiry["github"] = exp_dt.timestamp() and keep the existing
logger.info/logger.warning behavior (keep catching ValueError/TypeError on parse
and logging the failure with the exception).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ca148f7-fb64-4add-8f47-49da84e11051

📥 Commits

Reviewing files that changed from the base of the PR and between 538ccbd and fcfa1dd.

📒 Files selected for processing (6)
  • components/backend/git/operations.go
  • components/backend/handlers/github_auth.go
  • components/backend/handlers/operations_test.go
  • components/backend/handlers/runtime_credentials.go
  • components/runners/ambient-runner/ambient_runner/bridge.py
  • components/runners/ambient-runner/ambient_runner/platform/auth.py

Remove the 3-minute GitHub App token cache so refresh_credentials
always mints fresh tokens. Fix restart_session tool crash by passing
bridge instance instead of None for deferred adapter access.

Fixes: RHOAIENG-52858

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py`:
- Around line 33-38: The build_mcp_servers function currently accepts
bridge_ref: Any = None but restart_session (registered inside build_mcp_servers)
requires a non-null bridge_ref, so callers can silently register a broken tool;
fix by making bridge_ref required or fail fast: either change the signature of
build_mcp_servers to require bridge_ref as a keyword-only parameter (remove the
None default) or add an immediate check at the start of build_mcp_servers that
raises a clear ValueError if bridge_ref is None; reference the build_mcp_servers
function and the restart_session registration to locate where to apply this
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5bcc647-799d-4c71-baca-9e75d5f4fe75

📥 Commits

Reviewing files that changed from the base of the PR and between fcfa1dd and a1c8652.

📒 Files selected for processing (4)
  • components/backend/github/token.go
  • components/runners/ambient-runner/ambient_runner/bridges/claude/bridge.py
  • components/runners/ambient-runner/ambient_runner/bridges/claude/mcp.py
  • components/runners/ambient-runner/ambient_runner/bridges/claude/tools.py
💤 Files with no reviewable changes (1)
  • components/backend/github/token.go

- Fix CancelledError when calling exception() on cancelled futures in shutdown callback
- Fix Python 3.10 RFC3339 parsing by replacing Z with +00:00 timezone offset
- Clear stale GitHub token expiry on parse failures
- Auto-format code per pre-commit hooks

Co-Authored-By: Claude <noreply@anthropic.com>
@ambient-code
Copy link
Contributor

ambient-code bot commented Mar 13, 2026

🤖 PR Fixer Report

Summary

Addressed CodeRabbit review feedback by fixing two critical bugs affecting Python 3.10 compatibility and asyncio exception handling.

What Was Fixed

1. CancelledError in asyncio shutdown callback (bridge.py:58-65)

  • Issue: Calling f.exception() on a cancelled Future raises CancelledError
  • Fix: Added f.cancelled() check before calling exception()
  • Impact: Prevents unhandled exceptions in the event loop during session shutdown
  • Bonus: Eliminated redundant exception() calls (was called twice in the ternary expression)

2. Python 3.10 RFC3339 timestamp parsing (auth.py:148)

  • Issue: datetime.fromisoformat() doesn't support "Z" suffix until Python 3.11
  • Fix: Added .replace("Z", "+00:00") to normalize timestamps
  • Impact: GitHub token expiry tracking now works on Python 3.10+
  • Bonus: Added _credential_expiry.pop("github", None) in exception handler to prevent stale expiry tracking

What Was Skipped

3. Making bridge_ref required in mcp.py (mcp.py:38)

  • Reason: The tool already handles None gracefully by returning an error response ("Session not ready — cannot restart yet")
  • Assessment: Defensive coding suggestion, not a bug fix
  • Decision: Skipped per PR fixer guidelines (fix real issues, skip style preferences)

Code Review

Ran /simplify to review changes:

  • Code reuse: Acceptable patterns, no unnecessary duplication
  • Code quality: Both changes improve quality and fix real bugs
  • Efficiency: No performance regressions; bridge.py actually improves performance

CI Status

  • ✅ All 31 checks passing
  • ✅ All linters passing (pre-commit hooks auto-fixed formatting)
  • ✅ PR is mergeable

Commits Pushed

  • 5584cff - fix: address CodeRabbit review feedback

All reviewer feedback has been addressed. The PR is ready for merge.

The restart_session tool crashed the CLI with exit code 1 because
setting _restart_requested on the adapter mid-stream caused an
unrecoverable error. Remove the tool entirely until a proper
deferred restart mechanism is implemented.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Gkrumbach07 Gkrumbach07 enabled auto-merge (squash) March 13, 2026 04:16
@Gkrumbach07 Gkrumbach07 merged commit 10961da into main Mar 13, 2026
32 checks passed
@Gkrumbach07 Gkrumbach07 deleted the fix/github-token-refresh-52858 branch March 13, 2026 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant