test(integration): retry transient MCP registry outages in mcp show/search #1274

Merged
danielmeppiel merged 2 commits into main from fix/mcp-show-retry
May 11, 2026

@danielmeppiel
Collaborator

Problem

The Windows integration job for the v0.13.0 retag failed on test_mcp_show_command because api.mcp.github.com was transiently unavailable:

E   Failed: Command failed: apm mcp show io.github.github/github-mcp-server
E   Stdout:
E   MCP Server Details
E   Fetching: io.github.github/github-mcp-server
E
E    Could not reach MCP registry at https://api.mcp.github.com
E      -> The registry may be temporarily unavailable. Retry shortly.

The CLI itself prints "Retry shortly" for that exact condition, so the test should retry as well rather than red-mark the whole release pipeline.

Approach

  • New helper run_mcp_command_with_retry() that retries only on the documented Could not reach MCP registry marker (4 attempts, linear backoff 5s/10s/15s).
  • Wired into both test_mcp_show_command and test_mcp_search_command since both hit the same registry.
  • After exhausting retries, the test calls pytest.skip rather than failing, so a real third-party outage doesn't gate the release pipeline. Any non-transient failure still fails immediately.
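
A minimal sketch of the retry loop described above, not the helper as merged. The name run_with_retry, the RegistryUnavailable exception, and the injectable run/sleep parameters are hypothetical and exist only to keep the example self-contained; the case-insensitive marker comparison is also an assumption.

```python
import subprocess
import time

# Lowercase marker; the output is lowercased before comparison (assumption).
TRANSIENT_MARKER = "could not reach mcp registry"


class RegistryUnavailable(Exception):
    """Hypothetical: raised when every attempt hit the transient marker."""


def run_with_retry(cmd, attempts=4, backoff_step=5.0,
                   run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only when the output contains the transient marker.

    Waits backoff_step * attempt seconds between attempts, which gives
    5s, 10s, 15s with the defaults (linear backoff).
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd, shell=True, check=False,
                     capture_output=True, text=True)
        output = f"{result.stdout}\n{result.stderr}".lower()
        if result.returncode == 0 or TRANSIENT_MARKER not in output:
            # Success, or a non-transient failure the caller should fail on.
            return result
        if attempt < attempts:
            sleep(backoff_step * attempt)
    raise RegistryUnavailable(f"registry unreachable after {attempts} attempts")
```

In a test, the caller would catch RegistryUnavailable and call pytest.skip(...), and assert on the returned result otherwise, so that only the documented outage condition is tolerated.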

Validation

  • Lint clean: uv run --extra dev ruff check src/ tests/ and ruff format --check both silent.
  • The retry path cannot be fully exercised on macOS without an upstream outage to retry against, but the marker substring matches the CLI's output verbatim (src/apm_cli/commands/mcp.py).

Why ship now

Required for the v0.13.0 retag to go green on Windows integration.

…earch

The Windows integration job for the v0.13.0 retag failed on
test_mcp_show_command because the upstream MCP registry
(api.mcp.github.com) returned "Could not reach MCP registry" for a
few seconds. The CLI itself prints "Retry shortly" for that exact
condition, so make the test do so instead of red-marking the whole
release pipeline.

- Add run_mcp_command_with_retry() that retries only on the documented
  transient marker (4 attempts, linear backoff).
- Wire it into test_mcp_show_command and test_mcp_search_command (both
  hit the same registry).
- After the final attempt, skip rather than fail so the release
  pipeline isn't gated on a third-party outage; any non-transient
  failure still fails immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 11, 2026 13:08
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Improves resilience of MCP registry E2E integration tests by retrying apm mcp show/search when the upstream GitHub MCP registry is temporarily unreachable, and skipping (rather than failing) when the outage persists.

Changes:

  • Added run_mcp_command_with_retry() to retry MCP-registry-dependent commands on a specific transient-outage marker with linear backoff.
  • Updated test_mcp_show_command and test_mcp_search_command to use the retry helper.
  • Added a Keep-a-Changelog entry documenting the test hardening.

| File | Description |
| --- | --- |
| tests/integration/test_mcp_registry_e2e.py | Adds retry+skip behavior around apm mcp show/search when the registry is transiently unreachable. |
| CHANGELOG.md | Documents the integration-test retry/skip behavior under Unreleased -> Fixed. |

Copilot's findings

Comments suppressed due to low confidence (1)

tests/integration/test_mcp_registry_e2e.py:115

  • run_mcp_command_with_retry() does not handle subprocess.TimeoutExpired. If the registry call hangs (or the CLI is slow) and exceeds timeout, the test will error out instead of retrying and eventually skipping, which undermines the goal of tolerating transient outages. Consider wrapping the subprocess.run(...) call in a try/except for TimeoutExpired and treating it as a retryable transient condition (or including it in the final skip after exhausting attempts).
        result = subprocess.run(
            cmd,
            shell=True,
            check=False,
            capture_output=True,
            text=True,
            timeout=timeout,
            encoding="utf-8",
            errors="replace",
        )
  • Files reviewed: 2/2 changed files
  • Comments generated: 1
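
One way the suppressed TimeoutExpired suggestion above could look, as a minimal sketch rather than a change made in this PR. The function name attempt, its (result, retryable) return shape, and the injectable run parameter are hypothetical; the subprocess.run arguments mirror the quoted snippet.

```python
import subprocess

# Same lowercase marker as in the PR; output is lowercased before matching.
TRANSIENT_MARKER = "could not reach mcp registry"


def attempt(cmd, timeout, run=subprocess.run):
    """Run cmd once and report whether a failure looks retryable.

    Returns (result, retryable). A timeout yields (None, True) so the
    caller can treat a hung registry call like a transient outage.
    """
    try:
        result = run(
            cmd,
            shell=True,
            check=False,
            capture_output=True,
            text=True,
            timeout=timeout,
            encoding="utf-8",
            errors="replace",
        )
    except subprocess.TimeoutExpired:
        return None, True
    output = f"{result.stdout}\n{result.stderr}".lower()
    retryable = result.returncode != 0 and TRANSIENT_MARKER in output
    return result, retryable
```

A retry loop would call attempt() on each iteration, back off while retryable is true, and skip only once attempts are exhausted.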

Comment on lines +27 to +31
# Phrase the CLI prints when the upstream MCP registry
# (https://api.mcp.github.com) is transiently unreachable. Tests treat this
# as retryable rather than a real product failure.
_REGISTRY_TRANSIENT_MARKER = "could not reach mcp registry"

@danielmeppiel danielmeppiel merged commit 9020aca into main May 11, 2026
9 checks passed
@danielmeppiel danielmeppiel deleted the fix/mcp-show-retry branch May 11, 2026 13:13