test(integration): retry transient MCP registry outages in mcp show/search #1274
Merged
…earch

The Windows integration job for the v0.13.0 retag failed on test_mcp_show_command because the upstream MCP registry (api.mcp.github.com) returned "Could not reach MCP registry" for a few seconds. The CLI itself prints "Retry shortly" for that exact condition, so make the test do so instead of red-marking the whole release pipeline.

- Add run_mcp_command_with_retry() that retries only on the documented transient marker (4 attempts, linear backoff).
- Wire it into test_mcp_show_command and test_mcp_search_command (both hit the same registry).
- After the final attempt, skip rather than fail so the release pipeline isn't gated on a third-party outage; any non-transient failure still fails immediately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Improves resilience of MCP registry E2E integration tests by retrying apm mcp show/search when the upstream GitHub MCP registry is temporarily unreachable, and skipping (rather than failing) when the outage persists.
Changes:
- Added run_mcp_command_with_retry() to retry MCP-registry-dependent commands on a specific transient-outage marker with linear backoff.
- Updated test_mcp_show_command and test_mcp_search_command to use the retry helper.
- Added a Keep-a-Changelog entry documenting the test hardening.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/test_mcp_registry_e2e.py | Adds retry+skip behavior around apm mcp show/search when the registry is transiently unreachable. |
| CHANGELOG.md | Documents the integration-test retry/skip behavior under Unreleased -> Fixed. |
Copilot's findings
Comments suppressed due to low confidence (1)
tests/integration/test_mcp_registry_e2e.py:115
run_mcp_command_with_retry() does not handle subprocess.TimeoutExpired. If the registry call hangs (or the CLI is slow) and exceeds timeout, the test will error out instead of retrying and eventually skipping, which undermines the goal of tolerating transient outages. Consider wrapping the subprocess.run(...) call in a try/except for TimeoutExpired and treating it as a retryable transient condition (or including it in the final skip after exhausting attempts).
result = subprocess.run(
    cmd,
    shell=True,
    check=False,
    capture_output=True,
    text=True,
    timeout=timeout,
    encoding="utf-8",
    errors="replace",
)
- Files reviewed: 2/2 changed files
- Comments generated: 1
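The suppressed finding above suggests treating a timeout as retryable. A minimal sketch of that idea follows; it is not the code in the PR. The name `run_once` is hypothetical, and the sketch passes the command as an argument list without `shell=True` (unlike the excerpt) so it can be run anywhere a Python interpreter exists.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_once(cmd: Sequence[str], timeout: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd once (hypothetical helper, not from the PR).

    Returns the CompletedProcess, or None if the command exceeded `timeout`,
    so that a caller can treat the timeout like any other transient condition.
    """
    try:
        return subprocess.run(
            cmd,
            check=False,          # inspect returncode ourselves instead of raising
            capture_output=True,
            text=True,
            timeout=timeout,
            encoding="utf-8",
            errors="replace",
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None


if __name__ == "__main__":
    # A deliberately slow child process stands in for a hung registry call.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_once(slow, timeout=0.5))
```

A retry loop could then handle a `None` result the same way it handles output containing the transient marker: back off, try again, and skip once the attempts are exhausted.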
Comment on lines +27 to +31
# Phrase the CLI prints when the upstream MCP registry
# (https://api.mcp.github.com) is transiently unreachable. Tests treat this
# as retryable rather than a real product failure.
_REGISTRY_TRANSIENT_MARKER = "could not reach mcp registry"
Problem
The Windows integration job for the v0.13.0 retag failed on test_mcp_show_command because api.mcp.github.com was transiently unavailable and the CLI reported "Could not reach MCP registry". The CLI itself prints "Retry shortly" for that exact condition, so the test should retry rather than red-mark the whole release pipeline.
Approach
- Add run_mcp_command_with_retry(), which retries only on the documented "Could not reach MCP registry" marker (4 attempts, linear backoff 5s/10s/15s).
- Wire it into test_mcp_show_command and test_mcp_search_command, since both hit the same registry.
- After the final attempt, the helper pytest.skips rather than fails, so a real third-party outage doesn't gate the release pipeline. Any non-transient failure still fails immediately.
Validation
- uv run --extra dev ruff check src/ tests/ and ruff format --check both silent.
- … src/apm_cli/commands/mcp.py).
Why ship now
Required for the v0.13.0 retag to go green on Windows integration.
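
For readers who want to see the shape of the retry behaviour described under Approach, here is a rough sketch under stated assumptions; it is not the helper merged in this PR. The names `retry_on_transient` and `RegistryUnavailable` are hypothetical, the marker is assumed to be matched case-insensitively against combined stdout and stderr, and the command runner and sleep function are injected so the logic can be exercised without a network or real delays.

```python
import subprocess
import time
from typing import Callable

# Lower-cased marker, mirroring the constant quoted in the review excerpt.
TRANSIENT_MARKER = "could not reach mcp registry"


class RegistryUnavailable(Exception):
    """Hypothetical signal: every attempt hit the transient marker."""


def retry_on_transient(
    run_once: Callable[[], subprocess.CompletedProcess],
    attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Retry only when the output contains the transient marker.

    Any other result, success or failure, is returned immediately so that
    non-transient failures still surface on the first attempt.
    """
    for attempt in range(1, attempts + 1):
        result = run_once()
        output = f"{result.stdout}\n{result.stderr}".lower()
        if TRANSIENT_MARKER not in output:
            return result
        if attempt < attempts:
            # Linear backoff: 5s, 10s, 15s with the defaults.
            sleep(base_delay * attempt)
    raise RegistryUnavailable(f"registry unreachable after {attempts} attempts")
```

In a pytest test, the caller would catch `RegistryUnavailable` and call `pytest.skip(...)`, which corresponds to the "skip rather than fail" step above. Raising a plain exception here keeps the sketch importable without pytest.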