refactor(deps): introduce HostBackend Protocol + extract stamp_plugin_version#1223
Conversation
…struction Extract the vendor-conditional 'if is_github / elif is_ado / else generic' ladders out of DownloadDelegate.build_repo_url and the related download helpers into a typed Protocol with one backend per host kind. Backends are stateless frozen dataclasses dispatched via auth_resolver.classify_host; adding a new vendor is one new dataclass + one entry in _BACKEND_BY_KIND instead of new branches across 8+ call sites. - New module src/apm_cli/deps/host_backends.py: - HostBackend Protocol (runtime-checkable) - GitHubBackend, GHECloudBackend, GHESBackend (share _GitHubFamilyBase) - ADOBackend (org/project/repo URLs, bearer-via-env, no plain HTTP) - GenericGitBackend (GitLab/Gitea/Gogs/Bitbucket; never embeds tokens) - backend_for(dep_ref, auth_resolver) and backend_for_host(host, ...) - DownloadDelegate.build_repo_url collapses 80-line ladder to ~30 lines routed through backend_for(); preserves token='' suppression semantics and the ado_organization fallthrough case. - DownloadDelegate._build_contents_api_urls reuses backends rather than inline string templating. - New unit tests tests/unit/deps/test_host_backends.py: 49 cases covering Protocol conformance, capability flags, clone URL building (https/ssh/ http with port + token + bearer permutations), Contents API URLs, and dispatch for all five host kinds. No public-API changes. No CLI / lockfile / apm.yml schema changes. This is the seam every later phase exploits (bearer retry combinator, plugin version-stamp helper, Artifactory delegate, subdir decomposition). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Plugin packages produced by the marketplace builder ship a synthesized apm.yml with version='0.0.0'. Both download_package and download_subdirectory_package contained the identical 13-line block that: - checked package_type == MARKETPLACE_PLUGIN AND version == '0.0.0' - copied resolved_commit[:7] onto the package - reloaded apm.yml, set version, dumped it back Lift the duplicated block into stamp_plugin_version() in package_validator and call it from both sites. The helper is idempotent and safe under all combinations (no commit, missing apm.yml, non-plugin types, version already set, package=None). New tests/unit/deps/test_stamp_plugin_version.py covers all branches. No public-API change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the inline GitHub/GHES/GHEC API URL ladder in _resolve_commit_sha_for_ref with backend.build_commits_api_url(...). Adds a hostname-based fallback in backend_for() so unit tests that mock AuthResolver.classify_host (returning a MagicMock) still get the right backend for github.com / *.ghe.com / GHES hosts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Configured GHES via GITHUB_HOST=<custom-host> uses
https://{host}/api/v3 (not https://api.{host}, which is GHE Cloud
shape). Distinguish between *.ghe.com and other GitHub-classified
hosts when constructing the synthetic HostInfo.
Also: `_url_host` falls back to `host_info.host` so partially
constructed DependencyReference objects (without `host`) still
yield well-formed clone URLs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Refactors dependency host/vendor URL construction by introducing a HostBackend Protocol with concrete backends (GitHub/GHE Cloud/GHES/ADO/Generic), and deduplicates marketplace plugin version stamping logic into a shared helper.
Changes:
- Added
src/apm_cli/deps/host_backends.pywith backend dispatch + per-vendor URL/API builders, and refactored call sites to use it. - Extracted
stamp_plugin_version()intopackage_validator.pyand replaced duplicated inline logic ingithub_downloader.py. - Added unit tests for both the new host backends and the version-stamping helper.
Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/deps/host_backends.py | New HostBackend Protocol + concrete backends + dispatch used to centralize host-specific URL construction. |
| src/apm_cli/deps/download_strategies.py | Refactors repo URL building and Contents-API URL construction to use the new backend abstraction. |
| src/apm_cli/deps/github_downloader.py | Uses host backends for commit-SHA resolution and calls the extracted version-stamping helper. |
| src/apm_cli/deps/package_validator.py | Adds stamp_plugin_version() helper to keep plugin version stamping logic in one place. |
| tests/unit/deps/test_host_backends.py | New unit tests covering backend URL builders and dispatch behavior. |
| tests/unit/deps/test_stamp_plugin_version.py | New unit tests covering stamp_plugin_version() edge cases. |
Copilot's findings
Comments suppressed due to low confidence (9)
tests/unit/deps/test_host_backends.py:144
- These URL assertions use substring checks on a URL string (e.g. '"github.com" in url'). The repo's test conventions require parsing URLs with urllib.parse.urlparse and asserting on components to avoid CodeQL 'py/incomplete-url-substring-sanitization' findings. Refactor these checks to compare parsed hostname/netloc/path instead of substring membership.
def test_https_empty_token_suppresses_credential(self):
backend = GitHubBackend(host_info=_info("github.com", "github"))
url = backend.build_clone_https_url(_dep_ref(), token="")
assert "@" not in url
assert "github.com" in url
tests/unit/deps/test_host_backends.py:170
- This asserts a port via substring membership (':8443' in url) on an https:// URL. Per repo test conventions, parse with urllib.parse.urlparse and assert parsed.port == 8443 (and/or hostname) instead of substring checks to avoid CodeQL URL-substring findings.
def test_https_with_custom_port(self):
backend = GitHubBackend(host_info=_info("github.com", "github"))
url = backend.build_clone_https_url(_dep_ref(port=8443), token=None)
assert ":8443" in url
tests/unit/deps/test_host_backends.py:175
- These checks use startswith/endswith on a URL string. To comply with the repo's URL-assertion convention (and avoid CodeQL substring findings), parse with urllib.parse.urlparse and assert on scheme/path components (e.g. scheme == 'http', path endswith '.git').
def test_http_insecure(self):
backend = GitHubBackend(host_info=_info("github.com", "github"))
url = backend.build_clone_http_url(_dep_ref())
assert url.startswith("http://")
assert url.endswith(".git")
tests/unit/deps/test_host_backends.py:239
- These tests assert URL properties via substring membership on the clone URL ('token in url', 'myorg in url', etc.). Repo test conventions require parsing URLs and asserting on parsed components (netloc/path/query) rather than substring membership to avoid CodeQL URL-substring findings.
def test_https_with_pat(self):
backend = ADOBackend(host_info=_info("dev.azure.com", "ado"))
url = backend.build_clone_https_url(self._ado_dep(), token="ado_pat_xyz")
# ADO embeds the PAT as basic auth.
assert "ado_pat_xyz" in url
assert "myorg" in url
assert "myproj" in url
tests/unit/deps/test_host_backends.py:300
- This asserts the SSH port via substring membership on an 'ssh://' URL. Per repo test conventions, use urllib.parse.urlparse and assert parsed.port == 7999 (and parsed.scheme == 'ssh') instead of substring checks.
def test_ssh_url_with_port(self):
backend = GenericGitBackend(host_info=_info("bitbucket.example.com", "generic", port=7999))
url = backend.build_clone_ssh_url(_dep_ref(host="bitbucket.example.com", port=7999))
# Custom SSH port should be threaded through the URL.
assert "7999" in url
tests/unit/deps/test_host_backends.py:323
- These assertions check URL paths/queries via substring and endswith. To satisfy the repo's URL-assertion convention, parse each URL with urllib.parse.urlparse and assert on parsed.path and parsed.query components instead of substring membership.
def test_contents_api_v1_then_v3(self):
backend = GenericGitBackend(host_info=_info("gitea.example.com", "generic"))
urls = backend.build_contents_api_urls("o", "r", "f.md", "main")
assert len(urls) == 2
# v1 first (Gitea standard), v3 fallback (legacy).
assert "/api/v1/" in urls[0]
assert "/api/v3/" in urls[1]
assert urls[0].endswith("?ref=main")
assert urls[1].endswith("?ref=main")
tests/unit/deps/test_host_backends.py:150
- These assertions check token presence/absence via substring membership on a URL string. To comply with repo URL-assertion conventions and avoid CodeQL substring findings, parse the URL (urllib.parse.urlparse) and assert on parsed.netloc / parsed.username / parsed.hostname instead of using 'in'/'not in' against the full URL.
def test_https_bearer_scheme_does_not_embed_token(self):
backend = GitHubBackend(host_info=_info("github.com", "github"))
# Bearer is ADO-only; GitHub family should fall through to plain URL.
url = backend.build_clone_https_url(_dep_ref(), token="ghp_abc", auth_scheme="bearer")
assert "ghp_abc" not in url
assert "@" not in url
tests/unit/deps/test_host_backends.py:249
- These assertions validate URL properties via substring membership on an https:// URL ('bearer_jwt' not in url, 'myorg' in url). Per repo test conventions, parse with urllib.parse.urlparse and assert on components (hostname/path/netloc) rather than substring membership to avoid CodeQL URL-substring findings.
def test_https_bearer_scheme_drops_token(self):
backend = ADOBackend(host_info=_info("dev.azure.com", "ado"))
url = backend.build_clone_https_url(
self._ado_dep(), token="bearer_jwt", auth_scheme="bearer"
)
# Bearer goes via env, NOT in URL.
assert "bearer_jwt" not in url
# Still a valid ADO URL.
assert "myorg" in url
tests/unit/deps/test_host_backends.py:289
- These URL checks use substring membership on a URL string ('gitea.example.com' in url, token not in url). Repo test conventions require parsing URLs with urllib.parse.urlparse and asserting on parsed components (hostname/netloc/path) instead of substring checks to avoid CodeQL URL-substring findings.
def test_https_never_embeds_token(self):
backend = GenericGitBackend(host_info=_info("gitea.example.com", "generic"))
# Even when a token is passed, generic hosts defer to credential
# helpers -- we never embed the token in the URL.
url = backend.build_clone_https_url(_dep_ref(host="gitea.example.com"), token="some_token")
assert "some_token" not in url
assert "@" not in url
assert "gitea.example.com" in url
- Files reviewed: 6/6 changed files
- Comments generated: 7
CodeQL flagged `assert "github.com" in url` / `assert "gitea.example.com" in url` as incomplete URL substring sanitization (an attacker URL like `evil.com/?fake=github.com` would pass). Switch to `urlparse(url).hostname == ...` per the repo test convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two extractions reduce github_downloader.py from 2400 to 1926 lines:
A. Artifactory orchestration -> ArtifactoryOrchestrator (Adapter+Facade)
- New module: src/apm_cli/deps/artifactory_orchestrator.py
- ArtifactoryRouter: pure decision logic (is_registry_only, should_use_proxy,
parse_proxy_config) -- Adapter pattern over heterogeneous dep_ref shapes.
- ArtifactoryOrchestrator: download flow over a _HasArchiveDownloader
Protocol -- Facade with explicit collaborator boundary.
- 5 methods on the downloader become 3-7 line delegation stubs.
B. Git reference resolution -> GitReferenceResolver (Strategy+Facade)
- New module: src/apm_cli/deps/git_reference_resolver.py
- resolve_git_reference / list_remote_refs / _resolve_commit_sha_for_ref
extracted (~370 lines). Resolver collaborates with the downloader via a
duck-typed _DownloaderContext Protocol -- mirrors the existing
DownloadDelegate(host=self) pattern.
- 3 methods on the downloader become 1-3 line delegation stubs.
Stubs are intentionally retained on GitHubPackageDownloader because
tests/unit/test_artifactory_support.py and other suites access them
directly (self.downloader._method); breaking that contract is out of
scope for a pure refactor.
Test sweep: only the 6 pre-existing baseline failures remain; no new
failures introduced. Ruff lint+format clean across src/ and tests/.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Continues the github_downloader.py decomposition (extractions C+D of 4):
C. Clone engine -> Command + chain-of-responsibility
New: src/apm_cli/deps/clone_engine.py (~330L)
Moves _execute_transport_plan and the per-attempt ADO bearer-fallback
loop out of the monolith. The plan becomes a list of Command objects
(TransportAttempt) and the chain is the per-attempt retry then
plan-level next-attempt fallback. The engine reads transport selector,
protocol pref, allow-fallback flag, and the fallback-port-warned dedup
set dynamically from the host so callers (and tests) can mutate them
after construction.
D. Git auth environment -> Builder
New: src/apm_cli/deps/git_auth_env.py (~150L)
Moves _git_env_dict, _setup_git_environment, and
_build_noninteractive_git_env into a focused GitAuthEnvBuilder.
Test-patch interception preserved:
github_downloader.py keeps re-exported 'git' and '_rich_warning'
names with noqa F401 so existing test patch sites keep working.
git_reference_resolver.py and clone_engine.py route through the
github_downloader namespace for those two names.
Cumulative reduction across all 4 extractions (A, B, C, D):
github_downloader.py: 2400 -> 1645 lines (-755 lines, -31%).
A. ArtifactoryRouter + ArtifactoryOrchestrator
Adapter + Facade.
B. GitReferenceResolver
Strategy + Facade with Protocol-based collaboration.
C. CloneEngine (this commit)
Command + chain-of-responsibility.
D. GitAuthEnvBuilder (this commit)
Builder.
All extractions communicate with the host via narrow Protocol types
(_DownloaderContext / _HasArchiveDownloader) rather than bidirectional
dependencies on the GitHubPackageDownloader concrete class.
Verification:
uv run pytest tests/unit -q # 7909 passed
uv run --extra dev ruff check src/ tests/ # silent
uv run --extra dev ruff format --check src/ tests/ # silent
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PAT->bearer fallback loop that was previously exempted in github_downloader._execute_transport_plan was moved to deps/clone_engine.CloneEngine.execute as part of the monolith decomposition (commit 1dfc18c). The exemption rationale is unchanged -- the loop wraps a stateful clone_action that mutates target_path, so collapsing onto AuthResolver.execute_with_bearer_fallback is non-trivial and remains tracked as a follow-up to #1212. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dator Real bugs (Copilot reviewer comments on PR #1223): 1. host_backends.backend_for fallback dispatch built api_base for *.ghe.com as 'https://api.{host}', which disagrees with AuthResolver.classify_host (which uses 'https://{host}/api/v3'). This silently produced wrong Contents/Commits API URLs whenever classify_host returned a non-HostInfo value (the typical mocked- resolver shape in unit tests). Aligned with classify_host. 2. download_strategies._build_contents_api_urls had the same drift on the *.ghe.com branch. Fixed to match the canonical classification. 3. host_backends.HostBackend.build_clone_http_url docstring claimed '*.ghe.com reject HTTP and will raise', but only ADO actually raises in the implementation. Updated docstring to describe real behaviour. 4. host_backends module docstring claimed 'no class hierarchy, no inheritance' but the GitHub-family backends share _GitHubFamilyBase. Reworded to describe the actual structure. 5. package_validator.stamp_plugin_version was missing type hints. Added APMPackage | None, PackageType | None, Path; introduced from __future__ import annotations and a TYPE_CHECKING import for PackageType to keep the lazy runtime import. Encoding cleanups: 6. Replaced em dashes (U+2014) with ASCII '--' across host_backends.py, download_strategies.py, and tests/unit/deps/test_host_backends.py to comply with the printable-ASCII repository rule. CodeQL false-positive remediation: 7. tests/unit/deps/test_host_backends.py asserted '@' not in url as a credential-leak guard. CodeQL flagged the pattern; switched to urlparse(url).username/password is None which is both stronger and no longer trips the substring-sanitization rule. Verification: uv run pytest tests/unit -q # 7909 passed uv run --extra dev ruff check src/ tests/ # silent uv run --extra dev ruff format --check src/ tests/ # silent bash scripts/lint-auth-signals.sh # auth-signal lint clean Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 2 | 1 | Strong decomposition; _DownloaderContext protocol is wide (6 methods), ADO no-org silent re-dispatch is a latent bug trap, F401 re-export block is acceptable transit scaffolding. |
| CLI Logging Expert | 0 | 0 | 1 | Logging architecture preserved correctly across extraction; double-indirection shim for _rich_warning is sound and test-compatible; one nit on implicit coupling documentation. |
| DevX UX Expert | 0 | 1 | 2 | Backend refactor; CLI surface unchanged. One silent-fallback gap hurts unknown-host DevX; two minor error message nits. |
| Supply Chain Security Expert | 0 | 2 | 2 | GHES api_base fix closes URL confusion bug; fallback path and ADO short-circuit are safe; one recommended: clone_engine.py exempt list expansion needs a ticket anchor; no blocking findings. |
| OSS Growth Hacker | 0 | 1 | 1 | Strong contributor-readiness gain; GenericGitBackend is the GitLab on-ramp; one discoverability gap: no CONTRIBUTING pointer to host_backends.py for new vendor authors. |
| Auth Expert | 0 | 2 | 1 | Clean refactor; no auth bypasses or token leakage regressions. Two recommended improvements around ADO short-circuit HostInfo kind consistency and the growing exempt list. |
| Test Coverage Expert | 0 | 5 | 0 | Three new modules (git_reference_resolver, git_auth_env, artifactory_orchestrator) have zero test coverage; GHES api_base regression trap is missing; clone_engine unit coverage absent (integration tests cover the flow indirectly). |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 5 follow-ups
- [Test Coverage Expert] Add unit tests for git_auth_env.py (PAT path sets GIT_ASKPASS; bearer path sets Authorization header; empty-token produces sanitized env) -- 150-line auth env builder that constructs credentials for every clone attempt has zero coverage. Missing tests on a secure-by-default surface rank above all recommended opinion findings. Silent drift here passes a wrong env dict to the git subprocess with no test failing.
- [Test Coverage Expert] Add unit tests for git_reference_resolver.py (resolve_ref for tag/branch/SHA; fallback chain; 404 handling; ADO short-circuit) -- 440-line module handling SHA resolution and commit API fallback chains has zero coverage at any tier. Tagged portability-by-manifest and secure-by-default. If fallback logic silently drifts, apm install resolves tags non-deterministically with no automated signal.
- [Test Coverage Expert] Add unit tests for artifactory_orchestrator.py (parametrize should_use_proxy; test _resolve_host_prefix raises; test download_package delegation) and add GHES api_base value assertion to TestBackendDispatch.test_dispatch_configured_ghes -- 319-line orchestrator with 4-branch routing has zero coverage (governed-by-policy surface). The GHES api_base bug was real and fixed, but the existing dispatch test asserts isinstance only -- a one-line assertion prevents silent reintroduction.
- [Python Architect] Replace ADO no-org silent re-dispatch with explicit ValueError('ADO dependency is missing ado_organization; cannot construct clone URL') -- A malformed ADO dep_ref currently produces a GitHub-style URL and fails at clone time with a confusing git error. An explicit ValueError at URL-construction time gives users an actionable message and prevents a class of misleading support reports.
- [Auth Expert] Add tracking issue reference to lint-auth-signals.sh exempt list entry for clone_engine.py; consider scoping exemption to the specific function -- Auth-expert and supply-chain-security-expert independently flagged this. The lint-auth-signals linter is the only automated guardrail preventing open-coded PAT/bearer from spreading; an unanchored exemption can persist indefinitely. One inline comment with an issue number closes both findings.
Architecture
classDiagram
direction LR
class HostBackend {
<<Protocol>>
+host_info: HostInfo
+kind: str
+is_github_family: bool
+is_generic: bool
+build_clone_https_url(dep_ref, token, auth_scheme) str
+build_clone_ssh_url(dep_ref) str
+build_clone_http_url(dep_ref) str
+build_commits_api_url(dep_ref, ref) str|None
+build_contents_api_urls(owner, repo, file_path, ref) list
}
class _GitHubFamilyBase {
<<SharedBase>>
+host_info: HostInfo
+build_clone_https_url()
+build_clone_ssh_url()
+build_commits_api_url()
+build_contents_api_urls()
}
class GitHubBackend {
<<ConcreteStrategy>>
+kind = "github"
}
class GHECloudBackend {
<<ConcreteStrategy>>
+kind = "ghe_cloud"
}
class GHESBackend {
<<ConcreteStrategy>>
+kind = "ghes"
}
class ADOBackend {
<<ConcreteStrategy>>
+kind = "ado"
+build_clone_https_url()
+build_clone_http_url() ValueError
+build_contents_api_urls() list
}
class GenericGitBackend {
<<ConcreteStrategy>>
+kind = "generic"
+build_contents_api_urls() list
}
class HostInfo {
<<ValueObject>>
+host: str
+kind: str
+api_base: str
+has_public_repos: bool
+port: int|None
}
class AuthResolver {
<<Strategy>>
+classify_host(host, port) HostInfo
}
class _DownloaderContext {
<<Protocol>>
+auth_resolver: AuthResolver
+git_env: dict
+has_ado_token: bool
+_resolve_dep_token()
+_resolve_dep_auth_ctx()
+_build_noninteractive_git_env()
+_build_repo_url()
+_sanitize_git_error()
}
class CloneEngine {
<<Command+ChainOfResponsibility>>
+execute(dep_ref, target_path, clone_action)
}
class ArtifactoryOrchestrator {
<<Facade>>
+download_package()
+download_subdirectory()
}
class ArtifactoryRouter {
<<Adapter>>
+should_use_proxy(dep_ref) bool
+parse_proxy_config() tuple
}
class _HasArchiveDownloader {
<<Protocol>>
+download_artifactory_archive()
}
class GitHubPackageDownloader {
+auth_resolver: AuthResolver
+git_env: dict
+has_ado_token: bool
}
_GitHubFamilyBase <|-- GitHubBackend
_GitHubFamilyBase <|-- GHECloudBackend
_GitHubFamilyBase <|-- GHESBackend
GitHubBackend ..|> HostBackend
GHECloudBackend ..|> HostBackend
GHESBackend ..|> HostBackend
ADOBackend ..|> HostBackend
GenericGitBackend ..|> HostBackend
HostBackend ..> HostInfo : carries
AuthResolver ..> HostInfo : returns
CloneEngine ..> _DownloaderContext : injected
GitHubPackageDownloader ..|> _DownloaderContext
ArtifactoryOrchestrator ..> _HasArchiveDownloader : injected
note for HostBackend "Strategy via Protocol + dispatch dict. backend_for() is the dispatcher."
note for _DownloaderContext "WIDE: 8 attrs/methods. Recommend splitting into _HostConfig + _URLBuilder."
class HostBackend:::touched
class _GitHubFamilyBase:::touched
class GitHubBackend:::touched
class GHECloudBackend:::touched
class GHESBackend:::touched
class ADOBackend:::touched
class GenericGitBackend:::touched
class CloneEngine:::touched
class _DownloaderContext:::touched
class ArtifactoryOrchestrator:::touched
class ArtifactoryRouter:::touched
class _HasArchiveDownloader:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A(["apm install / apm run"]) --> B["GitHubPackageDownloader._download_package\nSRC: github_downloader.py"]
B --> C{"ArtifactoryRouter.should_use_proxy\nSRC: artifactory_orchestrator.py"}
C -- yes --> D["ArtifactoryOrchestrator.download_package\nSRC: artifactory_orchestrator.py"]
C -- no --> E["backend_for(dep_ref, auth_resolver)\nSRC: host_backends.py"]
E --> F["AuthResolver.classify_host(host)\nSRC: core/auth.py"]
F --> G(["HostInfo (ValueObject)"])
G --> H{"_BACKEND_BY_KIND dispatch"}
H -- github/ghe/ghes --> I["GitHubBackend / GHECloudBackend / GHESBackend\nSRC: host_backends.py"]
H -- ado --> J["ADOBackend\nSRC: host_backends.py"]
H -- generic/fallback --> K["GenericGitBackend\nSRC: host_backends.py"]
I & J & K --> L["DownloadDelegate.build_repo_url\nSRC: download_strategies.py"]
L --> M["backend.build_clone_https_url / ssh / http\nSRC: host_backends.py"]
M --> N["CloneEngine.execute(dep_ref, target_path, clone_action)\nSRC: clone_engine.py"]
N --> O["GitAuthEnv.build_env\nSRC: git_auth_env.py"]
N --> P{"TransportPlan attempt loop"}
P -- auth failure --> Q["ADO bearer in-attempt retry"]
P -- plan exhausted --> R(["AggregatedCloneError"])
P -- success --> S(["[FS] target_path populated"])
D --> T["[NET] download_artifactory_archive"]
T --> U["[FS] tempfile.mkdtemp extraction"]
U --> V["stamp_plugin_version\nSRC: package_validator.py"]
V --> S
style I fill:#fff3b0,stroke:#d47600
style J fill:#fff3b0,stroke:#d47600
style K fill:#fff3b0,stroke:#d47600
style L fill:#fff3b0,stroke:#d47600
style M fill:#fff3b0,stroke:#d47600
style N fill:#fff3b0,stroke:#d47600
style O fill:#fff3b0,stroke:#d47600
style D fill:#fff3b0,stroke:#d47600
style V fill:#fff3b0,stroke:#d47600
style E fill:#fff3b0,stroke:#d47600
Recommendation
Merge this PR. The architectural direction is correct, the auth refactor is clean, and no panelist found a blocking issue. The five follow-ups above should be tracked as a single 'post-extraction test coverage' issue opened before or immediately after merge -- particularly the three zero-coverage modules (git_auth_env, git_reference_resolver, artifactory_orchestrator) which represent the highest-signal gap: production code on secure-by-default and governed-by-policy surfaces with no automated regression protection. The ADO no-org silent re-dispatch fix is a one-liner that is worth landing in this PR if the author is willing; otherwise it should be the first item in the follow-up issue.
Full per-persona findings
Python Architect
-
[recommended] _DownloaderContext protocol is too wide to be genuinely narrow at
src/apm_cli/deps/clone_engine.py:69
The protocol declared in clone_engine.py exposes 6 methods (auth_resolver, git_env, has_ado_token, _resolve_dep_token, _resolve_dep_auth_ctx, _build_noninteractive_git_env, _build_repo_url, _sanitize_git_error). A truly narrow protocol should expose only what the consumer strictly needs; 6 method slots means CloneEngine is still tightly coupled to GitHubPackageDownloader's internal API surface.
Suggested: Split _DownloaderContext into two narrower protocols: _HostConfig (3 attrs + 1 env builder) and _URLBuilder (1 method). Pass both to CloneEngine.init so each can be stubbed independently in tests. -
[recommended] ADO no-org silent re-dispatch in build_repo_url can silently mis-route ADO deps at
src/apm_cli/deps/download_strategies.py:248
When an ADO-classified dep_ref has no ado_organization, the code silently calls backend_for(None, ...) which returns a non-ADO backend. A malformed ADO dep_ref will silently produce a GitHub-style HTTPS URL and fail at clone time with a confusing git error.
Suggested: Replace the silent re-dispatch with an explicit raise ValueError('ADO dependency is missing ado_organization; cannot construct clone URL'). -
[nit] 18 # noqa: F401 re-exports in github_downloader.py should carry a deprecation timeline at
src/apm_cli/deps/github_downloader.py:16
Without a deprecation marker or follow-up issue reference, the block will persist indefinitely and is a regression risk if future tests are written against the old namespace.
Suggested: Add '# TRANSITIONAL: remove after all test patches are updated to the canonical module path (tracked in #XXXX)' above the re-export block.
CLI Logging Expert
- [nit] clone_engine._rich_warning double-indirection creates implicit coupling that future test authors may not discover at
src/apm_cli/deps/clone_engine.py:56
clone_engine._rich_warning routes through github_downloader._rich_warning so that existing test patches still intercept warnings from CloneEngine. This works correctly today, but the coupling is invisible to new test authors who would naturally patch apm_cli.deps.clone_engine._rich_warning and miss all emissions.
Suggested: Add one sentence to both the clone_engine._rich_warning docstring and the github_downloader.py noqa comment: 'New clone_engine tests that need to capture warnings must patch apm_cli.deps.github_downloader._rich_warning, not apm_cli.deps.clone_engine._rich_warning.'
DevX UX Expert
-
[recommended] Unknown host silently falls back to GenericGitBackend with no user-visible signal at
src/apm_cli/deps/host_backends.py
In backend_for(), when classify_host returns a kind not in _BACKEND_BY_KIND AND the host is not a GitHub hostname, execution reaches the else branch and returns GenericGitBackend with no log, no warning. A user pointing APM at a GitLab or Bitbucket host will get silent generic handling with confusing auth failures several layers deeper.
Suggested: In the else branch of backend_for(), emit a _rich_warning: "Host '{host}' is not recognized as GitHub, GHE, GHES, or ADO -- falling back to generic Git. Auth may be limited." -
[nit] Artifactory FQDN error omits where to set ARTIFACTORY_BASE_URL at
src/apm_cli/deps/artifactory_orchestrator.py
RuntimeError('Artifactory download requires either FQDN or ARTIFACTORY_BASE_URL') names the env var but gives no guidance on where to set it (apm.yml? shell env?). -
[nit] ADO bearer fallback success logged only at debug, invisible to users on slow/retried clones at
src/apm_cli/deps/clone_engine.py
A user whose PAT fails and APM silently retries with AAD bearer gets no indication that auth was retried, making it hard to understand why the clone was slower or to know the PAT is misconfigured.
Supply Chain Security Expert
-
[recommended] clone_engine.py added to lint-auth-signals Rule-A exempt list without a tracking ticket reference at
scripts/lint-auth-signals.sh:22
Without a concrete ticket anchor the exemption can persist indefinitely and the Rule-A signal becomes permanently suppressed for all future code added to clone_engine.py.
Suggested: Add a GitHub issue reference to the exemption comment. Also consider scoping the exemption to the specific function rather than the whole file. -
[recommended] backend_for defensive fallback silently swallows unexpected classify_host return values at
src/apm_cli/deps/host_backends.py
If classify_host returns an unexpected type due to a future refactor or subtle bug, the code silently constructs a GenericGitBackend, bypassing classify_host's security classification silently.
Suggested: Emit a _warning() or _debug() when the isinstance(info, HostInfo) check fails in the non-ADO path. -
[nit] GHES api_base fix is correct and closes URL confusion -- Old form (api/redacted){host} would resolve to a different host for any GHES appliance. The fix aligns to https://{host}/api/v3.
-
[nit] _HasArchiveDownloader Protocol is private-module boundary, not a supply-chain risk -- Only wired at construction by github_downloader.py; no attacker-controlled implementation path at runtime.
OSS Growth Hacker
-
[recommended] No contributor signpost to HostBackend as the GitLab/Bitbucket extension point at
src/apm_cli/deps/host_backends.py
The HostBackend Protocol + _BACKEND_BY_KIND registry is exactly the seam a GitLab contributor needs, but nothing in CONTRIBUTING.md, the module docstring, or the PR description links these two ideas.
Suggested: Add a 'Adding a new vendor backend' paragraph to the host_backends.py module docstring and a matching entry in CONTRIBUTING.md under an 'Extending APM' section. -
[nit] GenericGitBackend's GitLab classification as 'generic' should be surfaced in the multi-host story -- A follow-up issue titled 'GitLab first-class backend' would give the community a concrete contribution target.
Auth Expert
-
[recommended] ADO short-circuit in backend_for() passes classify_host HostInfo unmodified even when its kind field does not match 'ado' at
src/apm_cli/deps/host_backends.py
If classify_host ever returns HostInfo(kind='github') for an ADO host (wrong CNAME, mislabeled GHES entry), ADOBackend gets host_info whose kind='github', which could produce malformed ADO API calls.
Suggested: After classify_host returns in the ADO short-circuit block, normalize: if isinstance(info, HostInfo) and info.kind != 'ado': info = dataclasses.replace(info, kind='ado') -
[recommended] clone_engine.py added to auth-signals exempt list without a tracking issue at
scripts/lint-auth-signals.sh
The lint-auth-signals linter is the only automated guardrail preventing open-coded PAT/bearer from spreading. Exempting a second file doubles the unguarded surface without a concrete issue reference.
Suggested: Add inline issue reference, e.g.: '# PAT->bearer open-coded; tracked in #NNNN' -
[nit] Hostname-based fallback in backend_for() assigns has_public_repos=False for all GHES hosts -- Production path unreachable; only affects mocks/tests. Harmless but noteworthy if callers ever branch on has_public_repos in the fallback path.
Doc Writer -- inactive
All 11 changed files are internal (src/apm_cli/deps/ modules, scripts/lint-auth-signals.sh, and unit tests); no user-facing CLI surface, CHANGELOG entry, README claim, or docs page is affected by this refactor.
Test Coverage Expert
-
[recommended] git_reference_resolver.py (~440 lines) has no tests at any tier at
src/apm_cli/deps/git_reference_resolver.py
New 440-line module handling SHA resolution and commit API fallback chains. No test file at any tier. The install-pipeline surface requires integration-with-fixtures floor per the tier matrix.
Suggested: Add tests/unit/deps/test_git_reference_resolver.py covering resolve_ref for tag/branch/sha; fallback chain; 404 handling; ADO short-circuit path.
Proof (missing):tests/unit/deps/test_git_reference_resolver.py::test_resolve_ref_tag_returns_sha-- proves: apm install resolves a tagged version to a deterministic SHA before cloning [portability-by-manifest,secure-by-default]
assert resolver.resolve_ref(dep_ref, 'v1.0') == 'abc123sha' -
[recommended] git_auth_env.py has no tests at any tier at
src/apm_cli/deps/git_auth_env.py
New ~150-line auth env builder that constructs GIT_ASKPASS / GIT_CONFIG_COUNT env dicts for every clone attempt. grep of tests/ for 'git_auth_env', 'GitAuthEnv' returns no hits.
Suggested: Add tests/unit/deps/test_git_auth_env.py: assert PAT path sets GIT_ASKPASS; assert bearer path sets Authorization header; assert empty-token path produces sanitized env.
Proof (missing):tests/unit/deps/test_git_auth_env.py::test_pat_auth_sets_git_askpass-- proves: apm install passes the correct credential env to git subprocess for PAT-authenticated clones [secure-by-default]
assert 'GIT_ASKPASS' in env or 'GIT_CONFIG_COUNT' in env -
[recommended] artifactory_orchestrator.py (~319 lines) has no tests at any tier at
src/apm_cli/deps/artifactory_orchestrator.py
ArtifactoryRouter.should_use_proxy and ArtifactoryOrchestrator.download_package are new routing and download facade. grep returns no hits. Router has non-trivial branching with 4 different routing cases.
Suggested: Add tests/unit/deps/test_artifactory_orchestrator.py: parametrize should_use_proxy; test _resolve_host_prefix raises on missing host/prefix; test download_package delegates correctly.
Proof (missing):tests/unit/deps/test_artifactory_orchestrator.py::test_should_use_proxy_returns_false_for_explicit_artifactory-- proves: apm install routes Artifactory dependencies correctly without double-proxying explicit URLs [governed-by-policy,vendor-neutral]
assert ArtifactoryRouter.should_use_proxy(dep_ref_artifactory) is False -
[recommended] GHES api_base bug fix has no regression trap -- dispatch test asserts type only, not the corrected api_base value at
tests/unit/deps/test_host_backends.py:357
The fix corrects api_base from '(api/redacted){host}' to 'https://{host}/api/v3'. Existing dispatch test asserts isinstance(backend, GHESBackend) but not backend.host_info.api_base. A future refactor could silently reintroduce the wrong api_base.
Suggested: In TestBackendDispatch.test_dispatch_configured_ghes, add: assert backend.host_info.api_base == f'(git.acme.com/redacted)
Proof (missing):tests/unit/deps/test_host_backends.py::TestBackendDispatch::test_dispatch_configured_ghes-- proves: apm install against a GHES instance uses the correct REST API base URL [portability-by-manifest]
assert backend.host_info.api_base == '(git.acme.com/redacted) # not (api.git.acme.com/redacted) -
[recommended] clone_engine.py PAT->bearer fallback has no unit tests; integration tests cover the flow indirectly at
src/apm_cli/deps/clone_engine.py
CloneEngine.execute contains the ADO bearer in-attempt retry logic. No unit test file for clone_engine.py. Integration tests cover the user promise at integration tier via GitHubPackageDownloader, not CloneEngine directly.
Suggested: Add tests/unit/deps/test_clone_engine.py: mock clone_action to raise on PAT (401 equivalent), assert bearer retry attempted; assert cleanup on final failure.
Proof (missing):tests/unit/deps/test_clone_engine.py::test_ado_bearer_retry_on_pat_failure-- proves: apm install retries with ADO bearer token when initial PAT clone returns 401 [secure-by-default]
assert clone_action.call_count == 2 # first PAT, then bearer retry
This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.
Generated by PR Review Panel for issue #1223 · ● 5.4M · ◷
…r, artifactory_orchestrator + ADO no-org ValueError Address Test Coverage Expert and Python Architect findings on PR #1223: 1. New: tests/unit/deps/test_git_auth_env.py (12 tests) Cover GitAuthEnvBuilder.setup_environment (PAT, bearer, empty), noninteractive_env (pop-then-conditionally-restore fence, suppress_credential_helpers full fence), and subprocess_env_dict (sanitized merge, non-string skip). 2. New: tests/unit/deps/test_git_reference_resolver.py (17 tests) Cover resolve_commit_sha_for_ref (artifactory/ADO short-circuit, 40-char SHA passthrough, tag via commits API, 404, network error, no commits API). Cover list_remote_refs (artifactory empty, success, GitCommandError -> RuntimeError, ADO bearer fallback, noninteractive env path, sanitized error). Cover resolve() artifactory short-circuit paths. 3. New: tests/unit/deps/test_artifactory_orchestrator.py (16 tests) Parametrized ArtifactoryRouter.should_use_proxy across all 4 routing branches. ArtifactoryOrchestrator._resolve_host_prefix raises on missing host/prefix and on no-config. download_package delegation contract (host/prefix/scheme/ref propagated to _HasArchiveDownloader, default ref "main", validation failure raises, archive RuntimeError propagates). 4. ADO no-org silent re-dispatch -> explicit ValueError ADOBackend.build_clone_https_url and build_clone_ssh_url now raise ValueError("ADO dependency is missing ado_organization; cannot construct clone URL") at URL-construction time instead of producing a malformed URL that fails later with a confusing git error. 5. GHES api_base regression guard test_dispatch_ghes_via_github_host_env asserts backend.host_info.api_base == "https://git.acme.com/api/v3" to prevent silent reintroduction of the api.<host> bug. Total: 45 new/updated unit tests. Full unit suite green (7987 pass). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 4 follow-ups: status
Details1.
2.
3.
4. ADO no-org explicit
Verification
|
Resolve conflicts between PR microsoft#1149 (GitLab marketplace host support) and main's recent refactors (microsoft#1223 HostBackend Protocol, microsoft#630 gh CLI fallback). - Add GitLabBackend (kind='gitlab', is_generic=True) embedding PAT via oauth2:token@host using build_gitlab_https_clone_url, registered in _BACKEND_BY_KIND so backend_for routes gitlab.com correctly. - download_strategies.build_repo_url: adopt main's backend dispatch and add a GitLab branch resolving the per-dep PAT through auth_resolver.resolve_for_dep. - github_downloader: keep main's delegating refactor (GitAuthEnvBuilder, CloneEngine, GitReferenceResolver); merge debug log to include has_gitlab_token. - auth.py / token_manager.py docstrings: reflect new chain ordering (env vars -> gh CLI -> credential helper) for GitHub class plus GitLab and Generic entries. - authentication.md: per-host-class chain narrative, table, package source table and mermaid flowchart updated to include gh CLI step. - CHANGELOG: keep both Added entries (GitLab + gh CLI fallback). - Update test_dispatch_generic to use a non-GitLab generic host and add test_dispatch_gitlab covering the new backend dispatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TL;DR
Introduces a
HostBackendProtocol that consolidates per-vendor URL/API construction (GitHub, GHE Cloud, GHES, Azure DevOps, generic git) that was previously scattered acrossgithub_downloader.pyanddownload_strategies.py. Also extracts astamp_plugin_version()helper that was duplicated across twodownload_*methods. Net:+1190 / -126with the bulk being a new module + tests;download_strategies.build_repo_urlcollapses from a ~80-line vendor ladder to ~30 lines.Problem (WHY)
src/apm_cli/deps/github_downloader.pyis a 2,400-line monolith. Every new host vendor (recently: ADO bearer-PAT, GHE Cloud, GitLab in #1149) requires editing 5-8 separate vendor-conditional branches scattered throughbuild_repo_url,_build_contents_api_urls,_resolve_commit_sha_for_ref,_clone_with_fallback,_resolve_dep_token, etc. The duplication is fragile -- e.g., the GHES api_base shape (https://{host}/api/v3) is correct in some places and silently wrong (https://api.{host}) in others.The
python-architectagent identified six extraction opportunities of decreasing leverage. This PR ships the two highest-value ones.Approach (WHAT)
Phase 1: HostBackend Protocol (commit 18c5a3c, 5a8bd9c, bb695be)
New module
src/apm_cli/deps/host_backends.py:HostBackendProtocol (runtime_checkable) with:kind,is_github_family,is_generic,build_clone_https_url,build_clone_ssh_url,build_clone_http_url,build_commits_api_url,build_contents_api_urls.GitHubBackend,GHECloudBackend,GHESBackend,ADOBackend,GenericGitBackend._GitHubFamilyBaseshares URL builders across the 3 GitHub-family concretes via composition.backend_for(dep_ref, auth_resolver)dispatch usingauth_resolver.classify_host(), with two safe short-circuits:dep_ref.is_azure_devops()is True, forceADOBackend(parsed dep shape wins).classify_hostreturns non-HostInfo(e.g., a mocked resolver in unit tests), route by hostname.Refactored call sites:
download_strategies.build_repo_url-- collapsed ~80-line vendor ladder to a single dispatch + ~30 lines of variants.download_strategies._build_contents_api_urls-- now uses backends; also fixes a latent GHES bug where api_base washttps://api.{host}instead ofhttps://{host}/api/v3.github_downloader._resolve_commit_sha_for_ref-- replaced the GitHub/GHE/GHES URL ladder withbackend.build_commits_api_url(...).Phase 3:
stamp_plugin_versionhelper (commit 513ba00)Extract
stamp_plugin_version(package, package_type, resolved_commit, target_path)topackage_validator.py. Replace 2 duplicates (download_packageanddownload_subdirectory_package).Implementation (HOW)
host_info: HostInfo; the Protocol is@runtime_checkablesoisinstance(b, HostBackend)works.dep_ref.host or self.host_info.hostso partial mocks (unit tests) and partial DependencyReference objects both yield correct URLs.build_clone_http_urlraisesValueError(no plain-HTTP path for ADO).build_contents_api_urlsreturns[]-- signal to fall through to the Items REST API.build_commits_api_urlreturnsNonefor ADO/generic, AND for already-resolved 40-char SHAs (caller can skip the API call).token == ""semantics ("explicitly suppress per-instance default", used by TransportSelector for plain-HTTPS / SSH attempts) are preserved end-to-end.tests/unit/deps/test_host_backends.pycovering Protocol conformance, capability flags, every URL builder per backend, and dispatch.stamp_plugin_versioncovering missing package/version/resolved_commit edge cases.Validation evidence
uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/both silent.tests/unit/deps/+tests/test_github_downloader.py+tests/unit/test_generic_git_urls.py-- all pass except 6 failures pre-existing onorigin/main(unrelated; verified by running on bare main earlier in the work). Full unit suite: 8009 passed, 6 pre-existing failures, 2 skipped._build_contents_api_urls, (b)_url_hostnow falls back tohost_info.host. Pre-existing classification scatter in_resolve_dep_token/_resolve_dep_auth_ctx/_clone_with_fallbackis documented as a follow-up below.Trade-offs and deferred follow-ups
for attempt in plan.attemptsretry has CalledProcessError stderr decoding + AzureCliBearerError catch + ImportError catch that don't map cleanly ontoexecute_with_bearer_fallback(primary_op, bearer_op, is_auth_failure). Worth doing as a focused PR with its own tests._download_artifactory_archive); cleaner as a focused move with its own tests.patch.object(GitHubPackageDownloader, '_method')on stub-only methods; removal requires retargeting patches todownload_strategies.DownloadDelegateorgit_remote_ops.download_subdirectory_packagedecomposition): deferred -- 328 lines, multiple variants (cache / shared clone / legacy clone / artifactory). Deserves a dedicated PR._resolve_dep_token,_resolve_dep_auth_ctx,_clone_with_fallbackstill classify locally viais_github_hostname(host)rather than going throughbackend_for(...). Not made worse by this PR, but worth routing through the new abstraction in a follow-up so vendor routing is fully consolidated.How to test
The 6 failures in
tests/test_github_downloader.py(test_resolve_git_reference_commit,test_download_package_*,test_clone_fallback_respects_enterprise_host,test_credential_fill_used_when_no_env_token) reproduce on bareorigin/mainand are not caused by this branch.Update: Additional monolith decomposition (commits 9499264, 1dfc18c)
After review feedback that "7 lines is not a meaningful reduction," the python-architect agent identified four high-leverage extractions targeting the bulk of the monolith. All four are now shipped on this branch.
Cumulative impact
src/apm_cli/deps/github_downloader.py: 2400 -> 1645 lines (-755 lines, -31%)Four new focused modules in
src/apm_cli/deps/:artifactory_orchestrator.pydep_refshapes to the archive downloader; centralizes Artifactory base-URL parsing and the proxy-eligibility predicate.git_reference_resolver.pylist_remote_refs,resolve(branch/tag/commit),resolve_commit_sha_for_ref-- all the ref-resolution arms that used to fan out across the monolith.clone_engine.py_execute_transport_planlifted out. EachTransportAttemptis a Command; the chain is per-attempt ADO bearer fallback then plan-level next-attempt fallback.git_auth_env.pysetup_environment(live env mutation),noninteractive_env,subprocess_env_dict. One coherent place for git auth env construction.Collaboration contracts
Each new module communicates with the host downloader through narrow Protocol types (
_DownloaderContext,_HasArchiveDownloader) rather than depending on the concreteGitHubPackageDownloaderclass. This keeps the dependency direction one-way and lets the modules be unit-tested in isolation against fakes.Test-patch interception preserved
To avoid breaking the substantial existing test suite that patches
apm_cli.deps.github_downloader.git,apm_cli.deps.github_downloader._rich_warning, etc., the monolith keeps re-exportedgitand_rich_warningnames with# noqa: F401. The extracted modules route those specific symbols through thegithub_downloadernamespace so existing patch sites continue to intercept the calls.Verification
uv run pytest tests/unit -q-> 7909 passed (1 deprecation warning, no failures).uv run --extra dev ruff check src/ tests/-> silent.uv run --extra dev ruff format --check src/ tests/-> silent.GitHubPackageDownloaderpublic methods preserved as thin delegation stubs; CLI surface,apm.ymlschema, and lockfile schema unchanged.