Skip to content

refactor(deps): introduce HostBackend Protocol + extract stamp_plugin_version#1223

Merged
danielmeppiel merged 12 commits into
mainfrom
feat/host-backends-protocol
May 9, 2026
Merged

refactor(deps): introduce HostBackend Protocol + extract stamp_plugin_version#1223
danielmeppiel merged 12 commits into
mainfrom
feat/host-backends-protocol

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

@danielmeppiel danielmeppiel commented May 9, 2026

TL;DR

Introduces a HostBackend Protocol that consolidates per-vendor URL/API construction (GitHub, GHE Cloud, GHES, Azure DevOps, generic git) that was previously scattered across github_downloader.py and download_strategies.py. Also extracts a stamp_plugin_version() helper that was duplicated across two download_* methods. Net: +1190 / -126 with the bulk being a new module + tests; download_strategies.build_repo_url collapses from a ~80-line vendor ladder to ~30 lines.

Problem (WHY)

src/apm_cli/deps/github_downloader.py is a 2,400-line monolith. Every new host vendor (recently: ADO bearer-PAT, GHE Cloud, GitLab in #1149) requires editing 5-8 separate vendor-conditional branches scattered through build_repo_url, _build_contents_api_urls, _resolve_commit_sha_for_ref, _clone_with_fallback, _resolve_dep_token, etc. The duplication is fragile -- e.g., the GHES api_base shape (https://{host}/api/v3) is correct in some places and silently wrong (https://api.{host}) in others.

The python-architect agent identified six extraction opportunities of decreasing leverage. This PR ships the two highest-value ones.

Approach (WHAT)

Phase 1: HostBackend Protocol (commit 18c5a3c, 5a8bd9c, bb695be)

New module src/apm_cli/deps/host_backends.py:

  • HostBackend Protocol (runtime_checkable) with: kind, is_github_family, is_generic, build_clone_https_url, build_clone_ssh_url, build_clone_http_url, build_commits_api_url, build_contents_api_urls.
  • 5 frozen-dataclass implementations: GitHubBackend, GHECloudBackend, GHESBackend, ADOBackend, GenericGitBackend.
  • _GitHubFamilyBase shares URL builders across the 3 GitHub-family concretes via composition.
  • backend_for(dep_ref, auth_resolver) dispatch using auth_resolver.classify_host(), with two safe short-circuits:
    • ADO short-circuit: when dep_ref.is_azure_devops() is True, force ADOBackend (parsed dep shape wins).
    • Hostname-based fallback: when classify_host returns non-HostInfo (e.g., a mocked resolver in unit tests), route by hostname.

Refactored call sites:

  • download_strategies.build_repo_url -- collapsed ~80-line vendor ladder to a single dispatch + ~30 lines of variants.
  • download_strategies._build_contents_api_urls -- now uses backends; also fixes a latent GHES bug where api_base was https://api.{host} instead of https://{host}/api/v3.
  • github_downloader._resolve_commit_sha_for_ref -- replaced the GitHub/GHE/GHES URL ladder with backend.build_commits_api_url(...).

Phase 3: stamp_plugin_version helper (commit 513ba00)

Extract stamp_plugin_version(package, package_type, resolved_commit, target_path) to package_validator.py. Replace 2 duplicates (download_package and download_subdirectory_package).

Implementation (HOW)

  • All backends are stateless beyond host_info: HostInfo; the Protocol is @runtime_checkable so isinstance(b, HostBackend) works.
  • Backends prefer dep_ref.host or self.host_info.host so partial mocks (unit tests) and partial DependencyReference objects both yield correct URLs.
  • ADO build_clone_http_url raises ValueError (no plain-HTTP path for ADO).
  • ADO build_contents_api_urls returns [] -- signal to fall through to the Items REST API.
  • build_commits_api_url returns None for ADO/generic, AND for already-resolved 40-char SHAs (caller can skip the API call).
  • token == "" semantics ("explicitly suppress per-instance default", used by TransportSelector for plain-HTTPS / SSH attempts) are preserved end-to-end.
  • 49 unit tests in tests/unit/deps/test_host_backends.py covering Protocol conformance, capability flags, every URL builder per backend, and dispatch.
  • 7 unit tests for stamp_plugin_version covering missing package/version/resolved_commit edge cases.

Validation evidence

  • Lint: uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ both silent.
  • Tests: tests/unit/deps/ + tests/test_github_downloader.py + tests/unit/test_generic_git_urls.py -- all pass except 6 failures pre-existing on origin/main (unrelated; verified by running on bare main earlier in the work). Full unit suite: 8009 passed, 6 pre-existing failures, 2 skipped.
  • Rubber-duck critique: routed through and addressed: (a) GHES api_base bug fixed in _build_contents_api_urls, (b) _url_host now falls back to host_info.host. Pre-existing classification scatter in _resolve_dep_token/_resolve_dep_auth_ctx/_clone_with_fallback is documented as a follow-up below.

Trade-offs and deferred follow-ups

  • Phase 2 (bearer-retry combinator unification): deferred -- existing inline for attempt in plan.attempts retry has CalledProcessError stderr decoding + AzureCliBearerError catch + ImportError catch that don't map cleanly onto execute_with_bearer_fallback(primary_op, bearer_op, is_auth_failure). Worth doing as a focused PR with its own tests.
  • Phase 4 (Artifactory orchestration -> DownloadDelegate): deferred -- the helpers depend on other downloader internals (_download_artifactory_archive); cleaner as a focused move with its own tests.
  • Phase 5 (back-compat stub removal): deferred -- many tests use patch.object(GitHubPackageDownloader, '_method') on stub-only methods; removal requires retargeting patches to download_strategies.DownloadDelegate or git_remote_ops.
  • Phase 6 (download_subdirectory_package decomposition): deferred -- 328 lines, multiple variants (cache / shared clone / legacy clone / artifactory). Deserves a dedicated PR.
  • Pre-existing classification scatter: _resolve_dep_token, _resolve_dep_auth_ctx, _clone_with_fallback still classify locally via is_github_hostname(host) rather than going through backend_for(...). Not made worse by this PR, but worth routing through the new abstraction in a follow-up so vendor routing is fully consolidated.

How to test

git checkout feat/host-backends-protocol
uv sync --extra dev
uv run --extra dev ruff check src/ tests/
uv run --extra dev ruff format --check src/ tests/
uv run pytest tests/unit/deps/test_host_backends.py tests/unit/deps/test_stamp_plugin_version.py -v
uv run pytest tests/unit/ tests/test_github_downloader.py

The 6 failures in tests/test_github_downloader.py (test_resolve_git_reference_commit, test_download_package_*, test_clone_fallback_respects_enterprise_host, test_credential_fill_used_when_no_env_token) reproduce on bare origin/main and are not caused by this branch.

Update: Additional monolith decomposition (commits 9499264, 1dfc18c)

After review feedback that "7 lines is not a meaningful reduction," the python-architect agent identified four high-leverage extractions targeting the bulk of the monolith. All four are now shipped on this branch.

Cumulative impact

src/apm_cli/deps/github_downloader.py: 2400 -> 1645 lines (-755 lines, -31%)

Four new focused modules in src/apm_cli/deps/:

Module Lines Pattern Responsibility
artifactory_orchestrator.py ~330 Adapter + Facade Routes heterogeneous dep_ref shapes to the archive downloader; centralizes Artifactory base-URL parsing and the proxy-eligibility predicate.
git_reference_resolver.py ~440 Strategy + Facade list_remote_refs, resolve (branch/tag/commit), resolve_commit_sha_for_ref -- all the ref-resolution arms that used to fan out across the monolith.
clone_engine.py ~330 Command + Chain-of-responsibility _execute_transport_plan lifted out. Each TransportAttempt is a Command; the chain is per-attempt ADO bearer fallback then plan-level next-attempt fallback.
git_auth_env.py ~150 Builder setup_environment (live env mutation), noninteractive_env, subprocess_env_dict. One coherent place for git auth env construction.

Collaboration contracts

Each new module communicates with the host downloader through narrow Protocol types (_DownloaderContext, _HasArchiveDownloader) rather than depending on the concrete GitHubPackageDownloader class. This keeps the dependency direction one-way and lets the modules be unit-tested in isolation against fakes.

Test-patch interception preserved

To avoid breaking the substantial existing test suite that patches apm_cli.deps.github_downloader.git, apm_cli.deps.github_downloader._rich_warning, etc., the monolith keeps re-exported git and _rich_warning names with # noqa: F401. The extracted modules route those specific symbols through the github_downloader namespace so existing patch sites continue to intercept the calls.

Verification

  • uv run pytest tests/unit -q -> 7909 passed (1 deprecation warning, no failures).
  • uv run --extra dev ruff check src/ tests/ -> silent.
  • uv run --extra dev ruff format --check src/ tests/ -> silent.
  • All GitHubPackageDownloader public methods preserved as thin delegation stubs; CLI surface, apm.yml schema, and lockfile schema unchanged.

Daniel Meppiel and others added 4 commits May 9, 2026 10:38
…struction

Extract the vendor-conditional 'if is_github / elif is_ado / else generic'
ladders out of DownloadDelegate.build_repo_url and the related download
helpers into a typed Protocol with one backend per host kind. Backends are
stateless frozen dataclasses dispatched via auth_resolver.classify_host;
adding a new vendor is one new dataclass + one entry in _BACKEND_BY_KIND
instead of new branches across 8+ call sites.

- New module src/apm_cli/deps/host_backends.py:
  - HostBackend Protocol (runtime-checkable)
  - GitHubBackend, GHECloudBackend, GHESBackend (share _GitHubFamilyBase)
  - ADOBackend (org/project/repo URLs, bearer-via-env, no plain HTTP)
  - GenericGitBackend (GitLab/Gitea/Gogs/Bitbucket; never embeds tokens)
  - backend_for(dep_ref, auth_resolver) and backend_for_host(host, ...)
- DownloadDelegate.build_repo_url collapses 80-line ladder to ~30 lines
  routed through backend_for(); preserves token='' suppression semantics
  and the ado_organization fallthrough case.
- DownloadDelegate._build_contents_api_urls reuses backends rather than
  inline string templating.
- New unit tests tests/unit/deps/test_host_backends.py: 49 cases covering
  Protocol conformance, capability flags, clone URL building (https/ssh/
  http with port + token + bearer permutations), Contents API URLs, and
  dispatch for all five host kinds.

No public-API changes. No CLI / lockfile / apm.yml schema changes.
This is the seam every later phase exploits (bearer retry combinator,
plugin version-stamp helper, Artifactory delegate, subdir decomposition).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Plugin packages produced by the marketplace builder ship a synthesized
apm.yml with version='0.0.0'. Both download_package and download_subdirectory_package
contained the identical 13-line block that:
  - checked package_type == MARKETPLACE_PLUGIN AND version == '0.0.0'
  - copied resolved_commit[:7] onto the package
  - reloaded apm.yml, set version, dumped it back

Lift the duplicated block into stamp_plugin_version() in package_validator
and call it from both sites. The helper is idempotent and safe under all
combinations (no commit, missing apm.yml, non-plugin types, version already
set, package=None).

New tests/unit/deps/test_stamp_plugin_version.py covers all branches.

No public-API change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the inline GitHub/GHES/GHEC API URL ladder in
_resolve_commit_sha_for_ref with backend.build_commits_api_url(...).

Adds a hostname-based fallback in backend_for() so unit tests that
mock AuthResolver.classify_host (returning a MagicMock) still get
the right backend for github.com / *.ghe.com / GHES hosts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Configured GHES via GITHUB_HOST=<custom-host> uses
https://{host}/api/v3 (not https://api.{host}, which is GHE Cloud
shape). Distinguish between *.ghe.com and other GitHub-classified
hosts when constructing the synthetic HostInfo.

Also: `_url_host` falls back to `host_info.host` so partially
constructed DependencyReference objects (without `host`) still
yield well-formed clone URLs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 9, 2026 08:54
Comment thread tests/unit/deps/test_host_backends.py Fixed
Comment thread tests/unit/deps/test_host_backends.py Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors dependency host/vendor URL construction by introducing a HostBackend Protocol with concrete backends (GitHub/GHE Cloud/GHES/ADO/Generic), and deduplicates marketplace plugin version stamping logic into a shared helper.

Changes:

  • Added src/apm_cli/deps/host_backends.py with backend dispatch + per-vendor URL/API builders, and refactored call sites to use it.
  • Extracted stamp_plugin_version() into package_validator.py and replaced duplicated inline logic in github_downloader.py.
  • Added unit tests for both the new host backends and the version-stamping helper.
Show a summary per file
File Description
src/apm_cli/deps/host_backends.py New HostBackend Protocol + concrete backends + dispatch used to centralize host-specific URL construction.
src/apm_cli/deps/download_strategies.py Refactors repo URL building and Contents-API URL construction to use the new backend abstraction.
src/apm_cli/deps/github_downloader.py Uses host backends for commit-SHA resolution and calls the extracted version-stamping helper.
src/apm_cli/deps/package_validator.py Adds stamp_plugin_version() helper to keep plugin version stamping logic in one place.
tests/unit/deps/test_host_backends.py New unit tests covering backend URL builders and dispatch behavior.
tests/unit/deps/test_stamp_plugin_version.py New unit tests covering stamp_plugin_version() edge cases.

Copilot's findings

Comments suppressed due to low confidence (9)

tests/unit/deps/test_host_backends.py:144

  • These URL assertions use substring checks on a URL string (e.g. '"github.com" in url'). The repo's test conventions require parsing URLs with urllib.parse.urlparse and asserting on components to avoid CodeQL 'py/incomplete-url-substring-sanitization' findings. Refactor these checks to compare parsed hostname/netloc/path instead of substring membership.
    def test_https_empty_token_suppresses_credential(self):
        backend = GitHubBackend(host_info=_info("github.com", "github"))
        url = backend.build_clone_https_url(_dep_ref(), token="")
        assert "@" not in url
        assert "github.com" in url

tests/unit/deps/test_host_backends.py:170

  • This asserts a port via substring membership (':8443' in url) on an https:// URL. Per repo test conventions, parse with urllib.parse.urlparse and assert parsed.port == 8443 (and/or hostname) instead of substring checks to avoid CodeQL URL-substring findings.
    def test_https_with_custom_port(self):
        backend = GitHubBackend(host_info=_info("github.com", "github"))
        url = backend.build_clone_https_url(_dep_ref(port=8443), token=None)
        assert ":8443" in url

tests/unit/deps/test_host_backends.py:175

  • These checks use startswith/endswith on a URL string. To comply with the repo's URL-assertion convention (and avoid CodeQL substring findings), parse with urllib.parse.urlparse and assert on scheme/path components (e.g. scheme == 'http', path endswith '.git').
    def test_http_insecure(self):
        backend = GitHubBackend(host_info=_info("github.com", "github"))
        url = backend.build_clone_http_url(_dep_ref())
        assert url.startswith("http://")
        assert url.endswith(".git")

tests/unit/deps/test_host_backends.py:239

  • These tests assert URL properties via substring membership on the clone URL ('token in url', 'myorg in url', etc.). Repo test conventions require parsing URLs and asserting on parsed components (netloc/path/query) rather than substring membership to avoid CodeQL URL-substring findings.
    def test_https_with_pat(self):
        backend = ADOBackend(host_info=_info("dev.azure.com", "ado"))
        url = backend.build_clone_https_url(self._ado_dep(), token="ado_pat_xyz")
        # ADO embeds the PAT as basic auth.
        assert "ado_pat_xyz" in url
        assert "myorg" in url
        assert "myproj" in url

tests/unit/deps/test_host_backends.py:300

  • This asserts the SSH port via substring membership on an 'ssh://' URL. Per repo test conventions, use urllib.parse.urlparse and assert parsed.port == 7999 (and parsed.scheme == 'ssh') instead of substring checks.
    def test_ssh_url_with_port(self):
        backend = GenericGitBackend(host_info=_info("bitbucket.example.com", "generic", port=7999))
        url = backend.build_clone_ssh_url(_dep_ref(host="bitbucket.example.com", port=7999))
        # Custom SSH port should be threaded through the URL.
        assert "7999" in url

tests/unit/deps/test_host_backends.py:323

  • These assertions check URL paths/queries via substring and endswith. To satisfy the repo's URL-assertion convention, parse each URL with urllib.parse.urlparse and assert on parsed.path and parsed.query components instead of substring membership.
    def test_contents_api_v1_then_v3(self):
        backend = GenericGitBackend(host_info=_info("gitea.example.com", "generic"))
        urls = backend.build_contents_api_urls("o", "r", "f.md", "main")
        assert len(urls) == 2
        # v1 first (Gitea standard), v3 fallback (legacy).
        assert "/api/v1/" in urls[0]
        assert "/api/v3/" in urls[1]
        assert urls[0].endswith("?ref=main")
        assert urls[1].endswith("?ref=main")

tests/unit/deps/test_host_backends.py:150

  • These assertions check token presence/absence via substring membership on a URL string. To comply with repo URL-assertion conventions and avoid CodeQL substring findings, parse the URL (urllib.parse.urlparse) and assert on parsed.netloc / parsed.username / parsed.hostname instead of using 'in'/'not in' against the full URL.
    def test_https_bearer_scheme_does_not_embed_token(self):
        backend = GitHubBackend(host_info=_info("github.com", "github"))
        # Bearer is ADO-only; GitHub family should fall through to plain URL.
        url = backend.build_clone_https_url(_dep_ref(), token="ghp_abc", auth_scheme="bearer")
        assert "ghp_abc" not in url
        assert "@" not in url

tests/unit/deps/test_host_backends.py:249

  • These assertions validate URL properties via substring membership on an https:// URL ('bearer_jwt' not in url, 'myorg' in url). Per repo test conventions, parse with urllib.parse.urlparse and assert on components (hostname/path/netloc) rather than substring membership to avoid CodeQL URL-substring findings.
    def test_https_bearer_scheme_drops_token(self):
        backend = ADOBackend(host_info=_info("dev.azure.com", "ado"))
        url = backend.build_clone_https_url(
            self._ado_dep(), token="bearer_jwt", auth_scheme="bearer"
        )
        # Bearer goes via env, NOT in URL.
        assert "bearer_jwt" not in url
        # Still a valid ADO URL.
        assert "myorg" in url

tests/unit/deps/test_host_backends.py:289

  • These URL checks use substring membership on a URL string ('gitea.example.com' in url, token not in url). Repo test conventions require parsing URLs with urllib.parse.urlparse and asserting on parsed components (hostname/netloc/path) instead of substring checks to avoid CodeQL URL-substring findings.
    def test_https_never_embeds_token(self):
        backend = GenericGitBackend(host_info=_info("gitea.example.com", "generic"))
        # Even when a token is passed, generic hosts defer to credential
        # helpers -- we never embed the token in the URL.
        url = backend.build_clone_https_url(_dep_ref(host="gitea.example.com"), token="some_token")
        assert "some_token" not in url
        assert "@" not in url
        assert "gitea.example.com" in url

  • Files reviewed: 6/6 changed files
  • Comments generated: 7

Comment thread src/apm_cli/deps/host_backends.py Outdated
Comment thread src/apm_cli/deps/download_strategies.py Outdated
Comment thread tests/unit/deps/test_host_backends.py
Comment thread src/apm_cli/deps/host_backends.py Outdated
Comment thread src/apm_cli/deps/download_strategies.py Outdated
Comment thread src/apm_cli/deps/host_backends.py Outdated
Comment thread src/apm_cli/deps/package_validator.py
Daniel Meppiel and others added 6 commits May 9, 2026 15:43
CodeQL flagged `assert "github.com" in url` / `assert
"gitea.example.com" in url` as incomplete URL substring
sanitization (an attacker URL like `evil.com/?fake=github.com`
would pass). Switch to `urlparse(url).hostname == ...` per the
repo test convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two extractions reduce github_downloader.py from 2400 to 1926 lines:

A. Artifactory orchestration -> ArtifactoryOrchestrator (Adapter+Facade)
   - New module: src/apm_cli/deps/artifactory_orchestrator.py
   - ArtifactoryRouter: pure decision logic (is_registry_only, should_use_proxy,
     parse_proxy_config) -- Adapter pattern over heterogeneous dep_ref shapes.
   - ArtifactoryOrchestrator: download flow over a _HasArchiveDownloader
     Protocol -- Facade with explicit collaborator boundary.
   - 5 methods on the downloader become 3-7 line delegation stubs.

B. Git reference resolution -> GitReferenceResolver (Strategy+Facade)
   - New module: src/apm_cli/deps/git_reference_resolver.py
   - resolve_git_reference / list_remote_refs / _resolve_commit_sha_for_ref
     extracted (~370 lines). Resolver collaborates with the downloader via a
     duck-typed _DownloaderContext Protocol -- mirrors the existing
     DownloadDelegate(host=self) pattern.
   - 3 methods on the downloader become 1-3 line delegation stubs.

Stubs are intentionally retained on GitHubPackageDownloader because
tests/unit/test_artifactory_support.py and other suites access them
directly (self.downloader._method); breaking that contract is out of
scope for a pure refactor.

Test sweep: only the 6 pre-existing baseline failures remain; no new
failures introduced. Ruff lint+format clean across src/ and tests/.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Continues the github_downloader.py decomposition (extractions C+D of 4):

C. Clone engine -> Command + chain-of-responsibility
   New: src/apm_cli/deps/clone_engine.py (~330L)
   Moves _execute_transport_plan and the per-attempt ADO bearer-fallback
   loop out of the monolith. The plan becomes a list of Command objects
   (TransportAttempt) and the chain is the per-attempt retry then
   plan-level next-attempt fallback. The engine reads transport selector,
   protocol pref, allow-fallback flag, and the fallback-port-warned dedup
   set dynamically from the host so callers (and tests) can mutate them
   after construction.

D. Git auth environment -> Builder
   New: src/apm_cli/deps/git_auth_env.py (~150L)
   Moves _git_env_dict, _setup_git_environment, and
   _build_noninteractive_git_env into a focused GitAuthEnvBuilder.

Test-patch interception preserved:
   github_downloader.py keeps re-exported 'git' and '_rich_warning'
   names with noqa F401 so existing test patch sites keep working.
   git_reference_resolver.py and clone_engine.py route through the
   github_downloader namespace for those two names.

Cumulative reduction across all 4 extractions (A, B, C, D):
   github_downloader.py: 2400 -> 1645 lines (-755 lines, -31%).

   A. ArtifactoryRouter + ArtifactoryOrchestrator
      Adapter + Facade.
   B. GitReferenceResolver
      Strategy + Facade with Protocol-based collaboration.
   C. CloneEngine (this commit)
      Command + chain-of-responsibility.
   D. GitAuthEnvBuilder (this commit)
      Builder.

All extractions communicate with the host via narrow Protocol types
(_DownloaderContext / _HasArchiveDownloader) rather than bidirectional
dependencies on the GitHubPackageDownloader concrete class.

Verification:
   uv run pytest tests/unit -q  # 7909 passed
   uv run --extra dev ruff check src/ tests/  # silent
   uv run --extra dev ruff format --check src/ tests/  # silent

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PAT->bearer fallback loop that was previously exempted in
github_downloader._execute_transport_plan was moved to
deps/clone_engine.CloneEngine.execute as part of the monolith
decomposition (commit 1dfc18c). The exemption rationale is unchanged --
the loop wraps a stateful clone_action that mutates target_path, so
collapsing onto AuthResolver.execute_with_bearer_fallback is non-trivial
and remains tracked as a follow-up to #1212.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dator

Real bugs (Copilot reviewer comments on PR #1223):

1. host_backends.backend_for fallback dispatch built api_base for
   *.ghe.com as 'https://api.{host}', which disagrees with
   AuthResolver.classify_host (which uses 'https://{host}/api/v3').
   This silently produced wrong Contents/Commits API URLs whenever
   classify_host returned a non-HostInfo value (the typical mocked-
   resolver shape in unit tests). Aligned with classify_host.

2. download_strategies._build_contents_api_urls had the same drift on
   the *.ghe.com branch. Fixed to match the canonical classification.

3. host_backends.HostBackend.build_clone_http_url docstring claimed
   '*.ghe.com reject HTTP and will raise', but only ADO actually raises
   in the implementation. Updated docstring to describe real behaviour.

4. host_backends module docstring claimed 'no class hierarchy, no
   inheritance' but the GitHub-family backends share _GitHubFamilyBase.
   Reworded to describe the actual structure.

5. package_validator.stamp_plugin_version was missing type hints. Added
   APMPackage | None, PackageType | None, Path; introduced
   from __future__ import annotations and a TYPE_CHECKING import for
   PackageType to keep the lazy runtime import.

Encoding cleanups:
6. Replaced em dashes (U+2014) with ASCII '--' across host_backends.py,
   download_strategies.py, and tests/unit/deps/test_host_backends.py
   to comply with the printable-ASCII repository rule.

CodeQL false-positive remediation:
7. tests/unit/deps/test_host_backends.py asserted '@' not in url as a
   credential-leak guard. CodeQL flagged the pattern; switched to
   urlparse(url).username/password is None which is both stronger and
   no longer trips the substring-sanitization rule.

Verification:
   uv run pytest tests/unit -q  # 7909 passed
   uv run --extra dev ruff check src/ tests/  # silent
   uv run --extra dev ruff format --check src/ tests/  # silent
   bash scripts/lint-auth-signals.sh  # auth-signal lint clean

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label May 9, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

APM Review Panel: ship_with_followups

Extracts HostBackend Protocol + 4 focused modules from a 2400-line monolith (-31% LOC), making multi-vendor Git hosting a first-class extension point with no auth regressions.

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

All six active panelists converge on the same reading: the architectural direction is correct and the implementation is clean. The HostBackend Protocol plus the _BACKEND_BY_KIND dispatch dict is a genuine decomposition win -- not a cosmetic rename -- and the 31% reduction in github_downloader.py is evidence of real coupling being broken. Auth-expert and supply-chain-security-expert both reviewed the refactor for token leakage and bypass regressions and found none. The GHES api_base fix is unambiguously correct. No panelist raised a blocking finding.

The most consequential gap is test coverage on the three newly extracted modules. git_reference_resolver.py (~440 lines), git_auth_env.py (~150 lines), and artifactory_orchestrator.py (~319 lines) all have zero test coverage at any tier. These are not trivial utility extractions: git_reference_resolver handles SHA resolution and commit API fallback chains (tagged portability-by-manifest and secure-by-default by test-coverage-expert); git_auth_env constructs the GIT_ASKPASS / credential env dict for every clone attempt (secure-by-default); artifactory_orchestrator contains 4-branch routing logic (governed-by-policy). Missing tests on secure-by-default and governed-by-policy surfaces rank above recommended-tier opinion findings per panel weighting rules. These three follow-ups are the highest-priority post-merge items. The GHES api_base regression trap (assert api_base value, not just isinstance) is a one-line add to an existing test and should be treated as equally urgent since the underlying bug was real. The ADO no-org silent re-dispatch (python-architect) is also a latent user-facing bug trap: a malformed ADO dep_ref silently produces a GitHub URL and fails at clone time with a misleading git error -- replacing it with an explicit ValueError costs one line and prevents a class of confusing support reports.

Two recommended findings overlap across panelists: auth-expert and supply-chain-security-expert independently flagged the lint-auth-signals.sh exempt list expansion for clone_engine.py without a tracking issue. The signals are identical; a single follow-up issue with an inline comment reference resolves both. The oss-growth-hacker's signal on CONTRIBUTING.md and the host_backends.py module docstring is strategically sound: the HostBackend extension seam is exactly what a GitLab or Gitea contributor needs to find, and a one-paragraph 'Adding a new vendor backend' note converts curious readers into actual contributors. This is a low-cost, high-leverage addition that should accompany the PR or land immediately after.

Dissent. No substantive disagreements between panelists. The devx-ux-expert and supply-chain-security-expert both flagged the unknown-host silent fallback to GenericGitBackend from slightly different angles (UX confusion vs. security classification bypass); these are complementary, not contradictory, and the recommended fix (emit a warning in the else branch of backend_for()) satisfies both simultaneously.

Aligned with: Portable by manifest (GHES api_base fix corrects URL construction; HostBackend makes manifest-to-host resolution explicit and testable; regression trap still missing); Secure by default (no token leakage regressions found; git_auth_env.py credential env builder has zero test coverage -- highest-priority secure-by-default gap); Governed by policy (artifactory_orchestrator.py 4-branch routing has zero tests; ArtifactoryRouter.should_use_proxy parametric tests needed); Multi-harness / multi-host (HostBackend Protocol with _BACKEND_BY_KIND is the canonical multi-host extension point; GenericGitBackend handles GitLab/Bitbucket today); OSS community-driven (Protocol + registry creates a clear vendor extension seam; CONTRIBUTING.md pointer to host_backends.py would activate it for external contributors).

Growth signal. The 31% LOC reduction in github_downloader.py is a repostable headline: 'We cut our biggest internal file by a third to make multi-vendor Git hosting pluggable.' The HostBackend Protocol and _BACKEND_BY_KIND registry are now the canonical seam for a GitLab or Bitbucket contributor -- GenericGitBackend already handles these hosts. Growth actions: (1) add 'Adding a new vendor backend' to CONTRIBUTING.md and the host_backends.py module docstring in this PR or immediately after; (2) open a 'GitLab first-class backend' issue as a concrete contribution target; (3) use the multi-host + monolith-decomposition story as the lead beat in the next release post.

Panel summary

Persona B R N Takeaway
Python Architect 0 2 1 Strong decomposition; _DownloaderContext protocol is wide (6 methods), ADO no-org silent re-dispatch is a latent bug trap, F401 re-export block is acceptable transit scaffolding.
CLI Logging Expert 0 0 1 Logging architecture preserved correctly across extraction; double-indirection shim for _rich_warning is sound and test-compatible; one nit on implicit coupling documentation.
DevX UX Expert 0 1 2 Backend refactor; CLI surface unchanged. One silent-fallback gap hurts unknown-host DevX; two minor error message nits.
Supply Chain Security Expert 0 2 2 GHES api_base fix closes URL confusion bug; fallback path and ADO short-circuit are safe; one recommended: clone_engine.py exempt list expansion needs a ticket anchor; no blocking findings.
OSS Growth Hacker 0 1 1 Strong contributor-readiness gain; GenericGitBackend is the GitLab on-ramp; one discoverability gap: no CONTRIBUTING pointer to host_backends.py for new vendor authors.
Auth Expert 0 2 1 Clean refactor; no auth bypasses or token leakage regressions. Two recommended improvements around ADO short-circuit HostInfo kind consistency and the growing exempt list.
Test Coverage Expert 0 5 0 Three new modules (git_reference_resolver, git_auth_env, artifactory_orchestrator) have zero test coverage; GHES api_base regression trap is missing; clone_engine unit coverage absent (integration tests cover the flow indirectly).

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Test Coverage Expert] Add unit tests for git_auth_env.py (PAT path sets GIT_ASKPASS; bearer path sets Authorization header; empty-token produces sanitized env) -- 150-line auth env builder that constructs credentials for every clone attempt has zero coverage. Missing tests on a secure-by-default surface rank above all recommended opinion findings. Silent drift here passes a wrong env dict to the git subprocess with no test failing.
  2. [Test Coverage Expert] Add unit tests for git_reference_resolver.py (resolve_ref for tag/branch/SHA; fallback chain; 404 handling; ADO short-circuit) -- 440-line module handling SHA resolution and commit API fallback chains has zero coverage at any tier. Tagged portability-by-manifest and secure-by-default. If fallback logic silently drifts, apm install resolves tags non-deterministically with no automated signal.
  3. [Test Coverage Expert] Add unit tests for artifactory_orchestrator.py (parametrize should_use_proxy; test _resolve_host_prefix raises; test download_package delegation) and add GHES api_base value assertion to TestBackendDispatch.test_dispatch_configured_ghes -- 319-line orchestrator with 4-branch routing has zero coverage (governed-by-policy surface). The GHES api_base bug was real and fixed, but the existing dispatch test asserts isinstance only -- a one-line assertion prevents silent reintroduction.
  4. [Python Architect] Replace ADO no-org silent re-dispatch with explicit ValueError('ADO dependency is missing ado_organization; cannot construct clone URL') -- A malformed ADO dep_ref currently produces a GitHub-style URL and fails at clone time with a confusing git error. An explicit ValueError at URL-construction time gives users an actionable message and prevents a class of misleading support reports.
  5. [Auth Expert] Add tracking issue reference to lint-auth-signals.sh exempt list entry for clone_engine.py; consider scoping exemption to the specific function -- Auth-expert and supply-chain-security-expert independently flagged this. The lint-auth-signals linter is the only automated guardrail preventing open-coded PAT/bearer from spreading; an unanchored exemption can persist indefinitely. One inline comment with an issue number closes both findings.

Architecture

classDiagram
    direction LR

    class HostBackend {
        <<Protocol>>
        +host_info: HostInfo
        +kind: str
        +is_github_family: bool
        +is_generic: bool
        +build_clone_https_url(dep_ref, token, auth_scheme) str
        +build_clone_ssh_url(dep_ref) str
        +build_clone_http_url(dep_ref) str
        +build_commits_api_url(dep_ref, ref) str|None
        +build_contents_api_urls(owner, repo, file_path, ref) list
    }

    class _GitHubFamilyBase {
        <<SharedBase>>
        +host_info: HostInfo
        +build_clone_https_url()
        +build_clone_ssh_url()
        +build_commits_api_url()
        +build_contents_api_urls()
    }

    class GitHubBackend {
        <<ConcreteStrategy>>
        +kind = "github"
    }
    class GHECloudBackend {
        <<ConcreteStrategy>>
        +kind = "ghe_cloud"
    }
    class GHESBackend {
        <<ConcreteStrategy>>
        +kind = "ghes"
    }
    class ADOBackend {
        <<ConcreteStrategy>>
        +kind = "ado"
        +build_clone_https_url()
        +build_clone_http_url() ValueError
        +build_contents_api_urls() list
    }
    class GenericGitBackend {
        <<ConcreteStrategy>>
        +kind = "generic"
        +build_contents_api_urls() list
    }

    class HostInfo {
        <<ValueObject>>
        +host: str
        +kind: str
        +api_base: str
        +has_public_repos: bool
        +port: int|None
    }

    class AuthResolver {
        <<Strategy>>
        +classify_host(host, port) HostInfo
    }

    class _DownloaderContext {
        <<Protocol>>
        +auth_resolver: AuthResolver
        +git_env: dict
        +has_ado_token: bool
        +_resolve_dep_token()
        +_resolve_dep_auth_ctx()
        +_build_noninteractive_git_env()
        +_build_repo_url()
        +_sanitize_git_error()
    }

    class CloneEngine {
        <<Command+ChainOfResponsibility>>
        +execute(dep_ref, target_path, clone_action)
    }

    class ArtifactoryOrchestrator {
        <<Facade>>
        +download_package()
        +download_subdirectory()
    }

    class ArtifactoryRouter {
        <<Adapter>>
        +should_use_proxy(dep_ref) bool
        +parse_proxy_config() tuple
    }

    class _HasArchiveDownloader {
        <<Protocol>>
        +download_artifactory_archive()
    }

    class GitHubPackageDownloader {
        +auth_resolver: AuthResolver
        +git_env: dict
        +has_ado_token: bool
    }

    _GitHubFamilyBase <|-- GitHubBackend
    _GitHubFamilyBase <|-- GHECloudBackend
    _GitHubFamilyBase <|-- GHESBackend
    GitHubBackend ..|> HostBackend
    GHECloudBackend ..|> HostBackend
    GHESBackend ..|> HostBackend
    ADOBackend ..|> HostBackend
    GenericGitBackend ..|> HostBackend

    HostBackend ..> HostInfo : carries
    AuthResolver ..> HostInfo : returns

    CloneEngine ..> _DownloaderContext : injected
    GitHubPackageDownloader ..|> _DownloaderContext
    ArtifactoryOrchestrator ..> _HasArchiveDownloader : injected

    note for HostBackend "Strategy via Protocol + dispatch dict. backend_for() is the dispatcher."
    note for _DownloaderContext "WIDE: 8 attrs/methods. Recommend splitting into _HostConfig + _URLBuilder."

    class HostBackend:::touched
    class _GitHubFamilyBase:::touched
    class GitHubBackend:::touched
    class GHECloudBackend:::touched
    class GHESBackend:::touched
    class ADOBackend:::touched
    class GenericGitBackend:::touched
    class CloneEngine:::touched
    class _DownloaderContext:::touched
    class ArtifactoryOrchestrator:::touched
    class ArtifactoryRouter:::touched
    class _HasArchiveDownloader:::touched

    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A(["apm install / apm run"]) --> B["GitHubPackageDownloader._download_package\nSRC: github_downloader.py"]

    B --> C{"ArtifactoryRouter.should_use_proxy\nSRC: artifactory_orchestrator.py"}
    C -- yes --> D["ArtifactoryOrchestrator.download_package\nSRC: artifactory_orchestrator.py"]
    C -- no --> E["backend_for(dep_ref, auth_resolver)\nSRC: host_backends.py"]

    E --> F["AuthResolver.classify_host(host)\nSRC: core/auth.py"]
    F --> G(["HostInfo (ValueObject)"])
    G --> H{"_BACKEND_BY_KIND dispatch"}
    H -- github/ghe/ghes --> I["GitHubBackend / GHECloudBackend / GHESBackend\nSRC: host_backends.py"]
    H -- ado --> J["ADOBackend\nSRC: host_backends.py"]
    H -- generic/fallback --> K["GenericGitBackend\nSRC: host_backends.py"]

    I & J & K --> L["DownloadDelegate.build_repo_url\nSRC: download_strategies.py"]
    L --> M["backend.build_clone_https_url / ssh / http\nSRC: host_backends.py"]

    M --> N["CloneEngine.execute(dep_ref, target_path, clone_action)\nSRC: clone_engine.py"]
    N --> O["GitAuthEnv.build_env\nSRC: git_auth_env.py"]
    N --> P{"TransportPlan attempt loop"}
    P -- auth failure --> Q["ADO bearer in-attempt retry"]
    P -- plan exhausted --> R(["AggregatedCloneError"])
    P -- success --> S(["[FS] target_path populated"])

    D --> T["[NET] download_artifactory_archive"]
    T --> U["[FS] tempfile.mkdtemp extraction"]
    U --> V["stamp_plugin_version\nSRC: package_validator.py"]
    V --> S

    style I fill:#fff3b0,stroke:#d47600
    style J fill:#fff3b0,stroke:#d47600
    style K fill:#fff3b0,stroke:#d47600
    style L fill:#fff3b0,stroke:#d47600
    style M fill:#fff3b0,stroke:#d47600
    style N fill:#fff3b0,stroke:#d47600
    style O fill:#fff3b0,stroke:#d47600
    style D fill:#fff3b0,stroke:#d47600
    style V fill:#fff3b0,stroke:#d47600
    style E fill:#fff3b0,stroke:#d47600
Loading

Recommendation

Merge this PR. The architectural direction is correct, the auth refactor is clean, and no panelist found a blocking issue. The five follow-ups above should be tracked as a single 'post-extraction test coverage' issue opened before or immediately after merge -- particularly the three zero-coverage modules (git_auth_env, git_reference_resolver, artifactory_orchestrator) which represent the highest-signal gap: production code on secure-by-default and governed-by-policy surfaces with no automated regression protection. The ADO no-org silent re-dispatch fix is a one-liner that is worth landing in this PR if the author is willing; otherwise it should be the first item in the follow-up issue.


Full per-persona findings

Python Architect

  • [recommended] _DownloaderContext protocol is too wide to be genuinely narrow at src/apm_cli/deps/clone_engine.py:69
    The protocol declared in clone_engine.py exposes 6 methods (auth_resolver, git_env, has_ado_token, _resolve_dep_token, _resolve_dep_auth_ctx, _build_noninteractive_git_env, _build_repo_url, _sanitize_git_error). A truly narrow protocol should expose only what the consumer strictly needs; 6 method slots means CloneEngine is still tightly coupled to GitHubPackageDownloader's internal API surface.
    Suggested: Split _DownloaderContext into two narrower protocols: _HostConfig (3 attrs + 1 env builder) and _URLBuilder (1 method). Pass both to CloneEngine.init so each can be stubbed independently in tests.

  • [recommended] ADO no-org silent re-dispatch in build_repo_url can silently mis-route ADO deps at src/apm_cli/deps/download_strategies.py:248
    When an ADO-classified dep_ref has no ado_organization, the code silently calls backend_for(None, ...) which returns a non-ADO backend. A malformed ADO dep_ref will silently produce a GitHub-style HTTPS URL and fail at clone time with a confusing git error.
    Suggested: Replace the silent re-dispatch with an explicit raise ValueError('ADO dependency is missing ado_organization; cannot construct clone URL').

  • [nit] 18 # noqa: F401 re-exports in github_downloader.py should carry a deprecation timeline at src/apm_cli/deps/github_downloader.py:16
    Without a deprecation marker or follow-up issue reference, the block will persist indefinitely and is a regression risk if future tests are written against the old namespace.
    Suggested: Add '# TRANSITIONAL: remove after all test patches are updated to the canonical module path (tracked in #XXXX)' above the re-export block.

CLI Logging Expert

  • [nit] clone_engine._rich_warning double-indirection creates implicit coupling that future test authors may not discover at src/apm_cli/deps/clone_engine.py:56
    clone_engine._rich_warning routes through github_downloader._rich_warning so that existing test patches still intercept warnings from CloneEngine. This works correctly today, but the coupling is invisible to new test authors who would naturally patch apm_cli.deps.clone_engine._rich_warning and miss all emissions.
    Suggested: Add one sentence to both the clone_engine._rich_warning docstring and the github_downloader.py noqa comment: 'New clone_engine tests that need to capture warnings must patch apm_cli.deps.github_downloader._rich_warning, not apm_cli.deps.clone_engine._rich_warning.'

DevX UX Expert

  • [recommended] Unknown host silently falls back to GenericGitBackend with no user-visible signal at src/apm_cli/deps/host_backends.py
    In backend_for(), when classify_host returns a kind not in _BACKEND_BY_KIND AND the host is not a GitHub hostname, execution reaches the else branch and returns GenericGitBackend with no log, no warning. A user pointing APM at a GitLab or Bitbucket host will get silent generic handling with confusing auth failures several layers deeper.
    Suggested: In the else branch of backend_for(), emit a _rich_warning: "Host '{host}' is not recognized as GitHub, GHE, GHES, or ADO -- falling back to generic Git. Auth may be limited."

  • [nit] Artifactory FQDN error omits where to set ARTIFACTORY_BASE_URL at src/apm_cli/deps/artifactory_orchestrator.py
    RuntimeError('Artifactory download requires either FQDN or ARTIFACTORY_BASE_URL') names the env var but gives no guidance on where to set it (apm.yml? shell env?).

  • [nit] ADO bearer fallback success logged only at debug, invisible to users on slow/retried clones at src/apm_cli/deps/clone_engine.py
    A user whose PAT fails and APM silently retries with AAD bearer gets no indication that auth was retried, making it hard to understand why the clone was slower or to know the PAT is misconfigured.

Supply Chain Security Expert

  • [recommended] clone_engine.py added to lint-auth-signals Rule-A exempt list without a tracking ticket reference at scripts/lint-auth-signals.sh:22
    Without a concrete ticket anchor the exemption can persist indefinitely and the Rule-A signal becomes permanently suppressed for all future code added to clone_engine.py.
    Suggested: Add a GitHub issue reference to the exemption comment. Also consider scoping the exemption to the specific function rather than the whole file.

  • [recommended] backend_for defensive fallback silently swallows unexpected classify_host return values at src/apm_cli/deps/host_backends.py
    If classify_host returns an unexpected type due to a future refactor or subtle bug, the code silently constructs a GenericGitBackend, bypassing classify_host's security classification silently.
    Suggested: Emit a _warning() or _debug() when the isinstance(info, HostInfo) check fails in the non-ADO path.

  • [nit] GHES api_base fix is correct and closes URL confusion -- Old form (api/redacted){host} would resolve to a different host for any GHES appliance. The fix aligns to https://{host}/api/v3.

  • [nit] _HasArchiveDownloader Protocol is private-module boundary, not a supply-chain risk -- Only wired at construction by github_downloader.py; no attacker-controlled implementation path at runtime.

OSS Growth Hacker

  • [recommended] No contributor signpost to HostBackend as the GitLab/Bitbucket extension point at src/apm_cli/deps/host_backends.py
    The HostBackend Protocol + _BACKEND_BY_KIND registry is exactly the seam a GitLab contributor needs, but nothing in CONTRIBUTING.md, the module docstring, or the PR description links these two ideas.
    Suggested: Add a 'Adding a new vendor backend' paragraph to the host_backends.py module docstring and a matching entry in CONTRIBUTING.md under an 'Extending APM' section.

  • [nit] GenericGitBackend's GitLab classification as 'generic' should be surfaced in the multi-host story -- A follow-up issue titled 'GitLab first-class backend' would give the community a concrete contribution target.

Auth Expert

  • [recommended] ADO short-circuit in backend_for() passes classify_host HostInfo unmodified even when its kind field does not match 'ado' at src/apm_cli/deps/host_backends.py
    If classify_host ever returns HostInfo(kind='github') for an ADO host (wrong CNAME, mislabeled GHES entry), ADOBackend gets host_info whose kind='github', which could produce malformed ADO API calls.
    Suggested: After classify_host returns in the ADO short-circuit block, normalize: if isinstance(info, HostInfo) and info.kind != 'ado': info = dataclasses.replace(info, kind='ado')

  • [recommended] clone_engine.py added to auth-signals exempt list without a tracking issue at scripts/lint-auth-signals.sh
    The lint-auth-signals linter is the only automated guardrail preventing open-coded PAT/bearer from spreading. Exempting a second file doubles the unguarded surface without a concrete issue reference.
    Suggested: Add inline issue reference, e.g.: '# PAT->bearer open-coded; tracked in #NNNN'

  • [nit] Hostname-based fallback in backend_for() assigns has_public_repos=False for all GHES hosts -- Production path unreachable; only affects mocks/tests. Harmless but noteworthy if callers ever branch on has_public_repos in the fallback path.

Doc Writer -- inactive

All 11 changed files are internal (src/apm_cli/deps/ modules, scripts/lint-auth-signals.sh, and unit tests); no user-facing CLI surface, CHANGELOG entry, README claim, or docs page is affected by this refactor.

Test Coverage Expert

  • [recommended] git_reference_resolver.py (~440 lines) has no tests at any tier at src/apm_cli/deps/git_reference_resolver.py
    New 440-line module handling SHA resolution and commit API fallback chains. No test file at any tier. The install-pipeline surface requires integration-with-fixtures floor per the tier matrix.
    Suggested: Add tests/unit/deps/test_git_reference_resolver.py covering resolve_ref for tag/branch/sha; fallback chain; 404 handling; ADO short-circuit path.
    Proof (missing): tests/unit/deps/test_git_reference_resolver.py::test_resolve_ref_tag_returns_sha -- proves: apm install resolves a tagged version to a deterministic SHA before cloning [portability-by-manifest,secure-by-default]
    assert resolver.resolve_ref(dep_ref, 'v1.0') == 'abc123sha'

  • [recommended] git_auth_env.py has no tests at any tier at src/apm_cli/deps/git_auth_env.py
    New ~150-line auth env builder that constructs GIT_ASKPASS / GIT_CONFIG_COUNT env dicts for every clone attempt. grep of tests/ for 'git_auth_env', 'GitAuthEnv' returns no hits.
    Suggested: Add tests/unit/deps/test_git_auth_env.py: assert PAT path sets GIT_ASKPASS; assert bearer path sets Authorization header; assert empty-token path produces sanitized env.
    Proof (missing): tests/unit/deps/test_git_auth_env.py::test_pat_auth_sets_git_askpass -- proves: apm install passes the correct credential env to git subprocess for PAT-authenticated clones [secure-by-default]
    assert 'GIT_ASKPASS' in env or 'GIT_CONFIG_COUNT' in env

  • [recommended] artifactory_orchestrator.py (~319 lines) has no tests at any tier at src/apm_cli/deps/artifactory_orchestrator.py
    ArtifactoryRouter.should_use_proxy and ArtifactoryOrchestrator.download_package are new routing and download facade. grep returns no hits. Router has non-trivial branching with 4 different routing cases.
    Suggested: Add tests/unit/deps/test_artifactory_orchestrator.py: parametrize should_use_proxy; test _resolve_host_prefix raises on missing host/prefix; test download_package delegates correctly.
    Proof (missing): tests/unit/deps/test_artifactory_orchestrator.py::test_should_use_proxy_returns_false_for_explicit_artifactory -- proves: apm install routes Artifactory dependencies correctly without double-proxying explicit URLs [governed-by-policy,vendor-neutral]
    assert ArtifactoryRouter.should_use_proxy(dep_ref_artifactory) is False

  • [recommended] GHES api_base bug fix has no regression trap -- dispatch test asserts type only, not the corrected api_base value at tests/unit/deps/test_host_backends.py:357
    The fix corrects api_base from '(api/redacted){host}' to 'https://{host}/api/v3'. Existing dispatch test asserts isinstance(backend, GHESBackend) but not backend.host_info.api_base. A future refactor could silently reintroduce the wrong api_base.
    Suggested: In TestBackendDispatch.test_dispatch_configured_ghes, add: assert backend.host_info.api_base == f'(git.acme.com/redacted)
    Proof (missing): tests/unit/deps/test_host_backends.py::TestBackendDispatch::test_dispatch_configured_ghes -- proves: apm install against a GHES instance uses the correct REST API base URL [portability-by-manifest]
    assert backend.host_info.api_base == '(git.acme.com/redacted) # not (api.git.acme.com/redacted)

  • [recommended] clone_engine.py PAT->bearer fallback has no unit tests; integration tests cover the flow indirectly at src/apm_cli/deps/clone_engine.py
    CloneEngine.execute contains the ADO bearer in-attempt retry logic. No unit test file for clone_engine.py. Integration tests cover the user promise at integration tier via GitHubPackageDownloader, not CloneEngine directly.
    Suggested: Add tests/unit/deps/test_clone_engine.py: mock clone_action to raise on PAT (401 equivalent), assert bearer retry attempted; assert cleanup on final failure.
    Proof (missing): tests/unit/deps/test_clone_engine.py::test_ado_bearer_retry_on_pat_failure -- proves: apm install retries with ADO bearer token when initial PAT clone returns 401 [secure-by-default]
    assert clone_action.call_count == 2 # first PAT, then bearer retry

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1223 · ● 5.4M ·

@microsoft microsoft deleted a comment from github-actions Bot May 9, 2026
…r, artifactory_orchestrator + ADO no-org ValueError

Address Test Coverage Expert and Python Architect findings on PR #1223:

1. New: tests/unit/deps/test_git_auth_env.py (12 tests)
   Cover GitAuthEnvBuilder.setup_environment (PAT, bearer, empty),
   noninteractive_env (pop-then-conditionally-restore fence,
   suppress_credential_helpers full fence), and subprocess_env_dict
   (sanitized merge, non-string skip).

2. New: tests/unit/deps/test_git_reference_resolver.py (17 tests)
   Cover resolve_commit_sha_for_ref (artifactory/ADO short-circuit,
   40-char SHA passthrough, tag via commits API, 404, network error,
   no commits API). Cover list_remote_refs (artifactory empty,
   success, GitCommandError -> RuntimeError, ADO bearer fallback,
   noninteractive env path, sanitized error). Cover resolve()
   artifactory short-circuit paths.

3. New: tests/unit/deps/test_artifactory_orchestrator.py (16 tests)
   Parametrized ArtifactoryRouter.should_use_proxy across all
   4 routing branches. ArtifactoryOrchestrator._resolve_host_prefix
   raises on missing host/prefix and on no-config. download_package
   delegation contract (host/prefix/scheme/ref propagated to
   _HasArchiveDownloader, default ref "main", validation failure
   raises, archive RuntimeError propagates).

4. ADO no-org silent re-dispatch -> explicit ValueError
   ADOBackend.build_clone_https_url and build_clone_ssh_url now
   raise ValueError("ADO dependency is missing ado_organization;
   cannot construct clone URL") at URL-construction time instead
   of producing a malformed URL that fails later with a confusing
   git error.

5. GHES api_base regression guard
   test_dispatch_ghes_via_github_host_env asserts
   backend.host_info.api_base == "https://git.acme.com/api/v3" to
   prevent silent reintroduction of the api.<host> bug.

Total: 45 new/updated unit tests. Full unit suite green (7987 pass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Round 4 follow-ups: status

# Reviewer Item Status Commit
1 Test Coverage Expert Unit tests for git_auth_env.py (PAT / bearer / empty) Done dacaac40
2 Test Coverage Expert Unit tests for git_reference_resolver.py (tag / branch / SHA / fallback / 404 / ADO short-circuit) Done dacaac40
3 Test Coverage Expert Unit tests for artifactory_orchestrator.py + GHES api_base value assertion Done dacaac40
4 Python Architect Replace ADO no-org silent re-dispatch with explicit ValueError Done dacaac40

Details

1. tests/unit/deps/test_git_auth_env.py (12 tests) -- Builder-pattern coverage:

  • setup_environment: PAT path sets GIT_ASKPASS=echo, GIT_TERMINAL_PROMPT=0, GIT_CONFIG_NOSYSTEM=1, ConnectTimeout=30 in GIT_SSH_COMMAND. Bearer path preserves GIT_HTTP_EXTRAHEADER. Empty token still produces sanitized env. Existing GIT_SSH_COMMAND with ConnectTimeout is preserved; without is appended.
  • noninteractive_env: pop-then-conditionally-restore credential-helper fence; suppress_credential_helpers=True re-sets full fence (GIT_CONFIG_COUNT/KEY_0/VALUE_0); default clears stale fence keys.
  • subprocess_env_dict: merges auth env over sanitized base; sanitization removes ambient GIT_DIR; non-string values skipped.

2. tests/unit/deps/test_git_reference_resolver.py (17 tests) -- Strategy + Facade coverage via a minimal _DownloaderContext stub:

  • resolve_commit_sha_for_ref: artifactory and ADO short-circuit to None; 40-char SHA returned lowercase; tag resolved via mocked commits API; 404 / unexpected body / network exception all return None; non-GitHub backend (build_commits_api_url -> None) returns None.
  • list_remote_refs: artifactory returns []; success path parses + sorts; GitCommandError raises RuntimeError with sanitized error context; ADO basic + token routes through execute_with_bearer_fallback; unauthenticated path uses _build_noninteractive_git_env.
  • resolve: artifactory and artifactory-proxy short-circuits avoid clone entirely.

3. tests/unit/deps/test_artifactory_orchestrator.py (16 tests) -- Adapter + Facade coverage:

  • ArtifactoryRouter.should_use_proxy: 7-row parametrize covers explicit artifactory FQDN, GitHub host (proxy), ADO (skip), GitLab (skip / registry-only forces), default host treated as github.com.
  • ArtifactoryRouter.parse_proxy_config: returns None when no env; returns (host, prefix, scheme) tuple when configured.
  • ArtifactoryOrchestrator._resolve_host_prefix: explicit FQDN wins; missing host/prefix raises ValueError; falls back to proxy_info; no FQDN + no proxy raises RuntimeError.
  • download_package: delegates to _HasArchiveDownloader with correct (host, prefix, owner, repo, ref, target_path, scheme=...) tuple; default ref is "main"; validation failure raises with cleanup; archive RuntimeError propagates.
  • GHES api_base regression guard added to test_dispatch_ghes_via_github_host_env: asserts backend.host_info.api_base == "https://git.acme.com/api/v3".

4. ADO no-org explicit ValueError (src/apm_cli/deps/host_backends.py):

  • ADOBackend.build_clone_https_url and build_clone_ssh_url now raise ValueError("ADO dependency is missing ado_organization; cannot construct clone URL") at URL-construction time. Replaces the silent fallback that produced a malformed URL failing later with a confusing git error.
  • Two new tests in TestADOBackend: test_https_missing_org_raises_value_error, test_ssh_missing_org_raises_value_error.

Verification

  • uv run pytest tests/unit -q -> 7987 passed (45 new tests added)
  • uv run --extra dev ruff check src/ tests/ -> silent
  • uv run --extra dev ruff format --check src/ tests/ -> silent
  • bash scripts/lint-auth-signals.sh -> clean

@danielmeppiel danielmeppiel merged commit cda5104 into main May 9, 2026
9 checks passed
@danielmeppiel danielmeppiel deleted the feat/host-backends-protocol branch May 9, 2026 18:49
danielmeppiel pushed a commit to slava-kudzinau/apm that referenced this pull request May 10, 2026
Resolve conflicts between PR microsoft#1149 (GitLab marketplace host support) and
main's recent refactors (microsoft#1223 HostBackend Protocol, microsoft#630 gh CLI fallback).

- Add GitLabBackend (kind='gitlab', is_generic=True) embedding PAT via
  oauth2:token@host using build_gitlab_https_clone_url, registered in
  _BACKEND_BY_KIND so backend_for routes gitlab.com correctly.
- download_strategies.build_repo_url: adopt main's backend dispatch and
  add a GitLab branch resolving the per-dep PAT through
  auth_resolver.resolve_for_dep.
- github_downloader: keep main's delegating refactor (GitAuthEnvBuilder,
  CloneEngine, GitReferenceResolver); merge debug log to include
  has_gitlab_token.
- auth.py / token_manager.py docstrings: reflect new chain ordering
  (env vars -> gh CLI -> credential helper) for GitHub class plus
  GitLab and Generic entries.
- authentication.md: per-host-class chain narrative, table, package
  source table and mermaid flowchart updated to include gh CLI step.
- CHANGELOG: keep both Added entries (GitLab + gh CLI fallback).
- Update test_dispatch_generic to use a non-GitLab generic host and add
  test_dispatch_gitlab covering the new backend dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

panel-review Trigger the apm-review-panel gh-aw workflow

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants