support virtual packages on generic git hosts (Gitea) #587

Merged
danielmeppiel merged 37 commits into microsoft:main from ganesanviji:feat/genric-host-gitea-private
May 8, 2026

Conversation

@ganesanviji
Contributor

Description

Add support for installing virtual packages from self-hosted Git services like Gitea. Currently, APM only supports virtual packages (subdirectories) on GitHub. This change enables users with Gitea to install packages from subdirectories within repositories.

Changes:

  • Enhanced virtual package detection in DependencyReference to recognize subdirectory packages on generic Git hosts (any FQDN)
  • Added authenticated raw file downloads for private repositories on generic hosts
  • Updated API endpoint from /api/v3 to /api/v1 for better compatibility with Gitea and other Git services
  • Maintains full backward compatibility with existing GitHub functionality

More details about the changes:
✅ Change 1: Virtual Package Detection (reference.py)

Analysis: This only affects generic Git hosts, not GitHub. Allows subdirectory packages to be detected as virtual even without specific file extensions. Safe because:

  • GitHub uses separate logic path (is_generic_host = False)
  • Validation still requires package markers (apm.yml, SKILL.md, etc.) in the subdirectory
  • No impact on existing GitHub virtual file detection

✅ Change 2: Authenticated Raw Downloads (github_downloader.py)

Analysis: Improves private repo support. Safe because:

  • Only applies to generic hosts, not GitHub
  • Falls back to API if raw fails
  • Uses standard Authorization header format

✅ Change 3: API Endpoint Update

Analysis: Gitea uses /api/v1/, GitHub uses /api/v3/. Safe because:

  • GitHub still uses /api/v3/
  • Gitea API v1 is compatible for contents endpoint
  • Falls back gracefully if endpoint doesn't exist

Motivation:
Enterprise teams using self-hosted Git services (Gitea) cannot currently use APM to install packages from repository subdirectories. This is a significant limitation for organizations that don't use GitHub. These changes enable APM to work seamlessly across all Git hosting platforms.

Type of change

  • New feature
  • Bug fix
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally

    • Gitea virtual package parsing: PASS
    • GitHub virtual file parsing: PASS (unchanged)
    • Regular repo parsing: PASS (unchanged)
  • All existing tests pass

    • Code validated with custom test cases for Gitea URLs
    • Backward compatibility verified for GitHub usage
  • Added tests for new functionality (if applicable)

    • Validated with multiple test scenarios

@ganesanviji
Contributor Author

@microsoft-github-policy-service agree

@danielmeppiel
Collaborator

Review Feedback

Thanks @ganesanviji for adding Gitea support! The raw URL download approach is a good idea. A few issues need addressing:

1. API version change breaks GitLab (critical)

Changing /api/v3/ to /api/v1/ fixes Gitea but breaks GitLab (which uses /api/v4/). The current /api/v3/ also doesn't work for Gitea, so the real fix is per-host API version detection.

Options:

  • Preferred: Try the raw URL path first (your new code), then fall back to API with version negotiation (try v1, then v3, then v4)
  • Alternative: Make API version configurable per host in marketplace or auth config

2. Virtual package detection too broad

len(path_segments) > 2 would treat any path with 3+ segments as virtual. For example, gitea.example.com/owner/repo has exactly 2 segments (owner + repo) but gitea.example.com/owner/repo/subdir has 3. The current logic (has_virtual_ext or has_collection) is more precise. Could you check if the issue is specifically that Gitea paths aren't being detected, and narrow the condition?

3. Bare except: pass (line ~1069)

Please catch specific exceptions:

except (requests.RequestException, OSError):
    pass

4. No unit tests

Please add tests for:

  • Gitea raw URL download succeeds
  • GitLab API URL still works (regression test)
  • Virtual package detection for generic hosts

Relationship with PR #584

This PR complements #584 (which fixes the validation/ls-remote path). They don't conflict and can merge independently.

Collaborator

@danielmeppiel danielmeppiel left a comment

As per previous comment

@ganesanviji
Contributor Author

Hi @danielmeppiel ,

Thanks for the review. I have addressed all of the suggestions:

1. API version change breaks GitLab (critical)

Addressed with the preferred approach. For non-GitHub/GHE hosts we now attempt
the raw URL path first:

https://{host}/{owner}/{repo}/raw/{ref}/{file_path}

If that returns a non-200 we fall through to API version negotiation, trying
v1 -> v3 -> v4 in order. This covers Gitea (v1), legacy Gogs (v3), and
GitLab (v4) without hardcoding anything per host. GitHub and GHE continue to
use their existing code paths unchanged.
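
The lookup order described in this reply can be sketched roughly as follows. This is only an illustration, not the code in github_downloader.py: the helper name `generic_host_candidates` is hypothetical, and the `/repos/{owner}/{repo}/contents/{path}?ref=` shape is an assumption based on the Gitea-style contents API.

```python
from urllib.parse import quote


def generic_host_candidates(host, owner, repo, ref, file_path):
    """Return the URLs to try, in order, for a non-GitHub host (hypothetical helper)."""
    path = quote(file_path)        # keeps "/" separators, escapes spaces etc.
    ref_query = quote(ref, safe="")
    # 1. Raw URL, in the format quoted in the comment above.
    urls = [f"https://{host}/{owner}/{repo}/raw/{ref}/{path}"]
    # 2. API version negotiation in the order stated above: v1, then v3, then v4.
    for version in ("v1", "v3", "v4"):
        urls.append(
            f"https://{host}/api/{version}/repos/{owner}/{repo}"
            f"/contents/{path}?ref={ref_query}"
        )
    return urls
```

A caller would request each URL in turn and stop at the first 200 response.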


2. Virtual package detection too broad

We did not use len(path_segments) > 2. The existing
has_virtual_ext or has_collection guard is kept intact. The only change is
the else branch (no virtual indicator present):

  • GitLab (gitlab.com or any gitlab.* hostname): keeps
    min_base_segments = len(path_segments) -- the full path is the repo,
    preserving nested-group support.
  • All other generic hosts (Gitea, Bitbucket, self-hosted git, etc.): uses
    min_base_segments = 2 -- owner/repo convention, any extra segments are
    treated as a virtual subdirectory path.

The distinction is driven by a new is_gitlab_hostname() helper added to
github_host.py.
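
A minimal sketch of the branch described above, under stated assumptions: the body of `is_gitlab_hostname()` is inferred from the behavior described in this thread (gitlab.com and gitlab.* hosts, case-insensitive, falsy input rejected) and may differ from the real helper, and `min_base_segments_for` is a hypothetical wrapper used only for illustration.

```python
def is_gitlab_hostname(host):
    """Assumed behavior: True for gitlab.com and gitlab.* hosts, case-insensitive."""
    if not host:
        return False
    return host.lower().startswith("gitlab.")


def min_base_segments_for(host, path_segments):
    """Hypothetical stand-in for the else branch (no virtual indicator present)."""
    if is_gitlab_hostname(host):
        # The full path is the repo, so nested groups are preserved.
        return len(path_segments)
    # owner/repo convention: any extra segments become the virtual path.
    return 2
```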


3. Bare except: pass

Fixed. The catch at that location is now:

except (requests.RequestException, OSError):
    pass

4. No unit tests

Added in two files:

tests/unit/test_github_host.py -- test_is_gitlab_hostname() covers:

  • gitlab.com and gitlab.* self-hosted instances return True
  • Case-insensitive matching (GITLAB.COM)
  • Negative cases: GitHub, Gitea, Bitbucket, Azure DevOps, None, ""

tests/unit/test_generic_git_urls.py -- TestGiteaVirtualPackageDetection
class covers:

  • Gitea virtual file extension detected as virtual (owner/repo/file.prompt.md)
  • Gitea /collections/ path detected as virtual collection
  • Dict-format virtual package on Gitea host
  • Plain two-segment owner/repo on Gitea is never virtual

TestNestedGroupSupport provides the GitLab regression guard --
gitlab.com/group/subgroup/repo must not be detected as virtual.

@ganesanviji
Contributor Author

@danielmeppiel - Could you please review the changes as soon as possible and let me know whether any further changes or explanation are needed? It would be very helpful to have Gitea support included in the next APM release.

@danielmeppiel danielmeppiel requested a review from Copilot April 9, 2026 04:57
Contributor

Copilot AI left a comment

Pull request overview

Adds broader support for installing virtual packages from non-GitHub Git hosts (with a focus on Gitea), by updating dependency parsing heuristics and expanding the downloader’s raw/API fetching logic, plus new regression tests around hostname classification and generic-host URL handling.

Changes:

  • Add is_gitlab_hostname() and use it during virtual-package detection to treat GitLab nested-group paths as repo paths by default.
  • Extend generic-host downloads with a raw URL attempt and API version “negotiation”.
  • Add unit tests covering GitLab hostname detection, Gitea/generic URL parsing expectations, and generic-host download behavior.
| File | Description |
| --- | --- |
| tests/unit/test_github_host.py | Adds tests for GitLab hostname detection. |
| tests/unit/test_generic_git_urls.py | Adds Gitea/generic-host virtual package detection regression tests. |
| tests/test_github_downloader.py | Adds tests for generic-host raw download + API version fallback behavior. |
| src/apm_cli/utils/github_host.py | Introduces is_gitlab_hostname() helper. |
| src/apm_cli/models/dependency/reference.py | Adjusts virtual-package detection and standard URL parsing behavior for generic hosts/GitLab. |
| src/apm_cli/deps/github_downloader.py | Adds generic-host raw fetch and API version candidate list changes. |

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 5

Comment thread src/apm_cli/models/dependency/reference.py
Comment thread src/apm_cli/deps/github_downloader.py Outdated
Comment thread tests/test_github_downloader.py Outdated
Comment thread tests/unit/test_generic_git_urls.py
Comment thread src/apm_cli/models/dependency/reference.py Outdated
ganesanviji and others added 2 commits April 9, 2026 16:09
@sergio-sisternes-epam
Collaborator

@copilot apply changes based on the comments in this thread

Collaborator

@sergio-sisternes-epam sergio-sisternes-epam left a comment

Thanks for the continued work on this, @ganesanviji! The raw URL approach and API version negotiation are solid ideas. Two blockers need attention before this can merge:

  1. Tests vs implementation mismatch (4 CI failures): min_base_segments = 2 for non-GitLab generic hosts means 3+ segment paths are treated as virtual, but test_three_segment_gitea_path_is_not_virtual and siblings expect them NOT to be virtual. One side needs to change -- please clarify the intended behavior for nested-group repos on Gitea/generic hosts and align tests with implementation.

  2. GitLab /api/v4/repos/... doesn't exist: GitLab's v4 API uses /api/v4/projects/{id}/repository/files/..., not the /repos/.../contents/ format. Suggest removing v4 from the candidate list (stick to v1/v3 for Gitea/Gogs) or adding a separate GitLab-specific code path.
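
To illustrate how different the GitLab shape is, here is a rough sketch of building a GitLab v4 repository-files URL. It assumes GitLab's documented convention that the project ID may be the URL-encoded project path and that the file path is URL-encoded too. The helper name is hypothetical and nothing like it exists in this PR.

```python
from urllib.parse import quote


def gitlab_raw_file_url(host, project_path, file_path, ref):
    """Build a GitLab v4 raw-file URL (illustrative helper, not part of APM)."""
    project_id = quote(project_path, safe="")   # "group/repo" -> "group%2Frepo"
    encoded_file = quote(file_path, safe="")    # slashes in the file path are escaped too
    encoded_ref = quote(ref, safe="")
    return (
        f"https://{host}/api/v4/projects/{project_id}"
        f"/repository/files/{encoded_file}/raw?ref={encoded_ref}"
    )
```

Because the path layout differs this much from `/repos/.../contents/`, simply swapping the version number in a Gitea-style URL cannot work for GitLab.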

Happy to help iterate on the virtual package detection design if useful!

elif is_gitlab_hostname(validated_host):
    min_base_segments = len(path_segments)
else:
    min_base_segments = 2
Collaborator

Blocking: This sets min_base_segments = 2 for all non-GitLab generic hosts (Gitea, Bitbucket, self-hosted git). That means gitea.myorg.com/group/subgroup/repo parses as repo_url="group/subgroup" + virtual_path="repo" + is_virtual=True.

But your tests (test_three_segment_gitea_path_is_not_virtual, test_four_segment_generic_path_without_indicators_is_not_virtual) expect is_virtual=False with repo_url="group/subgroup/repo".

This is the root cause of the 4 CI failures. Either:

  • Change this to min_base_segments = len(path_segments) (all generic hosts treat full path as repo, require dict form for virtual), or
  • Update the tests to match the current 2-segment split behavior.

The Copilot bot suggested the first option -- I agree that's safer, since Gitea also supports nested orgs/groups.
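
The two options can be compared with a small sketch. `split_repo_and_virtual` is a hypothetical helper that only mimics the split described in this comment. It is not the actual parser in reference.py.

```python
def split_repo_and_virtual(path_segments, min_base_segments):
    """Split path segments into (repo_url, virtual_path, is_virtual)."""
    repo_url = "/".join(path_segments[:min_base_segments])
    virtual_path = "/".join(path_segments[min_base_segments:]) or None
    return repo_url, virtual_path, virtual_path is not None


segments = ["group", "subgroup", "repo"]

# Current behavior for non-GitLab generic hosts: min_base_segments = 2
current = split_repo_and_virtual(segments, 2)

# First option: min_base_segments = len(path_segments)
full_path = split_repo_and_virtual(segments, len(segments))
```

With the two-segment split the real repo name ends up in the virtual path, which is what the failing tests reject.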

Contributor Author

Yes, you are correct. However, when I used that logic and tried to install from a Gitea repo with the structure below, the installation did not complete. So I added this logic to support referencing a Gitea repo with the following structure in the apm.yml file.

apm install gitea.host.com/group/repo/skills/create-pull-request#Skill_Feature

Comment thread tests/unit/test_generic_git_urls.py
Comment thread src/apm_cli/deps/github_downloader.py Outdated
Comment thread src/apm_cli/deps/github_downloader.py Outdated
@ganesanviji
Contributor Author

@danielmeppiel / @sergio-sisternes-epam - I have addressed the review comments. Could you please check my comments and approve it?

@danielmeppiel danielmeppiel added and removed the CI/CD label (Deprecated: use area/ci-cd. Kept for issue history; will be removed in milestone 0.10.0.) Apr 19, 2026
@danielmeppiel danielmeppiel added panel-review Trigger the apm-review-panel gh-aw workflow and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 28, 2026
@github-actions

APM Review Panel Verdict

Disposition: REQUEST_CHANGES (two pre-merge blockers: CHANGELOG conflict and test URL assertion convention)


Per-persona findings

Python Architect:

This is a routine feature PR (no new abstract bases, no hierarchy restructure). Two mermaid blocks below.

1. OO / class diagram

classDiagram
    direction LR
    class GitHubPackageDownloader {
        <<IOBoundary>>
        +download_raw_file(dep_ref, file_path, ref) bytes
        +_download_github_file(dep_ref, file_path, ref) bytes
        +_try_raw_download(owner, repo, ref, file_path) bytes
        +_resilient_get(url, headers, timeout) Response
    }
    class DependencyReference {
        <<ValueObject>>
        +host str
        +repo_url str
        +is_virtual bool
        +parse(url) DependencyReference
        +_parse_standard_url(...) DependencyReference
        +_detect_virtual_package(...) tuple
    }
    class AuthResolver {
        <<Strategy>>
        +resolve(host, org, port) AuthContext
    }
    class AuthContext {
        <<ValueObject>>
        +token str
        +source str
    }
    GitHubPackageDownloader ..> DependencyReference : reads
    GitHubPackageDownloader ..> AuthResolver : resolves token
    AuthResolver ..> AuthContext : returns
    GitHubPackageDownloader ..> AuthContext : uses for headers
    class GitHubPackageDownloader:::touched
    class DependencyReference:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution flow diagram

flowchart TD
    A[download_raw_file dep_ref file_path ref] --> G{host == github.com\nand no token?}
    G -->|yes| H["[NET] _try_raw_download raw.githubusercontent.com CDN"]
    H --> I{200?}
    I -->|yes| R1[return content]
    I -->|no - try fallback ref| J["[NET] _try_raw_download fallback ref"]
    J --> K{200?}
    K -->|yes| R2[return content]
    G -->|no| L
    K -->|no| L
    L{generic host?\nnot github.com\nnot .ghe.com} -->|yes| M["[NET] GET host/owner/repo/raw/ref/file\nwith Authorization header if token set"]
    M --> N{200?}
    N -->|yes| R3[return content]
    N -->|no - pass| O
    L -->|no| O
    O[Build api_url_candidates list] --> P{host type}
    P -->|github.com| Q["candidates = [api.github.com/repos/...]"]
    P -->|.ghe.com| QQ["candidates = [api.host/repos/...]"]
    P -->|generic| QQQ["candidates = [host/api/v1/..., host/api/v3/...]"]
    Q --> T["[NET] GET api_url_candidates at index 0"]
    QQ --> T
    QQQ --> T
    T --> U{200?}
    U -->|yes| R4[return content]
    U -->|404 + remaining candidates| V["[NET] try remaining candidates in order"]
    V --> W{any 200?}
    W -->|yes| R5[return content]
    W -->|no| X{ref in main or master?}
    X -->|non-default ref| Y[raise RuntimeError: not found at ref]
    X -->|default ref| Z["Build fallback_url_candidates\n(opposite branch)"]
    Z --> AA["[NET] try each fallback URL"]
    AA --> AB{any 200?}
    AB -->|yes| R6[return content]
    AB -->|no| AC[raise RuntimeError: not found]
    U -->|401 or 403| AD[raise RuntimeError: auth or rate-limit error]

Design patterns

  • Used in this PR: Chain of Responsibility / URL cascade -- api_url_candidates implements an explicit priority chain (raw -> v1 -> v3) iterated in order, consistent with the Strategy chain already used in AuthResolver. Visible as the <<IOBoundary>> stereotype in the class diagram.
  • Pragmatic suggestion: Extract a private helper _build_api_url_candidates(host, owner, repo, file_path, ref) -> list[str]. The three-way if host == "github.com" / elif .ghe.com / else block that builds this list appears twice in _download_github_file (once for the initial ref, once for the fallback ref) and has already diverged subtly (trailing whitespace inconsistency in one list literal). Extracting it removes the DRY violation in ~12 lines and prevents future host types from being added in only one branch.

Additional findings:

  • GHES regression (recommended fix): Hosts configured via GITHUB_HOST=myghes.company.com are classified as generic hosts (is_github_hostname() returns False for custom GHES). They now make one extra failed raw URL request (Gitea format, wrong for GHES) and one extra failed v1 API call before succeeding on v3. Fails gracefully but adds latency per file for every GHES install.
  • reference.py change (or validated_host is not None) is a valid fix for usernames with dots on validated generic hosts, but the specific edge case (dot in username on a generic host) has no test coverage.

CLI Logging Expert: No issues. The two new verbose_callback(f"Downloaded file: ...") calls in the generic-host raw URL path match the existing pattern used in the github.com CDN path. No direct _rich_* calls added. No CommandLogger or DiagnosticCollector changes. STATUS_SYMBOLS convention unaffected. This section is clean.


DevX UX Expert: No CLI surface changes -- no new commands, flags, or help text to review. cli-commands.md update is not required. The user-visible behavior (apm install gitea.myorg.com/owner/repo now works) is intuitive and matches the existing mental model. Error messages reuse existing RuntimeError patterns. One note: the PR description mentions a known limitation for GitLab nested groups ("dict form required for virtual packages") -- this limitation should appear in a docs page for the new Gitea support rather than only in the CHANGELOG, so users who search the docs can find the workaround. File as a follow-up issue rather than a merge blocker.


Supply Chain Security Expert: No new security vulnerabilities introduced.

  • Token obtained via self.auth_resolver.resolve(host, org, port=dep_ref.port) -- correct, goes through AuthResolver (line 1570). Token used only in request header, not logged, not present in error messages. Pass.
  • Raw URL attempt catches (requests.RequestException, OSError) and falls through -- fail-gracefully not fail-silent (404 is not silently accepted as content, it just falls to the next candidate). Pass.
  • API candidate loop: non-404 HTTP errors on candidate URLs raise RuntimeError immediately rather than trying the next candidate. Correctly fail-closed for unexpected errors. Pass.
  • No new path traversal surface: downloaded bytes are returned to callers that already existed. No new file-write paths added. Pass.
  • No hash verification on raw URL content: pre-existing gap (same as github.com CDN path). Lockfile commit-SHA pinning is the outer protection layer. Not introduced by this PR.
  • host.lower() normalization prevents case-variation bypass of the github.com / .ghe.com exclusion. Pass.
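
The fail-closed candidate loop described in these findings could look roughly like the sketch below. `FakeResponse`, `fetch_first_available`, and the injected `get` callable are hypothetical names chosen so the example runs without network access. The real downloader uses its own request helper.

```python
class FakeResponse:
    """Tiny stand-in for an HTTP response (illustration only)."""

    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content


def fetch_first_available(candidates, get):
    """Try candidates in order: 404 moves on, any other non-200 fails closed."""
    for url in candidates:
        response = get(url)
        if response.status_code == 200:
            return response.content
        if response.status_code != 404:
            # Unexpected errors (401, 403, 500, ...) stop the cascade immediately.
            raise RuntimeError(f"unexpected HTTP {response.status_code} from {url}")
    raise RuntimeError("file not found at any candidate URL")


responses = {"v1-url": FakeResponse(404), "v3-url": FakeResponse(200, b"name: demo")}
content = fetch_first_available(["v1-url", "v3-url"], responses.__getitem__)
```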

Auth Expert: ACTIVATED (fast-path: src/apm_cli/deps/github_downloader.py changed).

  • Token resolution for generic hosts: auth_resolver.resolve(host, org, port=dep_ref.port) is called once at line 1570, before both the CDN fast-path and the new generic raw URL path. The same resolved token flows to both paths. No AuthResolver bypass. Pass.
  • Token format: Authorization: token {token} -- Gitea accepts this format. Other generic hosts that require Bearer would fail the raw URL and fall through to the API. Graceful degradation.
  • AuthResolver precedence unchanged: GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill. Generic host users need GITHUB_APM_PAT or a configured credential helper for private repos. This is undocumented in the PR.
  • No host classification regression: is_github_hostname() and .ghe.com suffix checks are not modified. AuthResolver's per-org resolution is not touched.
  • Unauthenticated generic hosts: raw_headers = {} when token is falsy -- correct, no spurious auth header sent to public Gitea repos. Pass.

One gap: users of private Gitea repos who set GITHUB_APM_PAT will have their token sent to the raw URL AND the API endpoint on the same host. This is by design (one token per host in AuthResolver's model) and is acceptable. However, per-org token scoping (GITHUB_APM_PAT_{ORG}) does not apply to generic hosts in the current AuthResolver -- this is a pre-existing limitation, not introduced here.


OSS Growth Hacker: This is a strong enterprise-adoption signal. Gitea/Gogs is widely deployed in regulated industries, enterprises that cannot use GitHub.com due to data-residency requirements, and the Chinese developer ecosystem (Gitea is the leading self-hosted git platform in that market). "apm install gitea.myorg.com/owner/repo works" removes a hard adoption blocker for a meaningful segment.

Side-channel to CEO: the CHANGELOG entry is story-shaped ("Virtual package support for self-hosted Git services (Gitea, Gogs)..."). Recommend featuring this in the release notes with a one-liner apm install gitea.myorg.com/owner/my-agent example. The documentation gap (no Gitea credential setup guide) limits conversion -- users who hit auth errors on private repos will churn. A docs follow-up in the same minor release compounds this feature's value significantly.


CEO arbitration

Specialists are aligned. The core approach -- raw URL cascade with v1/v3 API negotiation for generic Git hosts -- is architecturally sound, correctly routes through AuthResolver, and fails closed. Two items are hard blockers.

First, the CHANGELOG.md conflict: the PR branch diverged from main before the 0.9.3 cut, and the contributor appears to have merged or pulled main, causing already-released entries (#917, #884, #849, #887, #882, #915, #885) to re-appear in the diff as additions to [Unreleased]. Only the #587 entry belongs there. This is a blocker because shipping with duplicate CHANGELOG entries would be a visible quality signal to external contributors and users who check the changelog.

Second, the URL substring assertions in the new tests (lines 1905, 1936, 1937, 1953, 1978 of tests/test_github_downloader.py) violate the project's codified test convention: all URL assertions must use urllib.parse.urlparse and compare on parsed components, not substring membership. This is both a CodeQL (py/incomplete-url-substring-sanitization) concern and a project convention enforced in CI. Fix: replace assert "/api/v1/" in urls[1] with assert urlparse(urls[1]).path.startswith("/api/v1/") (or the equivalent).
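
A short illustration of why substring checks are weaker than parsed-component checks. Both URLs are made-up examples and do not come from the test suite.

```python
from urllib.parse import urlparse

good = "https://gitea.example.com/api/v1/repos/owner/repo/contents/apm.yml"
misleading = "https://other.example.com/download?next=/api/v1/repos/owner/repo"

# Substring membership accepts both URLs.
assert "/api/v1/" in good and "/api/v1/" in misleading

# Comparing the parsed path accepts only the first one.
assert urlparse(good).path.startswith("/api/v1/")
assert not urlparse(misleading).path.startswith("/api/v1/")
```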

The DRY violation (duplicated api_url_candidates builder) and GHES latency regression are recommended pre-merge fixes but not hard blockers -- the code is correct, and the regressions are performance-only and fail-gracefully.

The Growth Hacker's documentation gap is a valid follow-up issue, not a gate.

@ganesanviji -- great contribution. Two concrete changes needed (CHANGELOG cleanup + urlparse assertions), then this is ready.


Required actions before merge

  1. CHANGELOG.md: Remove the 10 lines that duplicate already-released 0.9.3 entries (feat(gemini): add Gemini CLI as supported target with integration tests #917, add(skills): pr-description-skill -- anchored, self-sufficient PR bodies for microsoft/apm #884, feat(cli): apm experimental - feature-flag registry with list/enable/disable/reset #849, feat: close audit-blindness gap for local .apm/ content via virtual self-entry + includes: manifest field #887, harden(apm-review-panel): one-comment discipline + Hybrid E auth routing + apm-primitives-architect persona #882, feat(skills): add apm-triage-panel for issue triage #915, ci: dogfood apm audit --ci and integration-drift gate (closes #883) #885). Keep only the single #587 bullet under [Unreleased] / ### Added. The easiest path: rebase onto current main and re-add only the support virtual packages on generic git hosts (Gitea) #587 line.

  2. URL assertions in tests/test_github_downloader.py: Replace substring URL checks with urlparse-based assertions at lines 1905, 1936, 1937, 1953, and 1978. Example fix for line 1905:

    from urllib.parse import urlparse
    # Before (flagged):
    assert "/api/v1/" in urls[1]
    # After (correct):
    assert urlparse(urls[1]).path.startswith("/api/v1/repos/")

    Apply the same transform to the other four occurrences.


Optional follow-ups

  • Extract _build_api_url_candidates(host, owner, repo, file_path, ref) -> list[str] to eliminate the duplicated if/elif/else block in _download_github_file (appears at both the initial-ref and fallback-ref resolution points).
  • Guard GHES hosts: when is_github_hostname(host) would return True for a custom GITHUB_HOST setting, skip the /api/v1/ candidate and start directly at /api/v3/ to avoid an unnecessary failed round-trip per file.
  • Add a test for the reference.py edge case that motivated the change: a username containing a dot (e.g., gitea.myorg.com/user.name/repo) should parse correctly on a validated generic host.
  • Open a follow-up issue to add a docs page for Gitea/generic host credential setup (GITHUB_APM_PAT configuration, private repo example), which the Growth Hacker correctly identifies as a conversion gap for the enterprise audience this feature targets.

Generated by PR Review Panel for issue #587 · ● 1.3M ·

@danielmeppiel
Collaborator

Hi @ganesanviji -- friendly check-in: this PR has been in changes-requested state since 2026-04-19, and we are working through the wave-3 merge backlog. If you are still able to address the review feedback in the next ~7 days, we would love to land this; otherwise we will close it as stale and you can re-open later (no hard feelings -- contribution graveyard happens to everyone).

Concretely, the outstanding feedback is:
Fix the min_base_segments=2 parsing heuristic -- on nested groups like gitea.example.com/group/subgroup/repo it currently parses as repo_url=group/subgroup, virtual=repo and loses the real repo name -- plus the other blockers listed in the review (most are scope reductions, not throw-aways).

Let us know either way -- a quick "still on it" or "going to close it" reply is enough.

Thanks for the contribution!

…ric-host-gitea-private

Resolved conflicts:
- CHANGELOG.md: keep microsoft#587 entry under [Unreleased], merge new [0.12.2] release
- src/apm_cli/deps/github_downloader.py: fix _ssh_attempt_allowed to properly
  delegate to validation module (remove misplaced CDN/Gitea code from method body)
- src/apm_cli/models/dependency/reference.py: use parts[1:] for generic hosts
  (supports nested groups on Gitea/Bitbucket)
- tests/test_github_downloader.py: keep TestGiteaRawUrlDownload and
  TestGiteaGogsApiVersionNegotiation; restore TestRefExistsViaLsRemote

Post-merge fixes:
- src/apm_cli/deps/download_strategies.py: add Gitea raw URL + api/v1/v3
  negotiation to download_github_file (replaces bare /api/v3 fallback)
- tests/unit/test_generic_git_urls.py: update test_gitea_collections_path_is_virtual
  to use is_virtual_subdirectory() (is_virtual_collection was removed in microsoft#1094)
@ganesanviji
Contributor Author

@danielmeppiel - I’ve resolved all PR review concerns and conflicts. Please review and merge it as soon as possible, as resolving the conflicts took considerable time.

@ganesanviji
Contributor Author

@danielmeppiel - Can we merge this ASAP? I keep having to resolve new conflicts.

danielmeppiel and others added 4 commits May 8, 2026 21:55
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…PI envelope

Address apm-review-panel needs_rework findings on PR microsoft#587:

Security (BLOCKING):
- Stop forwarding env-var bearer tokens (GITHUB_APM_PAT, GITHUB_TOKEN,
  GH_TOKEN) as Authorization headers to arbitrary non-GitHub hosts.
  Tokens are forwarded to a generic host only when one of:
  (a) source is git's host-scoped credential helper, (b) source is a
  per-host GITHUB_APM_PAT_<ORG> env var (explicit opt-in), or
  (c) the host matches a non-empty GITHUB_HOST env var (declared GHES).
  Bare GITHUB_APM_PAT against an arbitrary FQDN no longer leaks.
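
The forwarding rule above might be expressed as a predicate like the following sketch. The function name, the `token_source` labels, and the `env` parameter are hypothetical. Only the three conditions (a) to (c) come from this commit message.

```python
import os


def should_forward_token(host, token_source, env=None):
    """Decide whether a resolved token may be sent to a generic (non-GitHub) host."""
    env = os.environ if env is None else env
    # (a) token came from git's host-scoped credential helper (label is assumed)
    if token_source == "git-credential-fill":
        return True
    # (b) token came from a per-org GITHUB_APM_PAT_<ORG> variable (explicit opt-in)
    if token_source.startswith("GITHUB_APM_PAT_"):
        return True
    # (c) host matches a non-empty GITHUB_HOST (declared GHES)
    configured = env.get("GITHUB_HOST", "").strip().lower()
    return bool(configured) and host.lower() == configured
```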

Correctness:
- Decode Gitea/Gogs Contents API JSON envelope
  ({"content": <base64>, "encoding": "base64"}) instead of returning
  the JSON bytes as file content.
- Re-raise non-404 RequestException in the ref-fallback API loop
  (was silently swallowed; primary loop already re-raised).
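
A minimal sketch of the envelope decoding described above, assuming the response body is either raw file bytes or a JSON object with `content` and `encoding` keys. The function name is illustrative and is not the helper extracted in this commit.

```python
import base64
import json


def extract_contents_payload(body):
    """Return file bytes from a contents-API response, decoding a base64 envelope if present."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON (or not valid UTF-8): treat the body as the raw file content.
        return body
    if (
        isinstance(data, dict)
        and data.get("encoding") == "base64"
        and isinstance(data.get("content"), str)
    ):
        return base64.b64decode(data["content"])
    return body


envelope = json.dumps(
    {"content": base64.b64encode(b"name: demo\n").decode("ascii"), "encoding": "base64"}
).encode("utf-8")
decoded = extract_contents_payload(envelope)
```

Note that this heuristic would also decode a real file whose content happens to be a JSON object with those two keys, so a production version may want to check the response content type as well.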

Refactor:
- Extract _build_contents_api_urls, _build_generic_host_auth_headers,
  _extract_contents_api_payload, _build_unsupported_or_missing_error.
- Use is_github_hostname() for host-class checks; extend with
  GITHUB_HOST env var for custom-domain GHES.
- 401/403 phrasing now host-class-aware (rate-limit / SAML hints
  only on GitHub family).

Tests:
- Remove vicious test test_raw_url_with_token_adds_auth_header that
  pinned the PAT-leak as desired behavior.
- Add 16 e2e tests covering: env-var token does NOT leak to gitea
  host, GitHub still gets token, ghe.com still gets token,
  git-credential-fill still authenticates, GITHUB_HOST opt-in,
  Gitea JSON envelope decode, raw bytes passthrough, v1->v3 API
  negotiation, fallback loop reraise on non-404 (both primary +
  ref-fallback), descriptive unsupported-host error, verbose logs.

Docs:
- README, docs/guides/dependencies, packages/apm-guide dependencies
  reference: name Gitea/Gogs explicitly; note GitLab nested-group
  repos still require object form for virtual paths.
- CHANGELOG: tighten [Unreleased] entry to one user-facing line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract _is_configured_ghes() helper to centralise the GITHUB_HOST
  opt-in check; use it in both routing classification and the auth
  header builder so the rule cannot drift.
- Thread is_github_host into _build_contents_api_urls so GHES hosts
  declared via GITHUB_HOST skip the wasted Gitea v1 round-trip.
- Widen the 401/403 generic-host error to enumerate all three token
  sources (git credential helper, per-org GITHUB_APM_PAT_<ORG>,
  GITHUB_HOST opt-in).
- Include file_path and ref in the raw-URL fallback verbose log for
  easier diagnosis.
- Bump requests floor to >=2.31.0 to mitigate CVE-2023-32681
  (cross-host redirect Authorization leak); matters now that the PR
  widens cross-host download surface.
- Doc polish: slim the dependencies.md intro, move v1/v3 fallback
  detail next to Virtual Packages, normalise punctuation, add a
  cross-link to the GitHub Authentication Setup section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread tests/test_github_downloader.py Fixed
Daniel Meppiel and others added 2 commits May 8, 2026 22:38
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per .github/instructions/tests.instructions.md, URL/host assertions in
tests must extract URL tokens and compare via urllib.parse, never via
substring match. The 404-error test asserted 'git.example.com' as a
substring of the error message, which CodeQL flags.

Embed a canonical raw URL (https://<host>/<repo>/raw/<ref>/<file>) in
the not-found error and parse it in the test via urlparse. The URL is
also more useful UX -- terminals render it clickable.
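
A sketch of the test pattern this describes: extract the URL token from the error text, then compare parsed components. The message wording and the `urls_in` helper are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def urls_in(message):
    """Extract http(s) URL tokens from a message (simple whitespace-delimited match)."""
    return re.findall(r"https?://\S+", message)


message = "File not found: https://git.example.com/owner/repo/raw/main/apm.yml"
(url,) = urls_in(message)
parsed = urlparse(url)
assert parsed.hostname == "git.example.com"
assert parsed.path == "/owner/repo/raw/main/apm.yml"
```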

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 575d2a0 into microsoft:main May 8, 2026
11 checks passed

Labels

panel-review Trigger the apm-review-panel gh-aw workflow

6 participants