Support virtual packages on generic git hosts (Gitea) #587
@microsoft-github-policy-service agree |
Review Feedback
Thanks @ganesanviji for adding Gitea support! The raw URL download approach is a good idea. A few issues need addressing:
1. API version change breaks GitLab (critical)
Changing Options:
2. Virtual package detection too broad
3. Bare
|
danielmeppiel
left a comment
As per previous comment
|
Hi @danielmeppiel, thanks for the review. I have addressed all of the suggestions:
1. API version change breaks GitLab (critical)
Addressed with the preferred approach. For non-GitHub/GHE hosts we now attempt If that returns a non-200 we fall through to API version negotiation, trying
2. Virtual package detection too broad
We did not use
The distinction is driven by a new
3. Bare
|
|
@danielmeppiel - Could you please review the changes and let me know as soon as possible whether any further changes or explanations are needed? It would be very helpful to have Gitea support included in the next APM release. |
Pull request overview
Adds broader support for installing virtual packages from non-GitHub Git hosts (with a focus on Gitea), by updating dependency parsing heuristics and expanding the downloader’s raw/API fetching logic, plus new regression tests around hostname classification and generic-host URL handling.
Changes:
- Add is_gitlab_hostname() and use it during virtual-package detection to treat GitLab nested-group paths as repo paths by default.
- Extend generic-host downloads with a raw URL attempt and API version “negotiation”.
- Add unit tests covering GitLab hostname detection, Gitea/generic URL parsing expectations, and generic-host download behavior.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_github_host.py | Adds tests for GitLab hostname detection. |
| tests/unit/test_generic_git_urls.py | Adds Gitea/generic-host virtual package detection regression tests. |
| tests/test_github_downloader.py | Adds tests for generic-host raw download + API version fallback behavior. |
| src/apm_cli/utils/github_host.py | Introduces is_gitlab_hostname() helper. |
| src/apm_cli/models/dependency/reference.py | Adjusts virtual-package detection and standard URL parsing behavior for generic hosts/GitLab. |
| src/apm_cli/deps/github_downloader.py | Adds generic-host raw fetch and API version candidate list changes. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 5
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot apply changes based on the comments in this thread |
sergio-sisternes-epam
left a comment
Thanks for the continued work on this, @ganesanviji! The raw URL approach and API version negotiation are solid ideas. Two blockers need attention before this can merge:
- Tests vs implementation mismatch (4 CI failures): min_base_segments = 2 for non-GitLab generic hosts means 3+ segment paths are treated as virtual, but test_three_segment_gitea_path_is_not_virtual and siblings expect them NOT to be virtual. One side needs to change -- please clarify the intended behavior for nested-group repos on Gitea/generic hosts and align tests with implementation.
- GitLab /api/v4/repos/... doesn't exist: GitLab's v4 API uses /api/v4/projects/{id}/repository/files/..., not the /repos/.../contents/ format. Suggest removing v4 from the candidate list (stick to v1/v3 for Gitea/Gogs) or adding a separate GitLab-specific code path.
Happy to help iterate on the virtual package detection design if useful!
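To make the second blocker concrete, here is a minimal sketch of the two URL shapes. The function names and example hosts are hypothetical, not taken from this PR. The Gitea/Gogs shape follows the /repos/.../contents/ format mentioned above, and the GitLab shape uses the v4 projects endpoint, which expects the project path and the file path to be URL-encoded.

```python
from urllib.parse import quote


def gitea_contents_urls(host: str, owner: str, repo: str, file_path: str, ref: str) -> list:
    """Candidate Contents API URLs for a Gitea/Gogs-style host (v1 first, then v3)."""
    path = quote(file_path.lstrip("/"))  # keeps "/" between path segments
    return [
        f"https://{host}/api/{version}/repos/{owner}/{repo}/contents/{path}?ref={quote(ref, safe='')}"
        for version in ("v1", "v3")
    ]


def gitlab_raw_file_url(host: str, project_path: str, file_path: str, ref: str) -> str:
    """GitLab v4 raw-file URL: project path and file path are fully URL-encoded."""
    project_id = quote(project_path, safe="")  # "group/sub/repo" -> "group%2Fsub%2Frepo"
    encoded_file = quote(file_path.lstrip("/"), safe="")
    return (
        f"https://{host}/api/v4/projects/{project_id}"
        f"/repository/files/{encoded_file}/raw?ref={quote(ref, safe='')}"
    )
```

The point of keeping these as separate builders is that a /repos/.../contents/ candidate with v4 substituted in would never match a GitLab route, so it only costs an extra round-trip.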
    elif is_gitlab_hostname(validated_host):
        min_base_segments = len(path_segments)
    else:
        min_base_segments = 2
Blocking: This sets min_base_segments = 2 for all non-GitLab generic hosts (Gitea, Bitbucket, self-hosted git). That means gitea.myorg.com/group/subgroup/repo parses as repo_url="group/subgroup" + virtual_path="repo" + is_virtual=True.
But your tests (test_three_segment_gitea_path_is_not_virtual, test_four_segment_generic_path_without_indicators_is_not_virtual) expect is_virtual=False with repo_url="group/subgroup/repo".
This is the root cause of the 4 CI failures. Either:
- Change this to min_base_segments = len(path_segments) (all generic hosts treat full path as repo, require dict form for virtual), or
- Update the tests to match the current 2-segment split behavior.
The Copilot bot suggested the first option -- I agree that's safer, since Gitea also supports nested orgs/groups.
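For reference, the two behaviors being compared can be sketched as follows. This is an illustration only: split_repo_and_virtual is a hypothetical helper, not the actual _detect_virtual_package code, and it ignores refs and file-extension indicators.

```python
from typing import Optional, Tuple


def split_repo_and_virtual(
    path: str, min_base_segments: Optional[int] = None
) -> Tuple[str, Optional[str], bool]:
    """Split a host-relative path into (repo_url, virtual_path, is_virtual).

    min_base_segments=None models "the full path is the repo";
    min_base_segments=2 models the current 2-segment split.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    base = len(segments) if min_base_segments is None else min(min_base_segments, len(segments))
    repo_url = "/".join(segments[:base])
    virtual_path = "/".join(segments[base:]) or None
    return repo_url, virtual_path, virtual_path is not None


# Current behavior (2-segment base): the third segment becomes a virtual path.
print(split_repo_and_virtual("group/subgroup/repo", 2))
# Option 1 (full path is the repo): nothing is treated as virtual.
print(split_repo_and_virtual("group/subgroup/repo"))
```

With option 1, a subdirectory package on a generic host would have to be declared in dict form, since the shorthand path alone cannot say where the repo ends.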
Yes, you are correct. However, when I used that logic and tried to install from a Gitea repo with the structure below, the installation did not complete. So I added this logic so that a Gitea repo with the structure below can be installed and referenced in the apm.yml file:
apm install gitea.host.com/group/repo/skills/create-pull-request#Skill_Feature
|
@danielmeppiel / @sergio-sisternes-epam - I have addressed the review comments. Could you please check my comments and approve it? |
…anesanviji/apm into feat/genric-host-gitea-private
APM Review Panel Verdict
Disposition: REQUEST_CHANGES (two pre-merge blockers: CHANGELOG conflict and test URL assertion convention)
Per-persona findings
Python Architect: This is a routine feature PR (no new abstract bases, no hierarchy restructure). Two mermaid blocks below.
1. OO / class diagram
classDiagram
direction LR
class GitHubPackageDownloader {
<<IOBoundary>>
+download_raw_file(dep_ref, file_path, ref) bytes
+_download_github_file(dep_ref, file_path, ref) bytes
+_try_raw_download(owner, repo, ref, file_path) bytes
+_resilient_get(url, headers, timeout) Response
}
class DependencyReference {
<<ValueObject>>
+host str
+repo_url str
+is_virtual bool
+parse(url) DependencyReference
+_parse_standard_url(...) DependencyReference
+_detect_virtual_package(...) tuple
}
class AuthResolver {
<<Strategy>>
+resolve(host, org, port) AuthContext
}
class AuthContext {
<<ValueObject>>
+token str
+source str
}
GitHubPackageDownloader ..> DependencyReference : reads
GitHubPackageDownloader ..> AuthResolver : resolves token
AuthResolver ..> AuthContext : returns
GitHubPackageDownloader ..> AuthContext : uses for headers
class GitHubPackageDownloader:::touched
class DependencyReference:::touched
classDef touched fill:#fff3b0,stroke:#d47600
2. Execution flow diagram
flowchart TD
A[download_raw_file dep_ref file_path ref] --> G{host == github.com\nand no token?}
G -->|yes| H["[NET] _try_raw_download raw.githubusercontent.com CDN"]
H --> I{200?}
I -->|yes| R1[return content]
I -->|no - try fallback ref| J["[NET] _try_raw_download fallback ref"]
J --> K{200?}
K -->|yes| R2[return content]
G -->|no| L
K -->|no| L
L{generic host?\nnot github.com\nnot .ghe.com} -->|yes| M["[NET] GET host/owner/repo/raw/ref/file\nwith Authorization header if token set"]
M --> N{200?}
N -->|yes| R3[return content]
N -->|no - pass| O
L -->|no| O
O[Build api_url_candidates list] --> P{host type}
P -->|github.com| Q["candidates = [api.github.com/repos/...]"]
P -->|.ghe.com| QQ["candidates = [api.host/repos/...]"]
P -->|generic| QQQ["candidates = [host/api/v1/..., host/api/v3/...]"]
Q --> T["[NET] GET api_url_candidates at index 0"]
QQ --> T
QQQ --> T
T --> U{200?}
U -->|yes| R4[return content]
U -->|404 + remaining candidates| V["[NET] try remaining candidates in order"]
V --> W{any 200?}
W -->|yes| R5[return content]
W -->|no| X{ref in main or master?}
X -->|non-default ref| Y[raise RuntimeError: not found at ref]
X -->|default ref| Z["Build fallback_url_candidates\n(opposite branch)"]
Z --> AA["[NET] try each fallback URL"]
AA --> AB{any 200?}
AB -->|yes| R6[return content]
AB -->|no| AC[raise RuntimeError: not found]
U -->|401 or 403| AD[raise RuntimeError: auth or rate-limit error]
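The generic-host branch of the flow above can be sketched in a few lines. This is a simplified illustration, not the PR's implementation: download_with_cascade and the injected fetch callable are hypothetical, the main/master ref fallback is omitted, and raising on statuses other than 200/401/403/404 is an assumption the diagram does not specify.

```python
from typing import Callable, Iterable, Optional, Tuple

# fetch(url) -> (status_code, body); injected so the sketch needs no network access.
Fetch = Callable[[str], Tuple[int, bytes]]


def download_with_cascade(
    fetch: Fetch, raw_url: Optional[str], api_candidates: Iterable[str]
) -> bytes:
    """Try the raw URL first, then each API candidate in order."""
    if raw_url is not None:
        status, body = fetch(raw_url)
        if status == 200:
            return body
        # Any non-200 on the raw URL falls through to API negotiation.
    for url in api_candidates:
        status, body = fetch(url)
        if status == 200:
            return body
        if status in (401, 403):
            raise RuntimeError(f"auth or rate-limit error ({status}) for {url}")
        if status != 404:
            # Assumption: unexpected statuses fail fast rather than being swallowed.
            raise RuntimeError(f"unexpected status {status} for {url}")
    raise RuntimeError("not found")
```

Passing fetch in as a parameter keeps the ordering logic testable without mocking an HTTP library.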
Design patterns
Additional findings:
CLI Logging Expert: No issues. The two new
DevX UX Expert: No CLI surface changes -- no new commands, flags, or help text to review.
Supply Chain Security Expert: No new security vulnerabilities introduced.
Auth Expert: ACTIVATED (fast-path:
One gap: users of private Gitea repos who set
OSS Growth Hacker: This is a strong enterprise-adoption signal. Gitea/Gogs is widely deployed in regulated industries, enterprises that cannot use GitHub.com due to data-residency requirements, and the Chinese developer ecosystem (Gitea is the leading self-hosted git platform in that market). "apm install gitea.myorg.com/owner/repo works" removes a hard adoption blocker for a meaningful segment. Side-channel to CEO: the CHANGELOG entry is story-shaped ("Virtual package support for self-hosted Git services (Gitea, Gogs)..."). Recommend featuring this in the release notes with a one-liner
CEO arbitration
Specialists are aligned. The core approach -- raw URL cascade with v1/v3 API negotiation for generic Git hosts -- is architecturally sound, correctly routes through AuthResolver, and fails closed. Two items are hard blockers.
First, the CHANGELOG.md conflict: the PR branch diverged from main before the 0.9.3 cut, and the contributor appears to have merged or pulled main, causing already-released entries (#917, #884, #849, #887, #882, #915, #885) to re-appear in the diff as additions to
Second, the URL substring assertions in the new tests (lines 1905, 1936, 1937, 1953, 1978 of
The DRY violation (duplicated
The Growth Hacker's documentation gap is a valid follow-up issue, not a gate.
@ganesanviji -- great contribution. Two concrete changes needed (CHANGELOG cleanup + urlparse assertions), then this is ready.
Required actions before merge
Optional follow-ups
|
|
Hi @ganesanviji -- friendly check-in: this PR has been in Concretely, the outstanding feedback is: Let us know either way -- a quick "still on it" or "going to close it" reply is enough. Thanks for the contribution! |
…ric-host-gitea-private

Resolved conflicts:
- CHANGELOG.md: keep microsoft#587 entry under [Unreleased], merge new [0.12.2] release
- src/apm_cli/deps/github_downloader.py: fix _ssh_attempt_allowed to properly delegate to validation module (remove misplaced CDN/Gitea code from method body)
- src/apm_cli/models/dependency/reference.py: use parts[1:] for generic hosts (supports nested groups on Gitea/Bitbucket)
- tests/test_github_downloader.py: keep TestGiteaRawUrlDownload and TestGiteaGogsApiVersionNegotiation; restore TestRefExistsViaLsRemote

Post-merge fixes:
- src/apm_cli/deps/download_strategies.py: add Gitea raw URL + api/v1/v3 negotiation to download_github_file (replaces bare /api/v3 fallback)
- tests/unit/test_generic_git_urls.py: update test_gitea_collections_path_is_virtual to use is_virtual_subdirectory() (is_virtual_collection was removed in microsoft#1094)
|
@danielmeppiel - I’ve resolved all PR review concerns and conflicts. Please review and merge it as soon as possible, as resolving the conflicts took considerable time. |
|
@danielmeppiel - Can we merge it ASAP? I keep having to resolve new conflicts. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…PI envelope

Address apm-review-panel needs_rework findings on PR microsoft#587:

Security (BLOCKING):
- Stop forwarding env-var bearer tokens (GITHUB_APM_PAT, GITHUB_TOKEN, GH_TOKEN) as Authorization headers to arbitrary non-GitHub hosts. Tokens are forwarded to a generic host only when one of: (a) source is git's host-scoped credential helper, (b) source is a per-host GITHUB_APM_PAT_<ORG> env var (explicit opt-in), or (c) the host matches a non-empty GITHUB_HOST env var (declared GHES). Bare GITHUB_APM_PAT against an arbitrary FQDN no longer leaks.

Correctness:
- Decode Gitea/Gogs Contents API JSON envelope ({"content": <base64>, "encoding": "base64"}) instead of returning the JSON bytes as file content.
- Re-raise non-404 RequestException in the ref-fallback API loop (was silently swallowed; primary loop already re-raised).

Refactor:
- Extract _build_contents_api_urls, _build_generic_host_auth_headers, _extract_contents_api_payload, _build_unsupported_or_missing_error.
- Use is_github_hostname() for host-class checks; extend with GITHUB_HOST env var for custom-domain GHES.
- 401/403 phrasing now host-class-aware (rate-limit / SAML hints only on GitHub family).

Tests:
- Remove vicious test test_raw_url_with_token_adds_auth_header that pinned the PAT-leak as desired behavior.
- Add 16 e2e tests covering: env-var token does NOT leak to gitea host, GitHub still gets token, ghe.com still gets token, git-credential-fill still authenticates, GITHUB_HOST opt-in, Gitea JSON envelope decode, raw bytes passthrough, v1->v3 API negotiation, fallback loop reraise on non-404 (both primary + ref-fallback), descriptive unsupported-host error, verbose logs.

Docs:
- README, docs/guides/dependencies, packages/apm-guide dependencies reference: name Gitea/Gogs explicitly; note GitLab nested-group repos still require object form for virtual paths.
- CHANGELOG: tighten [Unreleased] entry to one user-facing line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
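The envelope decode in the Correctness item can be illustrated with a short sketch. It assumes only the envelope shape quoted in the commit message; extract_contents_payload is a hypothetical stand-in, not the PR's _extract_contents_api_payload.

```python
import base64
import json


def extract_contents_payload(body: bytes) -> bytes:
    """Return file bytes from a Contents API response.

    If the body is a JSON object of the form
    {"content": <base64>, "encoding": "base64"}, decode and return the content.
    Anything else is passed through unchanged as raw file bytes.
    """
    try:
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return body
    if (
        isinstance(data, dict)
        and data.get("encoding") == "base64"
        and isinstance(data.get("content"), str)
    ):
        # b64decode ignores embedded newlines by default, which some servers emit.
        return base64.b64decode(data["content"])
    return body
```

The passthrough branch matters because a raw endpoint may return the file bytes directly, and those bytes may themselves happen to be valid JSON.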
- Extract _is_configured_ghes() helper to centralise the GITHUB_HOST opt-in check; use it in both routing classification and the auth header builder so the rule cannot drift.
- Thread is_github_host into _build_contents_api_urls so GHES hosts declared via GITHUB_HOST skip the wasted Gitea v1 round-trip.
- Widen the 401/403 generic-host error to enumerate all three token sources (git credential helper, per-org GITHUB_APM_PAT_<ORG>, GITHUB_HOST opt-in).
- Include file_path and ref in the raw-URL fallback verbose log for easier diagnosis.
- Bump requests floor to >=2.31.0 to mitigate CVE-2023-32681 (cross-host redirect Authorization leak); matters now that the PR widens cross-host download surface.
- Doc polish: slim the dependencies.md intro, move v1/v3 fallback detail next to Virtual Packages, normalise punctuation, add a cross-link to the GitHub Authentication Setup section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
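The three token sources listed above amount to a small allow-list. A rough sketch, under stated assumptions: the function names and the "git-credential-helper" source label are hypothetical, and the environment is passed in as a mapping so the rule can be exercised without touching os.environ.

```python
from typing import Mapping


def is_configured_ghes(host: str, env: Mapping[str, str]) -> bool:
    """True when host matches a non-empty GITHUB_HOST value (declared GHES)."""
    configured = env.get("GITHUB_HOST", "").strip().lower()
    return bool(configured) and host.strip().lower() == configured


def should_forward_token(host: str, token_source: str, env: Mapping[str, str]) -> bool:
    """Decide whether a token may be sent to a generic (non-GitHub) host."""
    if token_source == "git-credential-helper":  # hypothetical label: host-scoped by git itself
        return True
    if token_source.startswith("GITHUB_APM_PAT_"):  # per-org variable, explicit opt-in
        return True
    # Global variables (GITHUB_APM_PAT, GITHUB_TOKEN, GH_TOKEN) are forwarded
    # only when the host was declared via GITHUB_HOST.
    return is_configured_ghes(host, env)
```

Keeping the GITHUB_HOST check in one helper is what lets routing and header building share the same rule.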
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per .github/instructions/tests.instructions.md, URL/host assertions in tests must extract URL tokens and compare via urllib.parse, never via substring match. The 404-error test asserted 'git.example.com' as a substring of the error message, which CodeQL flags.

Embed a canonical raw URL (https://<host>/<repo>/raw/<ref>/<file>) in the not-found error and parse it in the test via urlparse. The URL is also more useful UX -- terminals render it clickable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
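A test following that convention might look roughly like this. The hosts_in helper, the regex, and the example messages are illustrative assumptions, not code from the repository.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL matcher for error messages; an assumption for this sketch.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+")


def hosts_in(message: str) -> list:
    """Extract every URL from a message and return the parsed hostnames."""
    return [urlparse(url).hostname for url in URL_PATTERN.findall(message)]


error = "File not found: https://git.example.com/owner/repo/raw/main/apm.yml"
assert hosts_in(error) == ["git.example.com"]

# A substring check would wrongly accept this message; the parsed hostname does not.
spoofed = "File not found: https://evil.test/git.example.com/apm.yml"
assert "git.example.com" in spoofed
assert hosts_in(spoofed) == ["evil.test"]
```

Comparing the parsed hostname checks the part of the URL that decides where a request goes, which a substring match cannot do.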
Description
Add support for installing virtual packages from self-hosted Git services like Gitea. Currently, APM only supports virtual packages (subdirectories) on GitHub. This change enables users with Gitea to install packages from subdirectories within repositories.
Changes:
- DependencyReference to recognize subdirectory packages on generic Git hosts (any FQDN)
- /api/v3 to /api/v1 for better compatibility with Gitea and other Git services

More details about the changes:
✅ Change 1: Virtual Package Detection (reference.py)
Analysis: This only affects generic Git hosts, not GitHub. It allows subdirectory packages to be detected as virtual even without specific file extensions. Safe because:
- GitHub uses a separate logic path (is_generic_host = False)
- Validation still requires package markers (apm.yml, SKILL.md, etc.) in the subdirectory
- No impact on existing GitHub virtual file detection
✅ Change 2: Authenticated Raw Downloads (github_downloader.py)
Analysis: Improves private repo support. Safe because:
- Only applies to generic hosts, not GitHub
- Falls back to API if raw fails
- Uses standard Authorization header format
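As a rough illustration of Change 2, the raw URL for a generic host can be assembled as below. Both helpers are hypothetical. The URL layout is the host/repo/raw/ref/file shape used elsewhere in this PR, and the "token" Authorization scheme is an assumption, since the description does not say which scheme is sent.

```python
from typing import Dict, Optional
from urllib.parse import quote


def build_raw_url(host: str, repo_url: str, ref: str, file_path: str) -> str:
    """Build https://<host>/<repo>/raw/<ref>/<file> for a generic Git host."""
    repo = repo_url.strip("/")
    return f"https://{host}/{repo}/raw/{quote(ref)}/{quote(file_path.lstrip('/'))}"


def build_auth_headers(token: Optional[str], scheme: str = "token") -> Dict[str, str]:
    """Return an Authorization header only when a token is available.

    The default scheme is an assumption for this sketch, not confirmed by the PR.
    """
    return {"Authorization": f"{scheme} {token}"} if token else {}
```

For example, build_raw_url("gitea.example.com", "group/repo", "main", "skills/create-pull-request/SKILL.md") yields a URL that can be requested directly, with the API used only if that request does not return 200.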
✅ Change 3: API Endpoint Update
Analysis: Gitea uses /api/v1/, GitHub uses /api/v3/. Safe because:
- GitHub still uses /api/v3/
- Gitea API v1 is compatible for contents endpoint
- Falls back gracefully if endpoint doesn't exist
Motivation:
Enterprise teams using self-hosted Git services (Gitea) cannot currently use APM to install packages from repository subdirectories. This is a significant limitation for organizations that don't use GitHub. These changes enable APM to work seamlessly across all Git hosting platforms.
Type of change
Testing
Tested locally
All existing tests pass
Added tests for new functionality (if applicable)