diff --git a/CHANGELOG.md b/CHANGELOG.md index d7a06f650..4dfbc99ee 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- Virtual subdirectory and raw-file packages now resolve from self-hosted Git services (Gitea, Gogs) via raw URL with API v1/v3 fallback. (#587) - `shared/apm.md` gh-aw shared workflow exposes a `target:` import input (default `all`) so consumer workflows can ship slim, single-harness bundles instead of always packing every layout. (#1184) ### Fixed diff --git a/NOTICE b/NOTICE index c2fa9f37b..828c17e75 100644 --- a/NOTICE +++ b/NOTICE @@ -139,7 +139,7 @@ SOFTWARE. ## Component. requests -- Version requirement: `>=2.28.0` +- Version requirement: `>=2.31.0` - Upstream: https://github.com/psf/requests - SPDX: `Apache-2.0` - Notes: Apache-2.0 requires forwarding the upstream NOTICE file verbatim. diff --git a/README.md b/README.md index a3d934993..394682122 100644 --- a/README.md +++ b/README.md @@ -68,7 +68,7 @@ One command, no configuration -- VS Code and GitHub Copilot read the file automa One `apm.yml` describes every primitive your agents need — instructions, skills, prompts, agents, hooks, plugins, MCP servers — and `apm install` reproduces the exact same setup across every client on every machine. `apm.lock.yaml` pins the resolved tree the way `package-lock.json` does for npm. - **[One manifest for everything](https://microsoft.github.io/apm/reference/primitive-types/)** — declared once, deployed across Copilot, Claude, Cursor, OpenCode, Codex, Gemini, Windsurf -- **[Install from anywhere](https://microsoft.github.io/apm/guides/dependencies/)** — GitHub, GitLab, Bitbucket, Azure DevOps, GitHub Enterprise, any git host +- **[Install from anywhere](https://microsoft.github.io/apm/guides/dependencies/)** — GitHub, GitLab, Bitbucket, Azure DevOps, GitHub Enterprise, Gitea, Gogs, any git host - **[Transitive dependencies](https://microsoft.github.io/apm/guides/dependencies/)** — packages can depend on packages; APM resolves the full tree - **[Author plugins](https://microsoft.github.io/apm/guides/plugins/)** — build Copilot, Claude, and Cursor plugins with dependency management, then export standard `plugin.json` - **[Marketplaces](https://microsoft.github.io/apm/guides/marketplaces/)** — install plugins from curated registries in one command, deployed across all targets and locked diff --git a/docs/src/content/docs/guides/dependencies.md b/docs/src/content/docs/guides/dependencies.md index 7090e6e7c..57dccb107 100644 --- a/docs/src/content/docs/guides/dependencies.md +++ b/docs/src/content/docs/guides/dependencies.md @@ -15,7 +15,7 @@ APM dependencies are git repositories containing `.apm/` directories with contex - **Build on tested context** instead of starting from scratch - **Maintain consistency** across multiple repositories and teams -APM supports any git-accessible host — GitHub, GitLab, Bitbucket, self-hosted instances, and more. +APM supports any git-accessible host — GitHub, GitLab, Bitbucket, Gitea, Gogs, self-hosted instances, and more. See [GitHub Authentication Setup](#github-authentication-setup) below for how tokens flow to non-GitHub hosts via the git credential helper. ## Dependency Types @@ -36,6 +36,8 @@ APM supports multiple dependency types: **Virtual File Packages** download a single file (like a prompt or instruction) and integrate it directly. +For self-hosted **Gitea** and **Gogs**, virtual subdirectory and file packages resolve via the `/{owner}/{repo}/raw/{ref}/{path}` URL first, then fall back to the Contents API (v1 native, v3 Gogs-compat). GitLab is not yet supported for virtual packages -- use git-clone-based dependencies for GitLab repos. + ### Claude Skills Claude Skills are packages with a `SKILL.md` file that describe capabilities for AI agents. APM can install them and transform them for your target platform: diff --git a/packages/apm-guide/.apm/skills/apm-usage/dependencies.md b/packages/apm-guide/.apm/skills/apm-usage/dependencies.md index 75fd09a98..d6903997e 100644 --- a/packages/apm-guide/.apm/skills/apm-usage/dependencies.md +++ b/packages/apm-guide/.apm/skills/apm-usage/dependencies.md @@ -129,6 +129,8 @@ Virtual packages reference a subset of a repository. Classification is by extension only. A path like `owner/repo/collections/security` (no extension) is a Subdirectory; the actual shape -- APM package (incl. dep-only `apm.yml` with no `.apm/`), skill bundle, or plugin -- is resolved at fetch time by probing for `apm.yml`. +**Gitea and Gogs (self-hosted or vendor-hosted):** virtual packages resolve via the host's `/{owner}/{repo}/raw/{ref}/{path}` URL first, then fall back to the Contents API (v1 native, v3 Gogs-compat). GitLab nested-group repos (`group/subgroup/repo`) require the object form (`git: `, `path: `) -- shorthand is ambiguous on >2-segment paths. + > **Removed (#1094):** the legacy `.collection.yml` / `.collection.yaml` virtual-package form is no longer supported. Convert any `.collection.yml` to an `apm.yml` with a `dependencies:` section, then reference the resulting subdirectory as a regular subdirectory virtual package. ## Canonical storage rules diff --git a/pyproject.toml b/pyproject.toml index 02716890c..149c98d6f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -25,7 +25,7 @@ dependencies = [ "click>=8.0.0", "colorama>=0.4.6", "pyyaml>=6.0.0", - "requests>=2.28.0", + "requests>=2.31.0", "python-frontmatter>=1.0.0", "llm>=0.17.0", "llm-github-models>=0.1.0", diff --git a/src/apm_cli/deps/download_strategies.py b/src/apm_cli/deps/download_strategies.py index f30f0f827..9074f0c08 100644 --- a/src/apm_cli/deps/download_strategies.py +++ b/src/apm_cli/deps/download_strategies.py @@ -7,6 +7,8 @@ operations to it (Facade/Delegate pattern). """ +import base64 +import json import os import random import sys @@ -634,77 +636,138 @@ def download_github_file( # All raw attempts failed -- fall through to API path which # handles private repos, rate-limit messaging, and SAML errors. + # --- Generic host: raw URL first, then API version negotiation --- + # For non-GitHub non-GHE hosts (Gitea, Gogs, self-hosted git), try the + # raw URL path first, then negotiate API versions v1 -> v3. + is_github_host = is_github_hostname(host) or self._is_configured_ghes(host) + if not is_github_host: + raw_url = f"https://{host}/{owner}/{repo}/raw/{ref}/{file_path}" + raw_headers = self._build_generic_host_auth_headers(host, file_ctx, accept=None) + if verbose_callback: + verbose_callback(f"Trying raw URL on generic host {host}: {raw_url}") + try: + response = self._host._resilient_get(raw_url, headers=raw_headers, timeout=30) + if response.status_code == 200: + if verbose_callback: + verbose_callback(f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}") + return response.content + except (requests.RequestException, OSError) as raw_err: + if verbose_callback: + verbose_callback( + f"Raw URL on {host} failed for {file_path}@{ref}: " + f"{type(raw_err).__name__}; falling back to Contents API." + ) + # --- Contents API path (authenticated, enterprise, or raw fallback) --- - # Build GitHub API URL - format differs by host type - if host == "github.com": - api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{file_path}?ref={ref}" - elif host.lower().endswith(".ghe.com"): - api_url = f"https://api.{host}/repos/{owner}/{repo}/contents/{file_path}?ref={ref}" - else: - api_url = f"https://{host}/api/v3/repos/{owner}/{repo}/contents/{file_path}?ref={ref}" + # Build API URL candidates - format differs by host type + api_url_candidates = self._build_contents_api_urls( + host, owner, repo, file_path, ref, is_github_host=is_github_host + ) + api_url = api_url_candidates[0] # Set up authentication headers - headers: dict[str, str] = { - "Accept": "application/vnd.github.v3.raw" # Returns raw content - } - if token: - headers["Authorization"] = f"token {token}" + # GitHub family: use GitHub raw-media accept header. Generic hosts + # ignore it and may return JSON envelopes -- handle that on read. + accept = "application/vnd.github.v3.raw" if is_github_host else "application/json" + if is_github_host: + headers: dict[str, str] = {"Accept": accept} + if token: + headers["Authorization"] = f"token {token}" + else: + headers = self._build_generic_host_auth_headers(host, file_ctx, accept=accept) # Try to download with the specified ref try: + if verbose_callback and not is_github_host: + verbose_callback(f"Trying Contents API on {host}: {api_url}") response = self._host._resilient_get(api_url, headers=headers, timeout=30) response.raise_for_status() if verbose_callback: verbose_callback(f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}") - return response.content + return self._extract_contents_api_payload(response, is_github_host) except requests.exceptions.HTTPError as e: if e.response.status_code == 404: + # For generic hosts, try remaining API version candidates before ref fallback + for candidate_url in api_url_candidates[1:]: + try: + if verbose_callback: + verbose_callback( + f"Contents API 404; trying next candidate: {candidate_url}" + ) + candidate_resp = self._host._resilient_get( + candidate_url, headers=headers, timeout=30 + ) + candidate_resp.raise_for_status() + if verbose_callback: + verbose_callback( + f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}" + ) + return self._extract_contents_api_payload(candidate_resp, is_github_host) + except requests.exceptions.HTTPError as ce: + if ce.response.status_code != 404: + raise RuntimeError( # noqa: B904 + f"Failed to download {file_path}: HTTP {ce.response.status_code}" + ) + # Try fallback branches if the specified ref fails if ref not in ["main", "master"]: raise RuntimeError( # noqa: B904 - f"File not found: {file_path} at ref '{ref}' in {dep_ref.repo_url}" + self._build_unsupported_or_missing_error( + host, + dep_ref.repo_url, + file_path, + ref, + api_url_candidates, + is_github_host=is_github_host, + ) ) # Try the other default branch fallback_ref = "master" if ref == "main" else "main" + fallback_url_candidates = self._build_contents_api_urls( + host, owner, repo, file_path, fallback_ref + ) - # Build fallback API URL - if host == "github.com": - fallback_url = ( - f"https://api.github.com/repos/{owner}/{repo}" - f"/contents/{file_path}?ref={fallback_ref}" - ) - elif host.lower().endswith(".ghe.com"): - fallback_url = ( - f"https://api.{host}/repos/{owner}/{repo}" - f"/contents/{file_path}?ref={fallback_ref}" - ) - else: - fallback_url = ( - f"https://{host}/api/v3/repos/{owner}/{repo}" - f"/contents/{file_path}?ref={fallback_ref}" - ) + for fallback_url in fallback_url_candidates: + try: + response = self._host._resilient_get( + fallback_url, headers=headers, timeout=30 + ) + response.raise_for_status() + if verbose_callback: + verbose_callback( + f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}" + ) + return self._extract_contents_api_payload(response, is_github_host) + except requests.exceptions.HTTPError as fe: + if fe.response.status_code != 404: + raise RuntimeError( # noqa: B904 + f"Failed to download {file_path}: HTTP {fe.response.status_code}" + ) - try: - response = self._host._resilient_get(fallback_url, headers=headers, timeout=30) - response.raise_for_status() - if verbose_callback: - verbose_callback(f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}") - return response.content - except requests.exceptions.HTTPError: - raise RuntimeError( # noqa: B904 - f"File not found: {file_path} in {dep_ref.repo_url} " - f"(tried refs: {ref}, {fallback_ref})" + raise RuntimeError( # noqa: B904 + self._build_unsupported_or_missing_error( + host, + dep_ref.repo_url, + file_path, + ref, + api_url_candidates, + is_github_host=is_github_host, + fallback_ref=fallback_ref, ) + ) elif e.response.status_code in (401, 403): # Distinguish rate limiting from auth failure. + # X-RateLimit-* headers are GitHub-specific; treat as + # rate-limit only when the host is in the GitHub family. is_rate_limit = False - try: - rl_remaining = e.response.headers.get("X-RateLimit-Remaining") - if rl_remaining is not None and int(rl_remaining) == 0: - is_rate_limit = True - except (TypeError, ValueError): - pass + if is_github_host: + try: + rl_remaining = e.response.headers.get("X-RateLimit-Remaining") + if rl_remaining is not None and int(rl_remaining) == 0: + is_rate_limit = True + except (TypeError, ValueError): + pass if is_rate_limit: error_msg = f"GitHub API rate limit exceeded for {dep_ref.repo_url}. " @@ -728,9 +791,9 @@ def download_github_file( ) raise RuntimeError(error_msg) # noqa: B904 - # Token may lack SSO/SAML authorization for this org. # Retry without auth -- the repo might be public. - if token and not host.lower().endswith(".ghe.com"): + # GHES/GHE-DR don't support unauthenticated org-scoped retries. + if token and is_github_host and not host.lower().endswith(".ghe.com"): try: unauth_headers: dict[str, str] = {"Accept": "application/vnd.github.v3.raw"} response = self._host._resilient_get( @@ -741,7 +804,7 @@ def download_github_file( verbose_callback( f"Downloaded file: {host}/{dep_ref.repo_url}/{file_path}" ) - return response.content + return self._extract_contents_api_payload(response, is_github_host) except requests.exceptions.HTTPError: pass # Fall through to the original error @@ -757,17 +820,194 @@ def download_github_file( port=dep_ref.port if dep_ref else None, dep_url=dep_ref.repo_url if dep_ref else None, ) - elif token and not host.lower().endswith(".ghe.com"): + elif is_github_host and not host.lower().endswith(".ghe.com"): error_msg += ( "Both authenticated and unauthenticated access " "were attempted. The repository may be private, " "or your token may lack SSO/SAML authorization " "for this organization." ) - else: + elif is_github_host: error_msg += "Please check your GitHub token permissions." + else: + # Generic host: don't claim SSO/SAML or "GitHub token". + error_msg += ( + f"Host {host} rejected the request. " + "Verify the repository exists and that the token has " + "access. Tokens are sourced from your git credential " + "helper, a per-org GITHUB_APM_PAT_ env var, or " + f"GITHUB_HOST={host} when this host is your GitHub " + "Enterprise Server." + ) raise RuntimeError(error_msg) # noqa: B904 else: raise RuntimeError(f"Failed to download {file_path}: HTTP {e.response.status_code}") # noqa: B904 except requests.exceptions.RequestException as e: raise RuntimeError(f"Network error downloading {file_path}: {e}") # noqa: B904 + + # ------------------------------------------------------------------ + # Helpers for download_github_file + # ------------------------------------------------------------------ + + @staticmethod + def _is_configured_ghes(host: str) -> bool: + """Return True when *host* matches the user's declared GHES via GITHUB_HOST. + + ``GITHUB_HOST=`` is the documented opt-in for treating + a non-``*.ghe.com`` FQDN as GitHub-family. Centralised so the routing + check, header builder, and Contents-API URL builder cannot drift. + """ + configured = os.environ.get("GITHUB_HOST", "").strip().lower() + if not configured: + return False + return (host or "").lower() == configured + + @staticmethod + def _build_contents_api_urls( + host: str, + owner: str, + repo: str, + file_path: str, + ref: str, + *, + is_github_host: bool | None = None, + ) -> list[str]: + """Return the ordered list of Contents-API URL candidates for *host*. + + - github.com -> single api.github.com candidate + - *.ghe.com (GHE Cloud / GHE Data Residency) or GITHUB_HOST-declared + GHES -> single api. candidate (skips Gitea v1 round-trip) + - generic host -> Gitea-native /api/v1/ then Gogs-compat /api/v3/ + + GitLab uses /api/v4/projects/:id/repository/files/... which has a + different shape; it is intentionally NOT included. GitLab support + is limited to git-clone operations. + + ``is_github_host`` lets the caller pass its already-computed + classification (which honours ``GITHUB_HOST``); when omitted we + fall back to ``is_github_hostname`` plus the GHES env-var check. + """ + if is_github_host is None: + is_github_host = is_github_hostname(host) or DownloadDelegate._is_configured_ghes(host) + if is_github_host: + if host.lower() == "github.com": + return [ + f"https://api.github.com/repos/{owner}/{repo}/contents/{file_path}?ref={ref}" + ] + return [f"https://api.{host}/repos/{owner}/{repo}/contents/{file_path}?ref={ref}"] + return [ + f"https://{host}/api/v1/repos/{owner}/{repo}/contents/{file_path}?ref={ref}", + f"https://{host}/api/v3/repos/{owner}/{repo}/contents/{file_path}?ref={ref}", + ] + + @staticmethod + def _build_generic_host_auth_headers( + host: str, auth_ctx, *, accept: str | None = None + ) -> dict[str, str]: + """Build HTTP headers for a generic-host (non-GitHub) request. + + SECURITY GUARD: Only attach Authorization when the token is + unambiguously intended for this host. A token resolved from a + global env var (GITHUB_APM_PAT, GITHUB_TOKEN, GH_TOKEN) MUST NOT + be sent to an arbitrary non-GitHub host -- doing so leaks the + user's GitHub PAT to whatever FQDN is in the dependency line. + The clone path at ``get_clone_url`` already enforces the same + guard via ``is_github_hostname``; this mirrors it for HTTP file + downloads. + + Forwarding is allowed when: + - source == ``git-credential-fill``: git's credential helper + looks tokens up by host, so they are host-scoped by + construction. + - source == ``GITHUB_APM_PAT_``: per-org env var is + explicit user opt-in for that org's host. + - the user opted into this host as their GitHub Enterprise + Server via ``GITHUB_HOST=``: the token is intended for + this host, even if the FQDN is not under ``*.ghe.com``. + """ + headers: dict[str, str] = {} + if accept: + headers["Accept"] = accept + if auth_ctx is None or not getattr(auth_ctx, "token", None): + return headers + source = getattr(auth_ctx, "source", None) or "" + host_scoped = source == "git-credential-fill" + org_scoped = source.startswith("GITHUB_APM_PAT_") + configured_ghes = DownloadDelegate._is_configured_ghes(host) + if host_scoped or org_scoped or configured_ghes: + headers["Authorization"] = f"token {auth_ctx.token}" + return headers + + @staticmethod + def _extract_contents_api_payload(response, is_github_host: bool) -> bytes: + """Decode a Contents-API response into raw file bytes. + + - GitHub family: ``Accept: application/vnd.github.v3.raw`` returns + the file bytes directly; pass through ``response.content``. + - Generic hosts (Gitea, Gogs): the raw-media accept header is + ignored and the server returns a JSON envelope of the form:: + + {"content": "", "encoding": "base64", ...} + + Decode ``content`` as base64 and return the resulting bytes. + Some Gitea installations also emit ``encoding: ""`` with raw + content -- pass that through unchanged. If the response is not + a JSON envelope at all (custom proxy, raw bytes), fall back to + ``response.content``. + """ + if is_github_host: + return response.content + + body = response.content + try: + ctype = str((response.headers or {}).get("Content-Type") or "").lower() + except (AttributeError, TypeError): + ctype = "" + if "json" not in ctype and not ( + isinstance(body, (bytes, bytearray)) and body.lstrip().startswith(b"{") + ): + return body + try: + payload = json.loads(body.decode("utf-8")) + except (ValueError, UnicodeDecodeError, AttributeError): + return body + if not isinstance(payload, dict) or "content" not in payload: + return body + encoding = (payload.get("encoding") or "").lower() + content_field = payload.get("content") or "" + if encoding == "base64": + try: + return base64.b64decode(content_field, validate=False) + except (ValueError, TypeError): + return body + # Non-base64 envelope (rare): return literal content if it's a string, + # otherwise fall back to the raw body. + if isinstance(content_field, str): + return content_field.encode("utf-8") + return body + + @staticmethod + def _build_unsupported_or_missing_error( + host: str, + repo_url: str, + file_path: str, + ref: str, + api_url_candidates: list[str], + *, + is_github_host: bool, + fallback_ref: str | None = None, + ) -> str: + """Build a discoverable error when no Contents-API candidate hits 200.""" + ref_part = f"(tried refs: {ref}, {fallback_ref})" if fallback_ref else f"at ref '{ref}'" + if is_github_host: + return f"File not found: {file_path} in {repo_url} {ref_part}" + # Non-GitHub host: name what was tried so users can diagnose + # GitLab / unsupported-host cases without re-reading source. + tried = ", ".join(["raw"] + [u.split("/api/")[1].split("/")[0] for u in api_url_candidates]) + canonical_url = f"https://{host}/{repo_url}/raw/{ref}/{file_path}" + return ( + f"File not found on generic host {host}: {canonical_url} {ref_part}. " + f"Tried URL families: {tried}. " + "If this is GitLab, virtual subdirectory packages are not " + "supported (use the dict-form full repo URL instead)." + ) diff --git a/tests/test_github_downloader.py b/tests/test_github_downloader.py index 55a355362..677255e77 100644 --- a/tests/test_github_downloader.py +++ b/tests/test_github_downloader.py @@ -1431,14 +1431,22 @@ def test_credential_fill_for_non_default_host(self): assert actual_headers.get("Authorization") == "token enterprise-token" def test_non_default_host_uses_global_token(self): - """Global env vars (GITHUB_APM_PAT) are now tried for all hosts, not just the default.""" + """Global env vars (GITHUB_APM_PAT) must NOT leak to an arbitrary non-GitHub host. + + SECURITY: forwarding a GitHub PAT to ``ghes.company.com`` (or any + other FQDN) without explicit user opt-in exfiltrates the token. + The user must opt in via ``GITHUB_HOST=`` (declares the + host as their GitHub Enterprise Server) or via a per-org env var + ``GITHUB_APM_PAT_`` -- bare ``GITHUB_APM_PAT`` against a + custom-domain host gets no Authorization header. + """ with ( patch.dict(os.environ, {"GITHUB_APM_PAT": "default-host-pat"}, clear=True), patch( "apm_cli.core.token_manager.GitHubTokenManager.resolve_credential_from_git", - ) as mock_cred, + return_value=None, + ), ): - mock_cred.return_value = "enterprise-cred" downloader = GitHubPackageDownloader() assert downloader.github_token == "default-host-pat" @@ -1451,19 +1459,59 @@ def test_non_default_host_uses_global_token(self): mock_response_200.status_code = 200 mock_response_200.content = b"enterprise content" mock_response_200.raise_for_status = Mock() + mock_response_200.headers = {} with patch.object( downloader, "_resilient_get", return_value=mock_response_200 ) as mock_get: + # The raw-URL path runs first; mock returns 200 immediately. result = downloader._download_github_file(dep_ref, "SKILL.md", "main") assert result == b"enterprise content" - actual_headers = mock_get.call_args[1].get("headers") or mock_get.call_args[0][1] - # Global PAT is now used for non-default hosts too - assert actual_headers.get("Authorization") == "token default-host-pat" + for call in mock_get.call_args_list: + req_headers = call[1].get("headers", {}) or {} + assert "Authorization" not in req_headers, ( + f"PAT leaked to {call[0][0]} without GITHUB_HOST opt-in: {req_headers!r}" + ) - # Credential fill is not reached because the global env var is found first - mock_cred.assert_not_called() + def test_global_token_forwarded_when_github_host_is_configured(self): + """When ``GITHUB_HOST=`` is set, the global PAT IS forwarded. + + This is the explicit user opt-in: declaring a custom domain as + their GitHub Enterprise Server. The complement to + ``test_non_default_host_uses_global_token``. + """ + with ( + patch.dict( + os.environ, + {"GITHUB_APM_PAT": "ghes-pat", "GITHUB_HOST": "ghes.company.com"}, + clear=True, + ), + patch( + "apm_cli.core.token_manager.GitHubTokenManager.resolve_credential_from_git", + return_value=None, + ), + ): + downloader = GitHubPackageDownloader() + dep_ref = DependencyReference( + repo_url="owner/repo", + host="ghes.company.com", + ) + + mock_response_200 = Mock() + mock_response_200.status_code = 200 + mock_response_200.content = b"enterprise content" + mock_response_200.raise_for_status = Mock() + mock_response_200.headers = {} + + with patch.object( + downloader, "_resilient_get", return_value=mock_response_200 + ) as mock_get: + result = downloader._download_github_file(dep_ref, "SKILL.md", "main") + assert result == b"enterprise content" + + first_call_headers = mock_get.call_args_list[0][1].get("headers", {}) + assert first_call_headers.get("Authorization") == "token ghes-pat" def test_error_message_mentions_gh_auth_login(self): """Error message should mention 'gh auth login' when no token is available.""" @@ -1794,9 +1842,7 @@ def test_yaml_without_special_characters_still_valid(self, tmp_path): class TestRefExistsViaLsRemote: - """Tests for the ``_ref_exists_via_ls_remote`` two/three-attempt chain. - - The chain mirrors ``_clone_with_fallback``'s auth path so validation + """Tests for ``_clone_with_fallback``'s auth path so validation accepts what install would actually clone. These tests pin that behavior so a refactor of the auth chain can't silently regress validation lenience for users with SSO-half-authorized PATs or @@ -2060,5 +2106,385 @@ def _always_fail(*args, **kwargs): self._exit(ctxs) +# --------------------------------------------------------------------------- +# Generic host (Gitea / GitLab) download tests +# --------------------------------------------------------------------------- + + +def _make_resp( + status_code: int, + content: bytes = b"", + *, + content_type: str = "", + headers: dict | None = None, +) -> Mock: + """Build a minimal mock requests.Response. + + Set content_type='application/json' (or include 'json' in headers + Content-Type) when simulating a Gitea/Gogs JSON envelope. + """ + resp = Mock() + resp.status_code = status_code + resp.content = content + hdrs = dict(headers or {}) + if content_type and "Content-Type" not in hdrs: + hdrs["Content-Type"] = content_type + resp.headers = hdrs + if status_code >= 400: + resp.raise_for_status = Mock(side_effect=requests_lib.exceptions.HTTPError(response=resp)) + else: + resp.raise_for_status = Mock() + return resp + + +def _gitea_json_envelope(file_bytes: bytes) -> bytes: + """Encode *file_bytes* as a Gitea/Gogs Contents-API JSON envelope.""" + import base64 as _b64 + import json as _json + + return _json.dumps( + { + "name": "skill.md", + "encoding": "base64", + "content": _b64.b64encode(file_bytes).decode("ascii"), + } + ).encode("utf-8") + + +class TestGiteaRawUrlDownload: + """Gitea raw URL path: /{owner}/{repo}/raw/{ref}/{file}.""" + + def setup_method(self): + with patch.dict(os.environ, {}, clear=True), _CRED_FILL_PATCH: + self.downloader = GitHubPackageDownloader() + + def test_raw_url_succeeds_on_first_attempt(self): + """Raw URL returns 200 -- content returned without calling the API.""" + dep_ref = DependencyReference.parse("gitea.myorg.com/owner/repo") + expected = b"# README content" + raw_ok = _make_resp(200, expected) + + with patch.object(self.downloader, "_resilient_get", return_value=raw_ok) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "README.md", "main") + + assert result == expected + first_url = mock_get.call_args_list[0][0][0] + assert first_url == "https://gitea.myorg.com/owner/repo/raw/main/README.md" + assert mock_get.call_count == 1 + + def test_no_token_sent_to_non_github_host_via_env_var(self): + """SECURITY: GITHUB_APM_PAT MUST NOT leak to a non-GitHub host. + + Regression trap for the PAT exfiltration vector. The clone path at + ``get_clone_url`` (download_strategies.py:262-279) only embeds a + token when ``is_github_hostname(host)``; the file-download path + must mirror that guard. + """ + dep_ref = DependencyReference.parse("gitea.evil.example.com/owner/repo") + raw_ok = _make_resp(200, b"data") + + with patch.dict( + os.environ, + { + "GITHUB_APM_PAT": "ghp_supersecret", + "GITHUB_TOKEN": "ghp_other", + }, + clear=True, + ): + with _CRED_FILL_PATCH: + downloader = GitHubPackageDownloader() + with patch.object(downloader, "_resilient_get", return_value=raw_ok) as mock_get: + downloader.download_raw_file(dep_ref, "README.md", "main") + + # Inspect EVERY HTTP call made for this download. + for call in mock_get.call_args_list: + req_headers = call[1].get("headers", {}) or {} + assert "Authorization" not in req_headers, ( + f"PAT leaked to {call[0][0]}: headers={req_headers!r}" + ) + + def test_token_still_sent_when_host_is_github(self): + """github.com receives the Authorization header (regression trap).""" + dep_ref = DependencyReference.parse("owner/repo") # default host + api_ok = _make_resp(200, b"data") + + with patch.dict(os.environ, {"GITHUB_APM_PAT": "ghp_real_gh"}, clear=True): + with _CRED_FILL_PATCH: + downloader = GitHubPackageDownloader() + with patch.object(downloader, "_try_raw_download", return_value=None): + with patch.object(downloader, "_resilient_get", return_value=api_ok) as mock_get: + downloader.download_raw_file(dep_ref, "README.md", "main") + + api_headers = mock_get.call_args_list[0][1].get("headers", {}) + assert api_headers.get("Authorization") == "token ghp_real_gh" + + def test_token_still_sent_when_host_is_ghe(self): + """*.ghe.com (GHE Cloud / Data Residency) receives the token too.""" + dep_ref = DependencyReference.parse("acme.ghe.com/owner/repo") + api_ok = _make_resp(200, b"data") + + with patch.dict(os.environ, {"GITHUB_APM_PAT": "ghp_ghe"}, clear=True): + with _CRED_FILL_PATCH: + downloader = GitHubPackageDownloader() + with patch.object(downloader, "_resilient_get", return_value=api_ok) as mock_get: + downloader.download_raw_file(dep_ref, "README.md", "main") + + api_headers = mock_get.call_args_list[0][1].get("headers", {}) + assert api_headers.get("Authorization") == "token ghp_ghe" + + def test_git_credential_helper_token_sent_to_generic_host(self): + """Host-scoped credentials (git credential helper) ARE sent to generic hosts. + + The credential helper is host-scoped by construction, so forwarding + is safe and necessary for private Gitea/Gogs repos. This is the + symmetric case to the security guard test above. + """ + dep_ref = DependencyReference.parse("gitea.myorg.com/owner/repo") + raw_ok = _make_resp(200, b"data") + + with patch.dict(os.environ, {}, clear=True): + with patch( + "apm_cli.core.token_manager.GitHubTokenManager.resolve_credential_from_git", + return_value="gitea-host-scoped-token", + ): + downloader = GitHubPackageDownloader() + with patch.object(downloader, "_resilient_get", return_value=raw_ok) as mock_get: + downloader.download_raw_file(dep_ref, "README.md", "main") + + raw_headers = mock_get.call_args_list[0][1].get("headers", {}) + assert raw_headers.get("Authorization") == "token gitea-host-scoped-token" + + def test_falls_back_to_api_v1_when_raw_returns_non_200(self): + """When the raw URL returns 404, the API v1 path is tried next.""" + dep_ref = DependencyReference.parse("gitea.myorg.com/owner/repo") + expected = b"file via API" + + with patch.object( + self.downloader, + "_resilient_get", + side_effect=[_make_resp(404), _make_resp(200, expected)], + ) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "README.md", "main") + + assert result == expected + urls = [c[0][0] for c in mock_get.call_args_list] + assert urls[0] == "https://gitea.myorg.com/owner/repo/raw/main/README.md" + assert urlparse(urls[1]).path.startswith("/api/v1/") + + def test_raw_url_request_exception_falls_through_to_api(self): + """RequestException on the raw URL path must not abort -- API path runs. + + Regression trap for the ``except (RequestException, OSError)`` + swallow at the raw-URL try block. Previously this only had unit + coverage that pinned the swallow itself; this test exercises the + downstream "fallthrough must reach the API path" promise. + """ + dep_ref = DependencyReference.parse("gitea.myorg.com/owner/repo") + expected = b"recovered via api" + api_ok = _make_resp(200, expected) + + side_effects = [ + requests_lib.exceptions.ConnectionError("boom"), + api_ok, + ] + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "README.md", "main") + + assert result == expected + urls = [c[0][0] for c in mock_get.call_args_list] + assert urls[0].endswith("/owner/repo/raw/main/README.md") + assert urlparse(urls[1]).path.startswith("/api/v1/") + + +class TestGiteaGogsApiVersionNegotiation: + """API version negotiation: raw URL -> v1 -> v3 for Gitea/Gogs generic hosts. + + The implementation intentionally stops at v3. GitLab uses a completely + different API shape (/api/v4/projects/:id/repository/files/...) that is + not compatible with the GitHub Contents-style endpoint negotiated here; + GitLab support is limited to git-clone operations only. + """ + + def setup_method(self): + with patch.dict(os.environ, {}, clear=True), _CRED_FILL_PATCH: + self.downloader = GitHubPackageDownloader() + + def test_v1_falls_back_to_v3_for_generic_hosts(self): + """When Gitea raw URL and v1 both return 404, v3 is tried and succeeds.""" + dep_ref = DependencyReference.parse("gitea.myorg.com/owner/repo") + expected = b"gitea v3 file content" + envelope_resp = _make_resp( + 200, + _gitea_json_envelope(expected), + content_type="application/json", + ) + + side_effects = [ + _make_resp(404), # raw URL + _make_resp(404), # v1 + envelope_resp, # v3 + ] + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "skill.md", "main") + + assert result == expected + urls = [c[0][0] for c in mock_get.call_args_list] + assert urlparse(urls[1]).path.startswith("/api/v1/") + assert urlparse(urls[2]).path.startswith("/api/v3/") + assert len(mock_get.call_args_list) == 3 + + def test_gitea_v1_succeeds_without_trying_v3(self): + """When v1 returns 200, v3 must never be called.""" + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + expected = b"gitea content" + envelope_resp = _make_resp( + 200, + _gitea_json_envelope(expected), + content_type="application/json", + ) + + with patch.object( + self.downloader, + "_resilient_get", + side_effect=[_make_resp(404), envelope_resp], + ) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "file.md", "main") + + assert result == expected + urls = [c[0][0] for c in mock_get.call_args_list] + assert not any(urlparse(u).path.startswith("/api/v3/") for u in urls) + + def test_gitea_api_decodes_json_envelope_into_file_bytes(self): + """API path returns Gitea ``{content,encoding}`` envelope -> decoded bytes.""" + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + expected = b"# Hello from Gitea base64\n" + envelope_resp = _make_resp( + 200, + _gitea_json_envelope(expected), + content_type="application/json; charset=utf-8", + ) + + with patch.object( + self.downloader, + "_resilient_get", + side_effect=[_make_resp(404), envelope_resp], + ): + result = self.downloader.download_raw_file(dep_ref, "skill.md", "main") + + assert result == expected, "Gitea JSON envelope must be base64-decoded" + + def test_gitea_api_passthrough_when_server_returns_raw_bytes(self): + """Some Gitea proxies serve raw bytes; passthrough must still work.""" + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + expected = b"raw markdown bytes" + # No JSON content-type; body is not a JSON envelope. + raw_resp = _make_resp(200, expected, content_type="text/plain") + + with patch.object( + self.downloader, + "_resilient_get", + side_effect=[_make_resp(404), raw_resp], + ): + result = self.downloader.download_raw_file(dep_ref, "skill.md", "main") + + assert result == expected + + def test_fallback_candidate_loop_reraises_non_404(self): + """500 on a candidate URL must surface as RuntimeError, not silent skip. + + Pins the symmetry-fix between the primary loop (already re-raised + non-404) and the fallback-ref loop (previously swallowed all + HTTPErrors via bare ``pass``). + """ + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + + # raw=404, v1=404 (forces ref-fallback), v1@master=500 + side_effects = [ + _make_resp(404), # raw main + _make_resp(404), # v1 main + _make_resp(404), # v3 main + _make_resp(500), # v1 master -- must re-raise + ] + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects): + with pytest.raises(RuntimeError, match=r"HTTP 500"): + self.downloader.download_raw_file(dep_ref, "missing.md", "main") + + def test_primary_candidate_loop_reraises_non_404(self): + """500 on the v3 fallback in the primary loop also re-raises.""" + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + + side_effects = [ + _make_resp(404), # raw + _make_resp(404), # v1 + _make_resp(500), # v3 -- must re-raise + ] + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects): + with pytest.raises(RuntimeError, match=r"HTTP 500"): + self.downloader.download_raw_file(dep_ref, "missing.md", "main") + + def test_all_api_versions_404_raises_descriptive_error(self): + """When every API version returns 404 for both refs, a clear error is raised. + + The error must name the host, the file path, and which API + families were attempted -- so users staring at a GitLab or + unsupported-host failure see an actionable signal. + """ + dep_ref = DependencyReference.parse("git.example.com/owner/repo") + # raw(main) + v1(main) + v3(main) = 3 calls, then v1(master) + v3(master) = 2 calls + side_effects = [_make_resp(404)] * 5 + + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects): + with pytest.raises(RuntimeError) as excinfo: + self.downloader.download_raw_file(dep_ref, "missing.md", "main") + + msg = str(excinfo.value) + # Use urlparse on the canonical URL embedded in the error message + # (per tests.instructions.md: never substring-match URLs). + url_tokens = [tok.strip("(),.;'\"") for tok in msg.split() if "://" in tok] + hosts = {urlparse(t).hostname for t in url_tokens} + assert hosts == {"git.example.com"}, f"Host not surfaced in error: {msg!r}" + paths = [urlparse(t).path for t in url_tokens] + assert any("missing.md" in p for p in paths) + assert "GitLab" in msg, "Error should hint at GitLab unsupported case" + + def test_github_com_uses_api_github_com_not_api_v4(self): + """github.com must still use api.github.com, never /api/v4/.""" + dep_ref = DependencyReference.parse("owner/repo") + expected = b"github content" + api_ok = _make_resp(200, expected) + + with patch.object(self.downloader, "_try_raw_download", return_value=None): + with patch.object(self.downloader, "_resilient_get", return_value=api_ok) as mock_get: + result = self.downloader.download_raw_file(dep_ref, "README.md", "main") + + assert result == expected + url_called = mock_get.call_args_list[0][0][0] + assert url_called.startswith("https://api.github.com/") + assert not urlparse(url_called).path.startswith("/api/v4/") + + def test_verbose_callback_logs_each_attempt(self): + """--verbose surfaces raw -> v1 -> v3 chain so users can diagnose failures.""" + dep_ref = DependencyReference.parse("gitea.example.com/owner/repo") + expected = b"ok" + envelope_resp = _make_resp( + 200, _gitea_json_envelope(expected), content_type="application/json" + ) + side_effects = [ + _make_resp(404), # raw URL + _make_resp(404), # v1 + envelope_resp, # v3 + ] + captured: list[str] = [] + with patch.object(self.downloader, "_resilient_get", side_effect=side_effects): + self.downloader.download_raw_file( + dep_ref, "skill.md", "main", verbose_callback=captured.append + ) + + joined = "\n".join(captured) + assert "Trying raw URL" in joined + assert "Trying Contents API" in joined or "trying next candidate" in joined + assert "/api/v3/" in joined + + if __name__ == "__main__": pytest.main([__file__]) diff --git a/tests/unit/test_generic_git_urls.py b/tests/unit/test_generic_git_urls.py index cbc639c90..a0a3530df 100644 --- a/tests/unit/test_generic_git_urls.py +++ b/tests/unit/test_generic_git_urls.py @@ -828,7 +828,7 @@ def test_scp_port_only_no_path_raises(self): DependencyReference.parse("git@host.example.com:7999") def test_scp_port_trailing_slash_no_path_raises(self): - """git@host:7999/ — trailing slash but empty remaining path.""" + """git@host:7999/ -- trailing slash but empty remaining path.""" with pytest.raises(ValueError, match="no repository path follows"): DependencyReference.parse("git@host.example.com:7999/") @@ -902,3 +902,64 @@ def test_ssh_protocol_with_port_still_works(self): assert dep.host == "bitbucket.example.com" assert dep.port == 7999 assert dep.repo_url == "project/repo" + + +class TestGiteaVirtualPackageDetection: + """Gitea-specific virtual package detection -- supplements TestFQDNVirtualPaths + and TestNestedGroupSupport with Gitea host fixtures and regression guards + for the len(path_segments) > 2 over-trigger.""" + + # --- Must NOT be virtual (nested-group repo, no virtual indicators) --- + + def test_three_segment_gitea_path_is_not_virtual(self): + """group/subgroup/repo on Gitea is a nested-group repo, not virtual.""" + dep = DependencyReference.parse("gitea.myorg.com/group/subgroup/repo") + assert dep.host == "gitea.myorg.com" + assert dep.repo_url == "group/subgroup/repo" + assert dep.is_virtual is False + + def test_two_segment_gitea_path_is_not_virtual(self): + """Simple owner/repo on a Gitea host is never virtual.""" + dep = DependencyReference.parse("gitea.myorg.com/owner/repo") + assert dep.host == "gitea.myorg.com" + assert dep.repo_url == "owner/repo" + assert dep.is_virtual is False + + def test_four_segment_generic_path_without_indicators_is_not_virtual(self): + """Deep nested groups without file extensions or /collections/ are not virtual.""" + dep = DependencyReference.parse("git.company.internal/team/skills/brand-guidelines") + assert dep.is_virtual is False + assert dep.repo_url == "team/skills/brand-guidelines" + + # --- Must be virtual (explicit virtual indicators) --- + + def test_gitea_virtual_file_extension(self): + """Path with virtual file extension on Gitea is detected as virtual.""" + dep = DependencyReference.parse("gitea.myorg.com/owner/repo/file.prompt.md") + assert dep.host == "gitea.myorg.com" + assert dep.repo_url == "owner/repo" + assert dep.virtual_path == "file.prompt.md" + assert dep.is_virtual is True + assert dep.is_virtual_file() is True + + def test_gitea_collections_path_is_virtual(self): + """Path with /collections/ on Gitea is detected as a virtual subdirectory package.""" + dep = DependencyReference.parse("gitea.myorg.com/owner/repo/collections/security") + assert dep.host == "gitea.myorg.com" + assert dep.repo_url == "owner/repo" + assert dep.virtual_path == "collections/security" + assert dep.is_virtual is True + assert dep.is_virtual_subdirectory() is True + + def test_dict_format_virtual_on_gitea(self): + """Dict format with path= on Gitea host yields a virtual package.""" + dep = DependencyReference.parse_from_dict( + { + "git": "gitea.myorg.com/owner/repo", + "path": "prompts/review.prompt.md", + } + ) + assert dep.host == "gitea.myorg.com" + assert dep.repo_url == "owner/repo" + assert dep.virtual_path == "prompts/review.prompt.md" + assert dep.is_virtual is True diff --git a/uv.lock b/uv.lock index 5d448ed8f..39a9a0595 100644 --- a/uv.lock +++ b/uv.lock @@ -228,7 +228,7 @@ requires-dist = [ { name = "pytest-xdist", marker = "extra == 'dev'", specifier = ">=3.0.0" }, { name = "python-frontmatter", specifier = ">=1.0.0" }, { name = "pyyaml", specifier = ">=6.0.0" }, - { name = "requests", specifier = ">=2.28.0" }, + { name = "requests", specifier = ">=2.31.0" }, { name = "rich", specifier = ">=13.0.0" }, { name = "rich-click", specifier = ">=1.7.0" }, { name = "ruamel-yaml", specifier = ">=0.18.0" },