Skip to content

Commit 1d774e3

Browse files
committed
fix: address PR #150 review comments
- Fix is_github variable readability (operator precedence → explicit if/else) - Fix test name mismatch (test_custom_ghe_host → test_default_github_host_stripped) - Fix dict identity parsing (use parse_from_dict() for dict-style deps) - Fix validation env for generic hosts (relax locked-down env for non-GitHub/ADO) - Add nested GitLab group support (N-segment repo paths for generic hosts) - Add 35 new tests covering nested groups, dict identity, env scoping, and host classification - Update docs for nested group syntax and FQDN shorthand examples
1 parent bd0c83d commit 1d774e3

9 files changed

Lines changed: 499 additions & 56 deletions

File tree

docs/dependencies.md

Lines changed: 7 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,10 @@ APM accepts dependencies in two forms:
101101
- **Shorthand** (`owner/repo`) — defaults to GitHub
102102
- **HTTPS URL** (`https://host/owner/repo.git`) — any git host, whole repo
103103
- **SSH URL** (`git@host:owner/repo.git`) — any git host, whole repo
104-
- **FQDN shorthand** (`host/owner/repo/path`) — any host, with virtual path support
104+
- **FQDN shorthand** (`host/owner/repo`) — any host, supports nested groups
105+
- GitLab nested groups: `gitlab.com/group/subgroup/repo`
106+
- Virtual paths on simple repos: `gitlab.com/owner/repo/file.prompt.md`
107+
- For nested groups + virtual paths, use the object format below
105108

106109
**Object format** (when you need `path`, `ref`, or `alias` on a git URL):
107110

@@ -118,7 +121,7 @@ dependencies:
118121

119122
Fields: `git` (required), `path`, `ref`, `alias` (all optional). The `git` value is any HTTPS or SSH clone URL.
120123

121-
> **Tip:** Use the object format for non-GitHub hosts when you need to reference a sub-path inside a repo. FQDN shorthand (`gitlab.com/owner/repo/path`) also works but can be ambiguous with nested groups (e.g., GitLab).
124+
> **Tip:** For generic hosts (GitLab, Gitea, Bitbucket, etc.), APM treats all path segments after the host as the repo path. This supports GitLab's nested groups (`gitlab.com/group/subgroup/repo`). For virtual packages on repos with nested groups, use the object format with `path:`.
122125

123126
### How Dependencies Are Stored (Canonical Format)
124127

@@ -134,6 +137,8 @@ APM normalizes every dependency entry on write — no matter how you specify a p
134137
| `git@github.com:microsoft/apm-sample-package.git` | `microsoft/apm-sample-package` |
135138
| `github.com/microsoft/apm-sample-package` | `microsoft/apm-sample-package` |
136139
| `https://gitlab.com/acme/rules.git` | `gitlab.com/acme/rules` |
140+
| `gitlab.com/group/subgroup/repo` | `gitlab.com/group/subgroup/repo` |
141+
| `git@gitlab.com:group/subgroup/repo.git` | `gitlab.com/group/subgroup/repo` |
137142
| `git@bitbucket.org:team/standards.git` | `bitbucket.org/team/standards` |
138143

139144
Virtual paths, refs, and aliases are preserved:

docs/getting-started.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,8 @@ APM installs packages from multiple sources. Use the format that matches your re
101101
|--------|--------|---------|
102102
| GitHub.com | `owner/repo` | `apm install microsoft/apm-sample-package` |
103103
| GitHub Enterprise | `ghe.company.com/owner/repo` | `apm install ghe.myco.com/team/standards` |
104-
| GitLab | `https://gitlab.com/owner/repo.git` | `apm install https://gitlab.com/acme/rules.git` |
104+
| GitLab | `gitlab.com/group/subgroup/repo` | `apm install gitlab.com/acme/platform/rules` |
105+
| GitLab (HTTPS) | `https://gitlab.com/group/repo.git` | `apm install https://gitlab.com/acme/rules.git` |
105106
| Bitbucket | `https://bitbucket.org/owner/repo.git` | `apm install https://bitbucket.org/team/rules.git` |
106107
| Any git host | `git@host:owner/repo.git` | `apm install git@git.company.com:team/rules.git` |
107108
| Azure DevOps | `dev.azure.com/org/project/repo` | `apm install dev.azure.com/myorg/proj/rules` |

src/apm_cli/cli.py

Lines changed: 30 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -431,11 +431,15 @@ def _validate_and_add_packages_to_apm_yml(packages, dry_run=False):
431431
existing_identities = builtins.set()
432432
for dep_entry in current_deps:
433433
try:
434-
dep_str = dep_entry if isinstance(dep_entry, str) else dep_entry.get("git", "")
435-
ref = DependencyReference.parse(dep_str)
434+
if isinstance(dep_entry, str):
435+
ref = DependencyReference.parse(dep_entry)
436+
elif isinstance(dep_entry, dict):
437+
ref = DependencyReference.parse_from_dict(dep_entry)
438+
else:
439+
continue
436440
existing_identities.add(ref.get_identity())
437-
except Exception:
438-
pass
441+
except (ValueError, TypeError, AttributeError, KeyError):
442+
continue
439443

440444
# First, validate all packages
441445
_rich_info(f"Validating {len(packages)} package(s)...")
@@ -526,6 +530,8 @@ def _validate_package_exists(package):
526530
# For Azure DevOps or GitHub Enterprise (non-github.com hosts),
527531
# use the downloader which handles authentication properly
528532
if dep_ref.is_azure_devops() or (dep_ref.host and dep_ref.host != "github.com"):
533+
from apm_cli.utils.github_host import is_github_hostname, is_azure_devops_hostname
534+
529535
downloader = GitHubPackageDownloader()
530536
# Set the host
531537
if dep_ref.host:
@@ -536,14 +542,24 @@ def _validate_package_exists(package):
536542
dep_ref.repo_url, use_ssh=False, dep_ref=dep_ref
537543
)
538544

539-
# Use the downloader's git environment which has auth configured
545+
# For generic hosts (not GitHub, not ADO), relax the env so native
546+
# credential helpers (SSH keys, macOS Keychain, etc.) can work.
547+
# This mirrors _clone_with_fallback() which does the same relaxation.
548+
is_generic = not is_github_hostname(dep_ref.host) and not is_azure_devops_hostname(dep_ref.host)
549+
if is_generic:
550+
validate_env = {k: v for k, v in downloader.git_env.items()
551+
if k not in ('GIT_ASKPASS', 'GIT_CONFIG_GLOBAL', 'GIT_CONFIG_NOSYSTEM')}
552+
validate_env['GIT_TERMINAL_PROMPT'] = '0'
553+
else:
554+
validate_env = {**os.environ, **downloader.git_env}
555+
540556
cmd = ["git", "ls-remote", "--heads", "--exit-code", package_url]
541557
result = subprocess.run(
542558
cmd,
543559
capture_output=True,
544560
text=True,
545561
timeout=30,
546-
env={**os.environ, **downloader.git_env},
562+
env=validate_env,
547563
)
548564
return result.returncode == 0
549565

@@ -1109,14 +1125,19 @@ def uninstall(ctx, packages, dry_run):
11091125
pkg_identity = package
11101126

11111127
for dep_entry in current_deps:
1112-
dep_str = dep_entry if isinstance(dep_entry, str) else str(dep_entry)
11131128
try:
1114-
dep_ref = DependencyReference.parse(dep_str)
1129+
if isinstance(dep_entry, str):
1130+
dep_ref = DependencyReference.parse(dep_entry)
1131+
elif isinstance(dep_entry, dict):
1132+
dep_ref = DependencyReference.parse_from_dict(dep_entry)
1133+
else:
1134+
continue
11151135
if dep_ref.get_identity() == pkg_identity:
11161136
matched_dep = dep_entry # preserve original entry for removal
11171137
break
1118-
except Exception:
1138+
except (ValueError, TypeError, AttributeError, KeyError):
11191139
# Fallback: exact string match
1140+
dep_str = dep_entry if isinstance(dep_entry, str) else str(dep_entry)
11201141
if dep_str == package:
11211142
matched_dep = dep_entry
11221143
break

src/apm_cli/deps/github_downloader.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -270,7 +270,11 @@ def _clone_with_fallback(self, repo_url_base: str, target_path: Path, progress_r
270270

271271
# Determine host type for auth decisions
272272
dep_host = dep_ref.host if dep_ref else None
273-
is_github = dep_host and is_github_hostname(dep_host) or (not dep_host)
273+
if dep_host:
274+
is_github = is_github_hostname(dep_host)
275+
else:
276+
# When no host is specified, default to GitHub behavior
277+
is_github = True
274278
is_generic = not is_ado and not is_github
275279

276280
# Tokens are only valid for their matching host type

src/apm_cli/models/apm_package.py

Lines changed: 64 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
import re
44
import urllib.parse
5-
from ..utils.github_host import is_supported_git_host, is_azure_devops_hostname, default_host, unsupported_host_error
5+
from ..utils.github_host import is_supported_git_host, is_azure_devops_hostname, is_github_hostname, default_host, unsupported_host_error
66
import yaml
77
from dataclasses import dataclass
88
from enum import Enum
@@ -331,25 +331,25 @@ def get_install_path(self, apm_modules_dir: Path) -> Path:
331331
# ADO: org/project/repo/subdir
332332
return apm_modules_dir / repo_parts[0] / repo_parts[1] / repo_parts[2] / self.virtual_path
333333
elif len(repo_parts) >= 2:
334-
# GitHub: owner/repo/subdir
335-
return apm_modules_dir / repo_parts[0] / repo_parts[1] / self.virtual_path
334+
# owner/repo/subdir or group/subgroup/repo/subdir
335+
return apm_modules_dir.joinpath(*repo_parts, self.virtual_path)
336336
else:
337337
# Virtual file/collection: use sanitized package name (flattened)
338338
package_name = self.get_virtual_package_name()
339339
if self.is_azure_devops() and len(repo_parts) >= 3:
340340
# ADO: org/project/virtual-pkg-name
341341
return apm_modules_dir / repo_parts[0] / repo_parts[1] / package_name
342342
elif len(repo_parts) >= 2:
343-
# GitHub: owner/virtual-pkg-name
343+
# owner/virtual-pkg-name (use first segment as namespace)
344344
return apm_modules_dir / repo_parts[0] / package_name
345345
else:
346346
# Regular package: use full repo path
347347
if self.is_azure_devops() and len(repo_parts) >= 3:
348348
# ADO: org/project/repo
349349
return apm_modules_dir / repo_parts[0] / repo_parts[1] / repo_parts[2]
350350
elif len(repo_parts) >= 2:
351-
# GitHub: owner/repo
352-
return apm_modules_dir / repo_parts[0] / repo_parts[1]
351+
# owner/repo or group/subgroup/repo (generic hosts)
352+
return apm_modules_dir.joinpath(*repo_parts)
353353

354354
# Fallback: join all parts
355355
return apm_modules_dir.joinpath(*repo_parts)
@@ -571,15 +571,36 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
571571
# For Azure DevOps, the base package format is org/project/repo (3 segments)
572572
# Virtual packages would have 4+ segments: org/project/repo/path/to/file
573573
# For GitHub, base is owner/repo (2 segments), virtual is 3+ segments
574+
# For generic hosts (GitLab, Gitea, etc.), all segments are repo path
575+
# unless virtual indicators (file extensions, collections) are present
574576
is_ado = validated_host is not None and is_azure_devops_hostname(validated_host)
577+
is_generic_host = (validated_host is not None
578+
and not is_github_hostname(validated_host)
579+
and not is_azure_devops_hostname(validated_host))
575580

576581
# Handle _git in ADO URLs: org/project/_git/repo -> org/project/repo
577582
if is_ado and '_git' in path_segments:
578583
git_idx = path_segments.index('_git')
579584
# Remove _git from the path segments
580585
path_segments = path_segments[:git_idx] + path_segments[git_idx+1:]
581586

582-
min_base_segments = 3 if is_ado else 2
587+
if is_ado:
588+
min_base_segments = 3
589+
elif is_generic_host:
590+
# For generic hosts (GitLab, Gitea), check for virtual indicators
591+
# If present, use 2-segment base (simple owner/repo + virtual path)
592+
# If absent, treat ALL segments as the repo path (nested groups)
593+
has_virtual_ext = any(
594+
any(seg.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS)
595+
for seg in path_segments
596+
)
597+
has_collection = 'collections' in path_segments
598+
if has_virtual_ext or has_collection:
599+
min_base_segments = 2 # Simple repo with virtual path
600+
else:
601+
min_base_segments = len(path_segments) # All segments = repo path
602+
else:
603+
min_base_segments = 2 # GitHub: owner/repo
583604
min_virtual_segments = min_base_segments + 1
584605

585606
if len(path_segments) >= min_virtual_segments:
@@ -683,6 +704,8 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
683704
raise ValueError("Invalid Azure DevOps virtual package format: must be dev.azure.com/org/project/repo/path")
684705
repo_url = "/".join(parts[1:4]) # org/project/repo
685706
else:
707+
# For virtual packages with host prefix, base is always 2 segments
708+
# (virtual indicators already detected in early detection)
686709
repo_url = "/".join(parts[1:3]) # owner/repo
687710
elif len(parts) >= 2:
688711
# No host prefix
@@ -718,6 +741,9 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
718741
if is_azure_devops_hostname(host) and len(parts) >= 4:
719742
# ADO format: dev.azure.com/org/project/repo
720743
user_repo = "/".join(parts[1:4])
744+
elif not is_github_hostname(host) and not is_azure_devops_hostname(host):
745+
# Generic host (GitLab, Gitea, etc.): all segments after host = repo path
746+
user_repo = "/".join(parts[1:])
721747
else:
722748
# GitHub format: github.com/user/repo
723749
user_repo = "/".join(parts[1:3])
@@ -728,6 +754,9 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
728754
# Check if default host is ADO
729755
if is_azure_devops_hostname(host) and len(parts) >= 3:
730756
user_repo = "/".join(parts[:3]) # org/project/repo
757+
elif host and not is_github_hostname(host) and not is_azure_devops_hostname(host):
758+
# Generic host: all segments = repo path
759+
user_repo = "/".join(parts)
731760
else:
732761
user_repo = "/".join(parts[:2]) # user/repo
733762
else:
@@ -739,12 +768,12 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
739768

740769
uparts = user_repo.split("/")
741770
is_ado_host = host and is_azure_devops_hostname(host)
742-
expected_parts = 3 if is_ado_host else 2
743771

744-
if len(uparts) < expected_parts:
745-
if is_ado_host:
772+
if is_ado_host:
773+
if len(uparts) < 3:
746774
raise ValueError(f"Invalid Azure DevOps repository format: {repo_url}. Expected 'org/project/repo'")
747-
else:
775+
else:
776+
if len(uparts) < 2:
748777
raise ValueError(f"Invalid repository format: {repo_url}. Expected 'user/repo'")
749778

750779
# Security: validate characters to prevent injection
@@ -784,13 +813,20 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
784813

785814
# Validate path format based on host type
786815
is_ado_host = is_azure_devops_hostname(hostname)
787-
expected_parts = 3 if is_ado_host else 2
788816

789-
if len(path_parts) != expected_parts:
790-
if is_ado_host:
817+
if is_ado_host:
818+
if len(path_parts) != 3:
791819
raise ValueError(f"Invalid Azure DevOps repository path: expected 'org/project/repo', got '{path}'")
792-
else:
793-
raise ValueError(f"Invalid repository path: expected 'user/repo', got '{path}'")
820+
else:
821+
if len(path_parts) < 2:
822+
raise ValueError(f"Invalid repository path: expected at least 'user/repo', got '{path}'")
823+
# HTTPS URLs cannot embed virtual paths — reject virtual file extensions
824+
for pp in path_parts:
825+
if any(pp.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS):
826+
raise ValueError(
827+
f"Invalid repository path: '{path}' contains a virtual file extension. "
828+
f"Use the dict format with 'path:' for virtual packages in HTTPS URLs"
829+
)
794830

795831
# Validate all path parts contain only allowed characters
796832
# ADO project names may contain spaces
@@ -820,9 +856,19 @@ def parse(cls, dependency_str: str) -> "DependencyReference":
820856
ado_project = ado_parts[1]
821857
ado_repo = ado_parts[2]
822858
else:
823-
# GitHub format: user/repo (2 segments)
824-
if not re.match(r'^[a-zA-Z0-9._-]+/[a-zA-Z0-9._-]+$', repo_url):
859+
# Non-ADO format: user/repo or group/subgroup/repo (2+ segments)
860+
segments = repo_url.split('/')
861+
if len(segments) < 2:
825862
raise ValueError(f"Invalid repository format: {repo_url}. Expected 'user/repo'")
863+
if not all(re.match(r'^[a-zA-Z0-9._-]+$', s) for s in segments):
864+
raise ValueError(f"Invalid repository format: {repo_url}. Contains invalid characters")
865+
# SSH/HTTPS URLs cannot embed virtual paths — reject virtual file extensions
866+
for seg in segments:
867+
if any(seg.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS):
868+
raise ValueError(
869+
f"Invalid repository format: '{repo_url}' contains a virtual file extension. "
870+
f"Use the dict format with 'path:' for virtual packages in SSH/HTTPS URLs"
871+
)
826872
ado_organization = None
827873
ado_project = None
828874
ado_repo = None

tests/test_apm_package_models.py

Lines changed: 17 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -110,24 +110,18 @@ def test_parse_invalid_formats(self):
110110

111111
def test_parse_malicious_url_bypass_attempts(self):
112112
"""Test that URL parsing prevents injection attacks.
113-
113+
114114
This tests the security fix for CWE-20: Improper Input Validation.
115115
With generic git host support, any valid FQDN is accepted as a host.
116116
The security focus is on preventing:
117-
- Path injection: 'evil.com/github.com/user/repo' → host=evil.com, but
118-
path 'github.com/user/repo' has 3 segments, violating the expected
119-
2-segment 'owner/repo' format → rejected
120117
- Protocol-relative URL attacks
121-
122-
Valid FQDNs like 'github.com.evil.com' are accepted because the hostname
123-
parses correctly and the path 'user/repo' has the expected 2 segments.
118+
119+
With nested group support on generic hosts, path segments that happen
120+
to look like hostnames (e.g., 'github.com/user/repo') are treated as
121+
repo path segments — not injection. The host is correctly identified.
124122
"""
125123
# Attack vectors that should still be REJECTED
126124
rejected_formats = [
127-
# Path injection: embedding github.com in path creates invalid repo format
128-
("evil.com/github.com/user/repo", "Use 'user/repo'"),
129-
("attacker.net/github.com/malicious/repo", "Use 'user/repo'"),
130-
131125
# Protocol-relative URL attacks
132126
("//evil.com/github.com/user/repo", "Protocol-relative URLs are not supported"),
133127
]
@@ -136,6 +130,18 @@ def test_parse_malicious_url_bypass_attempts(self):
136130
with pytest.raises(ValueError, match=expected_match):
137131
DependencyReference.parse(malicious_url)
138132

133+
# With generic git host support + nested groups, these are valid
134+
# (host is correctly identified, remaining segments are repo path)
135+
nested_group_on_generic_host = [
136+
("evil.com/github.com/user/repo", "evil.com", "github.com/user/repo"),
137+
("attacker.net/github.com/malicious/repo", "attacker.net", "github.com/malicious/repo"),
138+
]
139+
for url, expected_host, expected_repo in nested_group_on_generic_host:
140+
dep = DependencyReference.parse(url)
141+
assert dep.host == expected_host
142+
assert dep.repo_url == expected_repo
143+
assert dep.is_virtual is False
144+
139145
# With generic git host support, valid FQDNs are accepted as hosts.
140146
# These are not injection attacks — they are legitimate host references.
141147
accepted_as_generic_hosts = [

0 commit comments

Comments
 (0)