fix: apm install no longer permanently blocked by policy after lockfile wipe (required-packages-deployed catch-22)#1313
Conversation
…integrators
Restores symmetry with skill_integrator's content-identity adopt
(`_dirs_equal` short-circuit) for agent, instruction, prompt, and
command integrators. Without this, a degraded `apm.lock.yaml` (missing
`deployed_files`) combined with the original deployed files still on
disk produces a permanent catch-22:
1. `check_collision` sees a pre-existing file not in `managed_files`
2. Skips integration -> `_attach_deployed_files` writes empty list
3. Next install: `policy_gate` runs *before* `integrate` in the
pipeline, so `required-packages-deployed` is evaluated against
the empty `deployed_files` and blocks the install
4. Install can never self-heal because adopt was the only path
Fix is conservative: `is_content_identical_to_source` returns True
only on byte-identical match, so format-transforming targets (cursor,
claude, windsurf, gemini rules; codex agents) intentionally do not
adopt and continue to skip-as-user-authored.
Reproduced by zava-storefront's failed install of `secure-baseline`.
Tests:
- 8 unit tests in test_content_identical_adopt.py
- 2 e2e tests in test_silent_adopt_existing_files_e2e.py reproducing
the catch-22 against microsoft/apm-sample-package
- Both e2e tests fail on main without the fix, pass with it
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a self-healing gap in apm install where non-skill integrators could repeatedly skip pre-existing (but byte-identical) deployed files after lockfile degradation, leaving deployed_files empty and causing required-packages-deployed to block subsequent installs.
Changes:
- Add a shared
BaseIntegrator.is_content_identical_to_source()helper to detect byte-identical target/source files. - Wire a pre-
check_collision“adopt” short-circuit into agent, instruction, prompt, and command integration loops so identical on-disk files are recorded as deployed. - Add new unit + integration regression tests covering lockfile-wipe reinstall behavior and per-integrator adoption paths.
Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/integration/base_integrator.py | Adds shared helper to detect byte-identical pre-existing files for adoption. |
| src/apm_cli/integration/agent_integrator.py | Adopts identical pre-existing agent files before collision handling (including legacy multi-target paths). |
| src/apm_cli/integration/instruction_integrator.py | Adopts identical pre-existing instruction files before collision handling (primary repro fix). |
| src/apm_cli/integration/prompt_integrator.py | Adopts identical pre-existing prompt files before collision handling. |
| src/apm_cli/integration/command_integrator.py | Adopts identical pre-existing command outputs (only meaningful for identity-copy paths). |
| tests/unit/integration/test_content_identical_adopt.py | Adds unit tests for helper + per-integrator adopt behavior. |
| tests/integration/test_silent_adopt_existing_files_e2e.py | Adds E2E regression tests for lockfile wipe + reinstall self-healing. |
| CHANGELOG.md | Adds an Unreleased entry describing the fix. |
Copilot's findings
- Files reviewed: 8/8 changed files
- Comments generated: 3
|
|
||
| ### Fixed | ||
|
|
||
| - Non-skill integrators (agent, instruction, prompt, command) now silently *adopt* byte-identical pre-existing files instead of treating them as user-authored and skipping integration. This restores symmetry with the skill integrator (which already had this content-identity short-circuit via `_dirs_equal`) and breaks a permanent catch-22: when a project's `apm.lock.yaml` lost `deployed_files` for non-skill packages (e.g. lockfile wiped, hand-edited, partial-install crash, or regenerated by an older APM build) and the deployed files were still on disk, every subsequent `apm install` would re-skip the files, write empty `deployed_files` back, and the next install with `required-packages-deployed` enforced was permanently blocked because `policy_gate` runs before `integrate` in the pipeline. Adopt is conservative-by-design: only fires for byte-identical matches, so format-transforming targets (cursor / claude / windsurf / gemini rules and commands, codex agents) intentionally do not adopt. Fixes the zava-storefront repro where `apm install` blocked on `secure-baseline` despite the file being present and identical. |
There was a problem hiding this comment.
Addressed in 75f91a7 -- shortened the Unreleased Fixed entry to a single concise line ending with (#1313), matching the repo changelog contract.
| from apm_cli.integration.instruction_integrator import InstructionIntegrator | ||
| from apm_cli.integration.prompt_integrator import PromptIntegrator | ||
| from apm_cli.integration.targets import KNOWN_TARGETS | ||
| from apm_cli.models.apm_package import APMPackage, GitReferenceType, PackageInfo, ResolvedReference |
There was a problem hiding this comment.
Addressed in 75f91a7 -- split into a parenthesized multi-line import. (Line was 99 chars, just under the 100-char ruff limit, so format --check was passing, but the multi-line form is clearer.)
| @pytest.fixture | ||
| def apm_command(): | ||
| """Resolve an apm binary that is wired to *this* checkout's source. | ||
|
|
||
| The repo binary (homebrew/system ``apm``) may be a stable release that | ||
| predates the fix under test, which would silently make this regression | ||
| test pass against unfixed code. Prefer a venv-installed editable | ||
| binary so the test exercises the in-repo apm_cli source. | ||
|
|
||
| Resolution order: | ||
| 1. ``APM_TEST_BINARY`` env override (CI / explicit pin). | ||
| 2. Repo-local ``.venv/bin/apm`` (this worktree). | ||
| 3. Sibling ``../awd-cli/.venv/bin/apm`` (shared dev venv pointing | ||
| at this worktree via ``pip install -e .``). | ||
| 4. ``apm`` on PATH (last resort; may be stale). | ||
| """ | ||
| import os | ||
|
|
||
| override = os.environ.get("APM_TEST_BINARY") | ||
| if override and Path(override).exists(): | ||
| return override | ||
|
|
||
| repo_root = Path(__file__).parent.parent.parent | ||
| candidates = [ | ||
| repo_root / ".venv" / "bin" / "apm", | ||
| repo_root.parent / "awd-cli" / ".venv" / "bin" / "apm", | ||
| ] | ||
| for candidate in candidates: | ||
| if candidate.exists(): | ||
| return str(candidate) | ||
|
|
||
| apm_on_path = shutil.which("apm") | ||
| if apm_on_path: | ||
| return apm_on_path | ||
| return "apm" |
There was a problem hiding this comment.
Addressed in 75f91a7 -- APM_BINARY_PATH is now consulted right after the explicit APM_TEST_BINARY test override (still ahead of .venv / PATH discovery), so CI's built artifact wins over a stale system or venv apm. Kept the test-local fixture rather than swapping in shared apm_binary_path because this regression specifically needs to fall back to the in-repo editable install when neither env var is set.
Convergent panel signals addressed in-PR: - Wire is_content_identical_to_source into hook_integrator script-copy paths at :602 (vscode flow) and :790 (claude/cursor/codex merged flow). 3-panelist convergence (architect, devx-ux, test-coverage): same shutil.copy2-after-check_collision shape as the four non-skill integrators just fixed -- a degraded lockfile would have reproduced the catch-22 for any package shipping hook scripts. - Reject symlinks in is_content_identical_to_source. supply-chain flagged that the adopt branch fires before check_collision and therefore skips its containment guard; Path.read_bytes() follows symlinks silently. is_symlink() rejection closes the bypass without needing project_root context. - Tighten CHANGELOG entry from 8 lines to 3 sentences and add (#1313) citation. doc-writer flagged a 4-5x deviation from the established one-to-two sentence Unreleased voice. - Flatten agent_integrator.py:406 if/elif/else into the linear if/else+continue shape so the primary path matches the other three integrators and the same file's per-file loop at line 134. (Behavior preserved -- claude/cursor mirror still runs after primary adopt, same as before.) - Add TestHookScriptAdopt to test_hook_integrator.py (2 tests): byte-identical pre-existing scripts are adopted with managed_files= None; modified scripts are not. Closes the test-coverage 'no regression-trap' gap on the deferred hook surface. Deferred to follow-up issue (per CEO synthesis): - IntegrationResult.files_adopted observability counter (cli-logging + devx-ux convergence) - e2e test redesign with real policy fixture (devx-ux + test-coverage convergence) - BaseIntegrator.try_adopt_identical helper extraction (architect) - PR retitle and growth-narrative framing (oss-growth) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI Lint job rejected: - RUF100 unused noqa: S603 in tests/integration/test_silent_adopt_existing_files_e2e.py:88 - formatter diffs in test_content_identical_adopt.py and test_hook_integrator.py Mirrors the CI Lint contract at .apm/instructions/linting.instructions.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-identical-adopt # Conflicts: # CHANGELOG.md
Update — review-panel amendments + green CINote This comment summarizes commits TL;DRActed on the convergent recommendations from a local Problem (WHY) — the panel's convergent findings
Why these matter: the panel's Approach (WHAT)
Implementation (HOW)
DiagramLegend: control flow inside flowchart LR
A[for source_file in scripts] --> B{target exists?}
B -- "no" --> C[copy source to target]
B -- "yes" --> D[is_content_identical_to_source]
D --> E{symlink?}
E -- "yes" --> F[reject, treat as collision]
E -- "no" --> G{bytes equal?}
G -- "yes" --> H[append to target_paths, continue]
G -- "no" --> F
F --> I[check_collision, on_collision_skip]
C --> J[append to target_paths]
H --> K[next source_file]
I --> K
J --> K
classDef new stroke-dasharray: 5 5, stroke-width: 2px;
class D,E,H new;
Trade-offs
Benefits
ValidationCI on Local lint mirror: Local pytest (1147 unit + integration, 26s)Scenario Evidence
How to test
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com |
- CHANGELOG: shorten Unreleased Fixed entry to single line per repo changelog contract; PR ref (#1313) preserved. - test_content_identical_adopt: split long apm_package import into parenthesized multi-line form for readability. - test_silent_adopt_existing_files_e2e: include APM_BINARY_PATH in apm_command resolution order so CI's built artifact is preferred over a stale system/venv apm. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review-panel findings on PR #1313: Security/architecture: - Close TOCTOU race in BaseIntegrator.is_content_identical_to_source by reading the deploy path with O_NOFOLLOW on POSIX (falls back to plain read on Windows where the constant is unavailable). Defends against an attacker swapping the file for a symlink between the is_symlink() check and the read_bytes() call. - Add ensure_path_within containment guards before the adopt branch in agent_integrator (4 sites: copilot primary, copilot package, claude secondary, cursor secondary) and prompt_integrator (1 site) to match the pattern already enforced by command/instruction/hook integrators. UX visibility: - New IntegrationResult.files_adopted counter, wired through all five non-skill integrators (agent, prompt, instruction, command, hook) and HookIntegrationResult. - install/services.py aggregates per-kind adopt counts and surfaces them in the install summary so adopt-only runs no longer look like silent no-ops: Both: N X integrated (M adopted) -> path Adopt-only: M X adopted -> path Tests: - tests/unit/integration/test_content_identical_adopt.py: add three test classes covering the symlink guard (target-symlink rejected, source-symlink rejected, regression trap), the files_adopted counter on instruction/prompt/agent, and the previously untested cursor/claude secondary adopt sites in integrate_package_agents. - tests/unit/install/test_services_rendering.py: add adopt-only and mixed integrate+adopt rendering assertions so the visibility contract is locked in. Verified locally: 8350 unit tests pass, ruff check + ruff format both silent (CI Lint job mirror). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
apm installcan land in a permanent catch-22 where every subsequent install fails therequired-packages-deployedpolicy gate even though the deployed files are still on disk and byte-identical to the package source.Reproduced by zava-storefront:
The lockfile shows
deployed_files: []forsecure-baselinedespitesecure-baseline.instructions.mdbeing present and byte-identical to the package source.Root cause: skill vs. non-skill asymmetry
SkillIntegrator._promote_sub_skillsalready silently adopts byte-identical pre-existing files via the_dirs_equalshort-circuit (skill_integrator.py:590-594). The other four integrators (agent,instruction,prompt,command) do not — they callcheck_collision, which treats any pre-existing file whose path is not inmanaged_filesas user-authored, and skip integration.The catch-22:
deployed_filesfor a non-skill package while files remain on disk (lockfile wiped, hand-edited, partial-install crash, regenerated by an older APM build).check_collisionsees pre-existing files not inmanaged_files→ skip._attach_deployed_fileswrites emptydeployed_filesback to the lockfile.policy_gateruns beforeintegrateinpipeline.py:449→600, sorequired-packages-deployedis evaluated against the emptydeployed_filesand blocks install.--no-policyor--forceare the only escape hatches.Fix
Mirror the skill-integrator's content-identity adopt in the other four integrators via a shared helper:
In each per-file loop,
if is_content_identical_to_source(target, source): target_paths.append(target); continueruns beforecheck_collision. Adopt is conservative-by-design: byte-identical only, so format-transforming targets (cursor / claude / windsurf / gemini rules and commands, codex agents) intentionally do not adopt and continue with the existing skip-as-user-authored behavior. Only the straight-copy paths (copilotagent.md, copilotinstructions.md, copilotprompt.md, copilotcommand.md) benefit, which is exactly the failing scenario.Files changed
src/apm_cli/integration/base_integrator.py— addedis_content_identical_to_sourcestatic helper.src/apm_cli/integration/agent_integrator.py— wired adopt at the main per-file loop and at three secondary call sites in the legacy multi-target loop.src/apm_cli/integration/instruction_integrator.py— wired adopt at main loop. The direct fix for the zava-storefront repro.src/apm_cli/integration/prompt_integrator.py— wired adopt at main loop.src/apm_cli/integration/command_integrator.py— wired adopt at main loop (only fires for untransformed copy paths).Tests
Unit (
tests/unit/integration/test_content_identical_adopt.py) — 8 new tests:E2E (
tests/integration/test_silent_adopt_existing_files_e2e.py) — 2 new tests reproducing the catch-22 againstmicrosoft/apm-sample-package:test_reinstall_with_wiped_lockfile_repopulates_deployed_files— install, wipeapm.lock.yaml, re-install, assertdeployed_filesmatches the original set byte-for-byte.test_required_packages_deployed_passes_after_lockfile_wipe— same but asserts the second install's lockfile has non-emptydeployed_filesfor the package.Both e2e tests fail on
mainwithout the fix, pass with the fix (TDD-verified by stashing source changes only and re-running).Regression sweep:
tests/unit/integration/ + tests/test_collision_integration.py + tests/test_lockfile.py→ 1193 passed, no regressions.Unblock for affected projects
Once released:
Today's workaround:
apm install --forceor--no-policy.Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>