diff --git a/.apm/agents/test-coverage-expert.agent.md b/.apm/agents/test-coverage-expert.agent.md
index fd30e55d5..d0fa720cb 100644
--- a/.apm/agents/test-coverage-expert.agent.md
+++ b/.apm/agents/test-coverage-expert.agent.md
@@ -148,7 +148,12 @@ it via tool calls before emitting it as a finding. The procedure:
5. **Probe the test tree** with `view` / `grep` / `glob`:
- Look in `tests/unit//` for unit tests on the touched module.
- Look in `tests/integration/` for integration tests on the touched
- command or flow.
+ command or flow. New integration tests must follow the marker
+ placement contract in
+ [`.apm/instructions/tests.instructions.md`](../instructions/tests.instructions.md);
+ flag ungated live-network or runtime-binary calls in
+ `tests/integration/` as `recommended` regardless of whether the
+ test self-skips at runtime.
- Search for the specific symbol, error string, or flag name being
changed. Absence of ANY hit on the changed symbol is a strong
signal of a coverage gap.
diff --git a/.apm/instructions/tests.instructions.md b/.apm/instructions/tests.instructions.md
index 1378efb63..3a2b40949 100644
--- a/.apm/instructions/tests.instructions.md
+++ b/.apm/instructions/tests.instructions.md
@@ -107,3 +107,68 @@ production code must follow (see
- **Targeted runs during iteration.** Run the specific test file first
(`uv run pytest tests/unit/install/test_X.py -x`) before running the
full suite (`uv run pytest tests/unit tests/test_console.py`).
+
+## Integration tests: placement and markers
+
+The integration suite uses **declarative gating** via pytest markers,
+not per-file orchestrator enumeration. Adding a new integration test
+is two steps.
+
+### Procedure
+
+1. Drop the file under `tests/integration/test_.py`.
+2. At the top of the module, declare the runtime / network / E2E
+ prerequisites as a single `pytestmark`:
+
+ ```python
+ import pytest
+
+ pytestmark = pytest.mark.requires_network_integration
+ # OR for multiple prerequisites:
+ pytestmark = [
+ pytest.mark.requires_e2e_mode,
+ pytest.mark.requires_runtime_codex,
+ ]
+ ```
+
+That is it. The orchestrator (`scripts/test-integration.sh`) and the
+CI integration job collect everything under `tests/integration/` in
+a single `pytest` invocation; markers are honored automatically.
+
+### Marker selection
+
+Pick the marker that matches the **strongest** prerequisite the test
+has. The full registry lives in `pyproject.toml` under
+`[tool.pytest.ini_options].markers` and is documented (with the
+opt-in commands) in
+[`docs/src/content/docs/contributing/integration-testing.md`](../../docs/src/content/docs/contributing/integration-testing.md).
+Quick map for the common cases:
+
+| Test prerequisite | Marker |
+|----------------------------------------------|---------------------------------|
+| Real HTTP to APM-owned services | `requires_network_integration` |
+| Real codex / copilot / llm runtime binary | `requires_runtime_` |
+| Downloads runtimes; full E2E flow | `requires_e2e_mode` |
+| GitHub / ADO token required | `requires_github_token` / `requires_ado_pat` |
+| Paid or third-party external service | `live` (deselected by default) |
+| Performance measurement | `benchmark` (deselected by default) |
+| Hermetic (mocks all I/O) | *no marker required* |
+
+Need a marker that does not exist yet? Register it in
+`pyproject.toml` AND add a row to the docs registry table in the
+same PR. Both must stay in sync.
+
+### Anti-patterns (will land as `recommended` findings on review)
+
+- **Editing `scripts/test-integration.sh` per file.** The orchestrator
+ enumerates the directory, not the files. Per-file blocks are drift
+ by construction.
+- **Runtime self-skips inside the test body.** A bare
+ `if not os.getenv("APM_E2E_TESTS"): pytest.skip(...)` runs before
+ collection-time gating and weakens the contract. Use
+ module-level `pytestmark` instead -- declarative gating is the
+ single source of truth.
+- **Reading the gate env var inside test logic.** If your test
+ reads `APM_RUN_INTEGRATION_TESTS` to branch behaviour, the marker
+ is wrong (or missing). The marker is the gate; the test body
+ should assume the gate already passed.
diff --git a/.github/agents/test-coverage-expert.agent.md b/.github/agents/test-coverage-expert.agent.md
index fd30e55d5..c6e9cfb40 100644
--- a/.github/agents/test-coverage-expert.agent.md
+++ b/.github/agents/test-coverage-expert.agent.md
@@ -148,7 +148,12 @@ it via tool calls before emitting it as a finding. The procedure:
5. **Probe the test tree** with `view` / `grep` / `glob`:
- Look in `tests/unit//` for unit tests on the touched module.
- Look in `tests/integration/` for integration tests on the touched
- command or flow.
+ command or flow. New integration tests must follow the marker
+ placement contract in
+ [`.apm/instructions/tests.instructions.md`](../../.apm/instructions/tests.instructions.md);
+ flag ungated live-network or runtime-binary calls in
+ `tests/integration/` as `recommended` regardless of whether the
+ test self-skips at runtime.
- Search for the specific symbol, error string, or flag name being
changed. Absence of ANY hit on the changed symbol is a strong
signal of a coverage gap.
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 8b37a0200..f72f6c294 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -1,6 +1,6 @@
-
-
+
+
# Linting (canonical contract)
@@ -49,4 +49,4 @@ skills; honor this instruction before invoking them.
---
*This file was generated by APM CLI. Do not edit manually.*
-*To regenerate: `specify apm compile`*
+*To regenerate: `apm compile`*
diff --git a/.github/instructions/tests.instructions.md b/.github/instructions/tests.instructions.md
index 1378efb63..3a2b40949 100644
--- a/.github/instructions/tests.instructions.md
+++ b/.github/instructions/tests.instructions.md
@@ -107,3 +107,68 @@ production code must follow (see
- **Targeted runs during iteration.** Run the specific test file first
(`uv run pytest tests/unit/install/test_X.py -x`) before running the
full suite (`uv run pytest tests/unit tests/test_console.py`).
+
+## Integration tests: placement and markers
+
+The integration suite uses **declarative gating** via pytest markers,
+not per-file orchestrator enumeration. Adding a new integration test
+is two steps.
+
+### Procedure
+
+1. Drop the file under `tests/integration/test_.py`.
+2. At the top of the module, declare the runtime / network / E2E
+ prerequisites as a single `pytestmark`:
+
+ ```python
+ import pytest
+
+ pytestmark = pytest.mark.requires_network_integration
+ # OR for multiple prerequisites:
+ pytestmark = [
+ pytest.mark.requires_e2e_mode,
+ pytest.mark.requires_runtime_codex,
+ ]
+ ```
+
+That is it. The orchestrator (`scripts/test-integration.sh`) and the
+CI integration job collect everything under `tests/integration/` in
+a single `pytest` invocation; markers are honored automatically.
+
+### Marker selection
+
+Pick the marker that matches the **strongest** prerequisite the test
+has. The full registry lives in `pyproject.toml` under
+`[tool.pytest.ini_options].markers` and is documented (with the
+opt-in commands) in
+[`docs/src/content/docs/contributing/integration-testing.md`](../../docs/src/content/docs/contributing/integration-testing.md).
+Quick map for the common cases:
+
+| Test prerequisite | Marker |
+|----------------------------------------------|---------------------------------|
+| Real HTTP to APM-owned services | `requires_network_integration` |
+| Real codex / copilot / llm runtime binary | `requires_runtime_` |
+| Downloads runtimes; full E2E flow | `requires_e2e_mode` |
+| GitHub / ADO token required | `requires_github_token` / `requires_ado_pat` |
+| Paid or third-party external service | `live` (deselected by default) |
+| Performance measurement | `benchmark` (deselected by default) |
+| Hermetic (mocks all I/O) | *no marker required* |
+
+Need a marker that does not exist yet? Register it in
+`pyproject.toml` AND add a row to the docs registry table in the
+same PR. Both must stay in sync.
+
+### Anti-patterns (will land as `recommended` findings on review)
+
+- **Editing `scripts/test-integration.sh` per file.** The orchestrator
+ enumerates the directory, not the files. Per-file blocks are drift
+ by construction.
+- **Runtime self-skips inside the test body.** A bare
+ `if not os.getenv("APM_E2E_TESTS"): pytest.skip(...)` runs before
+ collection-time gating and weakens the contract. Use
+ module-level `pytestmark` instead -- declarative gating is the
+ single source of truth.
+- **Reading the gate env var inside test logic.** If your test
+ reads `APM_RUN_INTEGRATION_TESTS` to branch behaviour, the marker
+ is wrong (or missing). The marker is the gate; the test body
+ should assume the gate already passed.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 57ab88b9b..3dd564e1d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,6 +22,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Integration tests now use marker-driven discovery: 21 `pytestmark = pytest.mark.skipif(...)` chains across `tests/integration/` are replaced with declarative `requires_*` markers, with precondition logic centralized in `tests/integration/conftest.py` and auto-skipping at collection time. PR1 of #1166. (#1167)
- Integration test apm-binary resolution now prefers the local build (`./dist/apm--/apm`) over a system-wide `apm` on `PATH`, so contributors validating the binary under test are not silently shadowed by a global install; the bearer-token marker (`requires_ado_bearer`) discards the captured JWT immediately and persists only the boolean outcome. (#1167)
- `scripts/test-integration.sh` is now a thin orchestrator: it builds/locates the apm binary, sets up runtimes and tokens, then invokes `pytest tests/integration/` exactly once. The 28 per-file pytest enumerations were removed; the marker registry handles per-test gating, and new test files dropped into `tests/integration/` are picked up automatically. PR2 of #1166. (#1247)
+- Integration-test marker procedure codified as `.apm/instructions/tests.instructions.md` (wired into `test-coverage-expert` persona) and guarded by a regression-trap test that asserts `pyproject.toml`, `tests/integration/conftest.py::_MARKER_CHECKS`, the docs registry table, and the instructions rule stay in sync. (#1166)
### Fixed
diff --git a/apm.lock.yaml b/apm.lock.yaml
index a690b45c4..1c7666e82 100644
--- a/apm.lock.yaml
+++ b/apm.lock.yaml
@@ -45,7 +45,7 @@ local_deployed_file_hashes:
.github/agents/oss-growth-hacker.agent.md: sha256:1cd56bb78ab37d52c50e45ab69d759f775cd49cdf35981b3dc6c4004315c6b83
.github/agents/python-architect.agent.md: sha256:7587ee7c684c61046a83dfa1b7e39d1345f2f119c3395478e3ca2dbbaaaff0e9
.github/agents/supply-chain-security-expert.agent.md: sha256:8fb8cc426d6af17ba084a28b3f026c2b475b62e3ca63ed2f88b83bd823f877af
- .github/agents/test-coverage-expert.agent.md: sha256:4df107de6179b5237fa9300582921e7fcb439928649601f0fef2d3b9275cea40
+ .github/agents/test-coverage-expert.agent.md: sha256:bc588d89530362469502bfbea788df892a9a0b00e630cd0f3926d3dfd2c2a9e2
.github/instructions/changelog.instructions.md: sha256:1e51ec4c74e847967962bd279dc4c6e582c5d3578490b3c28d5f3acd3e05f73e
.github/instructions/cicd.instructions.md: sha256:9c0fafc74f743aa97e5adba2168d66c9e3a327b135065e3b804bdbb5f04cda5d
.github/instructions/cli.instructions.md: sha256:8e39e8d5047ce88575cb02f87c2bcede584dfef258bd86f7466c7badf136541a
@@ -54,4 +54,4 @@ local_deployed_file_hashes:
.github/instructions/integrators.instructions.md: sha256:b151e0438088d2c0b636dfc28532ecf43c3b51e5f1070a354b8d5b57c345e335
.github/instructions/linting.instructions.md: sha256:312acd32353567834ec9f4f246710a47a991729a11c0380aa6a010b63de607eb
.github/instructions/python.instructions.md: sha256:45173f778eddc126c37c7ace96acd0e17adb1895031eec134ec0754638d3ba37
- .github/instructions/tests.instructions.md: sha256:4c6335e3373f9735778a05913f2d8ef250d118f8c5305e70ba407e578a525ef7
+ .github/instructions/tests.instructions.md: sha256:b527ccaecf0e92f74d300fc9027f1bc49bb43d8ddcdd36338c1556fcde0d8b2d
diff --git a/tests/integration/test_marker_registry_sync.py b/tests/integration/test_marker_registry_sync.py
new file mode 100644
index 000000000..ab3fca581
--- /dev/null
+++ b/tests/integration/test_marker_registry_sync.py
@@ -0,0 +1,232 @@
+"""Marker-registry sync invariants for the integration suite.
+
+This test asserts the integrity contract surfaced in
+``.apm/instructions/tests.instructions.md`` (section "Integration tests:
+placement and markers"): the marker names referenced by the rule, declared
+in ``pyproject.toml``, documented in
+``docs/src/content/docs/contributing/integration-testing.md``, and wired
+into ``tests/integration/conftest.py::_MARKER_CHECKS`` MUST stay in sync.
+
+Why this matters: the rule tells future contributors and agents that
+
+* ``pyproject.toml`` is the marker source of truth,
+* the docs table is the canonical registry,
+* the conftest predicate map is the gate logic.
+
+If any of those drifts, the rule's pointer becomes misleading and a future
+PR that adds an ungated network/runtime test will not be caught by the
+collection-time gate it claims to honour.
+
+The tests are hermetic (read-only on repo files); no marker required.
+"""
+
+from __future__ import annotations
+
+import re
+import sys
+from pathlib import Path
+
+import tomllib
+
+# ---------------------------------------------------------------------------
+# Resolve repo root from this file location -- the test relies on relative
+# layout, not on a cwd assumption.
+# ---------------------------------------------------------------------------
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+PYPROJECT = REPO_ROOT / "pyproject.toml"
+DOCS_REGISTRY = (
+ REPO_ROOT / "docs" / "src" / "content" / "docs" / "contributing" / "integration-testing.md"
+)
+APM_RULE = REPO_ROOT / ".apm" / "instructions" / "tests.instructions.md"
+CONFTEST = REPO_ROOT / "tests" / "integration" / "conftest.py"
+INTEGRATION_DIR = REPO_ROOT / "tests" / "integration"
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _pyproject_marker_names() -> set[str]:
+ """Return the set of marker names declared in pyproject.toml."""
+ with PYPROJECT.open("rb") as fh:
+ data = tomllib.load(fh)
+ raw = data["tool"]["pytest"]["ini_options"]["markers"]
+ out: set[str] = set()
+ for line in raw:
+ name = line.split(":", 1)[0].strip()
+ if name:
+ out.add(name)
+ return out
+
+
+def _docs_registry_marker_names() -> set[str]:
+ """Parse the marker table in the docs registry and return marker names.
+
+ Table rows have shape: ``| `marker_name` | precondition | how |``.
+ """
+ text = DOCS_REGISTRY.read_text(encoding="utf-8")
+ out: set[str] = set()
+ # Match the first cell content if it is a backtick-quoted identifier.
+ row_re = re.compile(r"^\|\s*`([a-z_][a-z0-9_]*)`\s*\|", re.MULTILINE)
+ for m in row_re.finditer(text):
+ out.add(m.group(1))
+ return out
+
+
+def _conftest_marker_names() -> set[str]:
+ """Return marker names that have a predicate registered in conftest."""
+ # Static parse to avoid importing the conftest module under test.
+ text = CONFTEST.read_text(encoding="utf-8")
+ # Look for entries inside the _MARKER_CHECKS dict: ' "name": ('
+ key_re = re.compile(r'^\s*"(requires_[a-z0-9_]+)"\s*:\s*\(', re.MULTILINE)
+ return set(key_re.findall(text))
+
+
+def _gating_markers_in_pyproject() -> set[str]:
+ """Subset of pyproject markers that gate execution (requires_* + live).
+
+ ``integration``, ``slow``, ``benchmark`` are taxonomy markers, not
+ gates, and are intentionally excluded from the docs registry.
+ """
+ names = _pyproject_marker_names()
+ return {n for n in names if n.startswith("requires_") or n == "live"}
+
+
+# ---------------------------------------------------------------------------
+# Tests
+# ---------------------------------------------------------------------------
+
+
+def test_every_pyproject_gating_marker_has_conftest_predicate() -> None:
+ """A declared ``requires_*`` marker without a predicate is dead config.
+
+ If pyproject declares ``requires_X`` but ``_MARKER_CHECKS`` has no
+ entry for it, the marker never auto-skips and the gate is a lie.
+ """
+ declared = {n for n in _pyproject_marker_names() if n.startswith("requires_")}
+ wired = _conftest_marker_names()
+ missing = sorted(declared - wired)
+ assert not missing, (
+ f"requires_* markers in pyproject.toml missing a predicate in "
+ f"tests/integration/conftest.py::_MARKER_CHECKS: {missing}"
+ )
+
+
+def test_every_conftest_predicate_is_declared_in_pyproject() -> None:
+ """A wired predicate without a marker declaration warns with PytestUnknownMarkWarning."""
+ declared = _pyproject_marker_names()
+ wired = _conftest_marker_names()
+ orphans = sorted(wired - declared)
+ assert not orphans, (
+ f"_MARKER_CHECKS entries with no matching pyproject.toml marker "
+ f"declaration (would emit PytestUnknownMarkWarning): {orphans}"
+ )
+
+
+def test_every_gating_marker_is_documented_in_registry() -> None:
+ """Every ``requires_*`` / ``live`` marker MUST appear in the docs table.
+
+ The instructions rule tells agents and humans to consult the docs
+ table as the canonical registry. A marker missing from the table is
+ invisible to anyone following the rule.
+ """
+ gating = _gating_markers_in_pyproject()
+ documented = _docs_registry_marker_names()
+ missing = sorted(gating - documented)
+ assert not missing, (
+ f"Gating markers declared in pyproject.toml but missing from "
+ f"docs/src/content/docs/contributing/integration-testing.md "
+ f"registry table: {missing}"
+ )
+
+
+def test_docs_registry_only_names_declared_markers() -> None:
+ """The docs table must not advertise a marker that does not exist."""
+ declared = _pyproject_marker_names()
+ documented = _docs_registry_marker_names()
+ phantom = sorted(documented - declared)
+ assert not phantom, (
+ f"Markers documented in the docs registry but not declared in pyproject.toml: {phantom}"
+ )
+
+
+def test_apm_rule_only_names_declared_markers() -> None:
+ """Marker names cited in the APM instructions rule must really exist.
+
+ Scans for ``requires_*`` tokens in the rule body and asserts each one
+ matches a declared marker (or the documented ``requires_runtime_``
+ placeholder).
+ """
+ declared = _pyproject_marker_names()
+ # Allow the literal placeholder used in the docs/quick-map.
+ declared_with_placeholder = declared | {"requires_runtime_"}
+ body = APM_RULE.read_text(encoding="utf-8")
+ # Match identifiers OR the placeholder shape inside backticks / plain.
+ token_re = re.compile(r"\brequires_(?:runtime_|[a-z][a-z0-9_]*)")
+ cited = set(token_re.findall(body))
+ # Re-prefix because the regex captured only the suffix; rebuild full names.
+ # (The regex above intentionally returns the full token thanks to \b
+ # anchoring and the group inside; re-derive with findall on full pattern.)
+ full_re = re.compile(r"\brequires_(?:runtime_|[a-z][a-z0-9_]+)")
+ cited = set(full_re.findall(body))
+ unknown = sorted(cited - declared_with_placeholder)
+ assert not unknown, (
+ f"Marker names cited in .apm/instructions/tests.instructions.md "
+ f"that are not declared in pyproject.toml: {unknown}"
+ )
+
+
+def test_integration_tests_use_pytestmark_not_runtime_self_skip() -> None:
+ """Tests must declare gates via ``pytestmark``, not runtime ``os.getenv``.
+
+ The rule says: never write ``if not os.getenv("APM_E2E_TESTS"):
+ pytest.skip(...)`` inside a test body. Use module-level ``pytestmark =
+ pytest.mark.requires_e2e_mode`` instead. This guard catches future
+ regressions of that pattern in ``tests/integration/test_*.py``.
+
+ ``conftest.py`` is intentionally exempt: the marker registry itself
+ must read the env vars to implement the gate.
+ """
+ gate_env_vars = (
+ "APM_E2E_TESTS",
+ "APM_RUN_INTEGRATION_TESTS",
+ "APM_RUN_INFERENCE_TESTS",
+ "APM_TEST_ADO_BEARER",
+ )
+ # Two-line window pattern: an os.getenv on a gate var followed by
+ # pytest.skip on the next non-blank line (or same line).
+ offenders: list[str] = []
+ this_file = Path(__file__).resolve()
+ for path in sorted(INTEGRATION_DIR.glob("test_*.py")):
+ if path.resolve() == this_file:
+ # The lint itself names the env var strings literally; skip.
+ continue
+ text = path.read_text(encoding="utf-8")
+ for var in gate_env_vars:
+ # Heuristic: an os.getenv(...gate var...) call AND a pytest.skip
+ # in the SAME function body. We check for the var name appearing
+ # within 200 chars of pytest.skip.
+ for m in re.finditer(rf'os\.getenv\(\s*[\'"]{var}[\'"]', text):
+ window = text[m.start() : m.start() + 400]
+ if "pytest.skip" in window or "pytest.exit" in window:
+ offenders.append(f"{path.relative_to(REPO_ROOT)}: gates on {var}")
+ break
+ assert not offenders, (
+ "Integration test files use runtime os.getenv self-skip on a gate "
+ "env var. Replace with a module-level pytestmark = pytest.mark."
+ "requires_* marker (see .apm/instructions/tests.instructions.md):\n "
+ + "\n ".join(offenders)
+ )
+
+
+# ---------------------------------------------------------------------------
+# Sanity: confirm tomllib is available (Python 3.11+). All CI matrix
+# entries currently use 3.12+; this test is the canary if that changes.
+# ---------------------------------------------------------------------------
+
+
+def test_tomllib_available() -> None:
+ """tomllib was added in 3.11; CI runs on 3.12. Guard the assumption."""
+ assert sys.version_info >= (3, 11), "This suite needs Python 3.11+ for tomllib"