Refactor test-integration.sh to fixture-driven pytest discovery (eliminate manual test wiring) #1166

@danielmeppiel

Description

Problem

scripts/test-integration.sh is a 665-line bash script that explicitly enumerates ~20 pytest tests/integration/test_*.py invocations, one per file. Adding a new integration test file requires editing the script. Forgetting to wire a file means the test silently never runs.

This is not a hypothetical risk -- it just bit us:

  • tests/integration/test_skill_install.py was never wired into the script. Two existing tests (test_reinstall_same_skill_is_idempotent, and the new test_reinstall_does_not_leak_apm_pin_to_deploy_targets from PR #1153, "fix: exclude .apm-pin from skill deploy copytree") silently rotted until I noticed and patched the wiring in that same PR (commit 941c6942).
  • No CI workflow runs the full integration script. .github/workflows/ci-integration.yml only invokes test_runtime_smoke.py; the other ~19 files run only on manual release-validation execution. Most "integration coverage" runs on the honor system.

Real constraints (what's legitimate)

The script does several things that genuinely need orchestration:

  • Builds a PyInstaller binary (or reuses CI artifact at ./dist/$BINARY_NAME/apm)
  • Symlinks the binary into $PATH so tests exercise the shipped binary, not in-process python -m apm_cli
  • Sets up runtimes (copilot/codex/llm) by invoking apm runtime setup
  • Resolves GitHub tokens via github-token-helper.sh

These are real concerns. Per-file pytest enumeration is not.

What's wrong (by world-class Python testing standards)

  1. Manual wiring is the bug we just hit. Pytest's whole value-prop is auto-discovery (pytest <dir> finds and runs everything matching test_*.py). The bash script defeats that.
  2. Bash is the wrong layer for test selection, reporting, and exit codes. The 20 if pytest ... ; then log_success ; else log_error ; exit 1 ; fi blocks reimplement what pytest does natively, and worse: a single test file failure aborts the whole run via set -e, masking subsequent failures.
  3. No marker-based gating. Different files need different env (ADO_APM_PAT, GITHUB_APM_PAT, APM_RUN_INTEGRATION_TESTS=1, runtime presence). Today this is encoded as conditional bash branches; it should be @pytest.mark.requires_ado_pat etc., selectable via pytest -m.
  4. The "built binary" constraint doesn't justify bash sprawl. It's a session-scoped fixture, not a loop.

Proposed pattern

```python
# tests/integration/conftest.py
import pytest

@pytest.fixture(scope="session")
def apm_binary():
    # build once OR read APM_BINARY_PATH from env (CI), prepend to PATH
    ...
```

```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
    "requires_apm_binary",
    "requires_github_apm_pat",
    "requires_ado_pat",
    "requires_runtime_codex",
    "requires_runtime_copilot",
    "requires_runtime_llm",
]
```

```python
# Per test file (replaces today's pytestmark skipif chains):
pytestmark = [pytest.mark.requires_apm_binary, pytest.mark.requires_github_apm_pat]
```
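Registering markers only makes them selectable; nothing skips automatically. A small collection hook can translate markers into environment-driven skips. A minimal sketch, assuming the env-var names from this issue (a similar mapping could probe runtime presence for the requires_runtime_* markers):

```python
# tests/integration/conftest.py (sketch; marker-to-env mapping is an assumption)
import os
import pytest

# Each gating marker maps to the env var that must be set for the test to run.
MARKER_ENV = {
    "requires_github_apm_pat": "GITHUB_APM_PAT",
    "requires_ado_pat": "ADO_APM_PAT",
}

def pytest_collection_modifyitems(config, items):
    # Runs once after collection: attach a skip marker to any test whose
    # gating env var is missing, instead of branching in bash.
    for item in items:
        for marker, env_var in MARKER_ENV.items():
            if item.get_closest_marker(marker) and not os.environ.get(env_var):
                item.add_marker(pytest.mark.skip(reason=f"{env_var} not set"))
```

With this in place, `pytest tests/integration/` alone does the right thing locally: gated tests show up as skipped with a reason, rather than silently never running.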

scripts/test-integration.sh shrinks to ~50 lines: detect platform, build/locate binary, set up runtimes, export tokens, then one invocation:

```bash
args=(tests/integration/ -v --tb=short)
# Build the deselection via an array: quotes emitted from a $( ... )
# substitution are not re-parsed by bash, so an inline echo of '-m "not ..."'
# would pass literal quote characters to pytest.
[[ -n "${ADO_APM_PAT:-}" ]] || args+=(-m "not requires_ado_pat")
pytest "${args[@]}"
```

Pytest handles discovery, selection, reporting, exit codes. Bash handles env setup. Each layer does what it's good at.
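The apm_binary stub above elides the reuse-or-build decision. A minimal sketch of the CI-reuse path, assuming APM_BINARY_PATH as named in the stub (the prepend_to_path helper is made up for testability, and the local PyInstaller build fallback is omitted):

```python
# tests/integration/conftest.py (sketch; build fallback omitted)
import os
from pathlib import Path

import pytest

def prepend_to_path(bin_dir: str, path_env: str) -> str:
    """Return a PATH value with bin_dir first, so `apm` resolves to the
    shipped binary rather than an in-process python -m apm_cli."""
    return bin_dir + os.pathsep + path_env

@pytest.fixture(scope="session")
def apm_binary():
    # Reuse the CI artifact when APM_BINARY_PATH points at a real file;
    # a full version would fall back to a one-time PyInstaller build here.
    binary = os.environ.get("APM_BINARY_PATH")
    if not (binary and Path(binary).is_file()):
        pytest.skip("APM_BINARY_PATH not set (build fallback omitted in sketch)")
    old_path = os.environ["PATH"]
    os.environ["PATH"] = prepend_to_path(str(Path(binary).parent), old_path)
    yield Path(binary)
    os.environ["PATH"] = old_path  # restore PATH when the session ends
```

Because the fixture is session-scoped, the reuse-or-build decision happens exactly once per run, which is the "built binary constraint is a fixture, not a loop" point from above.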

Acceptance criteria

  • scripts/test-integration.sh no longer enumerates individual pytest test_*.py calls; it invokes pytest tests/integration/ with marker selection.
  • A new test file dropped into tests/integration/ with the right markers automatically runs without script edits.
  • All currently-wired integration tests still run (no regressions).
  • Conditional gating (ADO PAT, GitHub PAT, runtime presence) preserved via markers + skipif fixtures, not bash branches.
  • CI integration workflow (.github/workflows/ci-integration.yml) optionally extended to run the full marker-filtered suite (currently only runs smoke).

Out of scope

  • Migrating tests off the built binary to in-process invocation (that's a separate, larger discussion).
  • Adding new test coverage (this is a pure refactor).
  • Changing github-token-helper.sh or runtime-setup commands.
