Refactor test-integration.sh to fixture-driven pytest discovery (eliminate manual test wiring) #1166

@danielmeppiel

Description

Problem

scripts/test-integration.sh is a 665-line bash script that explicitly enumerates ~20 pytest tests/integration/test_*.py invocations, one per file. Adding a new integration test file requires editing the script. Forgetting to wire a file means the test silently never runs.

This is not a hypothetical risk -- it just bit us:

  • tests/integration/test_skill_install.py was never wired into the script. Two existing tests (test_reinstall_same_skill_is_idempotent, and the new test_reinstall_does_not_leak_apm_pin_to_deploy_targets from PR #1153, "fix: exclude .apm-pin from skill deploy copytree") silently rotted until I noticed and patched the wiring in that same PR (commit 941c6942).
  • No CI workflow runs the full integration script. .github/workflows/ci-integration.yml only invokes test_runtime_smoke.py; the other ~19 files run only on manual release-validation execution. Most "integration coverage" runs on the honor system.

Real constraints (what's legitimate)

The script does several things that genuinely need orchestration:

  • Builds a PyInstaller binary (or reuses CI artifact at ./dist/$BINARY_NAME/apm)
  • Symlinks the binary into $PATH so tests exercise the shipped binary, not in-process python -m apm_cli
  • Sets up runtimes (copilot/codex/llm) by invoking apm runtime setup
  • Resolves GitHub tokens via github-token-helper.sh

These are real concerns. Per-file pytest enumeration is not.

What's wrong (by world-class Python testing standards)

  1. Manual wiring is the bug we just hit. Pytest's whole value-prop is auto-discovery (pytest <dir> finds and runs everything matching test_*.py). The bash script defeats that.
  2. Bash is the wrong layer for test selection, reporting, and exit codes. The 20 if pytest ... ; then log_success ; else log_error ; exit 1 ; fi blocks reimplement what pytest does natively, and worse: a single test file failure aborts the whole run via set -e, masking subsequent failures.
  3. No marker-based gating. Different files need different env (ADO_APM_PAT, GITHUB_APM_PAT, APM_RUN_INTEGRATION_TESTS=1, runtime presence). Today this is encoded as conditional bash branches; it should be @pytest.mark.requires_ado_pat etc., selectable via pytest -m.
  4. The "built binary" constraint doesn't justify bash sprawl. It's a session-scoped fixture, not a loop.

Proposed pattern

```python
# tests/integration/conftest.py
import pytest

@pytest.fixture(scope="session")
def apm_binary():
    # build once OR read APM_BINARY_PATH from env (CI), prepend to PATH
    ...
```

```toml
# pyproject.toml
[tool.pytest.ini_options]
markers = [
    "requires_apm_binary",
    "requires_github_apm_pat",
    "requires_ado_pat",
    "requires_runtime_codex",
    "requires_runtime_copilot",
    "requires_runtime_llm",
]
```

```python
# Per test file (replaces today's pytestmark skipif chains):
pytestmark = [pytest.mark.requires_apm_binary, pytest.mark.requires_github_apm_pat]
```
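Registering markers only makes them selectable; nothing skips automatically. A small collection hook can translate markers into environment-driven skips. A minimal sketch, assuming the env-var names from this issue (a similar mapping could probe runtime presence for the requires_runtime_* markers):

```python
# tests/integration/conftest.py (sketch; marker-to-env mapping is an assumption)
import os
import pytest

# Each gating marker maps to the env var that must be set for the test to run.
MARKER_ENV = {
    "requires_github_apm_pat": "GITHUB_APM_PAT",
    "requires_ado_pat": "ADO_APM_PAT",
}

def pytest_collection_modifyitems(config, items):
    # Runs once after collection: attach a skip marker to any test whose
    # gating env var is missing, instead of branching in bash.
    for item in items:
        for marker, env_var in MARKER_ENV.items():
            if item.get_closest_marker(marker) and not os.environ.get(env_var):
                item.add_marker(pytest.mark.skip(reason=f"{env_var} not set"))
```

With this in place, `pytest tests/integration/` alone does the right thing locally: gated tests show up as skipped with a reason, rather than silently never running.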

scripts/test-integration.sh shrinks to ~50 lines: detect platform, build/locate binary, set up runtimes, export tokens, then one invocation:

```bash
args=(tests/integration/ -v --tb=short)
# Build the deselection via an array: quotes emitted from a $( ... )
# substitution are not re-parsed by bash, so an inline echo of '-m "not ..."'
# would pass literal quote characters to pytest.
[[ -n "${ADO_APM_PAT:-}" ]] || args+=(-m "not requires_ado_pat")
pytest "${args[@]}"
```

Pytest handles discovery, selection, reporting, exit codes. Bash handles env setup. Each layer does what it's good at.
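The apm_binary stub above elides the reuse-or-build decision. A minimal sketch of the CI-reuse path, assuming APM_BINARY_PATH as named in the stub (the prepend_to_path helper is made up for testability, and the local PyInstaller build fallback is omitted):

```python
# tests/integration/conftest.py (sketch; build fallback omitted)
import os
from pathlib import Path

import pytest

def prepend_to_path(bin_dir: str, path_env: str) -> str:
    """Return a PATH value with bin_dir first, so `apm` resolves to the
    shipped binary rather than an in-process python -m apm_cli."""
    return bin_dir + os.pathsep + path_env

@pytest.fixture(scope="session")
def apm_binary():
    # Reuse the CI artifact when APM_BINARY_PATH points at a real file;
    # a full version would fall back to a one-time PyInstaller build here.
    binary = os.environ.get("APM_BINARY_PATH")
    if not (binary and Path(binary).is_file()):
        pytest.skip("APM_BINARY_PATH not set (build fallback omitted in sketch)")
    old_path = os.environ["PATH"]
    os.environ["PATH"] = prepend_to_path(str(Path(binary).parent), old_path)
    yield Path(binary)
    os.environ["PATH"] = old_path  # restore PATH when the session ends
```

Because the fixture is session-scoped, the reuse-or-build decision happens exactly once per run, which is the "built binary constraint is a fixture, not a loop" point from above.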

Acceptance criteria

  • scripts/test-integration.sh no longer enumerates individual pytest test_*.py calls; it invokes pytest tests/integration/ with marker selection.
  • A new test file dropped into tests/integration/ with the right markers automatically runs without script edits.
  • All currently-wired integration tests still run (no regressions).
  • Conditional gating (ADO PAT, GitHub PAT, runtime presence) preserved via markers + skipif fixtures, not bash branches.
  • CI integration workflow (.github/workflows/ci-integration.yml) optionally extended to run the full marker-filtered suite (currently only runs smoke).

Out of scope

  • Migrating tests off the built binary to in-process invocation (that's a separate, larger discussion).
  • Adding new test coverage (this is a pure refactor).
  • Changing github-token-helper.sh or runtime-setup commands.
