| 1 | +# /test --target aiat-pmo-module — Value-Proof Report |
| 2 | + |
| 3 | +> **Issue:** [#385](https://gitlab.gotzendorfer.at/infrastructure/session-orchestrator/-/issues/385) — end2end-proof: /test --target aiat-pmo-module (web-gate) |
| 4 | +> **Session:** main-2026-05-14-deep-3 W1 (coord-direct) |
| 5 | +> **Status:** Mechanism + live-execution proven. Coverage-proof partial — rubric-v1 artifact gap surfaced for follow-up. |
| 6 | +
|
| 7 | +## Run Metadata |
| 8 | + |
| 9 | +| Field | Value | |
| 10 | +|---|---| |
| 11 | +| Target | `/Users/bernhardg./Projects/intern/aiat-pmo-module/tests/e2e` | |
| 12 | +| Profile | `web-gate` (`.orchestrator/policy/test-profiles.json`) | |
| 13 | +| Driver | `scripts/lib/playwright-driver/runner.mjs` (Playwright 1.x via global npx) | |
| 14 | +| Run-ID | `aiat-pmo-2026-05-14-170021-v3` | |
| 15 | +| Run-Dir | `.orchestrator/metrics/test-runs/aiat-pmo-2026-05-14-170021-v3/` | |
| 16 | +| Started | 2026-05-14T15:00:22.316Z | |
| 17 | +| Duration | 547 ms (test execution); ~30 s wall-clock incl. spawn + reporter | |
| 18 | +| Exit code | 1 (Playwright: ≥1 unexpected failure) — runner.mjs maps to exit 1 per spec | |
| 19 | +| Orchestrator session | `main-2026-05-14-deep-3` | |
| 20 | +| Plugin version | v3.5.0 | |
| 21 | + |
| 22 | +## Stack Setup |
| 23 | + |
| 24 | +Pre-existing (3 h uptime at session start): |
| 25 | + |
| 26 | +```bash |
| 27 | +docker compose -f ~/Projects/intern/aiat-pmo-module/dev/docker-compose.yml ps |
| 28 | +# aiat-pmo-daemon, aiat-pmo-ws, aiat-pmo-espo, aiat-pmo-db (healthy) |
| 29 | +``` |
| 30 | + |
| 31 | +Bootstrap (coord-direct W1, ~30 s): |
| 32 | + |
| 33 | +```bash |
| 34 | +cd ~/Projects/intern/aiat-pmo-module/tests/e2e && npm install # 4 packages |
| 35 | +npx playwright install chromium # 92.4 MiB → ~/Library/Caches/ms-playwright/chromium_headless_shell-1223 |
| 36 | +``` |
| 37 | + |
| 38 | +Health: `curl http://localhost:8090` → 200 OK (EspoCRM responding). |
| 39 | + |
| 40 | +Env: `tests/e2e/.env` not used; tests read `process.env.ESPOCRM_URL` (defaults to `http://localhost:8090`). No `TEST_INITIATIVE_ID` env var set → most tests conditionally skipped. |
| 41 | + |
| 42 | +## Test Execution Summary |
| 43 | + |
| 44 | +| Metric | Value | |
| 45 | +|---|---| |
| 46 | +| Expected (passed) | 0 | |
| 47 | +| Unexpected (failed) | 1 | |
| 48 | +| Flaky | 0 | |
| 49 | +| Skipped | 31 | |
| 50 | +| Total declared | 32 | |
| 51 | + |
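The figures in the summary table can be re-derived from the reporter counters. The sketch below assumes the four counters come from the `stats` object of `results.json`. The `summarizeRun` helper and the exit-code rule are illustrative and are not taken from `runner.mjs`.

```typescript
// Subset of the `stats` object in Playwright's JSON report that is used here.
interface ReportStats {
  expected: number;
  unexpected: number;
  flaky: number;
  skipped: number;
}

interface RunSummary {
  total: number;
  passed: boolean;
  exitCode: 0 | 1;
}

// Derive the summary-table figures from the reporter counters.
// Assumption: any unexpected result maps to exit code 1, as observed in this run.
function summarizeRun(stats: ReportStats): RunSummary {
  const total = stats.expected + stats.unexpected + stats.flaky + stats.skipped;
  const passed = stats.unexpected === 0;
  return { total, passed, exitCode: passed ? 0 : 1 };
}

// Counters reported for run aiat-pmo-2026-05-14-170021-v3.
const summary = summarizeRun({ expected: 0, unexpected: 1, flaky: 0, skipped: 31 });
```

With this run's counters, `summary.total` is 32 and `summary.exitCode` is 1, matching the table and the recorded exit code.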
| 52 | +The single failure is `initiative-list.spec.ts:27 — GET /api/v1/Initiative returns 200 with total and list`. The test asserts `expect(response.status()).toBe(200)` but the server returns 401 because no API key / session token was provided in the test environment. The 31 skipped tests all carry conditional `test.skip(!ENV_VAR, '...')` guards; this one is missing that guard, so it executes and fails immediately. **This is a minor finding in `aiat-pmo-module` (missing skip-guard) — not a /test bug.** |
| 53 | + |
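A minimal sketch of the kind of guard the failing test lacks. `missingEnv` is a hypothetical helper and `ESPOCRM_API_KEY` is a placeholder name; this report does not state which variable the other spec files check.

```typescript
// Hypothetical helper: names of required variables that are unset or empty.
function missingEnv(
  required: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
): string[] {
  return required.filter((name) => !env[name]);
}

// In a spec file the result would feed Playwright's conditional skip, e.g.
//   const missing = missingEnv(["ESPOCRM_API_KEY"], process.env);
//   test.skip(missing.length > 0, `missing env: ${missing.join(", ")}`);
// "ESPOCRM_API_KEY" is a placeholder, not a name confirmed by this report.
const missing = missingEnv(["ESPOCRM_API_KEY"], { ESPOCRM_URL: "http://localhost:8090" });
```

In an environment like this run's, where only the base URL is available, `missing` contains the placeholder key, so a guarded test would be skipped instead of failing with a 401.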
| 54 | +Skip distribution (31 across 14 spec files): |
| 55 | + |
| 56 | +| File | Skipped | |
| 57 | +|---|---| |
| 58 | +| `api/restricted-role-403.spec.ts` | 6 | |
| 59 | +| `api/acl-team-isolation.spec.ts` | 3 | |
| 60 | +| `api/auth-token.spec.ts` | 3 | |
| 61 | +| `api/cluster-routing.spec.ts` | 3 | |
| 62 | +| `api/create-via-api-key.spec.ts` | 3 | |
| 63 | +| `api/stale-filter.spec.ts` | 3 | |
| 64 | +| `initiative-auth.spec.ts` | 2 | |
| 65 | +| `api/score-live.spec.ts` | 2 | |
| 66 | +| Remaining 6 spec files | 1 each | |
| 67 | + |
| 68 | +## Artifacts Captured |
| 69 | + |
| 70 | +``` |
| 71 | +.orchestrator/metrics/test-runs/aiat-pmo-2026-05-14-170021-v3/ |
| 72 | +├── console.log 17 279 B combined stdout+stderr from npx |
| 73 | +├── exit_code 1 B Playwright exit code (1) |
| 74 | +├── report/index.html Playwright HTML reporter output |
| 75 | +├── results.json 27 154 B Playwright JSON reporter |
| 76 | +└── test-results/ per-test artifacts |
| 77 | + ├── .last-run.json |
| 78 | + └── <test-name-chromium>/ 32 sub-dirs |
| 79 | + └── trace.zip Playwright trace (`--trace on`) |
| 80 | +``` |
| 81 | + |
| 82 | +## ux-evaluator Status — Coverage Gap |
| 83 | + |
| 84 | +ux-evaluator agent **not dispatched** this run. Rationale: rubric-v1 specifies 4 checks. The three that apply to web targets each require an artifact shape the current playwright-driver runner does not produce, and the fourth is macOS-only: |
| 85 | + |
| 86 | +| rubric-v1 check | Required artifact | Produced this run? | |
| 87 | +|---|---|---| |
| 88 | +| 1. onboarding-step-count ≤ 7 | AX-tree snapshots (`ax-snapshots/*.yaml` or similar) | ❌ no — peekaboo-style concept, not implemented for web in v1 | |
| 89 | +| 2. axe-violations critical/serious | `axe-*.json` from @axe-core/playwright | ❌ no — soft-skipped (axe-core not in tests/e2e deps) | |
| 90 | +| 3. console-errors visible-to-user | `console.ndjson` structured | ❌ no — only flat `console.log` (combined stdout) | |
| 91 | +| 4. Apple-Liquid-Glass conformance | macOS-only (peekaboo) | n/a — web target | |
| 92 | + |
| 93 | +The agent would have nothing actionable to classify. Two follow-up issues were filed to close this gap (see Findings & Follow-ups below). |
| 94 | + |
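The dispatch decision above could be expressed as a pre-flight check over the run directory listing. This is a sketch only: the file patterns are read off the rubric-v1 table, and the matching rules and the `uncoveredChecks` name are assumptions rather than the evaluator's actual logic.

```typescript
// Artifact patterns taken from the rubric-v1 table; the matching rules are an
// assumption, not the ux-evaluator's real implementation.
const WEB_RUBRIC_ARTIFACTS: ReadonlyArray<{ check: string; pattern: RegExp }> = [
  { check: "onboarding-step-count", pattern: /^ax-snapshots\/.+\.yaml$/ },
  { check: "axe-violations", pattern: /(^|\/)axe-[^/]*\.json$/ },
  { check: "console-errors", pattern: /(^|\/)console\.ndjson$/ },
];

// Return the web-applicable checks that have no matching file in the listing.
function uncoveredChecks(files: readonly string[]): string[] {
  return WEB_RUBRIC_ARTIFACTS
    .filter(({ pattern }) => !files.some((file) => pattern.test(file)))
    .map(({ check }) => check);
}

// Relative paths from the artifact tree of this run.
const gaps = uncoveredChecks([
  "console.log",
  "exit_code",
  "report/index.html",
  "results.json",
  "test-results/.last-run.json",
]);
```

For this run's artifacts all three web-applicable checks come back uncovered, which is the condition under which dispatching the agent would yield nothing to classify.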
| 95 | +## Findings & Follow-ups (filed this session) |
| 96 | + |
| 97 | +| # | Severity | Description | Disposition | |
| 98 | +|---|---|---|---| |
| 99 | +| RUNNER-1 | MED | `runner.mjs:174-180` used Jest/Vitest `--reporter html:<path>,json:<path>` syntax; Playwright canonical is `--reporter=html,json` + `PLAYWRIGHT_HTML_OUTPUT_DIR` / `PLAYWRIGHT_JSON_OUTPUT_FILE` / `PLAYWRIGHT_HTML_OPEN` env vars. | **Fixed inline this session** (coord-direct W1 hotfix; deviation logged in STATE.md). Filed retro issue for the regression-test gap (mechanism-proof dry-run didn't catch this — only live spawn does). | |
| 100 | +| RUNNER-2 | MED | `runner.mjs` does not write rubric-v1 expected artifacts: no `ax-snapshots/`, no `console.ndjson`, no `screenshots/` namespace. Only Playwright-native artifacts. Skips axe-core unconditionally if `@axe-core/playwright` isn't in target's package.json. | **New issue filed** — V2 capture-extension to bridge runner.mjs ↔ rubric-v1. Until then, /test on web targets is mechanism-proven but coverage-proof partial. | |
| 101 | +| TARGET-RESOLUTION | LOW | Runner uses `--target <repo-root>` but tests/e2e is a nested package (own playwright.config.ts + node_modules). First retry failed with "two different versions of @playwright/test" because npx fell back to global. Resolved by passing `--target tests/e2e` directly. Profile registry should grow a `tests-dir` field or runner.mjs should walk for the closest `playwright.config.*`. | **Documented here**; deferred to a future profile-schema enhancement. | |
| 102 | +| AIAT-PMO-INIT-LIST | LOW | `aiat-pmo-module tests/e2e/tests/initiative-list.spec.ts:27` lacks the `test.skip(!AUTH_ENV, …)` guard the other 13 spec files use; fails 401 in any env without auth. | **Cross-repo finding** — not filed here. Will surface to aiat-pmo-module backlog. | |
| 103 | + |
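One way the TARGET-RESOLUTION idea of locating the nearest `playwright.config.*` might look, written as a pure function over a repo file listing so it stays independent of the filesystem. `resolveTestsDir` is a hypothetical name, and "closest" is interpreted here as the shallowest config below the target root, which is an assumption.

```typescript
// Matches playwright.config.ts, .js, .mjs, .cjs, .mts and .cts at any depth.
const CONFIG_FILE = /(^|\/)playwright\.config\.[cm]?[jt]s$/;

// Given repo-relative file paths, return the directory of the shallowest
// Playwright config ("." for the repo root), or null if there is none.
function resolveTestsDir(files: readonly string[]): string | null {
  const dirs = files
    .filter((file) => CONFIG_FILE.test(file))
    .map((file) => (file.includes("/") ? file.slice(0, file.lastIndexOf("/")) : "."));
  if (dirs.length === 0) return null;
  const depth = (dir: string): number => (dir === "." ? 0 : dir.split("/").length);
  // Keep the first directory found at the smallest depth.
  return dirs.reduce((best, dir) => (depth(dir) < depth(best) ? dir : best));
}

// Illustrative listing; only the tests/e2e paths are named in this report.
const testsDir = resolveTestsDir([
  "package.json",
  "dev/docker-compose.yml",
  "tests/e2e/playwright.config.ts",
  "tests/e2e/tests/initiative-list.spec.ts",
]);
```

For a layout like the one described, the function yields `tests/e2e`, the value that had to be passed to `--target` by hand in this session.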
| 104 | +## Re-Run Dedupe Verification |
| 105 | + |
| 106 | +Not exercised this session. The first live run (`aiat-pmo-2026-05-14-165941-retry`, target=repo-root) errored at the spawn level before reporter output. The second run (`aiat-pmo-2026-05-14-170021-v3`, target=tests/e2e) is the first artifact-producing run. A re-run dedupe pass requires reconcile triage, which is gated on the ux-evaluator artifact-shape fix (RUNNER-2 above). |
| 107 | + |
| 108 | +## Conclusion |
| 109 | + |
| 110 | +The /test command's end-to-end pipeline is **mechanically proven** against a real live target: bootstrap → driver spawn → Playwright execution → HTML/JSON reporter → exit-code mapping all work as specified. The reporter-syntax bug (RUNNER-1) blocked the value-proof at first attempt; it was fixed inline using canonical Playwright documentation (https://playwright.dev/docs/test-reporters) sourced via ref-mcp + WebFetch, then re-verified in the same session. |
| 111 | + |
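The canonical reporter form from RUNNER-1 can be sketched as a small builder for the CLI arguments and environment. The flag and the three variable names are the ones listed in the findings table; the `buildInvocation` helper, its return shape, and the `never` value for `PLAYWRIGHT_HTML_OPEN` are assumptions, not the code in `runner.mjs`.

```typescript
interface Invocation {
  args: string[];
  env: Record<string, string>;
}

// Build the Playwright CLI arguments and reporter environment for one run.
// Output locations follow the artifact tree in this report; the helper itself
// is illustrative.
function buildInvocation(runDir: string): Invocation {
  const base = runDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    args: ["playwright", "test", "--reporter=html,json", "--trace", "on"],
    env: {
      PLAYWRIGHT_HTML_OUTPUT_DIR: `${base}/report`,
      PLAYWRIGHT_JSON_OUTPUT_FILE: `${base}/results.json`,
      PLAYWRIGHT_HTML_OPEN: "never", // assumed: keep the report from opening in a browser
    },
  };
}

const invocation = buildInvocation(
  ".orchestrator/metrics/test-runs/aiat-pmo-2026-05-14-170021-v3/",
);
```

Keeping the output paths in the environment rather than in the `--reporter` value is what separates this form from the `html:<path>,json:<path>` syntax that failed on the first live spawn.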
| 112 | +The **coverage-proof is partial**: the runner's artifact shape does not yet match rubric-v1's expectations (RUNNER-2), so the ux-evaluator agent cannot perform its 4-check classification. This is a V2-substrate gap, not a mechanism failure. /test on web targets is usable today for "did Playwright tests pass" answers; value-proof for the agentic UX-rubric flow needs RUNNER-2. |
| 113 | + |
| 114 | +The one real finding in the target repo (a missing skip-guard) demonstrates that the pipeline produces actionable, repo-relevant signal even in this stub state. |
| 115 | + |
| 116 | +**Recommendation:** Close #385 with status "mechanism + minimal-coverage proof PARTIAL". Track RUNNER-2 as the gating issue for full rubric-v1 coverage on the next /test --target aiat-pmo-module pass. |