test: dissolve test-suite quarantine - all three suites green (closes #2581) by rmusser01 · Pull Request #2583 · rmusser01/tldw_server

rmusser01 · 2026-07-03T15:25:20Z

Completes the #2581 burn-down — faster than expected: full unrestricted re-runs on current dev show all three formerly-quarantined suites green, so the entire quarantine mechanism is dissolved.

Re-run results (60s per-test timeout, `RUN_QUARANTINED=1` before removal)

Suite	2026-07-02 audit (main-based)	2026-07-03 re-run (dev)
Character_Chat_NEW	68 failed / 408 passed	476 passed / 4 skipped / 0 failed (1h14m)
TTS_NEW	325 failed / 309 passed	626 passed / 8 skipped / 0 failed (7m06s)
Embeddings	192 failed / 230 passed	421 passed / 18 skipped / 0 failed (2m37s) †

† after fixing the one genuine bug this PR contains: Embeddings' 46 directory-run failures were cross-test interference — any test running the app lifespan (with TestClient(app)) marks the shared module-level app as draining on exit, and every later request gets 503 {"status":"not_ready","reason":"shutdown_in_progress"} from DrainGateMiddleware. Every affected file passes solo. Fixed with an autouse _reset_app_lifecycle_state fixture in tests/Embeddings/conftest.py (uses the existing reset_lifecycle_state helper).

The Character_Chat_NEW and TTS_NEW failures recorded on 2026-07-02 were measured against a main-based checkout under back-to-back-suite load and no longer reproduce on dev.

Changes

Remove quarantine hooks from all three conftests
Remove the now-unused tests/_plugins/quarantine.py helper
Remove ci.yml's six now-inert RUN_QUARANTINED: "1" env lines (the shards run the same files either way)
Add the Embeddings lifecycle-reset fixture (the real fix)
Burn-down doc updated and retained as a historical record

Verification

Default (no env var) execution spot-checks: tests run, not skipped (15 passed)
Shard-coverage guard: OK; ci.yml YAML valid; skip-reason meta-test green
Watch item: Character_Chat_NEW/property/test_ambiguous_sender_heuristics.py showed one transient failure in a killed partial run but passed the full run (possible hypothesis flake)

Closes #2581.

🤖 Generated with Claude Code

coderabbitai · 2026-07-03T15:25:26Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9709a9ac-4529-40d7-83d6-e91247bdea4d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch burndown/character-chat-new

_{Comment @coderabbitai help to get the list of available commands.}

gemini-code-assist

Code Review

This pull request unquarantines the Character_Chat_NEW test suite by removing the pytest_collection_modifyitems quarantine hook in conftest.py and updating the status in the audits/2026-07-02-quarantined-suites.md documentation. There are no review comments, so I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

qodo-code-review · 2026-07-03T15:26:26Z

PR Summary by Qodo

Unquarantine Character_Chat_NEW test suite and update burn-down audit

🧪 Tests 📝 Documentation 🕐 10-20 Minutes

AI Description

• Remove the pytest quarantine hook so Character_Chat_NEW runs in default collection
• Record the 2026-07-03 re-run results and unquarantine status in the audit tracker
• Leave other audited suites (TTS_NEW, Embeddings) quarantined pending re-triage

Diagram

graph TD
  A["tests/Character_Chat_NEW"] --> B["pytest collection"] --> C["Quarantine plugin"]
  D["audits/2026-07-02-quarantined-suites.md"] --> E["Quarantine status"]
  C -. removed hook .-> B
  E --> A

High-Level Assessment

The following are alternative approaches to this PR:

1. Convert quarantine to xfail markers instead of collection filtering

➕ Keeps the suite visible in default runs while still tracking known flakes
➕ Produces explicit reporting for expected failures rather than hiding tests
➖ Requires maintaining per-test xfail annotations (more churn)
➖ May increase noise if failures fluctuate or root cause is unclear

2. Gate quarantine via a marker + default -m expression (pytest.ini/CI)

➕ Centralizes policy in config rather than per-suite conftest hooks
➕ Easier to reason about which suites are excluded in each environment
➖ Broader config change may affect other suites unintentionally
➖ Requires aligning local developer defaults and CI invocation flags

Recommendation: The PR’s approach (removing the suite-level collection hook) is the right call given the full re-run is green on current dev and avoids hidden default skips. If flakes reappear, prefer targeted xfail/flake tracking over reintroducing broad collection-time quarantine.

Files changed (2) +12 / -10

Tests (1) +0 / -5

conftest.pyRemove pytest collection-time quarantine hook for Character_Chat_NEW +0/-5

Remove pytest collection-time quarantine hook for Character_Chat_NEW
• Deletes the import and the pytest_collection_modifyitems hook that quarantined this suite at collection time. As a result, Character_Chat_NEW tests run in default pytest collection without requiring RUN_QUARANTINED opt-in.
tldw_Server_API/tests/Character_Chat_NEW/conftest.py

Documentation (1) +12 / -5

2026-07-02-quarantined-suites.mdAdd status column and mark Character_Chat_NEW unquarantined +12/-5

Add status column and mark Character_Chat_NEW unquarantined

• Extends the audited suites table with a Status column. Records that Character_Chat_NEW was unquarantined on 2026-07-03 with rerun results and notes a single watch-item file; leaves TTS_NEW and Embeddings quarantined pending re-triage.

audits/2026-07-02-quarantined-suites.md

qodo-code-review · 2026-07-03T15:28:29Z

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

…cleanup Code-review response for PR #2583: - Verified empirically the Embeddings lifecycle fixture is NOT redundant with the root-conftest reset: removing it reproduces the 46 failures exactly. Probe plugin revealed the mechanism: reload_app_main() (used by test_backpressure_and_quotas.py) permanently swaps sys.modules' main module, so the root fixture resets the NEW app while pinned tests still route through the drained ORIGINAL. Docstring now documents the proven chain; the underlying reload leak is tracked in issue #2585. - Removed six stale 'pre-quarantine' comment blocks from ci.yml (the hooks they referenced no longer exist). - Tightened the burn-down doc's exit-criteria phrasing to past tense. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

rmusser01 · 2026-07-03T16:11:07Z

Code review (internal) outcome — the reviewer challenged the fixture's necessity (a root-conftest reset already exists) and was right to: verification deepened the root cause.

Fixture proven load-bearing, not redundant: removing it reproduces the 46 failures exactly (46 failed without, 0 failed with, same directory run).

Why the root-conftest reset doesn't cover this (found with an app-identity probe — three distinct app objects observed in one run): test_backpressure_and_quotas.py calls reload_app_main(), which permanently swaps sys.modules['...app.main']. Test modules pinned the ORIGINAL app at collection; a later lifespan exit drains that original; the root fixture re-imports app.main at reset time so it only ever resets the NEW instance. This suite's fixture resets the pinned original directly.

The underlying reload_app_main() sys.modules leak is a repo-wide hazard (~15 files reload main) — tracked in #2585 with a suggested snapshot/restore fix.

Also addressed from the review: six stale 'pre-quarantine' comment blocks removed from ci.yml; burn-down doc exit-criteria phrasing tightened. The fixture docstring now documents the verified four-step mechanism.

🤖 Generated with Claude Code

…2581) Full unrestricted re-run on current dev: 476 passed, 4 skipped, 0 failed (1h14m, 60s per-test timeout). The 68 failures recorded 2026-07-02 were measured against a main-based checkout and did not reproduce; intervening Character_Chat work on dev fixed them. Quarantine hook removed; burn-down tracker updated (TTS_NEW and Embeddings remain quarantined pending their re-triage). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ism (closes #2581) Full unrestricted re-runs on current dev: - TTS_NEW: 626 passed / 8 skipped / 0 failed (7m06s) - the 325 recorded failures no longer reproduce - Embeddings: 421 passed / 18 skipped / 0 failed (2m37s) after fixing the one real bug: 46 directory-run failures were cross-test interference from stale app-lifecycle drain state (a lifespan exit marks the shared app draining; later tests got 503 shutdown_in_progress from DrainGateMiddleware). New autouse _reset_app_lifecycle_state fixture clears it per test; every affected file passes solo and in-directory. With all three suites green the quarantine mechanism is fully removed: hooks, the shared tests/_plugins/quarantine.py helper, and ci.yml's six now-inert RUN_QUARANTINED env lines. Burn-down doc retained as history. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…cleanup Code-review response for PR #2583: - Verified empirically the Embeddings lifecycle fixture is NOT redundant with the root-conftest reset: removing it reproduces the 46 failures exactly. Probe plugin revealed the mechanism: reload_app_main() (used by test_backpressure_and_quotas.py) permanently swaps sys.modules' main module, so the root fixture resets the NEW app while pinned tests still route through the drained ORIGINAL. Docstring now documents the proven chain; the underlying reload leak is tracked in issue #2585. - Removed six stale 'pre-quarantine' comment blocks from ci.yml (the hooks they referenced no longer exist). - Tightened the burn-down doc's exit-criteria phrasing to past tense. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jul 3, 2026

View reviewed changes

rmusser01 changed the title ~~test: unquarantine Character_Chat_NEW (issue #2581, 1/3)~~ test: dissolve test-suite quarantine - all three suites green (closes #2581) Jul 3, 2026

rmusser01 mentioned this pull request Jul 3, 2026

reload_app_main() permanently swaps sys.modules app — pinned-app tests inherit stale drain state #2585

Open

rmusser01 and others added 3 commits July 3, 2026 09:27

rmusser01 force-pushed the burndown/character-chat-new branch from 4f6754d to 6f23c56 Compare July 3, 2026 16:28

rmusser01 merged commit 51372f5 into dev Jul 3, 2026
16 of 18 checks passed

rmusser01 mentioned this pull request Jul 3, 2026

Burn down quarantined test suites (Character_Chat_NEW, TTS_NEW, Embeddings) #2581

Closed

rmusser01 deleted the burndown/character-chat-new branch July 3, 2026 16:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: dissolve test-suite quarantine - all three suites green (closes #2581)#2583

test: dissolve test-suite quarantine - all three suites green (closes #2581)#2583
rmusser01 merged 3 commits into
devfrom
burndown/character-chat-new

rmusser01 commented Jul 3, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

qodo-code-review Bot commented Jul 3, 2026

Uh oh!

qodo-code-review Bot commented Jul 3, 2026

Uh oh!

rmusser01 commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rmusser01 commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-run results (60s per-test timeout, RUN_QUARANTINED=1 before removal)

Changes

Verification

Uh oh!

coderabbitai Bot commented Jul 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

qodo-code-review Bot commented Jul 3, 2026

PR Summary by Qodo

Uh oh!

qodo-code-review Bot commented Jul 3, 2026

Code Review by Qodo

Great, no issues found!

Uh oh!

rmusser01 commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rmusser01 commented Jul 3, 2026 •

edited

Loading

Re-run results (60s per-test timeout, `RUN_QUARANTINED=1` before removal)

coderabbitai Bot commented Jul 3, 2026 •

edited

Loading