Skip to content

test: dissolve test-suite quarantine - all three suites green (closes #2581)#2583

Merged
rmusser01 merged 3 commits into
devfrom
burndown/character-chat-new
Jul 3, 2026
Merged

test: dissolve test-suite quarantine - all three suites green (closes #2581)#2583
rmusser01 merged 3 commits into
devfrom
burndown/character-chat-new

Conversation

@rmusser01

@rmusser01 rmusser01 commented Jul 3, 2026

Copy link
Copy Markdown
Owner

Completes the #2581 burn-down — faster than expected: full unrestricted re-runs on current dev show all three formerly-quarantined suites green, so the entire quarantine mechanism is dissolved.

Re-run results (60s per-test timeout, RUN_QUARANTINED=1 before removal)

Suite 2026-07-02 audit (main-based) 2026-07-03 re-run (dev)
Character_Chat_NEW 68 failed / 408 passed 476 passed / 4 skipped / 0 failed (1h14m)
TTS_NEW 325 failed / 309 passed 626 passed / 8 skipped / 0 failed (7m06s)
Embeddings 192 failed / 230 passed 421 passed / 18 skipped / 0 failed (2m37s) †

† after fixing the one genuine bug this PR contains: Embeddings' 46 directory-run failures were cross-test interference — any test running the app lifespan (with TestClient(app)) marks the shared module-level app as draining on exit, and every later request gets 503 {"status":"not_ready","reason":"shutdown_in_progress"} from DrainGateMiddleware. Every affected file passes solo. Fixed with an autouse _reset_app_lifecycle_state fixture in tests/Embeddings/conftest.py (uses the existing reset_lifecycle_state helper).

The Character_Chat_NEW and TTS_NEW failures recorded on 2026-07-02 were measured against a main-based checkout under back-to-back-suite load and no longer reproduce on dev.

Changes

  • Remove quarantine hooks from all three conftests
  • Remove the now-unused tests/_plugins/quarantine.py helper
  • Remove ci.yml's six now-inert RUN_QUARANTINED: "1" env lines (the shards run the same files either way)
  • Add the Embeddings lifecycle-reset fixture (the real fix)
  • Burn-down doc updated and retained as a historical record

Verification

  • Default (no env var) execution spot-checks: tests run, not skipped (15 passed)
  • Shard-coverage guard: OK; ci.yml YAML valid; skip-reason meta-test green
  • Watch item: Character_Chat_NEW/property/test_ambiguous_sender_heuristics.py showed one transient failure in a killed partial run but passed the full run (possible hypothesis flake)

Closes #2581.

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9709a9ac-4529-40d7-83d6-e91247bdea4d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch burndown/character-chat-new

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request unquarantines the Character_Chat_NEW test suite by removing the pytest_collection_modifyitems quarantine hook in conftest.py and updating the status in the audits/2026-07-02-quarantined-suites.md documentation. There are no review comments, so I have no additional feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@qodo-code-review

Copy link
Copy Markdown

PR Summary by Qodo

Unquarantine Character_Chat_NEW test suite and update burn-down audit

🧪 Tests 📝 Documentation 🕐 10-20 Minutes

Grey Divider

AI Description

• Remove the pytest quarantine hook so Character_Chat_NEW runs in default collection
• Record the 2026-07-03 re-run results and unquarantine status in the audit tracker
• Leave other audited suites (TTS_NEW, Embeddings) quarantined pending re-triage
Diagram

graph TD
  A["tests/Character_Chat_NEW"] --> B["pytest collection"] --> C["Quarantine plugin"]
  D["audits/2026-07-02-quarantined-suites.md"] --> E["Quarantine status"]
  C -. removed hook .-> B
  E --> A
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Convert quarantine to xfail markers instead of collection filtering
  • ➕ Keeps the suite visible in default runs while still tracking known flakes
  • ➕ Produces explicit reporting for expected failures rather than hiding tests
  • ➖ Requires maintaining per-test xfail annotations (more churn)
  • ➖ May increase noise if failures fluctuate or root cause is unclear
2. Gate quarantine via a marker + default -m expression (pytest.ini/CI)
  • ➕ Centralizes policy in config rather than per-suite conftest hooks
  • ➕ Easier to reason about which suites are excluded in each environment
  • ➖ Broader config change may affect other suites unintentionally
  • ➖ Requires aligning local developer defaults and CI invocation flags

Recommendation: The PR’s approach (removing the suite-level collection hook) is the right call given the full re-run is green on current dev and avoids hidden default skips. If flakes reappear, prefer targeted xfail/flake tracking over reintroducing broad collection-time quarantine.

Files changed (2) +12 / -10

Tests (1) +0 / -5
conftest.pyRemove pytest collection-time quarantine hook for Character_Chat_NEW +0/-5

Remove pytest collection-time quarantine hook for Character_Chat_NEW

• Deletes the import and the pytest_collection_modifyitems hook that quarantined this suite at collection time. As a result, Character_Chat_NEW tests run in default pytest collection without requiring RUN_QUARANTINED opt-in.

tldw_Server_API/tests/Character_Chat_NEW/conftest.py

Documentation (1) +12 / -5
2026-07-02-quarantined-suites.mdAdd status column and mark Character_Chat_NEW unquarantined +12/-5

Add status column and mark Character_Chat_NEW unquarantined

• Extends the audited suites table with a Status column. Records that Character_Chat_NEW was unquarantined on 2026-07-03 with rerun results and notes a single watch-item file; leaves TTS_NEW and Embeddings quarantined pending re-triage.

audits/2026-07-02-quarantined-suites.md

@qodo-code-review

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider

Great, no issues found!

Qodo reviewed your code and found no material issues that require review

Grey Divider

Qodo Logo

@rmusser01 rmusser01 changed the title test: unquarantine Character_Chat_NEW (issue #2581, 1/3) test: dissolve test-suite quarantine - all three suites green (closes #2581) Jul 3, 2026
rmusser01 added a commit that referenced this pull request Jul 3, 2026
…cleanup

Code-review response for PR #2583:

- Verified empirically the Embeddings lifecycle fixture is NOT redundant
  with the root-conftest reset: removing it reproduces the 46 failures
  exactly. Probe plugin revealed the mechanism: reload_app_main() (used by
  test_backpressure_and_quotas.py) permanently swaps sys.modules' main
  module, so the root fixture resets the NEW app while pinned tests still
  route through the drained ORIGINAL. Docstring now documents the proven
  chain; the underlying reload leak is tracked in issue #2585.
- Removed six stale 'pre-quarantine' comment blocks from ci.yml (the
  hooks they referenced no longer exist).
- Tightened the burn-down doc's exit-criteria phrasing to past tense.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@rmusser01

Copy link
Copy Markdown
Owner Author

Code review (internal) outcome — the reviewer challenged the fixture's necessity (a root-conftest reset already exists) and was right to: verification deepened the root cause.

Fixture proven load-bearing, not redundant: removing it reproduces the 46 failures exactly (46 failed without, 0 failed with, same directory run).

Why the root-conftest reset doesn't cover this (found with an app-identity probe — three distinct app objects observed in one run): test_backpressure_and_quotas.py calls reload_app_main(), which permanently swaps sys.modules['...app.main']. Test modules pinned the ORIGINAL app at collection; a later lifespan exit drains that original; the root fixture re-imports app.main at reset time so it only ever resets the NEW instance. This suite's fixture resets the pinned original directly.

The underlying reload_app_main() sys.modules leak is a repo-wide hazard (~15 files reload main) — tracked in #2585 with a suggested snapshot/restore fix.

Also addressed from the review: six stale 'pre-quarantine' comment blocks removed from ci.yml; burn-down doc exit-criteria phrasing tightened. The fixture docstring now documents the verified four-step mechanism.

🤖 Generated with Claude Code

rmusser01 and others added 3 commits July 3, 2026 09:27
…2581)

Full unrestricted re-run on current dev: 476 passed, 4 skipped, 0 failed
(1h14m, 60s per-test timeout). The 68 failures recorded 2026-07-02 were
measured against a main-based checkout and did not reproduce; intervening
Character_Chat work on dev fixed them. Quarantine hook removed; burn-down
tracker updated (TTS_NEW and Embeddings remain quarantined pending their
re-triage).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ism (closes #2581)

Full unrestricted re-runs on current dev:
- TTS_NEW: 626 passed / 8 skipped / 0 failed (7m06s) - the 325 recorded
  failures no longer reproduce
- Embeddings: 421 passed / 18 skipped / 0 failed (2m37s) after fixing the
  one real bug: 46 directory-run failures were cross-test interference
  from stale app-lifecycle drain state (a lifespan exit marks the shared
  app draining; later tests got 503 shutdown_in_progress from
  DrainGateMiddleware). New autouse _reset_app_lifecycle_state fixture
  clears it per test; every affected file passes solo and in-directory.

With all three suites green the quarantine mechanism is fully removed:
hooks, the shared tests/_plugins/quarantine.py helper, and ci.yml's six
now-inert RUN_QUARANTINED env lines. Burn-down doc retained as history.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…cleanup

Code-review response for PR #2583:

- Verified empirically the Embeddings lifecycle fixture is NOT redundant
  with the root-conftest reset: removing it reproduces the 46 failures
  exactly. Probe plugin revealed the mechanism: reload_app_main() (used by
  test_backpressure_and_quotas.py) permanently swaps sys.modules' main
  module, so the root fixture resets the NEW app while pinned tests still
  route through the drained ORIGINAL. Docstring now documents the proven
  chain; the underlying reload leak is tracked in issue #2585.
- Removed six stale 'pre-quarantine' comment blocks from ci.yml (the
  hooks they referenced no longer exist).
- Tightened the burn-down doc's exit-criteria phrasing to past tense.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@rmusser01 rmusser01 force-pushed the burndown/character-chat-new branch from 4f6754d to 6f23c56 Compare July 3, 2026 16:28
@rmusser01 rmusser01 merged commit 51372f5 into dev Jul 3, 2026
16 of 18 checks passed
@rmusser01 rmusser01 deleted the burndown/character-chat-new branch July 3, 2026 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant