
test: stabilize flaky retain and load batch tests #1053

Merged
nicoloboschi merged 1 commit into main from fix/flaky-retain-load-tests
Apr 14, 2026

@nicoloboschi
Collaborator

Summary

Fixes four tests that were consistently failing/flaking on CI.

test_retain.py::test_mentioned_at_vs_occurred and test_chunk_fact_mapping

Both tests retain content and then recall with fact_type=["world"]. The LLM was free to classify each retained fact as either experience or world, so recall returned 0 results whenever the classification landed on experience. Fix: pass fact_type_override="world" on retain so the classification is deterministic.
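A toy model makes the failure mode concrete. This is a hypothetical sketch, not the project's API: ToyMemory, its coin-flip "classifier", and empty_recall_rate are illustrative stand-ins; only the parameter names fact_type_override and fact_type mirror the PR. Without an override, a strict fact_type=["world"] filter comes back empty roughly half the time; pinning the type makes it never empty.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

FACT_TYPES = ("world", "experience")


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in for retain/recall (not the real API)."""
    rng: random.Random
    facts: list = field(default_factory=list)

    def retain(self, content: str, fact_type_override: Optional[str] = None) -> None:
        # Simulates a non-deterministic LLM classifier: without an override,
        # the fact type is a coin flip between "world" and "experience".
        fact_type = fact_type_override or self.rng.choice(FACT_TYPES)
        self.facts.append((content, fact_type))

    def recall(self, fact_type: list) -> list:
        # Recall filters strictly by fact type, like fact_type=["world"] in the tests.
        return [content for content, ft in self.facts if ft in fact_type]


def empty_recall_rate(override: Optional[str], trials: int = 1000, seed: int = 0) -> float:
    """Fraction of trials in which recall(fact_type=["world"]) returns nothing."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        memory = ToyMemory(rng)
        memory.retain("Alice moved to Paris in 2020.", fact_type_override=override)
        if not memory.recall(fact_type=["world"]):
            empty += 1
    return empty / trials


print("no override:", empty_recall_rate(None))         # about half the trials are empty
print("override='world':", empty_recall_rate("world"))  # never empty
```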

test_load_large_batch.py::test_db_connection_pool_under_load

This test was timing out at 60s (3 of 3 local runs). Root cause: enable_observations=True (the default) triggers inline consolidation via SyncTaskBackend after each retain. The mock LLMProvider.call in this test did not special-case scope="consolidation", so consolidation received a fact-extraction-shaped response and stalled. With 10 concurrent retains, these stalls compounded into timeouts. Fix: add a disable_observations fixture that sets config.enable_observations = False for the test's duration. After the fix, 3/3 runs pass in under 1s of call time.
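The shape mismatch behind the stall can be illustrated with two toy mocks. Everything here is hypothetical: the response payloads, naive_mock_call, scope_aware_mock_call, and consolidation_step are invented for illustration, since the PR does not show the real prompts or schemas. In this toy the mismatch surfaces as an immediate ValueError, whereas in the real test it showed up as a stall. Note that the PR does not change the mock; it sidesteps consolidation entirely with the fixture.

```python
import json
from typing import Optional

# Hypothetical response shapes; the real schemas are not shown in the PR.
FACT_EXTRACTION_RESPONSE = json.dumps(
    {"facts": [{"text": "Alice lives in Paris", "fact_type": "world"}]}
)
CONSOLIDATION_RESPONSE = json.dumps({"observations": []})


def naive_mock_call(prompt: str, scope: Optional[str] = None) -> str:
    # Mirrors the bug: every call gets a fact-extraction-shaped answer,
    # regardless of which pipeline stage (scope) is asking.
    return FACT_EXTRACTION_RESPONSE


def scope_aware_mock_call(prompt: str, scope: Optional[str] = None) -> str:
    # Special-cases scope="consolidation" so that stage gets the shape it expects.
    if scope == "consolidation":
        return CONSOLIDATION_RESPONSE
    return FACT_EXTRACTION_RESPONSE


def consolidation_step(call) -> list:
    """Hypothetical consumer that expects an 'observations' key in the reply."""
    payload = json.loads(call("consolidate these facts", scope="consolidation"))
    if "observations" not in payload:
        raise ValueError(f"unexpected consolidation response keys: {sorted(payload)}")
    return payload["observations"]


try:
    consolidation_step(naive_mock_call)
except ValueError as exc:
    print("naive mock:", exc)  # naive mock: unexpected consolidation response keys: ['facts']

print("scope-aware mock:", consolidation_step(scope_aware_mock_call))  # scope-aware mock: []
```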

test_load_large_batch.py::test_large_batch_500k_chars_20_items

Applied the same disable_observations fixture: this test was also spending time in inline consolidation, and now that the fixture exists in the file it is the cleaner pattern.
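The fixture pattern boils down to "save the flag, override it, always restore it". The sketch below expresses that with a stdlib context manager so it runs without pytest; the Config class and the observations_disabled helper are hypothetical stand-ins for the project's real config object and fixture. The comment shows how the same shape would typically be wrapped as a pytest yield fixture.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the project's config object."""
    enable_observations: bool = True


config = Config()


@contextmanager
def observations_disabled(cfg: Config):
    # Save the current value, turn observations off, and restore the saved
    # value afterwards, even if the body raises.
    previous = cfg.enable_observations
    cfg.enable_observations = False
    try:
        yield cfg
    finally:
        cfg.enable_observations = previous


# In pytest, the same shape would typically become a yield fixture, e.g.:
#
#   @pytest.fixture
#   def disable_observations():
#       with observations_disabled(config):
#           yield
#
# or use monkeypatch.setattr(config, "enable_observations", False),
# which pytest undoes automatically at teardown.

with observations_disabled(config):
    print("inside:", config.enable_observations)  # inside: False
print("after:", config.enable_observations)       # after: True
```

Restoring in a finally clause matters here: a load test that times out or fails must not leave observations disabled for whatever test the same worker runs next.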

test_load_large_batch.py xdist grouping

The file is also marked with pytest.mark.xdist_group("load_batch_tests") so that, under -n 8, the heavy load tests are scheduled onto a single worker and run one after another instead of simultaneously. One of the runs was crashing under worker contention; grouping serializes the load tests and eliminates the crash.
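In pytest-xdist, the xdist_group mark only affects scheduling when tests are distributed with --dist loadgroup. In that mode every test carrying the same group name is sent to the same worker, where they run sequentially, while the other workers keep running the rest of the suite. A file-wide mark is usually written as a module-level pytestmark = pytest.mark.xdist_group("load_batch_tests"). The sketch below is a simplified toy model of that grouping rule, not xdist's actual scheduler; assign_workers and the test_other.py entries are hypothetical.

```python
from collections import defaultdict


def assign_workers(tests: dict, n_workers: int) -> dict:
    """Toy model of grouped scheduling (not pytest-xdist's real scheduler).

    tests maps a test id to its group name, or None if ungrouped.
    All tests sharing a group are pinned to one worker; ungrouped tests
    and new groups are handed out round-robin.
    """
    assignment = defaultdict(list)
    group_worker = {}
    next_worker = 0
    for test_id, group in tests.items():
        if group is not None and group in group_worker:
            # Group already has a worker: keep the whole group together.
            worker = group_worker[group]
        else:
            worker = next_worker % n_workers
            next_worker += 1
            if group is not None:
                group_worker[group] = worker
        assignment[worker].append(test_id)
    return dict(assignment)


tests = {
    "test_load_large_batch.py::test_db_connection_pool_under_load": "load_batch_tests",
    "test_load_large_batch.py::test_large_batch_500k_chars_20_items": "load_batch_tests",
    "test_other.py::test_a": None,  # hypothetical ungrouped tests
    "test_other.py::test_b": None,
}

for worker, ids in sorted(assign_workers(tests, n_workers=8).items()):
    print(worker, ids)
```

Here both load tests land on worker 0 and run back to back, while the two ungrouped tests go to workers 1 and 2. This is why the grouping removes contention between the load tests themselves, but does not stop them from sharing the machine with other workers.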

Test plan

  • pytest tests/test_retain.py::test_mentioned_at_vs_occurred tests/test_retain.py::test_chunk_fact_mapping — 3/3 green
  • pytest tests/test_load_large_batch.py::TestLargeBatchRetain::test_db_connection_pool_under_load — 3/3 green (was 3/3 failing before)
  • pytest tests/test_load_large_batch.py — all 3 tests pass together in ~68s

- test_retain.py: pin fact_type_override="world" on retains that later
  filter recall by fact_type=["world"]; the LLM was classifying facts as
  "experience" non-deterministically, returning 0 recall results.
- test_load_large_batch.py: add disable_observations fixture so inline
  consolidation (SyncTaskBackend) doesn't run during load tests — the
  pool-under-load mock wasn't handling scope="consolidation" and was
  timing out under 10 concurrent retains.
- test_load_large_batch.py: mark the file with xdist_group so the heavy
  load tests don't contend for CPU/memory with other parallel workers.
nicoloboschi force-pushed the fix/flaky-retain-load-tests branch from 4c52f9d to 2ae8e74 on April 14, 2026 12:50
nicoloboschi merged commit 43dc50d into main on April 14, 2026
51 of 53 checks passed