[codex] Stabilize OsaurusCore test gate by mimeding · Pull Request #1015 · osaurus-ai/osaurus

mimeding · 2026-05-03T16:41:31Z

Business rationale

The high-fidelity business-file slices need a reliable project-wide OsaurusCore gate before their PRs can be opened honestly. The current full test suite could fail or hang because several tests share process-wide state (chat history, the agent manager, loopback HTTP servers), CI restores stale DerivedData, and some assertions rely on brittle wall-clock timing while Swift Testing and GitHub Actions run under load.

This PR makes the gate deterministic so that future feature PRs can trust the local and CI results instead of chasing unrelated flakes.

Coding rationale

The fix keeps production behavior unchanged and narrows stabilization to test/CI isolation:

loopback HTTP server tests acquire a process-wide async lease while their NIO server is alive;
chat-history tests use an isolated temporary Osaurus root plus deterministic test storage key;
HTTP streaming tests use the existing X-Persist: false request knob so test traffic does not write chat history;
ChatSessionStore gets a DEBUG-only reset seam to close the shared database between isolated test roots;
PR test-core runs cold-build DerivedData so stale Swift/C module artifacts cannot surface as EventSource dependency-resolution failures;
ToolRegistryTimeoutTests keeps the timeout-envelope assertion but uses a fixture-relative no-drain budget instead of a brittle fixed <4s wall-clock check.
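
One way to read the "fixture-relative no-drain budget" in the last item: instead of asserting that the call returns in under a fixed 4 seconds, the test asserts that it returns before the slow fixture itself could have finished, which shows the timeout path did not wait for the tool to drain. The sketch below illustrates that idea only. `Outcome`, `runWithTimeout`, `timeoutBeatsFixture`, and both delays are hypothetical and are not taken from ToolRegistryTimeoutTests.

```swift
import Foundation

enum Outcome: Equatable { case finished, timedOut }

/// Hypothetical stand-in for a tool call with a timeout:
/// races `operation` against a timer and reports which one ended first.
func runWithTimeout(
    seconds: Double,
    _ operation: @escaping @Sendable () async -> Void
) async -> Outcome {
    await withTaskGroup(of: Outcome.self) { group in
        group.addTask {
            await operation()
            return .finished
        }
        group.addTask {
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return .timedOut
        }
        // Take whichever child completes first, then cancel the other one.
        let first = await group.next() ?? .timedOut
        group.cancelAll()
        return first
    }
}

/// The budget is the fixture's own delay, not a fixed wall-clock cutoff.
func timeoutBeatsFixture() async -> Bool {
    let fixtureDelay = 2.0  // assumed duration of the slow fixture
    let timeout = 0.1       // assumed timeout under test
    let start = Date()
    let outcome = await runWithTimeout(seconds: timeout) {
        try? await Task.sleep(nanoseconds: UInt64(fixtureDelay * 1_000_000_000))
    }
    let elapsed = Date().timeIntervalSince(start)
    return outcome == .timedOut && elapsed < fixtureDelay
}
```

Because the bound scales with the fixture, scheduler latency on a loaded runner has to approach the fixture's whole delay before the assertion fails, rather than some unrelated constant.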

What changed

Added HTTPServerTestLock for tests that boot real local HTTP servers.
Added ChatHistoryTestStorage for tests that touch chat persistence or AgentManager.shared snapshots.
Wrapped CORS, auth, body-limit, streaming, and MCP HTTP test servers in the lease.
Wrapped chat session reset/stop and chat-window agent sync tests in isolated storage.
Added a DEBUG-only ChatSessionStore._resetForTesting() helper.
Updated CI to skip restoring DerivedData on pull-request runs while still preserving SPM package caching.
Hardened ToolRegistryTimeoutTests.slowToolReturnsTimeoutEnvelopeBeforeBudgetExpires() against loaded CI scheduler latency.
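
The isolated-storage idea behind ChatHistoryTestStorage can be sketched as a helper that gives each test a fresh temporary directory and removes it afterwards. This is a minimal sketch under assumptions, not the PR's helper: `withIsolatedRoot` is a hypothetical name, and the real helper also installs a deterministic storage key and presumably points the Osaurus root at the directory, which is omitted here because those APIs are not shown in this PR page.

```swift
import Foundation

/// Hypothetical helper: runs `body` against a unique temporary directory
/// and deletes the directory once `body` returns or throws.
func withIsolatedRoot<T>(_ body: (URL) async throws -> T) async throws -> T {
    let root = FileManager.default.temporaryDirectory
        .appendingPathComponent("osaurus-test-\(UUID().uuidString)", isDirectory: true)
    try FileManager.default.createDirectory(at: root, withIntermediateDirectories: true)
    // Cleanup runs on both the success and the error path.
    defer { try? FileManager.default.removeItem(at: root) }
    // A real helper would redirect the app's storage root to `root` here
    // and restore the previous value afterwards (assumption, not shown).
    return try await body(root)
}
```

Each call gets a directory with a unique name, so two tests using the helper never see each other's files even when Swift Testing runs them in parallel.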

Validation

git fetch origin && git rebase origin/main
swift build --package-path Packages/OsaurusCore
swift build -c release --package-path Packages/OsaurusCore
swift test --package-path Packages/OsaurusCore
swift test --package-path Packages/OsaurusCore --filter ToolRegistryTimeoutTests
xcrun swift-format lint --strict on every touched Swift file
swiftlint lint --strict on every touched Swift file accepted by the repo SwiftLint configuration
git diff --check

Non-scope

No production HTTP handler behavior changes.
No production chat persistence behavior changes.
No user data migration.
No skipped or disabled tests.
No Xcode version or package dependency changes.

Residual risks

Future tests that create loopback NIO servers should use HTTPServerTestLock or they can reintroduce process-wide port/server contention.
Future tests that mutate OsaurusPaths.overrideRoot, StorageKeyManager, chat history, or synthetic agents should use the existing storage/sandbox locks or the new ChatHistoryTestStorage helper.
PR test-core cold-builds DerivedData, so first-attempt CI may be slower but should be more trustworthy.

Copilot

Pull request overview

Stabilizes the OsaurusCore test gate by isolating process-wide shared state (loopback NIO servers, chat-history persistence, and agent snapshots) so Swift Testing concurrency doesn’t introduce flakes/hangs.

Changes:

Serialize tests that boot real loopback NIO servers via a process-wide HTTPServerTestLock lease.
Run chat-history/persistence-sensitive tests under isolated temporary roots via ChatHistoryTestStorage, including a deterministic storage key.
Prevent HTTP streaming tests from writing chat history by sending X-Persist: false, and add a DEBUG-only ChatSessionStore._resetForTesting() seam to close/reopen the shared DB.
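
For illustration, a test-side request helper that opts out of persistence might look like the sketch below. The header name and value come from this PR; `makeNonPersistingRequest`, the JSON content type, and the POST method are assumptions made for the example, not the helper added in HTTPHandlerChatStreamingTests.swift.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical helper: builds a POST request that asks the server
/// not to write chat history for this call.
func makeNonPersistingRequest(url: URL, body: Data) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.httpBody = body
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // The existing request knob named in this PR.
    request.setValue("false", forHTTPHeaderField: "X-Persist")
    return request
}
```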

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| Packages/OsaurusCore/Tests/Networking/MCPHTTPHandlerTests.swift | Wraps test server lifecycle in `HTTPServerTestLock` lease to avoid concurrent loopback contention. |
| Packages/OsaurusCore/Tests/Networking/HTTPHandlerChatStreamingTests.swift | Adds `X-Persist: false` helper and wraps server lifecycle in `HTTPServerTestLock` lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPBodySizeLimitTests.swift | Wraps loopback server in `HTTPServerTestLock` lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPAuthGateTests.swift | Wraps loopback server in `HTTPServerTestLock` lease. |
| Packages/OsaurusCore/Tests/Networking/CORSHandlerTests.swift | Wraps loopback server in `HTTPServerTestLock` lease. |
| Packages/OsaurusCore/Tests/Helpers/HTTPServerTestLock.swift | Introduces process-wide async lease/lock for serializing server-booting tests. |
| Packages/OsaurusCore/Tests/Helpers/ChatHistoryTestStorage.swift | Introduces isolated temp-root + deterministic key runner for chat-history sensitive tests. |
| Packages/OsaurusCore/Tests/Chat/ChatWindowStateAgentSyncTests.swift | Moves agent/persistence-sensitive tests under `ChatHistoryTestStorage`. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionStopTests.swift | Runs stop/save-sensitive tests under `ChatHistoryTestStorage`. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionResetForAgentTests.swift | Runs reset/save-sensitive tests under `ChatHistoryTestStorage`. |
| Packages/OsaurusCore/Models/Chat/ChatSessionStore.swift | Adds DEBUG-only `_resetForTesting()` to close/reset the shared chat-history DB between isolated roots. |


+final class HTTPServerTestLease: @unchecked Sendable {
+    private let lock: HTTPServerTestLock
+    private let id: UUID
+
+    fileprivate init(lock: HTTPServerTestLock, id: UUID) {
+        self.lock = lock
+        self.id = id
+    }
+
+    func release() async {
+        await lock.release(id: id)
+    }
+}
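
The hunk above shows only the lease side. A lock that could sit behind it might be sketched as an actor that hands ownership directly from one lease to the next waiter. Everything here (`SketchServerLock`, `SketchLease`, the FIFO waiter queue) is a hypothetical illustration and not the PR's HTTPServerTestLock.

```swift
import Foundation

/// Hypothetical lease type, mirroring the shape of the lease in the diff.
struct SketchLease: Sendable {
    let lock: SketchServerLock
    let id: UUID

    func release() async {
        await lock.release(id: id)
    }
}

/// Hypothetical process-wide async lock: at most one lease is live at a time.
actor SketchServerLock {
    static let shared = SketchServerLock()

    private var holder: UUID?
    private var waiters: [(id: UUID, continuation: CheckedContinuation<Void, Never>)] = []

    /// Suspends until the caller owns the lock, then returns its lease.
    func acquire() async -> SketchLease {
        let id = UUID()
        if holder == nil {
            holder = id
        } else {
            await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
                waiters.append((id: id, continuation: continuation))
            }
            // release(id:) already made this id the holder before resuming us.
        }
        return SketchLease(lock: self, id: id)
    }

    /// Releases the lock, passing ownership straight to the oldest waiter if any.
    func release(id: UUID) {
        guard holder == id else { return }  // ignore stale or double releases
        if waiters.isEmpty {
            holder = nil
        } else {
            let next = waiters.removeFirst()
            holder = next.id
            next.continuation.resume()
        }
    }
}
```

Handing ownership over inside `release(id:)`, rather than clearing the holder and letting waiters race for it, avoids a window in which a new `acquire()` could slip in between the release and the waiter resuming.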


+        static func _resetForTesting() {
+            didOpen = false
+            ChatHistoryDatabase.shared.close()
+        }
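
To show how a seam like this fits between isolated roots, here is a small self-contained sketch. `SketchDatabase` and `SketchChatStore` are hypothetical stand-ins for ChatHistoryDatabase and ChatSessionStore, whose real implementations are not shown on this page. The sketch assumes Swift 5 language mode; under Swift 6 strict concurrency the shared mutable state would need explicit isolation.

```swift
import Foundation

/// Hypothetical stand-in for a shared database handle.
final class SketchDatabase {
    static let shared = SketchDatabase()
    private(set) var isOpen = false
    func open() { isOpen = true }
    func close() { isOpen = false }
}

/// Hypothetical stand-in for a store that opens the database lazily, once.
enum SketchChatStore {
    private static var didOpen = false

    static func ensureOpen() {
        guard !didOpen else { return }
        SketchDatabase.shared.open()
        didOpen = true
    }

    #if DEBUG
    /// Test-only seam: forget the cached "open" flag and close the handle,
    /// so the next ensureOpen() reopens against whatever root is active then.
    static func _resetForTesting() {
        didOpen = false
        SketchDatabase.shared.close()
    }
    #endif
}
```

Wrapping the helper in `#if DEBUG` keeps it out of release builds, which is consistent with the PR's claim that production behavior is unchanged.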


mimeding · 2026-05-03T18:17:12Z

Gate update is green again after broadening this stabilizer.

Validated by the refreshed CI run:

test-core: passed in 12m58s on the new PR cold-DerivedData path
test-cli: passed
swiftlint: passed
shellcheck: passed
update_release_draft: passed

This should also cover the two stale red check classes we found on older open PRs:

#927 (feat(documents): wrap PlainText/PDF/DOCX as adapters and route DocumentParser through the registry): EventSource missing C module dependencies, caused by stale/restored DerivedData. PR runs now skip restoring DerivedData and wipe it before test-core.
#987 ([codex] Add OpenAI compatibility guardrail report): ToolRegistryTimeoutTests wall-clock failure. The timeout test now keeps the timeout-envelope assertion but uses a fixture-relative no-drain budget instead of the brittle fixed <4s cutoff.

Local gate was rerun after the update:

git fetch origin && git rebase origin/main
swift build --package-path Packages/OsaurusCore
swift build -c release --package-path Packages/OsaurusCore
swift test --package-path Packages/OsaurusCore
swift test --package-path Packages/OsaurusCore --filter ToolRegistryTimeoutTests
xcrun swift-format lint --strict on touched Swift files
swiftlint lint --strict on touched Swift files accepted by repo config
git diff --check

tpae

Looks good to me! Thank you for your contribution

Business rationale: On-demand skills are meant to strengthen the Osaurus harness without making every chat heavier by default. This keeps high-fidelity file and workflow skills discoverable and loadable while protecting the user's trust in fast, lightweight startup prompts.

Coding rationale: Filter startup skill injection to activation-selected skills and let capabilities_load remain the only path for on-demand session loads. The tests now cover stale all-skill allowlists and loaded on-demand session persistence. The remaining touched-file style fixes are mechanical lint alignment after rebasing onto the stricter osaurus-ai#1015 gate.

Co-authored-by: Codex <codex@openai.com>

Michael Meding added 2 commits May 3, 2026 13:40

Stabilize OsaurusCore test gate

5b5e556

Stabilize remaining CI gate flakes

649eada

tpae requested a review from Copilot May 3, 2026 18:01

Copilot started reviewing on behalf of tpae May 3, 2026 18:01

Copilot AI reviewed May 3, 2026

tpae approved these changes May 3, 2026

tpae merged commit fd2ffec into osaurus-ai:main May 3, 2026
5 checks passed

mimeding mentioned this pull request May 3, 2026

feat(documents): wrap PlainText/PDF/DOCX as adapters and route DocumentParser through the registry #927

Merged

tpae merged 2 commits into osaurus-ai:main from mimeding:codex/stabilize-http-handler-tests

3 participants
