[codex] Stabilize OsaurusCore test gate #1015
Merged
tpae merged 2 commits on May 3, 2026
Pull request overview
Stabilizes the OsaurusCore test gate by isolating process-wide shared state (loopback NIO servers, chat-history persistence, and agent snapshots) so Swift Testing concurrency doesn’t introduce flakes/hangs.
Changes:
- Serialize tests that boot real loopback NIO servers via a process-wide `HTTPServerTestLock` lease.
- Run chat-history/persistence-sensitive tests under isolated temporary roots via `ChatHistoryTestStorage`, including a deterministic storage key.
- Prevent HTTP streaming tests from writing chat history by sending `X-Persist: false`, and add a DEBUG-only `ChatSessionStore._resetForTesting()` seam to close/reopen the shared DB.
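As a rough illustration of the `X-Persist: false` opt-out, a test helper could attach the header when building a request. This is a minimal sketch: the function name `makeNonPersistingRequest` and the JSON content type are assumptions for illustration, and only the header name and value come from the PR summary.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Hypothetical test helper (not the PR's code): builds a POST request
/// that asks the server not to persist chat history.
func makeNonPersistingRequest(url: URL, body: Data) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.httpBody = body
    // Assumed content type for illustration.
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Header name and value as described in the PR summary.
    request.setValue("false", forHTTPHeaderField: "X-Persist")
    return request
}
```

Keeping the header in one helper means a streaming test cannot forget the opt-out when it adds a new request.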
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Packages/OsaurusCore/Tests/Networking/MCPHTTPHandlerTests.swift | Wraps test server lifecycle in HTTPServerTestLock lease to avoid concurrent loopback contention. |
| Packages/OsaurusCore/Tests/Networking/HTTPHandlerChatStreamingTests.swift | Adds X-Persist: false helper and wraps server lifecycle in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPBodySizeLimitTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPAuthGateTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/CORSHandlerTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Helpers/HTTPServerTestLock.swift | Introduces process-wide async lease/lock for serializing server-booting tests. |
| Packages/OsaurusCore/Tests/Helpers/ChatHistoryTestStorage.swift | Introduces isolated temp-root + deterministic key runner for chat-history sensitive tests. |
| Packages/OsaurusCore/Tests/Chat/ChatWindowStateAgentSyncTests.swift | Moves agent/persistence-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionStopTests.swift | Runs stop/save-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionResetForAgentTests.swift | Runs reset/save-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Models/Chat/ChatSessionStore.swift | Adds DEBUG-only _resetForTesting() to close/reset the shared chat-history DB between isolated roots. |
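The isolated-root idea behind `ChatHistoryTestStorage` can be sketched as a scoped runner that creates a unique temporary directory, hands it to the test body, and removes it afterwards. The name `withIsolatedRoot` and the directory prefix are hypothetical. The real helper reportedly also pins a deterministic storage key and works with the shared database reset seam, which this sketch leaves out.

```swift
import Foundation

/// Hypothetical runner (not the PR's helper): gives `body` a fresh
/// temporary root and deletes it when `body` returns or throws.
func withIsolatedRoot<T>(_ body: (URL) async throws -> T) async throws -> T {
    let root = FileManager.default.temporaryDirectory
        .appendingPathComponent("isolated-root-\(UUID().uuidString)", isDirectory: true)
    try FileManager.default.createDirectory(at: root, withIntermediateDirectories: true)
    // Cleanup runs on every exit path, so a failing test leaves nothing behind.
    defer { try? FileManager.default.removeItem(at: root) }
    return try await body(root)
}
```

Because each call gets its own UUID-suffixed directory, two tests running concurrently never see each other's files, which is the property the persistence-sensitive tests need.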
Comment on lines +53 to +65

    final class HTTPServerTestLease: @unchecked Sendable {
        private let lock: HTTPServerTestLock
        private let id: UUID

        fileprivate init(lock: HTTPServerTestLock, id: UUID) {
            self.lock = lock
            self.id = id
        }

        func release() async {
            await lock.release(id: id)
        }
    }
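For context, the lock behind a lease like this could be an actor that hands ownership directly to the next waiter. The following is a sketch under assumptions, not the PR's `HTTPServerTestLock`: the names `SketchServerLock` and `withServerLock` are hypothetical, and only the `release(id:)` shape mirrors the excerpt above.

```swift
import Foundation

/// Assumed shape of a process-wide async lock (illustrative only).
actor SketchServerLock {
    static let shared = SketchServerLock()

    private var holder: UUID?
    private var waiters: [CheckedContinuation<UUID, Never>] = []

    /// Returns immediately if the lock is free, otherwise suspends in FIFO order.
    func acquire() async -> UUID {
        guard holder != nil else {
            let id = UUID()
            holder = id
            return id
        }
        return await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    /// Releases the lock. Ownership passes straight to the next waiter, so a
    /// newly arriving caller cannot slip in between release and resume.
    func release(id: UUID) {
        guard holder == id else { return }  // ignore stale or repeated releases
        if waiters.isEmpty {
            holder = nil
        } else {
            let next = waiters.removeFirst()
            let nextID = UUID()
            holder = nextID
            next.resume(returning: nextID)
        }
    }
}

/// Hypothetical convenience wrapper: runs `body` while holding the lock.
func withServerLock<T>(_ body: () async throws -> T) async rethrows -> T {
    let id = await SketchServerLock.shared.acquire()
    do {
        let result = try await body()
        await SketchServerLock.shared.release(id: id)
        return result
    } catch {
        await SketchServerLock.shared.release(id: id)
        throw error
    }
}
```

Matching the release against the holder's id makes a double release harmless, which is one plausible reason for the lease to carry a `UUID`.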
Comment on lines +79 to +82

    static func _resetForTesting() {
        didOpen = false
        ChatHistoryDatabase.shared.close()
    }
Gate update is green again after broadening this stabilizer. Validated by the refreshed CI run:
This should also cover the two stale red check classes we found on older open PRs:
Local gate was rerun after the update:
tpae approved these changes on May 3, 2026
tpae left a comment:
Looks good to me! Thank you for your contribution
mimeding pushed a commit to mimeding/osaurus that referenced this pull request on May 3, 2026:

Business rationale: On-demand skills are meant to strengthen the Osaurus harness without making every chat heavier by default. This keeps high-fidelity file and workflow skills discoverable and loadable while protecting the user's trust in fast, lightweight startup prompts.

Coding rationale: Filter startup skill injection to activation-selected skills and let capabilities_load remain the only path for on-demand session loads. The tests now cover stale all-skill allowlists and loaded on-demand session persistence. The remaining touched-file style fixes are mechanical lint alignment after rebasing onto the stricter osaurus-ai#1015 gate.

Co-authored-by: Codex <codex@openai.com>
This was referenced May 3, 2026
Business rationale
The high-fidelity business-file slices need the project-wide `OsaurusCore` gate to be reliable before their PRs can be opened honestly. The full test suite could fail or hang because several tests share process-wide chat-history, agent-manager, and loopback HTTP server state, reuse CI DerivedData, or depend on brittle wall-clock timing while Swift Testing and GitHub Actions run under load.

This PR makes the gate deterministic so future feature PRs can trust the local and CI gate instead of chasing unrelated flakes.
Coding rationale
The fix keeps production behavior unchanged and narrows stabilization to test/CI isolation:
- HTTP streaming tests use the `X-Persist: false` request knob so test traffic does not write chat history;
- `ChatSessionStore` gets a DEBUG-only reset seam to close the shared database between isolated test roots;
- `test-core` cold-builds DerivedData so stale Swift/C module artifacts cannot surface as EventSource dependency-resolution failures;
- `ToolRegistryTimeoutTests` keeps the timeout-envelope assertion but uses a fixture-relative no-drain budget instead of a brittle fixed `<4s` wall-clock check.

What changed
- Added `HTTPServerTestLock` for tests that boot real local HTTP servers.
- Added `ChatHistoryTestStorage` for tests that touch chat persistence or `AgentManager.shared` snapshots.
- Added the DEBUG-only `ChatSessionStore._resetForTesting()` helper.
- Hardened `ToolRegistryTimeoutTests.slowToolReturnsTimeoutEnvelopeBeforeBudgetExpires()` against loaded CI scheduler latency.

Validation
- `git fetch origin && git rebase origin/main`
- `swift build --package-path Packages/OsaurusCore`
- `swift build -c release --package-path Packages/OsaurusCore`
- `swift test --package-path Packages/OsaurusCore`
- `swift test --package-path Packages/OsaurusCore --filter ToolRegistryTimeoutTests`
- `xcrun swift-format lint --strict` on every touched Swift file
- `swiftlint lint --strict` on every touched Swift file accepted by the repo SwiftLint configuration
- `git diff --check`

Non-scope
Residual risks
- New tests that boot real local HTTP servers need to use `HTTPServerTestLock` or they can reintroduce process-wide port/server contention.
- Tests that touch `OsaurusPaths.overrideRoot`, `StorageKeyManager`, chat history, or synthetic agents should use the existing storage/sandbox locks or the new `ChatHistoryTestStorage` helper.
- `test-core` cold-builds DerivedData, so first-attempt CI may be slower but should be more trustworthy.
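One way to read the "fixture-relative" budget from the coding rationale: instead of asserting that a timed-out call returns within a fixed `<4s`, the test asserts that it returns before the slow fixture itself would have finished. The sketch below illustrates that reading only. `runWithTimeout` and `measureSlowFixture` are hypothetical names and do not reflect the real `ToolRegistry` timeout path or the exact assertion in `ToolRegistryTimeoutTests`.

```swift
import Foundation

/// Races `operation` against a deadline and returns nil if the deadline wins.
/// Illustrative only; not the registry's real timeout mechanism.
func runWithTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async -> T
) async -> T? {
    await withTaskGroup(of: T?.self) { group in
        group.addTask { await operation() }
        group.addTask {
            try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            return nil
        }
        let first = await group.next() ?? nil
        // Cancel the loser; a sleeping fixture observes cancellation and exits.
        group.cancelAll()
        return first
    }
}

/// Hypothetical fixture run: a "slow tool" that sleeps for `fixtureSeconds`
/// while the caller only allows `timeoutSeconds`.
func measureSlowFixture(
    fixtureSeconds: Double,
    timeoutSeconds: Double
) async -> (result: String?, elapsed: TimeInterval) {
    let start = Date()
    let result = await runWithTimeout(seconds: timeoutSeconds) { () -> String in
        try? await Task.sleep(nanoseconds: UInt64(fixtureSeconds * 1_000_000_000))
        return "finished"
    }
    return (result, Date().timeIntervalSince(start))
}
```

A test built on this would assert that `result` is `nil` and that `elapsed` is below `fixtureSeconds`. The bound then scales with the fixture rather than with an absolute number that a loaded CI scheduler can exceed.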