[codex] Stabilize OsaurusCore test gate #1015

Merged: tpae merged 2 commits into osaurus-ai:main from mimeding:codex/stabilize-http-handler-tests on May 3, 2026

mimeding commented May 3, 2026

Business rationale

The high-fidelity business-file slices need the project-wide OsaurusCore gate to be reliable before their PRs can be opened honestly. The full test suite can currently fail or hang because several tests share process-wide state (chat history, the agent manager, loopback HTTP servers), reuse stale CI DerivedData, or depend on brittle wall-clock timing while Swift Testing and GitHub Actions run under load.

This PR makes the gate deterministic so future feature PRs can trust the local and CI results instead of chasing unrelated flakes.

Coding rationale

The fix keeps production behavior unchanged and narrows stabilization to test/CI isolation:

  • loopback HTTP server tests acquire a process-wide async lease while their NIO server is alive;
  • chat-history tests use an isolated temporary Osaurus root plus deterministic test storage key;
  • HTTP streaming tests use the existing X-Persist: false request knob so test traffic does not write chat history;
  • ChatSessionStore gets a DEBUG-only reset seam to close the shared database between isolated test roots;
  • the pull-request test-core job builds from cold DerivedData so stale Swift/C module artifacts cannot surface as EventSource dependency-resolution failures;
  • ToolRegistryTimeoutTests keeps the timeout-envelope assertion but uses a fixture-relative no-drain budget instead of a brittle fixed <4s wall-clock check.
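
As a rough illustration of the last bullet, the sketch below shows one way a fixture-relative budget could be asserted. It is not the code from ToolRegistryTimeoutTests: `ToolOutcome`, `runWithTimeout`, and the two fixture durations are hypothetical names and values, and the sketch assumes "fixture-relative" means the timeout result must arrive before the slow fixture tool would have finished on its own.

```swift
import Foundation

// Hypothetical outcome type; the real timeout envelope in OsaurusCore is not shown in this PR text.
enum ToolOutcome: Equatable {
    case finished(String)
    case timedOut
}

// Races a simulated slow tool against a timeout and returns whichever completes first.
func runWithTimeout(toolDuration: Duration, timeout: Duration) async -> ToolOutcome {
    await withTaskGroup(of: ToolOutcome.self) { group in
        group.addTask {
            try? await Task.sleep(for: toolDuration)
            return .finished("done")
        }
        group.addTask {
            try? await Task.sleep(for: timeout)
            return .timedOut
        }
        let first = await group.next() ?? .timedOut
        group.cancelAll()  // cancel the loser so the group does not wait for it to drain
        return first
    }
}

// Assumed fixture values, chosen only for illustration.
let fixtureToolDuration: Duration = .seconds(5)
let fixtureTimeout: Duration = .milliseconds(100)

let clock = ContinuousClock()
let start = clock.now
let outcome = await runWithTimeout(toolDuration: fixtureToolDuration, timeout: fixtureTimeout)
let elapsed = clock.now - start

// The budget comes from the fixture itself: the timeout result must arrive
// before the slow tool would have completed, whatever the scheduler latency.
precondition(outcome == .timedOut)
precondition(elapsed < fixtureToolDuration)
```

Comparing against the fixture's own duration leaves far more headroom on a loaded CI runner than a fixed threshold, while still failing if the caller waits for the slow tool to finish.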

What changed

  • Added HTTPServerTestLock for tests that boot real local HTTP servers.
  • Added ChatHistoryTestStorage for tests that touch chat persistence or AgentManager.shared snapshots.
  • Wrapped CORS, auth, body-limit, streaming, and MCP HTTP test servers in the lease.
  • Wrapped chat session reset/stop and chat-window agent sync tests in isolated storage.
  • Added a DEBUG-only ChatSessionStore._resetForTesting() helper.
  • Updated CI to skip restoring DerivedData on pull-request runs while still preserving SPM package caching.
  • Hardened ToolRegistryTimeoutTests.slowToolReturnsTimeoutEnvelopeBeforeBudgetExpires() against loaded CI scheduler latency.
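
For readers unfamiliar with the lease pattern behind HTTPServerTestLock, here is a minimal sketch of a process-wide async lock. Only the type name and the `release(id:)` call appear in this PR's review excerpt; the actor-based design, the `acquire()` method, and the waiter queue are assumptions made for illustration, and task cancellation is deliberately not handled.

```swift
import Foundation

// Illustrative sketch only; the real HTTPServerTestLock may be implemented differently.
actor HTTPServerTestLock {
    static let shared = HTTPServerTestLock()

    private var holder: UUID?
    private var waiters: [(id: UUID, continuation: CheckedContinuation<Void, Never>)] = []

    // Suspends until the caller owns the lock, then returns the lease id.
    func acquire() async -> UUID {
        let id = UUID()
        if holder == nil {
            holder = id
        } else {
            await withCheckedContinuation { continuation in
                waiters.append((id: id, continuation: continuation))
            }
            // release(id:) made this id the holder before resuming us,
            // so no other caller can slip in between.
        }
        return id
    }

    // Releases the lock and hands ownership directly to the next waiter, if any.
    func release(id: UUID) {
        guard holder == id else { return }
        if waiters.isEmpty {
            holder = nil
        } else {
            let next = waiters.removeFirst()
            holder = next.id
            next.continuation.resume()
        }
    }
}

// Usage: hold the lease for as long as the loopback server is alive.
let lease = await HTTPServerTestLock.shared.acquire()
// ... start the server, run requests, shut the server down ...
await HTTPServerTestLock.shared.release(id: lease)
```

Handing ownership to the next waiter inside `release(id:)` avoids the actor-reentrancy window in which a newly arriving caller could take the lock between the release and the waiter's resumption.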

Validation

  • git fetch origin && git rebase origin/main
  • swift build --package-path Packages/OsaurusCore
  • swift build -c release --package-path Packages/OsaurusCore
  • swift test --package-path Packages/OsaurusCore
  • swift test --package-path Packages/OsaurusCore --filter ToolRegistryTimeoutTests
  • xcrun swift-format lint --strict on every touched Swift file
  • swiftlint lint --strict on every touched Swift file accepted by the repo SwiftLint configuration
  • git diff --check

Non-scope

  • No production HTTP handler behavior changes.
  • No production chat persistence behavior changes.
  • No user data migration.
  • No skipped or disabled tests.
  • No Xcode version or package dependency changes.

Residual risks

  • Future tests that create loopback NIO servers should use HTTPServerTestLock or they can reintroduce process-wide port/server contention.
  • Future tests that mutate OsaurusPaths.overrideRoot, StorageKeyManager, chat history, or synthetic agents should use the existing storage/sandbox locks or the new ChatHistoryTestStorage helper.
  • PR test-core cold-builds DerivedData, so first-attempt CI may be slower but should be more trustworthy.
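
The isolation idea behind ChatHistoryTestStorage can be sketched as a scoped temporary root. This is an assumption-laden outline rather than the helper added in this PR: `withIsolatedTestRoot` and the file name `history.sqlite` are hypothetical, and the steps that touch `OsaurusPaths.overrideRoot`, the storage key, and `ChatSessionStore` are left as comments because their signatures are not shown here.

```swift
import Foundation

// Hypothetical helper: runs `body` against a fresh temporary directory and removes it afterwards.
func withIsolatedTestRoot<T>(_ body: (URL) throws -> T) throws -> T {
    let root = FileManager.default.temporaryDirectory
        .appendingPathComponent("osaurus-tests-\(UUID().uuidString)", isDirectory: true)
    try FileManager.default.createDirectory(at: root, withIntermediateDirectories: true)
    defer { try? FileManager.default.removeItem(at: root) }
    // A real helper would also point OsaurusPaths.overrideRoot at `root`, install a
    // deterministic storage key, and reset ChatSessionStore before and after `body`.
    return try body(root)
}

var observedRoot: URL?
try withIsolatedTestRoot { (root: URL) -> Void in
    observedRoot = root
    let file = root.appendingPathComponent("history.sqlite")  // placeholder file name
    try Data("test".utf8).write(to: file)
    precondition(FileManager.default.fileExists(atPath: file.path))
}
```

Because each call gets a unique directory that is deleted on exit, tests that write chat history cannot observe each other's files even when Swift Testing runs them concurrently.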

Copilot AI left a comment

Pull request overview

Stabilizes the OsaurusCore test gate by isolating process-wide shared state (loopback NIO servers, chat-history persistence, and agent snapshots) so Swift Testing concurrency doesn’t introduce flakes/hangs.

Changes:

  • Serialize tests that boot real loopback NIO servers via a process-wide HTTPServerTestLock lease.
  • Run chat-history/persistence-sensitive tests under isolated temporary roots via ChatHistoryTestStorage, including a deterministic storage key.
  • Prevent HTTP streaming tests from writing chat history by sending X-Persist: false, and add a DEBUG-only ChatSessionStore._resetForTesting() seam to close/reopen the shared DB.
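
A test request that opts out of persistence might be built along these lines. Only the `X-Persist: false` header comes from the PR; the helper name, URL, port, and body are placeholders, and the actual tests may construct their requests differently.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

// Hypothetical helper: builds a chat request that asks the server not to write chat history.
func makeNonPersistingRequest(url: URL, jsonBody: Data) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("false", forHTTPHeaderField: "X-Persist")  // header named in the PR
    request.httpBody = jsonBody
    return request
}

// The port and path below are placeholders, not taken from the PR.
let url = URL(string: "http://127.0.0.1:8080/chat")!
let request = makeNonPersistingRequest(url: url, jsonBody: Data("{}".utf8))
```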

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| Packages/OsaurusCore/Tests/Networking/MCPHTTPHandlerTests.swift | Wraps test server lifecycle in HTTPServerTestLock lease to avoid concurrent loopback contention. |
| Packages/OsaurusCore/Tests/Networking/HTTPHandlerChatStreamingTests.swift | Adds X-Persist: false helper and wraps server lifecycle in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPBodySizeLimitTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/HTTPAuthGateTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Networking/CORSHandlerTests.swift | Wraps loopback server in HTTPServerTestLock lease. |
| Packages/OsaurusCore/Tests/Helpers/HTTPServerTestLock.swift | Introduces process-wide async lease/lock for serializing server-booting tests. |
| Packages/OsaurusCore/Tests/Helpers/ChatHistoryTestStorage.swift | Introduces isolated temp-root + deterministic key runner for chat-history sensitive tests. |
| Packages/OsaurusCore/Tests/Chat/ChatWindowStateAgentSyncTests.swift | Moves agent/persistence-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionStopTests.swift | Runs stop/save-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Tests/Chat/ChatSessionResetForAgentTests.swift | Runs reset/save-sensitive tests under ChatHistoryTestStorage. |
| Packages/OsaurusCore/Models/Chat/ChatSessionStore.swift | Adds DEBUG-only _resetForTesting() to close/reset the shared chat-history DB between isolated roots. |

Comment on lines +53 to +65

    final class HTTPServerTestLease: @unchecked Sendable {
        private let lock: HTTPServerTestLock
        private let id: UUID

        fileprivate init(lock: HTTPServerTestLock, id: UUID) {
            self.lock = lock
            self.id = id
        }

        func release() async {
            await lock.release(id: id)
        }
    }

Comment on lines +79 to +82

    static func _resetForTesting() {
        didOpen = false
        ChatHistoryDatabase.shared.close()
    }

mimeding commented May 3, 2026

Gate update is green again after broadening this stabilizer.

Validated by the refreshed CI run:

  • test-core: passed in 12m58s on the new PR cold-DerivedData path
  • test-cli: passed
  • swiftlint: passed
  • shellcheck: passed
  • update_release_draft: passed

This should also cover the two stale red check classes we found on older open PRs:

Local gate was rerun after the update:

  • git fetch origin && git rebase origin/main
  • swift build --package-path Packages/OsaurusCore
  • swift build -c release --package-path Packages/OsaurusCore
  • swift test --package-path Packages/OsaurusCore
  • swift test --package-path Packages/OsaurusCore --filter ToolRegistryTimeoutTests
  • xcrun swift-format lint --strict on touched Swift files
  • swiftlint lint --strict on touched Swift files accepted by repo config
  • git diff --check

tpae left a comment

Looks good to me! Thank you for your contribution

tpae merged commit fd2ffec into osaurus-ai:main on May 3, 2026. 5 checks passed.

mimeding pushed a commit to mimeding/osaurus that referenced this pull request on May 3, 2026:
Business rationale: On-demand skills are meant to strengthen the Osaurus harness without making every chat heavier by default. This keeps high-fidelity file and workflow skills discoverable and loadable while protecting the user's trust in fast, lightweight startup prompts.

Coding rationale: Filter startup skill injection to activation-selected skills and let capabilities_load remain the only path for on-demand session loads. The tests now cover stale all-skill allowlists and loaded on-demand session persistence. The remaining touched-file style fixes are mechanical lint alignment after rebasing onto the stricter osaurus-ai#1015 gate.

Co-authored-by: Codex <codex@openai.com>