@joshka-oai joshka-oai commented Jan 9, 2026

Problem

Windows CI intermittently flakes in codex-app-server integration tests that use WireMock to stub Chat Completions streaming (/v1/chat/completions). The mock expects multiple model requests, but WireMock often only sees the first request and then panics on drop with verification errors like "expected N requests, got 1".

Root cause

Our Chat Completions SSE parser treated the transport sentinel (DONE/[DONE]) as the primary completion signal. On Windows/WireMock the HTTP connection can stay open after the model already indicates completion via finish_reason (e.g. tool_calls), and the sentinel may not be observed promptly. That means we didn’t emit ResponseEvent::Completed, so core kept waiting and never issued the follow-up model request.

Fix

Treat finish_reason as authoritative for end-of-response and emit ResponseEvent::Completed immediately:

  • finish_reason == "tool_calls": emit all tool call items (and flush any reasoning), then complete.
  • finish_reason == "stop": flush assistant/reasoning, then complete.
  • Still accept DONE/[DONE] as a compatibility completion signal, but correctness no longer depends on it.

To address correctness/risk concerns, the implementation now includes detailed in-code documentation about these edge cases and ordering guarantees.

Regression coverage

Added codex-api unit tests that simulate the problematic condition (a stream that emits a single JSON event with finish_reason and then never closes / never sends DONE/[DONE]), including a multi-choice tool_calls chunk. These tests ensure we still emit a terminal Completed event and return without relying on sentinel/close.

Background doc: docs/ci/windows-wiremock-chat-sse-flake.md.

Tests

  • cargo test -p codex-api --lib
  • cargo test -p codex-app-server codex_message_processor_flow

@joshka-oai joshka-oai force-pushed the joshka/fix-windows-chat-sse-finish-reason branch 2 times, most recently from a9c4f86 to ba78a38 on January 9, 2026 21:30
Windows CI flaked in codex-app-server integration tests that use WireMock to
stub the Chat Completions streaming endpoint (/v1/chat/completions).

The mock responses include tool calls and a trailing DONE sentinel, but on
Windows/WireMock the HTTP connection can remain open after the model has
already indicated completion via finish_reason (e.g. "tool_calls"). Our SSE
parser emitted FunctionCall items but didn't emit ResponseEvent::Completed until
it observed the DONE/[DONE] sentinel, so core kept waiting for completion and
never issued the follow-up model request. WireMock then failed verification with
"expected N requests, got 1".

Treat finish_reason as authoritative for end-of-response:
- On finish_reason == "tool_calls", flush reasoning/tool calls and emit
  Completed immediately.
- On finish_reason == "stop", flush assistant/reasoning and emit Completed.
- Keep accepting DONE/[DONE] as a terminal sentinel for compatibility.

Added a short doc describing the Windows/WireMock failure mode and rationale.

Tests:
- cargo test -p codex-api --lib
- cargo test -p codex-app-server codex_message_processor_flow
@joshka-oai joshka-oai force-pushed the joshka/fix-windows-chat-sse-finish-reason branch from ba78a38 to b1cd91d on January 9, 2026 21:36
@joshka-oai (Collaborator, Author)

Added better docs and a regression test to show where the problem occurred in our current approach (and to ensure it doesn't happen again).

@celia-oai (Collaborator)

Does this apply to the Responses API too? If so, can we update the PR description?

I'm prob not the best person to review this corner of the codebase - will leave the review to @pakrym-oai

@joshka-oai joshka-oai closed this Jan 9, 2026
@joshka-oai joshka-oai deleted the joshka/fix-windows-chat-sse-finish-reason branch January 9, 2026 23:44