
fix: use native fetch for Ollama embedding to ensure AbortController works#362

Open
jlin53882 wants to merge 2 commits into CortexReach:master from jlin53882:fix/ollama-native-fetch-abort

Conversation

jlin53882 (Contributor) commented Mar 26, 2026

Problem

When using Ollama as the embedding provider, the `embedQuery` timeout (`EMBED_TIMEOUT_MS` = 10s) does not reliably abort stalled Ollama HTTP requests. This causes the gateway-level `autoRecallTimeoutMs` (120s) to fire instead.

Evidence from gateway logs:
```
20:48:46 auto-recall query truncated from 1233 to 1000 chars
[120 seconds of silence]
20:50:46 auto-recall timed out after 120000ms
```

Gateway process CPU ~20%, Ollama CPU ~0%: the signature of a hanging HTTP connection.

Root Cause

`embedder.ts` uses the OpenAI SDK to call Ollama. The SDK's HTTP client in Node.js does not reliably abort the underlying TCP connection when `AbortController.abort()` is called. Ollama keeps processing and the socket hangs until the 120s gateway timeout fires.

Fix

Use *native `fetch`* for Ollama endpoints. Node.js 18+ native `fetch` correctly respects `AbortController`: the TCP connection is properly closed when the signal fires.

Added

  • `isOllamaProvider()`: detects `localhost:11434` / `127.0.0.1:11434` / `/ollama` URLs
  • `embedWithNativeFetch()`: calls Ollama via native `fetch` with proper signal handling
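A minimal sketch of the two helpers, assuming an OpenAI-compatible `/embeddings` route and the usual Ollama response shape; the exact signatures in `embedder.ts` may differ:

```typescript
// Sketch only: the helper names come from this PR; parameter shapes are assumed.

// Detect common local Ollama endpoints by base URL.
function isOllamaProvider(baseUrl: string): boolean {
  return (
    baseUrl.includes("localhost:11434") ||
    baseUrl.includes("127.0.0.1:11434") ||
    baseUrl.includes("/ollama")
  );
}

// Call the embeddings route with native fetch; passing the AbortSignal
// through lets Node.js 18+ tear down the TCP connection on abort.
async function embedWithNativeFetch(
  baseUrl: string,
  model: string,
  input: string,
  signal: AbortSignal,
): Promise<number[]> {
  const res = await fetch(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input }),
    signal, // native fetch aborts the underlying socket when this fires
  });
  if (!res.ok) {
    throw new Error(`embedding request failed: HTTP ${res.status}`);
  }
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  return json.data[0].embedding;
}
```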

Modified

`embedWithRetry()` now routes Ollama URLs through `embedWithNativeFetch()` instead of the OpenAI SDK.
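The dispatch can be sketched as follows; the retry/backoff logic and any names beyond those in the PR description are assumed:

```typescript
// Sketch of the routing added to embedWithRetry(); retry logic elided.
type EmbedFn = (input: string, signal: AbortSignal) => Promise<number[]>;

// Pick native fetch for Ollama endpoints, keep the OpenAI SDK otherwise.
function chooseEmbedPath(
  baseUrl: string,
  nativePath: EmbedFn, // assumed to wrap embedWithNativeFetch()
  sdkPath: EmbedFn, // assumed to wrap the OpenAI SDK call
): EmbedFn {
  const isOllama =
    baseUrl.includes("localhost:11434") ||
    baseUrl.includes("127.0.0.1:11434") ||
    baseUrl.includes("/ollama");
  return isOllama ? nativePath : sdkPath;
}
```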

Test

  • Added \ estOllamaAbortWithNativeFetch\ (Test 8) to \cjk-recursion-regression.test.mjs\
  • Added \ est/pr354-standalone.mjs\ for quick verification
  • Added \ est/pr354-30iter.mjs\ for stress testing

30 iterations — all passed ✅

Abort time consistent at ~208–215ms (signal fires at 200ms).
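The timing check can be reproduced without a live Ollama instance. This sketch substitutes a never-resolving promise for the stalled request and measures when the 200ms signal lands:

```typescript
// Sketch: a stand-in for a stalled request that rejects when the signal fires.
function stalledUntilAbort(signal: AbortSignal): Promise<never> {
  return new Promise((_resolve, reject) => {
    signal.addEventListener("abort", () => reject(new Error("AbortError")));
  });
}

// Fire the abort at 200ms and measure how long the "request" actually took.
async function measureAbortMs(): Promise<number> {
  const controller = new AbortController();
  const start = Date.now();
  setTimeout(() => controller.abort(), 200);
  try {
    await stalledUntilAbort(controller.signal);
  } catch {
    // abort expected
  }
  return Date.now() - start;
}
```

With native `fetch` the same pattern applies directly: pass `controller.signal` to `fetch`, and the rejection arrives within a few milliseconds of `abort()`.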

Fixes #361

…works

Root cause: OpenAI SDK HTTP client does not reliably abort Ollama
TCP connections when AbortController.abort() fires in Node.js. This
causes stalled sockets that hang until the gateway-level 120s timeout.

Fix: Add isOllamaProvider() to detect localhost:11434 endpoints, and
embedWithNativeFetch() using Node.js 18+ native fetch instead of the
OpenAI SDK. Native fetch properly closes TCP connections on abort.

Added Test 8 (testOllamaAbortWithNativeFetch) to cjk-recursion-regression
test suite. Also added standalone test (pr354-standalone.mjs) and
30-iteration stress test (pr354-30iter.mjs).

Fixes CortexReach#361.

Successfully merging this pull request may close these issues.

Bug: Ollama embedding AbortController doesn't abort — causes 120s gateway timeout

1 participant