
Fix: bound socket thread pool#2604

Draft
NewAlexandria wants to merge 2 commits into manaflow-ai:main from NewAlexandria:fix/bound-socket-thread-pool

Conversation

NewAlexandria commented Apr 5, 2026

Summary

What changed?

  • Replaced unbounded Thread.detachNewThread for per-connection socket client handlers with a bounded concurrent DispatchQueue (com.cmuxterm.socket-clients) and a DispatchSemaphore(value: 64) so at most 64 handlers run handleClient at once.
  • Connections that cannot acquire a slot within 30 seconds receive ERROR: Server busy and have their socket closed, applying backpressure to callers.
  • The accept loop still uses a single long-lived Thread.detachNewThread; only per-client dispatch is bounded.
  • Follow-up doc tweak on the branch: comments clarify that the queue can still backlog async work while the semaphore caps concurrent execution; the 30s wait is a named constant (clientHandlerSlotWait).
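In outline, the bounding pattern described above looks like the following sketch. Identifier names (clientQueue, clientConcurrencyLimit, clientHandlerSlotWait, handleClient) follow the PR text; the surrounding accept-loop code and exact signatures are assumptions, not the PR's actual diff.

```swift
import Dispatch
import Foundation

// Minimal sketch of the bounded-handler pattern: a concurrent queue whose
// effective parallelism is capped by a counting semaphore.
let clientQueue = DispatchQueue(label: "com.cmuxterm.socket-clients",
                                attributes: .concurrent)
let clientConcurrencyLimit = DispatchSemaphore(value: 64)   // at most 64 handlers
let clientHandlerSlotWait: TimeInterval = 30                // seconds to wait for a slot

func handleClient(_ clientSocket: Int32) { /* per-connection protocol work */ }

func dispatchClient(_ clientSocket: Int32) {
    clientQueue.async {
        // Block this worker until a handler slot frees up, or time out.
        guard clientConcurrencyLimit.wait(timeout: .now() + clientHandlerSlotWait) == .success else {
            // All slots busy for the full wait: reject and close to apply backpressure.
            let msg = "ERROR: Server busy\n"
            msg.withCString { _ = write(clientSocket, $0, strlen($0)) }
            close(clientSocket)
            return
        }
        defer { clientConcurrencyLimit.signal() }
        handleClient(clientSocket)
    }
}
```

Note that the queue itself stays unbounded: async blocks beyond the cap simply backlog until a semaphore slot opens, which is why the semaphore, not the queue, is what limits concurrent handleClient work.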

Why?

Each incoming connection (e.g. cmux / Claude Code hooks) used to spawn a new NSThread. When the main thread stalls (e.g. under heavy SwiftUI work), handler threads that hit @MainActor / DispatchQueue.main.sync can block. With unbounded thread creation, blocked threads pile up. In one real incident this led to 150+ blocked threads and ~10.93 GB of memory, leaving the app unrecoverable without a force kill. Bounding concurrency limits worst-case thread/memory growth and surfaces overload explicitly via “Server busy”.


Testing

⚠️ Not yet. The bug only appeared after a very long runtime, which will be hard to reproduce. This PR is opened to solicit feedback and to evaluate whether the long test cycle — a test build of the app plus an extended run — is worth it.

I can follow any suggestions you offer. 🤝

How did you test this change?

  • Normal cmux CLI over the socket (e.g. ping, report_*, etc.).
  • Optional: simulate a main-thread stall (e.g. debug sleep) and confirm thread count stays bounded (vs unbounded growth before).
  • Optional: drive enough concurrent socket clients to exhaust 64 slots and confirm ERROR: Server busy and connection close.
  • Run tests_v2/ against a tagged debug build and socket (CMUX_SOCKET=... as per project docs).

Test plan

  • Verify normal cmux CLI commands still work (ping, report_*, etc.)
  • Simulate main thread stall with a debug sleep and confirm thread count
    stays bounded
  • Confirm "Server busy" response when all slots are exhausted
  • Run existing socket tests (tests_v2/) against a tagged debug build

What did you verify manually?

  • (Describe: e.g. “Tagged reload.sh --tag …, ran CLI ping + a few report commands; optional Instruments/thread count check.”)

Demo Video

The app hangs with a spinning color wheel, and memory is consumed until the machine crashes.


Review Trigger (Copy/Paste as PR comment)

@codex review
@coderabbitai review
@greptile-apps review
@cubic-dev-ai review

Checklist

  • I tested the change locally
  • I added or updated tests for behavior changes (optional: note if you rely on manual/socket tests only; consider a follow-up test for “Server busy” if you want CI coverage)
  • I updated docs/changelog if needed (PR doc under docs/ is fine; CHANGELOG.md only if your repo requires it for this kind of fix)
  • I requested bot reviews after my latest commit (copy/paste block above or equivalent)
  • All code review bot comments are resolved
  • All human review comments are resolved


Summary by cubic

Bound socket client handler concurrency to prevent runaway threads and memory spikes during main-thread stalls. Per-connection work now runs on a concurrent queue with a 64-handler cap; saturated connections get "ERROR: Server busy" after 30s.

  • Bug Fixes
    • Replaced per-connection Thread.detachNewThread with a concurrent DispatchQueue (com.cmuxterm.socket-clients) gated by a DispatchSemaphore(64).
    • If a slot isn’t available within 30s, respond with ERROR: Server busy and close the socket to apply backpressure.
    • Accept loop remains a single long-lived thread; only client handling is bounded.
    • Added docs clarifying queue backlog vs concurrency cap and the named wait constant.

Written for commit 857fd9e. Summary will update on new commits.

…slot wait

Documents that the concurrent queue may backlog async blocks while the
semaphore limits concurrent handleClient work. Centralizes the 30s slot
wait as a named constant.

Made-with: Cursor

vercel bot commented Apr 5, 2026

@NewAlexandria is attempting to deploy a commit to the Manaflow Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Apr 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1db0642d-40cb-46c6-a588-d4e371a50d2c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 857fd9e809


Comment on lines +1536 to +1540
guard self.clientConcurrencyLimit.wait(timeout: .now() + self.clientHandlerSlotWait) == .success else {
    // All handler slots full — reject this connection to apply backpressure.
    let msg = "ERROR: Server busy\n"
    msg.withCString { ptr in _ = write(clientSocket, ptr, strlen(ptr)) }
    close(clientSocket)

P1: Acquire client slot before queueing accepted sockets

This timeout check runs only after the async block starts, so under a connection flood the accept loop still enqueues every accepted socket immediately and can accumulate a large backlog of open FDs before any Server busy rejection happens. In the overload case where 64 handlers are blocked, queued closures may not execute promptly, which means the advertised 30s backpressure window is not enforced from accept time and the process can still hit FD/memory pressure before shedding load.
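One way to act on this feedback is to acquire the semaphore slot in the accept loop before dispatching, so a connection flood is shed at accept time rather than accumulating queued closures that hold open FDs. A sketch under the same assumed names as above (this is a suggestion, not the PR's code):

```swift
import Dispatch
import Foundation

// Assumed surrounding state, mirroring the PR's identifiers.
let clientQueue = DispatchQueue(label: "com.cmuxterm.socket-clients",
                                attributes: .concurrent)
let clientConcurrencyLimit = DispatchSemaphore(value: 64)
let clientHandlerSlotWait: TimeInterval = 30

func handleClient(_ clientSocket: Int32) { /* per-connection protocol work */ }

func dispatchAcceptedClient(_ clientSocket: Int32) {
    // Wait on the accept thread: a timeout here rejects the socket before it
    // is ever enqueued, so the queue cannot backlog open FDs under flood.
    guard clientConcurrencyLimit.wait(timeout: .now() + clientHandlerSlotWait) == .success else {
        let msg = "ERROR: Server busy\n"
        msg.withCString { _ = write(clientSocket, $0, strlen($0)) }
        close(clientSocket)
        return
    }
    clientQueue.async {
        defer { clientConcurrencyLimit.signal() }
        handleClient(clientSocket)
    }
}
```

The trade-off is that waiting up to 30 s on the accept thread stalls all new accepts while saturated; a shorter (or zero) timeout at accept time would keep the loop responsive while still shedding load promptly.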


