
bw serve leaks ~10 KB per HTTP request regardless of endpoint #20548

@ggiesen


Summary

bw serve retains roughly 10 KB of process memory per HTTP request handled, independent of which endpoint is called (/status, /sync, and /object/item/... all behave the same). Over hours or days of normal operation this accumulates to anywhere from tens of MB to multiple GB, eventually triggering V8 heap exhaustion (Ineffective mark-compacts near heap limit ... Allocation failed - JavaScript heap out of memory) and process death.

The leak is in the request-handling path, not in vault decryption or any sync-specific machinery — /status, which doesn't touch vault items, leaks at the same per-request rate as /object/item/....

Observed in production

Four bw serve deployments in our cluster all exhibit the same pattern. Memory growth measured via container_memory_working_set_bytes (cAdvisor).
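For reference, a sketch of the kind of PromQL used to turn that metric into the MB/h figures below (assuming Prometheus scrapes cAdvisor; $PROM_URL and the pod label selector are placeholders, not our exact config):

# Per-pod memory growth over the last 6 h, in MB/h, from cAdvisor's working-set gauge
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=deriv(container_memory_working_set_bytes{pod=~"bw-serve.*"}[6h]) * 3600 / 1024 / 1024'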

A canonical 30-hour cycle on one deployment — smooth, monotonic, 110 MB → 330 MB before OOM-induced restart:

[graph: container memory working set climbing 110 MB → 330 MB over ~30 h, then OOM restart]

The same pattern, scaled up: a deployment that ran for 9.5 days without a memory limit, so the only ceiling was V8's internal heap. 130 MB → 1.1 GB, perfectly linear, no plateau:

[graph: container memory working set climbing 130 MB → 1.1 GB over 9.5 days, linear]

A deployment with a wrapper script that detects bw process death and restarts it in-process — same per-request leak, capped at V8's default ~256 MB old-gen ceiling, with ~22-hour cycles repeating for 10 days:

[graph: repeating ~22-hour growth cycles capped near 256 MB over 10 days]

Diagnostic experiment

To isolate the cause, we set HEALTH_CHECK_INTERVAL=600 (sync once every 10 minutes instead of every 30 seconds) on one deployment. The sync rate dropped 20×; the total leak rate dropped only ~35%.

| Config | API call rate (status + sync) | Leak rate | MB per request |
| --- | --- | --- | --- |
| 30 s sync wrapper interval (probes + wrapper) | ~720 req/h | 7.73 MB/h | 0.0107 |
| 600 s sync wrapper interval (probes + wrapper) | ~492 req/h | 5.05 MB/h | 0.0103 |

The per-request leak is essentially identical at very different sync frequencies: the leak is per HTTP request, not specific to /sync. If sync were the cause, dropping the sync rate 20× should have reduced the leak rate by ~95%, not ~35%.
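As a quick sanity check on that reading, using the two rows in the table above (and assuming the interval change removed only sync-driven requests):

# Requests eliminated by raising the sync interval, and the leak they carried
removed_req_per_h=$((720 - 492))                         # ~228 req/h removed
echo "scale=4; (7.73 - 5.05) / $removed_req_per_h" | bc  # ~0.0117 MB per removed request, same order as the overall per-request rate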

Liveness/readiness probes (which only call /status) account for most of the residual API rate — and the residual leak rate scales with them, confirming /status leaks at the same rate as endpoints that touch vault data.

[graph: API call rate and leak rate before and after the HEALTH_CHECK_INTERVAL change]

Predicted minimal reproduction

We have not run this minimal repro ourselves — the per-request leak rate is inferred from the production cAdvisor measurements correlated with API call rate. Based on those measurements, an upstream maintainer should be able to reproduce the leak in minutes:

# Auth + serve
bw login --apikey   # (BW_CLIENTID, BW_CLIENTSECRET in env)
bw unlock --raw > /tmp/session
export BW_SESSION=$(cat /tmp/session)
bw serve --port 8087 &
SERVE_PID=$!

# Hit /status in a tight loop — no vault items needed
while true; do curl -s http://localhost:8087/status > /dev/null; done &

# Watch RSS climb
while true; do echo "$(date +%T) $(ps -o rss= -p $SERVE_PID) KB"; sleep 60; done

Predicted: at ~1 req/s, RSS grows by ~36 MB/h (~0.01 MB/request × 3600 requests/h), hitting V8's default ~256 MB old-gen ceiling within a few hours.

Versions tested

  • bw 2025.12.1 (nexe-built single binary in our deployments): leak confirmed.
  • Latest npm release 2026.4.1: not directly tested — it would be useful to confirm whether the leak still reproduces on the current release.

Environment

  • Linux x86_64, single-binary bw (nexe-bundled Node).
  • No proxy, no TLS — request handling is direct HTTP.
  • Vault size is independent of leak rate in our observations: deployments serving small vaults (~150 items) leak at the same per-request rate as ones serving thousands of items. Consistent with /status leaking at the same rate as /object/item/....

Suggested investigation

Since /status leaks at the same rate as endpoints that touch vault data, the leak is unlikely to be in:

  • Vault decryption / cache management
  • Sync handlers
  • Item-specific endpoint logic

More likely candidates:

  • Per-request middleware or logging that retains references
  • HTTP response buffer / framework state not being released
  • An accumulating in-memory log or telemetry buffer
  • Closure-capture of request/response objects in promise chains

A heap profile (--heap-prof) taken before and after a few thousand /status requests should pinpoint the retainer.
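One way to capture that, sketched here under the assumption that the CLI is run from the npm package under a system Node (the nexe-bundled binary may not honor NODE_OPTIONS); the request count and file handling are illustrative:

# Heap snapshots on demand via SIGUSR2, written to the working directory
npm install -g @bitwarden/cli
# (login/unlock and BW_SESSION as in the repro above)
NODE_OPTIONS="--heapsnapshot-signal=SIGUSR2" bw serve --port 8087 &
SERVE_PID=$!

kill -USR2 $SERVE_PID                                                              # snapshot before
for i in $(seq 1 5000); do curl -s http://localhost:8087/status > /dev/null; done  # drive /status
kill -USR2 $SERVE_PID                                                              # snapshot after

# Diff the two .heapsnapshot files in Chrome DevTools (Memory tab) to find the growing retainer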

Existing related issues

Filing a fresh issue against this repo since both prior reports are closed and neither investigation appears to have continued.

Workarounds (documented for downstream consumers, not blocking on this issue)

  • NODE_OPTIONS=--max-old-space-size=512 (or higher) defers the OOM but doesn't prevent it
  • A wrapper script that detects bw process death and restarts it in-process keeps the listening port available (~7 s outage per restart); a minimal sketch follows this list
  • Client-side retry on ConnectionError / Timeout covers the restart window transparently
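A minimal sketch of the wrapper approach (port and timings are illustrative, not our production script):

#!/bin/sh
# Restart bw serve in place whenever the process dies (e.g. after heap exhaustion),
# so the listening port comes back within a few seconds instead of waiting for a pod restart.
while true; do
  bw serve --port 8087
  echo "$(date +%T) bw serve exited (rc=$?), restarting" >&2
  sleep 2
done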
