Summary
The built-in GatewayManager in dist/server/services/hermes/gateway-manager.js has a writeProfilePort() method that modifies the user's Hermes Agent config.yaml on every WebUI restart, causing a port auto-increment spiral that eventually leads to a complete port clash between the gateway and the WebUI's own Node server.
Root Cause
The resolvePort() method is called during bootstrap via startAll() → resolvePort(). It always calls writeProfilePort() — even when the configured port is perfectly valid and available. This is visible at line ~260:
// Port is free — but still writes config
this.writeProfilePort(name, port, host);
writeProfilePort() does the following:
- Calls yaml.load() on the config
- Destroys the extra structure
- Dumps the entire YAML back via yaml.dump(), losing all comments, ordering, and any fields the GatewayManager doesn't know about
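One way to avoid the lossy round trip would be to touch only the line that holds the port and leave the rest of the file byte-for-byte intact. The following is a minimal sketch, not the project's code: the helper name setPortInYamlText and the assumption that the port sits on a line of the form "port: <number>" are hypothetical, and a real config layout may need a path-aware approach.

```javascript
// Hypothetical helper: rewrite only the first `port: <digits>` line in a YAML
// document, preserving comments, key order, and unknown fields.
// Assumption: the port is stored as a plain integer scalar on its own line.
function setPortInYamlText(text, newPort) {
  if (!Number.isInteger(newPort) || newPort < 1 || newPort > 65535) {
    throw new RangeError(`invalid port: ${newPort}`);
  }
  const portLine = /^([ \t]*port:[ \t]*)(\d+)([ \t]*(?:#.*)?)$/m;
  const match = portLine.exec(text);
  if (match === null) {
    return { text, changed: false }; // no port line found: leave the file alone
  }
  if (Number(match[2]) === newPort) {
    return { text, changed: false }; // already correct: skip the write entirely
  }
  const updated = text.replace(
    portLine,
    (_whole, prefix, _oldPort, suffix) => `${prefix}${newPort}${suffix}`
  );
  return { text: updated, changed: true };
}

// Illustrative config text; the keys are made up for the example.
const sampleConfig = [
  '# gateway settings (comment must survive)',
  'gateway:',
  '  port: 8642  # default',
  '  custom_key: keep-me',
  '',
].join('\n');

const portUpdate = setPortInYamlText(sampleConfig, 8650);
console.log(portUpdate.text);
```

Returning a changed flag lets the caller skip the filesystem write when the port is already correct, which is the case the report says currently triggers a rewrite.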
Meanwhile, detectStatus() relies on a fragile three-way verification: PID file, live process, and a health check on the configured port. If the gateway was restarted for any reason and the health check fails briefly, detectStatus() returns running: false even though the gateway process is presumably still bound to its port. resolvePort() then sees the port as in use, increments it by 1, writes the new value to the config, and calls hermes gateway restart on the new port.
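The distinction that seems to be missing is between "no gateway at all" and "gateway alive but momentarily failing its health check". A rough sketch of that distinction follows; the function names and state labels are hypothetical and are not taken from gateway-manager.js.

```javascript
// Hypothetical classifier for the three signals described above.
// Names and return values are illustrative, not the project's API.
function classifyGateway({ pidFileExists, processAlive, healthOk }) {
  if (pidFileExists && processAlive && healthOk) return 'running';
  // The process exists but the health check failed: it is likely still
  // booting or briefly busy. The port it holds belongs to the gateway,
  // so it should not be treated as a foreign conflict.
  if (pidFileExists && processAlive) return 'unhealthy';
  return 'stopped';
}

// Only a 'stopped' gateway justifies looking for a different port.
function mayReassignPort(gatewayState) {
  return gatewayState === 'stopped';
}

const gatewayState = classifyGateway({
  pidFileExists: true,
  processAlive: true,
  healthOk: false,
});
console.log(gatewayState, mayReassignPort(gatewayState)); // unhealthy false
```

With a third state, a brief health-check failure would lead to a retry or a wait rather than a port change.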
This cascades on every WebUI restart, producing a chain like:
Port 8645 is in use, reassigning to 8646 (PID 748)
Port 8646 is in use, reassigning to 8647 (PID 738)
Port 8647 is in use, reassigning to 8648 (PID 760)
→ WebUI Node server EADDRINUSE on 8648 → port dead
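The arithmetic of the chain can be reproduced with a toy model. This is only a simulation of the behavior described above, not the real resolvePort() logic; the function simulateRestarts is hypothetical, while the starting port 8645 and the WebUI port 8648 are taken from the log lines.

```javascript
// Toy model of the spiral: each WebUI restart misreads the live gateway as
// "not running", finds its port occupied, and moves the gateway up by one.
const WEBUI_PORT = 8648; // port the Node server fails to bind in the log

function simulateRestarts(startPort, restarts) {
  const steps = [];
  let gatewayPort = startPort;
  for (let i = 0; i < restarts; i += 1) {
    const next = gatewayPort + 1; // naive "port + 1" reassignment
    steps.push({
      from: gatewayPort,
      to: next,
      clashesWithWebUi: next === WEBUI_PORT,
    });
    gatewayPort = next;
  }
  return steps;
}

const spiralSteps = simulateRestarts(8645, 3);
for (const step of spiralSteps) {
  const note = step.clashesWithWebUi ? ' (WebUI port)' : '';
  console.log(`${step.from} -> ${step.to}${note}`);
}
```

Three restarts are enough to walk the gateway from 8645 onto the WebUI's own port.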
Impact
- Gateway config silently drifts from the standard 8642 to arbitrary ports up to 8648
- WebUI becomes unreachable — systemctl status shows "running" but port 8648 is held by the gateway (Python) process, not the Node WebUI
- config.yaml is corrupted by repeated yaml.load() → yaml.dump() cycles — comments, field ordering, and unmanaged keys are lost
- npm upgrade does NOT fix it: the logic ships in the built bundle, so a fresh install behaves the same way
Affected Version
v0.5.16 (and presumably all versions shipping gateway-manager.js with resolvePort() → writeProfilePort())
Expected Behavior
For environments where the Hermes Agent Gateway is independently managed (systemd, docker, or manual gateway run), the GatewayManager should:
- Detect the running gateway's actual port (via PID + lsof / procfs)
- Use that port for upstream proxy — without modifying config.yaml
- Never write to ~/.hermes/config.yaml
Suggested Fix
writeProfilePort() should be a no-op when the gateway is already running and responsive. Or better: make resolvePort() read-only — allocate ports in-memory only, without side effects on the filesystem.
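A read-only variant could look roughly like the sketch below. It is an illustration under stated assumptions, not a patch: the class InMemoryPortResolver, its resolve() signature, and the injected isPortFree probe are all hypothetical and do not exist in gateway-manager.js.

```javascript
// Hypothetical in-memory allocator: it never touches config.yaml.
// `isPortFree` is an injected probe (assumption), so the logic stays testable.
class InMemoryPortResolver {
  constructor(isPortFree, maxAttempts = 20) {
    this.isPortFree = isPortFree;
    this.maxAttempts = maxAttempts;
    this.assigned = new Map(); // profile name -> port chosen for this run
  }

  resolve(name, configuredPort, gatewayResponsive) {
    // A gateway that is already up owns its configured port: reuse it as-is.
    if (gatewayResponsive) {
      this.assigned.set(name, configuredPort);
      return configuredPort;
    }
    for (let offset = 0; offset < this.maxAttempts; offset += 1) {
      const candidate = configuredPort + offset;
      const takenByOther = [...this.assigned].some(
        ([other, port]) => other !== name && port === candidate
      );
      if (!takenByOther && this.isPortFree(candidate)) {
        this.assigned.set(name, candidate); // remembered in memory only
        return candidate;
      }
    }
    throw new Error(`no free port found for profile "${name}"`);
  }
}

// Usage: 8642 is held by an already-running gateway.
const busyPorts = new Set([8642]);
const resolver = new InMemoryPortResolver((port) => !busyPorts.has(port));
console.log(resolver.resolve('default', 8642, true)); // 8642
console.log(resolver.resolve('second', 8642, false)); // 8643
```

Because the assignment lives only in the Map, a restart starts again from the configured port instead of from whatever the previous run wrote to disk.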
The upgrade-safe workaround applied in our environment (via startup wrapper script):
# Insert early return at the top of writeProfilePort
sed -i '/^ writeProfilePort(name, port, host) {/a\ return;' dist/server/services/hermes/gateway-manager.js
Evidence
Full server.log trace of the spiral:
2026-05-10 12:27:46 PID 748 Port 8645 is in use, reassigning to 8646
2026-05-10 13:20:52 PID 738 Port 8646 is in use, reassigning to 8647
2026-05-10 17:38:22 PID 760 Port 8647 is in use, reassigning to 8648
After the last step, the EADDRINUSE error from the Node server:
{"err":{"type":"Error","message":"listen EADDRINUSE: address already in use 0.0.0.0:8648","port":8648},"msg":"Unhandled rejection"}