fix: start OpenClaw gateway and port forwarding in `nemoclaw start` #381

futhgar wants to merge 2 commits into NVIDIA:main from
Conversation
📝 Walkthrough

Adds management for an OpenClaw gateway and dashboard port forwarding.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant NemoClaw as "nemoclaw start"
    participant Validate as "validate_name()"
    participant Resolve as "resolve_sandbox()"
    participant OpenShell as "openshell"
    participant Sandbox as "Sandbox (OpenClaw)"
    User->>NemoClaw: Run "nemoclaw start"
    NemoClaw->>Validate: Check SANDBOX_NAME validity
    alt Invalid name
        Validate-->>NemoClaw: Reject
        NemoClaw-->>User: Warn and skip gateway services
    else Valid name
        Validate-->>NemoClaw: Accept
        NemoClaw->>Resolve: Determine sandbox (SANDBOX_NAME or auto-detect)
        alt Sandbox found
            Resolve-->>NemoClaw: Return sandbox name
            NemoClaw->>OpenShell: exec in sandbox -> "openclaw gateway run"
            OpenShell->>Sandbox: Start gateway process
            Sandbox-->>OpenShell: Gateway running (PID)
            NemoClaw->>OpenShell: Start port forward -> $DASHBOARD_PORT
            OpenShell-->>NemoClaw: Port forward active
            NemoClaw-->>User: Report gateway + forward started
        else No sandbox
            Resolve-->>NemoClaw: Not found
            NemoClaw-->>User: Warn and skip gateway services
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/service-env.test.js (1)
138-189: Tests duplicate logic instead of testing the actual implementation.

The `validate_name` tests inline a copy of the function logic rather than sourcing and invoking the actual function from `start-services.sh`. This means the tests won't catch regressions if the real implementation changes. Consider using the `callValidateName` helper (or a similar approach) to test the actual script:

♻️ Suggested refactor to test actual implementation
```diff
 it("accepts valid sandbox names", () => {
-  // validate_name is a case statement that calls fail on bad input.
-  // We source just the function definition, not the whole script.
-  const result = spawnSync(
-    "bash",
-    ["-c", `
-      validate_name() {
-        case "$1" in
-          (*[!A-Za-z0-9._-]*|'') echo "INVALID"; return 1 ;;
-        esac
-        echo "VALID"
-      }
-      validate_name "my-sandbox"
-    `],
-    { encoding: "utf-8" }
-  );
-  assert.equal(result.stdout.trim(), "VALID");
+  // Source functions from script and test actual implementation
+  const result = spawnSync("bash", ["-c", `
+    fail() { echo "INVALID"; exit 1; }
+    source <(sed -n '/^validate_name/,/^}/p' "${START_SERVICES_SH}")
+    validate_name "my-sandbox" && echo "VALID"
+  `], { encoding: "utf-8" });
+  assert.equal(result.stdout.trim(), "VALID");
 });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/service-env.test.js` around lines 138 - 189: The tests currently inline a duplicate validate_name implementation instead of exercising the real function from start-services.sh; update the three test cases in test/service-env.test.js to source the script (or use the existing callValidateName helper) and invoke the actual validate_name from start-services.sh (or callValidateName) so the tests exercise the real implementation and catch regressions — replace the embedded function blocks with sourcing/start-services.sh invocation and then call validate_name (or callValidateName) with the same inputs ("my-sandbox", "foo;rm -rf /", and empty string) and assert on the real output.

scripts/start-services.sh (1)
174-180: Race condition: gateway may not be ready after fixed 3-second sleep.

The hardcoded `sleep 3` assumes the gateway binds its port within 3 seconds. On slower systems or under load, this may not be sufficient, causing the port forwarding to fail silently if the gateway isn't ready. Consider adding a readiness check loop instead of a fixed sleep:
♻️ Suggested improvement
```diff
 start_service openclaw-gateway \
   openshell sandbox exec "$sandbox" -- openclaw gateway run

-# Give the gateway a moment to bind its port
-sleep 3
+# Wait for gateway to be ready (up to 30s)
+for _ in $(seq 1 30); do
+  if openshell sandbox exec "$sandbox" -- curl -sf "http://localhost:$DASHBOARD_PORT/health" >/dev/null 2>&1; then
+    break
+  fi
+  sleep 1
+done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/start-services.sh` around lines 174 - 180, The fixed "sleep 3" after starting openclaw-gateway is racy; replace it with a readiness loop that polls the gateway until it binds its port or a timeout is reached. After calling start_service openclaw-gateway (and using openshell sandbox exec "$sandbox" -- ... to run inside the sandbox), repeatedly test the gateway endpoint/port (e.g., with curl against a health URL or nc/tcp probe) and break when successful, logging progress and failing cleanly if the timeout elapses; encapsulate this in a small helper (e.g., wait_for_openclaw or wait_for_port) and call it instead of sleep 3 so is_running/start_service logic reliably waits for the gateway to be ready.
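A helper like the one the prompt suggests could be sketched as follows — a sketch only, assuming bash's `/dev/tcp` redirection is available where the script runs; the helper name `wait_for_port` and the timeout default are illustrative, not the repository's actual code:

```shell
#!/usr/bin/env bash
# Illustrative helper (assumed name): poll a TCP port until it accepts
# connections, or fail after a timeout in seconds. Uses bash's /dev/tcp
# so no extra tooling (curl, nc) is needed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" i
  for ((i = 0; i < timeout; i++)); do
    # Opening /dev/tcp succeeds only once something is listening
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}
```

Called as `wait_for_port localhost "$DASHBOARD_PORT" 30` in place of `sleep 3`, this blocks until the gateway binds its port and fails cleanly on timeout.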
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/service-env.test.js`:
- Around line 126-136: The helper function callValidateName is dead code; either
delete the entire callValidateName function or refactor the subsequent tests
that inline the validate_name invocation to call callValidateName instead. If
removing, simply delete the callValidateName definition (which uses spawnSync,
START_SERVICES_SH and sets env SANDBOX_NAME). If refactoring, replace the inline
spawnSync calls in the tests with calls to callValidateName(name) and reuse its
return value, ensuring the same encoding and env behavior is preserved.
---
Nitpick comments:
In `@scripts/start-services.sh`:
- Around line 174-180: The fixed "sleep 3" after starting openclaw-gateway is
racy; replace it with a readiness loop that polls the gateway until it binds its
port or a timeout is reached. After calling start_service openclaw-gateway (and
using openshell sandbox exec "$sandbox" -- ... to run inside the sandbox),
repeatedly test the gateway endpoint/port (e.g., with curl against a health URL
or nc/tcp probe) and break when successful, logging progress and failing cleanly
if the timeout elapses; encapsulate this in a small helper (e.g.,
wait_for_openclaw or wait_for_port) and call it instead of sleep 3 so
is_running/start_service logic reliably waits for the gateway to be ready.
In `@test/service-env.test.js`:
- Around line 138-189: The tests currently inline a duplicate validate_name
implementation instead of exercising the real function from start-services.sh;
update the three test cases in test/service-env.test.js to source the script (or
use the existing callValidateName helper) and invoke the actual validate_name
from start-services.sh (or callValidateName) so the tests exercise the real
implementation and catch regressions—replace the embedded function blocks with
sourcing/start-services.sh invocation and then call validate_name (or
callValidateName) with the same inputs ("my-sandbox", "foo;rm -rf /", and empty
string) and assert on the real output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 745963ad-56ba-4982-8f48-03379dad0a1a
📒 Files selected for processing (4)
- bin/nemoclaw.js
- scripts/start-services.sh
- test/cli.test.js
- test/service-env.test.js
`nemoclaw start` previously only started the Telegram bridge and cloudflared tunnel. After a system reboot, the OpenClaw gateway inside the sandbox and the dashboard port forwarding were lost, requiring manual intervention.

Add `openclaw-gateway` and `gateway-forward` as managed services in start-services.sh so `nemoclaw start` / `nemoclaw stop` / `nemoclaw status` manage the full service lifecycle.

- resolve_sandbox() auto-detects the active sandbox when not explicit
- validate_name() rejects identifiers with shell metacharacters
- Gateway started via `openshell sandbox exec` (foreground, PID tracked)
- Port forwarding started via `openshell forward start` (PID tracked)
- Status and banner updated to show all four services

Fixes NVIDIA#311

Signed-off-by: Josue Gomez <josue@guatulab.com>
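Based on the bullet list above, the two helpers might look roughly like this — an illustrative sketch, not the actual start-services.sh source; the `fail` message format and the `default` sentinel handling are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of the two helpers described in the PR (names from the PR,
# bodies illustrative).

fail() { echo "error: $1" >&2; return 1; }

# Reject names containing anything outside [A-Za-z0-9._-], including
# the empty string, so names are safe to splice into shell commands.
validate_name() {
  case "$1" in
    (*[!A-Za-z0-9._-]*|'') fail "invalid sandbox name: '$1'" ;;
  esac
}

# Use SANDBOX_NAME when set explicitly; otherwise fall back to the
# first sandbox that `openshell sandbox list` reports as Ready.
resolve_sandbox() {
  if [ "$SANDBOX_NAME" != "default" ]; then
    printf '%s\n' "$SANDBOX_NAME"
    return
  fi
  if command -v openshell >/dev/null 2>&1; then
    openshell sandbox list 2>/dev/null | awk '/Ready/{print $1; exit}' || true
  fi
}
```

The case-pattern rejection is why a name like `foo;rm -rf /` fails validation before it can reach an `openshell sandbox exec` command line.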
Address CodeRabbit review: the callValidateName helper sourced the full start-services.sh, which has side effects that break in the test environment. Replace it with a focused testValidateName function that defines just the validate_name case pattern directly.

Signed-off-by: Josue Gomez <josue@guatulab.com>
Signed-off-by: futhgar <jmaldonado.rosa@gmail.com>
ed0a561 to c087e2b
🧹 Nitpick comments (1)
test/service-env.test.js (1)
367-386: Consider adding test coverage for the fallback branch.

This test only verifies the explicit `SANDBOX_NAME != "default"` path. The real `resolve_sandbox` function has a second branch that calls `openshell sandbox list` when `SANDBOX_NAME` is `"default"`. Consider adding:

- A test case verifying the "default" branch is entered when `SANDBOX_NAME="default"`.
- A test verifying the function returns empty/nothing when `openshell` is unavailable (the `|| true` fallback).

📝 Example additional test case
```js
it("returns empty when SANDBOX_NAME is default and openshell unavailable", () => {
  const result = spawnSync(
    "bash",
    ["-c", `
      SANDBOX_NAME="default"
      resolve_sandbox() {
        if [ "$SANDBOX_NAME" != "default" ]; then
          printf '%s\\n' "$SANDBOX_NAME"
          return
        fi
        if command -v openshell > /dev/null 2>&1; then
          openshell sandbox list 2>/dev/null | awk '/Ready/{print $1; exit}' || true
        fi
      }
      resolve_sandbox
    `],
    { encoding: "utf-8" }
  );
  expect(result.status).toBe(0);
  expect(result.stdout.trim()).toBe("");
});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/service-env.test.js` around lines 367 - 386, Add tests covering the "default" branch of resolve_sandbox: create one test where SANDBOX_NAME="default" and stub an openshell that emits a Ready line (so resolve_sandbox parses and returns that sandbox id) and another test where SANDBOX_NAME="default" and openshell is unavailable (ensure command -v openshell fails) to assert resolve_sandbox returns empty; implement both using spawnSync with a small inline bash function resolve_sandbox (same name used in existing test) and either manipulate PATH to point to a temp directory containing a fake openshell script that prints a Ready line or leave openshell missing to exercise the || true fallback, then assert result.status is 0 and stdout.trim() matches expected values (sandbox id or empty).
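The PATH-stub technique the prompt describes can be sketched in plain bash. Everything here is illustrative: the fake `openshell` output format is assumed, and the inline `resolve_sandbox` body mirrors the example above rather than the real script:

```shell
#!/usr/bin/env bash
# Sketch: make resolve_sandbox's fallback branch testable by putting a
# fake `openshell` first on PATH. Output format of the fake is assumed.
tmp=$(mktemp -d)
cat > "$tmp/openshell" <<'EOF'
#!/usr/bin/env bash
# Pretend `openshell sandbox list` reports one Ready sandbox
echo "my-box   Ready   2d"
EOF
chmod +x "$tmp/openshell"

resolve_sandbox() {
  if [ "$SANDBOX_NAME" != "default" ]; then
    printf '%s\n' "$SANDBOX_NAME"
    return
  fi
  if command -v openshell >/dev/null 2>&1; then
    openshell sandbox list 2>/dev/null | awk '/Ready/{print $1; exit}' || true
  fi
}

# With the stub on PATH, the "default" branch finds the fake sandbox
SANDBOX_NAME=default PATH="$tmp:$PATH" resolve_sandbox   # prints: my-box
```

Removing the stub directory from PATH (or pointing PATH at an empty directory) exercises the other case, where `command -v openshell` fails and the function prints nothing.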
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/service-env.test.js`:
- Around line 367-386: Add tests covering the "default" branch of
resolve_sandbox: create one test where SANDBOX_NAME="default" and stub an
openshell that emits a Ready line (so resolve_sandbox parses and returns that
sandbox id) and another test where SANDBOX_NAME="default" and openshell is
unavailable (ensure command -v openshell fails) to assert resolve_sandbox
returns empty; implement both using spawnSync with a small inline bash function
resolve_sandbox (same name used in existing test) and either manipulate PATH to
point to a temp directory containing a fake openshell script that prints a Ready
line or leave openshell missing to exercise the || true fallback, then assert
result.status is 0 and stdout.trim() matches expected values (sandbox id or
empty).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 293f8a4a-2a0b-4558-a063-a5453d8c61d5
📒 Files selected for processing (4)
- bin/nemoclaw.js
- scripts/start-services.sh
- test/cli.test.js
- test/service-env.test.js
✅ Files skipped from review due to trivial changes (2)
- test/cli.test.js
- bin/nemoclaw.js
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/start-services.sh
Summary
- `nemoclaw start` now manages the OpenClaw gateway (inside sandbox) and dashboard port forwarding alongside the existing Telegram bridge and cloudflared tunnel
- `nemoclaw stop` cleanly tears down all four services
- `nemoclaw status` shows the state of all four services
- `validate_name()` to reject sandbox names with shell metacharacters
- `resolve_sandbox()` to auto-detect the active sandbox when `--sandbox` is not provided

Problem
After a system reboot, the OpenClaw gateway inside the sandbox stops and port forwarding is lost. Users must manually SSH into the sandbox, restart the gateway, and re-establish port forwarding. This makes integration with external dashboards (e.g. Mission Control) fragile.
Approach
Rather than introducing SSH proxy tunnels, this uses the existing `openshell sandbox exec` and `openshell forward start` commands that the codebase already relies on for sandbox interaction and port forwarding during onboarding.

The gateway is run in the foreground inside the sandbox via `openshell sandbox exec`, which keeps a live host-side PID so the existing PID-based service tracking works correctly.

Test plan
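The PID-based tracking referred to above could be sketched like this — hypothetical helper bodies (the names `start_service` and `is_running` appear in the review comments, but these implementations are illustrative, and the real script's bookkeeping may differ):

```shell
#!/usr/bin/env bash
# Illustrative PID-file-based service tracking.
PID_DIR="${PID_DIR:-/tmp/nemoclaw-pids}"
mkdir -p "$PID_DIR"

# Launch a command in the background and record its host-side PID.
# A foreground `openshell sandbox exec` works here because the host
# process stays alive as long as the gateway runs in the sandbox.
start_service() {
  local name="$1"; shift
  "$@" &
  echo $! > "$PID_DIR/$name.pid"
}

# A service is running if its PID file exists and the PID is alive.
is_running() {
  local pidfile="$PID_DIR/$1.pid"
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}

# Stop the tracked process and clear its PID file.
stop_service() {
  local pidfile="$PID_DIR/$1.pid"
  if [ -f "$pidfile" ]; then
    kill "$(cat "$pidfile")" 2>/dev/null || true
    rm -f "$pidfile"
  fi
}
```

This is why running the gateway in the foreground matters: a detached or daemonized gateway would leave no live host-side PID for `is_running` to probe.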
- `node --test test/cli.test.js test/service-env.test.js` (25/25 passing)
- `validate_name` (accepts valid names, rejects metacharacters, rejects empty)
- `resolve_sandbox` (returns explicit name when set)
- `show_status` iterates all four services and `do_stop` stops gateway services
- `nemoclaw start` on a DGX Spark with an existing sandbox

Fixes #311
Summary by CodeRabbit
New Features
Documentation
Tests
Signed-off-by: Josue Balandrano Coronel <josuebc@pm.me>