Skip to content

fix: start OpenClaw gateway and port forwarding in nemoclaw start#381

Open
futhgar wants to merge 2 commits intoNVIDIA:mainfrom
futhgar:fix/gateway-auto-restart
Open

fix: start OpenClaw gateway and port forwarding in nemoclaw start#381
futhgar wants to merge 2 commits intoNVIDIA:mainfrom
futhgar:fix/gateway-auto-restart

Conversation

@futhgar
Copy link
Copy Markdown

@futhgar futhgar commented Mar 19, 2026

Summary

  • nemoclaw start now manages the OpenClaw gateway (inside sandbox) and dashboard port forwarding alongside the existing Telegram bridge and cloudflared tunnel
  • nemoclaw stop cleanly tears down all four services
  • nemoclaw status shows the state of all four services
  • Adds validate_name() to reject sandbox names with shell metacharacters
  • Adds resolve_sandbox() to auto-detect the active sandbox when --sandbox is not provided

Problem

After a system reboot, the OpenClaw gateway inside the sandbox stops and port forwarding is lost. Users must manually SSH into the sandbox, restart the gateway, and re-establish port forwarding. This makes integration with external dashboards (e.g. Mission Control) fragile.

Approach

Rather than introducing SSH proxy tunnels, this uses the existing openshell sandbox exec and openshell forward start commands that the codebase already relies on for sandbox interaction and port forwarding during onboarding.

The gateway is run in the foreground inside the sandbox via openshell sandbox exec, which keeps a live host-side PID for the existing PID-based service tracking to work correctly.

Test plan

  • All existing tests pass (node --test test/cli.test.js test/service-env.test.js -- 25/25)
  • New tests for validate_name (accepts valid names, rejects metacharacters, rejects empty)
  • New tests for resolve_sandbox (returns explicit name when set)
  • New tests verify show_status iterates all four services and do_stop stops gateway services
  • CLI help test verifies gateway is mentioned in start description
  • Manual: nemoclaw start on a DGX Spark with an existing sandbox

Fixes #311

Summary by CodeRabbit

  • New Features

    • Added gateway service and host port-forwarding to service management; services auto-detect environment and validate sandbox names.
  • Documentation

    • Updated CLI help to list available services: gateway, port fwd, Telegram, tunnel.
  • Tests

    • Added CLI help test and new environment tests for sandbox resolution, name validation, and presence/management of gateway and port-forward services.

Signed-off-by: Josue Balandrano Coronel josuebc@pm.me

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

Adds management for an OpenClaw gateway and dashboard port forwarding: scripts/start-services.sh now defines openclaw-gateway and gateway-forward, includes validate_name() and resolve_sandbox(), and starts/stops/reports these services; CLI help text updated and tests extended to cover the new services. (33 words)

Changes

Cohort / File(s) Summary
CLI Help & Tests
bin/nemoclaw.js, test/cli.test.js
Help text for nemoclaw start updated to list gateway, port fwd; added CLI test asserting gateway appears in help output.
Service Management Script
scripts/start-services.sh
Added validate_name() and resolve_sandbox(); added managed services openclaw-gateway and gateway-forward; updated do_start(), do_stop(), and show_status() to handle them, with conditional start/forward when openshell and sandbox are resolvable and idempotent PID-file handling.
Service Environment Tests
test/service-env.test.js
Added subprocess tests exercising validate_name and resolve_sandbox, and assertions that start-services.sh lists and handles openclaw-gateway and gateway-forward (including stop logic).

Sequence Diagram

sequenceDiagram
    participant User
    participant NemoClaw as "nemoclaw start"
    participant Validate as "validate_name()"
    participant Resolve as "resolve_sandbox()"
    participant OpenShell as "openshell"
    participant Sandbox as "Sandbox (OpenClaw)"

    User->>NemoClaw: Run "nemoclaw start"
    NemoClaw->>Validate: Check SANDBOX_NAME validity
    alt Invalid name
        Validate-->>NemoClaw: Reject
        NemoClaw-->>User: Warn and skip gateway services
    else Valid name
        Validate-->>NemoClaw: Accept
        NemoClaw->>Resolve: Determine sandbox (SANDBOX_NAME or auto-detect)
        alt Sandbox found
            Resolve-->>NemoClaw: Return sandbox name
            NemoClaw->>OpenShell: exec in sandbox -> "openclaw gateway run"
            OpenShell->>Sandbox: Start gateway process
            Sandbox-->>OpenShell: Gateway running (PID)
            NemoClaw->>OpenShell: Start port forward -> $DASHBOARD_PORT
            OpenShell-->>NemoClaw: Port forward active
            NemoClaw-->>User: Report gateway + forward started
        else No sandbox
            Resolve-->>NemoClaw: Not found
            NemoClaw-->>User: Warn and skip gateway services
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Inside the sandbox, gears align,
Gateway wakes and threads entwine,
Ports forwarded, banners bright,
Start and stop done through the night,
NemoClaw hops — all systems fine.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.44% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding OpenClaw gateway and port forwarding management to the nemoclaw start workflow.
Linked Issues check ✅ Passed All coding requirements from issue #311 are met: gateway is started via nemoclaw start, port forwarding is configured, status shows gateway state, and stop cleanly terminates gateway services.
Out of Scope Changes check ✅ Passed All changes directly support the primary objective of managing OpenClaw gateway and port forwarding; help text update is a necessary documentation change for the new functionality.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
test/service-env.test.js (1)

138-189: Tests duplicate logic instead of testing the actual implementation.

The validate_name tests inline a copy of the function logic rather than sourcing and invoking the actual function from start-services.sh. This means the tests won't catch regressions if the real implementation changes.

Consider using the callValidateName helper (or a similar approach) to test the actual script:

♻️ Suggested refactor to test actual implementation
     it("accepts valid sandbox names", () => {
-      // validate_name is a case statement that calls fail on bad input.
-      // We source just the function definition, not the whole script.
-      const result = spawnSync(
-        "bash",
-        ["-c", `
-          validate_name() {
-            case "$1" in
-              (*[!A-Za-z0-9._-]*|'') echo "INVALID"; return 1 ;;
-            esac
-            echo "VALID"
-          }
-          validate_name "my-sandbox"
-        `],
-        { encoding: "utf-8" }
-      );
-      assert.equal(result.stdout.trim(), "VALID");
+      // Source functions from script and test actual implementation
+      const result = spawnSync("bash", ["-c", `
+        fail() { echo "INVALID"; exit 1; }
+        source <(sed -n '/^validate_name/,/^}/p' "${START_SERVICES_SH}")
+        validate_name "my-sandbox" && echo "VALID"
+      `], { encoding: "utf-8" });
+      assert.equal(result.stdout.trim(), "VALID");
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 138 - 189, The tests currently inline
a duplicate validate_name implementation instead of exercising the real function
from start-services.sh; update the three test cases in test/service-env.test.js
to source the script (or use the existing callValidateName helper) and invoke
the actual validate_name from start-services.sh (or callValidateName) so the
tests exercise the real implementation and catch regressions—replace the
embedded function blocks with sourcing/start-services.sh invocation and then
call validate_name (or callValidateName) with the same inputs ("my-sandbox",
"foo;rm -rf /", and empty string) and assert on the real output.
scripts/start-services.sh (1)

174-180: Race condition: gateway may not be ready after fixed 3-second sleep.

The hardcoded sleep 3 assumes the gateway binds its port within 3 seconds. On slower systems or under load, this may not be sufficient, causing the port forwarding to fail silently if the gateway isn't ready.

Consider adding a readiness check loop instead of a fixed sleep:

♻️ Suggested improvement
       start_service openclaw-gateway \
         openshell sandbox exec "$sandbox" -- openclaw gateway run
-      # Give the gateway a moment to bind its port
-      sleep 3
+      # Wait for gateway to be ready (up to 30s)
+      for _ in $(seq 1 30); do
+        if openshell sandbox exec "$sandbox" -- curl -sf "http://localhost:$DASHBOARD_PORT/health" >/dev/null 2>&1; then
+          break
+        fi
+        sleep 1
+      done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-services.sh` around lines 174 - 180, The fixed "sleep 3" after
starting openclaw-gateway is racy; replace it with a readiness loop that polls
the gateway until it binds its port or a timeout is reached. After calling
start_service openclaw-gateway (and using openshell sandbox exec "$sandbox" --
... to run inside the sandbox), repeatedly test the gateway endpoint/port (e.g.,
with curl against a health URL or nc/tcp probe) and break when successful,
logging progress and failing cleanly if the timeout elapses; encapsulate this in
a small helper (e.g., wait_for_openclaw or wait_for_port) and call it instead of
sleep 3 so is_running/start_service logic reliably waits for the gateway to be
ready.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/service-env.test.js`:
- Around line 126-136: The helper function callValidateName is dead code; either
delete the entire callValidateName function or refactor the subsequent tests
that inline the validate_name invocation to call callValidateName instead. If
removing, simply delete the callValidateName definition (which uses spawnSync,
START_SERVICES_SH and sets env SANDBOX_NAME). If refactoring, replace the inline
spawnSync calls in the tests with calls to callValidateName(name) and reuse its
return value, ensuring the same encoding and env behavior is preserved.

---

Nitpick comments:
In `@scripts/start-services.sh`:
- Around line 174-180: The fixed "sleep 3" after starting openclaw-gateway is
racy; replace it with a readiness loop that polls the gateway until it binds its
port or a timeout is reached. After calling start_service openclaw-gateway (and
using openshell sandbox exec "$sandbox" -- ... to run inside the sandbox),
repeatedly test the gateway endpoint/port (e.g., with curl against a health URL
or nc/tcp probe) and break when successful, logging progress and failing cleanly
if the timeout elapses; encapsulate this in a small helper (e.g.,
wait_for_openclaw or wait_for_port) and call it instead of sleep 3 so
is_running/start_service logic reliably waits for the gateway to be ready.

In `@test/service-env.test.js`:
- Around line 138-189: The tests currently inline a duplicate validate_name
implementation instead of exercising the real function from start-services.sh;
update the three test cases in test/service-env.test.js to source the script (or
use the existing callValidateName helper) and invoke the actual validate_name
from start-services.sh (or callValidateName) so the tests exercise the real
implementation and catch regressions—replace the embedded function blocks with
sourcing/start-services.sh invocation and then call validate_name (or
callValidateName) with the same inputs ("my-sandbox", "foo;rm -rf /", and empty
string) and assert on the real output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 745963ad-56ba-4982-8f48-03379dad0a1a

📥 Commits

Reviewing files that changed from the base of the PR and between 9513eca and 516223a.

📒 Files selected for processing (4)
  • bin/nemoclaw.js
  • scripts/start-services.sh
  • test/cli.test.js
  • test/service-env.test.js

@wscurran wscurran added NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. labels Mar 19, 2026
futhgar added 2 commits March 28, 2026 17:12
`nemoclaw start` previously only started the Telegram bridge and
cloudflared tunnel. After a system reboot, the OpenClaw gateway inside
the sandbox and the dashboard port forwarding were lost, requiring
manual intervention.

Add `openclaw-gateway` and `gateway-forward` as managed services in
start-services.sh so `nemoclaw start` / `nemoclaw stop` / `nemoclaw
status` manage the full service lifecycle.

- resolve_sandbox() auto-detects the active sandbox when not explicit
- validate_name() rejects identifiers with shell metacharacters
- Gateway started via `openshell sandbox exec` (foreground, PID tracked)
- Port forwarding started via `openshell forward start` (PID tracked)
- Status and banner updated to show all four services

Fixes NVIDIA#311

Signed-off-by: Josue Gomez <josue@guatulab.com>
Address CodeRabbit review: the callValidateName helper sourced the full
start-services.sh which has side effects that break in the test
environment. Replace with a focused testValidateName function that
defines just the validate_name case pattern directly.

Signed-off-by: Josue Gomez <josue@guatulab.com>
Signed-off-by: futhgar <jmaldonado.rosa@gmail.com>
@futhgar futhgar force-pushed the fix/gateway-auto-restart branch from ed0a561 to c087e2b Compare March 28, 2026 21:12
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/service-env.test.js (1)

367-386: Consider adding test coverage for the fallback branch.

This test only verifies the explicit SANDBOX_NAME != "default" path. The real resolve_sandbox function has a second branch that calls openshell sandbox list when SANDBOX_NAME is "default". Consider adding:

  1. A test case verifying the "default" branch is entered when SANDBOX_NAME="default".
  2. A test verifying the function returns empty/nothing when openshell is unavailable (the || true fallback).
📝 Example additional test case
it("returns empty when SANDBOX_NAME is default and openshell unavailable", () => {
  const result = spawnSync(
    "bash",
    ["-c", `
      SANDBOX_NAME="default"
      resolve_sandbox() {
        if [ "$SANDBOX_NAME" != "default" ]; then
          printf '%s\\n' "$SANDBOX_NAME"
          return
        fi
        if command -v openshell > /dev/null 2>&1; then
          openshell sandbox list 2>/dev/null | awk '/Ready/{print $1; exit}' || true
        fi
      }
      resolve_sandbox
    `],
    { encoding: "utf-8" }
  );
  expect(result.status).toBe(0);
  expect(result.stdout.trim()).toBe("");
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/service-env.test.js` around lines 367 - 386, Add tests covering the
"default" branch of resolve_sandbox: create one test where
SANDBOX_NAME="default" and stub an openshell that emits a Ready line (so
resolve_sandbox parses and returns that sandbox id) and another test where
SANDBOX_NAME="default" and openshell is unavailable (ensure command -v openshell
fails) to assert resolve_sandbox returns empty; implement both using spawnSync
with a small inline bash function resolve_sandbox (same name used in existing
test) and either manipulate PATH to point to a temp directory containing a fake
openshell script that prints a Ready line or leave openshell missing to exercise
the || true fallback, then assert result.status is 0 and stdout.trim() matches
expected values (sandbox id or empty).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/service-env.test.js`:
- Around line 367-386: Add tests covering the "default" branch of
resolve_sandbox: create one test where SANDBOX_NAME="default" and stub an
openshell that emits a Ready line (so resolve_sandbox parses and returns that
sandbox id) and another test where SANDBOX_NAME="default" and openshell is
unavailable (ensure command -v openshell fails) to assert resolve_sandbox
returns empty; implement both using spawnSync with a small inline bash function
resolve_sandbox (same name used in existing test) and either manipulate PATH to
point to a temp directory containing a fake openshell script that prints a Ready
line or leave openshell missing to exercise the || true fallback, then assert
result.status is 0 and stdout.trim() matches expected values (sandbox id or
empty).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 293f8a4a-2a0b-4558-a063-a5453d8c61d5

📥 Commits

Reviewing files that changed from the base of the PR and between ed0a561 and c087e2b.

📒 Files selected for processing (4)
  • bin/nemoclaw.js
  • scripts/start-services.sh
  • test/cli.test.js
  • test/service-env.test.js
✅ Files skipped from review due to trivial changes (2)
  • test/cli.test.js
  • bin/nemoclaw.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/start-services.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenClaw gateway inside sandbox requires manual start after every reboot

2 participants