refactor(cli): shell consolidation, TypeScript migration & oclif #924

@cv

Summary

Migrate the NemoClaw CLI from ~4,700 lines of shell + ~4,200 lines of untyped CJS to strict ESM TypeScript on oclif, delivered across 6 sequential PRs. This subsumes #909 (oclif migration) and addresses the same 24 issues and 17 superseded PRs listed there.

No user-facing behavioral changes. The CLI commands, flags, and output stay the same — the internals become typed, tested, and structurally sound.

Motivation

The current CLI has three categories of problem that compound each other:

Shell duplication. Three copies of the install logic (install.sh, scripts/install.sh, inline in onboard.js). A dead setup.sh that leaks NVIDIA_API_KEY on the command line. An uninstall path that exists entirely outside the JS test suite. ~4,700 lines of shell that mostly duplicate logic already in bin/lib/*.js.

No type safety. The 12 modules in bin/lib/ are untyped CJS. The CLI and plugin share domain concepts (sandbox entries, onboard config, credential shapes, endpoint types) but have no shared type definitions — the same shapes are implicit in JS and explicit in TS, with no compile-time guarantee they agree. Silent bugs like registerSandbox({ naem: "foo" }) store name: undefined with no complaint.
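A minimal sketch of how a shared type would turn that typo into a compile error. The `SandboxEntry` fields and the in-memory `registry` map here are assumptions for illustration; the real definitions in shared/types.ts and registry.ts may look different.

```typescript
// Hypothetical shape: the real SandboxEntry may carry more (or different) fields.
interface SandboxEntry {
  name: string;
  createdAt?: string;
}

// Stand-in for the on-disk registry, kept in memory for this sketch.
const registry = new Map<string, SandboxEntry>();

// Typed version of the registerSandbox() call mentioned above.
function registerSandbox(entry: SandboxEntry): void {
  registry.set(entry.name, entry);
}

registerSandbox({ name: "foo" }); // accepted: matches SandboxEntry

// Never called; it exists only so the compiler checks the misspelled call.
function typoExample(): void {
  // @ts-expect-error: the literal has an unknown property and lacks `name`
  registerSandbox({ naem: "foo" });
}
```

Without the `@ts-expect-error` marker, `tsc` rejects the misspelled call, so the bug never reaches the registry file.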

No CLI framework. The 515-line switch/case dispatcher in bin/nemoclaw.js has no shell completions, no per-command --help, no flag validation, no --json output, no structured error handling. Fixing these individually is less efficient than fixing the architecture.

Architecture

oclif commands live at the repo root. The OpenClaw plugin (nemoclaw/src/) is a separate concern (Commander-based, different runtime) and is unchanged.

src/
├── commands/              oclif command classes (one file per command)
│   ├── onboard.ts
│   ├── deploy.ts
│   ├── list.ts
│   ├── status.ts
│   ├── start.ts / stop.ts
│   ├── debug.ts / uninstall.ts
│   └── sandbox/
│       ├── connect.ts
│       ├── status.ts
│       ├── logs.ts
│       ├── destroy.ts
│       └── policy/
│           ├── add.ts
│           └── list.ts
├── lib/                   CLI library modules (converted from bin/lib/)
│   ├── runner.ts          Subprocess execution via execa ($$ helper)
│   ├── registry.ts        Multi-sandbox state (~/.nemoclaw/sandboxes.json)
│   ├── credentials.ts     API key and token management
│   ├── platform.ts        OS/arch/Docker socket detection
│   ├── uninstall.ts       Uninstall steps (Docker, npm, state)
│   ├── services.ts        PID management for helper services
│   ├── debug.ts           Diagnostic collection with secret redaction
│   ├── onboard.ts         Onboard wizard logic
│   └── ...
└── hooks/
    └── command-not-found.ts   Backward compat: nemoclaw <sandbox> <action>
shared/
└── types.ts               Domain types shared between CLI and plugin
nemoclaw/
└── src/                   OpenClaw plugin (unchanged)

Two TypeScript projects (tsconfig.cli.json at root + nemoclaw/tsconfig.json) with shared/ bridging them.

Plan

6 sequential PRs. Each builds on the last.

flowchart TD
    PR0["**PR 0** — Foundation ✅ #913\ntsconfig.cli.json · root execa · TS coverage ratchet"]
    PR1["**PR 1** — Security fix + first TS batch\nKill setup.sh · shared/types.ts\nrunner · registry · credentials · platform → src/lib/"]
    PR2["**PR 2** — Port shell scripts to TS\nuninstall.sh · start-services.sh · debug.sh → src/lib/\n+ nim · policies · inference-config · local-inference"]
    PR3["**PR 3** — oclif scaffold + simple commands\nScaffold oclif · list · status · start · stop · debug · uninstall\nMerge installers · port fix-coredns · convert onboard + preflight"]
    PR4["**PR 4** — oclif complex commands\nSandbox topic commands · deploy · onboard wizard\nCentralized error handling · secret redaction"]
    PR5["**PR 5** — E2E migration + cleanup\nAll e2e → vitest TS · shell completions\nDelete old dispatcher · strict TS everywhere"]

    PR0 --> PR1
    PR1 --> PR2
    PR2 --> PR3
    PR3 --> PR4
    PR4 --> PR5

    style PR0 fill:#2d6a2d,color:#fff

PR 0: Foundation ✅ #913

Done. tsconfig.cli.json (strict, noEmit), execa ^9.6.1 in root devDependencies, scripts/check-coverage-ratchet.ts replacing the bash+python version, tsc-check-cli pre-push hook, CI updated.

PR 1: Kill setup.sh + shared types + first CJS→ESM TS batch

Security fix. scripts/setup.sh passes NVIDIA_API_KEY on the openshell sandbox create command line. onboard.js handles this through the gateway's stored credential. The shell path must die.

  • shared/types.ts — shared domain types with at least one consumer from day one:
    SandboxEntry, SandboxRegistry, EndpointType, NemoClawOnboardConfig, PlatformInfo, CredentialKey
  • Delete scripts/setup.sh (242 lines)
  • Convert 4 core modules (CJS→ESM TS in one step, straight to src/lib/):
    • bin/lib/runner.js → src/lib/runner.ts (rewrite on execa: $$ project helper, delete shellQuote())
    • bin/lib/registry.js → src/lib/registry.ts
    • bin/lib/credentials.js → src/lib/credentials.ts
    • bin/lib/platform.js → src/lib/platform.ts

~240 lines of shell deleted, 4 modules converted. Risk: Low.

PR 2: Port uninstall + start-services + debug to TS

All three are shell scripts that nemoclaw.js shells out to via spawnSync('bash', ...). Each becomes a TS module using execa, called directly.

  • uninstall.sh (555 lines) → src/lib/uninstall.ts
  • scripts/start-services.sh (204 lines) → src/lib/services.ts
  • scripts/debug.sh (328 lines) → src/lib/debug.ts
  • Convert remaining lib modules: nim, policies, inference-config, local-inference

~1,090 lines of shell → ~620 lines of TS + ~400 lines of tests. Risk: Medium.
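As a rough illustration of the kind of logic that becomes unit-testable once debug.sh is a TS module, here is a sketch of secret redaction over `KEY=value` text. The function name `redactSecrets` and the key pattern are hypothetical, not the project's actual rules.

```typescript
// Hypothetical pattern: which key names count as secret is an assumption.
const SECRET_KEY_PATTERN = /(API_KEY|TOKEN|SECRET|PASSWORD)/i;

/** Replace the value of every KEY=value pair whose key looks like a secret. */
function redactSecrets(text: string): string {
  return text.replace(
    /\b([A-Za-z_][A-Za-z0-9_]*)=(\S+)/g,
    (match: string, key: string) =>
      SECRET_KEY_PATTERN.test(key) ? `${key}=[REDACTED]` : match,
  );
}
```

A pure function like this can be covered by vitest without spawning a shell, which is part of the case for moving the scripts into src/lib/.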

PR 3: oclif scaffold + simple commands

The biggest structural change. Scaffold oclif at repo root and migrate the simple commands.

Old dispatcher (bin/nemoclaw.js) stays as fallback during this PR. Risk: Medium-high.

PR 4: oclif sandbox commands + onboard wizard + deploy

Risk: High for onboard wizard. Test interactive flow end-to-end.

PR 5: E2E test migration + completions + cleanup

Risk: Medium. Run old and new e2e in parallel for one PR cycle.

End state

pie title Lines of code by language (after)
    "TypeScript (CLI commands)" : 1500
    "TypeScript (CLI lib)" : 3000
    "TypeScript (plugin)" : 4800
    "TypeScript (shared)" : 80
    "TypeScript (e2e tests)" : 700
    "Shell (stays)" : 1300
    "Python (1 script)" : 100
| Before | After |
| --- | --- |
| ~9,600 lines of shell | ~1,300 (curl-pipe installers, cloud provisioning, CI utilities) |
| ~4,200 lines of untyped CJS | 0 |
| ~4,100 lines of shell e2e tests | vitest TS with waitFor helper |
| No CLI framework | oclif (auto-help, completions, --json, flag validation) |
| No shared types | shared/types.ts: one definition, two consumers |
| bash -c + shellQuote() | execa array args (no shell, no injection) |
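The waitFor helper is not specified in this issue. One plausible shape, sketched under the assumption that it polls a probe until it yields a value or a deadline passes, is shown below. The option names and defaults are hypothetical.

```typescript
// Hypothetical options; the real helper may expose different knobs.
interface WaitForOptions {
  timeoutMs?: number;
  intervalMs?: number;
}

/** Poll `probe` until it returns something other than undefined, or time out. */
async function waitFor<T>(
  probe: () => T | undefined | Promise<T | undefined>,
  { timeoutMs = 5_000, intervalMs = 50 }: WaitForOptions = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition not met within ${timeoutMs} ms`);
    }
    // Sleep before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In an e2e test this would replace shell-style `sleep` loops, for example waiting for a sandbox to appear in the list output before asserting on it.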

Shell that stays (and why)

| Script | Lines | Why |
| --- | --- | --- |
| install.sh | ~200 | Must be curl-pipeable; no Node.js available yet |
| uninstall.sh | ~30 | Stub for users who curl-pipe the uninstaller |
| brev-setup.sh | ~167 | Linux VM provisioning (apt-get, sudo, systemd) |
| nemoclaw-start.sh | ~220 | Sandbox entrypoint with inline Python heredocs |
| scripts/setup-spark.sh | ~142 | DGX Spark cgroup/Docker fix, runs under sudo -E |
| Other CI/dev utilities | ~540 | check-spdx-headers.sh, backup-workspace.sh, walkthrough.sh, etc. |

Subprocess execution: execa replaces child_process

All subprocess calls move from string-concatenated bash -c commands to execa array args. A project-wide $$ helper in src/lib/runner.ts provides defaults.

// Before: string concat + bash -c + shellQuote
run(`docker rm -f ${shellQuote(container)} 2>/dev/null || true`, { ignoreError: true });
const out = runCapture(`docker ps --format '{{.Names}}'`);

// After: execa — no shell, no quoting, typed
await $$({ reject: false })`docker rm -f ${container}`;
const { stdout } = await $$({ stdio: "pipe" })`docker ps --format ${"{{.Names}}"}`;
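To make the "no shell, no injection" property concrete without pulling in a dependency, the sketch below uses Node's built-in `execFile` rather than execa. It passes an argument containing shell metacharacters to a child process and reads it back unchanged. The `echoArg` helper is hypothetical and exists only for this demonstration.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Spawn a child Node process that writes its first user argument to stdout.
// Arguments travel as an argv array, so no shell ever parses them.
async function echoArg(arg: string): Promise<string> {
  const { stdout } = await execFileAsync(process.execPath, [
    "-e",
    "process.stdout.write(process.argv[1])",
    arg,
  ]);
  return stdout;
}
```

Calling `echoArg("foo; echo injected")` resolves to the literal string, including the semicolon. With the old `bash -c` string concatenation, the same input would need `shellQuote()` to avoid running the second command.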

Conflict management

Must merge before PR 1 (large changes to files we're rewriting):

Should merge if ready (smaller onboard.js changes):

Superseded by oclif migration (close before PR 3):

See #909 for the full list of 24 issues addressed and 17 PRs superseded.

UX compatibility

The nemoclaw <sandbox-name> <action> syntax is preserved via a command_not_found hook — no breaking changes to existing workflows or scripts. The new nemoclaw sandbox <action> <name> syntax is also available.
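The core of such a hook is an argv rewrite. The sketch below shows one way it could work as a pure function. The action list is taken from the sandbox/ commands in the tree above, and `rewriteLegacyArgv` is a hypothetical name, not the hook's actual implementation.

```typescript
// Assumed set of legacy actions, based on the sandbox/ command files listed earlier.
const SANDBOX_ACTIONS = new Set(["connect", "status", "logs", "destroy"]);

/**
 * Map legacy `<sandbox-name> <action> [...rest]` argv to
 * `sandbox <action> <sandbox-name> [...rest]`.
 * Returns null when the input does not look like the legacy form.
 */
function rewriteLegacyArgv(argv: readonly string[]): string[] | null {
  const [name, action, ...rest] = argv;
  if (name === undefined || action === undefined) return null;
  if (!SANDBOX_ACTIONS.has(action)) return null;
  return ["sandbox", action, name, ...rest];
}
```

For example, `["my-box", "logs", "--follow"]` becomes `["sandbox", "logs", "my-box", "--follow"]`, while an unknown action returns null so the normal "command not found" error can still be shown.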

/cc @cv @HagegeR @prekshivyas

Labels

NemoClaw CLI, refactor
