38 changes: 38 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,38 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build-and-push:
    runs-on: [self-hosted, Linux, X64]
Comment on lines +3 to +11

⚠️ Potential issue | 🔴 Critical

Don't execute fork PRs on the self-hosted runner.

pull_request plus runs-on: [self-hosted, ...] lets untrusted PR code run arbitrary Docker build steps on your private runner. Keep self-hosted execution for trusted pushes/same-repo branches only, or move PR builds to GitHub-hosted runners.

🔒 Minimal guard
 jobs:
   build-and-push:
+    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
     runs-on: [self-hosted, Linux, X64]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 3 - 11, The workflow currently allows
pull_request events to run the "build-and-push" job on a self-hosted runner via
runs-on: [self-hosted, Linux, X64]; change this so untrusted fork PRs never run
on your private runner by splitting or conditioning the job: keep the
self-hosted runner for trusted push events (push to main) and add a separate job
(or conditional job) for pull_request that uses a GitHub-hosted runner like
ubuntu-latest; update the "on:" triggers and adjust the "build-and-push" job (or
create a new "build-and-push-pr" job) so that pull_request runs on github-hosted
runners while push continues to use runs-on: [self-hosted, Linux, X64].

    permissions:
      contents: read
      packages: write

    steps:
      - uses: actions/checkout@v4

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          push: ${{ github.ref == 'refs/heads/main' }}
          tags: |
            ghcr.io/${{ github.repository }}:latest
            ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
101 changes: 101 additions & 0 deletions .github/workflows/upstream-sync.yml
@@ -0,0 +1,101 @@
name: Upstream Sync

on:
  schedule:
    # Daily at 06:30 UTC (offset from paperclip's 06:00)
    - cron: "30 6 * * *"
  workflow_dispatch:
    inputs:
      mode:
        description: "Sync mode"
        required: true
        default: "pr"
        type: choice
        options:
          - pr
          - push
          - dry-run

jobs:
  sync:
    runs-on: self-hosted
    timeout-minutes: 20

Comment on lines +1 to +23

⚠️ Potential issue | 🟠 Major

Add workflow concurrency before this can push main.

This workflow can be started by schedule, manual dispatch, and reruns, but there is no concurrency guard. Two overlapping runs can rebase from the same starting point and both force-push main or reuse the same sync/upstream-YYYY-MM-DD branch.

⛔ Serialize sync runs
 name: Upstream Sync
+
+concurrency:
+  group: upstream-sync-main
+  cancel-in-progress: true
 
 on:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/upstream-sync.yml around lines 1 - 23, The workflow
"Upstream Sync" lacks a concurrency guard so overlapping runs of the sync job
can race and force-push main; add a top-level concurrency section to the
workflow (above jobs) that defines a stable group name (e.g., "upstream-sync-${{
github.workflow }}" or include event/input like mode) and set
cancel-in-progress: true to ensure any running sync is cancelled when a new run
starts; this change affects the workflow header and protects the sync job named
"sync" from concurrent executions that could rebase/push the same branch.

    steps:
      - name: Checkout (full history for rebase)
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          token: ${{ secrets.GH_PAT }}

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"

      - name: Configure git
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          # Nuke stale credential helpers on self-hosted runner
          git config --local --unset-all credential.helper 2>/dev/null || true
          git config --global --unset-all credential.helper 2>/dev/null || true

      - name: Clean working tree (self-hosted runner may have leftovers)
        run: |
          git checkout main 2>/dev/null || true
          git rebase --abort 2>/dev/null || true
          git reset --hard HEAD
          git clean -fd

      - name: Add upstream remote
        run: |
          git remote get-url upstream 2>/dev/null || \
            git remote add upstream https://github.com/NVIDIA/NemoClaw.git
          git fetch upstream

      - name: Install Agent SDK
        run: |
          rm -rf "$(npm root -g)/@anthropic-ai/.claude-code-"* "$(npm root -g)/@anthropic-ai/.claude-agent-sdk-"* 2>/dev/null || true
          npm install -g @anthropic-ai/claude-code || npm install -g @anthropic-ai/claude-code
          npm install -g @anthropic-ai/claude-agent-sdk || npm install -g @anthropic-ai/claude-agent-sdk
          node -e "require('@anthropic-ai/claude-agent-sdk')" 2>/dev/null || echo "SDK import check: will resolve at runtime via global root"

      - name: Write Claude credentials
        run: |
          mkdir -p ~/.claude
          echo '${{ secrets.CLAUDE_CREDENTIALS_JSON }}' > ~/.claude/.credentials.json

Comment on lines +64 to +68

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether repository scripts reference the credentials file path.
rg -n --hidden -S '\.claude/\.credentials\.json|CLAUDE_CREDENTIALS_JSON|CLAUDE_CODE_OAUTH_TOKEN|ANTHROPIC_API_KEY'

Repository: NVIDIA/NemoClaw

Length of output: 1535


🏁 Script executed:

cat scripts/upstream-sync.mjs | head -100

Repository: NVIDIA/NemoClaw

Length of output: 3273


🏁 Script executed:

grep -n "credentials\|\.claude" scripts/upstream-sync.mjs

Repository: NVIDIA/NemoClaw

Length of output: 41


🏁 Script executed:

wc -l scripts/upstream-sync.mjs

Repository: NVIDIA/NemoClaw

Length of output: 88


🏁 Script executed:

cat -n scripts/upstream-sync.mjs | tail -300

Repository: NVIDIA/NemoClaw

Length of output: 11904


Remove the unused Claude credentials file write step.

The upstream-sync.mjs script authenticates exclusively via the CLAUDE_CODE_OAUTH_TOKEN environment variable passed directly to the Agent SDK, not via the credentials file. Writing CLAUDE_CREDENTIALS_JSON to disk increases secret-at-rest risk without providing any benefit.

-      - name: Write Claude credentials
-        run: |
-          mkdir -p ~/.claude
-          echo '${{ secrets.CLAUDE_CREDENTIALS_JSON }}' > ~/.claude/.credentials.json
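
If a later change does turn out to need the credentials file, the write itself can be made safer than the echo above. This is a minimal sketch, not something the PR contains: it assumes the secret reaches the step through an env: entry (here the variable CLAUDE_CREDENTIALS_JSON, filled with a placeholder value) and uses a temporary directory in place of the real home directory. GitHub substitutes ${{ ... }} expressions into the script text before the shell runs, so a single quote inside the JSON would end the quoted string in the original echo; reading the value from the environment avoids that.

```shell
#!/bin/sh
# Illustrative only: a temp dir stands in for $HOME, and the variable below
# stands in for a secret that the workflow would pass via `env:`.
# The placeholder deliberately contains a single quote.
CLAUDE_CREDENTIALS_JSON='{"note":"it'"'"'s a placeholder"}'
fake_home=$(mktemp -d)

# Tighten permissions before the file exists, then write from the
# environment instead of pasting the secret into the script text.
(
  umask 077
  mkdir -p "$fake_home/.claude"
  printf '%s' "$CLAUDE_CREDENTIALS_JSON" > "$fake_home/.claude/.credentials.json"
)

# Read back what was written and capture the permission bits.
written=$(cat "$fake_home/.claude/.credentials.json")
mode=$(ls -l "$fake_home/.claude/.credentials.json" | cut -c1-10)
echo "mode=$mode"
rm -rf "$fake_home"
```

The subshell keeps the tightened umask from leaking into later commands in the same step.
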
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/upstream-sync.yml around lines 64 - 68, Remove the "Write
Claude credentials" step that creates ~/.claude/.credentials.json and stop
echoing the secret; the upstream-sync flow uses CLAUDE_CODE_OAUTH_TOKEN passed
to the Agent SDK, so eliminate the workflow step referencing
secrets.CLAUDE_CREDENTIALS_JSON (the block with name "Write Claude credentials")
to avoid writing secrets to disk and rely solely on the CLAUDE_CODE_OAUTH_TOKEN
env var used by upstream-sync.mjs.

      - name: Check for upstream changes
        id: check
        run: |
          BEHIND=$(git rev-list HEAD..upstream/main --count)
          echo "behind=$BEHIND" >> "$GITHUB_OUTPUT"
          if [ "$BEHIND" -eq 0 ]; then
            echo "Up to date with upstream. Skipping sync."
          else
            echo "Behind upstream by $BEHIND commits."
          fi

      - name: Run upstream sync
        if: steps.check.outputs.behind != '0'
        env:
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          GH_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          MODE="${{ github.event.inputs.mode || 'pr' }}"
          case "$MODE" in
            push) FLAG="--push" ;;
            pr) FLAG="--pr" ;;
            dry-run) FLAG="--dry-run" ;;
          esac
          node scripts/upstream-sync.mjs $FLAG

Comment on lines +85 to +93

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Quick check for unquoted variable expansions in workflow run blocks.
rg -n --hidden -S '\$[A-Za-z_][A-Za-z0-9_]*' .github/workflows/upstream-sync.yml

Repository: NVIDIA/NemoClaw

Length of output: 292


Quote $FLAG when invoking the sync script.

Line 92 should quote the variable to satisfy SC2086 and prevent word-splitting/globbing issues.

Suggested fix
-          node scripts/upstream-sync.mjs $FLAG
+          node scripts/upstream-sync.mjs "$FLAG"
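
To make the SC2086 point concrete, here is a small sketch in plain shell. The count_args helper and the sample value are hypothetical and exist only for illustration; in the workflow itself FLAG is only ever set to one of three fixed strings, so this shows the general hazard rather than a live bug.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

FLAG="--pr --extra"          # illustrative value containing a space

# Unquoted: the shell splits the value on whitespace into two arguments.
unquoted=$(count_args $FLAG)

# Quoted: the value is passed through as exactly one argument.
quoted=$(count_args "$FLAG")

# Caveat: a quoted but unset variable still yields one (empty) argument,
# while the unquoted form disappears entirely.
unset FLAG
empty_quoted=$(count_args "$FLAG")
empty_unquoted=$(count_args $FLAG)

echo "unquoted=$unquoted quoted=$quoted"
echo "empty_quoted=$empty_quoted empty_unquoted=$empty_unquoted"
```

The last two lines matter once the variable is quoted: if MODE ever matched none of the case branches, the script would receive an empty string as its first argument. The mode input is a fixed choice list here, so a default branch in the case statement would be a defensive addition rather than a required fix.
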
🧰 Tools
🪛 actionlint (1.7.11)

[error] 85-85: shellcheck reported issue in this script: SC2086:info:7:32: Double quote to prevent globbing and word splitting

(shellcheck)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/upstream-sync.yml around lines 85 - 93, The node
invocation is expanding $FLAG unquoted, which can trigger shell
word-splitting/globbing (SC2086); update the call that runs
scripts/upstream-sync.mjs to quote the variable (use "$FLAG") so the flag is
passed as a single argument to node and avoid unintended splitting or glob
expansion when invoking scripts/upstream-sync.mjs.

      - name: Upload agent event log
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: agent-events-log
          path: agent-events.log
          if-no-files-found: ignore
          retention-days: 14
22 changes: 21 additions & 1 deletion Dockerfile
@@ -69,9 +69,18 @@ ENV NEMOCLAW_MODEL=${NEMOCLAW_MODEL} \
    NEMOCLAW_INFERENCE_API=${NEMOCLAW_INFERENCE_API} \
    NEMOCLAW_INFERENCE_COMPAT_B64=${NEMOCLAW_INFERENCE_COMPAT_B64}

# WOPR sidecar — provision + health endpoints for nemoclaw-platform
COPY wopr/ /opt/wopr/

WORKDIR /sandbox
USER sandbox

# Pre-create OpenClaw directories and write default config.
# These are saved to /opt/nemoclaw-defaults/ (read-only at runtime).
# The startup script copies them to $HOME/.openclaw/ (writable volume).
RUN mkdir -p /sandbox/.openclaw/agents/main/agent \
    && chmod 700 /sandbox/.openclaw

# Write the COMPLETE openclaw.json including gateway config and auth token.
# This file is immutable at runtime (Landlock read-only on /sandbox/.openclaw).
# No runtime writes to openclaw.json are needed or possible.
@@ -118,7 +127,6 @@ path = os.path.expanduser('~/.openclaw/openclaw.json'); \
json.dump(config, open(path, 'w'), indent=2); \
os.chmod(path, 0o600)"

# Install NemoClaw plugin into OpenClaw
RUN openclaw doctor --fix > /dev/null 2>&1 || true \
    && openclaw plugins install /opt/nemoclaw > /dev/null 2>&1 || true

@@ -146,6 +154,18 @@ RUN sha256sum /sandbox/.openclaw/openclaw.json > /sandbox/.openclaw/.config-hash
    && chmod 444 /sandbox/.openclaw/.config-hash \
    && chown root:root /sandbox/.openclaw/.config-hash

# Save build-time config as defaults — startup script copies to writable HOME
RUN cp -a /sandbox/.openclaw /opt/nemoclaw-defaults \
    && cp -a /sandbox/.nemoclaw /opt/nemoclaw-defaults/.nemoclaw
Comment on lines +157 to +159

⚠️ Potential issue | 🔴 Critical

The default snapshot still points back into /sandbox/.openclaw-data.

Dockerfile.base makes /sandbox/.openclaw/* symlinks to absolute /sandbox/.openclaw-data/* targets. cp -a preserves those links here, so when scripts/nemoclaw-start.sh later copies this tree into ${HOME}/.openclaw, writes under HOME=/data still go back to /sandbox and fail on a read-only rootfs.

🐛 One minimal fix
-RUN cp -a /sandbox/.openclaw /opt/nemoclaw-defaults \
+RUN cp -aL /sandbox/.openclaw /opt/nemoclaw-defaults \
     && cp -a /sandbox/.nemoclaw /opt/nemoclaw-defaults/.nemoclaw
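
The difference between the two copy modes can be reproduced outside the image. The sketch below is illustrative only: throwaway directories under a temp dir stand in for /sandbox/.openclaw-data, /sandbox/.openclaw and /opt/nemoclaw-defaults, and the file name state.json is made up. It assumes a cp that accepts -a and -L together, as GNU coreutils does.

```shell
#!/bin/sh
# Illustrative only: temp directories stand in for the real image paths.
set -e
work=$(mktemp -d)
mkdir -p "$work/data/agents" "$work/openclaw"
echo '{}' > "$work/data/agents/state.json"

# Absolute symlink, like the ones the review says Dockerfile.base creates.
ln -s "$work/data/agents" "$work/openclaw/agents"

# cp -a keeps the symlink; the copy still points at the original target.
cp -a "$work/openclaw" "$work/defaults-a"

# cp -aL dereferences it; the copy holds a real directory with real files.
cp -aL "$work/openclaw" "$work/defaults-aL"

if [ -L "$work/defaults-a/agents" ]; then kept_link=yes; else kept_link=no; fi
if [ -L "$work/defaults-aL/agents" ]; then deref_link=yes; else deref_link=no; fi
if [ -f "$work/defaults-aL/agents/state.json" ]; then deref_file=yes; else deref_file=no; fi

echo "cp -a  copy is a symlink: $kept_link"
echo "cp -aL copy is a symlink: $deref_link (file copied: $deref_file)"
rm -rf "$work"
```

With plain -a, anything that later writes through the copied link lands in the original target directory, which is the failure mode described for a read-only rootfs.
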
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 157 - 159, The default snapshot copies
/sandbox/.openclaw preserving symlinks that point back to
/sandbox/.openclaw-data; change the copy to dereference symlinks so targets are
copied instead of links. Update the RUN that currently uses "cp -a
/sandbox/.openclaw /opt/nemoclaw-defaults && cp -a /sandbox/.nemoclaw
/opt/nemoclaw-defaults/.nemoclaw" to a form that dereferences links (e.g., use
"cp -aL" or an rsync equivalent) so the files under /opt/nemoclaw-defaults no
longer reference /sandbox and scripts/nemoclaw-start.sh will copy a
self-contained tree into ${HOME} without pointing back to /sandbox.

USER sandbox

# At runtime, HOME=/data (writable volume mount from FleetManager).
# ReadonlyRootfs makes /sandbox read-only, so all writes go to /data.
ENV HOME=/data

# Expose WOPR sidecar port
EXPOSE 3100

# Entrypoint runs as root to start the gateway as the gateway user,
# then drops to sandbox for agent commands. See nemoclaw-start.sh.
ENTRYPOINT ["/usr/local/bin/nemoclaw-start"]
21 changes: 21 additions & 0 deletions scripts/nemoclaw-start.sh
@@ -207,6 +207,20 @@ PYAUTOPAIR

# ── Main ─────────────────────────────────────────────────────────

# Copy default config to writable HOME if not already present.
# FleetManager mounts a volume at /data and sets HOME=/data.
# The rootfs is read-only, so all writes must go to /data.
if [ -d /opt/nemoclaw-defaults ] && [ ! -f "${HOME}/.openclaw/openclaw.json" ]; then
  echo "[init] Copying default config to ${HOME}/.openclaw/"
  mkdir -p "${HOME}/.openclaw" "${HOME}/.nemoclaw"
  cp -a /opt/nemoclaw-defaults/. "${HOME}/.openclaw/"
  [ -d /opt/nemoclaw-defaults/.nemoclaw ] && cp -a /opt/nemoclaw-defaults/.nemoclaw/. "${HOME}/.nemoclaw/"
fi

# Ensure writable dirs exist
mkdir -p "${HOME}/.openclaw/agents/main/agent" 2>/dev/null || true
touch /tmp/gateway.log 2>/dev/null || true

echo 'Setting up NemoClaw...'
[ -f .env ] && chmod 600 .env

@@ -293,6 +307,13 @@ echo "[gateway] openclaw gateway launched as 'gateway' user (pid $GATEWAY_PID)"
start_auto_pair
print_dashboard_urls

# Start WOPR sidecar in foreground — keeps the container alive.
# Serves /internal/health + /internal/provision for nemoclaw-platform.
if [ -f /opt/wopr/sidecar.js ]; then
  echo "[wopr-sidecar] starting in foreground (port ${PORT:-3100})"
  exec node /opt/wopr/sidecar.js
fi
Comment on lines +310 to +315

⚠️ Potential issue | 🟠 Major

Keep PID 1 supervising both the gateway and the sidecar.

After the background launch at Lines 303-305, exec node /opt/wopr/sidecar.js replaces the shell, so the wait "$GATEWAY_PID" at Line 319 is never reached. If the gateway dies later, the container stays up behind the sidecar process and can keep serving stale health responses. Use wait -n (a bash builtin) with traps, or a small supervisor, so that either process exiting brings the container down.

🔁 One way to keep both processes under supervision
 if [ -f /opt/wopr/sidecar.js ]; then
   echo "[wopr-sidecar] starting in foreground (port ${PORT:-3100})"
-  exec node /opt/wopr/sidecar.js
+  node /opt/wopr/sidecar.js &
+  SIDECAR_PID=$!
+  trap 'kill "$GATEWAY_PID" "$SIDECAR_PID" 2>/dev/null || true' EXIT INT TERM
+  wait -n "$GATEWAY_PID" "$SIDECAR_PID"
+  exit $?
 fi
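
For entrypoints that cannot rely on bash's wait -n, the "small supervisor" option can be approximated with a polling loop. This is a swapped-in technique, not the suggestion above, and it is only a sketch: two sleep processes stand in for the gateway and the sidecar, the one-second poll interval is arbitrary, and the died and survivor variables exist purely for illustration.

```shell
#!/bin/sh
# Illustrative stand-ins: two `sleep` processes play the gateway and the
# sidecar. The short-lived one simulates the gateway dying first.
sleep 1 &
GATEWAY_PID=$!
sleep 30 &
SIDECAR_PID=$!

# Forward termination signals to both children, then exit.
trap 'kill "$GATEWAY_PID" 2>/dev/null; kill "$SIDECAR_PID" 2>/dev/null; exit 143' INT TERM

# Poll until either child is gone; kill -0 only tests for existence.
while kill -0 "$GATEWAY_PID" 2>/dev/null && kill -0 "$SIDECAR_PID" 2>/dev/null; do
  sleep 1
done

# Record which process disappeared.
if kill -0 "$GATEWAY_PID" 2>/dev/null; then died=sidecar; else died=gateway; fi

# Stop whichever process is still running and reap it.
kill "$GATEWAY_PID" 2>/dev/null
kill "$SIDECAR_PID" 2>/dev/null
wait "$GATEWAY_PID" 2>/dev/null
wait "$SIDECAR_PID" 2>/dev/null

if kill -0 "$SIDECAR_PID" 2>/dev/null; then survivor=running; else survivor=stopped; fi
echo "first to exit: $died; survivor: $survivor"
# A real entrypoint would exit non-zero here so the container goes down.
```

Polling reacts up to one interval late and does not recover the exit status of the process that died first, which is the trade-off against wait -n.
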
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 310 - 315, The current script does an
exec on the sidecar (exec node /opt/wopr/sidecar.js), which makes the final wait
"$GATEWAY_PID" unreachable and prevents PID 1 from supervising both processes; instead start
the sidecar in background (&) and capture its PID (e.g. SIDECAR_PID=$!), ensure
the gateway is also started in background earlier and its PID captured
(GATEWAY_PID), add traps to forward SIGTERM/SIGINT to both PIDs (kill -TERM
"$GATEWAY_PID" "$SIDECAR_PID"), then use wait -n "$GATEWAY_PID" "$SIDECAR_PID"
so PID 1 exits when either process dies (optionally exit with that process's
status); replace the exec node /opt/wopr/sidecar.js line with this supervised
background + wait/trap pattern.


# Keep container running by waiting on the gateway process.
# This script is PID 1 (ENTRYPOINT); if it exits, Docker kills all children.
wait "$GATEWAY_PID"