refactor(test): wire E2E security tests to real production code paths

## Context

PR #1092 added E2E tests for credential sanitization and Telegram injection, but the tests reimplement production logic inline rather than calling the real code. Carlos flagged this in review — the tests can pass even if production regresses.

## Problems

### test/e2e/test-credential-sanitization.sh

1. **`stripCredentials()` + `isCredentialField()` reimplemented 3x** (C1-C5, C12, C13) — copy-pastes `CREDENTIAL_FIELDS`, `CREDENTIAL_FIELD_PATTERN`, and both functions into `node -e` heredocs instead of importing from `migration-state.ts`.

2. **`walkAndRemoveFile()` reimplemented 2x** (C1-C5, C8) — production uses `copyDirectory()` with a `CREDENTIAL_SENSITIVE_BASENAMES` filter. Test implements its own recursive walk.

3. **`sanitizeConfigFile()` behavior has drifted** — production does `delete config.gateway` and then calls `stripCredentials()`. The test does NOT delete `gateway`; it only strips credential fields inside it. As a result, C4b (`gateway.mode` preserved) asserts behavior that production does not have.

4. **`verifyBlueprintDigest` / `verifyDigest` reimplemented** (C9-C11) — self-fulfilling tests that define their own verification logic, never calling production's `computeFileDigest()`.

5. **Python dependency for JSON parsing** (C3-C4) — uses `python3 -c "import json..."` instead of `node -e`.
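
The drift described in item 3 can be illustrated with a small sketch. Everything here is a stand-in written for this issue, not the code from `migration-state.ts`: the field names in `CREDENTIAL_FIELDS` are assumed, and `sanitizeLikeProduction` / `sanitizeLikeTestCopy` are hypothetical names for the two behaviors described above.

```javascript
// Hypothetical stand-ins. The real CREDENTIAL_FIELDS and stripCredentials()
// live in migration-state.ts and may differ in detail.
const CREDENTIAL_FIELDS = new Set(["apiKey", "token", "password"]); // assumed names

// Recursively drop any key that looks like a credential field.
function stripCredentials(value) {
  if (Array.isArray(value)) return value.map(stripCredentials);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    if (CREDENTIAL_FIELDS.has(key)) continue;
    out[key] = stripCredentials(child);
  }
  return out;
}

// Behavior this issue attributes to production: delete `gateway`, then strip.
function sanitizeLikeProduction(config) {
  const copy = { ...config };
  delete copy.gateway;
  return stripCredentials(copy);
}

// Behavior this issue attributes to the inline test copy: strip only.
function sanitizeLikeTestCopy(config) {
  return stripCredentials(config);
}

const config = {
  name: "demo",
  apiKey: "k",
  gateway: { mode: "local", token: "s3cret" },
};

const fromProduction = sanitizeLikeProduction(config);
const fromTestCopy = sanitizeLikeTestCopy(config);

console.log(JSON.stringify(fromProduction)); // {"name":"demo"}
console.log(JSON.stringify(fromTestCopy));   // {"name":"demo","gateway":{"mode":"local"}}
```

Under these assumptions, an assertion that `gateway.mode` survives passes against the test copy and fails against the production-style function, which is why the inline copy hides the mismatch.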

### test/e2e/test-telegram-injection.sh

6. **`send_message_to_sandbox()` is dead code** — defined but never called. All tests use inline SSH.

7. **Tests bypass `runAgentInSandbox()` / `shellQuote()`** — T1-T4, T8 use `MSG=$(cat) && echo "$MSG"` over SSH, a different code path than production's `shellQuote()` from `bin/lib/runner.js`. A regression in `shellQuote()` wouldn't be caught.

8. **`sandbox_exec()` still fails open** — this copy wasn't updated with the fail-closed fix applied to the credential test.
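
As a rough illustration of item 7, a test that goes through a quoting function can check that a hostile payload reaches the shell as literal text. `shellQuoteSketch` below is a hypothetical stand-in that uses standard POSIX single-quote escaping; the real `shellQuote()` in `bin/lib/runner.js` may be implemented differently, and a real test would import that one instead.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical stand-in for shellQuote(): wrap in single quotes and
// rewrite each embedded single quote as '\'' (close, escaped quote, reopen).
function shellQuoteSketch(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Payload with command substitution, backticks, quotes, and a separator.
const payload = "$(echo INJECTED) `echo INJECTED` 'quoted'; echo INJECTED";

// Build the command the way a caller would, then run it through sh.
const command = "printf '%s' " + shellQuoteSketch(payload);
const echoed = execFileSync("sh", ["-c", command], { encoding: "utf8" });

console.log(echoed === payload ? "payload stayed literal" : "payload was interpreted");
```

The point of the design is that the assertion depends on the quoting function itself: if quoting regresses, `echoed` no longer equals `payload`, whereas the `MSG=$(cat) && echo "$MSG"` path never exercises the quoting code at all.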

## Required changes

### Production code

- **`nemoclaw/src/commands/migration-state.ts`**: export `isCredentialField`, `stripCredentials`, `sanitizeConfigFile`, `computeFileDigest`, `CREDENTIAL_FIELDS`, `CREDENTIAL_FIELD_PATTERN` (or extract to a shared module).
- **`scripts/telegram-bridge.js`**: export `runAgentInSandbox` for testability.

### Test code

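- Replace all inline `stripCredentials`/`walkAndRemoveFile`/`verifyDigest` copies with `require()` calls into the production code.
- Fix the `sanitizeConfigFile` drift: the test should expect `gateway` to be deleted, not stripped field by field, and C4b should be updated to match.
- Replace the `python3` JSON parsing with `node -e`.
- Wire the Telegram injection tests through `shellQuote()` / `runAgentInSandbox()`.
- Either remove the unused `send_message_to_sandbox()` or route the tests through it.
- Make `sandbox_exec()` fail closed in the Telegram test, matching the fix already applied to the credential test.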
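
To make the fail-open versus fail-closed distinction concrete, here is a Node analog. `sandbox_exec()` itself is a shell function and is not shown; `execFailOpen` and `execFailClosed` are hypothetical helpers, and `exit 255` merely simulates a transport failure such as a dropped SSH connection.

```javascript
const { spawnSync } = require("node:child_process");

// Fail-open: a failed command is indistinguishable from empty output.
function execFailOpen(cmd) {
  const result = spawnSync("sh", ["-c", cmd], { encoding: "utf8" });
  return result.stdout || "";
}

// Fail-closed: a spawn error or non-zero exit aborts the test.
function execFailClosed(cmd) {
  const result = spawnSync("sh", ["-c", cmd], { encoding: "utf8" });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`command failed (exit ${result.status}): ${cmd}`);
  }
  return result.stdout;
}

const probe = "exit 255"; // simulated transport failure, produces no output

// Negative assertion ("no injection marker in the output") passes vacuously.
const openOutput = execFailOpen(probe);
const vacuousPass = !openOutput.includes("INJECTED");

// The fail-closed helper surfaces the failure instead.
let closedThrew = false;
try {
  execFailClosed(probe);
} catch (err) {
  closedThrew = true;
}

console.log({ vacuousPass, closedThrew });
```

Security tests lean heavily on negative assertions, so a helper that returns empty output on failure can turn an unreachable sandbox into a passing run.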

## References

- PR #1092 (merged) — original E2E tests
- Carlos's review: https://github.com/NVIDIA/NemoClaw/pull/1092#pullrequestreview-4032002051
- `shellQuote` and `validateName` are already exported from `bin/lib/runner.js` ✅
