refactor(test): wire E2E security tests to real production code paths #1107

@jyaunches

Description

Context

PR #1092 added E2E tests for credential sanitization and Telegram injection, but the tests reimplement production logic inline rather than calling the real code. Carlos flagged this in review: the tests can pass even if production regresses.

Problems

test/e2e/test-credential-sanitization.sh

  1. stripCredentials() + isCredentialField() reimplemented 3x (C1-C5, C12, C13). The test copy-pastes CREDENTIAL_FIELDS, CREDENTIAL_FIELD_PATTERN, and both functions into node -e heredocs instead of importing them from migration-state.ts.

  2. walkAndRemoveFile() reimplemented 2x (C1-C5, C8). Production uses copyDirectory() with a CREDENTIAL_SENSITIVE_BASENAMES filter, while the test implements its own recursive walk.

  3. sanitizeConfigFile() behavior has drifted. Production deletes config.gateway and then calls stripCredentials(). The test does NOT delete gateway; it strips fields inside it. As a result, C4b (gateway.mode preserved) asserts the wrong behavior.

  4. verifyBlueprintDigest / verifyDigest reimplemented (C9-C11). These are self-fulfilling tests that define their own verification logic and never call production's computeFileDigest().

  5. Python dependency for JSON parsing (C3-C4) — uses python3 -c "import json..." instead of node -e.
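
The drift in item 3 can be illustrated with a minimal sketch. Everything below is a stand-in: the field list, the regex, and both function bodies are assumptions made for illustration, not the contents of migration-state.ts. Only the ordering (delete config.gateway, then stripCredentials()) is taken from the description above.

```javascript
// Hypothetical stand-ins; the real values live in migration-state.ts.
const CREDENTIAL_FIELDS = new Set(["apiKey", "token", "password"]); // assumed
const CREDENTIAL_FIELD_PATTERN = /(secret|token|key)$/i; // assumed

function isCredentialField(name) {
  return CREDENTIAL_FIELDS.has(name) || CREDENTIAL_FIELD_PATTERN.test(name);
}

// Recursively drop credential-like keys from a parsed JSON value.
function stripCredentials(value) {
  if (Array.isArray(value)) return value.map(stripCredentials);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [k, v] of Object.entries(value)) {
    if (!isCredentialField(k)) out[k] = stripCredentials(v);
  }
  return out;
}

// Ordering described for production: remove gateway entirely, then strip.
function sanitizeLikeProduction(config) {
  const copy = { ...config };
  delete copy.gateway;
  return stripCredentials(copy);
}

// Behavior of the inline test copy: keep gateway, strip fields inside it.
function sanitizeLikeInlineTest(config) {
  return stripCredentials(config);
}

const config = { name: "demo", gateway: { mode: "local", token: "t0p" } };
const prod = sanitizeLikeProduction(config);
const inline = sanitizeLikeInlineTest(config);
console.log("gateway" in prod);   // false
console.log(inline.gateway.mode); // local
```

A C4b-style assertion that gateway.mode is preserved passes against the inline copy and would fail against the production-like ordering, which is why the test needs to call the real function.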

test/e2e/test-telegram-injection.sh

  1. send_message_to_sandbox() is dead code. It is defined but never called; all tests use inline SSH.

  2. Tests bypass runAgentInSandbox() / shellQuote(). T1-T4 and T8 use MSG=$(cat) && echo "$MSG" over SSH, which is a different code path from production's shellQuote() in bin/lib/runner.js. A regression in shellQuote() wouldn't be caught.

  3. sandbox_exec() still fails open. This copy wasn't updated with the fail-closed fix applied to the credential test.
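
To show what routing the tests through shellQuote() would exercise, here is a sketch of a common POSIX single-quote escaping technique. The function body is an assumption: the actual shellQuote() in bin/lib/runner.js is not shown in this issue and may differ. The payloads are illustrative only.

```javascript
// Assumed implementation: wrap the value in single quotes and rewrite each
// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Illustrative injection payloads; inside single quotes the shell treats
// $(...), backticks, and ; as literal characters.
const payloads = ["$(id)", "`whoami`", "it's; rm -rf /tmp/x"];
for (const p of payloads) {
  console.log(shellQuote(p));
}
```

A test that builds its command with the real shellQuote() and checks that the payload arrives unchanged would fail if the quoting regressed, whereas the MSG=$(cat) path never touches that function.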

Required changes

Production code

  • nemoclaw/src/commands/migration-state.ts: export isCredentialField, stripCredentials, sanitizeConfigFile, computeFileDigest, CREDENTIAL_FIELDS, CREDENTIAL_FIELD_PATTERN (or extract to a shared module).
  • scripts/telegram-bridge.js: export runAgentInSandbox for testability.

Test code

  • Replace all inline stripCredentials/walkAndRemoveFile/verifyDigest with require() of real code.
  • Fix the sanitizeConfigFile drift so the test expects gateway deletion, not field stripping inside gateway.
  • Replace python3 JSON parsing with node.
  • Wire telegram injection tests through shellQuote() / runAgentInSandbox().
  • Remove or use send_message_to_sandbox() dead code.
  • Make sandbox_exec() fail closed in the Telegram test.
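
For the digest tests, one way to avoid a self-fulfilling check is to compare the function under test against an independently known value. The sketch assumes computeFileDigest() returns a hex-encoded SHA-256 of the file contents; that is an assumption, and the stand-in below is not the production function. In the real test the function would come from require() rather than being defined inline.

```javascript
const { createHash } = require("node:crypto");
const { mkdtempSync, readFileSync, rmSync, writeFileSync } = require("node:fs");
const { tmpdir } = require("node:os");
const { join } = require("node:path");

// Stand-in for production's computeFileDigest(); the algorithm is assumed.
function computeFileDigest(filePath) {
  return createHash("sha256").update(readFileSync(filePath)).digest("hex");
}

// Known-answer vector: SHA-256 of the ASCII string "abc".
const KNOWN_ABC_SHA256 =
  "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";

// Write a fixture with known contents into a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "digest-"));
const file = join(dir, "blueprint.txt");
writeFileSync(file, "abc");

const digest = computeFileDigest(file);
console.log(digest === KNOWN_ABC_SHA256);

rmSync(dir, { recursive: true, force: true });
```

Because the expected value is fixed ahead of time, a change to the digest logic makes the comparison fail instead of silently agreeing with itself.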

Metadata

Labels

  • enhancement: testing (Use this label to identify requests to improve NemoClaw test coverage.)
  • refactor (This is a refactor of the code and/or architecture.)
  • security (Something isn't secure)
  • wontfix (This will not be worked on)
