Security: Command Injection via Unsanitized Shell Command Construction#997
Security: Command Injection via Unsanitized Shell Command Construction#997tomaioo wants to merge 1 commit into
Conversation
In `helpers.ts`, `helpers-pi.ts`, `helpers-oh-my-pi.ts`, `helpers-codex.ts`, `helpers-cursor.ts`, `helpers-github.ts`, and `helpers-gemini-cli.ts`, the `dockerExec` function passes user-controlled `cmd` strings directly to `execSync` via `input: cmd + "\n"`. While this uses `docker exec -i container bash` with stdin piping to avoid some shell quoting issues, the `exec` function itself still invokes the system shell. More critically, `createMockHarness` in `helpers-harness.ts` directly interpolates `name` and `exitCode` into shell commands without sanitization: `dockerExec(\`mkdir -p ${scriptDir}\`)`, `dockerExec(\`echo "${b64}" | base64 -d > ${scriptPath}\`)`, and `dockerExec(\`chmod +x ${scriptPath}\`)`. The `name` parameter is attacker-controllable and can inject arbitrary shell commands. Additionally, `createSimpleProcess` interpolates `dir` directly into a shell command. This is a classic command injection vulnerability.
Signed-off-by: tomaioo <203048277+tomaioo@users.noreply.github.com>
There was a problem hiding this comment.
Thanks for the security fix. I cannot approve this yet because the patch only closes one narrow name path and leaves other command/script injection paths in the same helper module.
Blockers
e2e-tests/docker/helpers-harness.ts:31-createSimpleProcess(dir)is still vulnerable to command injection.
The PR body explicitly calls out createSimpleProcess as vulnerable because dir is interpolated into shell commands, but the diff does not change it. It still builds commands equivalent to:
mkdir -p ${dir}- redirecting to
${dir}/proc.js
A malicious dir can alter the bash command executed through dockerExec. Please validate dir against a strict safe pattern or avoid interpolated shell strings for this helper, and add a regression test with a malicious dir payload.
e2e-tests/docker/helpers-harness.ts:20-outputis embedded into the generated bash script in a way that can execute command substitution.
scriptContent uses echo ${JSON.stringify(output)}. JSON.stringify produces a double-quoted shell string, but bash still expands command substitutions inside double quotes when the generated mock harness runs. For example, an output containing $(touch /tmp/pwned) would be executed by the mock harness script instead of printed inertly.
Please generate the script so output is emitted as data, not shell source: for example, robust shell quoting, printf with a safely quoted literal, or a base64 decode path that does not permit shell expansion. Add a regression test proving command substitution text is printed, not executed.
Major
e2e-tests/docker/helpers-harness.ts:44-cleanupMockHarnesses(...names)still interpolates unvalidated names intorm -f.
The helper builds rmPaths from raw names and executes rm -f ${rmPaths} through bash. Apply the same validation to cleanup names or centralize a validated harness path builder so create and cleanup share the same safety invariant. Add a malicious-name cleanup regression test.
Minor
e2e-tests/docker/helpers-harness.ts:51- The file now lacks a trailing newline. Please restore it.
QA
QA did not pass. I dispatched qa-dispatch.yml for this PR, but run 28433527130 failed during checkout before tests ran:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
The PR head is in fork tomaioo/babysitter, while the workflow checked for the branch in a5c-ai/babysitter.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command injection paths in the same helper module. Mitigation: before merge, fix all dynamic shell/script interpolation paths in
helpers-harness.tsand add exploit-style regression tests forname,dir, cleanup names, andoutputcommand substitution. - Risk: the PR description overstates the fix and could be merged under the assumption that command injection is solved. Mitigation: make the patch and tests match the stated scope, or narrow the PR and explicitly track the remaining security work before approval.
- Risk: production data-plane impact is low because this is E2E helper code, but CI/test execution can still run arbitrary commands inside the Docker test container. Mitigation: keep this blocked at review until the helper is hardened and tests pass.
There was a problem hiding this comment.
Blocking this PR. The current patch adds a name allowlist in one place, but the vulnerability described in the PR body remains reachable through other inputs in the same helper file, and QA did not run successfully.
Blocking findings
-
e2e-tests/docker/helpers-harness.ts:20still allows command execution throughoutputwhen the generated mock harness runs.JSON.stringify(output)creates a double-quoted shell argument inside the script; a value like$(touch /tmp/pwn)would still be evaluated by bash when the mock harness is invoked. Use a heredoc/base64 decode/printf pattern that writes literal output without shell evaluation. -
e2e-tests/docker/helpers-harness.ts:34still interpolatesdirdirectly intomkdir -p ${dir}and> ${dir}/proc.js. The PR body explicitly calls this function out, but the implementation leaves it unchanged. Validate the expected test path or escape/quote it with a reliable shell-escaping primitive before it reachesdockerExec. -
e2e-tests/docker/helpers-harness.ts:49still interpolates cleanup names intorm -f ${rmPaths}. The newcreateMockHarnessvalidation is not applied here, so a malicious cleanup name can alter rm targets or inject shell syntax. Reuse the same validation or avoid shell string construction for removal. -
The PR is currently not mergeable. GitHub reports
mergeStateStatus: DIRTY/mergeable: CONFLICTING; this needs to be rebased/resolved before merge.
Major issues
- There are no tests for the security-sensitive validator or the remaining command-construction paths. Please add tests for invalid harness names, valid harness names, malicious output payloads, malicious
dirvalues, and cleanup names. - The final newline was removed from
helpers-harness.ts; restore it to avoid unrelated diff noise.
QA
QA Dispatch run 28433552208 failed before executing QA. The actions/checkout@v6 step used ref fix/security/command-injection-via-unsanitized-shell- and ended with: A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found. No QA scenario results were produced.
Risk Assessment
Risk level: risk:high
-
Risk: the security fix is incomplete and may create false confidence while command injection remains reachable through
output,dir, and cleanup names.
Mitigation: fix all dynamic shell interpolation paths inhelpers-harness.tsand add adversarial tests with shell metacharacter payloads. -
Risk: the branch is conflicting/dirty against
staging, so the reviewed diff may not match what would land.
Mitigation: rebase or otherwise resolve conflicts, then rerun review and QA on the updated diff. -
Risk: this helper is used by docker e2e harness tests, so changes can break harness invocation flows across adapters.
Mitigation: rerun the affected docker harness tests after the security fix is complete.
There was a problem hiding this comment.
Blocking this PR. The patch only validates name in createMockHarness; the same helper file still has multiple command/script injection paths, the branch is conflicting, and QA failed before tests ran.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash as shell source.
scriptContent uses echo ${JSON.stringify(output)}. That produces a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated mock harness runs. A value like $(touch /tmp/pwned) would execute instead of being printed literally.
Fix this by emitting output as data, not shell syntax: for example, base64 decode, robust shell quoting, or another literal-safe printf/heredoc pattern. Add a regression test proving command substitution text is printed literally and does not execute.
e2e-tests/docker/helpers-harness.ts:21-exitCodeis still interpolated into shell source without runtime validation.
The PR body identifies unsanitized command construction around createMockHarness, but this patch only validates name. TypeScript's number annotation is not a runtime boundary for JS callers or cast values, so exit ${exitCode} is still dynamic shell source.
Validate that exitCode is an integer in the expected shell exit-code range before using it, or avoid generating shell source from dynamic values.
e2e-tests/docker/helpers-harness.ts:34-createSimpleProcess(dir)remains vulnerable.
The PR body explicitly calls out createSimpleProcess because dir is interpolated into shell commands, but the diff leaves it unchanged. It still builds mkdir -p ${dir} and redirects into ${dir}/proc.js through dockerExec, so a malicious dir can alter the bash command.
Constrain dir to a known safe test path or quote/escape it with a reliable shell primitive, and add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49-cleanupMockHarnesses(...names)still interpolates raw names intorm -f.
The new allowlist is not applied here. rmPaths is built from raw names and executed through bash, so cleanup names can inject shell syntax or alter deletion targets.
Centralize a validated harness path builder or apply the same validation to cleanup before constructing paths. Add a malicious cleanup-name regression test.
Major Issues
- No regression tests were added for this security-sensitive change. Please cover valid names, invalid names, malicious
output, maliciousdir, cleanup names, andexitCodehandling. - GitHub reports this PR as conflicting (
mergeable: CONFLICTING), so it cannot merge as reviewed. Resolve conflicts/rebase ontostaging, then rerun review and QA.
Minor
e2e-tests/docker/helpers-harness.ts:51removes the final newline. Please restore it.
QA
QA did not pass. I dispatched qa-dispatch.yml run 28487039295, and it failed before executing tests. The failing step was actions/checkout@v6 with ref fix/security/command-injection-via-unsanitized-shell- against a5c-ai/babysitter; checkout reported that no branch or tag with that name could be found. No QA scenario results were produced.
Risk Assessment
Risk level: risk:high
-
Risk: incomplete security remediation creates false confidence while command injection remains reachable through
output,dir, cleanup names, andexitCode.
Mitigation: fix every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input before merge. -
Risk: the PR is conflicting against
staging, so the final resolved diff may differ from this review.
Mitigation: resolve conflicts/rebase, then rerun review and QA on the updated head commit. -
Risk: this is e2e helper code rather than production data-plane code, but CI/test containers can still execute arbitrary commands and unsafe helpers can mask future harness bugs.
Mitigation: after fixes, run the affected Docker harness/e2e scenarios and keep the PR blocked until QA succeeds.
There was a problem hiding this comment.
Blocking this PR. The patch only validates name in createMockHarness, but the PR body describes a broader command-injection issue and several reachable shell/script injection paths remain.
Blockers
e2e-tests/docker/helpers-harness.ts:20still treatsoutputas shell source in the generated mock harness.
JSON.stringify(output) creates a double-quoted shell argument inside the generated bash script. Bash still expands command substitutions inside double quotes when the mock harness runs, so a payload such as command-substitution text would execute instead of being printed literally.
Fix: emit output as data, not shell syntax. Use a base64-decoded payload, robust shell quoting, or another literal write/print pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly.
e2e-tests/docker/helpers-harness.ts:34ande2e-tests/docker/helpers-harness.ts:35still interpolatedirdirectly into shell commands.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the command executed through dockerExec.
Fix: validate dir against the expected safe test-path shape or quote/escape it with a reliable primitive before it reaches the shell. Add a malicious dir regression test.
e2e-tests/docker/helpers-harness.ts:49ande2e-tests/docker/helpers-harness.ts:50still interpolate cleanup names intorm -f.
The new validation is only applied in createMockHarness. cleanupMockHarnesses(...names) accepts raw names, joins them into shell words, and executes them through dockerExec.
Fix: share the same validated harness-name/path builder between create and cleanup, or avoid shell string construction for removal. Add a malicious cleanup-name regression test.
- QA failed before any scenario ran.
QA Dispatch run 28487045871 failed in actions/checkout@v6: it tried to check out fix/security/command-injection-via-unsanitized-shell- from a5c-ai/babysitter, but that branch/tag does not exist in the base repository. No QA scenario results were produced for the PR head.
Fix: make the QA dispatch use a checkoutable fork PR ref or exact head SHA, then rerun QA on the reviewed commit.
Major
- Missing tests for a security-sensitive fix. Please add tests for valid harness names, invalid harness names, malicious
output, maliciousdir, and malicious cleanup names.
Minor
e2e-tests/docker/helpers-harness.ts:51removes the file's trailing newline. Please restore it to avoid unrelated diff noise.
Risk Assessment
Risk level: risk:high
-
Risk: incomplete security remediation leaves command injection paths reachable while giving false confidence that the vulnerability is fixed.
Mitigation: fix every dynamic shell/script interpolation path in this helper and add exploit-style regression tests before merge. -
Risk: this is test infrastructure rather than production data-plane code, but CI/docker e2e execution can still run attacker-controlled shell content inside the test container.
Mitigation: rerun the affected docker/e2e helper coverage on the exact PR head after hardening. -
Risk: QA dispatch currently does not validate fork heads because checkout uses the base repository branch name.
Mitigation: update the dispatch input/ref strategy or run equivalent QA against the exact PR head SHA before approval.
There was a problem hiding this comment.
Blocking this PR. The patch only validates createMockHarness(name), but the command/script injection class described in the PR body remains reachable through other inputs in the same helper module. QA also failed before any scenario executed, and GitHub reports the PR is currently conflicting.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
JSON.stringify(output) produces a double-quoted shell string in the generated script, and bash still evaluates command substitutions inside double quotes when that mock harness runs. A payload like $(touch /tmp/pwned) would execute rather than print literally. Emit output as data instead: for example, use a safely quoted printf, a quoted heredoc, or a base64 decode path that never exposes the decoded value to shell evaluation. Add a regression test proving command substitution text is printed literally.
e2e-tests/docker/helpers-harness.ts:34-createSimpleProcess(dir)is still vulnerable to command injection.
The PR body explicitly calls this function out, but the diff leaves mkdir -p ${dir} and redirection to ${dir}/proc.js unchanged. A malicious dir can still alter the bash command passed through dockerExec. Validate the expected directory shape or use reliable shell quoting before interpolation, and add a malicious dir regression test.
e2e-tests/docker/helpers-harness.ts:49-cleanupMockHarnesses(...names)still interpolates unvalidated names intorm -f.
The new allowlist is local to createMockHarness; cleanup builds /tmp/mock-harnesses/${n} from raw names and joins those into a shell command. Reuse a shared validated harness-name/path helper for both create and cleanup, or avoid shell string construction. Add a malicious cleanup-name regression test.
- PR mergeability is blocked.
GitHub reports mergeable: CONFLICTING, so this cannot be merged as reviewed. Resolve conflicts against staging, then rerun review and QA on the updated diff.
Major
e2e-tests/docker/helpers-harness.ts:14- the security-sensitive validator has no tests. Please cover valid names, invalid names, malicious output payloads, maliciousdirvalues, and cleanup names. The current patch does not prove the fix or guard against regressions.
Minor
e2e-tests/docker/helpers-harness.ts:51- the final trailing newline was removed. Please restore it.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28487057646, but it failed during actions/checkout@v6 before tests ran:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
The workflow checked out repository a5c-ai/babysitter, while this PR head branch is in fork tomaioo/babysitter. No docker harness QA scenario results were produced.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command injection and script injection paths in the same helper module. Mitigation: before merge, harden output emission,
createSimpleProcessdir handling, and cleanup names; add adversarial regression tests for each input. - Risk: the PR description overstates the actual fix, which can create false confidence during review. Mitigation: make the patch match the stated scope or narrow the PR and track the remaining security work before approval.
- Risk: the branch conflicts with
staging, so the final landed patch may differ from the reviewed diff. Mitigation: resolve conflicts, then rerun review and QA. - Risk: this is e2e helper code rather than production data-plane code, but it can still execute arbitrary commands inside CI/docker test environments. Mitigation: keep this blocked until the helper is hardened and affected docker harness tests pass.
There was a problem hiding this comment.
Blocking this PR. The current diff validates only createMockHarness(name), but the PR body describes a broader command-injection problem and the adjacent shell/script injection paths remain open. QA also failed before producing test results, and the branch is not mergeable as reviewed.
Blockers
e2e-tests/docker/helpers-harness.ts:20- generated harness output can still execute shell command substitution.
scriptContent writes a bash script using echo ${JSON.stringify(output)}. JSON.stringify gives a double-quoted shell argument, but bash still performs command substitution inside double quotes when the generated harness runs. An output payload such as $(touch /tmp/pwned) would be interpreted as shell code instead of emitted as inert output.
Fix this by writing output as data, not shell source. Use robust shell quoting, a safe printf literal, or a base64 decode path for the output that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command injectable.
The PR body explicitly calls out createSimpleProcess as vulnerable, but the diff leaves dir interpolated into both mkdir -p ${dir} and the redirection target > ${dir}/proc.js. A malicious dir can still alter the bash command executed via dockerExec.
Fix this by validating dir against the exact allowed test path shape or by using a reliable shell-escaping primitive before it reaches shell source. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names are still unvalidated shell input.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through bash. The new name validator is only applied during creation, so cleanup still accepts injection payloads and path-changing values.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports mergeable: CONFLICTING and mergeStateStatus: DIRTY. Resolve conflicts against staging, then rerun review and QA on the resolved commit.
Major
e2e-tests/docker/helpers-harness.ts:14- there are no regression tests around this security boundary. Please add tests for valid names, invalid names, malicious output payloads, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28487058968; it failed in actions/checkout@v6 before tests ran. Checkout used ref fix/security/command-injection-via-unsanitized-shell- in repository a5c-ai/babysitter, but the PR head is in fork tomaioo/babysitter, so the workflow reported: A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found.
No live-stack QA scenario results were produced.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation creates false confidence while command injection remains reachable through
output,dir, and cleanup names in the same helper module. Mitigation: before merge, fix all dynamic shell/script interpolation paths and add exploit-style tests for each input. - Risk: the PR is conflicting against
staging, so the eventual resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA on the new head. - Risk: QA has not produced meaningful results because checkout failed for the fork branch. Mitigation: rerun QA against a resolvable ref or checkout the fork/head SHA explicitly before considering merge.
There was a problem hiding this comment.
Blocking this PR. The patch only validates createMockHarness(name), while the command/script injection class described in the PR body remains reachable through adjacent inputs in the same helper. QA also failed before any scenario ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent uses echo ${JSON.stringify(output)}. That creates a double-quoted shell word inside the generated script, and bash still evaluates command substitutions inside double quotes when the mock harness runs. A payload containing command substitution text can execute instead of being printed literally.
Fix: emit output as data, not shell syntax. Use a base64 decode path, robust shell quoting, or a literal-safe printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCodeis still dynamic shell source.
The new validation covers only name. TypeScript's number annotation is not a runtime boundary for JS callers or cast values, and exit ${exitCode} is still generated shell source.
Fix: validate exitCode with Number.isInteger and constrain it to an acceptable shell exit-code range before script generation, or avoid generating shell source from dynamic values. Add invalid-exitCode tests.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command injectable.
The PR body explicitly calls this function out, but the diff leaves mkdir -p ${dir} and redirection to ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Fix: validate dir against the expected safe test-path shape or use reliable shell quoting/argument handling before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names are still raw shell input.
cleanupMockHarnesses(...names) does not reuse the new allowlist. It builds /tmp/mock-harnesses/${n} from raw names, joins those paths with spaces, and executes rm -f through dockerExec.
Fix: centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for removal. Add invalid cleanup-name tests.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, maliciousdirvalues, cleanup names, and invalidexitCodevalues.- PR metadata reports
mergeable: CONFLICTING/mergeStateStatus: DIRTY. Resolve conflicts againststaging, then rerun review and QA on the resolved head.
Minor
e2e-tests/docker/helpers-harness.ts:51removes the final newline. Please restore it.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28558745080; it failed in actions/checkout@v6 before executing any QA scenarios:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
The PR head branch is in fork tomaioo/babysitter, while the dispatch attempted to check out that branch name from the base repository. No live-stack QA evidence was produced.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable in this helper while creating false confidence. Mitigation: before merge, harden output,
exitCode,dir, and cleanup paths and add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts, then rerun review and QA on the exact resolved commit. - Risk: QA dispatch does not validate the fork head when it checks out by base-repo branch name. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
There was a problem hiding this comment.
Blocking this PR. The current patch validates only createMockHarness(name), but the PR body describes a broader command-injection issue and several reachable shell/script injection paths remain. QA also failed before any scenario ran, and GitHub reports the PR is currently conflicting.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent uses echo ${JSON.stringify(output)}. That produces a double-quoted shell word in the generated script, and bash still performs command substitution inside double quotes when the mock harness runs. An output payload such as $(touch /tmp/pwned) would execute instead of being printed literally.
Fix this by emitting output as data, not shell syntax. Use a base64 decode path, robust shell quoting, or a safe printf literal that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed literally and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCodeis still dynamic shell source.
The generated script writes exit ${exitCode}. TypeScript's number annotation is not a runtime boundary for JS callers or cast values, so an invalid value can alter the generated shell source.
Validate exitCode at runtime as an integer in the expected shell exit-code range, or avoid generating shell source from dynamic values.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command injectable.
The PR body explicitly calls out createSimpleProcess, but the diff leaves dir interpolated into both mkdir -p ${dir} and the redirection target > ${dir}/proc.js. A malicious dir can still alter the command executed through dockerExec.
Validate dir against the exact expected test path shape or use a reliable shell-escaping primitive before interpolation. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names remain unvalidated shell input.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through bash. The new name validator is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports mergeable: CONFLICTING and mergeStateStatus: DIRTY. Resolve conflicts against staging, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- there are no regression tests around this security boundary. Please add tests for valid names, invalid names, malicious output payloads, maliciousdirvalues, malicious cleanup names, and invalidexitCodevalues. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28558755801; it failed in actions/checkout@v6 before tests ran. Checkout used ref fix/security/command-injection-via-unsanitized-shell- in repository a5c-ai/babysitter, but the PR head is in fork tomaioo/babysitter, so checkout reported:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
No QA scenario results were produced for the PR head.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation creates false confidence while command injection remains reachable through
output,dir, cleanup names, andexitCode. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the eventual resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA on the new head commit. - Risk: this is e2e helper code rather than production data-plane code, but CI/docker test containers can still execute arbitrary commands. Mitigation: rerun affected docker harness scenarios on the exact PR head after hardening and keep the PR blocked until QA succeeds.
There was a problem hiding this comment.
Blocking this PR. The patch validates only createMockHarness(name), but the broader command/script injection class described in the PR body is still reachable through adjacent inputs in the same helper module. QA also failed before any scenario ran, and the PR is currently conflicting.
Blockers
e2e-tests/docker/helpers-harness.ts:20- generated harness output is still treated as shell source.
scriptContent uses echo ${JSON.stringify(output)} inside a generated bash script. JSON.stringify produces a double-quoted shell word, and bash still evaluates command substitutions inside double quotes when that generated script runs. A payload such as command-substitution text would execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell syntax. Use a base64 decode path, a safely quoted printf, or a quoted heredoc pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and causes no side effects.
e2e-tests/docker/helpers-harness.ts:20-exitCodeis still interpolated into generated shell source.
The TypeScript number annotation is not a runtime boundary for JS callers, casts, or test helper misuse. exit ${exitCode} should not accept arbitrary dynamic shell text.
Validate exitCode at runtime as an integer in the valid shell status range, or avoid placing it into shell source directly. Add regression coverage for invalid values.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command injectable.
The PR body explicitly calls out this function, but the diff leaves mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the command executed through dockerExec.
Constrain dir to the expected safe test path shape or quote/escape it with a reliable shell primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names are still unvalidated shell/path input.
The new allowlist applies only during creation. cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through bash.
Centralize a validated harness-name/path helper and reuse it in create and cleanup, or avoid shell string construction for removal. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as conflicting/dirty against staging. Resolve conflicts, then rerun review and QA on the resolved commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security boundary. Please cover valid names, invalid names, malicious output payloads, maliciousdirvalues, cleanup names, andexitCodehandling.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28558760466, and it failed in actions/checkout@v6 before tests ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported that no branch or tag with that name could be found.
No live-stack QA scenario results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable in Docker e2e helper code while creating false confidence that the vulnerability is fixed. Mitigation: harden every dynamic shell/script input in
helpers-harness.tsand add adversarial tests forname,output,exitCode,dir, and cleanup names before merge. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a resolvable fork PR ref or exact head SHA after the fix is complete.
- Risk: the PR is conflicting, so the reviewed patch cannot be merged and the final conflict resolution may change behavior. Mitigation: resolve conflicts against
staging, then repeat review and QA on the resolved commit. - Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch only validates createMockHarness(name), but the PR body describes a broader command-injection remediation and several reachable shell/script injection paths remain open. QA also did not produce a passing result, and the PR is currently conflicting.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. JSON.stringify creates a double-quoted shell word, and bash still evaluates command substitution inside double quotes when that script runs. An output payload such as $(touch /tmp/pwned) would execute instead of being printed literally.
Fix: emit output as data, not shell source. Use a literal-safe mechanism such as base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCodeis still unvalidated dynamic shell source.
The PR body calls out exitCode interpolation, but the patch does not validate or harden it. TypeScript's number annotation is not a runtime boundary for JS callers or cast values, and the generated script still includes exit ${exitCode}.
Fix: validate exitCode at runtime as an integer in the valid shell exit-code range, or avoid generating shell source from dynamic values. Add tests for invalid/cast payloads.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly identifies createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and redirection to ${dir}/proc.js unchanged. A malicious dir can still alter the command passed through dockerExec.
Fix: constrain dir to the exact expected safe test path shape or apply reliable shell escaping before it reaches shell source. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
The new allowlist is local to createMockHarness. cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them into shell words, and executes rm -f ${rmPaths} through dockerExec.
Fix: centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable as reviewed.
GitHub reports this PR as conflicting against staging. Resolve conflicts, then rerun review and QA on the updated head so the reviewed diff matches what would land.
Major Issues
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, maliciousoutput, maliciousexitCode, maliciousdir, and cleanup names.- QA did not pass. I dispatched
qa-dispatch.ymlrun28558773658; it reached checkout successfully but did not complete within the review process window. Last observed state: jobqain progress atRun a5c-ai/babysitter/packages/adapters/triggers@staging. No scenario results were available, so this cannot count as passing QA.
Minor
e2e-tests/docker/helpers-harness.ts:51removes the final trailing newline. Please restore it.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection paths reachable in the same helper module while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden output emission,
exitCode,createSimpleProcess(dir), and cleanup-name handling; add exploit-style regression tests for each input. - Risk: the branch is conflicting against
staging, so the eventual resolved diff may differ from this review. Mitigation: resolve conflicts and rerun review/QA on the exact updated commit. - Risk: QA is inconclusive, so the patch has not been validated by live-stack scenarios. Mitigation: rerun QA to completion against a checkoutable PR ref or exact head SHA before approval.
There was a problem hiding this comment.
Blocking this PR. The current patch only validates createMockHarness(name), but the broader command/script injection class described in the PR body remains reachable through adjacent inputs in the same helper module. QA also failed before any scenario ran, and the PR is currently conflicting.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into a generated bash script. JSON.stringify creates a double-quoted shell word, and bash still evaluates command substitution inside double quotes when that script runs. Emit output as data instead: use a literal-safe mechanism such as base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and causes no side effects.
e2e-tests/docker/helpers-harness.ts:20-exitCodeis still unvalidated dynamic shell source.
The PR body calls out exitCode interpolation, but the patch does not validate or harden it. TypeScript's number annotation is not a runtime boundary for JS callers or cast values, and the generated script still includes exit ${exitCode}. Validate exitCode at runtime as an integer in the valid shell status range, or avoid generating shell source from dynamic values. Add tests for invalid/cast payloads.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly identifies createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and redirection to ${dir}/proc.js unchanged. A malicious dir can still alter the command passed through dockerExec. Constrain dir to the exact expected safe test path shape or apply reliable shell escaping before it reaches shell source. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
The new allowlist is local to createMockHarness. cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins those paths into shell words, and executes rm -f ${rmPaths} through dockerExec. Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable as reviewed.
GitHub reports this PR as conflicting against staging. Resolve conflicts, then rerun review and QA on the exact updated commit so the reviewed diff matches what would land.
Major Issues
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, maliciousoutput, maliciousexitCode, maliciousdir, and cleanup names.- QA did not pass. I dispatched
qa-dispatch.ymlrun28637733205; it failed atactions/checkout@v6before executing any QA scenarios because the workflow tried to check outfix/security/command-injection-via-unsanitized-shell-from the base repository, while the PR head is in forktomaioo/babysitter.
Minor
e2e-tests/docker/helpers-harness.ts:51removes the final trailing newline. Restore it to keep the diff focused.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection paths reachable in the same helper module while creating false confidence that the vulnerability is fixed.
Mitigation: before merge, harden output emission,exitCode,createSimpleProcess(dir), and cleanup-name handling; add exploit-style regression tests for each input. - Risk: the branch is conflicting against
staging, so the eventual resolved diff may differ from this review.
Mitigation: resolve conflicts and rerun review/QA on the exact resolved commit. - Risk: QA is failing before scenarios run, so there is no live-stack evidence for this patch.
Mitigation: rerun QA against a checkoutable PR ref or exact head SHA after the fix is complete. - Risk: blast radius is limited to docker e2e/test infrastructure, but unsafe helpers can execute arbitrary commands inside CI/docker test containers.
Mitigation: keep this blocked until the helper is hardened and affected docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch only validates createMockHarness(name), while the PR body describes a broader command-injection remediation and several reachable shell/script injection paths remain. QA also failed before running scenarios, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving the command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as conflicting/dirty against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28637760933; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported that no branch or tag with that name could be found.
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch validates only createMockHarness(name), while the PR body describes a broader command-injection remediation and several reachable shell/script injection paths remain. QA also failed before running scenarios, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving the command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as mergeable=false / mergeable_state=dirty against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28694258168; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch validates only createMockHarness(name), but the command/script injection class described in the PR body remains reachable through adjacent inputs in the same helper module. QA also failed before scenarios ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving the command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as conflicting/dirty against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28694254345; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported that no branch or tag with that name could be found.
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The patch validates only createMockHarness(name), but the command/script injection class described in the PR body remains reachable through adjacent inputs in e2e-tests/docker/helpers-harness.ts. QA also failed before any scenario ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as conflicting/dirty against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28729266894; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported that no branch or tag with that name could be found.
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The patch only validates createMockHarness(name), but the command/script injection class described in the PR body remains reachable through adjacent inputs in the same helper module. QA also failed before scenarios ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving the command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as mergeable=CONFLICTING / mergeStateStatus=DIRTY against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28729278442; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch validates only createMockHarness(name), while the PR body describes a broader command-injection remediation. Several adjacent shell/script injection paths remain reachable, QA failed before scenarios ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and no side-effect command runs.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as mergeable: CONFLICTING against staging. Resolve conflicts, then rerun review and QA on the exact resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28729277096, and it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
-
Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed.
Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. -
Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code.
Mitigation: resolve conflicts and rerun review/QA against the exact updated head. -
Risk: QA has not validated the PR because dispatch checkout failed for the fork branch.
Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval. -
Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers.
Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
There was a problem hiding this comment.
Blocking this PR. The current patch validates only createMockHarness(name), but the command/script injection class described in the PR body remains reachable through adjacent inputs in the same helper module. QA also failed before scenarios ran, and the PR is currently conflicting against staging.
Blockers
e2e-tests/docker/helpers-harness.ts:20-outputis still embedded into generated bash source.
scriptContent writes echo ${JSON.stringify(output)} into the generated mock harness. That creates a double-quoted shell word, and bash still performs command substitution inside double quotes when the generated script runs. An output payload containing command-substitution text can execute in the Docker test container instead of being printed literally.
Fix this by emitting output as data, not shell source: for example, a literal-safe base64 decode to stdout, robust shell quoting, or a quoted heredoc/printf pattern that cannot trigger shell expansion. Add a regression test proving command-substitution text is printed inertly and causes no side effects.
e2e-tests/docker/helpers-harness.ts:20-exitCoderemains unvalidated dynamic shell source.
The generated script includes exit ${exitCode}. The new validation covers only name; TypeScript's number annotation is not a runtime boundary for JS callers, casts, or helper misuse.
Validate exitCode at runtime with Number.isInteger and constrain it to the accepted shell status range, or avoid placing it into shell source directly. Add invalid-exitCode regression coverage.
e2e-tests/docker/helpers-harness.ts:34/:35-createSimpleProcess(dir)remains command-injectable.
The PR body explicitly calls out createSimpleProcess(dir), but the diff leaves both mkdir -p ${dir} and the redirect target ${dir}/proc.js unchanged. A malicious dir can still alter the bash command executed through dockerExec.
Constrain dir to the exact expected safe test path shape or quote/escape it with a reliable primitive before it reaches bash. Add a malicious-dir regression test.
e2e-tests/docker/helpers-harness.ts:49/:50- cleanup names still flow raw intorm -f.
cleanupMockHarnesses(...names) maps raw names into /tmp/mock-harnesses/${n}, joins them with spaces, and executes rm -f ${rmPaths} through dockerExec. The new allowlist is only applied during creation.
Centralize a validated harness-name/path helper and use it in both create and cleanup, or avoid shell string construction for deletion. Add invalid cleanup-name tests.
- PR metadata - this head is not mergeable.
GitHub reports this PR as conflicting/dirty against staging. Resolve conflicts, then rerun review and QA on the resolved head commit.
Major
e2e-tests/docker/helpers-harness.ts:14- no regression tests were added for this security-sensitive boundary. Please cover valid names, invalid names, malicious output payloads, invalidexitCodevalues, maliciousdirvalues, and cleanup names. The remaining vulnerable paths would have been caught by those tests.
Minor
e2e-tests/docker/helpers-harness.ts:51- the file lost its trailing newline. Restore it to keep the diff focused.
QA
QA did not pass. I dispatched qa-dispatch.yml as run 28729284676; it failed in actions/checkout@v6 before any scenarios ran. Checkout used repository a5c-ai/babysitter with ref fix/security/command-injection-via-unsanitized-shell-, but the PR head branch is in fork tomaioo/babysitter, so checkout reported:
A branch or tag with the name 'fix/security/command-injection-via-unsanitized-shell-' could not be found
No live-stack QA results were produced for the reviewed commit.
Risk Assessment
Risk level: risk:high
- Risk: incomplete security remediation leaves command/script injection reachable through
output,exitCode,dir, and cleanup names while creating false confidence that the vulnerability is fixed. Mitigation: before merge, harden every dynamic shell/script interpolation path inhelpers-harness.tsand add exploit-style regression tests for each input. - Risk: the PR is conflicting against
staging, so the final resolved diff may differ from the reviewed code. Mitigation: resolve conflicts and rerun review/QA against the exact updated head. - Risk: QA has not validated the PR because dispatch checkout failed for the fork branch. Mitigation: rerun QA against a checkoutable PR ref or exact head SHA before approval.
- Risk: blast radius is limited to test infrastructure, but arbitrary commands can still run inside CI/docker test containers. Mitigation: keep this blocked until the helper is hardened and affected Docker harness/e2e scenarios pass.
Summary
Security: Command Injection via Unsanitized Shell Command Construction
Problem
Severity:
Critical| File:e2e-tests/docker/helpers-harness.ts:L18In
helpers.ts,helpers-pi.ts,helpers-oh-my-pi.ts,helpers-codex.ts,helpers-cursor.ts,helpers-github.ts, andhelpers-gemini-cli.ts, thedockerExecfunction passes user-controlledcmdstrings directly toexecSyncviainput: cmd + "\n". While this usesdocker exec -i container bashwith stdin piping to avoid some shell quoting issues, theexecfunction itself still invokes the system shell. More critically,createMockHarnessinhelpers-harness.tsdirectly interpolatesnameandexitCodeinto shell commands without sanitization:dockerExec(\mkdir -p ${scriptDir}`),dockerExec(`echo "${b64}" | base64 -d > ${scriptPath}`), anddockerExec(`chmod +x ${scriptPath}`). Thenameparameter is attacker-controllable and can inject arbitrary shell commands. Additionally,createSimpleProcessinterpolatesdir` directly into a shell command. This is a classic command injection vulnerability.Solution
Use parameterized commands or a proper shell escaping function (e.g.,
shell-quoteorshlex) for all dynamic values passed todockerExec. Avoid string interpolation for shell commands. ForcreateMockHarness, validatenameagainst a strict allowlist (e.g.,^[a-zA-Z0-9_-]+$) and useexecFileorspawnwith argument arrays instead ofexecSyncwith string commands.Changes
e2e-tests/docker/helpers-harness.ts(modified)