feat: move restore execution to agent RPC#873
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (3)
app/server/modules/agents/agents-manager.ts (1)
221-229: 💤 Low value
restore.progressevent forwarding looks correct.Same shape as the controller-local emission in
repositories.service.ts(organizationId,repositoryId,snapshotId,...progress), so SSE consumers can treat both sources uniformly.One thing to keep in mind:
event.payload.organizationId/repositoryId/snapshotIdare echoed by the agent. Agents are authenticated per-organization, so cross-tenant spoofing is bounded by that auth, but if you ever want a stricter guarantee you could deriveorganizationIdfrom the connection context (event.agentId→ org lookup) instead of the agent payload.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/server/modules/agents/agents-manager.ts` around lines 221 - 229, The current "restore.progress" case forwards event.payload.organizationId/repositoryId/snapshotId via serverEvents.emit which matches the controller-local shape, but to prevent potential cross-tenant spoofing tighten the source of truth by deriving organizationId from the connection context: look up the org for event.agentId (e.g., map agentId → organization) instead of trusting event.payload.organizationId before calling serverEvents.emit; keep repositoryId/snapshotId from payload unless you also want to validate them against the agent's repo permissions.app/server/modules/repositories/__tests__/repositories.service.test.ts (1)
486-498: ⚡ Quick winTighten assertions to catch the
optionsspread leak.Both call assertions use
expect.objectContaining({ options: expect.objectContaining({ organizationId, basePath }) }), which will pass even ifoptionsalso containstargetAgentIdandtargetPath(see the matching comment onrepositories.service.ts). Once the service is updated to destructure those out, you can tighten toexpect.not.objectContaining({ targetAgentId: expect.anything(), targetPath: expect.anything() })(or usetoEqualwith the exact expected options shape) to lock down the contract sent to the agent.Also applies to: 610-621
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/server/modules/repositories/__tests__/repositories.service.test.ts` around lines 486 - 498, The test currently uses loose objectContaining checks on the agent call (see restoreMock in repositories.service.test.ts) which allows leaked fields in the options payload; tighten the assertion for the options argument to either assert the exact shape (toEqual with the expected options object) or add expect.not.objectContaining({ targetAgentId: expect.anything(), targetPath: expect.anything() }) so options does not include targetAgentId/targetPath, and apply the same tighter assertion to the other similar assertion block (the one around lines 610-621) so the test fails if repositories.service.ts doesn't destructure those fields out.app/server/modules/agents/controller/server.ts (1)
27-29: 💤 Low valueType widening allows
agent.readyandheartbeat.pongto silently drop in the manager—clarify intent.
AgentManagerEventnow accepts allAgentMessagevariants, but the switch inagents-manager.tsonly handlesagent.disconnected, the backup-related cases (backup.started,backup.progress, etc.), andrestore.progress. The command-result variants (volume.commandResult,restore.commandResult) are resolved insession.tsand never reach the manager. However,agent.readyandheartbeat.pongare forwarded by the default branch insession.tsbut have no handler in the manager, so they silently drop. Nodefaultcase exists in the manager's switch, so unhandled types are silently ignored. This is functionally fine, but if exhaustive checking is added later, each new variant will require explicit handling or deliberate ignoring.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/server/modules/agents/controller/server.ts` around lines 27 - 29, AgentManagerEvent is currently widened to accept all AgentMessage variants (via AgentMessage combined with AgentEventContext), which lets messages like "agent.ready" and "heartbeat.pong" be forwarded from session.ts but silently dropped in the agents-manager.ts switch; either narrow AgentManagerEvent to only the intended variants (e.g., keep (AgentEventContext & { type: "agent.disconnected" }) and the explicit backup/restore variants) so the type system prevents unexpected messages, or keep the broad type and make agents-manager.ts explicitly handle/unhandle the extras (add explicit no-op cases for "agent.ready" and "heartbeat.pong" or add a single exhaustive default that calls an assertNever-like helper) so dropping is intentional and future exhaustiveness checks are safe; refer to the AgentManagerEvent type, AgentEventContext, AgentMessage, and the switch in agents-manager.ts and the default-forwarding in session.ts when applying the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/server/modules/repositories/__tests__/repositories.service.test.ts`:
- Around line 550-577: The test "uses controller-local restore fallback when
local agent supervision is disabled" may create the temp target under
process.cwd(), which can be inside blocked roots (e.g., os.tmpdir()); change the
temp directory creation in the test (where targetPath is created) to use a
known-safe temp root such as os.tmpdir() (e.g.,
mkdtemp(nodePath.join(os.tmpdir(), "restore-target-"))) so the path won't be
rejected by the restore target validation; ensure you import/use the os module
and keep the teardown (fs.rm) unchanged; references:
setupRestoreSnapshotScenario, repositoriesService.restoreSnapshot,
restic.restore, targetPath.
In `@app/server/modules/repositories/repositories.service.ts`:
- Around line 378-414: Destructure and remove targetAgentId and targetPath from
the options object before spreading it into downstream calls: when calling
restic.restore (inside the useControllerLocalRestoreFallback branch) and when
building the options for agentManager.runRestoreCommand (inside the agent
branch), do const { targetAgentId, targetPath, ...safeOptions } = options and
spread safeOptions instead of options so targetAgentId and targetPath are not
sent to restic.restore or into RestoreCommandPayload options.
In `@apps/agent/src/commands/restore.ts`:
- Around line 17-27: The check in assertAllowedRestoreTarget uses path.resolve
only and can be bypassed via symlinks; change it to canonicalize the target
against the nearest existing ancestor before calling isPathWithin. Specifically,
in assertAllowedRestoreTarget (and when calling
getBlockedRestoreTargets/isPathWithin), compute resolvedTarget =
path.resolve(target), walk up with path.dirname until you find an existing
ancestor (fs.existsSync), call fs.realpathSync on that ancestor to get
realAncestor, reconstruct canonicalTarget = path.join(realAncestor,
path.relative(existingAncestor, resolvedTarget)), then use
isPathWithin(blockedTarget, canonicalTarget) to perform the blocked-path check;
keep the same error throw on match.
- Around line 55-71: The outbound offer result is currently discarded; capture
the boolean returned by ControllerCommandContext.offerOutbound (e.g., const
queued = yield* context.offerOutbound(...)) for both success and error branches
and explicitly handle rejection by checking if (!queued) and failing/throwing an
error so the controller sees the rejected delivery (mirror the pattern used in
session.ts). Ensure you reference the
createAgentMessage("restore.commandResult", { commandId: payload.commandId,
status: "...", ... }) call and use toMessage(error) in the error branch, but
only proceed as successful if queued is true; if queued is false, return/fail
with an error indicating outbound queue rejection.
In `@packages/core/src/restic/commands/restore.ts`:
- Around line 134-146: The call to cleanupTemporaryKeys(env, deps) is currently
after awaiting safeSpawn({...}) so if safeSpawn rejects the temporary keys are
never removed; wrap the spawn step in a try/finally (or Effect.ensuring) so
cleanupTemporaryKeys(env, deps) always runs regardless of success or failure.
Specifically, ensure the call site that awaits safeSpawn({ command: "restic",
args, env, signal: options.signal, onStdout: ... }) is enclosed in a finally
block that invokes cleanupTemporaryKeys(env, deps), keeping the existing
onStdout/streamProgress behavior intact.
---
Nitpick comments:
In `@app/server/modules/agents/agents-manager.ts`:
- Around line 221-229: The current "restore.progress" case forwards
event.payload.organizationId/repositoryId/snapshotId via serverEvents.emit which
matches the controller-local shape, but to prevent potential cross-tenant
spoofing tighten the source of truth by deriving organizationId from the
connection context: look up the org for event.agentId (e.g., map agentId →
organization) instead of trusting event.payload.organizationId before calling
serverEvents.emit; keep repositoryId/snapshotId from payload unless you also
want to validate them against the agent's repo permissions.
In `@app/server/modules/agents/controller/server.ts`:
- Around line 27-29: AgentManagerEvent is currently widened to accept all
AgentMessage variants (via AgentMessage combined with AgentEventContext), which
lets messages like "agent.ready" and "heartbeat.pong" be forwarded from
session.ts but silently dropped in the agents-manager.ts switch; either narrow
AgentManagerEvent to only the intended variants (e.g., keep (AgentEventContext &
{ type: "agent.disconnected" }) and the explicit backup/restore variants) so the
type system prevents unexpected messages, or keep the broad type and make
agents-manager.ts explicitly handle/unhandle the extras (add explicit no-op
cases for "agent.ready" and "heartbeat.pong" or add a single exhaustive default
that calls an assertNever-like helper) so dropping is intentional and future
exhaustiveness checks are safe; refer to the AgentManagerEvent type,
AgentEventContext, AgentMessage, and the switch in agents-manager.ts and the
default-forwarding in session.ts when applying the change.
In `@app/server/modules/repositories/__tests__/repositories.service.test.ts`:
- Around line 486-498: The test currently uses loose objectContaining checks on
the agent call (see restoreMock in repositories.service.test.ts) which allows
leaked fields in the options payload; tighten the assertion for the options
argument to either assert the exact shape (toEqual with the expected options
object) or add expect.not.objectContaining({ targetAgentId: expect.anything(),
targetPath: expect.anything() }) so options does not include
targetAgentId/targetPath, and apply the same tighter assertion to the other
similar assertion block (the one around lines 610-621) so the test fails if
repositories.service.ts doesn't destructure those fields out.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d0806a08-758c-4b67-a943-345833bd4088
📒 Files selected for processing (17)
app/server/modules/agents/agents-manager.tsapp/server/modules/agents/controller/server.tsapp/server/modules/agents/controller/session.tsapp/server/modules/backups/backup.helpers.tsapp/server/modules/repositories/__tests__/repositories.service.test.tsapp/server/modules/repositories/repositories.dto.tsapp/server/modules/repositories/repositories.service.tsapps/agent/src/commands/__tests__/volume.test.tsapps/agent/src/commands/backup-run.tsapps/agent/src/commands/helpers/__tests__/backup.helpers.test.tsapps/agent/src/commands/helpers/backup.helpers.tsapps/agent/src/commands/index.tsapps/agent/src/commands/restore.tspackages/contracts/src/agent-protocol.tspackages/core/src/restic/commands/__tests__/restore.test.tspackages/core/src/restic/commands/restore.tspackages/core/src/restic/index.ts
| const assertAllowedRestoreTarget = (target: string) => { | ||
| const resolvedTarget = path.resolve(target); | ||
|
|
||
| for (const blockedTarget of getBlockedRestoreTargets()) { | ||
| if (isPathWithin(blockedTarget, resolvedTarget)) { | ||
| throw new Error( | ||
| "Restore target path is not allowed. Restoring to this path could overwrite critical system files or application data.", | ||
| ); | ||
| } | ||
| } | ||
| }; |
There was a problem hiding this comment.
Canonicalize the target before the blocked-path check.
path.resolve() only normalizes the string. A target like /restore/link/out, where link is a symlink into the repositories directory or the restic pass-file directory, will pass this guard and still restore into the blocked location. This needs a real-path check on the nearest existing ancestor before calling isPathWithin(...).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/agent/src/commands/restore.ts` around lines 17 - 27, The check in
assertAllowedRestoreTarget uses path.resolve only and can be bypassed via
symlinks; change it to canonicalize the target against the nearest existing
ancestor before calling isPathWithin. Specifically, in
assertAllowedRestoreTarget (and when calling
getBlockedRestoreTargets/isPathWithin), compute resolvedTarget =
path.resolve(target), walk up with path.dirname until you find an existing
ancestor (fs.existsSync), call fs.realpathSync on that ancestor to get
realAncestor, reconstruct canonicalTarget = path.join(realAncestor,
path.relative(existingAncestor, resolvedTarget)), then use
isPathWithin(blockedTarget, canonicalTarget) to perform the blocked-path check;
keep the same error throw on match.
| yield* context.offerOutbound( | ||
| createAgentMessage("restore.commandResult", { | ||
| commandId: payload.commandId, | ||
| status: "success", | ||
| result, | ||
| }), | ||
| ); | ||
| }).pipe( | ||
| Effect.catchAll((error) => | ||
| context.offerOutbound( | ||
| createAgentMessage("restore.commandResult", { | ||
| commandId: payload.commandId, | ||
| status: "error", | ||
| error: toMessage(error), | ||
| }), | ||
| ), | ||
| ), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== ControllerCommandContext definition =="
fd -i 'context.ts' apps/agent/src --exec sed -n '1,220p' {}
echo
echo "== offerOutbound declarations and usages =="
rg -n -C3 '\bofferOutbound\b' apps/agent/src app/server/modules/agentsRepository: nicotsx/zerobyte
Length of output: 13399
Handle queue rejection from offerOutbound to prevent silent loss of restore completion message.
ControllerCommandContext.offerOutbound returns Effect.Effect<boolean, never, never>, where false indicates the outbound queue rejected the message. The current code at lines 55–70 uses yield* to extract the boolean but discards it without checking, causing both success and failure branches to return without sending restore.commandResult. If the queue is full, the restoration completes but the controller never receives the response and waits until its 1-hour timeout.
Compare against similar handlers in the codebase (e.g., app/server/modules/agents/controller/session.ts lines 258–261) which explicitly check if (!queued) and raise an error. Apply the same pattern here: capture the boolean return and handle rejection explicitly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/agent/src/commands/restore.ts` around lines 55 - 71, The outbound offer
result is currently discarded; capture the boolean returned by
ControllerCommandContext.offerOutbound (e.g., const queued = yield*
context.offerOutbound(...)) for both success and error branches and explicitly
handle rejection by checking if (!queued) and failing/throwing an error so the
controller sees the rejected delivery (mirror the pattern used in session.ts).
Ensure you reference the createAgentMessage("restore.commandResult", {
commandId: payload.commandId, status: "...", ... }) call and use
toMessage(error) in the error branch, but only proceed as successful if queued
is true; if queued is false, return/fail with an error indicating outbound queue
rejection.
1015e3a to
9f3edd5
Compare
|
Related Knowledge 1 document with suggested updates is ready for review. Zerobyte's Space Repositories in Zerobyte: Types, Architecture, and OperationsView Suggested Changes@@ -359,7 +359,10 @@
- `exclude`: Array of paths to exclude (optional)
- `excludeXattr`: Array of extended attributes to exclude (optional)
- `targetPath`: Custom location for restore
+ - `targetAgentId`: ID of a specific remote agent to execute the restore operation (optional). When provided, the restore is executed by the specified agent. If omitted or the agent is unavailable, the restore falls back to local controller execution.
- `overwrite`: Overwrite mode for existing files (`'always'`, `'if-changed'`, `'if-newer'`, or `'never'`)
+ - Restore execution can be delegated to remote agents via the agent RPC system for improved performance and resource distribution
+ - Progress tracking and monitoring work the same way for users regardless of whether the restore is executed on the controller or a remote agent
- Single-file restore behavior: When restoring a single file to a custom location with `selectedItemKind: "file"`, the system uses the file's parent directory as the common ancestor for proper path resolution, ensuring the file is placed correctly at the target path
- The `RestoreForm` component accepts both `queryBasePath` and `displayBasePath` parameters for path handling:
- `queryBasePath`: The base path for API queries (scoped to snapshot contents) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f3edd59d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const getBlockedRestoreTargets = () => | ||
| [REPOSITORY_BASE, path.dirname(RESTIC_PASS_FILE), os.tmpdir()].map((target) => path.resolve(target)); |
There was a problem hiding this comment.
Restore full blocked-path denylist in agent validation
After this change, most restores are executed via agent RPC, so assertAllowedRestoreTarget is now the main path guard. The new denylist only includes repository root, restic password directory, and temp directory, which is weaker than the previous controller-side protection (e.g., RESTORE_BLOCKED_ROOTS, DB/provisioning-related paths). In the default local-agent flow, restores to sensitive paths like /app can now pass validation and overwrite runtime files.
Useful? React with 👍 / 👎.
| duration: "1 hour", | ||
| onTimeout: () => new Error(`Restore command ${payload.snapshotId} timed out`), |
There was a problem hiding this comment.
Avoid hard one-hour timeout for restore command completion
This introduces a fixed 1-hour timeout for waiting on restore.commandResult. Large repositories can legitimately take longer, so the controller will report failure and remove the pending command while the agent keeps restoring; when completion eventually arrives it is treated as an unknown command. That creates false error reporting and inconsistent restore state for long-running jobs.
Useful? React with 👍 / 👎.
9f3edd5 to
7ccbb0e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ccbb0e2e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| const basePath = hasNonPosixSnapshotPaths ? "/" : findCommonAncestor(paths); | ||
| const basePath = hasNonPosixSnapshotPaths ? "/" : findCommonAncestor(snapshot.paths); | ||
| const executionAgentId = options?.targetAgentId ?? LOCAL_AGENT_ID; |
There was a problem hiding this comment.
Verify target agent ownership before dispatching restores
When targetAgentId is supplied, this path uses the raw id without checking that the connected agent belongs to the current organizationId (or is the local agent). In a multi-organization deployment, any user who can obtain or guess another org's agent id can route a restore to that agent; the code below then sends the decrypted repository config and restic password and writes to the requested target path on that host. Please look up the agent and require agent.organizationId === organizationId before calling runRestoreCommand.
Useful? React with 👍 / 👎.
| excludeXattr: z.array(z.string()).optional(), | ||
| delete: z.boolean().optional(), | ||
| targetPath: z.string().optional(), | ||
| targetAgentId: z.string().optional(), |
There was a problem hiding this comment.
Regenerate restore API client for target agent selection
Adding targetAgentId here changes the restore request contract, but the committed generated client is still stale: app/client/api-client/types.gen.ts RestoreSnapshotData.body does not include targetAgentId. Any frontend or TypeScript consumer using the generated restoreSnapshot SDK cannot call this new agent-routing path without type errors, so the API client needs to be regenerated with this field.
Useful? React with 👍 / 👎.
|
|
||
| export const handleRestoreCommand = (context: ControllerCommandContext, payload: RestoreCommandPayload) => { | ||
| return Effect.gen(function* () { | ||
| assertAllowedRestoreTarget(payload.target); |
There was a problem hiding this comment.
Return restore validation errors instead of killing the agent loop
When an agent receives a restore whose target is blocked, assertAllowedRestoreTarget throws a synchronous exception inside Effect.gen. In Effect, thrown exceptions are defects rather than failures, so this bypasses the Effect.catchAll below, no restore.commandResult is sent, and the agent's command processor can die while the controller waits until its restore timeout. Wrap this validation in an Effect.try/Effect.fail path so blocked targets return the intended error result immediately.
Useful? React with 👍 / 👎.
| serverEvents.emit("restore:progress", { | ||
| organizationId: event.payload.organizationId, |
There was a problem hiding this comment.
Validate restore progress before rebroadcasting it
For remote agents, restore progress is rebroadcast using the organizationId, repository, and snapshot supplied by the agent message, with no check that the message belongs to a pending restore command or to the agent's own organization. A buggy or compromised connected agent can therefore inject misleading restore:progress SSE events into another organization simply by choosing that organization id in the payload; backup events avoid this by matching job/schedule/agent state before forwarding.
Useful? React with 👍 / 👎.

Summary by CodeRabbit
Release Notes
New Features
targetAgentIdparameterTests