
feat(fleetnode): carry concurrent commands over the ControlStream #389

Draft
ankitgoswami wants to merge 1 commit into main from fleetnode-commands


@ankitgoswami
Contributor

Summary

Groundwork for sending commands to miners paired via fleet nodes. Today the
fleet-node ControlStream carries exactly one thing: server-initiated discovery
(#235). This PR generalizes the control plane so it can carry many concurrent
commands per node, which is the prerequisite for routing operator miner commands
(reboot, curtail, …) down a node's stream via RFC 0001's remote-node Miner
adapter (a later PR).

No behavior change to discovery. This is Phase 1 of a 3-phase effort; it ships
no user-visible capability on its own.

What changed

  • Typed AgentCommand envelope (proto/pairing/v1): the ControlCommand.payload
    is now a pairing.v1.AgentCommand oneof so the node can tell command kinds apart.
    Discovery migrates into the discover arm (done exactly once, on both the admin
    send side and the node decode side). Fields 2/3 are left for the upcoming
    miner-command and pairing arms, coordinated with the parallel pairing effort so the
    discovery payload is never migrated twice.
  • Registry (internal/domain/fleetnode/control): replaces the
    single-in-flight-command-per-node model with a command_id-keyed map.
    Report-bearing commands (discovery, and later pairing) stream batches + a terminal
    ack on an events channel and admit device reports against the existing scope/quota;
    ack-only commands (per-miner commands) take their terminal ack on a per-command
    channel via a new blocking SendCommand. The outbound queue is buffered, and teardown
    frees every in-flight command. Channel-ownership invariants are preserved.
  • Node (cmd/fleetnode/control.go): replaces the single command worker with a
    bounded (16) worker pool, so a long (up to 10-min) discovery no longer
    head-of-line-blocks quick commands and many commands run concurrently; BUSY is acked
    only at the pool ceiling. handleCommand dispatches on envelope kind.
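
The bounded pool behaviour described in the Node bullet can be sketched roughly as follows. This is a minimal illustration, not the code in cmd/fleetnode/control.go: `Pool`, `TrySubmit`, `AckStatus`, and the ack constants are hypothetical names, and the pool size is a parameter rather than the PR's fixed 16.

```go
package main

import "sync"

// AckStatus is a hypothetical stand-in for the node's ack codes.
type AckStatus string

const (
	AckAccepted AckStatus = "ACCEPTED"
	AckBusy     AckStatus = "BUSY"
)

// Pool runs at most cap(slots) commands concurrently (hypothetical type).
type Pool struct {
	slots chan struct{} // counting semaphore; one token per running command
	wg    sync.WaitGroup
}

// NewPool returns a pool with the given concurrency ceiling.
func NewPool(size int) *Pool {
	return &Pool{slots: make(chan struct{}, size)}
}

// TrySubmit starts work on its own goroutine when a slot is free.
// It never blocks: at the ceiling it returns AckBusy immediately, so a
// long-running command cannot stall the loop that reads the stream.
func (p *Pool) TrySubmit(work func()) AckStatus {
	select {
	case p.slots <- struct{}{}:
	default:
		return AckBusy
	}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		defer func() { <-p.slots }() // free the slot before signalling Done
		work()
	}()
	return AckAccepted
}

// Wait blocks until every accepted command has finished.
func (p *Pool) Wait() { p.wg.Wait() }
```

Because the slot is taken synchronously inside `TrySubmit`, the BUSY decision is made before any goroutine starts, which keeps the ceiling exact under concurrent submissions.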

Test plan

  • go test -race ./internal/domain/fleetnode/control/... ./cmd/fleetnode/... — green.
    New/updated coverage: many concurrent commands not rejected, ack-routing by kind,
    SendCommand block/disconnect/ctx-cancel/teardown, node worker pool runs a quick
    command while a long discovery is in flight, BUSY past the pool ceiling, unknown
    envelope kind → BAD_REQUEST.
  • DB-backed DiscoverOnFleetNode integration suite — green (end-to-end envelope
    round-trip through the real handler + registry).
  • Full server build, golangci-lint, and pre-push tsc — clean.

Follow-ups (later PRs)

  • Phase 2: MinerCommand proto + remote-node Miner adapter + fleet_node_device
    resolution branch + node-side executor (end-to-end for the virtual driver).
  • Phase 3: firmware/logs out-of-band transfer, password credential-edit flow, unpair.

🤖 Generated with Claude Code

Groundwork for routing operator miner commands to fleet-node-paired miners
(RFC 0001's remote-node Miner adapter). No behavior change to discovery.

- Wrap the ControlStream payload in a typed pairing.v1.AgentCommand envelope so
  the node can tell command kinds apart; discovery migrates into the `discover`
  arm. Fields 2 and 3 are left for the upcoming miner-command and pairing arms
  so the discovery payload is migrated exactly once.
- Registry: replace the single-in-flight-command-per-node model with a
  command_id-keyed map. Report-bearing commands (discovery, later pairing)
  stream batches + a terminal ack on an events channel; ack-only commands take
  the terminal ack on a per-command channel via a new blocking SendCommand. The
  outbound queue is buffered and teardown frees every in-flight command.
- Node: replace the single command worker with a bounded worker pool, so a long
  discovery no longer head-of-line-blocks quick commands; BUSY is acked only at
  the pool ceiling. handleCommand dispatches on envelope kind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the javascript, client, server, and shared labels Jun 4, 2026
@github-actions

github-actions Bot commented Jun 4, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (8e49c1617ac0e3702f635762250bf8f414a3d9fa...f1ad3dc175e508e9f088fa83780c123127817de4, exact PR three-dot diff)
  • Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.


Review Summary

Overall Risk: HIGH

Findings

[HIGH] ControlCommand payload change breaks mixed-version fleet nodes

  • Category: Protobuf
  • Location: server/internal/handlers/fleetnode/admin/handler.go:206
  • Description: Discovery commands are now always marshaled as pairing.v1.AgentCommand, while the agent now only decodes AgentCommand in server/cmd/fleetnode/control.go. Existing fleet nodes expect the payload to be a bare DiscoverRequest, so upgrading the server before every node will make remote discovery fail. The reverse mixed-version path is also unsafe because a new agent decoding an old bare DiscoverRequest as AgentCommand will either see no recognized command or misparse field 1.
  • Impact: Rolling upgrades can break fleet-node discovery across deployed agents, returning BAD_REQUEST-style failures until every server and node binary is upgraded together.
  • Recommendation: Add an explicit migration path: include a protocol/capability version in the control handshake and select the payload format per node, or make the agent accept both AgentCommand and legacy bare DiscoverRequest during the transition.

[MEDIUM] In-flight command map is unbounded before enqueue succeeds

  • Category: Reliability
  • Location: server/internal/domain/fleetnode/control/registry.go:161
  • Description: addCmd inserts every command into conn.cmds before Registry.Send or SendCommand successfully enqueues it to the bounded outgoing channel. Once outgoingBuffer fills, additional RPC handlers can continue adding unique command IDs and then block until context timeout.
  • Impact: A burst of DiscoverOnFleetNode requests against a slow or non-reading node can accumulate unbounded in-flight command entries, event channels, goroutines, and operator streams for up to DiscoverCommandTimeout.
  • Recommendation: Enforce a per-node queued/in-flight cap before inserting into conn.cmds, or enqueue first and only register the command once dispatch is guaranteed. Return ResourceExhausted/BUSY when the server-side cap is hit.

[MEDIUM] Discovery commands can now saturate the fleet node worker pool

  • Category: Reliability
  • Location: server/cmd/fleetnode/control.go:181
  • Description: The agent changed from one serialized command to a 16-slot worker pool, but AgentCommand currently only defines discovery work. That lets multiple full discovery scans run concurrently on one node, each with its own probe fanout or nmap invocation.
  • Impact: An authorized user can unintentionally or intentionally drive much higher CPU, network, and plugin load on the fleet node and local miner network than before.
  • Recommendation: Keep discovery single-flight or give discovery a much smaller dedicated semaphore, reserving the wider pool for future short per-miner commands.
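
A dedicated discovery semaphore layered on the shared pool could be sketched as below. The names `Limiter`, `Kind`, `KindDiscover`, and `KindMiner` are hypothetical, and the miner kind stands in for the per-miner commands that the PR defers to a later phase.

```go
package main

// Kind is a hypothetical command kind; only discovery exists in this PR.
type Kind int

const (
	KindDiscover Kind = iota
	KindMiner         // placeholder for future short per-miner commands
)

// Limiter caps total concurrency and, separately, concurrent discoveries.
type Limiter struct {
	all      chan struct{} // shared pool slots
	discover chan struct{} // smaller budget reserved for discovery
}

func NewLimiter(total, maxDiscover int) *Limiter {
	return &Limiter{
		all:      make(chan struct{}, total),
		discover: make(chan struct{}, maxDiscover),
	}
}

// TryAcquire reserves capacity without blocking. On success the caller
// must invoke release exactly once; ok=false means the node should ack BUSY.
func (l *Limiter) TryAcquire(k Kind) (release func(), ok bool) {
	if k == KindDiscover {
		select {
		case l.discover <- struct{}{}:
		default:
			return nil, false // discovery budget exhausted
		}
	}
	select {
	case l.all <- struct{}{}:
	default:
		if k == KindDiscover {
			<-l.discover // roll back the discovery token
		}
		return nil, false // shared pool exhausted
	}
	return func() {
		<-l.all
		if k == KindDiscover {
			<-l.discover
		}
	}, true
}
```

With `NewLimiter(16, 1)` discovery stays single-flight while up to 15 other commands can still run beside it; raising the second argument trades scan throughput against load on the node and the miner network.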

Notes

No cryptostealing, pool hijacking, SQL injection, auth bypass, or frontend XSS issues were evident in the reviewed diff.


Generated by Codex Security Review | Triggered by: @ankitgoswami | Review workflow run
