@pjungkamp pjungkamp commented Jan 6, 2026

Motivation

I've been experiencing errors when attempting remote store builds:

$ nix build --eval-store auto --store ssh-ng://remote-store.example flake#output
error: unexpected end-of-file
error: cannot open connection to remote store 'ssh-ng://remote-store.example': error: protocol mismatch, got 'started
              oixd

This only happens with DeterminateSystems/nix-src, not with upstream Nix.

The underlying issue is my default ControlMaster/ControlPath configuration for SSH connection sharing.

I've looked into the SSHMaster connection code in Nix and identified some oddities. SSHMaster::isMasterRunning doesn't specify the control socket that needs to be checked and thus falls back to the one that I've configured, the lifetime of which is not managed by Nix. Nix shouldn't try to interact with those sockets because checking for such a master's existence and then acting on it is inherently racy.

Context

I fixed the SSHMaster::isMasterRunning function by adding a socketPath parameter. This function should only be called when useMaster is true, which I've now checked by assert.

This then caught another bug: The condition !useMaster && !isMasterRunning() in SSHMaster::startCommand is nonsensical; it will only check isMasterRunning if useMaster is false.

I also ensured that Nix explicitly disables connection sharing when SSHMaster::useMaster is false by adding -S none.

Summary by CodeRabbit

  • Refactor
    • Improved SSH master management to consistently use a specified socket path for control connections.
    • Startup and readiness checks updated for more reliable detection of existing master processes.
    • Non-master mode now reports "none" instead of an empty value for clearer status reporting.

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

isMasterRunning and addCommonSSHOpts now accept an explicit Path socketPath and all call sites were updated to pass it. startMaster returns the literal string "none" when SSH master mode is disabled instead of an empty string; the SSH -S socketPath option is applied consistently.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| SSH Master Declaration: src/libstore/include/nix/store/ssh.hh | isMasterRunning signature updated to isMasterRunning(Path socketPath) to expose socket-path handling in the header. |
| SSH Master Implementation: src/libstore/ssh.cc | Added addCommonSSHOpts(Strings & args, Path socketPath); made isMasterRunning socketPath-aware and updated callers (startCommand, startMaster, readiness checks) to pass socketPath; non-master return changed from "" to "none"; -S socketPath applied consistently. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • edolstra

Poem

🐰 I hop through sockets, tidy and spry,
Passing paths so masters know why,
No empty echoes, just "none" to sing,
SSH lines hum, a clearer spring,
Tiny hops, a tidy tie. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'libstore: Improve SSH connection sharing' accurately summarizes the main change: enhancing SSH master management to handle socket paths explicitly and prevent race conditions. |


@pjungkamp
Author

What is this CodeRabbit "Docstring Coverage" warning supposed to tell me?

@pjungkamp
Author

I'm also not satisfied with CodeRabbit's summary and review time estimate.

I want to draw special attention to the modified condition here:

https://github.com/DeterminateSystems/nix-src/pull/311/files#diff-18b83d56d8285fe6a571671edbcb5c7ec0d708856d5e0119710ea76fd8d76988R210

The summary didn't explicitly note this change, which in my opinion would need the most scrutiny.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/libstore/ssh.cc (2)

119-128: Add a comment explaining the precondition.

The assert(useMaster) enforces an important precondition, but lacks explanation. Consider adding a comment to clarify why isMasterRunning should only be called when Nix is managing the SSH master.

📝 Suggested comment
 bool SSHMaster::isMasterRunning(Path socketPath)
 {
+    // This function should only be called when useMaster is true,
+    // as checking master status is only meaningful when Nix manages the SSH master.
     assert(useMaster);

235-235: Consider defining a constant for the "none" socket path.

Returning the literal string "none" is correct (SSH's -S none disables connection sharing), but using a magic string reduces clarity. Consider defining a named constant.

🔧 Suggested refactor

At the top of the file or in the class definition:

static constexpr const char * NO_SSH_MASTER_SOCKET = "none";

Then use it here:

     if (!useMaster)
-        return "none";
+        return NO_SSH_MASTER_SOCKET;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2e2fe1 and e5f6bb9.

📒 Files selected for processing (2)
  • src/libstore/include/nix/store/ssh.hh
  • src/libstore/ssh.cc
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/libstore/include/nix/store/ssh.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/libstore/ssh.cc (1)
src/libstore/include/nix/store/ssh.hh (2)
  • args (42-42)
  • socketPath (43-43)
🔇 Additional comments (6)
src/libstore/ssh.cc (6)

92-117: LGTM: Socket path is now explicitly controlled.

The addition of the socketPath parameter and the -S option ensures that SSH uses the correct control socket (or "none" to disable sharing). This prevents Nix from inadvertently probing or interfering with user-managed SSH control sockets.


156-156: LGTM: Consistent socket path handling.

Calling startMaster() unconditionally is correct—it returns "none" when not using master mode, which is then properly handled throughout the function.


189-189: LGTM: Socket path correctly propagated.

The socketPath obtained from startMaster() is correctly passed to addCommonSSHOpts, ensuring the -S option uses the appropriate control socket or "none".


252-252: LGTM: Assert precondition is satisfied here.

At this point in startMaster(), the early return on line 235 ensures useMaster is true, so the assert(useMaster) on line 121 will pass. The call is safe.


267-267: LGTM: Correct socket path used for master startup.

Passing state->socketPath to addCommonSSHOpts ensures the SSH master process is started with the correct control socket path via the -S option.


210-210: Verify the corrected logic prevents probing user-managed sockets.

This is the critical fix mentioned in the PR description. The new condition !(useMaster && isMasterRunning(socketPath)) ensures that:

  1. isMasterRunning is only called when useMaster is true (via short-circuit evaluation), respecting the assert on line 121
  2. Nix no longer probes SSH sockets when useMaster is false, preventing interference with user-managed SSH ControlMaster configurations

Both call sites of isMasterRunning are properly guarded: line 210 by the short-circuiting && and line 252 by the early return in startMaster() (if (!useMaster) return "none"; at line 235). The logic is correct, though the negated conjunction could be rewritten via De Morgan's law as the equivalent !useMaster || !isMasterRunning(socketPath) if that reads more clearly.
