
@mismithhisler (Member) commented Oct 30, 2025

Description

Storage fingerprinting used the available disk space reported by the df command. Because that value already excludes space written by existing allocations, disk space calculations were incorrect whenever the fingerprinting client had existing allocations that had written to disk.
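A minimal sketch of the double counting, using hypothetical numbers. It assumes the scheduler subtracts the disk reserved by existing allocations from the fingerprinted capacity, and that those allocations have written exactly what they reserved. The constants and function names are illustrative only and do not correspond to Nomad code.

```go
package main

import "fmt"

// Hypothetical numbers for illustration; they are not taken from the PR.
const (
	totalMB   = 100_000 // total size of the client's data volume
	writtenMB = 30_000  // space already written to disk by running allocations
)

// allocatedMB is the disk the scheduler reserves for those same allocations.
// Assumption: the allocations have written exactly what they reserved.
const allocatedMB = writtenMB

// capacityFromFree models the old behavior: capacity is the free space
// reported by df, which already excludes writtenMB.
func capacityFromFree() int { return totalMB - writtenMB }

// capacityFromTotal models the new behavior: capacity is the volume's total size.
func capacityFromTotal() int { return totalMB }

// schedulable is what remains for new allocations after subtracting the
// disk already reserved by existing allocations.
func schedulable(capacity int) int { return capacity - allocatedMB }

func main() {
	// Old: 100000 - 30000 (df) - 30000 (reserved) = 40000; the 30000 MB counts twice.
	fmt.Println("old:", schedulable(capacityFromFree()))
	// New: 100000 - 30000 (reserved) = 70000.
	fmt.Println("new:", schedulable(capacityFromTotal()))
}
```

Under these assumptions, the old fingerprint hides 30,000 MB of space that is actually accounted for by allocations, so the node looks fuller than it is and can appear exhausted after a client restart.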

Testing & Reproduction steps

Links

Fixes GH-24914, GH-6172

Contributor Checklist

  • Changelog Entry: If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation: If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels: Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type: Ensure the correct merge method is selected, which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

// SPDX-License-Identifier: BUSL-1.1

- // MACHINE GENERATED BY 'go generate' COMMAND; DO NOT EDIT
+ // Code generated by 'go generate'; DO NOT EDIT.
@mismithhisler (Member Author), Oct 30, 2025


This code was using deprecated functions, so I regenerated it. I tested it on a Windows machine to confirm it works correctly.


resp.AddAttribute("unique.storage.volume", volume)
resp.AddAttribute("unique.storage.bytestotal", strconv.FormatUint(total, 10))
resp.AddAttribute("unique.storage.bytesfree", strconv.FormatUint(free, 10))
@mismithhisler (Member Author), Oct 30, 2025


I wonder whether we should remove the unique.storage.bytesfree attribute.

Member


Yeah, let's go ahead and do that.

Member Author


I'm now wondering whether it makes sense to remove disk_free_mb entirely. With this change, the setting no longer makes sense for the fingerprint: it would effectively become a second override for total disk space, which disk_total_mb already provides.

Member


Makes sense. We can deprecate it, but if you remove it outright, I'm fairly sure users will hit config validation errors during upgrade, so we'll have to remove it in a later release.

Member Author


Added a warning to the client config finalizer for users.
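The deprecate-then-remove approach above can be sketched as a finalizer that still accepts the old field, so existing configs keep validating on upgrade, but surfaces a warning. This is a minimal sketch: ClientConfig, finalizeDiskConfig, and the warning text are hypothetical stand-ins, not Nomad's actual types, functions, or messages.

```go
package main

import (
	"fmt"
	"log"
)

// ClientConfig is a hypothetical, trimmed-down stand-in for the client
// config fields discussed in this PR; it is not Nomad's actual type.
type ClientConfig struct {
	DiskTotalMB int // operator override for total disk space
	DiskFreeMB  int // deprecated: no longer meaningful for the fingerprint
}

// finalizeDiskConfig is a hypothetical finalizer. It does not reject
// disk_free_mb (that would break upgrades); it only returns warnings.
func finalizeDiskConfig(cfg *ClientConfig) []string {
	var warnings []string
	if cfg.DiskFreeMB > 0 {
		warnings = append(warnings, fmt.Sprintf(
			"disk_free_mb (%d) is deprecated and will be removed in a future release",
			cfg.DiskFreeMB))
	}
	return warnings
}

func main() {
	cfg := &ClientConfig{DiskFreeMB: 2048}
	for _, w := range finalizeDiskConfig(cfg) {
		log.Println("[WARN]", w)
	}
}
```

Returning warnings instead of an error keeps the upgrade path working while still telling operators to drop the setting before it is removed.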

return fmt.Errorf("failed to determine disk space for %s: %v", storageDir, err)
}

if cfg.DiskTotalMB > 0 {
Member Author


I wonder whether we should deprecate this field as well; it was added in the same PR as DiskFreeMB. I think the only two use cases for it are when we can't detect the client's total disk space (which I think would be a Nomad bug) or when a user wants to "overprovision" storage for scheduling purposes.
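The two use cases above both reduce to one rule: an operator-supplied total wins over the detected volume size. A minimal sketch follows. storageBytesTotal is a hypothetical helper, and the 1024 * 1024 bytes-per-MB conversion is an assumption of this sketch, not confirmed from Nomad's fingerprint code.

```go
package main

import (
	"fmt"
	"strconv"
)

// bytesPerMB assumes binary megabytes; whether Nomad uses this exact
// conversion for disk_total_mb is an assumption of this sketch.
const bytesPerMB = 1024 * 1024

// storageBytesTotal is a hypothetical helper: a positive diskTotalMB
// overrides the detected size, covering both "detection failed" and
// "overprovision for scheduling".
func storageBytesTotal(detectedBytes uint64, diskTotalMB int) uint64 {
	if diskTotalMB > 0 {
		return uint64(diskTotalMB) * bytesPerMB
	}
	return detectedBytes
}

func main() {
	detected := uint64(500 * bytesPerMB) // hypothetical 500 MB volume

	// No override: report the detected size, formatted like the
	// unique.storage.bytestotal attribute value.
	fmt.Println(strconv.FormatUint(storageBytesTotal(detected, 0), 10))

	// Override larger than the volume: the client advertises 1000 MB,
	// i.e. storage is overprovisioned for scheduling.
	fmt.Println(strconv.FormatUint(storageBytesTotal(detected, 1000), 10))
}
```

If detection is reliable, only the overprovisioning case remains, which is the argument for deprecating the field.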

@aimeeu (Contributor) left a comment


Thanks for the docs update. I left some suggestions.

tgross previously approved these changes Nov 4, 2025
@tgross (Member) left a comment


This looks good. Do we need to make any changes to the CLI output for nomad node status to account for these changes?



@mismithhisler (Member Author)

@tgross

Do we need to make any changes to the CLI output for nomad node status to account for these changes?

We aren't adding or removing any values from nomad node status; we're only changing the denominator for Allocated Resources -> Disk, so we should be good to go there.


Development

Successfully merging this pull request may close these issues.

Disk exhausted after upgrade 1.5.6-1.9.5

3 participants