Host DiskStats produces quite different list of output to Linux "df" #20247

Closed
tomqwpl opened this issue Mar 28, 2024 · 3 comments

Comments


tomqwpl commented Mar 28, 2024

Nomad version

Output from nomad version 1.7.5

Operating system and Environment details

Linux

Issue

The list of filesystems you get out of the DiskStats in HostStats is quite different to what you appear to get from df.

Here's an example from a Linux VM

df:
Filesystem                 1K-blocks     Used Available Use% Mounted on
devtmpfs                     8036992        0   8036992   0% /dev
tmpfs                        8055772     1084   8054688   1% /dev/shm
tmpfs                        8055772   826036   7229736  11% /run
tmpfs                        8055772        0   8055772   0% /sys/fs/cgroup
/dev/mapper/almalinux-root  28297216 17430160  10867056  62% /
/dev/sda1                    1038336   306268    732068  30% /boot
tmpfs                        1611152        0   1611152   0% /run/user/3203345
tmpfs                        1611152        0   1611152   0% /run/user/3201116

nomad lists /, /boot, /opt/sentinelone/rpm_mount

Another example, this time from WSL:

nomad lists: 
/mnt/wsl/docker-desktop-data/isocache
/mnt/wsl/docker-desktop/docker-desktop-user-distro
/mnt/wsl/docker-desktop/cli-tools
/
/mnt/wslg/distro

df:
Filesystem                                1K-blocks       Used  Available Use% Mounted on
none                                       16354108          4   16354104   1% /mnt/wsl
/dev/sdd                                 1055762868  393449968  608609428  40% /mnt/wsl/docker-desktop-data/isocache
none                                       16354108          8   16354100   1% /mnt/wsl/docker-desktop/shared-sockets/host-services
/dev/sdc                                 1055762868      62620 1001996776   1% /mnt/wsl/docker-desktop/docker-desktop-user-distro
/dev/loop0                                   468724     468724          0 100% /mnt/wsl/docker-desktop/cli-tools
none                                     1999902716 1389829136  610073580  70% /usr/lib/wsl/drivers
none                                       16354108          0   16354108   0% /usr/lib/modules
none                                       16354108          0   16354108   0% /usr/lib/modules/5.15.146.1-microsoft-standard-WSL2
/dev/sde                                 1055762868   57937784  944121612   6% /
none                                       16354108        160   16353948   1% /mnt/wslg
none                                       16354108          0   16354108   0% /usr/lib/wsl/lib
rootfs                                     16350852       1884   16348968   1% /init
none                                       16350852          0   16350852   0% /dev
none                                       16354108          4   16354104   1% /run
none                                       16354108          0   16354108   0% /run/lock
none                                       16354108          0   16354108   0% /run/shm
none                                       16354108          0   16354108   0% /run/user
tmpfs                                      16354108          0   16354108   0% /sys/fs/cgroup
none                                       16354108         76   16354032   1% /mnt/wslg/versions.txt
none                                       16354108         76   16354032   1% /mnt/wslg/doc
C:\                                      1999902716 1389829136  610073580  70% /mnt/c
none                                       16354108        152   16353956   1% /mnt/wsl/docker-desktop-bind-mounts/AlmaLinux9/1e3f10efd5b66e89511d4ace575ae6f00a11f07d7c405a94977aebc6893fa00c
C:\Program Files\Docker\Docker\resources 1999902716 1389829136  610073580  70% /Docker/host

Finally, inside a docker container:

df:
Filesystem      1K-blocks       Used Available Use% Mounted on
overlay        1055762868  393449828 608609568  40% /
tmpfs               65536          0     65536   0% /dev
tmpfs            16354108          0  16354108   0% /sys/fs/cgroup
shm                 65536         32     65504   1% /dev/shm
C:\            1999902716 1389696728 610205988  70% /home
/dev/sdd       1055762868  393449828 608609568  40% /etc/hosts
tmpfs            16354108       8320  16345788   1% /run

nomad lists:
/etc/resolv.conf
/etc/hostname
/etc/hosts

I appreciate that this information all appears to be provided by the underlying shirou/gopsutil library, but clearly what it's doing isn't right here.

Reproduction steps

Make a request for the node stats and compare the list of filesystems with what you get from df.
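The comparison can be scripted. The following is a minimal sketch, assuming the node stats response (for example from the /v1/client/stats endpoint) is JSON with a DiskStats array whose entries carry a Mountpoint field, and that df was run without extra options so the mount point is the last column. The function names are hypothetical and exist only for this illustration.

```python
import json


def df_mountpoints(df_output):
    """Return the set of 'Mounted on' values from plain `df` output."""
    mounts = set()
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6:
            # Source names may contain spaces (e.g. "C:\Program Files\..."),
            # so take the mount point from the end of the row.
            # Mount points that themselves contain spaces are not handled.
            mounts.add(fields[-1])
    return mounts


def nomad_mountpoints(host_stats_json):
    """Return the mount points listed under DiskStats (assumed field names)."""
    stats = json.loads(host_stats_json)
    return {disk["Mountpoint"] for disk in stats.get("DiskStats") or []}


def compare(df_output, host_stats_json):
    """Group mount points by which of the two sources reports them."""
    df_set = df_mountpoints(df_output)
    nomad_set = nomad_mountpoints(host_stats_json)
    return {
        "only_in_df": sorted(df_set - nomad_set),
        "only_in_nomad": sorted(nomad_set - df_set),
        "in_both": sorted(df_set & nomad_set),
    }
```

Feeding it the saved output of df and the saved JSON response from the same node gives the three lists side by side.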

Expected Result

Output to list the same filesystems as "df" on Linux

Actual Result

A different list.

Job file (if appropriate)

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Apr 1, 2024
jrasell (Member) commented Apr 4, 2024

Hi @tomqwpl and thanks for raising this issue. Nomad populates this data by listing physical devices only and ignoring all others, such as memory partitions. Toggling this behaviour in the code would be trivial; however, the size of the returned object and the number of metrics emitted per host would both increase significantly.

I believe a similar command to use would be findmnt -D -t nosquashfs,notmpfs,nodevtmpfs,overlay which would mimic Nomad's current behaviour. Do you have any use cases to modify this behaviour, apart from the output listing matching the df output? If so, it's something I can quickly raise internally to get some additional feedback.

tomqwpl (Author) commented Apr 4, 2024

@jrasell The list is populated using the underlying library, whose name escapes me currently. My issue really is with the implementation of the underlying library, and I've raised an issue at that level too.
Nomad reports disk usage and I was surprised by the list of partitions it returned. Naturally I compared the result to the output of "df", which gives a completely different list. The filtering that df uses is completely different to what the underlying library uses. The main filtering that "df" appears to do is to drop filesystems whose size is zero. I believe it then removes duplicate mounts of the same underlying filesystem. The underlying library appears to filter based on whether the filesystem type has a record in the /proc/filesystems file and whether that record is marked as "nodev".
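
The library-side rule described above can be sketched roughly as follows. This is an illustration of the described behaviour, not the library's actual code; it assumes the usual layouts of /proc/filesystems (an optional "nodev" marker followed by the type name) and /proc/mounts (device, mount point and type as the first three fields). The function names are hypothetical.

```python
def device_backed_fstypes(proc_filesystems):
    """Return filesystem types that are listed WITHOUT the 'nodev' marker."""
    types = set()
    for line in proc_filesystems.splitlines():
        fields = line.split()
        if not fields or fields[0] == "nodev":
            continue  # blank line, or a type that needs no block device
        types.add(fields[-1])
    return types


def filter_mounts(proc_mounts, allowed_types):
    """Keep mount points whose filesystem type is in allowed_types."""
    kept = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a well-formed mount entry
        _device, mountpoint, fstype = fields[:3]
        if fstype in allowed_types:
            kept.append(mountpoint)
    return kept
```

Under such a rule, tmpfs, devtmpfs and overlay mounts are all dropped because those types are marked "nodev", while a bind mount of a single file from a disk-backed filesystem is kept. That would be consistent with the container example, where / is missing and /etc/hosts is listed.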

I want to display disk usage of the client nodes to the end user, and so the unfiltered list would not be what I would want.
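
For display purposes, the df-style filtering I described could be applied on top of an unfiltered list. A minimal sketch, assuming each mount comes with its total size and a numeric device ID (for example st_dev from a stat call on the mount point); the Mount type and its fields are hypothetical, and this follows my reading of what df appears to do rather than its actual source.

```python
from typing import NamedTuple


class Mount(NamedTuple):
    source: str      # e.g. "/dev/sda1" or "tmpfs"
    mountpoint: str  # e.g. "/boot"
    dev_id: int      # device ID of the mounted filesystem (assumed available)
    size_bytes: int  # total size of the filesystem


def df_like_filter(mounts):
    """Drop zero-size filesystems, then keep one mount per device ID."""
    seen_devices = set()
    kept = []
    for mount in mounts:
        if mount.size_bytes == 0:
            continue  # pseudo filesystems such as proc report no size
        if mount.dev_id in seen_devices:
            continue  # another mount of a filesystem already listed
        seen_devices.add(mount.dev_id)
        kept.append(mount)
    return kept
```

Deduplication uses the device ID rather than the source name, because separate tmpfs instances all share the source name "tmpfs" yet are distinct filesystems. The first mount seen for a device wins here, which is a simplification.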

Any change here has to come from making use of a different version of the underlying library, making use of some currently unimplemented option in the underlying library, making use of a different underlying library that would do a different thing etc. So right now I'm not expecting any action at this level, it is more for information at this point.

jrasell (Member) commented Apr 4, 2024

Hi @tomqwpl and thanks for the response. The library is gopsutil, where your issue regarding disk listing is currently open. I'll keep an eye on that issue from the Nomad side for any future changes we would need to be aware of.

> making use of a different version of the underlying library

gopsutil is heavily used in Nomad and is not something we would be able to swap out without heavy engineering investment, even if there were a viable and well-tested alternative.

> So right now I'm not expecting any action at this level, it is more for information at this point.

OK, we appreciate the detail, and this issue will be searchable by other users and engineers if this gets raised or discussed in the future. I'll close it out seeing as it was intended for information but, as I mentioned, I will keep an eye on the gopsutil issue.

jrasell closed this as not planned Apr 4, 2024
Nomad - Community Issues Triage automation moved this from Needs Triage to Done Apr 4, 2024