stress-ng version: stress-ng, version 0.18.07 (gcc 11.4.0, aarch64 Linux 5.15.148-tegra) 💻🔥
Steps to reproduce:
Install stress-ng from the PPA: ppa:colin-king/stress-ng
Mount the drive and create the temp workdir /mnt/nvme0n1p1/bg-temp/: sudo mkdir -p /mnt/nvme0n1p1/ && sudo mount /dev/nvme0n1p1 /mnt/nvme0n1p1 && mkdir /mnt/nvme0n1p1/bg-temp/
Expected result:
Test finishes within the 10-second timeout.
Actual result:
Test hangs indefinitely.
$ sudo stress-ng --aggressive --verify --timeout 10 --temp-path /mnt/nvme0n1p1/bg-temp/ --hdd-opts dsync --readahead-bytes 16M -k --chmod 0
stress-ng: info: [2130] setting to a 10 secs run per stressor
stress-ng: info: [2130] dispatching hogs: 12 chmod
^C^Z
Probable root cause:
The command probably hangs because it uses CPU nodes that are offline. The failure-case log shows 12 chmod instances, whereas according to jetson_clocks only 8 CPU nodes are currently online.
Making all 12 CPU nodes available with sudo nvpmodel -m 3 makes the above command work as expected. (nvpmodel is an NVIDIA utility for switching between power modes; by default we are in power-saving mode, which explains why some CPUs are offline.)
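The online/offline split can also be checked without jetson_clocks by reading the kernel's sysfs CPU lists. The following is a minimal sketch, assuming the standard Linux files /sys/devices/system/cpu/present and /sys/devices/system/cpu/online are readable on this board; the helper names parse_cpu_list and read_cpu_list are hypothetical and not part of stress-ng or any NVIDIA tool.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs in that set
        lo, _, hi = part.partition("-")
        # A single CPU ("5") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def read_cpu_list(name):
    """Read one of the sysfs CPU lists; returns None if the file is unreadable."""
    try:
        return parse_cpu_list((SYSFS_CPU / name).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    present = read_cpu_list("present")
    online = read_cpu_list("online")
    print("present:", present)
    print("online: ", online)
    if present is not None and online is not None:
        print("offline:", sorted(set(present) - set(online)))
```

If the "offline" list is non-empty while stress-ng dispatches one instance per configured CPU, that would be consistent with the mismatch described above, though it does not by itself prove that this is what causes the hang.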
Using --chmod -1 will select just the number of online CPUs rather than the total number of CPUs configured in the system. Using --vmstat 1 will show system activity, so you can see whether it is still doing I/O after the 10 seconds. Using --klog-check will dump any kernel errors found in the kernel log. The -v option will show stress-ng activity in verbose mode.
The manual states:
"One can specify the number of processes to invoke per type of stress test; specifying a zero value will select the number of processors available as defined by sysconf(_SC_NPROCESSORS_CONF), if that can't be determined then the number of online CPUs is used. If the value is less than zero then the number of online CPUs is used."
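To see what the quoted rule yields on a given machine, it can be sketched in a few lines. This is only an illustration of the manual text, not stress-ng's actual source; the functions cpu_count and resolve_instances are hypothetical names, and the sketch assumes a Linux system where Python's os.sysconf exposes SC_NPROCESSORS_CONF and SC_NPROCESSORS_ONLN.

```python
import os


def cpu_count(name):
    """Query a sysconf CPU count by name; returns None if it cannot be determined."""
    try:
        value = os.sysconf(name)
    except (ValueError, OSError, AttributeError):
        return None
    return value if value > 0 else None


def resolve_instances(requested, configured, online):
    """Apply the rule quoted from the manual to a per-stressor instance count."""
    if requested > 0:
        return requested  # explicit count, used as given
    if requested == 0:
        # Zero: configured processors, falling back to online if unknown.
        return configured if configured is not None else online
    return online  # negative value: online CPUs only


if __name__ == "__main__":
    configured = cpu_count("SC_NPROCESSORS_CONF")
    online = cpu_count("SC_NPROCESSORS_ONLN")
    for requested in (0, -1, 4):
        print(requested, "->", resolve_instances(requested, configured, online))
```

With the numbers from this report (12 configured, 8 online), the rule gives 12 instances for --chmod 0 and 8 for --chmod -1, which matches the "dispatching hogs: 12 chmod" line in the log.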
Hi Colin, thanks for the quick reply!
There are some vmstat lines after the initial 10 s timeout that keep going until I press Ctrl-Z, but they show no I/O, and I don't see any kernel errors in the log either. Using -1 does restrict the run to the CPUs that are online. Unfortunately, the command still hangs in "power-saving" mode. Once I switch to the "all 12 CPUs available" mode, it succeeds.