[Regression in 3.2.1] Wrong CPU threads are shown when running in a container #1195

Closed
C0rn3j opened this issue Feb 24, 2023 · 11 comments · Fixed by #1197
Labels: bug 🐛 (Something isn't working), Containers 📦 (Light-weight virtualization suites), Linux 🐧 (Linux related issues)
Milestone: 3.3.0

Comments

@C0rn3j
Contributor

C0rn3j commented Feb 24, 2023

I have a host server with 48 threads running a container using LXC 5.0.2 where I am executing htop.

This container has the following LXC config set: limits.cpu: "16", meaning it gets assigned 16 random threads.

htop 3.0.5 shows the threads and their usage correctly:
image

htop 3.1.0 to 3.2.0 show things correctly while showing the unused host CPU threads too:
image

htop 3.2.1 to 3.2.2, however, seem to show completely wrong threads: they simply list threads 1 to 16 in order, regardless of the actual assignment. This is obvious when comparing against the previous screenshot.
image

@C0rn3j C0rn3j changed the title from "[Regression] Wrong CPU threads are shown when running in a container" to "[Regression in 3.2.1] Wrong CPU threads are shown when running in a container" Feb 24, 2023
@fasterit
Member

Can you please provide a cat /proc/cpuinfo from inside the LXC container?
Probably /linux/LinuxProcessList.c:scanAvailableCPUsFromCPUinfo needs to be tinkered with a bit more to accommodate (more) LXC oddities.

@C0rn3j
Contributor Author

C0rn3j commented Feb 24, 2023

cpuinfo.txt

@fasterit
Member

OK, that cpuinfo only has 16 "processors", so we need to find a way to identify which real cores are the active ones. Can you please upload the archive produced by tar -czf LXC_cpus.tar.gz /sys/devices/system/cpu/ ?

@C0rn3j
Contributor Author

C0rn3j commented Feb 25, 2023

Sure thing
LXC_cpus.tar.gz

@fasterit
Member

/sys/devices/system/cpu/online seems useful:
4,7,12,15-16,18-19,24,26-27,32,39,42-43,45,47

Of course, the individual /cpu*/online files all read "1" even though LXC is configured not to use those CPUs.
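
As a rough illustration of what reading that file would involve, the sketch below expands a comma-separated list of ids and ranges into individual CPU ids. It is written in Python for brevity and is not htop's code; the function name `parse_cpu_list` is hypothetical, and the input is the string quoted above.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "4,7,15-16" into a sorted list of CPU ids.

    Hypothetical helper for illustration only; htop itself is written in C.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or a stray comma
        if "-" in part:
            # "15-16" denotes an inclusive range of CPU ids
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# The content of /sys/devices/system/cpu/online reported in this thread
online = parse_cpu_list("4,7,12,15-16,18-19,24,26-27,32,39,42-43,45,47")
print(len(online))  # 16
```

The 16 ids recovered from the list agree with the limits.cpu: "16" setting from the original report.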

@fasterit
Member

@C0rn3j / @BenBE*: If you add the "PROCESSOR" column to htop list view (Setup -> Screens -> Add "PROCESSOR" somewhere), do these go 1-16 or 4,7,12,15-16,18-19,24,26-27,32,39,42-43,45,47 as per the above list?

Trying to find out what LXC munges and what not.

*BenBE said on IRC they run LXC somewhere in production as well

@C0rn3j
Contributor Author

C0rn3j commented Feb 25, 2023

The PROCESSOR column in 3.2.2 works correctly
image

@BenBE
Member

BenBE commented Feb 25, 2023

Couldn't test myself yet, but from the screenshot this looks like the physical core id, not some logical re-ordering.

@fasterit
Member

Thanks @C0rn3j. So LXC sees the "true" CPU IDs from the host, which means we should probably revert the hack from 0d53245. The consequence is that a 2-core LXC container on a 128-core host system will see 126 offline CPUs. Not nice, but more correct.

Maybe as a second step we should extend the CPUMeter with an option to hide non-active CPUs?!?
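
To make the trade-off concrete, here is a minimal sketch of the two behaviours: drawing a meter for every host CPU id, or hiding the non-active ones. It assumes 0-based CPU ids and a known host CPU count; `meters_to_draw` and its `hide_inactive` flag are hypothetical names, not an existing htop option.

```python
def meters_to_draw(host_cpu_count, online, hide_inactive=False):
    """Return (cpu_id, is_online) pairs for the CPU meters to display.

    Hypothetical sketch; ids are assumed to be 0-based host CPU ids.
    """
    online_set = set(online)
    meters = [(cpu, cpu in online_set) for cpu in range(host_cpu_count)]
    if hide_inactive:
        # Keep only the CPUs the container can actually use
        meters = [m for m in meters if m[1]]
    return meters


# A 2-core container on a 128-core host (the ids 5 and 77 are made up)
everything = meters_to_draw(128, [5, 77])
offline = [cpu for cpu, is_online in everything if not is_online]
print(len(everything), len(offline))  # 128 126

print(meters_to_draw(128, [5, 77], hide_inactive=True))
```

With the flag off, the container sees 126 offline meters, as described above; with it on, only the two usable CPUs remain, still labelled with their host ids.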

@BenBE
Member

BenBE commented Feb 25, 2023

> Maybe as a second step we should extend the CPUMeter with an option to hide non-active CPUs?!?

Possibly with the extension I remarked on in IRC to "remember" which nodes have been seen, so systems with their full 128 cores would still see all the nodes while some go offline in idle. Otherwise this might cause quite some flicker on systems with many CPUs/nodes that have energy saving active.
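
One way to read the "remember" idea is a set of CPU ids that only ever grows: a CPU that has been online once keeps its meter, even while it is parked. The sketch below is an assumption about how that could work, not a description of htop's implementation; `SeenCpuTracker` is a hypothetical name.

```python
class SeenCpuTracker:
    """Remembers every CPU id that has been reported online at least once."""

    def __init__(self):
        self.seen = set()

    def update(self, online_now):
        """Record the current online set.

        Returns (cpu_id, is_online) pairs for every CPU seen so far,
        in ascending id order, so meters never disappear once shown.
        """
        online_now = set(online_now)
        self.seen |= online_now
        return [(cpu, cpu in online_now) for cpu in sorted(self.seen)]


tracker = SeenCpuTracker()
tracker.update([0, 1, 2, 3])   # all four CPUs are busy
print(tracker.update([0, 1]))  # CPUs 2 and 3 stay listed, flagged offline
```

In a container where only some host CPUs are ever reported online, the others would never enter the set, so they would stay hidden without causing flicker on hosts that park idle cores.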

fasterit pushed a commit to fasterit/htop that referenced this issue Feb 25, 2023
LXC shows the real host CPU ids but can be limited in configuration
as to which cores are used. Still the sysfs files are visible and the
CPUs (stay) marked online. We will need to parse
/sys/devices/system/cpu/online to follow LXC's logic.
Revert for now until we can come up with a better handling of the LXC hacks.

Cf. issue htop-dev#1195
Essentially reverting 33973f7 and 0d53245 (htop-dev#993, htop-dev#995)
@fasterit
Member

Please try the PR from #1197. This should revert to LXC showing all CPU meters.

@BenBE BenBE added bug 🐛 Something isn't working Linux 🐧 Linux related issues labels Feb 25, 2023
@BenBE BenBE added this to the 3.3.0 milestone Feb 25, 2023
BenBE pushed a commit that referenced this issue Feb 25, 2023
LXC shows the real host CPU ids but can be limited in configuration
as to which cores are used. Still the sysfs files are visible and the
CPUs (stay) marked online. We will need to parse
/sys/devices/system/cpu/online to follow LXC's logic.
Revert for now until we can come up with a better handling of the LXC hacks.

Cf. issue #1195
Essentially reverting 33973f7 and 0d53245 (#993, #995)
@BenBE BenBE linked a pull request Feb 25, 2023 that will close this issue
@BenBE BenBE closed this as completed Feb 25, 2023
@BenBE BenBE added the Containers 📦 Light-weight virtualization suites label Sep 17, 2024