High Kepler CPU usage under normal workloads #1670

Open

vimalk78 opened this issue Aug 5, 2024 · 6 comments

vimalk78 commented Aug 5, 2024

Without any load on the system, Kepler's CPU usage goes up to 20%.
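
For reference, one way to observe this on an otherwise idle host (a sketch; pidstat from sysstat and the exact process name "kepler" are assumptions, not part of the original report):

# Sample Kepler's CPU usage every 5 seconds, 12 times (one minute total).
# %usr + %system in the output is the figure reported above.
pidstat -u -p "$(pgrep -x kepler)" 5 12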

vimalk78 commented Aug 5, 2024

#1660 (comment)

vimalk78 commented Aug 6, 2024

On latest main, if the machine is loaded with stress-ng, Kepler's CPU usage spikes. In comparison, the Kepler build from before the ring-buffer changes does not show an increase in CPU usage when the machine is loaded.

[asciicast recording showing the Kepler CPU usage spike under load]
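
To reproduce the comparison without the recording, something like this should work (a sketch; the process name "kepler" and the sampling approach are assumptions):

# Generate CPU load in the background, then sample Kepler's CPU% once per
# second for two minutes and watch for the spike.
stress-ng --cpu 8 --timeout 2m &
top -b -d 1 -n 120 -p "$(pgrep -x kepler)"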

vimalk78 commented Aug 6, 2024

Comparing with the old code, some Kepler CPU usage increase is understandable: some of the processing (3 map lookups, 2 updates, 1 delete) used to happen in kernel context, and the CPU cycles for it were accounted to the kernel. That processing now happens in user space and gets counted as Kepler CPU.

We need to check whether we can reduce the CPU spike in Kepler when the machine is loaded.
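
One way to sanity-check this accounting shift (a sketch, not from the thread; assumes the Kepler process is literally named "kepler"):

# utime (field 14 of /proc/PID/stat) is CPU time spent in user space, stime
# (field 15) in the kernel, both in clock ticks. Under the ring-buffer design,
# growth should shift from stime toward utime.
pid=$(pgrep -x kepler)
awk '{printf "utime=%s stime=%s (clock ticks)\n", $14, $15}' "/proc/$pid/stat"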

@dave-tucker (Collaborator)

> We need to check whether we can reduce the CPU spike in Kepler when the machine is loaded.

Exactly! I'm now able to reproduce with stress-ng, and I'm working to keep that CPU spike as low as possible.

rootfs commented Aug 6, 2024

@dave-tucker can you create a feature branch, move the code there, and revert the related commits?

vimalk78 commented Aug 7, 2024

I ran some perf stat tests to check Kepler's impact on context switching. The idea: since Kepler traps sched_switch and does some processing, it adds overhead to each switch, which should show up as fewer context switches completing in a fixed time window. stress-ng is run in parallel to simulate load.

  • without Kepler running
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with no kepler with load

 Performance counter stats for 'system wide':

        90,480,301      sched:sched_switch                                                    

     600.105927296 seconds time elapsed
  • with Kepler release-0.7.11 running
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with kepler 0.7.11 with load

 Performance counter stats for 'system wide':

        87,500,721      sched:sched_switch                                                    

     600.100293869 seconds time elapsed
  • with latest Kepler running (with ring buffer)
root@bkr18:~# sudo perf stat -a -e sched:sched_switch --timeout 600000 # with kepler latest with load

 Performance counter stats for 'system wide':

        79,620,228      sched:sched_switch                                                    

     600.099929726 seconds time elapsed

Observation: with Kepler running, the number of context switches goes down, as expected. But with the ring-buffer changes the drop is larger than with the 0.7.11 release: roughly 12% fewer switches than the no-Kepler baseline, versus roughly 3% fewer for 0.7.11.

The test was run on a bare-metal machine with almost no other load.

stress-ng command:
stress-ng --cpu 8 --iomix 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 11m
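
Putting it together, each 10-minute run looked roughly like this (a sketch; the stress-ng parameters and the 600 s perf window are from above, the orchestration around them is an assumption):

# Start the load generator for slightly longer than the measurement window,
# then count sched_switch events system-wide for 600 s.
stress-ng --cpu 8 --iomix 4 --vm 2 --vm-bytes 128M --fork 4 --timeout 11m &
sudo perf stat -a -e sched:sched_switch --timeout 600000
wait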
