Access to RAPL counters on some CPU / kernel combinations #20

Open
PierreRustOrange opened this issue Dec 16, 2021 · 20 comments
@PierreRustOrange
Contributor

On some systems, the sensor fails to access the RAPL counters and we get this error at startup:

E: 21-12-07 11:14:26 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine

However, on the same systems, RAPL data is visible in the powercap sysfs.

powerapi-ng/powerapi#125 is probably an example of this error.

Currently, the sensor uses the perf subsystem to access RAPL, which is implemented in a different part of the kernel source tree than powercap. I therefore suspect this happens when the kernel implements powercap for the machine's CPU but not RAPL access through perf.

I suggest we implement a fallback that reads RAPL through the powercap sysfs when perf cannot be used.
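
To illustrate the proposed fallback, here is a minimal Python sketch (not the sensor's code) that reads the cumulative energy counters exposed by powercap. The function name `read_powercap_energy` and its `root` parameter are hypothetical. The sketch assumes the usual `/sys/class/powercap/intel-rapl:*` layout, and on recent kernels `energy_uj` may only be readable by root.

```python
from pathlib import Path


def read_powercap_energy(root="/sys/class/powercap"):
    """Return {zone: (name, energy_uj)} for every readable intel-rapl zone.

    `root` is a parameter only so the function can be pointed at a test tree.
    """
    readings = {}
    for zone in sorted(Path(root).glob("intel-rapl:*")):
        try:
            name = (zone / "name").read_text().strip()
            energy_uj = int((zone / "energy_uj").read_text().strip())
        except (OSError, ValueError):
            # Zone without counters, or not readable by the current user.
            continue
        readings[zone.name] = (name, energy_uj)
    return readings
```

Calling `read_powercap_energy()` on a machine with powercap support should return one entry per zone (package, core, and so on), each holding the zone's name and its counter value in microjoules. A real fallback would sample these values periodically instead of reading them once.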

@PierreRustOrange
Contributor Author

It seems this can also happen if the appropriate kernel module is not loaded.
On Ubuntu, the module is in the linux-modules-extra package:

apt install linux-modules-extra-$(uname -r)
update-initramfs -c -k $(uname -r)

The module is in /usr/lib/modules/$(uname -r)/kernel/arch/x86/events/ in rapl.ko for recent kernels or intel/intel-rapl-perf.ko for older kernels.

Thanks @gfieni for this info!

However, I think there are still cases where the perf implementation is not available in a kernel for a recent CPU, while powercap works. For example, with a 5.4 kernel on an i7-10875H (that's a laptop CPU, but I've seen a similar issue with server-class CPUs).

@dsaingre

dsaingre commented Feb 3, 2022

Hi,
Is there any update on this issue? @PierreRustOrange @rouvoy

It seems I can't use the hwpc-sensor. My issue looks similar to powerapi-ng/powerapi#125.
Even after installing the appropriate kernel module with the commands from the previous comment, I still have issues with perf:

sudo perf stat -a -e "power/energy-cores/" /bin/ls
[sudo] password for dimitri: 
event syntax error: 'power/energy-cores/'
                     \___ Cannot find PMU `power'. Missing kernel support?
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events

I have a laptop with a 5.13.0-28-generic kernel and an 11th Gen Intel(R) Core(TM) i7-1165G7 CPU (if more information would help, don't hesitate to ask).

Would the advised solution be to implement a sensor that accesses RAPL through the powercap sysfs?

@gfieni
Collaborator

gfieni commented Feb 7, 2022

Hello @dsaingre,
Which Linux distribution are you using?
Are the powercap energy readings available on your system?
Could you give the output of the modinfo rapl command?

@dsaingre

dsaingre commented Feb 8, 2022

Hi @gfieni,
I'm using Ubuntu 20.04.3 LTS.
I believe I do have the energy readings available:

tree /sys/devices/virtual/powercap
/sys/devices/virtual/powercap
├── dtpm
│   ├── enabled
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── subsystem -> ../../../../class/powercap
│   └── uevent
├── intel-rapl
│   ├── enabled
│   ├── intel-rapl:0
│   │   ├── constraint_0_max_power_uw
│   │   ├── constraint_0_name
│   │   ├── constraint_0_power_limit_uw
│   │   ├── constraint_0_time_window_us
│   │   ├── constraint_1_max_power_uw
│   │   ├── constraint_1_name
│   │   ├── constraint_1_power_limit_uw
│   │   ├── constraint_1_time_window_us
│   │   ├── constraint_2_max_power_uw
│   │   ├── constraint_2_name
│   │   ├── constraint_2_power_limit_uw
│   │   ├── constraint_2_time_window_us
│   │   ├── device -> ../../intel-rapl
│   │   ├── enabled
│   │   ├── energy_uj
│   │   ├── intel-rapl:0:0
│   │   │   ├── constraint_0_max_power_uw
│   │   │   ├── constraint_0_name
│   │   │   ├── constraint_0_power_limit_uw
│   │   │   ├── constraint_0_time_window_us
│   │   │   ├── device -> ../../intel-rapl:0
│   │   │   ├── enabled
│   │   │   ├── energy_uj
│   │   │   ├── max_energy_range_uj
│   │   │   ├── name
│   │   │   ├── power
│   │   │   │   ├── async
│   │   │   │   ├── autosuspend_delay_ms
│   │   │   │   ├── control
│   │   │   │   ├── runtime_active_kids
│   │   │   │   ├── runtime_active_time
│   │   │   │   ├── runtime_enabled
│   │   │   │   ├── runtime_status
│   │   │   │   ├── runtime_suspended_time
│   │   │   │   └── runtime_usage
│   │   │   ├── subsystem -> ../../../../../../class/powercap
│   │   │   └── uevent
│   │   ├── intel-rapl:0:1
│   │   │   ├── constraint_0_max_power_uw
│   │   │   ├── constraint_0_name
│   │   │   ├── constraint_0_power_limit_uw
│   │   │   ├── constraint_0_time_window_us
│   │   │   ├── device -> ../../intel-rapl:0
│   │   │   ├── enabled
│   │   │   ├── energy_uj
│   │   │   ├── max_energy_range_uj
│   │   │   ├── name
│   │   │   ├── power
│   │   │   │   ├── async
│   │   │   │   ├── autosuspend_delay_ms
│   │   │   │   ├── control
│   │   │   │   ├── runtime_active_kids
│   │   │   │   ├── runtime_active_time
│   │   │   │   ├── runtime_enabled
│   │   │   │   ├── runtime_status
│   │   │   │   ├── runtime_suspended_time
│   │   │   │   └── runtime_usage
│   │   │   ├── subsystem -> ../../../../../../class/powercap
│   │   │   └── uevent
│   │   ├── max_energy_range_uj
│   │   ├── name
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   ├── runtime_suspended_time
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../class/powercap
│   │   └── uevent
│   ├── intel-rapl:1
│   │   ├── constraint_0_max_power_uw
│   │   ├── constraint_0_name
│   │   ├── constraint_0_power_limit_uw
│   │   ├── constraint_0_time_window_us
│   │   ├── constraint_1_max_power_uw
│   │   ├── constraint_1_name
│   │   ├── constraint_1_power_limit_uw
│   │   ├── constraint_1_time_window_us
│   │   ├── device -> ../../intel-rapl
│   │   ├── enabled
│   │   ├── energy_uj
│   │   ├── max_energy_range_uj
│   │   ├── name
│   │   ├── power
│   │   │   ├── async
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_kids
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_enabled
│   │   │   ├── runtime_status
│   │   │   ├── runtime_suspended_time
│   │   │   └── runtime_usage
│   │   ├── subsystem -> ../../../../../class/powercap
│   │   └── uevent
│   ├── power
│   │   ├── async
│   │   ├── autosuspend_delay_ms
│   │   ├── control
│   │   ├── runtime_active_kids
│   │   ├── runtime_active_time
│   │   ├── runtime_enabled
│   │   ├── runtime_status
│   │   ├── runtime_suspended_time
│   │   └── runtime_usage
│   ├── subsystem -> ../../../../class/powercap
│   └── uevent
└── intel-rapl-mmio
    ├── enabled
    ├── intel-rapl-mmio:0
    │   ├── constraint_0_max_power_uw
    │   ├── constraint_0_name
    │   ├── constraint_0_power_limit_uw
    │   ├── constraint_0_time_window_us
    │   ├── constraint_1_max_power_uw
    │   ├── constraint_1_name
    │   ├── constraint_1_power_limit_uw
    │   ├── constraint_1_time_window_us
    │   ├── device -> ../../intel-rapl-mmio
    │   ├── enabled
    │   ├── energy_uj
    │   ├── max_energy_range_uj
    │   ├── name
    │   ├── power
    │   │   ├── async
    │   │   ├── autosuspend_delay_ms
    │   │   ├── control
    │   │   ├── runtime_active_kids
    │   │   ├── runtime_active_time
    │   │   ├── runtime_enabled
    │   │   ├── runtime_status
    │   │   ├── runtime_suspended_time
    │   │   └── runtime_usage
    │   ├── subsystem -> ../../../../../class/powercap
    │   └── uevent
    ├── power
    │   ├── async
    │   ├── autosuspend_delay_ms
    │   ├── control
    │   ├── runtime_active_kids
    │   ├── runtime_active_time
    │   ├── runtime_enabled
    │   ├── runtime_status
    │   ├── runtime_suspended_time
    │   └── runtime_usage
    ├── subsystem -> ../../../../class/powercap
    └── uevent

(Is this relevant to what you are asking? I am not very knowledgeable about powercap yet.)
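
For context, the `energy_uj` files in the listing above are cumulative counters in microjoules, and `max_energy_range_uj` gives the range over which each counter wraps. The following Python sketch shows how two successive readings could be turned into an average power. The function names are hypothetical, and it assumes at most one wraparound between the two readings.

```python
def energy_delta_uj(previous_uj, current_uj, max_energy_range_uj):
    """Energy consumed between two readings, handling one counter wraparound."""
    if current_uj >= previous_uj:
        return current_uj - previous_uj
    # The counter wrapped: count up to the maximum, then from zero.
    return (max_energy_range_uj - previous_uj) + current_uj


def average_power_watts(previous_uj, current_uj, max_energy_range_uj, interval_s):
    """Average power over `interval_s` seconds (1 W = 1 J/s = 1e6 uJ/s)."""
    delta = energy_delta_uj(previous_uj, current_uj, max_energy_range_uj)
    return delta / 1_000_000 / interval_s
```

For example, a counter that advances by 5,000,000 µJ over 2 seconds corresponds to an average of 2.5 W.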

Regarding modinfo rapl:

filename:       /lib/modules/5.13.0-28-generic/kernel/arch/x86/events/rapl.ko
license:        GPL
srcversion:     E0C3F70A00E2957694E4176
alias:          cpu:type:x86,ven0002fam0019mod*:feature:*
alias:          cpu:type:x86,ven0009fam0018mod*:feature:*
alias:          cpu:type:x86,ven0002fam0017mod*:feature:*
alias:          cpu:type:x86,ven0000fam0006mod008F:feature:*
alias:          cpu:type:x86,ven0000fam0006mod009A:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0097:feature:*
alias:          cpu:type:x86,ven0000fam0006mod00A5:feature:*
alias:          cpu:type:x86,ven0000fam0006mod00A6:feature:*
alias:          cpu:type:x86,ven0000fam0006mod006A:feature:*
alias:          cpu:type:x86,ven0000fam0006mod006C:feature:*
alias:          cpu:type:x86,ven0000fam0006mod007D:feature:*
alias:          cpu:type:x86,ven0000fam0006mod007E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod007A:feature:*
alias:          cpu:type:x86,ven0000fam0006mod005F:feature:*
alias:          cpu:type:x86,ven0000fam0006mod005C:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0066:feature:*
alias:          cpu:type:x86,ven0000fam0006mod009E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod008E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0055:feature:*
alias:          cpu:type:x86,ven0000fam0006mod005E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod004E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0085:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0057:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0056:feature:*
alias:          cpu:type:x86,ven0000fam0006mod004F:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0047:feature:*
alias:          cpu:type:x86,ven0000fam0006mod003D:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0046:feature:*
alias:          cpu:type:x86,ven0000fam0006mod0045:feature:*
alias:          cpu:type:x86,ven0000fam0006mod003F:feature:*
alias:          cpu:type:x86,ven0000fam0006mod003C:feature:*
alias:          cpu:type:x86,ven0000fam0006mod003E:feature:*
alias:          cpu:type:x86,ven0000fam0006mod003A:feature:*
alias:          cpu:type:x86,ven0000fam0006mod002D:feature:*
alias:          cpu:type:x86,ven0000fam0006mod002A:feature:*
depends:        
retpoline:      Y
intree:         Y
name:           rapl
vermagic:       5.13.0-28-generic SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         Build time autogenerated kernel key
sig_key:        65:04:EF:DB:22:8E:60:98:46:12:AA:25:C3:1D:F0:FA:DE:9C:5F:68
sig_hashalgo:   sha512
signature:      43:C4:06:AF:9D:08:1D:3F:0F:6F:56:DD:20:BE:72:23:5D:D2:2E:98:
		06:D6:7F:59:A4:33:5A:07:2F:A3:73:6A:BB:D7:F9:67:60:87:82:75:
		92:A1:B0:41:DC:37:D5:BA:B7:A9:44:50:E1:26:47:B8:CA:65:3D:49:
		97:62:2A:32:13:4B:22:F2:28:A5:16:19:3D:E6:CD:6D:E1:06:DE:96:
		07:A1:FD:37:F9:9F:B3:48:D9:CA:30:40:14:4D:28:D0:E9:56:1C:4A:
		1E:02:58:74:76:07:A0:D4:3F:6D:A5:2C:71:19:D4:C1:0A:8B:60:AD:
		EB:E5:66:14:43:28:7A:B0:F0:62:E9:93:5B:D9:7D:F7:DE:F0:A5:DA:
		7E:F4:07:4C:55:33:1C:E2:C8:62:3E:4C:05:62:CF:E7:CD:43:81:15:
		87:27:4B:89:BA:C2:AD:07:AB:43:BA:65:F7:1C:61:9E:C6:B6:56:3D:
		3C:CC:CC:ED:61:FE:71:2E:B1:45:4D:FD:98:3E:C3:4A:75:9E:7F:D9:
		D8:1F:80:23:FD:C2:20:00:3B:C6:20:41:8D:89:A5:45:C5:AF:EC:63:
		EB:C9:06:D4:E2:EE:6D:70:2B:50:CA:CF:03:C5:58:07:A8:AD:F9:5F:
		6B:80:CD:90:E8:EF:BD:10:C0:1F:9D:8F:48:A6:F8:52:7B:F5:0B:CB:
		D9:8D:0D:B8:1D:17:40:52:AE:DA:90:85:92:F5:2A:65:5E:89:29:F7:
		FC:E1:55:E6:88:18:02:89:6A:AA:A2:E1:34:7E:DA:96:50:F4:B1:04:
		FE:8E:A1:B2:99:54:20:80:5A:AB:89:AD:A0:77:C6:2F:6F:6B:16:3F:
		5D:01:1A:2B:C1:A9:36:3C:13:CA:60:50:48:0E:D7:ED:1D:4A:F3:2F:
		65:BD:7C:2D:47:B8:65:EE:3A:54:08:8A:49:5D:EA:78:59:DA:05:F5:
		49:C6:A1:F3:ED:B6:F3:65:A0:0B:31:E3:9E:BF:F1:E6:9B:F0:9F:75:
		D6:9E:37:DC:61:A8:E9:84:DD:23:FC:BC:E2:42:00:D6:65:A7:6A:18:
		BF:8C:67:02:D5:9C:04:15:03:AE:13:47:47:8B:AC:AF:F4:4C:BA:EB:
		A9:AC:2E:99:32:A6:A7:29:E7:10:0A:E0:E6:F3:A1:6B:9B:C8:D7:4B:
		43:B6:A5:C7:DF:7E:FA:3D:11:26:F8:F7:E4:F4:E9:AA:14:D3:64:43:
		4C:CB:9A:DE:09:8B:2B:0D:E7:8A:78:7D:8D:59:F9:42:19:49:2C:14:
		CF:30:91:B1:BA:07:36:3D:26:57:7A:6C:2E:F4:C3:61:80:14:02:BD:
		DE:16:EB:05:A8:C8:5A:75:06:FC:FF:84

Does this help determine whether the issue is on my side?
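
One way to read the modinfo output above: each `alias` line is a glob pattern that the kernel matches against the CPU's modalias string. The sketch below is a Python illustration with hypothetical function names. It assumes the CPU modalias can be read from `/sys/devices/system/cpu/modalias` and has the same `cpu:type:x86,venXXXXfamXXXXmodXXXX:feature:...` shape as the aliases. Note that the listing above contains no `mod008C` entry, which is believed (to be confirmed with `/proc/cpuinfo`) to be the model number reported by this CPU generation.

```python
import fnmatch


def parse_modinfo_aliases(modinfo_output):
    """Extract the alias patterns from the text printed by `modinfo`."""
    aliases = []
    for line in modinfo_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "alias":
            aliases.append(value.strip())
    return aliases


def module_supports_cpu(cpu_modalias, aliases):
    """True if any alias pattern (shell-style glob) matches the CPU modalias."""
    return any(fnmatch.fnmatchcase(cpu_modalias, alias) for alias in aliases)
```

If `module_supports_cpu` returns False for the local CPU, the module would not be auto-loaded for it, which is consistent with perf reporting that the `power` PMU cannot be found.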

@PierreRustOrange
Contributor Author

I think that's another case where RAPL support is implemented in powercap (and thus accessible through sysfs) but not in perf.

If I'm understanding that code correctly (no guarantees here!! :), RAPL support for this CPU is not even implemented in perf in the current source tree:
https://github.com/torvalds/linux/blob/555f3d7be91a873114c9656069f1a9fa476ec41a/arch/x86/events/rapl.c#L776

Meanwhile, it was implemented in powercap two years ago:
https://github.com/torvalds/linux/blob/0917b95079af82c69d8f5bab301faeebcd2cb3cd/arch/x86/events/msr.c#L89

I think we still need an option for the sensor to read the RAPL information through the powercap sysfs.

@Mbenni

Mbenni commented May 24, 2022

Hi,
Is there any update regarding this issue? @PierreRustOrange @rouvoy
I've tried everything suggested above, but I still can't use the hwpc sensor.
I am using Ubuntu 22.04 LTS and Linux kernel 5.15.0-30-generic. When I try to start the sensor, it seems it can't access the RAPL_ENERGY_PKG event:

$ docker run --rm --net=host --privileged --pid=host -v /sys:/sys -v /var/lib/docker/containers:/var/lib/docker/containers:ro -v /tmp/powerapi-sensor-reporting:/reporting -v $(pwd):/srv -v $(pwd)/config_file.json:/config_file.json powerapi/hwpc-sensor --config-file srv/config_sensor.json

I: 22-05-24 14:00:50 build: version v1.1.2 (rev: eba2fe195878bae1afadb29fb6da7c4151c890ad) (Jan 21 2022 - 14:54:06)
I: 22-05-24 14:00:50 uname: Linux 5.15.0-30-generic #31-Ubuntu SMP Thu May 5 10:00:34 UTC 2022 x86_64
E: 22-05-24 14:00:50 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine
E: 22-05-24 14:00:50 config: failed to parse the provided config  file

I also get an error with perf:

$ sudo perf stat -a -e "power/energy-cores/" /bin/ls
[sudo] password for mbennani: 
event syntax error: 'power/energy-cores/'
                     \___ Cannot find PMU `power'. Missing kernel support?
Run 'perf list' for a list of valid events

 Usage: perf stat [<options>] [<command>]

    -e, --event <event>   event selector. use 'perf list' to list available events

@PierreRustOrange
Contributor Author

Hi,
Could you please tell us which CPU model you are using?

@Mbenni

Mbenni commented May 24, 2022

Hi,
Sorry, I forgot to mention that I am using an 11th Gen Intel(R) Core(TM) i7-11390H @ 3.40GHz.

@BZConserto

Hi @PierreRustOrange,
I have the same problem as @Mbenni.
I've tried everything suggested above, but I still can't use the hwpc sensor.
I am using Ubuntu 20.04.4 LTS with Linux kernel 5.13.0-41-generic x86_64, and my CPU is an 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz × 8.
Thank you in advance for your answer.

@rouvoy added the bug (Something isn't working) label May 30, 2022
@roda82

roda82 commented Jun 3, 2022

Hello everyone,

We investigated this issue, and it is clear that the Linux kernel packaged with Ubuntu does not support access to energy events through the perf interface, at least for the Intel "Tiger Lake" and "Rocket Lake" families. Deploying hwpc_sensor on these families currently requires modifying the kernel (cf. arch/x86/events/rapl.c) and recompiling it. If you cannot do that, the best we can do for now is to build a list of supported families with your help. To check whether energy events are accessible on your host machine, run the command perf list | grep power/ and check that the output is not empty.
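
The same check can be scripted without the perf tool. The Python sketch below uses a hypothetical function name and assumes the kernel exposes the RAPL perf PMU as a `power` directory under `/sys/bus/event_source/devices`. It lists the energy events the kernel advertises, and an empty list corresponds to an empty `perf list | grep power/` output.

```python
from pathlib import Path


def list_rapl_perf_events(devices_root="/sys/bus/event_source/devices"):
    """Return the event names exposed by the perf `power` PMU, if present."""
    events_dir = Path(devices_root) / "power" / "events"
    if not events_dir.is_dir():
        return []
    # Skip the companion ".scale" and ".unit" files; keep event names only.
    return sorted(p.name for p in events_dir.iterdir() if "." not in p.name)
```

On a supported machine the result would typically include names such as `energy-pkg`. The `devices_root` parameter exists only so the function can be exercised against a test directory.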

@BZConserto

Hi,
Thank you for your response.
For me, the output is empty.
Thanks again

@dsaingre

dsaingre commented Jun 8, 2022

Output is empty on my side too

@roda82

roda82 commented Jun 8, 2022

Hello,
In that case you have to modify your kernel if you want to use hwpc_sensor.

@BZConserto

Hello,
Thank you once again for your answer!
I modified the kernel, and now it works.
I have a small question :). Are the SmartWatts measurements in watts or milliwatts? I am seeing odd values in Grafana, on the order of 1000000.
Thanks,

@Mbenni

Mbenni commented Jun 9, 2022

Hi @BZConserto
Could you please tell us what you modified in the kernel? I am only a student, but this would help me a lot in my research.
Thank you.

@BZConserto

Hi @Mbenni
I only changed the Linux kernel version: I previously had 5.13.0-41 and have now installed 5.10.0-14.
I hope this helps.

@roda82

roda82 commented Jun 14, 2022

Hello, measurements are in watts.

@Laccio

Laccio commented Nov 18, 2023

Hello everyone, I am on an i5-13600K with Ubuntu 22.04.3 LTS and kernel 5.10.0-051000-generic. Running:

sudo docker run --rm \
--net=host \
--privileged \
--pid=host \
-v /sys:/sys \
-v /var/lib/docker/containers:/var/lib/docker/containers:ro \
-v /tmp/powerapi-sensor-reporting:/reporting \
-v $(pwd):/srv \
powerapi/hwpc-sensor \
-n "$(hostname -f)" \
-r "mongodb" -U "mongodb://127.0.0.1" -D "test" -C "prep" \
-s "rapl" -o -e "RAPL_ENERGY_PKG" \
-s "msr" -e "TSC" -e "APERF" -e "MPERF" \
-c "core" -e "CPU_CLK_UNHALTED:REF_P" -e "CPU_CLK_UNHALTED:THREAD_P" -e "LLC_MISSES" -e "INSTRUCTIONS_RETIRED"

I'm getting this output.

I: 23-11-18 22:41:43 build: version unknown (rev: unknown)
I: 23-11-18 22:41:43 uname: Linux 5.10.0-051000-generic #202012132330 SMP Sun Dec 13 23:33:36 UTC 2020 x86_64
I: 23-11-18 22:41:43 pmu: found ix86arch 'Intel X86 architectural PMU' having 7 events, 9 counters (6 general, 3 fixed)
I: 23-11-18 22:41:43 pmu: found perf 'perf_events generic PMU' having 184 events, 0 counters (0 general, 0 fixed)
I: 23-11-18 22:41:43 pmu: found perf_raw 'perf_events raw PMU' having 1 events, 0 counters (0 general, 0 fixed)
I: 23-11-18 22:41:43 pmu: found intel_msr 'Intel MSR' having 6 events, 6 counters (0 general, 6 fixed)
E: 23-11-18 22:41:43 config: event 'RAPL_ENERGY_PKG' is invalid or unsupported by this machine
E: 23-11-18 22:41:43 config: failed to parse the provided command-line arguments

What do you suggest I do? I have already downgraded the kernel to 5.10 as suggested above, but it is still not working. I need RAPL energy readings for my studies.

@roda82

roda82 commented Nov 22, 2023

Hi,
Unfortunately, the Linux kernel does not currently support access to energy events for your "Raptor Lake" Intel processor. We are working on a new formula based on procfs that will allow PowerAPI to be used with this kind of processor.
