Incompatible with Ryzen 9 3950X #2

Open
KeithMyers opened this issue Feb 14, 2021 · 22 comments

@KeithMyers

Just tried this new driver and monitor in advance for a friend who is getting a Ryzen 5950X.

I ran it against my Ryzen 3950X and got this error message.

keith@Serenity:~/Downloads/ryzen_monitor/src$ sudo ./ryzen_monitor
[sudo] password for keith: 
rd_buf: 0.1.1
**SMU Driver Version Incompatible With Library Version**
keith@Serenity:~/Downloads/ryzen_monitor/src$ 

@hattedsquirrel
Owner

hattedsquirrel commented Feb 14, 2021

Thanks for the info. Apparently ryzen_smu was updated to a new version 5 days ago. I'll have to look into that later. We currently expect to see v0.1.0 of the driver.

Regarding your CPUs:
The 3950X will not work out of the box right now. You'd have to create a pm_table mapping first.
The 5950X, on the other hand, should just work (provided the SMU driver version matches). I tested with the 5900X, which is essentially the same chip but with 4 cores permanently disabled.
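
The mismatch reported above comes down to comparing the version string the driver reports against the versions the monitor was built for. A minimal sketch of such a check follows. The helper name `version_supported` is hypothetical, and the sysfs path in the usage comment is an assumption about where the driver exposes its version string. Neither is taken from ryzen_monitor's source.

```sh
# Hypothetical helper: succeed if the driver version ($1) appears in the
# space-separated list of versions the tool knows how to talk to ($2).
version_supported() {
    for v in $2; do
        [ "$1" = "$v" ] && return 0
    done
    return 1
}

# Example usage (the sysfs path is an assumption, adjust it to your driver):
#   drv=$(cat /sys/kernel/ryzen_smu_drv/drv_version)
#   version_supported "$drv" "0.1.0 0.1.1" || echo "unsupported driver version: $drv"
```

Keeping the accepted versions in a list rather than a single constant means a compatible driver bump only needs one more entry.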

@KeithMyers
Author

Will continue to watch this repo for updates. Thanks for the quick reply.

@hattedsquirrel
Owner

I just checked in an update which now works with ryzen_smu v0.1.1 as well. You should now probably get a message saying that the table version is not supported.

If you are willing to provide pm_table dumps I can take a look and see how easy it is to guess the changes compared to the existing 3700X table.

You can create dumps by running the following script in bash (make sure you have read access to /sys/kernel/ryzen_smu_drv/pm_table and /sys/kernel/ryzen_smu_drv/pm_table_version first):

# Record the PM table version as a hex string
cat /sys/kernel/ryzen_smu_drv/pm_table_version | xxd -p > dump_pm_version
sleep 5
# Snapshot while idle
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_idle.bin

# One busy thread: snapshot right after start and again 5 s later
yes > /dev/null &
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_1Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_1Tb.bin

# Two busy threads
yes > /dev/null &
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_2Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_2Tb.bin

# 30 more busy threads, 32 in total
for i in {1..30}; do (yes > /dev/null &); done
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_32Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_32Tb.bin

# Stop all load generators
killall yes

Then pack all dump_* files and attach the archive. Thanks.
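
For anyone wanting to check which table their CPU reports before sending dumps: the `dump_pm_version` file written by the script holds the raw bytes of the version as hex. A small sketch for turning that into the usual `0x...` form follows, assuming the value is a 32-bit integer stored little-endian (as is normal on x86). The helper name `decode_pm_version` is made up for illustration.

```sh
# Hypothetical helper: convert the 8 hex digits produced by `xxd -p`
# (bytes in memory order) into a table version, assuming a 32-bit
# little-endian value.
decode_pm_version() {
    hex=$1
    b0=$(printf '%s' "$hex" | cut -c1-2)
    b1=$(printf '%s' "$hex" | cut -c3-4)
    b2=$(printf '%s' "$hex" | cut -c5-6)
    b3=$(printf '%s' "$hex" | cut -c7-8)
    # Reverse the byte order and print without leading zeros
    printf '0x%X\n' "$((0x$b3$b2$b1$b0))"
}

# Example usage:
#   decode_pm_version "$(cat dump_pm_version)"
#   decode_pm_version 03082400    # prints 0x240803
```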

@KeithMyers
Author

OK, here is the archive of the dump* files.
pm_dump.zip

@hattedsquirrel
Owner

Could you test this patch? https://hattedsquirrel.net/downloads/ryzen_3950x-01.patch

@KeithMyers
Author

Your patch file is corrupted at the end.
keith@Serenity:~/Downloads/ryzen_monitor/src$ patch < ryzen_3950x-01.patch
patching file pm_tables.c
patching file pm_tables.h
patching file ryzen_monitor.c
Hunk #1 succeeded at 275 (offset -2 lines).
Hunk #2 succeeded at 303 with fuzz 2 (offset -2 lines).
Hunk #3 succeeded at 314 (offset -4 lines).
Hunk #4 FAILED at 474.
Hunk #5 FAILED at 490.
2 out of 5 hunks FAILED -- saving rejects to file ryzen_monitor.c.rej

@hattedsquirrel
Owner

Pull the newest commits, then try again. I checked in some changes yesterday. Sorry for not mentioning that.

@KeithMyers
Author

Ok, much better. Works now.
╭───────────────────────────────────────────────┬────────────────────────────────────────────────╮
│ CPU Model │ AMD Ryzen 9 3950X 16-Core Processor │
│ Processor Code Name │ Matisse │
│ Cores │ 16 │
│ Core CCDs │ 2 │
│ Core CCXs │ 4 │
│ Cores Per CCX │ 4 │
│ SMU FW Version │ v46.67.0 │
│ MP1 IF Version │ v11 │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭─────────┬────────────┬──────────┬─────────┬──────────┬─────────────┬─────────────┬─────────────╮
│ Core 0 │ 4300 MHz | 5.843 W | 1.275 V | 66.72 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 1 │ 4300 MHz | 6.275 W | 1.275 V | 72.93 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 2 │ 4300 MHz | 5.881 W | 1.275 V | 67.52 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 3 │ 4300 MHz | 6.287 W | 1.275 V | 73.19 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 4 │ 4300 MHz | 5.805 W | 1.275 V | 66.22 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 5 │ 4300 MHz | 6.225 W | 1.275 V | 72.45 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 6 │ 4300 MHz | 5.481 W | 1.275 V | 65.69 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 7 │ 4300 MHz | 5.775 W | 1.275 V | 71.73 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 8 │ 4275 MHz | 5.385 W | 1.275 V | 70.92 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 9 │ 4275 MHz | 4.990 W | 1.275 V | 64.93 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 10 │ 4275 MHz | 5.665 W | 1.275 V | 70.95 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 11 │ 4275 MHz | 5.058 W | 1.275 V | 63.72 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 12 │ 4275 MHz | 5.639 W | 1.275 V | 72.14 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 13 │ 4275 MHz | 5.797 W | 1.275 V | 68.86 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 14 │ 4275 MHz | 5.814 W | 1.275 V | 73.11 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
│ Core 15 │ 4275 MHz | 5.864 W | 1.275 V | 69.81 C | C0: 100.0 % | C1: 0.0 % | C6: 0.0 % │
╰─────────┴────────────┴──────────┴─────────┴──────────┴─────────────┴─────────────┴─────────────╯
╭── Core Statistics (Calculated) ───────────────┬────────────────────────────────────────────────╮
│ Highest Effective Core Frequency │ 4300 MHz │
│ Highest Core Temperature │ 73.19 C │
│ Highest Core Voltage │ 1.275 V │
│ Average Core Voltage │ 0.000 V │
│ Average Core CC6 │ 0.00 % │
│ Total Core Power Sum │ 91.7840 W │
├── Reported by SMU ────────────────────────────┼────────────────────────────────────────────────┤
│ Peak Core Voltage │ 1.275 V │
│ Package CC6 │ 0.00 % │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Electrical & Thermal Constraints ───────────┬────────────────────────────────────────────────╮
│ Peak Temperature │ 75.50 C │
│ SoC Temperature │ 37.55 C │
│ Voltage from Core VRM │ 1.100 V | 1.442 V | 76.27 % │
│ PPT │ 174.971 W | 142 W | 123.22 % │
│ TDC Value │ 113.832 A | 95 A | 119.82 % │
│ TDC Actual │ 90.914 A | 95 A | 95.70 % │
│ EDC │ 139.999 A | 140 A | 100.00 % │
│ THM │ 74.18 C | 95 C | 78.09 % │
│ FIT │ 0 | 258 | 0.01 % │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Memory Interface ───────────────────────────┬────────────────────────────────────────────────╮
│ Coupled Mode │ ON │
│ Fabric Clock (Average) │ 1800 MHz │
│ Fabric Clock │ 1800 MHz │
│ Uncore Clock │ 1800 MHz │
│ Memory Clock │ 1800 MHz │
│ cLDO_VDDM │ 0.9504 V │
│ cLDO_VDDP │ 0.9002 V │
│ cLDO_VDDG │ 1.0477 V │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Power Consumption ──────────────────────────┬────────────────────────────────────────────────╮
│ Total Core Power Sum │ 91.7840 W │
│ VDDCR_SOC Power │ 19.3636 W │
│ GMI2_VDDG Power │ 8.7156 W │
│ L3 Logic Power │ 0.517 W + 0.5365 W │
│ L3 Logic Power │ + 0.386 W + 0.3332 W = 1.7727 W │
│ L3 VDDM Power │ 0.350 W + 0.3510 W │
│ L3 VDDM Power │ + 0.369 W + 0.3652 W = 1.4350 W │
│ │ │
│ VDDIO_MEM Power │ 8.6723 W │
│ IOD_VDDIO_MEM Power │ 0.0000 W │
│ DDR_VDDP Power │ 5.1823 W │
│ VDD18 Power │ 0.8000 W │
│ │ │
│ Calculated Thermal Output │ 137.7255 W │
├── Additional Reports ─────────────────────────┼────────────────────────────────────────────────┤
│ SoC Power (SVI2) │ 1.094 V | 17.704 A | 19.364 W │
│ Core Power (SVI2) │ 1.275 V | 113.817 A | 145.117 W │
│ Core Power (SMU) │ 145.117 W │
│ Socket Power (SMU) │ 174.9525 W │
│ Package Power (SMU) │ nan W │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
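
The constraint rows in the output above (PPT, TDC, EDC, THM) appear to read as current value, limit, and utilization in percent, and the SVI2 rows as voltage, current, and power. A quick sketch for sanity-checking those columns against each other follows. Both function names are hypothetical, and the assumption that the tool derives the last column this way comes from the numbers lining up, not from its source.

```sh
# Hypothetical helper: utilization of a limit in percent, two decimals.
pct_of_limit() {
    awk -v value="$1" -v limit="$2" 'BEGIN { printf "%.2f\n", value / limit * 100 }'
}

# Hypothetical helper: electrical power from voltage and current, three decimals.
power_from_vi() {
    awk -v volts="$1" -v amps="$2" 'BEGIN { printf "%.3f\n", volts * amps }'
}

# Example usage with values from the output above:
#   pct_of_limit 174.971 142      # PPT row, prints 123.22
#   power_from_vi 1.275 113.817   # Core Power (SVI2) row, prints 145.117
```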

@hattedsquirrel
Owner

Okay, cool. Thanks for the help and the screenshot. It also pointed out a bug in the calculation of "Average Core Voltage", which I have now fixed.
I'll push all changes online now.

hattedsquirrel added a commit that referenced this issue Feb 15, 2021
Implemented table 0x240803 with the help of data provided in #2.

@KeithMyers
Author

Ok, I'll pull the newest commit and test it for the missing average voltage value.

Was reading through the commit and noticed that you are limiting the application only to Ryzen parts.

Ever consider adding Epyc parts? You are hard coding a core limit of 16. My Epyc 7402P has 24 cores.

Would be nice to have the application usable on Epyc parts also.

@KeithMyers
Author

All good. Average Core Voltage is now populated with actual value.

@hattedsquirrel
Owner

The only reason Epyc isn't supported right now is that I don't know anything about them. The first step would be to find out which SMN registers to read and to see if they differ from the Ryzen series. Those registers are read to find out how many CCDs there are and which cores are disabled. If you are brave enough you can build and run the attached util and paste its output. (It also depends on the ryzen_smu kernel driver.) Maybe the registers look similar enough to the Ryzen series.
smn_debug.tar.gz

@KeithMyers
Author

I'll give it a shot. Glad to help developers with hardware testing.

@KeithMyers
Author

Here is the smn_debug output from my AMD Epyc 7402P cpu.

ryzen_smu version string: 0.1.1
fam: 0x17
model: 0x31
logical_cores: 48
threads_per_core: 2
read 05d218: 02850a14, ret = OK
read 05d228: 95400000, ret = OK
read 05d258: 00000000, ret = OK
read 05d21c: 09120a14, ret = OK
read 05d22c: 0000002a, ret = OK
read 05d25c: 24401e81, ret = OK
read 30081800: 00000000, ret = OK
read 30081d98: 00000000, ret = OK
read 31081800: ffffffff, ret = OK
read 31081d98: ffffffff, ret = OK
read 32081800: ffffffff, ret = OK
read 32081d98: ffffffff, ret = OK
read 33081800: ffffffff, ret = OK
read 33081d98: ffffffff, ret = OK
read 34081800: 00000000, ret = OK
read 34081d98: 00000000, ret = OK
read 35081800: ffffffff, ret = OK
read 35081d98: ffffffff, ret = OK
read 36081800: ffffffff, ret = OK
read 36081d98: ffffffff, ret = OK
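
As a quick cross-check of the header fields in this output, the physical core count is the logical core count divided by the threads per core, which gives 48 / 2 = 24 here and matches the 24 cores of the 7402P. A minimal sketch that pulls this out of a saved smn_debug log follows. The function name `physical_cores` is made up for illustration, and the field names are taken from the output above.

```sh
# Hypothetical helper: read smn_debug output on stdin and print the
# physical core count (logical_cores / threads_per_core).
physical_cores() {
    awk -F': *' '
        $1 == "logical_cores"    { logical = $2 }
        $1 == "threads_per_core" { threads = $2 }
        END { if (threads > 0) print logical / threads }
    '
}

# Example usage:
#   ./smn_debug | physical_cores
```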

@KeithMyers
Author

Gave ryzen_monitor a "what the hell" shot on the Epyc.

ryzen_smu version string: 0.1.1
PM Tables are not supported on this platform.

@hattedsquirrel
Owner

Oh, that's unfortunate. The error message means that ryzen_smu doesn't know how to read the PM table from the SMU yet. I looked into the code and the reason seems to be that they don't know which function number to call. Maybe you could reach out to them and see if they can get it going with your help.
Once ryzen_smu can read the PM table I'm positive we can get things working on my side as well.

@KeithMyers
Author

I will do that. Zenpower module works with my 7402P. Zen Monitor also. But it does not work on the 7502 or 7642 with the higher core counts.

@level1wendell

level1wendell commented Jun 8, 2021

I can provide remote access to Epyc Rome and Milan if that's useful. Also, how do I contribute $ to fund further work here? (Maybe you should sign up for GitHub Sponsors?)

@hattedsquirrel
Owner

Can you check if ryzen_smu provides /sys/kernel/ryzen_smu_drv/pm_table and /sys/kernel/ryzen_smu_drv/pm_table_version on your machines? This underlying support needs to be in place before we can start implementing support on our end.
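
A small sketch of that check, for convenience. The function name `check_pm_table_support` is hypothetical, and the file names are the ones used by the dump script earlier in this thread. Run it as a user that has read access to the driver's sysfs directory.

```sh
# Hypothetical helper: report whether the PM table files are present and
# readable in the given directory (defaults to the ryzen_smu sysfs dir).
check_pm_table_support() {
    dir=${1:-/sys/kernel/ryzen_smu_drv}
    for f in pm_table pm_table_version; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            return 1
        fi
    done
    echo "PM table files present in $dir"
}

# Example usage:
#   check_pm_table_support
```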

@patrickschur

@level1wendell I would like to have access to an Epyc server. How can I reach you?

@level1wendell

level1wendell commented Jun 11, 2021 via email

@patrickschur

@level1wendell You got an email. ;)
