Convert hardcoded list of OS names into a default case & Merge upstream #95

agravgaard · 2020-06-14T12:00:19Z

Reasoning described in #94

agravgaard · 2020-06-15T10:41:46Z

This patch will currently fail as the dkms build:

  MODPOST 4 modules
ERROR: modpost: "kallsyms_lookup_name" [/var/lib/dkms/amdgpu-3.5/30/build/amd/amdkcl/amdkcl.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:94: __modpost] Error 1
make: *** [Makefile:1642: modules] Error 2
make: Leaving directory '/usr/lib/modules/5.7.0-3-MANJARO/build'

The reason is better explained in: anbox/anbox-modules#49 and here
kallsyms_lookup_name is no longer exported in 5.7 for security reasons, so I guess all referenced in amdkcl and dkms should be avoided for 5.7.

Adjust kernel version conditionals:

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/2106edaef95ae2262097652454bce1fe720b2b9b/drivers/gpu/drm/amd/amdkcl/kcl_common.h#L11-L21

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/2106edaef95ae2262097652454bce1fe720b2b9b/drivers/gpu/drm/amd/amdkcl/symbols#L3-L5

And possibly in some of the configure scripts in amd/dkms

agravgaard · 2020-06-15T15:52:32Z

The DKMS build is now succeeding for me with 4.19, 5.6 and 5.7.

With the module installed Linux 5.7 boots fine and has the attached rocminfo and dmesg output:
rocminfo_output.txt
dmesg_grep_amdgpu.txt

Do you need more info, testing or other logs ?

drop the duplicate register macros from sid.h and use the standard ones in the oss register headers. Acked-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

Currently the goto statement is skipping over a lot of setup code because it is outside of an if-block and should be inside it. Fix this by adding missing if statement braces. Addresses-Coverity: ("Structurally dead code") Fixes: fd151ca5396d ("drm amdgpu: SI UVD v3_1") Acked-by: Nirmoy Das <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

In the cases where adev->jpeg.num_jpeg_inst is zero or the condition adev->jpeg.harvest_config & (1 << i) is always non-zero the variable ret is never set to an error condition and the function returns an uninitialized value in ret. Since the only exit condition at the end if the function is a success then explicitly return 0 rather than a potentially uninitialized value in ret. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 14f43e8 ("drm/amdgpu: move JPEG2.5 out from VCN2.5") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

[ 150.887733] ====================================================== [ 150.893903] WARNING: possible circular locking dependency detected [ 150.905917] ------------------------------------------------------ [ 150.912129] kfdtest/4081 is trying to acquire lock: [ 150.917002] ffff8f7f3762e118 (&mm->mmap_sem#2){++++}, at: __might_fault+0x3e/0x90 [ 150.924490] but task is already holding lock: [ 150.930320] ffff8f7f49d229e8 (&dqm->lock_hidden){+.+.}, at: destroy_queue_cpsch+0x29/0x210 [amdgpu] [ 150.939432] which lock already depends on the new lock. [ 150.947603] the existing dependency chain (in reverse order) is: [ 150.955074] -> ROCm#3 (&dqm->lock_hidden){+.+.}: [ 150.960822] __mutex_lock+0xa1/0x9f0 [ 150.964996] evict_process_queues_cpsch+0x22/0x120 [amdgpu] [ 150.971155] kfd_process_evict_queues+0x3b/0xc0 [amdgpu] [ 150.977054] kgd2kfd_quiesce_mm+0x25/0x60 [amdgpu] [ 150.982442] amdgpu_amdkfd_evict_userptr+0x35/0x70 [amdgpu] [ 150.988615] amdgpu_mn_invalidate_hsa+0x41/0x60 [amdgpu] [ 150.994448] __mmu_notifier_invalidate_range_start+0xa4/0x240 [ 151.000714] copy_page_range+0xd70/0xd80 [ 151.005159] dup_mm+0x3ca/0x550 [ 151.008816] copy_process+0x1bdc/0x1c70 [ 151.013183] _do_fork+0x76/0x6c0 [ 151.016929] __x64_sys_clone+0x8c/0xb0 [ 151.021201] do_syscall_64+0x4a/0x1d0 [ 151.025404] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.030977] -> ROCm#2 (&adev->notifier_lock){+.+.}: [ 151.036993] __mutex_lock+0xa1/0x9f0 [ 151.041168] amdgpu_mn_invalidate_hsa+0x30/0x60 [amdgpu] [ 151.047019] __mmu_notifier_invalidate_range_start+0xa4/0x240 [ 151.053277] copy_page_range+0xd70/0xd80 [ 151.057722] dup_mm+0x3ca/0x550 [ 151.061388] copy_process+0x1bdc/0x1c70 [ 151.065748] _do_fork+0x76/0x6c0 [ 151.069499] __x64_sys_clone+0x8c/0xb0 [ 151.073765] do_syscall_64+0x4a/0x1d0 [ 151.077952] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.083523] -> ROCm#1 (mmu_notifier_invalidate_range_start){+.+.}: [ 151.090833] change_protection+0x802/0xab0 [ 151.095448] mprotect_fixup+0x187/0x2d0 [ 151.099801] setup_arg_pages+0x124/0x250 [ 151.104251] load_elf_binary+0x3a4/0x1464 [ 151.108781] search_binary_handler+0x6c/0x210 [ 151.113656] __do_execve_file.isra.40+0x7f7/0xa50 [ 151.118875] do_execve+0x21/0x30 [ 151.122632] call_usermodehelper_exec_async+0x17e/0x190 [ 151.128393] ret_from_fork+0x24/0x30 [ 151.132489] -> #0 (&mm->mmap_sem#2){++++}: [ 151.138064] __lock_acquire+0x11a1/0x1490 [ 151.142597] lock_acquire+0x90/0x180 [ 151.146694] __might_fault+0x68/0x90 [ 151.150879] read_sdma_queue_counter+0x5f/0xb0 [amdgpu] [ 151.156693] update_sdma_queue_past_activity_stats+0x3b/0x90 [amdgpu] [ 151.163725] destroy_queue_cpsch+0x1ae/0x210 [amdgpu] [ 151.169373] pqm_destroy_queue+0xf0/0x250 [amdgpu] [ 151.174762] kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu] [ 151.180577] kfd_ioctl+0x223/0x400 [amdgpu] [ 151.185284] ksys_ioctl+0x8f/0xb0 [ 151.189118] __x64_sys_ioctl+0x16/0x20 [ 151.193389] do_syscall_64+0x4a/0x1d0 [ 151.197569] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.203141] other info that might help us debug this: [ 151.211140] Chain exists of: &mm->mmap_sem#2 --> &adev->notifier_lock --> &dqm->lock_hidden [ 151.222535] Possible unsafe locking scenario: [ 151.228447] CPU0 CPU1 [ 151.232971] ---- ---- [ 151.237502] lock(&dqm->lock_hidden); [ 151.241254] lock(&adev->notifier_lock); [ 151.247774] lock(&dqm->lock_hidden); [ 151.254038] lock(&mm->mmap_sem#2); This commit fixes the warning by ensuring get_user() is not called while reading SDMA stats with dqm_lock held as get_user() could cause a page fault which leads to the circular locking scenario. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>

The release_firmware() function is NULL tolerant so we do not need to check for NULL param before calling it. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]>

Add a switch statement to simplify asic checks. Note that BACO is not supported on APUs, so there is no need to check them. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

The failures with ROCm only happen with noretry=1, so enable runtime pm when noretry=0 (the current default). Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and following started to happen on each boot: ... BUG: kernel NULL pointer dereference, address: 0000000000000128 ... CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 ROCm#1 Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020 RIP: 0010:lock_bus+0x42/0x60 [amdgpu] ... Call Trace: i2c_smbus_xfer+0x3d/0xf0 i2c_default_probe+0xf3/0x130 i2c_detect.isra.0+0xfe/0x2b0 ? kfree+0xa3/0x200 ? kobject_uevent_env+0x11f/0x6a0 ? i2c_detect.isra.0+0x2b0/0x2b0 __process_new_driver+0x1b/0x20 bus_for_each_dev+0x64/0x90 ? 0xffffffffc0f34000 i2c_register_driver+0x73/0xc0 do_one_initcall+0x46/0x200 ? _cond_resched+0x16/0x40 ? kmem_cache_alloc_trace+0x167/0x220 ? do_init_module+0x23/0x260 do_init_module+0x5c/0x260 __do_sys_init_module+0x14f/0x170 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... Error appears when some i2c device driver tries to probe for devices using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`. Code supporting this adapter requires `adev->psp.ras.ras` to be not NULL, which is true only when `amdgpu_ras_init()` detects HW support by calling `amdgpu_ras_check_supported()`. Before 9015d60, adapter was registered by -> amdgpu_device_ip_init() -> amdgpu_ras_recovery_init() -> amdgpu_ras_eeprom_init() -> smu_v11_0_i2c_eeprom_control_init() after verifying that `adev->psp.ras.ras` is not NULL in `amdgpu_ras_recovery_init()`. Currently it is registered unconditionally by -> amdgpu_device_ip_init() -> pp_sw_init() -> hwmgr_sw_init() -> vega20_smu_init() -> smu_v11_0_i2c_eeprom_control_init() Fix simply adds HW support check (ras == NULL => no support) before calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`. Please note that there is a chance that similar fix is also required for CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense when there is no HW support. Cc: [email protected] Fixes: 9015d60 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device") Signed-off-by: Ivan Mironov <[email protected]> Tested-by: Bjorn Nostvold <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

Port functionality from the Radeon driver to support UVD clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

Port functionality from the Radeon driver to support VCE clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

Port functionality from the Radeon driver to support UVD and VCE power management. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]>

When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]>

Move request init data to virt detection func, so we can insert request full access between request init data and set ip blocks. Signed-off-by: Wenhui Sheng <[email protected]>

Init soc15 reg base early enough so we can touch mailbox related registers in request full access for sriov before set_ip_blocks, vi&nv arch doesn't use reg base in virt ops. v2: fix reg_base_init missed in bare metal case. Signed-off-by: Wenhui Sheng <[email protected]>

From SIENNA_CICHLID, HW introduce a new protection feature which can control the FB, doorbell and MMIO write access for VF, so guest driver should request full access before ip discovery, or we couldn't access ip discovery data in FB. Signed-off-by: Wenhui Sheng <[email protected]>

After we move request full access before set ip blocks, we can merge atombios init block of SRIOV and baremetal together. Signed-off-by: Wenhui Sheng <[email protected]>

Only read first 4K data instead of whole TMR block, so we can reduce the time in full access mode. Signed-off-by: Wenhui Sheng <[email protected]>

This reverts commit 3030524. rhel6.x specific case. drop it. Signed-off-by: Flora Cui <[email protected]>

The legacy way to initialize discovery_tmr_size is using DISCOVERY_TMR_SIZE, while after we reduce DISCOVERY_TMR_SIZE from 64KB to 4KB, variable discovery_tmr_size is also reduced to 4KB, this is not correct, it still should be 64KB, discovery_tmr_size will be used to calculate ip_discovery reserved mem's start address and size. Using DISCOVERY_TMR_OFFSET to init discovery_tmr_size instead. Signed-off-by: Wenhui Sheng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]>

Copy board parameters directly instead of set each parameter for sienna_cichlid. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: Ic80e597d8da084d85329bb240063b5d10e14cc7e

Update sienna_cichlid driver if header file to match pptable changes. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: I6ec08fc07bc3d9924075b85233d85e68edce9656

To follow conventional style. And this unnecessary "@" confuses our userspace tool. Change-Id: Id4cdc611d63e800cf5a93449b6331a1e8323e727 Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]>

To avoid s3 fail at the first cycle, it needs to revert this patch: drm/amd/display: add mechanism to skip DCN init Change-Id: I3a454fbf4744b42b62dfcce8902b35b3a30e8539 Signed-off-by: changfeng <[email protected]>

fxkamd · 2020-08-27T15:37:19Z

Hi Andreas, it looks like you're trying to make our DKMS/release branch work with the latest upstream kernels. That's not really the purpose of that branch. The purpose is to support older or patched Linux kernels used in popular Linux distributions. Therefore our DKMS kernel support development is usually aligned with the Linux distributions' release schedule. We don't try to keep up with every upstream release.

If you want to use ROCm with the latest upstream kernels, I would recommend you try the KFD included in the upstream kernel. We do most of our KFD development in the upstream branch anyway; it flows into the DKMS branch from there. There are only a few features still missing upstream, I'm not sure whether those matter to you. We're working on closing that gap:

PCIe P2P support
IPC support (used by MPI), patch has been submitted for upstream consideration, waiting for feedback, maybe I should ping that thread
GDB support is still experimental and depends on experimental GDB patches, we won't upstream that until we have settled on a final API

agravgaard · 2020-08-27T15:50:41Z

Hi Andreas, it looks like you're trying to make our DKMS/release branch work with the latest upstream kernels. That's not really the purpose of that branch. The purpose is to support older or patched Linux kernels used in popular Linux distributions. Therefore our DKMS kernel support development is usually aligned with the Linux distributions' release schedule. We don't try to keep up with every upstream release.

If you want to use ROCm with the latest upstream kernels, I would recommend you try the KFD included in the upstream kernel. We do most of our KFD development in the upstream branch anyway; it flows into the DKMS branch from there. There are only a few features still missing upstream, I'm not sure whether those matter to you. We're working on closing that gap:

PCIe P2P support

IPC support (used by MPI), patch has been submitted for upstream consideration, waiting for feedback, maybe I should ping that thread

GDB support is still experimental and depends on experimental GDB patches, we won't upstream that until we have settled on a final API

Makes sense, thanks for clearing that up!
Those features are not that important to me (gdb support will definitely be at some point sooner or later) - I'm not sure about the rest of the rocm-arch community though.
Anyway, I'll stop working on it for now, and until I or someone else runs into problems with the KFD in upstream.

agravgaard mentioned this pull request Jun 14, 2020

Distro and Kernel compatibility #94

Closed

agravgaard changed the title ~~drm/dkms: convert hardcoded list of OS names into a default case~~ Convert hardcoded list of OS names into a default case & Merge upstream Jun 14, 2020

agravgaard force-pushed the master branch from f896480 to 79e8633 Compare June 15, 2020 15:39

agravgaard mentioned this pull request Jun 15, 2020

[rock-dkms-bin] Fix DRM_VER in dkms Makefile rocm-arch/rocm-arch#244

Merged

alexdeucher and others added 24 commits June 28, 2020 18:25

drm/amdgpu: call release_firmware() without a NULL check

bd49c88

The release_firmware() function is NULL tolerant so we do not need to check for NULL param before calling it. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]>

drm/amdgpu: rework runtime pm enablement for BACO

2493bad

Add a switch statement to simplify asic checks. Note that BACO is not supported on APUs, so there is no need to check them. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

drm/amdgpu: SI support for UVD clock control

ae2d988

Port functionality from the Radeon driver to support UVD clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

drm/amdgpu: SI support for VCE clock control

07eccbb

Port functionality from the Radeon driver to support VCE clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

drm/amdgpu: SI support for UVD and VCE power managment

2853e77

Port functionality from the Radeon driver to support UVD and VCE power management. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>

drm/amdgpu: request init data in virt detection

3b1c20e

Move request init data to virt detection func, so we can insert request full access between request init data and set ip blocks. Signed-off-by: Wenhui Sheng <[email protected]>

drm/amdgpu: merge atombios init block

bcf252f

After we move request full access before set ip blocks, we can merge atombios init block of SRIOV and baremetal together. Signed-off-by: Wenhui Sheng <[email protected]>

drm/amdgpu: reduce ip discovery data reading size

a120075

Only read first 4K data instead of whole TMR block, so we can reduce the time in full access mode. Signed-off-by: Wenhui Sheng <[email protected]>

Revert "drm/amdkcl: check whether DEFINE_IDA2 is available (v2)"

4381c75

This reverts commit 3030524. rhel6.x specific case. drop it. Signed-off-by: Flora Cui <[email protected]>

drm/amd/powerplay: change method to set board parameters

c65f44f

Copy board parameters directly instead of set each parameter for sienna_cichlid. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: Ic80e597d8da084d85329bb240063b5d10e14cc7e

drm/amd/powerplay: update driver if file for sienna_cichlid

8746b15

Update sienna_cichlid driver if header file to match pptable changes. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: I6ec08fc07bc3d9924075b85233d85e68edce9656

drm/amd/powerplay: drop unnecessary "@" on OD sysfs output

32a5e8f

To follow conventional style. And this unnecessary "@" confuses our userspace tool. Change-Id: Id4cdc611d63e800cf5a93449b6331a1e8323e727 Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]>

Revert "drm/amd/display: add mechanism to skip DCN init"

435e5cc

To avoid s3 fail at the first cycle, it needs to revert this patch: drm/amd/display: add mechanism to skip DCN init Change-Id: I3a454fbf4744b42b62dfcce8902b35b3a30e8539 Signed-off-by: changfeng <[email protected]>

agravgaard added 10 commits August 26, 2020 18:37

Add support for new mmap API in 5.8+

6f56656

Renamed flag in 5.8+

b80460d

typo: mm should be the vm_mm member

1d3ef1f

Add support for new mmap API in 5.8+

10e3e5c

Renamed i2c new device function in 5.8+

4d83059

Renamed (un)use_mm for 5.8+

389e4e0

Add support for new mmap API in 5.8+

1b3a0e0

Renamed (un)use_mm for 5.8+

af95801

Add support for new mmap API in 5.8+

e7d2006

Add support for new mmap API in 5.8+

4c2793c

destroy_connector does not exist in 5.8+

917f88c

agravgaard mentioned this pull request Aug 27, 2020

[rock-dkms-bin] Radeon VII with kernel 5.7 doesn't work rocm-arch/rocm-arch#269

Closed

kentrussell force-pushed the master branch from 0d07f6d to 6a986c0 Compare March 10, 2021 19:41

kentrussell force-pushed the master branch from 6a986c0 to 88970f5 Compare May 10, 2021 16:38

kentrussell force-pushed the master branch from 88970f5 to c09ec36 Compare July 30, 2021 18:47

kentrussell force-pushed the master branch from 73b3910 to 5b46cf5 Compare October 27, 2021 16:59

kentrussell force-pushed the master branch from fd3e3a0 to 5fb6d9c Compare February 9, 2022 16:43

kentrussell force-pushed the master branch from 445ddf3 to 7d51a8a Compare March 30, 2022 16:17

kentrussell force-pushed the master branch from d30c224 to 7d7f742 Compare June 28, 2022 13:18

kentrussell force-pushed the master branch from 56a94d4 to 437f02c Compare September 30, 2022 15:59

kentrussell force-pushed the master branch from c191515 to 76c0ce8 Compare November 30, 2022 14:22

kentrussell force-pushed the master branch from 8ab8a55 to 0b579de Compare April 24, 2023 13:23

kentrussell force-pushed the master branch from 0b579de to 795c38e Compare June 28, 2023 13:53

kentrussell force-pushed the master branch from b728ef0 to fb41287 Compare September 15, 2023 16:20

rajbhar force-pushed the master branch from 2a83b3a to cc2d865 Compare December 14, 2023 02:08

kentrussell force-pushed the master branch from cf8e29e to dad5559 Compare April 16, 2024 18:46

kentrussell force-pushed the master branch from 80920ea to 2889fc7 Compare August 2, 2024 16:51

kentrussell force-pushed the master branch from 8809b4b to 627f890 Compare December 3, 2024 19:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Convert hardcoded list of OS names into a default case & Merge upstream #95

Convert hardcoded list of OS names into a default case & Merge upstream #95

agravgaard commented Jun 14, 2020

agravgaard commented Jun 15, 2020

agravgaard commented Jun 15, 2020

fxkamd commented Aug 27, 2020

agravgaard commented Aug 27, 2020

Convert hardcoded list of OS names into a default case & Merge upstream #95

Are you sure you want to change the base?

Convert hardcoded list of OS names into a default case & Merge upstream #95

Conversation

agravgaard commented Jun 14, 2020

agravgaard commented Jun 15, 2020

agravgaard commented Jun 15, 2020

fxkamd commented Aug 27, 2020

agravgaard commented Aug 27, 2020