-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert hardcoded list of OS names into a default case & Merge upstream #95
base: master
Are you sure you want to change the base?
Conversation
This patch will currently fail as the dkms build:
The reason is better explained in: anbox/anbox-modules#49 and here Adjust kernel version conditionals: And possibly in some of the configure scripts in amd/dkms |
The DKMS build is now succeeding for me with 4.19, 5.6 and 5.7. With the module installed Linux 5.7 boots fine and has the attached rocminfo and dmesg output: Do you need more info, testing or other logs ? |
drop the duplicate register macros from sid.h and use the standard ones in the oss register headers. Acked-by: Christian König <[email protected]> Reviewed-by: Hawking Zhang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Currently the goto statement is skipping over a lot of setup code because it is outside of an if-block and should be inside it. Fix this by adding missing if statement braces. Addresses-Coverity: ("Structurally dead code") Fixes: fd151ca5396d ("drm amdgpu: SI UVD v3_1") Acked-by: Nirmoy Das <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
In the cases where adev->jpeg.num_jpeg_inst is zero or the condition adev->jpeg.harvest_config & (1 << i) is always non-zero the variable ret is never set to an error condition and the function returns an uninitialized value in ret. Since the only exit condition at the end if the function is a success then explicitly return 0 rather than a potentially uninitialized value in ret. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 14f43e8 ("drm/amdgpu: move JPEG2.5 out from VCN2.5") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
[ 150.887733] ====================================================== [ 150.893903] WARNING: possible circular locking dependency detected [ 150.905917] ------------------------------------------------------ [ 150.912129] kfdtest/4081 is trying to acquire lock: [ 150.917002] ffff8f7f3762e118 (&mm->mmap_sem#2){++++}, at: __might_fault+0x3e/0x90 [ 150.924490] but task is already holding lock: [ 150.930320] ffff8f7f49d229e8 (&dqm->lock_hidden){+.+.}, at: destroy_queue_cpsch+0x29/0x210 [amdgpu] [ 150.939432] which lock already depends on the new lock. [ 150.947603] the existing dependency chain (in reverse order) is: [ 150.955074] -> ROCm#3 (&dqm->lock_hidden){+.+.}: [ 150.960822] __mutex_lock+0xa1/0x9f0 [ 150.964996] evict_process_queues_cpsch+0x22/0x120 [amdgpu] [ 150.971155] kfd_process_evict_queues+0x3b/0xc0 [amdgpu] [ 150.977054] kgd2kfd_quiesce_mm+0x25/0x60 [amdgpu] [ 150.982442] amdgpu_amdkfd_evict_userptr+0x35/0x70 [amdgpu] [ 150.988615] amdgpu_mn_invalidate_hsa+0x41/0x60 [amdgpu] [ 150.994448] __mmu_notifier_invalidate_range_start+0xa4/0x240 [ 151.000714] copy_page_range+0xd70/0xd80 [ 151.005159] dup_mm+0x3ca/0x550 [ 151.008816] copy_process+0x1bdc/0x1c70 [ 151.013183] _do_fork+0x76/0x6c0 [ 151.016929] __x64_sys_clone+0x8c/0xb0 [ 151.021201] do_syscall_64+0x4a/0x1d0 [ 151.025404] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.030977] -> ROCm#2 (&adev->notifier_lock){+.+.}: [ 151.036993] __mutex_lock+0xa1/0x9f0 [ 151.041168] amdgpu_mn_invalidate_hsa+0x30/0x60 [amdgpu] [ 151.047019] __mmu_notifier_invalidate_range_start+0xa4/0x240 [ 151.053277] copy_page_range+0xd70/0xd80 [ 151.057722] dup_mm+0x3ca/0x550 [ 151.061388] copy_process+0x1bdc/0x1c70 [ 151.065748] _do_fork+0x76/0x6c0 [ 151.069499] __x64_sys_clone+0x8c/0xb0 [ 151.073765] do_syscall_64+0x4a/0x1d0 [ 151.077952] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.083523] -> ROCm#1 (mmu_notifier_invalidate_range_start){+.+.}: [ 151.090833] change_protection+0x802/0xab0 [ 151.095448] mprotect_fixup+0x187/0x2d0 [ 151.099801] setup_arg_pages+0x124/0x250 [ 151.104251] load_elf_binary+0x3a4/0x1464 [ 151.108781] search_binary_handler+0x6c/0x210 [ 151.113656] __do_execve_file.isra.40+0x7f7/0xa50 [ 151.118875] do_execve+0x21/0x30 [ 151.122632] call_usermodehelper_exec_async+0x17e/0x190 [ 151.128393] ret_from_fork+0x24/0x30 [ 151.132489] -> #0 (&mm->mmap_sem#2){++++}: [ 151.138064] __lock_acquire+0x11a1/0x1490 [ 151.142597] lock_acquire+0x90/0x180 [ 151.146694] __might_fault+0x68/0x90 [ 151.150879] read_sdma_queue_counter+0x5f/0xb0 [amdgpu] [ 151.156693] update_sdma_queue_past_activity_stats+0x3b/0x90 [amdgpu] [ 151.163725] destroy_queue_cpsch+0x1ae/0x210 [amdgpu] [ 151.169373] pqm_destroy_queue+0xf0/0x250 [amdgpu] [ 151.174762] kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu] [ 151.180577] kfd_ioctl+0x223/0x400 [amdgpu] [ 151.185284] ksys_ioctl+0x8f/0xb0 [ 151.189118] __x64_sys_ioctl+0x16/0x20 [ 151.193389] do_syscall_64+0x4a/0x1d0 [ 151.197569] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 151.203141] other info that might help us debug this: [ 151.211140] Chain exists of: &mm->mmap_sem#2 --> &adev->notifier_lock --> &dqm->lock_hidden [ 151.222535] Possible unsafe locking scenario: [ 151.228447] CPU0 CPU1 [ 151.232971] ---- ---- [ 151.237502] lock(&dqm->lock_hidden); [ 151.241254] lock(&adev->notifier_lock); [ 151.247774] lock(&dqm->lock_hidden); [ 151.254038] lock(&mm->mmap_sem#2); This commit fixes the warning by ensuring get_user() is not called while reading SDMA stats with dqm_lock held as get_user() could cause a page fault which leads to the circular locking scenario. Signed-off-by: Mukul Joshi <[email protected]> Reviewed-by: Felix Kuehling <[email protected]>
The release_firmware() function is NULL tolerant so we do not need to check for NULL param before calling it. Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]>
Add a switch statement to simplify asic checks. Note that BACO is not supported on APUs, so there is no need to check them. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
The failures with ROCm only happen with noretry=1, so enable runtime pm when noretry=0 (the current default). Reviewed-by: Felix Kuehling <[email protected]> Acked-by: Rajneesh Bhardwaj <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and following started to happen on each boot: ... BUG: kernel NULL pointer dereference, address: 0000000000000128 ... CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 ROCm#1 Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020 RIP: 0010:lock_bus+0x42/0x60 [amdgpu] ... Call Trace: i2c_smbus_xfer+0x3d/0xf0 i2c_default_probe+0xf3/0x130 i2c_detect.isra.0+0xfe/0x2b0 ? kfree+0xa3/0x200 ? kobject_uevent_env+0x11f/0x6a0 ? i2c_detect.isra.0+0x2b0/0x2b0 __process_new_driver+0x1b/0x20 bus_for_each_dev+0x64/0x90 ? 0xffffffffc0f34000 i2c_register_driver+0x73/0xc0 do_one_initcall+0x46/0x200 ? _cond_resched+0x16/0x40 ? kmem_cache_alloc_trace+0x167/0x220 ? do_init_module+0x23/0x260 do_init_module+0x5c/0x260 __do_sys_init_module+0x14f/0x170 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... Error appears when some i2c device driver tries to probe for devices using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`. Code supporting this adapter requires `adev->psp.ras.ras` to be not NULL, which is true only when `amdgpu_ras_init()` detects HW support by calling `amdgpu_ras_check_supported()`. Before 9015d60, adapter was registered by -> amdgpu_device_ip_init() -> amdgpu_ras_recovery_init() -> amdgpu_ras_eeprom_init() -> smu_v11_0_i2c_eeprom_control_init() after verifying that `adev->psp.ras.ras` is not NULL in `amdgpu_ras_recovery_init()`. Currently it is registered unconditionally by -> amdgpu_device_ip_init() -> pp_sw_init() -> hwmgr_sw_init() -> vega20_smu_init() -> smu_v11_0_i2c_eeprom_control_init() Fix simply adds HW support check (ras == NULL => no support) before calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`. Please note that there is a chance that similar fix is also required for CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense when there is no HW support. Cc: [email protected] Fixes: 9015d60 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device") Signed-off-by: Ivan Mironov <[email protected]> Tested-by: Bjorn Nostvold <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Port functionality from the Radeon driver to support UVD clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
Port functionality from the Radeon driver to support VCE clock control. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
Port functionality from the Radeon driver to support UVD and VCE power management. Signed-off-by: Alex Jivin <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Acked-by: Christian König <[email protected]>
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]>
When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]>
Move request init data to virt detection func, so we can insert request full access between request init data and set ip blocks. Signed-off-by: Wenhui Sheng <[email protected]>
Init soc15 reg base early enough so we can touch mailbox related registers in request full access for sriov before set_ip_blocks, vi&nv arch doesn't use reg base in virt ops. v2: fix reg_base_init missed in bare metal case. Signed-off-by: Wenhui Sheng <[email protected]>
From SIENNA_CICHLID, HW introduce a new protection feature which can control the FB, doorbell and MMIO write access for VF, so guest driver should request full access before ip discovery, or we couldn't access ip discovery data in FB. Signed-off-by: Wenhui Sheng <[email protected]>
After we move request full access before set ip blocks, we can merge atombios init block of SRIOV and baremetal together. Signed-off-by: Wenhui Sheng <[email protected]>
Only read first 4K data instead of whole TMR block, so we can reduce the time in full access mode. Signed-off-by: Wenhui Sheng <[email protected]>
This reverts commit 3030524. rhel6.x specific case. drop it. Signed-off-by: Flora Cui <[email protected]>
The legacy way to initialize discovery_tmr_size is using DISCOVERY_TMR_SIZE, while after we reduce DISCOVERY_TMR_SIZE from 64KB to 4KB, variable discovery_tmr_size is also reduced to 4KB, this is not correct, it still should be 64KB, discovery_tmr_size will be used to calculate ip_discovery reserved mem's start address and size. Using DISCOVERY_TMR_OFFSET to init discovery_tmr_size instead. Signed-off-by: Wenhui Sheng <[email protected]> Reviewed-by: Hawking Zhang <[email protected]>
Copy board parameters directly instead of set each parameter for sienna_cichlid. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: Ic80e597d8da084d85329bb240063b5d10e14cc7e
Update sienna_cichlid driver if header file to match pptable changes. Signed-off-by: Likun Gao <[email protected]> Acked-by: Alex Deucher <[email protected]> Change-Id: I6ec08fc07bc3d9924075b85233d85e68edce9656
To follow conventional style. And this unnecessary "@" confuses our userspace tool. Change-Id: Id4cdc611d63e800cf5a93449b6331a1e8323e727 Signed-off-by: Evan Quan <[email protected]> Acked-by: Alex Deucher <[email protected]>
To avoid s3 fail at the first cycle, it needs to revert this patch: drm/amd/display: add mechanism to skip DCN init Change-Id: I3a454fbf4744b42b62dfcce8902b35b3a30e8539 Signed-off-by: changfeng <[email protected]>
Hi Andreas, it looks like you're trying to make our DKMS/release branch work with the latest upstream kernels. That's not really the purpose of that branch. The purpose is to support older or patched Linux kernels used in popular Linux distributions. Therefore our DKMS kernel support development is usually aligned with the Linux distributions' release schedule. We don't try to keep up with every upstream release. If you want to use ROCm with the latest upstream kernels, I would recommend you try the KFD included in the upstream kernel. We do most of our KFD development in the upstream branch anyway; it flows into the DKMS branch from there. There are only a few features still missing upstream, I'm not sure whether those matter to you. We're working on closing that gap:
|
Makes sense, thanks for clearing that up! |
Reasoning described in #94