[6.14-adv-next] Add Grace virtualization support to 6.14-adv, (upstream vEVENTQ + HW QUEUE and OOT vEGM) #167
The module code does not create a writable copy of the executable memory anymore so there is no need to handle it in module relocation and alternatives patching. This reverts commit 9bfc482. Signed-off-by: "Mike Rapoport (Microsoft)" <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 1d7e707) Signed-off-by: Nirmoy Das <[email protected]>
Pretty much every caller of is_endbr() actually wants to test something at an address and ends up doing get_kernel_nofault(). Fold the lot into a more convenient helper. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: "Masami Hiramatsu (Google)" <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 72e213a) Signed-off-by: Nirmoy Das <[email protected]>
…mapping_domain" This reverts commit 78480b2. Signed-off-by: Nirmoy Das <[email protected]>
…m ids
ASPEED VGA card has two built-in devices:
0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06)
0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)
Its topology looks like this:
+-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0 Sandisk Corp Device 5017
| +-01.0-[04]--
| +-02.0-[05]----00.0 NVIDIA Corporation Device
| +-03.0-[06-07]----00.0-[07]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family
| +-04.0-[08]----00.0 Renesas Technology Corp. uPD720201 USB 3.0 Host Controller
| \-05.0-[09]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
\-00.1 PMC-Sierra Inc. Device 4028
The IORT logic populates two identical IDs into the fwspec->ids array via
DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias().
Though the SMMU driver had been able to handle this situation since commit
563b5cb ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that
got broken by the later commit cdf315f ("iommu/arm-smmu-v3: Maintain
a SID->device structure"), which ended up with allocating separate streams
with the same stuffing.
On a kernel prior to v6.15-rc1, there has been an overlooked warning:
pci 0008:07:00.0: vgaarb: setting as boot VGA device
pci 0008:07:00.0: vgaarb: bridge control possible
pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
pcieport 0008:06:00.0: Adding to iommu group 14
ast 0008:07:00.0: stream 67328 already in tree <===== WARNING
ast 0008:07:00.0: enabling device (0002 -> 0003)
ast 0008:07:00.0: Using default configuration
ast 0008:07:00.0: AST 2600 detected
ast 0008:07:00.0: [drm] Using analog VGA
ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
[drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0
ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device
With v6.15-rc, since the commit bcb81ac ("iommu: Get DT/ACPI parsing
into the proper probe path"), the error returned with the warning is moved
to the SMMU device probe flow:
arm_smmu_probe_device+0x15c/0x4c0
__iommu_probe_device+0x150/0x4f8
probe_iommu_group+0x44/0x80
bus_for_each_dev+0x7c/0x100
bus_iommu_probe+0x48/0x1a8
iommu_device_register+0xb8/0x178
arm_smmu_device_probe+0x1350/0x1db0
which then fails the entire SMMU driver probe:
pci 0008:06:00.0: Adding to iommu group 21
pci 0008:07:00.0: stream 67328 already in tree
arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu
arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22
Since the SMMU driver already expects a potentially duplicated Stream ID
in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master()
routine to ignore a duplicated ID from the fwspec->sids array as well.
Note: this has been failing the iommu_device_probe() since 2021, although a
recent iommu commit in v6.15-rc1 that moves iommu_device_probe() started to
fail the SMMU driver probe. Since nobody has cared about DMA Alias support,
leave that as it was but fix the fundamental iommu_device_probe() breakage.
Fixes: cdf315f ("iommu/arm-smmu-v3: Maintain a SID->device structure")
Cc: [email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit b00d249 linux-next)
Signed-off-by: Nirmoy Das <[email protected]>
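The behaviour described in the commit above can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct master`, `struct stream_table` and `insert_master()` are hypothetical names, and a flat array stands in for the driver's SID-to-device lookup structure. It shows the intended rule: a repeated ID within one device's own ID list is skipped, while an ID already owned by a different device is still an error.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 16

/* Hypothetical stand-in for a device that owns one or more stream IDs. */
struct master {
	const char *name;
};

/* Hypothetical stand-in for the driver's SID -> device lookup structure. */
struct stream_table {
	uint32_t sid[MAX_STREAMS];
	const struct master *owner[MAX_STREAMS];
	size_t count;
};

static const struct master *find_owner(const struct stream_table *t,
				       uint32_t sid)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->sid[i] == sid)
			return t->owner[i];
	return NULL;
}

/*
 * Register every ID in ids[] for master m.
 * A repeat of an ID that m already owns is skipped, an ID owned by a
 * different master is rejected with -ENODEV, and a full table gives -ENOSPC.
 * For brevity this sketch does not roll back IDs inserted before a failure.
 */
static int insert_master(struct stream_table *t, const struct master *m,
			 const uint32_t *ids, size_t num_ids)
{
	for (size_t i = 0; i < num_ids; i++) {
		const struct master *existing = find_owner(t, ids[i]);

		if (existing == m)
			continue;	/* duplicated alias of the same device */
		if (existing)
			return -ENODEV;	/* genuine conflict with another device */
		if (t->count == MAX_STREAMS)
			return -ENOSPC;

		t->sid[t->count] = ids[i];
		t->owner[t->count] = m;
		t->count++;
	}
	return 0;
}
```

With the ID list `{67328, 67328}` from the log above, the model stores a single entry and reports success instead of failing on the second copy.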
There are new attach/detach/replace helpers in device.c taking care of both the attach_handle and the fault specific routines for iopf_enable/disable() and auto response. Clean up these redundant functions in the fault.c file. Link: https://patch.msgid.link/r/3ca94625e9d78270d9a715fa0809414fddd57e58.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <[email protected]> Reviewed-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit dc10ba2) Signed-off-by: Nirmoy Das <[email protected]>
This reverts commit 8aced5e. Signed-off-by: Nirmoy Das <[email protected]>
…u_cookie
The IOMMU translation for MSI message addresses has been a 2-step process,
separated in time:
1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address
is stored in the MSI descriptor when an MSI interrupt is allocated.
2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a
translated message address.
This has an inherent lifetime problem for the pointer stored in the cookie
that must remain valid between the two steps. However, there is no locking
at the irq layer that helps protect the lifetime. Today, this works under
the assumption that the iommu domain is not changed while MSI interrupts
are being programmed. This is true for normal DMA API users within the kernel,
as the iommu domain is attached before the driver is probed and cannot be
changed while a driver is attached.
Classic VFIO type1 also prevented changing the iommu domain while VFIO was
running as it does not support changing the "container" after starting up.
However, iommufd has improved this so that the iommu domain can be changed
during VFIO operation. This potentially allows userspace to directly race
VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and
VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()).
This potentially causes both the cookie pointer and the unlocked call to
iommu_get_domain_for_dev() on the MSI translation path to become UAFs.
Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
address is already known during iommu_dma_prepare_msi() and cannot change.
Thus, it can simply be stored as an integer in the MSI descriptor.
The other UAF related to iommu_get_domain_for_dev() will be addressed in
patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by
using the IOMMU group mutex.
Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 1f7df3a)
Signed-off-by: Nirmoy Das <[email protected]>
The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 9349887) Signed-off-by: Nirmoy Das <[email protected]>
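To make the pointer-to-integer change concrete, here is a small user-space model of a helper in the spirit of msi_msg_set_addr(). It is a sketch, not the kernel code: the struct names, the `iova_pfn`/`shift` fields and the convention that `shift == 0` means "no translation recorded" are assumptions made for illustration. The point it shows is that once the translated address is stored as plain integers in the descriptor, composing the message needs no pointer dereference and no IOMMU-layer involvement.

```c
#include <stdint.h>

/* Simplified MSI message: only the two address words matter here. */
struct msi_msg_model {
	uint32_t address_lo;
	uint32_t address_hi;
};

/*
 * Hypothetical descriptor: the translated address is kept as integers
 * (page frame number plus page shift) instead of a pointer to a cookie.
 * shift == 0 means no translation was recorded. shift must be below 64.
 */
struct msi_desc_model {
	uint64_t iova_pfn;
	unsigned int shift;
};

/* Fill in address_hi/lo, substituting the translated page if there is one. */
static void msi_msg_set_addr_model(const struct msi_desc_model *desc,
				   struct msi_msg_model *msg,
				   uint64_t msi_addr)
{
	uint64_t addr = msi_addr;

	if (desc->shift) {
		uint64_t offset_mask = (UINT64_C(1) << desc->shift) - 1;

		/* translated page, original offset within that page */
		addr = (desc->iova_pfn << desc->shift) |
		       (msi_addr & offset_mask);
	}

	msg->address_hi = (uint32_t)(addr >> 32);
	msg->address_lo = (uint32_t)addr;
}
```

A caller passes the untranslated doorbell address once; the helper decides whether to substitute the stored IOVA, which mirrors the "set address, then compose" pattern the commit message says every call site used to repeat.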
SW_MSI lets the IOMMU translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For dma-iommu.c, each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 288683c) Signed-off-by: Nirmoy Das <[email protected]>
Callers of the two APIs always provide a valid handle, so make @handle a mandatory parameter. Take this chance to incorporate the handle->domain set under the protection of group->mutex in iommu_attach_group_handle(). Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Nicolin Chen <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 237603a) Signed-off-by: Nirmoy Das <[email protected]>
iommufd does not use it now, so drop it. Link: https://patch.msgid.link/r/[email protected] Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Nicolin Chen <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 473ec07) Signed-off-by: Nirmoy Das <[email protected]>
iommu_attach_device_pasid() only stores handle to group->pasid_array when there is a valid handle input. However, it makes the iommu_attach_device_pasid() unable to detect if the pasid has been attached or not previously. To be complete, let the iommu_attach_device_pasid() store the domain to group->pasid_array if no valid handle. The other users of the group->pasid_array should be updated to be consistent. e.g. the iommu_attach_group_handle() and iommu_replace_group_handle(). Link: https://patch.msgid.link/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Nicolin Chen <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Signed-off-by: Yi Liu <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit e1ea9d3) Signed-off-by: Nirmoy Das <[email protected]>
…h op of iommu drivers The current implementation stores entry to the group->pasid_array before the underlying iommu driver has successfully set the new domain. This can lead to issues where PRIs are received on the new domain before the attach operation is completed. This patch swaps the order of operations to ensure that the domain is set in the underlying iommu driver before updating the group->pasid_array. Link: https://patch.msgid.link/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Nicolin Chen <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Yi Liu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 5e9f822) Signed-off-by: Nirmoy Das <[email protected]>
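The reordering described in the commit above can be sketched in a few lines of user-space C. Everything here is a simplified model with hypothetical names (`struct group_model`, `attach_pasid()`, the `drv_ok`/`drv_fail` toy hooks); a fixed array stands in for group->pasid_array. The sketch only illustrates the ordering rule: let the driver finish first, and publish the entry afterwards so that nothing looks it up while the attach is still in flight.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_PASID 8

struct domain {
	int id;
};

/* Hypothetical driver hook: 0 on success, negative errno on failure. */
typedef int (*set_dev_pasid_fn)(const struct domain *d, unsigned int pasid);

/* Fixed array standing in for the per-group PASID lookup table. */
struct group_model {
	const struct domain *pasid_array[MAX_PASID];
};

static int attach_pasid(struct group_model *g, const struct domain *d,
			unsigned int pasid, set_dev_pasid_fn drv_attach)
{
	int ret;

	if (pasid >= MAX_PASID)
		return -EINVAL;
	if (g->pasid_array[pasid])
		return -EBUSY;	/* this PASID is already attached */

	/* 1) let the driver finish programming the hardware */
	ret = drv_attach(d, pasid);
	if (ret)
		return ret;	/* nothing was published, nothing to undo */

	/* 2) only now make the entry visible to lookups (e.g. fault paths) */
	g->pasid_array[pasid] = d;
	return 0;
}

/* Toy hooks for trying the function out. */
static int drv_ok(const struct domain *d, unsigned int pasid)
{
	(void)d;
	(void)pasid;
	return 0;
}

static int drv_fail(const struct domain *d, unsigned int pasid)
{
	(void)d;
	(void)pasid;
	return -EIO;
}
```

If the driver hook fails, the table is untouched, so a later retry with a working hook succeeds without any cleanup step.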
The drivers doing their own fwspec parsing have no need to call iommu_fwspec_free() since fwspecs were moved into dev_iommu, as returning an error from .probe_device will tear down the whole lot anyway. Move it into the private interface now that it only serves for of_iommu to clean up in an error case. I have no idea what mtk_v1 was doing in effectively guaranteeing a NULL fwspec would be dereferenced if no "iommus" DT property was found, so add a check for that to at least make the code look sane. Signed-off-by: Robin Murphy <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/36e245489361de2d13db22a510fa5c79e7126278.1740667667.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <[email protected]> (cherry picked from commit 29c6e1c) Signed-off-by: Nirmoy Das <[email protected]>
At the moment, if of_iommu_configure() allocates dev->iommu itself via iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the fwspec but leaves the empty dev_iommu hanging around. So far this is benign (if a tiny bit wasteful), but we'd like to be able to reason about dev->iommu having a consistent and unambiguous lifecycle. Thus make sure that the of_iommu cleanup undoes precisely whatever it did. Signed-off-by: Robin Murphy <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/d219663a3f23001f23d520a883ac622d70b4e642.1740753261.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <[email protected]> (cherry picked from commit 3832862) Signed-off-by: Nirmoy Das <[email protected]>
Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll make more sense for irqchips that call prepare/compose to select it, and that will trigger all the additional code and data to be compiled into the kernel. If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the prepare/compose() will be NOP stubs. If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on the iommu side is compiled out. Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> (cherry picked from commit 96093fe) Signed-off-by: Nirmoy Das <[email protected]>
nvgrace-egm exposes the API register_egm_node & unregister_egm_node to manage EGM (Extended GPU Memory) present on the system. To allow out-of-tree drivers such as nvidia-vgpu-vfio to make use of them, move the declaration to a new nvgrace-egm.h in include. Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bed340f https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a961663 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
…tion Free the kmalloc'd region when the EGM is unregistered. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit fc592b9 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit f24760c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Move region hash initialization alongside the other region initialization statements to avoid situations where the hash table was not properly initialized. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 8021c1d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit e1264a6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
…rrors Update error handling within the EGM registration routine to catch and return errors to the caller. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a57210c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a706ff8 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Detect and handle a failure from the EGM registration service. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit f18eee3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 8371b68 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Fix source to resolve checkpatch warnings Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit c7b47b7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit dfa0e06 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Fix minor syntax errors from sparse. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bbb64e6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit fe78194 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Return the intended errno upon a copyout fault, remove unnecessary checks following container_of pointer derivation, and use the correct macro and types for overflow checking. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 429910b https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bda63f3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Use the correct macro and types for overflow checking. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit afa8f63 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit d110330 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Ensure ACPI table reads are successful prior to using the value. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit b2947b0 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 9258355 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Some environments may provide a "nvidia,egm-retired-pages-data-base" but fail to populate it with a base address, leaving it NULL. Mapping this invalid value results in a synchronous exception when the region is first touched. Detect a NULL value, generate a warning to draw attention to the firmware bug, and return without mapping.
INFO: th500_ras_intr_handler: External Abort reason=1 syndrome=0x92000410 flags=0x1
[ 82.104493] Internal error: synchronous external abort: 0000000096000410 [#1] SMP
[ 82.114898] Modules linked in: nvgrace_gpu_vfio_pci(E) nvgrace_egm(E)
[ 82.257218] CPU: 0 PID: 10 Comm: kworker/0:1 Tainted: G OE 6.8.12+ #5
[ 82.265135] Hardware name: NVIDIA GH200 P5042, BIOS 24103110 20241031
[ 82.271720] Workqueue: events work_for_cpu_fn
[ 82.276180] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 82.283298] pc : register_egm_node+0x2cc/0x440 [nvgrace_egm]
[ 82.289087] lr : register_egm_node+0x2c4/0x440 [nvgrace_egm]
[ 82.294872] sp : ffff8000802ebc30
[ 82.298254] x29: ffff8000802ebc60 x28: 00000000000000ff x27: 0000000000000000
[ 82.305550] x26: ffff000087a320c8 x25: ffff0000a5700000 x24: ffff000087a32000
[ 82.312846] x23: ffffa77cd758e368 x22: 0000000000000000 x21: ffffa77cd758c640
[ 82.320141] x20: ffffa77cd758e170 x19: ffff800081e7d000 x18: ffff800080293038
[ 82.327437] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 82.334732] x14: 0000000000000000 x13: 65203a65646f6e5f x12: 0000000000000000
[ 82.342027] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 82.349322] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 82.356618] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 82.363913] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800081e7d000
[ 82.371210] Call trace:
[ 82.373705] register_egm_node+0x2cc/0x440 [nvgrace_egm]
[ 82.379135] nvgrace_gpu_probe+0x2ac/0x528 [nvgrace_gpu_vfio_pci]
[ 82.385366] local_pci_probe+0x4c/0xe0
[ 82.389198] work_for_cpu_fn+0x28/0x58
[ 82.393026] process_one_work+0x168/0x3f0
[ 82.397123] worker_thread+0x360/0x480
[ 82.400952] kthread+0x11c/0x128
[ 82.404248] ret_from_fork+0x10/0x20
[ 82.407906] Code: d2820001 940002b3 aa0003f3 b4fffac0 (f9400017)
[ 82.414134] ---[ end trace 0000000000000000 ]---
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 7ba2930 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 349fb1c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
In an effort to simplify the programming model, use a symmetrical model for the EGM registration APIs. This avoids the caller needing to keep a cookie or even have knowledge of whether EGM is supported. Update the EGM unregistration API to use the PCI device as its parameter. Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit d8903ec https://github.com/nvmochs/NV-Kernels/tree/vegm_01232025) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 5839fc5 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
…egions GB200 systems could have multiple GPUs associated with an EGM region. For proper EGM functionality the host topology in terms of GPU affinity has to be replicated in the VM. Hence the EGM region structure must track the GPU devices belonging to the same socket. On the device probe, the device pci_dev struct is added to a linked list of the appropriate EGM region. Similarly on device remove, the pci_dev struct for the GPU is removed from the EGM region. Signed-off-by: Ankit Agrawal <[email protected]> Ref: sj24: /home/nvidia/ankita/kernel_patches/0001_vfio_nvgrace-egm_track_GPUs_associated_with_the_EGM_regions.patch (koba: Enhance error handling, Remove egm_node from unregister_egm_node and move destroy_egm_chardev a little forward) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 0222c35 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
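The bookkeeping described in the commit above, adding a GPU to its EGM region on probe and removing it on device remove, can be modelled with a plain singly linked list. This is a user-space sketch under stated assumptions: `struct egm_region_model`, `struct gpu_node`, the helper names and the use of an integer `bdf` as the device identifier are all hypothetical, and the real driver tracks pci_dev structures with kernel list primitives rather than malloc().

```c
#include <errno.h>
#include <stdlib.h>

/* One tracked GPU; bdf is a hypothetical stand-in for a pci_dev. */
struct gpu_node {
	unsigned int bdf;
	struct gpu_node *next;
};

/* Hypothetical EGM region holding the GPUs of the same socket. */
struct egm_region_model {
	struct gpu_node *gpus;
};

/* Called on device probe: link the GPU into its region. */
static int egm_region_add_gpu(struct egm_region_model *r, unsigned int bdf)
{
	struct gpu_node *n = malloc(sizeof(*n));

	if (!n)
		return -ENOMEM;
	n->bdf = bdf;
	n->next = r->gpus;
	r->gpus = n;
	return 0;
}

/* Called on device remove: unlink and free the matching entry. */
static int egm_region_del_gpu(struct egm_region_model *r, unsigned int bdf)
{
	for (struct gpu_node **pp = &r->gpus; *pp; pp = &(*pp)->next) {
		if ((*pp)->bdf == bdf) {
			struct gpu_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -ENOENT;
}

/* Number of GPUs currently associated with the region. */
static size_t egm_region_gpu_count(const struct egm_region_model *r)
{
	size_t n = 0;

	for (const struct gpu_node *p = r->gpus; p; p = p->next)
		n++;
	return n;
}
```

Keeping the list per region is what later allows the set of GPUs sharing a socket with an EGM region to be reported to userspace.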
To replicate the host EGM topology in the VM in terms of the GPU affinity, userspace needs to be aware of which GPUs belong to the same socket as the EGM region. Expose the list of GPUs associated with an EGM region through sysfs. The list can be queried from the location /sys/devices/virtual/egm/egmX/gpu_devices. Signed-off-by: Ankit Agrawal <[email protected]> Ref: sj24: /home/nvidia/ankita/kernel_patches/0002_vfio_nvgrace-egm_list_gpus_through_sysfs.patch (koba: Enhance error handling for sysfs_create_group) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit fec2356 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
To allocate the EGM, userspace needs to know its size. Currently, there is no easy way for userspace to determine that. Make nvgrace-egm expose the size through sysfs so that userspace can query it from /sys/devices/virtual/egm/egmX/egm_size. Signed-off-by: Ankit Agrawal <[email protected]> Ref: sj24: /home/nvidia/ankita/kernel_patches/0003_vfio_nvgrace-egm_expose_the_egm_size_through_sysfs.patch Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit dcdcef2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
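For readers who want to consume the egm_size attribute from userspace, a minimal sketch in C follows. The path layout comes from the commit message; everything else is an assumption: the helper names are made up, and because the output format of the attribute is not stated here, the parser uses base 0 so that either decimal or 0x-prefixed hexadecimal is accepted.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one unsigned integer from a sysfs-style attribute file.
 * Base 0 lets strtoull() accept decimal or 0x-prefixed hex, since the
 * exact output format of egm_size is an assumption in this sketch.
 * Returns 0 on success or a negative errno value.
 */
static int read_sysfs_u64(const char *path, uint64_t *out)
{
	char buf[64];
	char *end;
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	unsigned long long v = strtoull(buf, &end, 0);

	if (errno || end == buf)
		return -EINVAL;
	*out = v;
	return 0;
}

/* Build the attribute path for region index idx, e.g. .../egm0/egm_size. */
static int egm_size_path(char *dst, size_t len, unsigned int idx)
{
	int n = snprintf(dst, len,
			 "/sys/devices/virtual/egm/egm%u/egm_size", idx);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}
```

A VMM or launch script would call `egm_size_path()` for the region index it cares about, pass the result to `read_sysfs_u64()`, and fall back gracefully when the file is absent, for example on hosts without the nvgrace-egm module loaded.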
… allocations Add missing null pointer checks after vzalloc() calls in the NVIDIA Grace GPU driver's EGM (Extended GPU Memory) handling code. This prevents potential null pointer dereferences in the memory failure handling and bad page fetching functions, providing proper error handling for allocation failures. Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 63127e2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
Add CONFIG_NVGRACE_EGM with policy 'm' for arm64 architecture. Signed-off-by: Nirmoy Das <[email protected]>
On platforms without the mig HW bug (e.g. Grace-Blackwell) there is not a requirement to create the resmem region. Accordingly, this region is not configured on these platforms, which leads to the following print when the device is closed: resource: Trying to free nonexistent resource <0x0000000000000000-0x000000000000ffff> Avoid calling unregister_pfn_address_space for resmem when the region is not being used. Fixes: 2d21b7b ("vfio/nvgrace-gpu: register device memory for poison handling") Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Nirmoy Das <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bd0187d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]>
nvmochs
left a comment
Ran pick analyzer on c7d2a4a^..e2d029c, the majority of the patches match upstream exactly. Of the ones that were flagged, reviewed and found they were only called out due to minor context differences or the addition of "NVIDIA: SAUCE:" tags.
Manual review of the patches with backport tags, no issues or concerns.
Lastly, confirmed pick tags and trailers are present and correct.
Acked-by: Matthew R. Ochs <[email protected]>
4ff9dc6 sets it in the annotations, fa811f0 sets it in the defconfig. The defconfig one is not really needed in the Ubuntu tree, but I have continued to carry it forward since some CSPs were not using the annotations.
So CSPs can be using this exact git tree but they do not use the annotations to build the kernel?
We have advised them now to use the annotations, and updated the reference code release notes with the command to generate the .config from it. But of course we cannot force them. =) If you feel strongly about it we can remove the defconfig commit, I don't have a preference either way.
No, it's OK. Leave it.
clsotog
left a comment
Acked-by: Carol L Soto <[email protected]>
Merged, closing PR.
This PR backports/cherry-picks patches for upstream vEVENTQ + HW QUEUE and OOT vEGM.
Testing sources:
QEMU src: https://github.com/nvmochs/QEMU/tree/6.11_gracevirt_vcmdq_v9
VM image: https://urm.nvidia.com/artifactory/sw-dgx-platform-generic-local/staging/ghvirt/guest/jammy-server-cloudimg-arm64_may022024_public_r550.54.15_cuda12.4.qcow2.xz
CUDA Test: https://dvstransfer.nvidia.com/dvsshare/dvs-binaries-virtual/gpu_drv_r575_00_Release_Linux_aarch64sbsa_CUDA_DVS_Test/
VM start command for EGM testing
Test runs for EGM enabled VM
vEVENTQ validation
VM start command for vEVENTQ testing with cmdqv=on
Test runs for vEVENTQ enabled VM