nirmoy commented Jul 22, 2025

This PR backports (cherry-picks) patches for upstream vEVENTQ + HW QUEUE and out-of-tree (OOT) vEGM.

Testing sources:

QEMU src: https://github.com/nvmochs/QEMU/tree/6.11_gracevirt_vcmdq_v9
VM image: https://urm.nvidia.com/artifactory/sw-dgx-platform-generic-local/staging/ghvirt/guest/jammy-server-cloudimg-arm64_may022024_public_r550.54.15_cuda12.4.qcow2.xz
CUDA Test: https://dvstransfer.nvidia.com/dvsshare/dvs-binaries-virtual/gpu_drv_r575_00_Release_Linux_aarch64sbsa_CUDA_DVS_Test/

VM start command for EGM testing

VM_IMAGE=/localhome/local-nirmoyd/ubuntu-24.04-server-cloudimg-arm64-grace-6.8.0-1009-nvidia-adv-2025-02-07-08-57-55.qcow2
/usr/local/bin/qemu-system-aarch64 -object iommufd,id=iommufd0 \
    -machine hmat=on -machine virt,accel=kvm,gic-version=3,iommu=nested-smmuv3,ras=on \
    -cpu host -smp cpus=4 -m size=16G,slots=2,maxmem=66G -nographic \
    -object memory-backend-file,id=m0,mem-path=/dev/egm4,size=16G,share=on,prealloc=on \
    -numa node,memdev=m0,cpus=0-3,nodeid=0 \
    -numa node,nodeid=1 -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
    -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 \
    -device vfio-pci-nohotplug,host=0009:01:00.0,rombar=0,id=dev0,iommufd=iommufd0 \
    -object acpi-egm-memory,id=egm0,pci-dev=dev0,node=0 \
    -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=1 \
    -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=2 \
    -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=3 \
    -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=4 \
    -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=5 \
    -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=6 \
    -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=7 \
    -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=8 \
    -bios /usr/share/AAVMF/AAVMF_CODE.fd \
    -device nvme,drive=nvme0,serial=deadbeaf1,bus=pcie.0 \
    -drive file=$VM_IMAGE,index=0,media=disk,format=qcow2,if=none,id=nvme0 \
    -device e1000,romfile=/usr/local/share/qemu/efi-e1000.rom,netdev=net0,bus=pcie.0 \
    -netdev user,id=net0,hostfwd=tcp::5558-:22,hostfwd=tcp::5586-:5586

Test runs for EGM-enabled VM

nvidia@ubuntu:~$ lspci -k -d 10de:2348
b0:00.0 3D controller: NVIDIA Corporation Device 2348 (rev a1)
        Subsystem: NVIDIA Corporation Device 18d2
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
sudo nvidia-smi -q |grep -i egm
        EGM                               : enabled 
nvidia@ubuntu:~$ sudo ./tests/runtime/gflops/gflops
Running GFLOPs test...
&&&& PERF GFLOPs 0
&&&& gflops test PASSED
nvidia@ubuntu:~$ sudo ./tests/driver/egm/egm
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1366
Running test SmokeTest (pid: 1408)
^^^^ PASS: SmokeTest (404.7ms)
Running test SmokeTestIpc (pid: 1411)
(thread 260537756117184 [t0]) At /dvs/p4/build/sw/rel/gpgpu/toolkit/r12.9/cuda/apps/egm/test/smoke.cpp:392:
SmokeTestIpc is NOT supported in a configuration with only one process

^^^^ WAIVE: SmokeTestIpc (298.9ms)
Running test atomictest (pid: 1414)
^^^^ PASS: atomictest (329.2ms)
Total time: 1033ms
2 out of 2 ENABLED tests passed (100%)
    1 ENABLED tests were waived
&&&& egm test PASSED
sudo tests/runtime/uvmConformance/uvmConformance -t texture_simple
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1135
Running test texture_simple (pid: 1177)
^^^^ PASS: texture_simple (350.1ms)
Total time: 350ms
1 out of 1 ENABLED tests passed (100%)
&&&& uvmConformance test PASSED
sudo tests/runtime/uvmConformance/uvmConformance -t ats_malloc_host
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1233
Running test ats_malloc_host (pid: 1275)
^^^^ PASS: ats_malloc_host (351.2ms)
Total time: 351ms
1 out of 1 ENABLED tests passed (100%)
&&&& uvmConformance test PASSED

vEVENTQ validation

VM start command for vEVENTQ testing with cmdqv=on

VM_IMAGE=/localhome/local-nirmoyd/ubuntu-24.04-server-cloudimg-arm64-grace-6.8.0-1009-nvidia-adv-2025-02-07-08-57-55.qcow2
qemu-system-aarch64 \
       -object iommufd,id=iommufd0 \
       -machine hmat=on -machine virt,accel=kvm,gic-version=3,iommu=nested-smmuv3,cmdqv=on,ras=on \
       -cpu host -smp cpus=4 -m size=16G,slots=2,maxmem=64G -nographic \
       -object memory-backend-file,size=8G,id=m0,mem-path=/hugepages/,prealloc=on,share=off \
       -object memory-backend-file,size=8G,id=m1,mem-path=/hugepages/,prealloc=on,share=off \
       -numa node,memdev=m0,cpus=0-3,nodeid=0 -numa node,memdev=m1,nodeid=1 \
       -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 -numa node,nodeid=5 \
       -numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 -numa node,nodeid=9 \
       -device vfio-pci-nohotplug,host=0009:01:00.0,rombar=0,id=dev0,iommufd=iommufd0 \
       -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
       -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
       -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
       -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
       -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
       -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
       -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
       -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
       -device vfio-pci-nohotplug,host=0010:01:00.0,rombar=0,id=dev1,iommufd=iommufd0 \
       -bios /usr/share/AAVMF/AAVMF_CODE.fd \
       -device nvme,drive=nvme0,serial=deadbeaf1,bus=pcie.0 \
       -drive file=$VM_IMAGE,index=0,media=disk,format=qcow2,if=none,id=nvme0 \
       -device e1000,romfile=/usr/local/share/qemu/efi-e1000.rom,netdev=net0,bus=pcie.0 \
       -netdev user,id=net0,hostfwd=tcp::5558-:22,hostfwd=tcp::5586-:5586

Test runs for vEVENTQ-enabled VM

nvidia@ubuntu:~$ sudo dmesg | grep "Default domain type"
[    0.274182] iommu: Default domain type: Translated
nvidia@ubuntu:~$ sudo journalctl -b|grep vcmdq -i|head -n1
Jul 22 15:32:54 ubuntu kernel: arm-smmu-v3 arm-smmu-v3.0.auto: allocated 524288 entries for vcmdq0
sudo ./tests/runtime/gflops/gflops
Running GFLOPs test...
&&&& PERF GFLOPs 0
&&&& gflops test PASSED
sudo tests/runtime/uvmConformance/uvmConformance -t texture_simple
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1443
Running test texture_simple (pid: 1485)
^^^^ PASS: texture_simple (339.6ms)
Total time: 340ms
1 out of 1 ENABLED tests passed (100%)
&&&& uvmConformance test PASSED
nvidia@ubuntu:~$ sudo tests/runtime/uvmConformance/uvmConformance -t texture_simple
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1537
Running test texture_simple (pid: 1579)
^^^^ PASS: texture_simple (339.6ms)
Total time: 340ms
1 out of 1 ENABLED tests passed (100%)
&&&& uvmConformance test PASSED
sudo tests/runtime/uvmConformance/uvmConformance -t ats_malloc_host
Device 0: NVIDIA GH200 144G HBM3e
Driver version: 12090
Runtime version: 12090
Dispatcher pid: 1038
Running test ats_malloc_host (pid: 1080)
^^^^ PASS: ats_malloc_host (349.2ms)
Total time: 349ms
1 out of 1 ENABLED tests passed (100%)
&&&& uvmConformance test PASSED

rppt and others added 2 commits July 18, 2025 07:38
The module code does not create a writable copy of the executable memory
anymore so there is no need to handle it in module relocation and
alternatives patching.

This reverts commit 9bfc482.

Signed-off-by: "Mike Rapoport (Microsoft)" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 1d7e707)
Signed-off-by: Nirmoy Das <[email protected]>
Pretty much every caller of is_endbr() actually wants to test something at an
address and ends up doing get_kernel_nofault(). Fold the lot into a more
convenient helper.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: "Masami Hiramatsu (Google)" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 72e213a)
Signed-off-by: Nirmoy Das <[email protected]>
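A rough userspace illustration of the refactor above: the helper is handed an address rather than a value the caller already fetched. `is_endbr_at()` is a hypothetical name, the sketch only recognizes the ENDBR64 encoding (`f3 0f 1e fa`), and it assumes `addr` points at four readable bytes, whereas the kernel helper reads through get_kernel_nofault() so that a bad address fails safely.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* ENDBR64 instruction bytes on x86-64. */
static const uint8_t endbr64_insn[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/*
 * Hypothetical helper: test the instruction located at addr.
 * memcmp() works on unaligned addresses and is endian-independent.
 * Unlike the kernel version, there is no fault-safe read here.
 */
static bool is_endbr_at(const void *addr)
{
	return memcmp(addr, endbr64_insn, sizeof(endbr64_insn)) == 0;
}
```

Folding the fetch into the helper means every call site shares one safe-read path instead of repeating it.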
nirmoy marked this pull request as draft July 22, 2025 13:24
nirmoy changed the title [draft][6.14-adv-next] Backport: Add Extended GPU Memory (EGM) virtualization support [draft][6.14-adv-next] Add Grace virtualization support to 6.14-adv, (upstream vEVENTQ + HW QUEUE and OOT vEGM) Jul 22, 2025
nirmoy force-pushed the 614_tech_preview_virt.1 branch 2 times, most recently from b4c3a62 to e30dedb July 23, 2025 13:05
nirmoy marked this pull request as ready for review July 23, 2025 13:13
nirmoy changed the title [draft][6.14-adv-next] Add Grace virtualization support to 6.14-adv, (upstream vEVENTQ + HW QUEUE and OOT vEGM) [6.14-adv-next] Add Grace virtualization support to 6.14-adv, (upstream vEVENTQ + HW QUEUE and OOT vEGM) Jul 23, 2025
nirmoy force-pushed the 614_tech_preview_virt.1 branch 3 times, most recently from c36eb0a to df3cae8 July 23, 2025 13:28
nvmochs requested review from clsotog and nvmochs July 23, 2025 14:43
nirmoy force-pushed the 614_tech_preview_virt.1 branch 3 times, most recently from e7e4110 to 7eeda3f July 23, 2025 16:07
nirmoy and others added 14 commits July 23, 2025 09:21
…mapping_domain"

This reverts commit 78480b2.

Signed-off-by: Nirmoy Das <[email protected]>
…m ids

ASPEED VGA card has two built-in devices:
 0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06)
 0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52)

Its topology looks like this:
 +-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0  Sandisk Corp Device 5017
                             |               +-01.0-[04]--
                             |               +-02.0-[05]----00.0  NVIDIA Corporation Device
                             |               +-03.0-[06-07]----00.0-[07]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
                             |               +-04.0-[08]----00.0  Renesas Technology Corp. uPD720201 USB 3.0 Host Controller
                             |               \-05.0-[09]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
                             \-00.1  PMC-Sierra Inc. Device 4028

The IORT logic populates two identical IDs into the fwspec->ids array via
DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias().

Though the SMMU driver had been able to handle this situation since commit
563b5cb ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that
got broken by the later commit cdf315f ("iommu/arm-smmu-v3: Maintain
a SID->device structure"), which ended up allocating separate streams
for the same Stream ID.

On a kernel prior to v6.15-rc1, there has been an overlooked warning:
  pci 0008:07:00.0: vgaarb: setting as boot VGA device
  pci 0008:07:00.0: vgaarb: bridge control possible
  pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
  pcieport 0008:06:00.0: Adding to iommu group 14
  ast 0008:07:00.0: stream 67328 already in tree   <===== WARNING
  ast 0008:07:00.0: enabling device (0002 -> 0003)
  ast 0008:07:00.0: Using default configuration
  ast 0008:07:00.0: AST 2600 detected
  ast 0008:07:00.0: [drm] Using analog VGA
  ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16
  [drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0
  ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device

With v6.15-rc, since the commit bcb81ac ("iommu: Get DT/ACPI parsing
into the proper probe path"), the error returned with the warning is moved
to the SMMU device probe flow:
  arm_smmu_probe_device+0x15c/0x4c0
  __iommu_probe_device+0x150/0x4f8
  probe_iommu_group+0x44/0x80
  bus_for_each_dev+0x7c/0x100
  bus_iommu_probe+0x48/0x1a8
  iommu_device_register+0xb8/0x178
  arm_smmu_device_probe+0x1350/0x1db0
which then fails the entire SMMU driver probe:
  pci 0008:06:00.0: Adding to iommu group 21
  pci 0008:07:00.0: stream 67328 already in tree
  arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu
  arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22

Since SMMU driver had been already expecting a potential duplicated Stream
ID in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master()
routine to ignore a duplicated ID from the fwspec->sids array as well.

Note: this has been failing iommu_device_probe() since 2021; it is only a
recent iommu commit in v6.15-rc1, which moved iommu_device_probe(), that
made it fail the SMMU driver probe. Since nobody has cared about DMA alias
support, leave that as it was but fix the fundamental iommu_device_probe()
breakage.

Fixes: cdf315f ("iommu/arm-smmu-v3: Maintain a SID->device structure")
Cc: [email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
(cherry picked from commit b00d249 linux-next)
Signed-off-by: Nirmoy Das <[email protected]>
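A minimal sketch of the duplicate-SID handling described in the commit above, assuming a fixed-size per-master table; `struct master_streams` and `insert_master_sids()` are hypothetical stand-ins for the driver's tree-based stream bookkeeping. Duplicates within one master's ID list are skipped rather than reported as "already in tree". A SID already owned by a different master would still be an error in the real driver; this sketch does not model that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STREAMS 16 /* hypothetical capacity */

struct master_streams {
	uint32_t sids[MAX_STREAMS];
	size_t num;
};

/* Has this master already recorded the given Stream ID? */
static bool has_sid(const struct master_streams *m, uint32_t sid)
{
	for (size_t i = 0; i < m->num; i++)
		if (m->sids[i] == sid)
			return true;
	return false;
}

/*
 * Record every Stream ID from a firmware-provided list (like
 * fwspec->ids) exactly once. Returns 0 on success, -1 if full.
 */
static int insert_master_sids(struct master_streams *m,
			      const uint32_t *ids, size_t num_ids)
{
	for (size_t i = 0; i < num_ids; i++) {
		if (has_sid(m, ids[i]))
			continue; /* duplicate from DMA aliasing: ignore */
		if (m->num == MAX_STREAMS)
			return -1;
		m->sids[m->num++] = ids[i];
	}
	return 0;
}
```

With the ASPEED case from the log, two copies of stream 67328 collapse into a single entry instead of failing the probe.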
There are new attach/detach/replace helpers in device.c taking care of both
the attach_handle and the fault specific routines for iopf_enable/disable()
and auto response.

Clean up these redundant functions in the fault.c file.

Link: https://patch.msgid.link/r/3ca94625e9d78270d9a715fa0809414fddd57e58.1738645017.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit dc10ba2)
Signed-off-by: Nirmoy Das <[email protected]>
…u_cookie

The IOMMU translation for MSI message addresses has been a 2-step process,
separated in time:

 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address
    is stored in the MSI descriptor when an MSI interrupt is allocated.

 2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a
    translated message address.

This has an inherent lifetime problem for the pointer stored in the cookie
that must remain valid between the two steps. However, there is no locking
at the irq layer that helps protect the lifetime. Today, this works under
the assumption that the iommu domain is not changed while MSI interrupts
are being programmed. This is true for normal DMA API users within the kernel,
as the iommu domain is attached before the driver is probed and cannot be
changed while a driver is attached.

Classic VFIO type1 also prevented changing the iommu domain while VFIO was
running as it does not support changing the "container" after starting up.

However, iommufd has improved this so that the iommu domain can be changed
during VFIO operation. This potentially allows userspace to directly race
VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and
VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()).

This potentially causes both the cookie pointer and the unlocked call to
iommu_get_domain_for_dev() on the MSI translation path to become UAFs.

Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA
address is already known during iommu_dma_prepare_msi() and cannot change.
Thus, it can simply be stored as an integer in the MSI descriptor.

The other UAF related to iommu_get_domain_for_dev() will be addressed in
patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by
using the IOMMU group mutex.

Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 1f7df3a)
Signed-off-by: Nirmoy Das <[email protected]>
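To make the lifetime argument concrete, here is a simplified sketch of storing the translated address as plain integers at prepare time. The struct and function names are hypothetical, and splitting the IOVA into a page number plus a shift is an assumption of this sketch; the point is only that nothing in the descriptor points back into IOMMU-owned memory, so the later compose step cannot hit a freed cookie.

```c
#include <stdint.h>

/* Hypothetical, simplified MSI descriptor: integers only, no cookie pointer. */
struct msi_desc_sketch {
	uint64_t iommu_msi_iova;      /* translated IOVA >> iommu_msi_shift */
	unsigned int iommu_msi_shift; /* 0 means "no translation" */
};

/* Step 1 (prepare): copy the IOVA of the doorbell page into the descriptor. */
static void msi_desc_set_iommu_iova(struct msi_desc_sketch *desc,
				    uint64_t iova, unsigned int shift)
{
	desc->iommu_msi_iova = iova >> shift;
	desc->iommu_msi_shift = shift;
}

/* Step 2 (compose): rebuild the message address from the stored integers. */
static uint64_t msi_translated_addr(const struct msi_desc_sketch *desc,
				    uint64_t msi_addr)
{
	uint64_t page_mask;

	if (!desc->iommu_msi_shift)
		return msi_addr; /* untranslated */
	page_mask = (1ULL << desc->iommu_msi_shift) - 1;
	/* Keep the in-page offset of the doorbell, swap the page base. */
	return (desc->iommu_msi_iova << desc->iommu_msi_shift) |
	       (msi_addr & page_mask);
}
```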
The two-step process to translate the MSI address involves two functions,
iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg().

Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as
it had to dereference the opaque cookie pointer. Now, the previous patch
changed the cookie pointer into an integer so there is no longer any need
for the iommu layer to be involved.

Further, the call sites of iommu_dma_compose_msi_msg() all follow the same
pattern of setting an MSI message address_hi/lo to non-translated and then
immediately calling iommu_dma_compose_msi_msg().

Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly
accepts the u64 version of the address and simplifies all the callers.

Move the new helper to linux/msi.h since it has nothing to do with iommu.

Aside from refactoring, this logically prepares for the next patch, which
allows multiple implementation options for iommu_dma_prepare_msi(). So, it
does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c
as it no longer provides the only iommu_dma_prepare_msi() implementation.

Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 9349887)
Signed-off-by: Nirmoy Das <[email protected]>
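The caller pattern being collapsed (write the untranslated address into address_hi/lo, then translate it) could be expressed as one helper that takes the u64 address directly. This is a hypothetical userspace mirror; `msi_msg_set_addr_sketch()` and the descriptor fields are illustrative and not copied from linux/msi.h.

```c
#include <stdint.h>

struct msi_msg_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Hypothetical descriptor holding the translation as integers. */
struct msi_desc_sketch {
	uint64_t iommu_msi_iova;      /* IOVA page number */
	unsigned int iommu_msi_shift; /* 0: no IOMMU translation */
};

/*
 * Set the message address in one step: translate if the descriptor
 * carries an IOVA, then split the 64-bit result into hi/lo halves.
 */
static void msi_msg_set_addr_sketch(const struct msi_desc_sketch *desc,
				    struct msi_msg_sketch *msg,
				    uint64_t msi_addr)
{
	if (desc->iommu_msi_shift) {
		uint64_t mask = (1ULL << desc->iommu_msi_shift) - 1;

		msi_addr = (desc->iommu_msi_iova << desc->iommu_msi_shift) |
			   (msi_addr & mask);
	}
	msg->address_hi = (uint32_t)(msi_addr >> 32);
	msg->address_lo = (uint32_t)msi_addr;
}
```

Because the helper only needs integers from the descriptor, it has no reason to live in the IOMMU layer, which is the motivation the commit gives for moving it.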
SW_MSI supports IOMMU to translate an MSI message before the MSI message
is delivered to the interrupt controller. On such systems, an iommu_domain
must have a translation for the MSI message for interrupts to work.

The IRQ subsystem will call into IOMMU to request that a physical page be
set up to receive MSI messages, and the IOMMU then sets an IOVA that maps
to that physical page. Ultimately the IOVA is programmed into the device
via the msi_msg.

Generalize this by allowing iommu_domain owners to provide implementations
of this mapping. Add a function pointer in struct iommu_domain to allow a
domain owner to provide its own implementation.

Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during
the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by
VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in
the iommu_get_msi_cookie() path.

Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain
doesn't change or become freed while running. Races with IRQ operations
from VFIO and domain changes from iommufd are possible here.

Replace the msi_prepare_lock with a lockdep assertion for the group mutex
as documentation. For dma-iommu.c, each iommu_domain is unique to a
group.

Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 288683c)
Signed-off-by: Nirmoy Das <[email protected]>
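A sketch of the per-domain hook: the domain owner supplies the mapping function, and the generic prepare path only dispatches to it. All names are hypothetical, the 4 KiB page shift is an assumption, and the group mutex the commit relies on is indicated by comments rather than a real lock.

```c
#include <stddef.h>
#include <stdint.h>

struct msi_desc_sketch {
	uint64_t iova;      /* translated IOVA page number */
	unsigned int shift; /* page shift; 0 = untranslated */
};

struct domain_sketch;

/* Owner-provided hook, analogous to a function pointer in iommu_domain. */
typedef int (*sw_msi_fn)(struct domain_sketch *domain,
			 struct msi_desc_sketch *desc, uint64_t msi_addr);

struct domain_sketch {
	sw_msi_fn sw_msi;  /* NULL: no software MSI translation */
	uint64_t msi_iova; /* hypothetical: IOVA reserved for the doorbell */
};

struct group_sketch {
	struct domain_sketch *domain;
};

/* Example owner implementation mapping the doorbell to a fixed IOVA page. */
static int example_sw_msi(struct domain_sketch *domain,
			  struct msi_desc_sketch *desc, uint64_t msi_addr)
{
	(void)msi_addr; /* a real owner would map this doorbell page */
	desc->iova = domain->msi_iova >> 12;
	desc->shift = 12;
	return 0;
}

/* Generic path: look up the current domain and dispatch to its hook. */
static int prepare_msi_sketch(struct group_sketch *group,
			      struct msi_desc_sketch *desc, uint64_t msi_addr)
{
	struct domain_sketch *domain;
	int ret = 0;

	/* kernel: mutex_lock(&group->mutex), so the domain cannot change */
	domain = group->domain;
	if (domain && domain->sw_msi)
		ret = domain->sw_msi(domain, desc, msi_addr);
	/* kernel: mutex_unlock(&group->mutex) */
	return ret;
}
```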
Callers of the two APIs always provide a valid handle, so make @handle a
mandatory parameter. Take this chance to incorporate the handle->domain
set under the protection of group->mutex in iommu_attach_group_handle().

Link: https://patch.msgid.link/r/[email protected]
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 237603a)
Signed-off-by: Nirmoy Das <[email protected]>
iommufd does not use it now, so drop it.

Link: https://patch.msgid.link/r/[email protected]
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 473ec07)
Signed-off-by: Nirmoy Das <[email protected]>
iommu_attach_device_pasid() only stores the handle in group->pasid_array
when a valid handle is passed in. However, this leaves
iommu_attach_device_pasid() unable to detect whether the pasid has
already been attached.

To be complete, let iommu_attach_device_pasid() store the domain in
group->pasid_array when there is no valid handle. The other users of
group->pasid_array should be updated to be consistent, e.g.
iommu_attach_group_handle() and iommu_replace_group_handle().

Link: https://patch.msgid.link/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit e1ea9d3)
Signed-off-by: Nirmoy Das <[email protected]>
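One way to let a single array slot record either a handle or a bare domain, so that a later attach can still see the PASID is occupied, is pointer tagging in the low bit. The sketch below is hypothetical: the tag values and helpers are illustrative and not taken from the patch, and it assumes both structs are at least 2-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

struct domain_sketch { int id; };
struct handle_sketch { struct domain_sketch *domain; };

/* Low pointer bit says what kind of object the entry points to. */
enum { ENTRY_DOMAIN = 0, ENTRY_HANDLE = 1 };

static uintptr_t entry_make(void *ptr, unsigned int tag)
{
	return (uintptr_t)ptr | tag;
}

static unsigned int entry_tag(uintptr_t e)
{
	return (unsigned int)(e & 1u);
}

/* Resolve an entry to its domain, whichever kind was stored. */
static struct domain_sketch *entry_domain(uintptr_t e)
{
	void *p = (void *)(e & ~(uintptr_t)1);

	if (!p)
		return NULL; /* empty slot: PASID not attached */
	if (entry_tag(e) == ENTRY_HANDLE)
		return ((struct handle_sketch *)p)->domain;
	return p;
}
```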
…h op of iommu drivers

The current implementation stores the entry in group->pasid_array before
the underlying iommu driver has successfully set the new domain. This can
lead to issues where PRIs are received on the new domain before the attach
operation is completed.

This patch swaps the order of operations to ensure that the domain is set
in the underlying iommu driver before updating the group->pasid_array.

Link: https://patch.msgid.link/r/[email protected]
Suggested-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Nicolin Chen <[email protected]>
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 5e9f822)
Signed-off-by: Nirmoy Das <[email protected]>
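The reordering can be sketched as "program the driver, then publish": if the driver's attach fails, no entry ever becomes visible to lookups such as PRI handling. Names are hypothetical, the fixed-size array stands in for the xarray, and the caller is assumed to hold the group lock.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_PASID 8 /* hypothetical */

struct domain_sketch { int fail_attach; };

struct group_sketch {
	struct domain_sketch *pasid_array[MAX_PASID];
};

/* Stand-in for the iommu driver's set_dev_pasid op. */
static int driver_set_dev_pasid(struct domain_sketch *domain,
				unsigned int pasid)
{
	(void)pasid;
	return domain->fail_attach ? -EIO : 0;
}

/* Caller is assumed to hold the group lock. */
static int attach_pasid_sketch(struct group_sketch *group,
			       struct domain_sketch *domain,
			       unsigned int pasid)
{
	int ret;

	if (pasid >= MAX_PASID)
		return -EINVAL;
	if (group->pasid_array[pasid])
		return -EBUSY; /* already attached */

	/* 1. Program the hardware first ... */
	ret = driver_set_dev_pasid(domain, pasid);
	if (ret)
		return ret; /* array untouched: no stale entry */

	/* 2. ... then publish the entry that lookups use. */
	group->pasid_array[pasid] = domain;
	return 0;
}
```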
The drivers doing their own fwspec parsing have no need to call
iommu_fwspec_free() since fwspecs were moved into dev_iommu, as
returning an error from .probe_device will tear down the whole lot
anyway. Move it into the private interface now that it only serves
for of_iommu to clean up in an error case.

I have no idea what mtk_v1 was doing in effectively guaranteeing
a NULL fwspec would be dereferenced if no "iommus" DT property was
found, so add a check for that to at least make the code look sane.

Signed-off-by: Robin Murphy <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/36e245489361de2d13db22a510fa5c79e7126278.1740667667.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit 29c6e1c)
Signed-off-by: Nirmoy Das <[email protected]>
At the moment, if of_iommu_configure() allocates dev->iommu itself via
iommu_fwspec_init(), then suffers a DT parsing failure, it cleans up the
fwspec but leaves the empty dev_iommu hanging around. So far this is
benign (if a tiny bit wasteful), but we'd like to be able to reason
about dev->iommu having a consistent and unambiguous lifecycle. Thus
make sure that the of_iommu cleanup undoes precisely whatever it did.

Signed-off-by: Robin Murphy <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Link: https://lore.kernel.org/r/d219663a3f23001f23d520a883ac622d70b4e642.1740753261.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <[email protected]>
(cherry picked from commit 3832862)
Signed-off-by: Nirmoy Das <[email protected]>
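A minimal sketch of "undo precisely what you did": remember whether this call allocated dev->iommu, and free it on the error path only in that case. The types and `of_configure_sketch()` are hypothetical, the `parse_ok` flag simulates the DT parsing result, and the sketch assumes no fwspec exists on entry.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct fwspec_sketch { int num_ids; };
struct dev_iommu_sketch { struct fwspec_sketch *fwspec; };
struct device_sketch { struct dev_iommu_sketch *iommu; };

static int of_configure_sketch(struct device_sketch *dev, bool parse_ok)
{
	bool allocated_iommu = false;
	int err;

	if (!dev->iommu) {
		dev->iommu = calloc(1, sizeof(*dev->iommu));
		if (!dev->iommu)
			return -ENOMEM;
		allocated_iommu = true; /* we own this allocation */
	}

	dev->iommu->fwspec = calloc(1, sizeof(*dev->iommu->fwspec));
	if (!dev->iommu->fwspec) {
		err = -ENOMEM;
		goto undo_iommu;
	}

	if (!parse_ok) { /* simulated DT parsing failure */
		err = -EINVAL;
		free(dev->iommu->fwspec);
		dev->iommu->fwspec = NULL;
		goto undo_iommu;
	}
	return 0;

undo_iommu:
	/* Free dev->iommu only if this call created it. */
	if (allocated_iommu) {
		free(dev->iommu);
		dev->iommu = NULL;
	}
	return err;
}
```

A pre-existing dev->iommu survives the failure untouched apart from the fwspec this call added, so its lifecycle stays unambiguous.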
Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide
an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll
make more sense for irqchips that call prepare/compose to select it, and
that will trigger all the additional code and data to be compiled into
the kernel.

If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the
prepare/compose() will be NOP stubs.

If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on
the iommu side is compiled out.

Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
(cherry picked from commit 96093fe)
Signed-off-by: Nirmoy Das <[email protected]>
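The compile-time shape described above is the usual Kconfig-gated stub pattern. In this hypothetical sketch, when CONFIG_IRQ_MSI_IOMMU is not defined, the prepare call collapses to an inline no-op, so callers need no #ifdefs of their own. The function name is illustrative only.

```c
#include <stdint.h>

struct msi_desc_sketch {
	uint64_t iova;
	unsigned int shift;
};

#ifdef CONFIG_IRQ_MSI_IOMMU
/* Selected by an irqchip: the IOMMU side provides the real implementation. */
int iommu_dma_prepare_msi_sketch(struct msi_desc_sketch *desc,
				 uint64_t msi_addr);
#else
/* Not selected: compile to a no-op that leaves the descriptor untouched. */
static inline int iommu_dma_prepare_msi_sketch(struct msi_desc_sketch *desc,
					       uint64_t msi_addr)
{
	(void)desc;
	(void)msi_addr;
	return 0;
}
#endif
```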
ankita-nv and others added 18 commits July 23, 2025 09:22
nvgrace-egm exposes the APIs register_egm_node and unregister_egm_node
to manage the EGM (Extended GPU Memory) present on the system.

To allow out-of-tree drivers such as nvidia-vgpu-vfio to make use of them,
move the declarations to a new nvgrace-egm.h under include/.

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bed340f https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a961663 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
…tion

Free the kmalloc'd region when the EGM is unregistered.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit fc592b9 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit f24760c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Move region hash initialization alongside the other region initialization
statements to avoid situations where the hash table was not properly
initialized.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 8021c1d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit e1264a6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
…rrors

Update error handling within the EGM registration routine to catch and
return errors to the caller.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a57210c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a706ff8 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Detect and handle a failure from the EGM registration service.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit f18eee3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 8371b68 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Fix source to resolve checkpatch warnings

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit c7b47b7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit dfa0e06 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Fix minor syntax errors reported by sparse.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bbb64e6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit fe78194 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Return the intended errno upon a copyout fault, remove unnecessary
checks following container_of pointer derivation, and use the correct
macro and types for overflow checking.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 429910b https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bda63f3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
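The commits do not show which macro was swapped in, so the following is only a hedged illustration of overflow-safe range checking. It uses `__builtin_add_overflow` (GCC/Clang), which is the builtin the kernel's `check_add_overflow()` wraps; `check_region_range()` and its error codes are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Check that [offset, offset + len) fits in a region of region_size
 * bytes. A plain "offset + len > region_size" can wrap around and
 * pass; the builtin reports the wraparound instead.
 */
static int check_region_range(uint64_t offset, uint64_t len,
			      uint64_t region_size)
{
	uint64_t end;

	if (__builtin_add_overflow(offset, len, &end))
		return -EOVERFLOW;
	if (end > region_size)
		return -EINVAL;
	return 0;
}
```

Using a fixed-width unsigned type for all three operands keeps the builtin's result type matched to the values being checked.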
Use the correct macro and types for overflow checking.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit afa8f63 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit d110330 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Ensure ACPI table reads are successful prior to using the value.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit b2947b0 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 9258355 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Some environments may provide an "nvidia,egm-retired-pages-data-base" property
but fail to populate it with a base address, leaving it NULL. Mapping this
invalid value results in a synchronous exception when the region is first
touched. Detect a NULL value, generate a warning to draw attention to the
firmware bug, and return without mapping.

INFO:    th500_ras_intr_handler: External Abort reason=1 syndrome=0x92000410 flags=0x1
[   82.104493] Internal error: synchronous external abort: 0000000096000410 [#1] SMP
[   82.114898] Modules linked in: nvgrace_gpu_vfio_pci(E) nvgrace_egm(E)
[   82.257218] CPU: 0 PID: 10 Comm: kworker/0:1 Tainted: G           OE      6.8.12+ #5
[   82.265135] Hardware name: NVIDIA GH200 P5042, BIOS 24103110 20241031
[   82.271720] Workqueue: events work_for_cpu_fn
[   82.276180] pstate: 03400009 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   82.283298] pc : register_egm_node+0x2cc/0x440 [nvgrace_egm]
[   82.289087] lr : register_egm_node+0x2c4/0x440 [nvgrace_egm]
[   82.294872] sp : ffff8000802ebc30
[   82.298254] x29: ffff8000802ebc60 x28: 00000000000000ff x27: 0000000000000000
[   82.305550] x26: ffff000087a320c8 x25: ffff0000a5700000 x24: ffff000087a32000
[   82.312846] x23: ffffa77cd758e368 x22: 0000000000000000 x21: ffffa77cd758c640
[   82.320141] x20: ffffa77cd758e170 x19: ffff800081e7d000 x18: ffff800080293038
[   82.327437] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   82.334732] x14: 0000000000000000 x13: 65203a65646f6e5f x12: 0000000000000000
[   82.342027] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[   82.349322] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[   82.356618] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   82.363913] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff800081e7d000
[   82.371210] Call trace:
[   82.373705]  register_egm_node+0x2cc/0x440 [nvgrace_egm]
[   82.379135]  nvgrace_gpu_probe+0x2ac/0x528 [nvgrace_gpu_vfio_pci]
[   82.385366]  local_pci_probe+0x4c/0xe0
[   82.389198]  work_for_cpu_fn+0x28/0x58
[   82.393026]  process_one_work+0x168/0x3f0
[   82.397123]  worker_thread+0x360/0x480
[   82.400952]  kthread+0x11c/0x128
[   82.404248]  ret_from_fork+0x10/0x20
[   82.407906] Code: d2820001 940002b3 aa0003f3 b4fffac0 (f9400017)
[   82.414134] ---[ end trace 0000000000000000 ]---
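The NULL-base guard described above can be sketched in userspace. In this sketch, map_region() is a hypothetical stand-in for a mapping call such as memremap(), and the message text is illustrative rather than the driver's actual warning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static int mapped; /* records whether the hypothetical mapper ran */

/* Hypothetical stand-in for memremap() of the retired-pages region. */
static void *map_region(uint64_t base)
{
    mapped = 1;
    return (void *)(uintptr_t)base;
}

static void *map_retired_pages(uint64_t base)
{
    if (!base) {
        /* Firmware bug: property present but never populated. Warn and
         * skip the mapping rather than faulting on first access. */
        fprintf(stderr, "egm: retired-pages base is NULL, not mapping\n");
        return NULL;
    }
    return map_region(base);
}
```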

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 7ba2930 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 349fb1c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
In an effort to simplify the programming model, use a symmetrical model
for the EGM registration APIs. This means the caller does not need to keep
a cookie or even know whether EGM is supported. Update the EGM
unregistration API to use the PCI device as its parameter.
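A symmetrical register/unregister pair keyed by the PCI device might look like the sketch below. egm_register(), egm_unregister(), egm_is_registered() and the fixed-size table are hypothetical; the driver's real entry points (e.g. register_egm_node) are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque stand-in for the kernel's struct pci_dev. */
struct pci_dev { int id; };

#define MAX_EGM_USERS 8

/* Hypothetical registry: the PCI device itself is the lookup key. */
static struct pci_dev *egm_users[MAX_EGM_USERS];

static int egm_register(struct pci_dev *pdev)
{
    for (size_t i = 0; i < MAX_EGM_USERS; i++) {
        if (!egm_users[i]) {
            egm_users[i] = pdev;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Takes the same argument as egm_register(), so no cookie is kept.
 * Unregistering a device that was never registered (e.g. EGM is not
 * supported on it) is a harmless no-op. */
static void egm_unregister(struct pci_dev *pdev)
{
    for (size_t i = 0; i < MAX_EGM_USERS; i++)
        if (egm_users[i] == pdev)
            egm_users[i] = NULL;
}

static int egm_is_registered(const struct pci_dev *pdev)
{
    for (size_t i = 0; i < MAX_EGM_USERS; i++)
        if (egm_users[i] == pdev)
            return 1;
    return 0;
}
```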

Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit d8903ec https://github.com/nvmochs/NV-Kernels/tree/vegm_01232025)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 5839fc5 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
…egions

GB200 systems can have multiple GPUs associated with
an EGM region. For proper EGM functionality, the host
topology in terms of GPU affinity has to be replicated
in the VM. Hence, the EGM region structure must track the
GPU devices belonging to the same socket.

On device probe, the device's pci_dev struct is added to the
linked list of the appropriate EGM region.

Similarly, on device removal, the pci_dev struct for the GPU
is removed from the EGM region.
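The probe/remove bookkeeping can be sketched in userspace with a hand-rolled singly linked list. A kernel implementation would more likely use struct list_head with list_add()/list_del(). The struct layouts and function names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct pci_dev { int bdf; }; /* opaque stand-in */

struct gpu_node {
    struct pci_dev *pdev;
    struct gpu_node *next;
};

/* Hypothetical EGM region: tracks the GPUs on the same socket. */
struct egm_region {
    struct gpu_node *gpus;
};

/* Called on probe: link the GPU into its region's list. */
static int egm_add_gpu(struct egm_region *r, struct pci_dev *pdev)
{
    struct gpu_node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->pdev = pdev;
    n->next = r->gpus;
    r->gpus = n;
    return 0;
}

/* Called on removal: unlink and free the matching entry, if any. */
static void egm_remove_gpu(struct egm_region *r, struct pci_dev *pdev)
{
    for (struct gpu_node **pp = &r->gpus; *pp; pp = &(*pp)->next) {
        if ((*pp)->pdev == pdev) {
            struct gpu_node *victim = *pp;
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}

static size_t egm_count_gpus(const struct egm_region *r)
{
    size_t n = 0;

    for (const struct gpu_node *g = r->gpus; g; g = g->next)
        n++;
    return n;
}
```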

Signed-off-by: Ankit Agrawal <[email protected]>
Ref: sj24: /home/nvidia/ankita/kernel_patches/0001_vfio_nvgrace-egm_track_GPUs_associated_with_the_EGM_regions.patch
(koba: Enhance error handling, remove egm_node from unregister_egm_node
and move destroy_egm_chardev a little forward)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 0222c35 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
To replicate the host EGM topology in the VM in terms of
GPU affinity, userspace needs to be aware of which
GPUs belong to the same socket as the EGM region.

Expose the list of GPUs associated with an EGM region
through sysfs. The list can be queried from the location
/sys/devices/virtual/egm/egmX/gpu_devices.
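A userspace consumer might read this attribute as sketched below. The commit message does not specify the output format. This sketch assumes the attribute lists one device identifier per whitespace-separated token, and both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole sysfs attribute (e.g. .../egmX/gpu_devices) into buf.
 * Returns the number of bytes read, or -1 on error. */
static long read_attr(const char *path, char *buf, size_t len)
{
    FILE *f;
    size_t n;

    if (!len)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, len - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}

/* Assumption: one GPU identifier per whitespace-separated token. */
static size_t count_tokens(const char *s)
{
    size_t n = 0;
    int in_tok = 0;

    for (; *s; s++) {
        if (isspace((unsigned char)*s)) {
            in_tok = 0;
        } else if (!in_tok) {
            in_tok = 1;
            n++;
        }
    }
    return n;
}
```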

Signed-off-by: Ankit Agrawal <[email protected]>
Ref: sj24: /home/nvidia/ankita/kernel_patches/0002_vfio_nvgrace-egm_list_gpus_through_sysfs.patch
(koba: Enhance error handling for sysfs_create_group)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit fec2356 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
To allocate the EGM, userspace needs to know its size. Currently,
there is no easy way for userspace to determine that.

Make nvgrace-egm expose the size through sysfs, where it can be queried
by userspace at /sys/devices/virtual/egm/egmX/egm_size.
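A VMM could read this value before sizing a memory backend, for example the size= of a memory-backend-file backed by /dev/egmX. The exact number format is not stated in the commit message. This sketch passes base 0 to strtoull() so that both decimal and 0x-prefixed hex are accepted, and read_egm_size() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse .../egmX/egm_size into *size. Returns 0 or a negative errno. */
static int read_egm_size(const char *path, unsigned long long *size)
{
    char buf[64];
    char *end;
    unsigned long long v;
    FILE *f = fopen(path, "r");

    if (!f)
        return -errno;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);

    errno = 0;
    v = strtoull(buf, &end, 0); /* base 0: decimal or 0x-prefixed hex */
    if (errno || end == buf)
        return -EINVAL;
    *size = v;
    return 0;
}
```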

Signed-off-by: Ankit Agrawal <[email protected]>
Ref: sj24: /home/nvidia/ankita/kernel_patches/0003_vfio_nvgrace-egm_expose_the_egm_size_through_sysfs.patch
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit dcdcef2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
… allocations

Add missing NULL pointer checks after vzalloc() calls in the NVIDIA
Grace GPU driver's EGM (Extended GPU Memory) handling code. This
prevents potential NULL pointer dereferences in the memory failure
handling and bad page fetching functions, and handles allocation
failures properly.
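The pattern is to check the allocation result immediately and return -ENOMEM before anything touches the buffer. In this userspace sketch, calloc() stands in for vzalloc(), since both return zeroed memory or NULL. fetch_bad_pages() is a hypothetical helper, not the driver's function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bad-page fetch helper; calloc() stands in for vzalloc(). */
static int fetch_bad_pages(size_t count, uint64_t **out)
{
    uint64_t *pages;

    if (!count)
        return -EINVAL;

    pages = calloc(count, sizeof(*pages));
    if (!pages)
        return -ENOMEM; /* report the failure; never dereference NULL */

    /* ... populate pages[0..count-1] here ... */
    *out = pages;
    return 0;
}
```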

Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 63127e2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Add CONFIG_NVGRACE_EGM with policy 'm' for arm64 architecture.

Signed-off-by: Nirmoy Das <[email protected]>
On platforms without the MIG HW bug (e.g. Grace-Blackwell), there is no
requirement to create the resmem region. Accordingly, this region is not
configured on these platforms, which leads to the following print when the
device is closed:

resource: Trying to free nonexistent resource <0x0000000000000000-0x000000000000ffff>

Avoid calling unregister_pfn_address_space for resmem when the region is
not being used.
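The fix amounts to guarding the resmem teardown on whether the region was ever set up. This sketch assumes a zero memlength marks an unconfigured region. struct mem_region and the mock unregister call are simplified, hypothetical stand-ins for the driver's types and unregister_pfn_address_space().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region descriptor; memlength == 0 means "not configured". */
struct mem_region {
    uint64_t memphys;
    uint64_t memlength;
};

static int unregister_calls;

/* Stand-in for unregister_pfn_address_space(). */
static void mock_unregister_pfn_address_space(struct mem_region *r)
{
    (void)r;
    unregister_calls++;
}

static void release_regions(struct mem_region *usemem,
                            struct mem_region *resmem)
{
    mock_unregister_pfn_address_space(usemem);
    /* resmem only exists on parts with the MIG HW bug; skip it otherwise
     * so a never-registered range is not "freed". */
    if (resmem->memlength)
        mock_unregister_pfn_address_space(resmem);
}
```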

Fixes: 2d21b7b ("vfio/nvgrace-gpu: register device memory for poison handling")

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bd0187d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
@nirmoy nirmoy force-pushed the 614_tech_preview_virt.1 branch from 7eeda3f to e2d029c on July 23, 2025 16:22
Collaborator

@nvmochs nvmochs left a comment

Ran the pick analyzer on c7d2a4a^..e2d029c; the majority of the patches match upstream exactly. The ones that were flagged were reviewed and found to have been called out only because of minor context differences or the addition of "NVIDIA: SAUCE:" tags.

Manual review of the patches with backport tags, no issues or concerns.

Lastly, confirmed pick tags and trailers are present and correct.

Acked-by: Matthew R. Ochs <[email protected]>

@clsotog
Collaborator

clsotog commented Jul 24, 2025

Question: I see CONFIG_NVGRACE_EGM configured in 2 places:
4ff9dc6 NVIDIA: SAUCE: arm64: configs: enable NVGRACE_EGM as module
fa811f0 NVIDIA: SAUCE: arm64: configs: Build CONFIG_NVGRACE_EGM as LKM

Do we need both places?

@nvmochs
Collaborator

nvmochs commented Jul 24, 2025

4ff9dc6 sets it in the annotations, fa811f0 sets it in the defconfig. The defconfig one is not really needed in the Ubuntu tree, but I have continued to carry it forward since some CSPs were not using the annotations.

@clsotog
Collaborator

clsotog commented Jul 24, 2025

So CSPs may be using this exact git tree without using the annotations to build the kernel?
There are more things in the annotations for our kernel, like TPM, cpufreq performance, etc. So do they use the Grace doc to get the other configs needed?

@nvmochs
Collaborator

nvmochs commented Jul 24, 2025

We have now advised them to use the annotations, and updated the reference code release notes with the command to generate the .config from them. But of course we cannot force them. =)

If you feel strongly about it we can remove the defconfig commit, I don't have a preference either way.

@clsotog
Collaborator

clsotog commented Jul 24, 2025

No, it's OK. Leave it.
I remembered the issue I was helping Nathan with; it was a config issue. If we now recommend the annotations, that's great!

Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <[email protected]>

@nvmochs
Collaborator

nvmochs commented Jul 24, 2025

Merged, closing PR.
