24.04 linux nvidia 6.17 next.mpam.extras #265

tdavenvidia · 2025-12-16T02:06:21Z

This PR is based on @fyu1's PR #230.
In addition to PR #230, this PR adds the following patches:

MPAM KUnit fixes:

1. [NVIDIA: SAUCE: resctrl/tests: mpam_devices: compare only meaningful bytes of mpam_props]
2. [NVIDIA: SAUCE: resctrl/mpam: Align packed mpam_props to fix arm64 KUnit alignment fault]
3. [NVIDIA: SAUCE: arm_mpam: resctrl: Fix MPAM kunit]

Annotations change:
4. [NVIDIA: SAUCE: [Config] Update RESCTRL annotations]

Missing patch:
5. [x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode]

BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md for details of the FFA device used for secure EC services communication.

1. We need to get the virtual IDs that an EC service supports. In the FFA node, the _DSD object contains this information. The sample from the above document is:

    Name(_DSD, Package() {
      ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), //Device Prop UUID
      Package() {
        Package(2) {
          "arm-arml0002-ffa-ntf-bind",
          Package() {
            1, // Revision
            2, // Count of following packages
            Package () {
              ToUUID("330c1273-fde5-4757-9819-5b6539037502"), // Service1 UUID
              Package () {
                0x01, //Cookie1 (UINT32)
                0x07, //Cookie2
              }
            },
            Package () {
              ToUUID("b510b3a3-59f6-4054-ba7a-ff2eb1eac765"), // Service2 UUID
              Package () {
                0x01, //Cookie1
                0x03, //Cookie2
              }
            }
          }
        }
      }
    }) // _DSD()

   It uses a nested package structure. nvidia_ffa_fill_notification_map(), added in this commit, parses the _DSD object and fills the notification ID map for that service.
2. Once the virtual ID is obtained, it needs to be mapped to a physical ID by invoking function 1 of the notify service.
3. The UUID of the notification service is B510B3A3-59F6-4054-BA7A-FF2EB1EAC765. An FFA device is created for this notification service by ffa_module. This notify service needs to be probed first; to make that happen, a separate ffa_driver instance is created and registered first.
4. We can use a 1:1 mapping between the virtual ID and the hardware ID.
5. We need to invoke notify_request() with the hardware notification ID. It registers the callback function for the notification.
6. Once a notification arrives, we evaluate the _DSM method with the virtual ID (which is the same as the hardware ID under the 1:1 mapping).
7. Function 2 of the notify service should destroy the mapping, but it is neither implemented in the firmware nor documented. A TODO comment is added in nvidia_ffa_notification_destroy(). Also, if the modules are unloaded and reloaded, the existing mapping still exists, so nvidia_ffa_notification_setup() ignores the error for this case. When the firmware is updated, the error will be returned.
8. Each EC secure services FFA device needs the notification service FFA device to get its virtual notification list. This creates the following device dependency chain:

    FFA device <- notification service FFA device <- EC secure services FFA device

   To satisfy this, each dependent driver is registered from the probe routine of the driver it depends on, and unregistered from that driver's remove routine.

Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 1287a1d noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
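Steps 1 and 4 can be illustrated with a minimal userspace sketch of filling a notification map from the service entries of the "arm-arml0002-ffa-ntf-bind" package. It assumes the _DSD package has already been flattened into a hypothetical `struct ntf_service_entry` array (the real nvidia_ffa_fill_notification_map() walks ACPI package objects instead), and it applies the 1:1 virtual-to-hardware mapping from step 4. All type and function names here are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

#define NTF_MAP_MAX 16 /* hypothetical capacity of the notification map */

/* Hypothetical flattened view of one service entry in the _DSD package:
 * the service UUID plus its cookies (the virtual notification IDs). */
struct ntf_service_entry {
	const char *uuid;
	const uint32_t *cookies;
	size_t ncookies;
};

/* Hypothetical notification map: virtual ID -> hardware ID. */
struct ntf_map {
	uint32_t virt_id[NTF_MAP_MAX];
	uint32_t hw_id[NTF_MAP_MAX];
	size_t count;
};

/* Collect the virtual IDs declared for @uuid. Returns the number of IDs
 * stored in @map, or -1 if the map is too small. */
static int fill_notification_map(const struct ntf_service_entry *entries,
				 size_t nentries, const char *uuid,
				 struct ntf_map *map)
{
	map->count = 0;
	for (size_t i = 0; i < nentries; i++) {
		/* UUIDs appear in both upper and lower case, so compare
		 * case-insensitively. */
		if (strcasecmp(entries[i].uuid, uuid) != 0)
			continue;
		for (size_t j = 0; j < entries[i].ncookies; j++) {
			if (map->count >= NTF_MAP_MAX)
				return -1;
			map->virt_id[map->count] = entries[i].cookies[j];
			/* Step 4: 1:1 mapping, hardware ID == virtual ID. */
			map->hw_id[map->count] = entries[i].cookies[j];
			map->count++;
		}
	}
	return (int)map->count;
}
```

With the sample _DSD above, looking up Service1 would yield virtual IDs 0x01 and 0x07, each mapped to the same hardware ID.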

… EC driver BugLink: https://bugs.launchpad.net/bugs/2114230 The NVIDIA FFA and EC secure services driver enables the communication with EC (Embedded Controller). Make this driver built-in to enable EC communication at early boot. Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 9ea0251) (cherry picked from commit 9ea0251 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2114759 Add a quirk function to skip the PCIe secondary bus reset (SBR). The PCIe Gen4 link downgrades to Gen1 after an SBR, so this operation has to be skipped. Signed-off-by: Jerry.Guo <[email protected]> Signed-off-by: Yenchia Chen <[email protected]> Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 0185574 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

…pinctrl driver BugLink: https://bugs.launchpad.net/bugs/2117784 The kernel GPIO subsystem maps hardware pin numbers to a different range of GPIO numbers. Add a gpio-range structure to the pinctrl driver to hold the mapped GPIO range. This enables the kernel to look up a mapped GPIO range against a pinctrl device. Signed-off-by: Jonas Chen <[email protected]> Signed-off-by: Yenchia Chen <[email protected]> Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Acked-by: nvmochs Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 1049985 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
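The lookup that a gpio-range enables can be sketched in userspace as follows. The struct keeps only three fields modeled on the kernel's `struct pinctrl_gpio_range` (`base`, `pin_base`, `npins`); `gpio_to_pin()` is a hypothetical helper, not the pinctrl core's actual search routine.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct pinctrl_gpio_range: only the
 * fields needed to translate a GPIO number into a hardware pin number. */
struct gpio_range {
	unsigned int base;     /* first GPIO number covered by the range */
	unsigned int pin_base; /* hardware pin number that 'base' maps to */
	unsigned int npins;    /* number of pins in the range */
};

/* Search @ranges for the one containing @gpio and return the matching
 * hardware pin number, or -1 if no range covers it. */
static int gpio_to_pin(const struct gpio_range *ranges, size_t nranges,
		       unsigned int gpio)
{
	for (size_t i = 0; i < nranges; i++) {
		const struct gpio_range *r = &ranges[i];

		if (gpio >= r->base && gpio - r->base < r->npins)
			return (int)(r->pin_base + (gpio - r->base));
	}
	return -1;
}
```

For example, with a single range `{ .base = 512, .pin_base = 0, .npins = 32 }`, GPIO 515 resolves to hardware pin 3, while GPIO 600 is not covered.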

BugLink: https://bugs.launchpad.net/bugs/2117784 Add ACPI support in the shared part of the pinctrl driver. Parse the hardware base addresses and IRQ numbers from the ACPI table data to initialize EINT. Signed-off-by: Jonas Chen <[email protected]> Signed-off-by: Yenchia Chen <[email protected]> Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Acked-by: nvmochs Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (backported from commit cdce65d noble:linux-nvidia-6.14) [maskedarray: context adjusted due to commit 86dee87: "pinctrl: mediatek: Fix the invalid conditions"] Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2117784 Add mt8901 pinctrl, gpio and eint driver implementation. Signed-off-by: Jonas Chen <[email protected]> Signed-off-by: Yenchia Chen <[email protected]> Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Acked-by: nvmochs Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (backported from commit 1fc7a58 noble:linux-nvidia-6.14) [maskedarray: context adjusted for missing commit a3fe132: "pinctrl: mediatek: Add pinctrl driver for mt8189"] Signed-off-by: Abdur Rahman <[email protected]>

…CTRL_MT8901 BugLink: https://bugs.launchpad.net/bugs/2117784 Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Acked-by: nvmochs Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 0bd85d0) (cherry picked from commit 0bd85d0 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2118357 Commit d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication") added the nvidia_ffh_handler() function. When copying the data back into the ACPI FFH packet, it uses the request length. However, the response data can be larger than the request length, and the response length cannot be fetched in the Linux FFH handler function. Instead, copy all the bytes from ffa_data.data; the ACPI AML code only uses the number of bytes it requires. Normally the response length does not need to be known: the ACPI tables do not use it and parse the response data directly. In the latest revision of the spec, the length field itself has been removed: https://github.com/OpenDevicePartnership/documentation/blob/b23acb09f7cf03a5c3167509533f396d547e6291/guide_book/src/specs/ec_interface/secure-ec-services-overview.md#operation-region-definition

DIGITS GB10 uses the older revision of the spec, and the launch is planned with that revision. When we move to the latest revision, all data bytes need to be copied for both request and response.

info->length corresponds to the FFH buffer length in the ACPI table. The following is the code in the ACPI table:

    Name (_HID, "MSFT000C") // _HID: Hardware ID
    OperationRegion (AFFH, FFixedHW, 0x04, 0x90)

so info->length will be 0x90 (144) bytes. In the older revision, ffa_packet->length is the number of valid data bytes (https://github.com/OpenDevicePartnership/documentation/blob/45ad9b30be0f40e229deed2fef7a60d0b0b591f5/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md).

    struct nvidia_ec_ffa_packet *ffa_packet = (struct nvidia_ec_ffa_packet *)value;

The length of this value buffer should be info->length. We take the minimum of sizeof(ffa_data.data) = 112 and (info->length = 144) - (offsetof(struct nvidia_ec_ffa_packet, rawdata) = 18) = 126, so ffh_copy_len will be 112 for the current DIGITS ACPI implementation. The latest revision also fixes this length mismatch: raw data starts at offset 32, so both values come out as 112 (144 - 32 = 112).

Fixes: d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication") Signed-off-by: Abhishek Sahu <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 141bd56 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
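The copy-length arithmetic above can be checked with a small sketch. The constant 112 and the offsets 18 and 32 come from the commit message; the helper itself is a hypothetical standalone function, not the driver's code, and it assumes the FFH buffer is at least as large as the packet header.

```c
#include <stddef.h>

/* sizeof(ffa_data.data) as quoted in the commit message. */
#define FFA_DATA_LEN ((size_t)112)

/* Hypothetical helper: bytes to copy back into the ACPI FFH buffer, i.e. the
 * smaller of the FFA payload size and the room left in the FFH buffer
 * (info->length) after the packet header (offset of rawdata). */
static size_t ffh_copy_len(size_t ffh_buf_len, size_t rawdata_offset)
{
	/* Guard against a buffer smaller than the header. */
	size_t room = ffh_buf_len > rawdata_offset ? ffh_buf_len - rawdata_offset : 0;

	return room < FFA_DATA_LEN ? room : FFA_DATA_LEN;
}
```

With the values above, `ffh_copy_len(0x90, 18)` (older spec revision) and `ffh_copy_len(0x90, 32)` (latest revision) both return 112.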

BugLink: https://bugs.launchpad.net/bugs/2118663 Add cpu part and model macro definitions for NVIDIA Olympus core. Signed-off-by: Shanker Donthineni <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 9273361 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2118663 Set CONFIG_ARM64_BRBE=y for arm64 linux-nvidia-6.14. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 26a417a noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 The kernel MM currently does not handle ECC errors/poison on a memory region that is not backed by struct pages. If a memory region is mapped using remap_pfn_range() but not added to the kernel, MM has no associated struct pages for it. Add a new mechanism to handle memory failure on such memory.

Make the kernel MM expose a function that allows modules managing device memory to register a failure function and the physical address space associated with the device memory. MM maintains this information as an interval tree. MM uses the registered failure function to notify the kernel module managing the PFN, so that the module can take any required action; for example, the module may use the information to track the poisoned pages.

In this implementation, the kernel MM follows a sequence mostly similar to the memory_failure() handler for struct page backed memory:
1. memory_failure() is triggered on reception of a poison error. The absence of a struct page is detected and consequently memory_failure_pfn() is executed.
2. memory_failure_pfn() calls the newly introduced failure handler exposed by the module managing the poisoned memory to notify it of the problematic PFN.
3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
4. memory_failure_pfn() collects the processes that map the PFN.
5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes mapping the faulty PFN using kill_procs().
6. A later access to the faulty PFN by an operation in the VM is trapped and user_mem_abort() is called.
7. The VMA ops fault function gets called due to the absence of the stage-2 mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which causes the poison to be sent to the QEMU process as SIGBUS (BUS_MCEERR_AR) through kvm_send_hwpoison_signal().

Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (backported from commit f037dd7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) (koba: Add a pgoff parameter to __add_to_kill) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (backported from commit 4bb248a https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) [Nirmoy: s/folio_shift(page_folio(p))/page_shift(compound_head(p)), add missing arg in page_address_in_vma()] Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (backported from commit a3fe67d) [maskedarray: context conflict due to upstream commit: c1f1fda "ACPI: APEI: handle synchronous exceptions in task work". Adjusted context] Signed-off-by: Abdur Rahman <[email protected]>
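The registration-and-dispatch idea in steps 1 and 2 can be sketched in userspace. This is a simplified model under stated assumptions: it uses a fixed array with a linear scan where the kernel MM uses an interval tree, and `struct pfn_region`, `register_region()`, `memory_failure_pfn_sketch()` and `record_failure()` are hypothetical names, not the kernel API added by this patch.

```c
#include <stddef.h>

/* Hypothetical registration record: a PFN range plus the owner's callback. */
struct pfn_region {
	unsigned long pfn_start;
	unsigned long nr_pages;
	void (*failure)(struct pfn_region *region, unsigned long pfn);
	void *priv; /* owner-private data passed back through the callback */
};

#define MAX_REGIONS 8
static struct pfn_region *regions[MAX_REGIONS];
static size_t nr_regions;

/* Register a device memory range with its failure callback. */
static int register_region(struct pfn_region *r)
{
	if (nr_regions == MAX_REGIONS)
		return -1;
	regions[nr_regions++] = r;
	return 0;
}

/* On poison for a PFN without struct page, find the owning region and notify
 * its module. Returns 0 if an owner was notified, -1 otherwise. The kernel
 * would do an interval tree lookup here instead of a linear scan. */
static int memory_failure_pfn_sketch(unsigned long pfn)
{
	for (size_t i = 0; i < nr_regions; i++) {
		struct pfn_region *r = regions[i];

		if (pfn >= r->pfn_start && pfn - r->pfn_start < r->nr_pages) {
			r->failure(r, pfn);
			return 0;
		}
	}
	return -1;
}

/* Example owner callback: remember the last poisoned PFN. */
static void record_failure(struct pfn_region *r, unsigned long pfn)
{
	*(unsigned long *)r->priv = pfn;
}
```

A module owning PFNs 0x1000 to 0x10ff would register once; a later poison on PFN 0x1010 then reaches `record_failure()`, while PFN 0x1100 is outside the range and is not dispatched.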

BugLink: https://bugs.launchpad.net/bugs/2125434 Correct the logic used to identify the absence of struct page during memory_failure(). Fixes: a3fe67d ("NVIDIA: SAUCE: mm: handle poisoning of pfn without struct pages") Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Jamie Nguyen <[email protected]> Acked-by: Jacob Martin <[email protected]> Acked-by: Abdur Rahman <[email protected]> Signed-off--by: Brad Figg <[email protected]> (backported from 94017e2 noble/nvidia-6.14-next) [maskedarray: removed pfn_t.h header file from mm/memory-failure.c as this was no longer needed and removed upstream] Signed-off-by: Abdur Rahman <[email protected]>

…apped pfn BugLink: https://bugs.launchpad.net/bugs/2119656 fixup_user_fault() currently does not expect VM_FAULT_HWPOISON and hence does not check for it when calling vm_fault_to_errno(). Since there is now a new code path that can trigger this case, change fixup_user_fault() to look for VM_FAULT_HWPOISON. Also make hva_to_pfn_remapped() check for -EHWPOISON and communicate the poison fault up to user_mem_abort(). Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 3e895d5 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (backported from commit 2c0d6cc https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) [Nirmoy: fix few offset shifts, adopt to b176f4b] Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit f4b8c6c noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 The GHES code only calls memory_failure() on PFNs that pass the pfn_valid() check. This contract is broken for remapped PFNs, which fail the check, so ghes_do_memory_failure() returns without triggering memory_failure(). Update the code to allow calling memory_failure() on PFNs that fail pfn_valid(). Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit cbcf5ec https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 7e95d6c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (backported from commit 388d89b nvidia-6.14) [maskedarray: adjusted conflict due to upstream commit c1f1fda: "ACPI: APEI: handle synchronous exceptions in task work"] Signed-off-by: Abdur Rahman <[email protected]>

…ndling BugLink: https://bugs.launchpad.net/bugs/2119656 The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA (QEMU) using remap_pfn_range() without adding the memory to the kernel, so the device memory pages are not backed by struct page. Patches 1-3 implement the mechanism to handle ECC/poison on memory pages without struct page and expose a registration function. This new mechanism is leveraged here. The module registers its memory region with the kernel MM for ECC handling using the register_pfn_address_space() registration API exposed by the kernel. It also defines a failure callback function, pfn_memory_failure(), to receive the poisoned PFN from MM. The module tracks poisoned PFNs using a hashtable: MM communicates the PFN to the module through the failure function, which pushes the corresponding memory offset into the hashtable. The module also defines VMA fault ops, which return VM_FAULT_HWPOISON if the memory offset is found in the hashtable. [1] https://lore.kernel.org/all/[email protected]/ Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 2fae9af https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit d9c50d2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 33a2f83 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
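The hashtable flow (failure callback records an offset, fault handler checks it) can be modeled in userspace like this. It is a sketch under assumptions: the bucket count, the static node pool, and the `FAULT_*` values are hypothetical stand-ins for the kernel's hashtable helpers and `vm_fault_t` codes, and offsets are in pages relative to the start of the device region.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 64         /* hypothetical hashtable size */
#define MAX_POISONED 128    /* fixed node pool so the sketch needs no malloc */

/* Illustrative stand-ins for vm_fault_t results; not the kernel's values. */
enum fault_result { FAULT_OK = 0, FAULT_HWPOISON = 1 };

struct poison_node {
	unsigned long offset;        /* page offset within the device region */
	struct poison_node *next;
};

struct poison_table {
	unsigned long base_pfn;      /* first PFN of the registered region */
	struct poison_node *buckets[NBUCKETS];
	struct poison_node pool[MAX_POISONED];
	size_t used;
};

/* Failure callback: convert the PFN to a region offset and record it. */
static bool record_poisoned_pfn(struct poison_table *t, unsigned long pfn)
{
	if (pfn < t->base_pfn || t->used == MAX_POISONED)
		return false;

	unsigned long offset = pfn - t->base_pfn;
	struct poison_node *n = &t->pool[t->used++];

	n->offset = offset;
	n->next = t->buckets[offset % NBUCKETS];
	t->buckets[offset % NBUCKETS] = n;
	return true;
}

/* Fault handler: report poison if the faulting offset was recorded. */
static enum fault_result handle_fault(const struct poison_table *t,
				      unsigned long offset)
{
	for (const struct poison_node *n = t->buckets[offset % NBUCKETS]; n; n = n->next)
		if (n->offset == offset)
			return FAULT_HWPOISON;
	return FAULT_OK;
}
```

Recording PFN `base_pfn + 5` makes a later fault at offset 5 report poison, while an offset that merely hashes to the same bucket does not.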

BugLink: https://bugs.launchpad.net/bugs/2119656 Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 9433fd4 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a1bdf88 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 60f9b04 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 3eff6df https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 6c6e893 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit a6a3ccc noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

…IO_CONTAINER BugLink: https://bugs.launchpad.net/bugs/2119656 CONFIG_IOMMUFD_VFIO_CONTAINER is the VFIO-compatible mode provided by the iommufd core as a replacement for VFIO_IOMMU_TYPE1. Enable it instead. This might be used by the VFIO mdev feature. Signed-off-by: Nicolin Chen <[email protected]> Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 8188507 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit b0d6efb https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 9e7a939 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

…e in VMA BugLink: https://bugs.launchpad.net/bugs/2119656 When a Grace Hopper/Blackwell system is set up with EGM mode in virtualization, the system memory is partitioned into two: host-OS-visible memory and a second EGM region that is not added to the host OS. The EGM region is assigned to the VM as its system memory, with the QEMU VMA mapped through remap_pfn_range(). Currently KVM sets up the stage-2 mapping for memory that is not added to the kernel with device properties, and thus does not support execution faults on such a region. Since the EGM memory is mapped through remap_pfn_range() and not added to the kernel, it is set up without execution fault support. This patch updates the KVM behavior. It extends the proposal [1] to make KVM determine whether a region should have NORMAL memory properties based on the VMA pgprot. KVM now allows executable faults on a region if and only if its VMA is mapped cacheable. The EGM memory is NORMAL system memory that is not added to the kernel; it is safe with respect to execution faults and is expected to exhibit all properties of NORMAL memory. This patch enables that use case: check the QEMU VMA pgprot to determine whether it is mapped as Normal cacheable memory, and if so, allow exec faults. Link: https://lore.kernel.org/lkml/[email protected] [1] Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit e38eceb https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (backported from commit b6bd6da https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) [Nirmoy: s/device/s2_force_noncacheable, s/mapping_type()/FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(page_prot))] Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 21c5951 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
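The cacheability check can be sketched as a userspace model of `FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(page_prot))`, as named in the backport note. The attribute-index values below mirror the `MT_*` layout in arch/arm64/include/asm/memory.h of recent kernels; treat them as an assumption to verify against your tree. `allow_exec_fault()` is a hypothetical helper, not KVM's function.

```c
#include <stdbool.h>
#include <stdint.h>

/* arm64 stores the MAIR attribute index in PTE bits [4:2]. */
#define PTE_ATTRINDX_SHIFT 2
#define PTE_ATTRINDX_MASK  (UINT64_C(7) << PTE_ATTRINDX_SHIFT)

/* Assumed MT_* indices (recent arm64 kernels); verify against your tree. */
#define MT_NORMAL         0
#define MT_NORMAL_TAGGED  1
#define MT_NORMAL_NC      2
#define MT_DEVICE_nGnRnE  3
#define MT_DEVICE_nGnRE   4

/* Userspace equivalent of FIELD_GET(PTE_ATTRINDX_MASK, prot). */
static unsigned int attr_index(uint64_t prot)
{
	return (unsigned int)((prot & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT);
}

/* Allow exec faults only when the VMA is mapped Normal cacheable:
 * non-cacheable Normal and Device attributes are rejected. */
static bool allow_exec_fault(uint64_t vma_prot)
{
	switch (attr_index(vma_prot)) {
	case MT_NORMAL_NC:
	case MT_DEVICE_nGnRnE:
	case MT_DEVICE_nGnRE:
		return false;
	default:
		return true;
	}
}
```

Under these assumptions, a QEMU VMA backed by EGM and mapped with `MT_NORMAL` would permit exec faults, whereas a VMA mapped with a Device attribute would not.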

BugLink: https://bugs.launchpad.net/bugs/2119656 The Extended GPU Memory (EGM) feature enables the GPU to access system memory across sockets and nodes. In this mode, physical memory for GPU use can be allocated from anywhere in a multi-node system. The feature is being extended to virtualization. When EGM is enabled in the virtualization stack, the host memory is partitioned into two: one partition for Host OS usage, and a second EGM region. The EGM region essentially becomes the system memory of the VM. The following figure shows the memory map in the virtualization environment.

    |---- Sysmem ----|                |--- GPU mem ---|  VM Memory Map
    |                |                |               |
    |                |                |               |
    |------ EGM -----|--Host Mem----| |--- GPU mem ---|  Host Memory Map

The EGM region is not available to the host for its own use, as it is not added to the kernel. Its base HPA and length are communicated through DSDT entries. A linear mapping between the VM IPA and the system HPA is a requirement for EGM support. The EGM region is thus assigned to a VM by mapping the QEMU VMA to a linearly increasing HPA range of the EGM region using remap_pfn_range().

Introduce a new nvgrace-egm helper module to nvgrace-gpu to manage the EGM/VM region for the VM. The nvgrace-egm module handles the following:
1. Fetch the EGM memory properties (base HPA, length, proximity domain).
2. Create a char device that QEMU can use as a memory-backend-file for the VM, and implement its file operations. The char device is /dev/egmX, where X is the PXM node ID of the EGM being mapped, fetched in step 1.
3. Zero the EGM memory on the first device open().
4. Map the QEMU VMA to the EGM region using remap_pfn_range().
5. Clean up state and destroy the chardev on device unbind.

Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 892ac24 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 3a1b819 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 8807f4b noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
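Steps 2 and 4 can be made concrete with a sketch of how the linear IPA-to-HPA requirement turns into the starting PFN handed to remap_pfn_range(), and how the /dev/egmX name follows from the PXM node. It assumes 4 KiB pages (arm64 kernels may be built with 64 KiB pages), and `struct egm_region`, `egm_mmap_pfn()` and `egm_dev_name()` are hypothetical names, not the module's code.

```c
#include <stdint.h>
#include <stdio.h>

#define EGM_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical descriptor for the properties fetched in step 1. */
struct egm_region {
	uint64_t base_hpa; /* base host physical address */
	uint64_t length;   /* region size in bytes */
	int pxm;           /* proximity domain, used for the device name */
};

/* Compute the first PFN to map for a request of @size bytes starting @pgoff
 * pages into the EGM region. Returns 0 and sets *pfn, or -1 if the request
 * does not fit inside the region. */
static int egm_mmap_pfn(const struct egm_region *egm, uint64_t pgoff,
			uint64_t size, uint64_t *pfn)
{
	if (pgoff > (egm->length >> EGM_PAGE_SHIFT))
		return -1;

	uint64_t offset = pgoff << EGM_PAGE_SHIFT;

	if (size > egm->length - offset)
		return -1;
	/* Linear mapping: consecutive VMA pages map to consecutive HPAs. */
	*pfn = (egm->base_hpa >> EGM_PAGE_SHIFT) + pgoff;
	return 0;
}

/* Build the char device name, e.g. "egm4" for PXM node 4. */
static int egm_dev_name(const struct egm_region *egm, char *buf, size_t len)
{
	return snprintf(buf, len, "egm%d", egm->pxm);
}
```

For a 1 GiB region at HPA 0x400000000, mapping the whole region from page offset 0 starts at PFN 0x400000, while the same size starting one page in is rejected.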

BugLink: https://bugs.launchpad.net/bugs/2119656 Some system memory pages in the EGM may have uncorrectable ECC errors. The host UEFI maintains a list of pages known to have such errors (referred to as retired pages). It populates this list in a reserved region and communicates the SPA of that region through an ACPI DSDT property. The nvgrace-egm module is responsible for storing the list of retired page offsets and making it available to usermode processes. The module:
1. Gets the SPA of the reserved memory region and maps it to fetch the list of bad pages.
2. Calculates the retired page offsets within the EGM and stores them.
3. Exposes an ioctl to allow querying of the offsets.
The ioctl is called by usermode apps such as QEMU to get the retired page offsets. These apps are expected to take appropriate action to communicate the list to the VM.
Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit be54641 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit c4cb193 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 6b0a6d6 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
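Step 2 (turning bad-page SPAs into offsets within the EGM) could look roughly like the sketch below. It assumes the UEFI list has already been read into a plain array of SPAs and that `page_size` is a power of two; the real reserved-region format is not described here, and `egm_retired_offsets()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert bad-page SPAs into page-aligned offsets within the EGM region.
 * Entries outside [egm_base, egm_base + egm_len) are skipped. Returns the
 * number of offsets written to @out (at most @max_out). */
static size_t egm_retired_offsets(uint64_t egm_base, uint64_t egm_len,
				  const uint64_t *bad_spa, size_t nbad,
				  uint64_t page_size, uint64_t *out,
				  size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nbad && n < max_out; i++) {
		/* Ignore addresses that do not belong to this EGM region. */
		if (bad_spa[i] < egm_base || bad_spa[i] - egm_base >= egm_len)
			continue;
		/* Round down to the start of the containing page. */
		out[n++] = (bad_spa[i] - egm_base) & ~(page_size - 1);
	}
	return n;
}
```

With an EGM base of 0x400000000 and 4 KiB pages, a bad SPA of 0x400002123 becomes offset 0x2000, which is the kind of value the ioctl would report to QEMU.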

…errors handling BugLink: https://bugs.launchpad.net/bugs/2119656 The Extended GPU Memory (EGM) is mapped through remap_pfn_range() and is not backed by struct pages. Currently, memory_failure() on such a region is unsupported in kernel MM. There is a proposal to handle such memory regions [1]. The implementation exports APIs to register a memory region and a corresponding callback function with the kernel MM. When a memory failure occurs in the registered region, kernel MM calls the callback to communicate the faulting PFN. This patch registers the EGM memory and the callback function nvgrace_egm_pfn_memory_failure with the kernel MM. On memory failure, nvgrace_egm_pfn_memory_failure is triggered and the nvgrace-egm module adds the faulting PFN to the hashtable tracking retired ECC error pages. It also implements a fault VM op that checks whether the access is being made to a page known to have ECC errors and returns VM_FAULT_HWPOISON in that case. Link: https://lore.kernel.org/all/[email protected]/ [1] Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (backported from commit 215f345 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) (koba: vmalloc.h exists) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 4eba6e1 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit bd280a2 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>
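The bookkeeping described in the EGM memory-failure commit (record each faulting PFN in a hash table, then refuse to map any recorded PFN in the fault path) can be sketched in user-space C. This is a hedged illustration only: a kernel driver would more likely use `<linux/hashtable.h>` and return a real `vm_fault_t`. The table size, the `enum fault_result` stand-ins, and every function name here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RETIRED_TABLE_SIZE 64	/* hypothetical capacity, power of two */

/* Hypothetical stand-ins for the kernel's fault return codes. */
enum fault_result { FAULT_OK, FAULT_HWPOISON };

struct retired_table {
	uint64_t pfn[RETIRED_TABLE_SIZE];
	bool used[RETIRED_TABLE_SIZE];
};

static size_t slot_for(uint64_t pfn)
{
	/* Multiplicative hash: keep the top 6 bits for 64 slots. */
	return (size_t)((pfn * 0x9E3779B97F4A7C15ULL) >> 58);
}

/* Record a faulting PFN, as a memory-failure callback might. */
bool retired_add(struct retired_table *t, uint64_t pfn)
{
	size_t start = slot_for(pfn);

	for (size_t probe = 0; probe < RETIRED_TABLE_SIZE; probe++) {
		size_t s = (start + probe) & (RETIRED_TABLE_SIZE - 1);

		if (t->used[s] && t->pfn[s] == pfn)
			return true;		/* already tracked */
		if (!t->used[s]) {
			t->used[s] = true;
			t->pfn[s] = pfn;
			return true;
		}
	}
	return false;				/* table full */
}

/* Linear-probing lookup; entries are never deleted, so an empty slot ends the search. */
bool retired_contains(const struct retired_table *t, uint64_t pfn)
{
	size_t start = slot_for(pfn);

	for (size_t probe = 0; probe < RETIRED_TABLE_SIZE; probe++) {
		size_t s = (start + probe) & (RETIRED_TABLE_SIZE - 1);

		if (!t->used[s])
			return false;
		if (t->pfn[s] == pfn)
			return true;
	}
	return false;
}

/* Fault-path check: report poison for a page known to have ECC errors. */
enum fault_result egm_fault_check(const struct retired_table *t, uint64_t pfn)
{
	return retired_contains(t, pfn) ? FAULT_HWPOISON : FAULT_OK;
}
```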

BugLink: https://bugs.launchpad.net/bugs/2119656 Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 5bb23c1 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 7d2ea55 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 077c834 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 nvgrace-egm exposes the APIs register_egm_node and unregister_egm_node to manage the EGM (Extended GPU Memory) present on the system. To allow out-of-tree drivers such as nvidia-vgpu-vfio to make use of them, move the declarations to a new nvgrace-egm.h header in include. Signed-off-by: Ankit Agrawal <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bed340f https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a961663 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 020c46c noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

…tion BugLink: https://bugs.launchpad.net/bugs/2119656 Free the kmalloc'd region when the EGM is unregistered. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit fc592b9 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit f24760c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 374b166 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 Move region hash initialization alongside the other region initialization statements to avoid situations where the hash table is not properly initialized. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 8021c1d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit e1264a6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 0f8a098 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

…rrors BugLink: https://bugs.launchpad.net/bugs/2119656 Update error handling within the EGM registration routine to catch errors and return them to the caller. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a57210c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit a706ff8 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit edc0ac0 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 Detect and handle a failure from the EGM registration service. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit f18eee3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit 8371b68 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit be5ae8f noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 Fix source to resolve checkpatch warnings Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit c7b47b7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit dfa0e06 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit 0c2fbf6 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2119656 Fix minor syntax errors from sparse. Signed-off-by: Matthew R. Ochs <[email protected]> Acked-by: Kai-Heng Feng <[email protected]> Acked-by: Carol L. Soto <[email protected]> Acked-by: Koba Ko <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit bbb64e6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next) Signed-off-by: Koba Ko <[email protected]> Acked-by: Matthew R. Ochs <[email protected]> Acked-by: Carol L. Soto <[email protected]> Signed-off-by: Matthew R. Ochs <[email protected]> (cherry picked from commit fe78194 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next) Signed-off-by: Nirmoy Das <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Noah Wager <[email protected]> Acked-by: Jacob Martin <[email protected]> Signed-off--by: Brad Figg <[email protected]> (cherry picked from commit b192960 noble:linux-nvidia-6.14) Signed-off-by: Abdur Rahman <[email protected]>

Resctrl previously had a 'range' schema format that took some kind of number. This has since been split into percentage, MB/s and an AMD platform specific scheme. As range is no longer used, remove it. The last user is mba_sc which should be described as taking MB/s. Signed-off-by: James Morse <[email protected]> (cherry picked from commit 93fda1d6632174fefddfe5e712110dd1e2947c95 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…tmap controls MPAM has cache capacity controls that effectively take a percentage. Resctrl supports percentages, but the collection of files that are exposed to describe this control belong to the MB resource. To find the minimum granularity of the percentage cache capacity controls, user-space is expected to read the bandwidth_gran file and know that this has nothing to do with bandwidth. The only problem here is the name of the file. Add duplicates of these properties with percentage and bitmap in the name. These will be exposed based on the schema format. The existing files must remain tied to the specific resources so that they remain visible to user-space. Using the same helpers ensures the values will always be the same regardless of the file used. These files are not exposed until the new RFTYPE schema flags are set in a resource's 'fflags'. Signed-off-by: James Morse <[email protected]> (cherry picked from commit 673bcb00d2371a2876e164da55d642fdf7657b8d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…n schema format MPAM has cache capacity controls that effectively take a percentage. Resctrl supports percentages, but the collection of files that are exposed to describe this control belong to the MB resource. New files have been added that are selected based on the schema format. Apply the flags to enable these files based on the schema format. Add a new fflags_from_schema() that is used for controls. Signed-off-by: James Morse <[email protected]> (cherry picked from commit a837ccc258380d6aeef86df709cc0484b60a4acf https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

If more schemas are added to resctrl, user-space needs to know how to configure them. To allow user-space to configure schema it doesn't know about, it would be helpful to tell user-space the format, e.g. percentage. Add a file under info that describes the schema format. Percentages and 'mbps' are implicitly decimal, bitmaps are expected to be in hex. Signed-off-by: James Morse <[email protected]> (cherry picked from commit b457019d995b2849e683aef0fd89066e64c679a4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>
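The rule stated above (percentages and 'mbps' are decimal, bitmaps are hex) is what a generic user-space tool would apply after reading the new schema-format file. A minimal parsing sketch follows; the `enum schema_fmt` labels and `parse_schema_value()` are hypothetical names, and the 0..100 range check for percentages is an added assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical labels for the formats named in the commit message. */
enum schema_fmt { FMT_PERCENTAGE, FMT_MBPS, FMT_BITMAP };

/*
 * Parse one schema value: percentages and MB/s are decimal, bitmaps are hex.
 * Returns 0 on success, -EINVAL on malformed input or an out-of-range percentage.
 */
int parse_schema_value(enum schema_fmt fmt, const char *s, unsigned long *out)
{
	int base = (fmt == FMT_BITMAP) ? 16 : 10;
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, base);
	if (errno || end == s || *end != '\0')
		return -EINVAL;		/* empty, overflowed, or trailing junk */
	if (fmt == FMT_PERCENTAGE && v > 100)
		return -EINVAL;
	*out = v;
	return 0;
}
```

With this split, "ff" is accepted for a bitmap schema but rejected for a percentage or MB/s schema, which is exactly the ambiguity an explicit format file removes.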

MPAM can have both cache portion and cache capacity controls on any cache that supports MPAM. Cache portion bitmaps can be exposed via resctrl if they are implemented on L2 or L3. The cache capacity controls cannot be used to isolate portions, which is implicit in the L2 or L3 bitmap provided by user-space. These controls need to be configured with something more like a percentage. Add the resource enum entries for these two resources. No additional resctrl code is needed because the architecture code will specify that this resource takes a 'percentage', re-using the support previously used only for the MB resource. Signed-off-by: James Morse <[email protected]> (cherry picked from commit b601bbf375b016c417db4ec0e8bd6ae58b9057aa https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…m cmax MPAM's maximum cache-capacity controls take a fixed point fraction format. Instead of dumping this on user-space, convert it to a percentage. User-space using resctrl already knows how to handle percentages. Signed-off-by: James Morse <[email protected]> (cherry picked from commit 183d4c43260089e6b51518e50427d0f04a6af875 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>
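The conversion described above can be illustrated with a small sketch. It assumes a control value with `wd` implemented bits represents the fraction v / 2^wd of the cache capacity and rounds to the nearest whole percent; the exact fixed-point layout and rounding used by the MPAM driver are not given here, so treat both helpers (hypothetical names) as an approximation.

```c
#include <stdint.h>

/*
 * Assumption: v (with 'wd' implemented bits, 1 <= wd <= 16) encodes the
 * fraction v / 2^wd of the cache capacity. Round to the nearest percent.
 */
unsigned int fract_to_percent(uint32_t v, unsigned int wd)
{
	return (unsigned int)(((uint64_t)v * 100 + (1ULL << (wd - 1))) >> wd);
}

/* Inverse direction: 100% saturates at the largest representable value. */
uint32_t percent_to_fract(unsigned int pct, unsigned int wd)
{
	uint64_t v = ((uint64_t)pct << wd) / 100;
	uint64_t max = (1ULL << wd) - 1;

	return (uint32_t)(v > max ? max : v);
}
```

Rounding on the read side and truncating on the write side keeps a percentage written by user space stable when it is read back, at least for the values exercised here.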

The cpu hotplug lock has a helper lockdep_assert_cpus_held() that makes it easy to annotate functions that must be called with the cpu hotplug lock held. Do the same for memory. Signed-off-by: James Morse <[email protected]> (cherry picked from commit f40d4b8451b3d9e197166ff33104bd63f93709d0 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…PU hotplug lock resctrl takes the read side CPU hotplug lock whenever it is working with the list of domains. This prevents a CPU being brought online and the list being modified while resctrl is walking the list, or picking CPUs from the CPU masks. If resctrl domains for CPU-less NUMA nodes are to be supported, this would not be enough to prevent the domain list from being modified, as a NUMA node can come online with only memory. Take the memory hotplug lock whenever the CPU hotplug lock is taken. Signed-off-by: James Morse <[email protected]> (cherry picked from commit f5a082989a5f40b9b95515d68b230f8125648fdb https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…arch stubs Resctrl expects the domain IDs for the 'MB' resource to be the corresponding L3 cache-ids. This is a problem for platforms where the memory bandwidth controls are implemented somewhere other than the L3 cache, and exist on a platform with CPU-less NUMA nodes. Such platforms can't currently be exposed via resctrl as not all the memory bandwidth can be controlled. Add a mount option to allow user-space to opt-in to the domain IDs for the MB resource to be the NUMA nid instead. Signed-off-by: James Morse <[email protected]> (cherry picked from commit ae8929caac02dccdc932666c1d8c906dda541bf1 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

idx is not used. Remove it to avoid a build warning. The author is James, but he did not add his Signed-off-by. (backported from commit c9b4fabe0b1b4805186d4326d47547993a02d191 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) [fenghuay: Change subject to a meaningful one. Add commit message.] Signed-off-by: Fenghua Yu <[email protected]>

…stead of cache-id The MB domain ids are the L3 cache-id. This is unfortunate if the memory bandwidth controls are implemented for CPU-less NUMA nodes as there is no L3 whose cache-id can be used to expose these controls to resctrl. When picking the class to use as MB, note whether it is possible for the NUMA nid to be used as the domain-id. By default the MB resource will use the cache-id. Signed-off-by: James Morse <[email protected]> (cherry picked from commit c2506e7fdb9e9de624af635f5060a1fe56a6bb80 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

… work with a set of CPUs mpam_resctrl_offline_domain_hdr() expects to take a single CPU that is going offline. Once all CPUs are offline, the domain header is removed from its parent list, and the structure can be freed. This doesn't work for NUMA nodes. Change the CPU passed to mpam_resctrl_offline_domain_hdr() and mpam_resctrl_domain_hdr_init to be a cpumask. This allows a single CPU to be passed for CPUs going offline, and cpu_possible_mask to be passed for a NUMA node going offline. Signed-off-by: James Morse <[email protected]> (cherry picked from commit 093483e5bca0aef546208b32eedf59f3aac665ff https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>
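The cpumask change above can be modelled with a plain bitmask standing in for `struct cpumask`. In this hypothetical sketch, `struct domain_hdr` and `domain_hdr_offline()` are illustrative names rather than the MPAM driver's. Passing a single CPU models CPU hotplug. Passing every possible CPU models a NUMA node going offline, and it also frees a CPU-less domain whose mask was already empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domain header: a 64-bit mask stands in for struct cpumask. */
struct domain_hdr {
	uint64_t cpu_mask;
};

/*
 * Remove the CPUs in 'offline' from the domain. Returns true when no CPUs
 * remain, i.e. the header can be unlinked from its list and freed.
 */
bool domain_hdr_offline(struct domain_hdr *d, uint64_t offline)
{
	d->cpu_mask &= ~offline;
	return d->cpu_mask == 0;
}
```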

…domain() to have CPU and node mpam_resctrl_alloc_domain() brings a domain with CPUs online. To allow for domains that don't have any CPUs, split it into a CPU and NUMA node version. Signed-off-by: James Morse <[email protected]> (cherry picked from commit 817d04bd296871b61dd70f68d160b85837dfe9a8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…nline/offline To expose resctrl resources that contain CPU-less NUMA domains, resctrl needs to be told when a CPU-less NUMA domain comes online. This can't be done with the cpuhp callbacks. Add a memory hotplug notifier, and use this to create and destroy resctrl domains. Signed-off-by: James Morse <[email protected]> (cherry picked from commit caf4034229d8df2c306658c2ddbe3c1ab73df109 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

…UMA nid as MB domain-id Enable resctrl's use of NUMA nid as the domain-id for the MB resource. Changing this state involves changing the IDs of all the domains visible to resctrl. Writing to this list means preventing CPU and memory hotplug. Signed-off-by: James Morse <[email protected]> (cherry picked from commit a795ac909c6c050daaf095abc9043217ddf5e746 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git) Signed-off-by: Fenghua Yu <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/2122432 Modified for latest MPAM. Signed-off-by: Brad Figg <[email protected]> Signed-off-by: Koba Ko <[email protected]> Signed-off-by: Fenghua Yu <[email protected]> (forward ported from commit 77bd02c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-6.14-next) [fenghuay: change 6.14 path to 6.17] Signed-off-by: Fenghua Yu <[email protected]> Acked-by: Matt Ochs <[email protected]> Acked-by: Carol L Soto <[email protected]> Acked-by: Jacob Martin <[email protected]> Acked-by: Abdur Rahman <[email protected]> Acked-by: Koba Ko <[email protected]>

Define the missing SHIFT definitions to fix build errors. Fixes: a76ea20 ("NVIDIA: SAUCE: arm_mpam: Add quirk framework") Signed-off-by: Fenghua Yu <[email protected]>

partid ranges from 0 to partid_max, inclusive, so partid_max + 1 is outside the valid partid range. Accessing partid_max + 1 generates an error interrupt and causes MPAM to be disabled. Signed-off-by: Fenghua Yu <[email protected]>
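The inclusive range above means there are partid_max + 1 valid IDs, and partid_max itself is the last one to touch. A minimal sketch of a loop that respects this follows. The helper names are hypothetical. The loop counter is deliberately wider than partid_max, so a partid_max of UINT16_MAX cannot wrap the counter and loop forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid partids are 0..partid_max inclusive. */
static inline bool partid_valid(uint32_t partid, uint16_t partid_max)
{
	return partid <= partid_max;
}

/*
 * Hypothetical helper: apply 'reset' (may be NULL) to every valid partid and
 * return how many were visited. Never touches partid_max + 1.
 */
uint32_t reset_all_partids(uint16_t partid_max, void (*reset)(uint16_t partid))
{
	uint32_t n = 0;

	for (uint32_t partid = 0; partid <= partid_max; partid++) {
		if (reset)
			reset((uint16_t)partid);
		n++;
	}
	return n;
}
```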

…ed in mbm_event mode The following NULL pointer dereference is encountered on mount of resctrl fs after booting a system that supports assignable counters with the "rdt=!mbmtotal,!mbmlocal" kernel parameters: BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:mbm_cntr_get Call Trace: rdtgroup_assign_cntr_event rdtgroup_assign_cntrs rdt_get_tree Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features and the MBM events they represent. As a result, the per-domain MBM event related data structures are not allocated during early initialization. resctrl fs initialization follows this by implicitly enabling both MBM total and local events on a system that supports assignable counters (mbm_event mode), but this enabling occurs after the per-domain data structures have been created. After booting, resctrl fs assumes that an enabled event can access all its state. This results in a NULL pointer dereference when resctrl attempts to access the unallocated structures of an enabled event. Remove the late MBM event enabling from resctrl fs. This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter (mbm_event) mode is enabled without any events to support. Switching between the "default" and "mbm_event" mode without any events is not practical. Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL or X86_FEATURE_CQM_MBM_LOCAL. This ensures all needed MBM related data structures are created before use and that it is only possible to switch between "default" and "mbm_event" mode when the same events are available in both modes.
This dependency does not exist in the hardware, but this usage of these feature settings works for known systems. [ bp: Massage commit message. ] Fixes: 1339086 ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details") Co-developed-by: Reinette Chatre <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Babu Moger <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com (cherry picked from commit 19de711) Signed-off-by: Tushar Dave <[email protected]>
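The dependency introduced by this fix (assignable-counter mode is only offered when at least one legacy MBM event is enabled) reduces to a simple predicate. The sketch below models it with plain booleans. `struct mbm_features` and `mbm_event_mode_supported()` are hypothetical names, not the kernel's feature-detection code.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the relevant x86 feature bits after boot options. */
struct mbm_features {
	bool cqm_mbm_total;	/* X86_FEATURE_CQM_MBM_TOTAL */
	bool cqm_mbm_local;	/* X86_FEATURE_CQM_MBM_LOCAL */
	bool abmc;		/* X86_FEATURE_ABMC (assignable counters) */
};

/*
 * mbm_event mode is only usable when ABMC is present AND at least one MBM
 * event is enabled, so the per-domain MBM state exists before any counter
 * assignment can dereference it.
 */
bool mbm_event_mode_supported(const struct mbm_features *f)
{
	return f->abmc && (f->cqm_mbm_total || f->cqm_mbm_local);
}
```

Booting with "rdt=!mbmtotal,!mbmlocal" corresponds to both MBM flags being false, so the predicate rejects mbm_event mode instead of enabling events whose state was never allocated.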

Add 'CONFIG_ARM64_MPAM_RESCTRL_FS' to annotations. No code exists yet for 'CONFIG_CGROUP_RESCTRL' and 'CONFIG_RESCTRL_PMU', so remove them from annotations. Signed-off-by: Tushar Dave <[email protected]>

The KUNIT_CASE_PARAM macro's parameter generator function expects the signature 'const void *gen_params(const void *prev, char *desc)', but test_all_bwa_wd_gen_params() has the wrong signature, causing a compilation failure. Signed-off-by: Tushar Dave <[email protected]>
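To show the generator contract quoted above, here is a user-space sketch of a generator with that signature plus a tiny driver loop that imitates how KUnit calls it. `prev` is NULL on the first call, and returning NULL ends the iteration. The parameter values, `PARAM_DESC_SIZE`, and `count_params()` are assumptions for illustration. Only the generator's name and signature come from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

#define PARAM_DESC_SIZE 128	/* assumption: plays the role of KUnit's desc buffer size */

/* Hypothetical parameter values for the sketch. */
static const unsigned int bwa_wd_values[] = { 1, 4, 8, 12, 16 };

/* Generator with the shape 'const void *gen(const void *prev, char *desc)'. */
static const void *test_all_bwa_wd_gen_params(const void *prev, char *desc)
{
	const unsigned int *p = prev ? (const unsigned int *)prev + 1 : bwa_wd_values;
	const unsigned int *end =
		bwa_wd_values + sizeof(bwa_wd_values) / sizeof(bwa_wd_values[0]);

	if (p >= end)
		return NULL;			/* no more parameters */
	snprintf(desc, PARAM_DESC_SIZE, "bwa_wd=%u", *p);
	return p;
}

/* Minimal stand-in for KUnit's driver loop: count the generated parameters. */
int count_params(void)
{
	char desc[PARAM_DESC_SIZE];
	int n = 0;

	for (const void *p = test_all_bwa_wd_gen_params(NULL, desc); p;
	     p = test_all_bwa_wd_gen_params(p, desc))
		n++;
	return n;
}
```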

…it alignment fault KUnit builds pack struct mpam_props, which can misalign its DECLARE_BITMAP (features). On arm64, bitops perform unsigned long accesses that fault on misaligned addresses, causing mpam_resctrl KUnit tests to abort (EC=0x25 DABT, FSC=0x21 alignment fault). Keep the struct packed (to preserve padding-sanitization intent) but force its alignment to __alignof__(unsigned long) so bitmap operations are naturally aligned. No functional change outside tests. Signed-off-by: Tushar Dave <[email protected]>
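The fix above keeps the struct packed but raises its alignment to that of `unsigned long`. Under GCC/Clang this combination can be seen in a reduced sketch. `struct props_sketch` is a hypothetical stand-in, not the real `struct mpam_props`. The `outer_sketch` wrapper shows the failure mode, because with `packed` alone the embedded struct would land at offset 1 and its bitmap words would be misaligned.

```c
#include <limits.h>
#include <stdalign.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical reduced stand-in: packed members, but word-aligned as a whole. */
struct props_sketch {
	unsigned long features[BITS_TO_LONGS(40)];
	unsigned short cmax_wd;
	unsigned char bwa_wd;
} __attribute__((packed, aligned(__alignof__(unsigned long))));

/* Embedding after a char would misalign a merely packed struct. */
struct outer_sketch {
	char tag;
	struct props_sketch props;
};

/* Word-sized bitmap access, the kind arm64 faults on when misaligned. */
static inline int props_has(const struct props_sketch *p, unsigned int bit)
{
	return (int)((p->features[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL);
}
```

The trade-off is that the aligned struct's size is rounded up to a multiple of `alignof(unsigned long)`, which introduces tail padding.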

…ytes of mpam_props Aligning struct mpam_props introduces potential tail padding beyond the last field. The test previously used memcmp over the entire struct, which now fails due to padding differences rather than content. Compare only up to the last meaningful field (via offsetof + sizeof) to avoid false negatives. No behavioral change to driver logic. Signed-off-by: Tushar Dave <[email protected]>
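The comparison described above (memcmp only up to the end of the last meaningful field) looks roughly like this. The struct is the same hypothetical reduced stand-in for `struct mpam_props`, and `bwa_wd` is assumed to be its last meaningful field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced stand-in; bwa_wd is assumed to be the last real field. */
struct props_sketch {
	unsigned long features[1];
	unsigned short cmax_wd;
	unsigned char bwa_wd;
} __attribute__((packed, aligned(__alignof__(unsigned long))));

/* Bytes up to and including the last meaningful field (offsetof + sizeof). */
#define PROPS_MEANINGFUL_BYTES \
	(offsetof(struct props_sketch, bwa_wd) + \
	 sizeof(((struct props_sketch *)0)->bwa_wd))

/* Compare field contents only, ignoring any tail padding. */
int props_equal(const struct props_sketch *a, const struct props_sketch *b)
{
	return memcmp(a, b, PROPS_MEANINGFUL_BYTES) == 0;
}
```

Two structs with identical fields but different tail-padding bytes then compare equal, whereas a `sizeof`-wide memcmp would report a false mismatch.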

jamieNguyenNVIDIA · 2025-12-16T02:58:13Z

Thanks, Tushar!

Acked-by: Jamie Nguyen <[email protected]>

nvmochs

Verified the base branch is the same as PR 230 and that the requested fix commit that was added matches the upstream source.

The annotations and kunit fixes LGTM.

Acked-by: Matthew R. Ochs <[email protected]>

clsotog

Acked-by: Carol L Soto <[email protected]>

abhsahu and others added 30 commits November 14, 2025 21:05

James Morse and others added 23 commits November 20, 2025 12:43

NVIDIA: SAUCE: arm_mpam: Fix missing SHIFT definitions

b082ef8

NVIDIA: SAUCE: Fix partid_max range issue

6b47273

NVIDIA: SAUCE: [Config] Update RESCTRL annotations

f3d59ca

nvmochs self-requested a review December 16, 2025 03:24

nvmochs approved these changes Dec 16, 2025

clsotog self-requested a review December 16, 2025 16:53

clsotog approved these changes Dec 16, 2025

nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from c7fca69 to 6a9a932 December 18, 2025 13:01
