@tdavenvidia

This PR is based on @fyu1's PR #230.
In addition to PR #230, this PR adds the following patches:

MPAM KUnit fixes:

  1. [NVIDIA: SAUCE: resctrl/tests: mpam_devices: compare only meaningful bytes of mpam_props]
  2. [NVIDIA: SAUCE: resctrl/mpam: Align packed mpam_props to fix arm64 KUnit alignment fault]
  3. [NVIDIA: SAUCE: arm_mpam: resctrl: Fix MPAM kunit]

Annotations change:
  4. [NVIDIA: SAUCE: [Config] Update RESCTRL annotations]

Missing patch:
  5. [x86,fs/resctrl: Fix NULL pointer dereference with events force-disabled in mbm_event mode]

abhsahu and others added 30 commits November 14, 2025 21:05
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to
https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details of the FFA device used for secure
EC services communication.

1. We need to get the virtual IDs which an EC service supports.
   In the FFA node, the _DSD object contains this information.
   If we look at the sample from the above document,

  Name(_DSD, Package() {
      ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"), //Device Prop UUID
      Package() {
        Package(2) {
          "arm-arml0002-ffa-ntf-bind",
          Package() {
              1, // Revision
              2, // Count of following packages
              Package () {
                     ToUUID("330c1273-fde5-4757-9819-5b6539037502"), // Service1 UUID
                     Package () {
                          0x01,     //Cookie1 (UINT32)
                          0x07,     //Cookie2
                      }
              },
              Package () {
                     ToUUID("b510b3a3-59f6-4054-ba7a-ff2eb1eac765"), // Service2 UUID
                     Package () {
                          0x01,     //Cookie1
                          0x03,     //Cookie2
                      }
             }
         }
      }
    }
  }) // _DSD()

  It uses a nested package structure.
  nvidia_ffa_fill_notification_map(), added in this commit, parses the _DSD
  object and fills the notification ID map for that service.

2. Once the virtual ID is obtained, it needs to be mapped to a
   physical ID by invoking function 1 in the notify service.

3. The UUID for notification service is
   B510B3A3-59F6-4054-BA7A-FF2EB1EAC765.
   An FFA device will be created for this notification service
   by ffa_module. This notify service needs to be probed first.
   To make that happen, a separate ffa_driver instance is created
   and it is getting registered first.

4. We can do 1:1 mapping between virtual ID and hardware ID.

5. We need to invoke notify_request() with hardware notification ID.
   It registers callback function for notification.

6. Once notification comes then we need to evaluate _DSM method
   with virtual ID (which will be mapped same as hardware ID).

7. Function 2 in the notify service should destroy the mapping,
   but it is neither implemented in the firmware nor documented.
   A TODO comment is added in
   nvidia_ffa_notification_destroy().

   Also, if we unload and reload the modules, the existing mapping
   still exists. In nvidia_ffa_notification_setup(), ignore the error
   for this case. When firmware is updated, then the error will be
   returned.

8. The notification service FFA device is needed by each EC secure
   services FFA device to get the virtual notification list. This
   creates the following device dependency chain.

    FFA device <-  notification service FFA device <- EC secure services FFA device

    To satisfy this, register each driver in the probe routine of the driver it depends on.
    Similarly, unregister it in the remove routine of the driver it depends on.
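
The notification map from steps 1 to 4 can be sketched in plain user-space C.
This is only an illustration under stated assumptions: struct ntf_service,
ntf_map_fill() and ntf_virt_to_hw() are hypothetical names (not the driver's
own), NTF_MAX_IDS is an arbitrary bound, and the 1:1 virtual-to-hardware
mapping simply follows step 4. The real driver parses ACPI objects instead of
taking a C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NTF_MAX_IDS 8	/* hypothetical upper bound, for this sketch only */

/* One "service UUID + cookie list" entry, as laid out in the _DSD sample. */
struct ntf_service {
	char     uuid[37];			/* textual UUID, NUL-terminated */
	uint32_t virt_ids[NTF_MAX_IDS];		/* cookies, i.e. virtual IDs */
	size_t   count;
};

/* Record the cookies listed for one service. Returns 0, or -1 if too large. */
static int ntf_map_fill(struct ntf_service *svc, const char *uuid,
			const uint32_t *cookies, size_t n)
{
	assert(svc && uuid && cookies);
	if (n > NTF_MAX_IDS || strlen(uuid) >= sizeof(svc->uuid))
		return -1;
	strcpy(svc->uuid, uuid);
	memcpy(svc->virt_ids, cookies, n * sizeof(*cookies));
	svc->count = n;
	return 0;
}

/* Step 4: 1:1 mapping. Returns the hardware ID, or -1 for an unknown ID. */
static int64_t ntf_virt_to_hw(const struct ntf_service *svc, uint32_t virt_id)
{
	assert(svc);
	for (size_t i = 0; i < svc->count; i++)
		if (svc->virt_ids[i] == virt_id)
			return (int64_t)virt_id;
	return -1;
}
```

With the Service1 cookies from the sample (0x01, 0x07), looking up 0x07 yields
hardware ID 7, while 0x03 (listed only for Service2) is rejected.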

Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 1287a1d noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
… EC driver

BugLink: https://bugs.launchpad.net/bugs/2114230

The NVIDIA FFA and EC secure services driver enables the communication
with EC (Embedded Controller). Make this driver built-in to enable EC
communication at early boot.

Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(cherry picked from commit 9ea0251)

(cherry picked from commit 9ea0251 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2114759

Add a quirk function to skip the PCIe secondary bus reset (SBR). The PCIe
Gen4 link downgrades to Gen1 after SBR, so we have to skip this operation.

Signed-off-by: Jerry.Guo <[email protected]>
Signed-off-by: Yenchia Chen <[email protected]>
Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 0185574 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…pinctrl driver

BugLink: https://bugs.launchpad.net/bugs/2117784

The kernel GPIO subsystem maps hardware pin numbers to a different
range of GPIO numbers. Add a gpio-range structure to hold
the mapped GPIO range in the pinctrl driver. That enables the kernel
to look up the mapped GPIO range for a pinctrl device.

Signed-off-by: Jonas Chen <[email protected]>
Signed-off-by: Yenchia Chen <[email protected]>
Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 1049985 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2117784

Add ACPI support in the shared part of the pinctrl driver. Parse the
hardware base addresses and IRQ number to initialize eint
according to the ACPI table data.

Signed-off-by: Jonas Chen <[email protected]>
Signed-off-by: Yenchia Chen <[email protected]>
Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(backported from commit cdce65d noble:linux-nvidia-6.14)
[maskedarray: context adjusted due to commit 86dee87: "pinctrl:
mediatek: Fix the invalid conditions"]
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2117784

Add mt8901 pinctrl, gpio and eint driver implementation.

Signed-off-by: Jonas Chen <[email protected]>
Signed-off-by: Yenchia Chen <[email protected]>
Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(backported from commit 1fc7a58 noble:linux-nvidia-6.14)
[maskedarray: context adjusted for missing commit a3fe132: "pinctrl:
mediatek: Add pinctrl driver for mt8189"]
Signed-off-by: Abdur Rahman <[email protected]>
…CTRL_MT8901

BugLink: https://bugs.launchpad.net/bugs/2117784

Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Acked-by: nvmochs
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(cherry picked from commit 0bd85d0)

(cherry picked from commit 0bd85d0 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2118357

commit d0038ee ("NVIDIA: SAUCE: Add support for EC
secure service communication") added the nvidia_ffh_handler()
function. While copying the data back into the ACPI FFH packet,
it uses the request length. The response data can be larger
than the request length, and the response length can't be fetched in the
Linux FFH handler function. We can copy all the bytes from
ffa_data.data. The ACPI AML code will only use the required
number of bytes from this.

Normally we don't need to know the response length.
The ACPI tables do not use it; they parse the response
data directly. In the latest revision of the spec, the length
field itself has been removed:

https://github.com/OpenDevicePartnership/documentation/blob/b23acb09f7cf03a5c3167509533f396d547e6291/guide_book/src/specs/ec_interface/secure-ec-services-overview.md#operation-region-definition

DIGITS GB10 uses the older revision of the spec, and the launch is
planned with that revision. When we move to the latest revision,
we need to copy all data bytes for both request and response.

info->length corresponds to the FFH buffer length in the ACPI table.
The following is the code in the ACPI table:

  Name (_HID, "MSFT000C")  // _HID: Hardware ID
  OperationRegion (AFFH, FFixedHW, 0x04, 0x90)

info->length will be 0x90 (144) bytes.
In the older revision, ffa_packet->length is the number of valid data bytes
(https://github.com/OpenDevicePartnership/documentation/blob/45ad9b30be0f40e229deed2fef7a60d0b0b591f5/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md)

struct nvidia_ec_ffa_packet *ffa_packet = (struct nvidia_ec_ffa_packet *)value;

The length of this value buffer should be info->length.
We take the minimum of sizeof(ffa_data.data) = 112 and
(info->length = 144) - (offsetof(struct nvidia_ec_ffa_packet, rawdata) = 18) = 126,
so ffh_copy_len will be 112 for the current DIGITS ACPI implementation.

In the latest revision, this length mismatch is also fixed. Raw data will
start at offset 32, so both values come out as 112 (144 - 32 = 112).
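
The arithmetic above can be checked with a small stand-alone sketch. It is not
the driver code: the function takes the buffer length and rawdata offset as
plain parameters, and FFA_DATA_SIZE is a stand-in for sizeof(ffa_data.data),
using the value 112 quoted in this message.

```c
#include <assert.h>
#include <stddef.h>

#define FFA_DATA_SIZE 112	/* stand-in for sizeof(ffa_data.data) */

/*
 * Number of bytes that can be copied back into the FFH buffer:
 * the smaller of the FFA data size and the room left after the header.
 */
static size_t ffh_copy_len(size_t info_length, size_t rawdata_offset)
{
	assert(info_length >= rawdata_offset);

	size_t room = info_length - rawdata_offset;

	return room < FFA_DATA_SIZE ? room : FFA_DATA_SIZE;
}
```

For the older layout, ffh_copy_len(144, 18) gives min(112, 126) = 112; for the
latest layout, ffh_copy_len(144, 32) gives min(112, 112) = 112.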

Fixes: d0038ee ("NVIDIA: SAUCE: Add support for EC secure service communication")
Signed-off-by: Abhishek Sahu <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 141bd56 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2118663

Add cpu part and model macro definitions for NVIDIA Olympus core.

Signed-off-by: Shanker Donthineni <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 9273361 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2118663

Set CONFIG_ARM64_BRBE=y for arm64 linux-nvidia-6.14.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 26a417a noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

The kernel MM currently does not handle ECC errors / poison on a memory
region that is not backed by struct pages. If a memory region is mapped
using remap_pfn_range(), but not added to the kernel, MM will not have
associated struct pages. Add a new mechanism to handle memory failure
on such memory.

Make the kernel MM expose a function that allows modules managing the device
memory to register a failure function and the physical address space
associated with the device memory. MM maintains this information as an
interval tree. The registered memory failure function is used by MM to
notify the kernel module managing the PFN, so that the module may take
any required action. For example, the module may use the information
to track the poisoned pages.

In this implementation, kernel MM follows the following sequence similar
(mostly) to the memory_failure() handler for struct page backed memory:
1. memory_failure() is triggered on reception of a poison error. An
absence of struct page is detected and consequently memory_failure_pfn()
is executed.
2. memory_failure_pfn() calls the newly introduced failure handler exposed
by the module managing the poisoned memory to notify it of the problematic
PFN.
3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
4. memory_failure_pfn() collects the processes mapped to the PFN.
5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes
mapping the faulty PFN using kill_procs().
6. An access to the faulty PFN by an operation in VM at a later point
is trapped and user_mem_abort() is called.
7. The vma ops fault function gets called due to the absence of Stage-2
mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which causes
SIGBUS (BUS_MCEERR_AR) for the poison to be sent to the QEMU process
through kvm_send_hwpoison_signal().
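
Steps 1 and 2 (look up the owner of a PFN and call its failure handler) can be
sketched in user-space C. This is a simplified illustration, not the kernel
implementation: it swaps the interval tree for a small fixed array with a
linear scan, and register_range(), report_pfn_failure() and record_last_pfn()
are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Callback a module registers to be told about a poisoned PFN. */
typedef void (*pfn_failure_fn)(unsigned long pfn, void *ctx);

struct pfn_range {
	unsigned long  start, end;	/* inclusive PFN bounds */
	pfn_failure_fn failure;
	void          *ctx;
};

#define MAX_RANGES 4			/* arbitrary bound for the sketch */
static struct pfn_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Register a PFN range and its failure callback. Returns 0 or -1. */
static int register_range(unsigned long start, unsigned long end,
			  pfn_failure_fn fn, void *ctx)
{
	assert(fn);
	if (start > end || nr_ranges == MAX_RANGES)
		return -1;
	ranges[nr_ranges++] = (struct pfn_range){ start, end, fn, ctx };
	return 0;
}

/* Notify the owner of @pfn. Returns 0 if a handler ran, -1 if none owns it. */
static int report_pfn_failure(unsigned long pfn)
{
	for (size_t i = 0; i < nr_ranges; i++) {
		if (pfn >= ranges[i].start && pfn <= ranges[i].end) {
			ranges[i].failure(pfn, ranges[i].ctx);
			return 0;
		}
	}
	return -1;
}

/* Example callback: remember the last poisoned PFN in caller storage. */
static void record_last_pfn(unsigned long pfn, void *ctx)
{
	*(unsigned long *)ctx = pfn;
}
```

A linear scan is enough to show the contract; an interval tree, as used in the
commit, keeps the lookup cheap when many ranges are registered.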

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(backported from commit f037dd7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
(koba: Add a pgoff parameter to __add_to_kill)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(backported from commit 4bb248a https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
[Nirmoy: s/folio_shift(page_folio(p))/page_shift(compound_head(p)), add
missing arg in page_address_in_vma()]
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(backported from commit a3fe67d)
[maskedarray: context conflict due to upstream commit: c1f1fda "ACPI:
APEI: handle synchronous exceptions in task work". Adjusted context]
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2125434

Correct the logic used to identify the absence of struct page during
memory_failure().

Fixes: a3fe67d ("NVIDIA: SAUCE: mm: handle poisoning of pfn without struct pages")
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Jamie Nguyen <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Acked-by: Abdur Rahman <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(backported from 94017e2
noble/nvidia-6.14-next)
[maskedarray: removed pfn_t.h header file from mm/memory-failure.c
as this was no longer needed and removed upstream]
Signed-off-by: Abdur Rahman <[email protected]>
…apped pfn

BugLink: https://bugs.launchpad.net/bugs/2119656

fixup_user_fault() currently does not expect VM_FAULT_HWPOISON
and hence does not check for it while calling vm_fault_to_errno(). Since
we now have a new code path which can trigger such a case, change
fixup_user_fault() to look for VM_FAULT_HWPOISON.

Also make hva_to_pfn_remapped() check for -EHWPOISON and communicate the
poison fault up to user_mem_abort().

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 3e895d5 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(backported from commit 2c0d6cc https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
[Nirmoy: fix few offset shifts, adopt to b176f4b]
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit f4b8c6c noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

The GHES code allows calling memory_failure() on PFNs that pass the
pfn_valid() check. This contract is broken for remapped PFNs, which
fail the check, so ghes_do_memory_failure() returns without triggering
memory_failure().

Update code to allow memory_failure() call on PFNs failing pfn_valid().

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit cbcf5ec https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 7e95d6c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>
(backported from commit 388d89b
nvidia-6.14)
[maskedarray: adjusted conflict due to upstream commit c1f1fda: "ACPI: APEI: handle synchronous exceptions in task work"]
Signed-off-by: Abdur Rahman <[email protected]>
…ndling

BugLink: https://bugs.launchpad.net/bugs/2119656

The nvgrace-gpu-vfio-pci module [1] maps the device memory to the user VA
(QEMU) using remap_pfn_range() without adding the memory to the kernel.
The device memory pages are not backed by struct page. Patches 1-3
implement the mechanism to handle ECC/poison on memory pages without
struct page and expose a registration function. This new mechanism is
leveraged here.

The module registers its memory region with the kernel MM for ECC handling
using the register_pfn_address_space() registration API exposed by the
kernel. It also defines a failure callback function pfn_memory_failure()
to get the poisoned PFN from the MM.

The module tracks poisoned PFNs using a hashtable. The PFN is communicated
by the kernel MM to the module through the failure function, which pushes
the appropriate memory offset to the hashtable.

The module also defines a VMA fault op. It returns
VM_FAULT_HWPOISON if the memory offset is found in the hashtable.
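
The tracking scheme can be sketched in user-space C as a small chained hash
set keyed by memory offset. Everything here is illustrative: the kernel
hashtable API is replaced by a hand-rolled table, and poison_record(),
fault_check() and the FAULT_* values are hypothetical stand-ins for the
failure callback, the VMA fault op and VM_FAULT_HWPOISON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define POISON_BUCKETS 16		/* arbitrary size for the sketch */

struct poison_node {
	unsigned long       offset;
	struct poison_node *next;
};

struct poison_table {
	struct poison_node *bucket[POISON_BUCKETS];
};

static size_t poison_hash(unsigned long offset)
{
	return (size_t)(offset % POISON_BUCKETS);
}

static bool poison_lookup(const struct poison_table *t, unsigned long offset)
{
	assert(t);
	for (const struct poison_node *n = t->bucket[poison_hash(offset)];
	     n; n = n->next)
		if (n->offset == offset)
			return true;
	return false;
}

/* Failure-callback role: remember a poisoned offset. Returns 0 or -1. */
static int poison_record(struct poison_table *t, unsigned long offset)
{
	assert(t);
	if (poison_lookup(t, offset))
		return 0;		/* already tracked */

	struct poison_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->offset = offset;
	n->next = t->bucket[poison_hash(offset)];
	t->bucket[poison_hash(offset)] = n;
	return 0;
}

enum fault_result { FAULT_OK, FAULT_HWPOISON };

/* Fault-op role: refuse access to offsets known to be poisoned. */
static enum fault_result fault_check(const struct poison_table *t,
				     unsigned long offset)
{
	return poison_lookup(t, offset) ? FAULT_HWPOISON : FAULT_OK;
}

/* Release every node; the table can be reused afterwards. */
static void poison_free(struct poison_table *t)
{
	assert(t);
	for (size_t i = 0; i < POISON_BUCKETS; i++) {
		while (t->bucket[i]) {
			struct poison_node *n = t->bucket[i];

			t->bucket[i] = n->next;
			free(n);
		}
	}
}
```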

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 2fae9af https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit d9c50d2 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 33a2f83 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 9433fd4 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a1bdf88 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 60f9b04 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 3eff6df https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 6c6e893 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit a6a3ccc noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…IO_CONTAINER

BugLink: https://bugs.launchpad.net/bugs/2119656

CONFIG_IOMMUFD_VFIO_CONTAINER is the VFIO compatible mode provided by
iommufd core, to replace VFIO_IOMMU_TYPE1. Enable it instead.

This might be used by VFIO mdev feature.

Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 8188507 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit b0d6efb https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 9e7a939 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…e in VMA

BugLink: https://bugs.launchpad.net/bugs/2119656

When a Grace Hopper/Blackwell system is set up with EGM mode in
virtualization, the system memory is partitioned into two: a host
OS visible memory and a second EGM region that is not added to
the host OS. The EGM region is assigned to the VM as its system memory
with the QEMU VMA mapped through remap_pfn_range().

Currently KVM sets up the stage-2 mapping for memory that is not
added to the kernel with device properties. It thus does not
support execution faults on such regions. Since the EGM memory is
mapped through remap_pfn_range() and not added to the kernel, such
memory is set up without execution fault support.

This patch intends to update the KVM behaviour. It is an extension
of the proposal [1] to make KVM determine whether a region should have
NORMAL memory properties based on the VMA pgprot. The KVM behavior is
changed to set a region with support of executable fault if and only
if its VMA is mapped cacheable.

The EGM memory is NORMAL system memory that is not added to the
kernel. It is safe in terms of execution fault and is expected to
display all properties of NORMAL memory. The patch enables this
use case.

Check the QEMU VMA pgprot to determine whether it is mapped as Normal
cacheable memory and, if so, allow exec faults.
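
The decision rule can be sketched as a pure function over the VMA page
protection bits. This is a user-space illustration, not the KVM code: the
field position and the SKETCH_MT_* index values are assumptions modelled on
the arm64 attribute-index field and should be checked against the target
kernel, and exec_fault_allowed() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: a 3-bit memory attribute index starting at bit 2. */
#define SKETCH_ATTRINDX_SHIFT	2
#define SKETCH_ATTRINDX_MASK	(UINT64_C(7) << SKETCH_ATTRINDX_SHIFT)

/* Assumed index values, for illustration only. */
#define SKETCH_MT_NORMAL	0	/* Normal cacheable memory */
#define SKETCH_MT_DEVICE	4	/* a device memory type */

static_assert(SKETCH_MT_NORMAL <= 7 && SKETCH_MT_DEVICE <= 7,
	      "attribute index must fit in the 3-bit field");

/* Extract the memory attribute index from the protection bits. */
static unsigned int prot_attr_index(uint64_t pgprot)
{
	return (unsigned int)((pgprot & SKETCH_ATTRINDX_MASK) >>
			      SKETCH_ATTRINDX_SHIFT);
}

/* Exec faults are allowed if and only if the VMA is mapped cacheable. */
static bool exec_fault_allowed(uint64_t vma_pgprot)
{
	return prot_attr_index(vma_pgprot) == SKETCH_MT_NORMAL;
}
```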

Link:
https://lore.kernel.org/lkml/[email protected] [1]

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit e38eceb https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(backported from commit b6bd6da https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
[Nirmoy: s/device/s2_force_noncacheable, s/mapping_type()/FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(page_prot))]
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 21c5951 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

The Extended GPU Memory (EGM) feature enables GPU access to
the system memory across sockets and nodes. In this mode, the
physical memory can be allocated for GPU usage from anywhere
in a multi-node system. The feature is being extended to
virtualization.

When EGM is enabled in the virtualization stack, the host memory
is partitioned into two: one partition for host OS usage, and
a second EGM region. The EGM region essentially becomes the
system memory of the VM. The following figure shows the memory map
in the virtualization environment.

|---- Sysmem ----|                  |--- GPU mem ---|  VM Memory Map
|                |                  |               |
|                |                  |               |
|------ EGM -----|--Host Mem----|   |--- GPU mem ---|  Host Memory Map

The EGM region is not available to the host for its own use, as it
is not added to the kernel. Its base HPA and length are communicated
through the DSDT entries. A linear mapping between the VM IPA and system
HPA is a requirement for EGM support. The EGM region is thus assigned to
a VM by mapping the QEMU VMA to a linearly increasing HPA of the EGM
region using remap_pfn_range().

Introduce a new nvgrace-egm helper module to nvgrace-gpu to manage the
EGM/VM region for the VM.

The nvgrace-egm module handles the following:
1. Fetch the EGM memory properties (base HPA, length, proximity domain).
2. Create a char device that can be used as a memory-backend-file by QEMU
for the VM and implement file operations. The char device is /dev/egmX,
where X is the PXM node ID of the EGM being mapped, fetched in step 1.
3. Zero the EGM memory on first device open().
4. Map the QEMU VMA to the EGM region using remap_pfn_range().
5. Clean up state and destroy the chardev on device unbind.
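
The linear mapping in step 4 comes down to a bounds check plus an address
computation, sketched here in user-space C. The names struct egm_region and
egm_map_pfn() are hypothetical, and the 4 KiB page size is an assumption; the
actual module hands the resulting PFN to remap_pfn_range().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct egm_region {
	uint64_t base_hpa;	/* host physical base of the EGM region */
	uint64_t length;	/* region size in bytes */
};

/*
 * Translate a mapping request (byte offset and size within the region)
 * into the first PFN to map. Returns 0 on success, or -1 if the request
 * does not fit inside the region.
 */
static int egm_map_pfn(const struct egm_region *r, uint64_t offset,
		       uint64_t size, uint64_t *pfn)
{
	assert(r && pfn);
	if (offset > r->length || size > r->length - offset)
		return -1;
	*pfn = (r->base_hpa + offset) >> SKETCH_PAGE_SHIFT;
	return 0;
}
```

The check is written as size > length - offset rather than
offset + size > length so that a large request cannot overflow.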

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 892ac24 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 3a1b819 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 8807f4b noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

It is possible for some system memory pages on the EGM to
have uncorrectable ECC errors. A list of pages known to have such
errors (referred to as retired pages) is maintained by the Host
UEFI. The Host UEFI populates this list in a reserved region.
It communicates the SPA of this region through an ACPI DSDT property.

The nvgrace-egm module is responsible for storing the list of retired page
offsets and making it available to usermode processes. The module:
1. Gets the reserved memory region SPA and maps it to fetch
the list of bad pages.
2. Calculates the retired page offsets in the EGM and stores them.
3. Exposes an ioctl to allow querying of the offsets.

The ioctl is called by usermode apps such as QEMU to get the
retired page offsets. The usermode apps are expected to take
appropriate action to communicate the list to the VM.
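
Step 2 of the module's work can be sketched in user-space C. This assumes the
firmware list holds physical addresses of bad pages, which the message does
not spell out, and egm_retired_offsets() is a hypothetical helper name;
addresses outside the EGM region are simply skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert bad-page addresses into offsets from the EGM base, dropping
 * any address outside [base, base + length). @out must have room for
 * @n entries. Returns the number of offsets written.
 */
static size_t egm_retired_offsets(uint64_t base, uint64_t length,
				  const uint64_t *bad, size_t n,
				  uint64_t *out)
{
	assert(bad && out);

	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (bad[i] < base || bad[i] - base >= length)
			continue;	/* not part of this EGM region */
		out[kept++] = bad[i] - base;
	}
	return kept;
}
```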

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit be54641 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit c4cb193 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off--by: Brad Figg <[email protected]>

(cherry picked from commit 6b0a6d6 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…errors handling

BugLink: https://bugs.launchpad.net/bugs/2119656

The Extended GPU Memory (EGM) is mapped through remap_pfn_range() and
is not backed by struct pages. Currently, memory_failure() on such
region is unsupported in kernel MM.

There is a proposal to handle such memory region [1]. The implementation
exports APIs to register a memory region and a corresponding callback
function with the kernel MM. On the occurrence of memory failure on the
registered region, kernel MM calls the callback to communicate the
faulting PFN.

This patch registers the EGM memory and the callback function
nvgrace_egm_pfn_memory_failure with the kernel MM. On memory failure,
nvgrace_egm_pfn_memory_failure is triggered and the nvgrace-egm module
adds the faulting PFN to the hashtable tracking retired ECC error pages.

It also implements a fault VM op that checks whether the access is being
made to a page known to have ECC errors and returns VM_FAULT_HWPOISON in
that case.
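
The interaction between the memory-failure callback and the fault handler can be sketched in user-space C. This is a stand-in under stated assumptions, not the module's implementation: the fixed-size open-addressing set replaces the kernel hashtable, and every name here (`pfn_set`, `on_memory_failure`, `on_fault`, `FAULT_HWPOISON`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PFN_SET_SLOTS 64u          /* illustrative capacity, power of two */
#define PFN_EMPTY     UINT64_MAX   /* sentinel for an unused slot */

struct pfn_set {
    uint64_t slot[PFN_SET_SLOTS];
};

enum fault_result { FAULT_OK, FAULT_HWPOISON };

void pfn_set_init(struct pfn_set *s)
{
    for (unsigned int i = 0; i < PFN_SET_SLOTS; i++)
        s->slot[i] = PFN_EMPTY;
}

/* Record a PFN using linear probing; returns false if the table is full. */
bool pfn_set_add(struct pfn_set *s, uint64_t pfn)
{
    assert(pfn != PFN_EMPTY);
    for (unsigned int i = 0; i < PFN_SET_SLOTS; i++) {
        unsigned int idx = (unsigned int)((pfn + i) & (PFN_SET_SLOTS - 1));

        if (s->slot[idx] == pfn)
            return true;            /* already tracked */
        if (s->slot[idx] == PFN_EMPTY) {
            s->slot[idx] = pfn;
            return true;
        }
    }
    return false;
}

bool pfn_set_contains(const struct pfn_set *s, uint64_t pfn)
{
    for (unsigned int i = 0; i < PFN_SET_SLOTS; i++) {
        unsigned int idx = (unsigned int)((pfn + i) & (PFN_SET_SLOTS - 1));

        if (s->slot[idx] == pfn)
            return true;
        if (s->slot[idx] == PFN_EMPTY)
            return false;           /* probe chain ended without a match */
    }
    return false;
}

/* Stand-in for the memory-failure callback: remember the bad PFN. */
void on_memory_failure(struct pfn_set *s, uint64_t pfn)
{
    (void)pfn_set_add(s, pfn);
}

/* Stand-in for the fault handler: refuse access to a retired page. */
enum fault_result on_fault(const struct pfn_set *s, uint64_t pfn)
{
    return pfn_set_contains(s, pfn) ? FAULT_HWPOISON : FAULT_OK;
}
```

The point of the design is that the callback only records state, while the decision to fail an access is deferred to the fault path, where the lookup is cheap.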

Link: https://lore.kernel.org/all/[email protected]/ [1]

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(backported from commit 215f345 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
(koba: vmalloc.h exists)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 4eba6e1 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit bd280a2 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 5bb23c1 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 7d2ea55 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit 077c834 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

nvgrace-egm exposes the API register_egm_node & unregister_egm_node
to manage EGM (Extended GPU Memory) present on the system.

To allow out-of-tree drivers such as nvidia-vgpu-vfio to make use of them,
move the declarations to a new nvgrace-egm.h in include.

Signed-off-by: Ankit Agrawal <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bed340f https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a961663 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit 020c46c noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…tion

BugLink: https://bugs.launchpad.net/bugs/2119656

Free the kmalloc'd region when the EGM is unregistered.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit fc592b9 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit f24760c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit 374b166 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Move region hash initialization alongside the other region initialization
statements to avoid situations where the hash table is not properly
initialized.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 8021c1d https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit e1264a6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit 0f8a098 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
…rrors

BugLink: https://bugs.launchpad.net/bugs/2119656

Update error handling within the EGM registration routine to catch and
return errors to the caller.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a57210c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit a706ff8 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit edc0ac0 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Detect and handle a failure from the EGM registration service.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit f18eee3 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit 8371b68 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit be5ae8f noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Fix source to resolve checkpatch warnings

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit c7b47b7 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit dfa0e06 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit 0c2fbf6 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2119656

Fix minor syntax errors from sparse.

Signed-off-by: Matthew R. Ochs <[email protected]>
Acked-by: Kai-Heng Feng <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Acked-by: Koba Ko <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit bbb64e6 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <[email protected]>
Acked-by: Matthew R. Ochs <[email protected]>
Acked-by: Carol L. Soto <[email protected]>
Signed-off-by: Matthew R. Ochs <[email protected]>
(cherry picked from commit fe78194 https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-adv-6.11-next)
Signed-off-by: Nirmoy Das <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Noah Wager <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Signed-off-by: Brad Figg <[email protected]>

(cherry picked from commit b192960 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <[email protected]>
James Morse and others added 23 commits November 20, 2025 12:43
Resctrl previously had a 'range' schema format that took some kind of
number. This has since been split into percentage, MB/s and an AMD
platform specific scheme.
As range is no longer used, remove it.
The last user is mba_sc which should be described as taking MB/s.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit 93fda1d6632174fefddfe5e712110dd1e2947c95 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…tmap controls

MPAM has cache capacity controls that effectively take a percentage.
Resctrl supports percentages, but the collection of files that are
exposed to describe this control belong to the MB resource.
To find the minimum granularity of the percentage cache capacity controls,
user-space is expected to read the bandwidth_gran file, and know this has
nothing to do with bandwidth.
The only problem here is the name of the file. Add duplicates of these
properties with percentage and bitmap in the name. These will be exposed
based on the schema format.
The existing files must remain tied to the specific resources so that
they remain visible to user-space. Using the same helpers ensures the
values will always be the same regardless of the file used.
These files are not exposed until the new RFTYPE schema flags are
set on a resource 'fflags'.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit 673bcb00d2371a2876e164da55d642fdf7657b8d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…n schema format

MPAM has cache capacity controls that effectively take a percentage.
Resctrl supports percentages, but the collection of files that are
exposed to describe this control belong to the MB resource. New files
have been added that are selected based on the schema format.
Apply the flags to enable these files based on the schema format.
Add a new fflags_from_schema() that is used for controls.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit a837ccc258380d6aeef86df709cc0484b60a4acf https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
If more schemas are added to resctrl, user-space needs to know how to
configure them. To allow user-space to configure schema it doesn't know
about, it would be helpful to tell user-space the format, e.g. percentage.
Add a file under info that describes the schema format.
Percentages and 'mbps' are implicitly decimal, bitmaps are expected to be
in hex.
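
How user-space might act on the reported format can be sketched as follows. This is an assumption-laden illustration: the enum `schema_fmt`, the function `schema_parse_value`, and the rejection of percentages above 100 are all hypothetical and are not taken from the resctrl code; only the decimal-versus-hex rule comes from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical format tags mirroring the formats named above. */
enum schema_fmt { FMT_PERCENTAGE, FMT_MBPS, FMT_BITMAP };

/*
 * Parse one control value according to its schema format:
 * bitmaps are hexadecimal, percentages and MB/s values are decimal.
 * Returns 0 on success, -1 on malformed input.
 */
int schema_parse_value(enum schema_fmt fmt, const char *text,
                       unsigned long *val)
{
    int base = (fmt == FMT_BITMAP) ? 16 : 10;
    char *end;

    assert(text && val);
    errno = 0;
    *val = strtoul(text, &end, base);
    if (errno || end == text || *end != '\0')
        return -1;
    /* Assumption for this sketch: a percentage cannot exceed 100. */
    if (fmt == FMT_PERCENTAGE && *val > 100)
        return -1;
    return 0;
}
```

With the format known up front, a tool can parse a schema it has never seen before without guessing the number base.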

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit b457019d995b2849e683aef0fd89066e64c679a4 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
MPAM can have both cache portion and cache capacity controls on any cache
that supports MPAM. Cache portion bitmaps can be exposed via resctrl if
they are implemented on L2 or L3.
The cache capacity controls cannot be used to isolate portions, which is
implicit in the L2 or L3 bitmap provided by user-space. These controls
need to be configured with something more like a percentage.
Add the resource enum entries for these two resources. No additional
resctrl code is needed because the architecture code will specify this
resource takes a 'percentage', re-using the support previously used only
for the MB resource.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit b601bbf375b016c417db4ec0e8bd6ae58b9057aa https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…m cmax

MPAM's maximum cache-capacity controls take a fixed point fraction format.
Instead of dumping this on user-space, convert it to a percentage.
User-space using resctrl already knows how to handle percentages.
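
The conversion arithmetic can be sketched as below. This is not the driver's code and the encoding is an assumption: the sketch treats the control as a `wd`-bit fraction whose value is `cmax / 2^wd`, and the names `cmax_to_percent` and `percent_to_cmax` are hypothetical. Rounding is to nearest, and 100% saturates at the largest representable fraction.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a wd-bit fixed-point fraction to a percentage, rounding to nearest. */
unsigned int cmax_to_percent(uint16_t cmax, unsigned int wd)
{
    uint32_t scale;

    assert(wd >= 1 && wd <= 16);
    scale = (uint32_t)1 << wd;
    assert(cmax < scale);
    return (unsigned int)(((uint32_t)cmax * 100u + scale / 2u) / scale);
}

/* Convert a percentage back to a wd-bit fixed-point fraction. */
uint16_t percent_to_cmax(unsigned int percent, unsigned int wd)
{
    uint32_t scale, val;

    assert(wd >= 1 && wd <= 16 && percent <= 100);
    scale = (uint32_t)1 << wd;
    val = (percent * scale + 50u) / 100u;
    if (val >= scale)
        val = scale - 1u;   /* 100% cannot be encoded exactly; saturate */
    return (uint16_t)val;
}
```

Under this assumed encoding, an 8-bit value of 0x80 maps to 50 and 50 maps back to 0x80, so common percentages round-trip cleanly.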

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit 183d4c43260089e6b51518e50427d0f04a6af875 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
The cpu hotplug lock has a helper lockdep_assert_cpus_held() that makes it
easy to annotate functions that must be called with the cpu hotplug lock
held.
Do the same for memory.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit f40d4b8451b3d9e197166ff33104bd63f93709d0 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…PU hotplug lock

resctrl takes the read side CPU hotplug lock whenever it is working
with the list of domains. This prevents a CPU being brought online
and the list being modified while resctrl is walking the list, or
picking CPUs from the CPU masks.
If resctrl domains for CPU-less NUMA nodes are to be supported, this
would not be enough to prevent the domain list from being modified as
a NUMA node can come online with only memory.
Take the memory hotplug lock whenever the CPU hotplug lock is taken.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit f5a082989a5f40b9b95515d68b230f8125648fdb https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…arch stubs

Resctrl expects the domain IDs for the 'MB' resource to be the
corresponding L3 cache-ids.
This is a problem for platforms where the memory bandwidth controls
are implemented somewhere other than the L3 cache, and exist on a
platform with CPU-less NUMA nodes.
Such platforms can't currently be exposed via resctrl as not all
the memory bandwidth can be controlled.
Add a mount option to allow user-space to opt-in to the domain IDs
for the MB resource to be the NUMA nid instead.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit ae8929caac02dccdc932666c1d8c906dda541bf1 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
idx is not used. Remove it to avoid build warning.

The author is James, but he did not add his Signed-off-by.

(backported from commit c9b4fabe0b1b4805186d4326d47547993a02d191 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
[fenghuay: Change subject to a meaningful one. Add commit message.]
Signed-off-by: Fenghua Yu <[email protected]>
…stead of cache-id

The MB domain ids are the L3 cache-id. This is unfortunate if the
memory bandwidth controls are implemented for CPU-less NUMA nodes as
there is no L3 whose cache-id can be used to expose these controls
to resctrl.
When picking the class to use as MB, note whether it is possible
for the NUMA nid to be used as the domain-id. By default the MB
resource will use the cache-id.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit c2506e7fdb9e9de624af635f5060a1fe56a6bb80 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
… work with a set of CPUs

mpam_resctrl_offline_domain_hdr() expects to take a single CPU that is
going offline. Once all CPUs are offline, the domain header is removed
from its parent list, and the structure can be freed.
This doesn't work for NUMA nodes.
Change the CPU passed to mpam_resctrl_offline_domain_hdr() and
mpam_resctrl_domain_hdr_init to be a cpumask. This allows a single CPU
to be passed for CPUs going offline, and cpu_possible_mask to be passed
for a NUMA node going offline.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit 093483e5bca0aef546208b32eedf59f3aac665ff https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…domain() to have CPU and node

mpam_resctrl_alloc_domain() brings a domain with CPUs online. To allow
for domains that don't have any CPUs, split it into a CPU and NUMA node
version.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit 817d04bd296871b61dd70f68d160b85837dfe9a8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…nline/offline

To expose resctrl resources that contain CPU-less NUMA domains, resctrl
needs to be told when a CPU-less NUMA domain comes online. This can't
be done with the cpuhp callbacks.
Add a memory hotplug notifier, and use this to create and destroy
resctrl domains.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit caf4034229d8df2c306658c2ddbe3c1ab73df109 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
…UMA nid as MB domain-id

Enable resctrl's use of NUMA nid as the domain-id for the MB resource.
Changing this state involves changing the IDs of all the domains
visible to resctrl. Writing to this list means preventing CPU and memory
hotplug.

Signed-off-by: James Morse <[email protected]>
(cherry picked from commit a795ac909c6c050daaf095abc9043217ddf5e746 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <[email protected]>
BugLink: https://bugs.launchpad.net/bugs/2122432

Modified for latest MPAM.

Signed-off-by: Brad Figg <[email protected]>
Signed-off-by: Koba Ko <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
(forward ported from commit 77bd02c https://github.com/NVIDIA/NV-Kernels/tree/24.04_linux-nvidia-6.14-next)
[fenghuay: change 6.14 path to 6.17]
Signed-off-by: Fenghua Yu <[email protected]>
Acked-by: Matt Ochs <[email protected]>
Acked-by: Carol L Soto <[email protected]>
Acked-by: Jacob Martin <[email protected]>
Acked-by: Abdur Rahman <[email protected]>
Acked-by: Koba Ko <[email protected]>
Define the missing SHIFT definitions to fix build errors.

Fixes: a76ea20 ("NVIDIA: SAUCE: arm_mpam: Add quirk framework")
Signed-off-by: Fenghua Yu <[email protected]>
Valid partid values range from 0 to partid_max, inclusive, so
partid_max + 1 is outside the valid range. Accessing partid_max + 1
generates an error interrupt and causes MPAM to be disabled.
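
The inclusive bound can be illustrated with a small user-space sketch. The helper names and the per-PARTID configuration array are hypothetical; the only fact taken from the message above is that the valid range is 0 through partid_max.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A PARTID is valid only if it does not exceed partid_max. */
bool partid_is_valid(uint32_t partid, uint16_t partid_max)
{
    return partid <= partid_max;
}

/*
 * Reset a hypothetical per-PARTID configuration array. The loop stops at
 * partid_max itself and never touches index partid_max + 1.
 * Returns the number of entries visited.
 */
unsigned int reset_all_partids(uint32_t *cfg, size_t cfg_len,
                               uint16_t partid_max)
{
    unsigned int visited = 0;

    assert(cfg && cfg_len > partid_max);
    /* A 32-bit counter keeps the loop finite when partid_max is 0xFFFF. */
    for (uint32_t partid = 0; partid <= partid_max; partid++) {
        cfg[partid] = 0;
        visited++;
    }
    return visited;
}
```

With partid_max equal to 3, exactly four entries are visited and a fifth entry, if present, is left untouched.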

Signed-off-by: Fenghua Yu <[email protected]>
…ed in mbm_event mode

The following NULL pointer dereference is encountered on mount of resctrl fs
after booting a system that supports assignable counters with the
"rdt=!mbmtotal,!mbmlocal" kernel parameters:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:mbm_cntr_get
  Call Trace:
  rdtgroup_assign_cntr_event
  rdtgroup_assign_cntrs
  rdt_get_tree

Specifying the kernel parameter "rdt=!mbmtotal,!mbmlocal" effectively disables
the legacy X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL features
and the MBM events they represent. This results in the per-domain MBM event
related data structures to not be allocated during early initialization.

resctrl fs initialization follows by implicitly enabling both MBM total and
local events on a system that supports assignable counters (mbm_event mode),
but this enabling occurs after the per-domain data structures have been
created.

After booting, resctrl fs assumes that an enabled event can access all its
state. This results in NULL pointer dereference when resctrl attempts to
access the un-allocated structures of an enabled event.

Remove the late MBM event enabling from resctrl fs.

This leaves a problem where the X86_FEATURE_CQM_MBM_TOTAL and
X86_FEATURE_CQM_MBM_LOCAL features may be disabled while assignable counter
(mbm_event) mode is enabled without any events to support. Switching between
the "default" and "mbm_event" mode without any events is not practical.

Create a dependency between the X86_FEATURE_{CQM_MBM_TOTAL,CQM_MBM_LOCAL} and
X86_FEATURE_ABMC (assignable counter) hardware features. An x86 system that
supports assignable counters now requires support of X86_FEATURE_CQM_MBM_TOTAL
or X86_FEATURE_CQM_MBM_LOCAL.

This ensures all needed MBM related data structures are created before use and
that it is only possible to switch between "default" and "mbm_event" mode when
the same events are available in both modes. This dependency does not exist in
the hardware but this usage of these feature settings works for known systems.
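
The dependency described above reduces to a simple predicate. The sketch below is a user-space illustration with hypothetical names (`struct mon_features`, `mbm_event_mode_supported`); it does not show how the kernel actually expresses the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three feature flags discussed above. */
struct mon_features {
    bool abmc;       /* assignable counters */
    bool mbm_total;  /* MBM total event */
    bool mbm_local;  /* MBM local event */
};

/*
 * Assignable-counter ("mbm_event") mode is usable only when at least one
 * MBM event is available for the counters to be assigned to.
 */
bool mbm_event_mode_supported(const struct mon_features *f)
{
    assert(f);
    return f->abmc && (f->mbm_total || f->mbm_local);
}
```

Booting with both events force-disabled corresponds to the first case in which the predicate is false even though the hardware reports assignable counters.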

  [ bp: Massage commit message. ]

Fixes: 1339086 ("x86,fs/resctrl: Detect Assignable Bandwidth Monitoring feature details")
Co-developed-by: Reinette Chatre <[email protected]>
Signed-off-by: Reinette Chatre <[email protected]>
Signed-off-by: Babu Moger <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Reinette Chatre <[email protected]>
Link: https://patch.msgid.link/a62e6ac063d0693475615edd213d5be5e55443e6.1760560934.git.babu.moger@amd.com
(cherry picked from commit 19de711)
Signed-off-by: Tushar Dave <[email protected]>
Add 'CONFIG_ARM64_MPAM_RESCTRL_FS' to annotations.

No code exists yet for 'CONFIG_CGROUP_RESCTRL' and 'CONFIG_RESCTRL_PMU',
so remove them from annotations.

Signed-off-by: Tushar Dave <[email protected]>
The KUNIT_CASE_PARAM macro expects its parameter generator function to have
the signature 'const void* gen_params(const void *prev, char *desc)', but
test_all_bwa_wd_gen_params() has a different signature, causing a
compilation failure.
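
A generator with the quoted two-argument signature can be sketched in user-space C. This is a stand-in, not the test's code: the parameter type `struct bwa_wd_param`, the table contents, the buffer size `DESC_SIZE`, and the function name `bwa_wd_gen_params` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DESC_SIZE 128   /* stand-in for the description buffer size */

/* Hypothetical test parameter: one bandwidth-allocation width per case. */
struct bwa_wd_param {
    unsigned int wd;
};

static const struct bwa_wd_param bwa_wd_params[] = { { 1 }, { 8 }, { 16 } };

/*
 * Generator with the signature quoted above. 'prev' is NULL on the first
 * call and the previously returned element afterwards; returning NULL
 * ends the iteration. 'desc' receives a human-readable case name.
 */
const void *bwa_wd_gen_params(const void *prev, char *desc)
{
    const struct bwa_wd_param *end =
        bwa_wd_params + sizeof(bwa_wd_params) / sizeof(bwa_wd_params[0]);
    const struct bwa_wd_param *p = prev;

    assert(desc);
    p = p ? p + 1 : bwa_wd_params;
    if (p >= end)
        return NULL;
    snprintf(desc, DESC_SIZE, "wd=%u", p->wd);
    return p;
}
```

A driver loop calls the generator repeatedly, feeding each return value back in as `prev`, until it receives NULL.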

Signed-off-by: Tushar Dave <[email protected]>
…it alignment fault

KUnit builds pack struct mpam_props, which can misalign its DECLARE_BITMAP
(features). On arm64, bitops perform unsigned long accesses that fault on
misaligned addresses, causing mpam_resctrl KUnit tests to abort
(EC=0x25 DABT, FSC=0x21 alignment fault).

Keep the struct packed (to preserve padding-sanitization intent) but force
its alignment to __alignof__(unsigned long) so bitmap operations are
naturally aligned.
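
The effect of combining the two attributes can be seen in a small sketch using GCC/Clang attribute syntax. The structs below are hypothetical stand-ins, not the real struct mpam_props; the sketch assumes the bitmap is the first member, so aligning the start of the struct also aligns the bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/* Packed only: the struct's alignment drops to 1, so it may be misplaced. */
struct props_packed {
    unsigned long features[1];   /* bitmap storage, first member */
    uint16_t width;
    uint8_t flag;
} __attribute__((packed));

/* Packed and aligned: no internal padding, but the start is ulong-aligned. */
struct props_packed_aligned {
    unsigned long features[1];
    uint16_t width;
    uint8_t flag;
} __attribute__((packed, aligned(__alignof__(unsigned long))));

/* Bit test that, like word-based bitops, reads a whole unsigned long. */
int props_has_feature(const struct props_packed_aligned *p, unsigned int bit)
{
    assert(p && bit < BITS_PER_ULONG);
    return (int)((p->features[0] >> bit) & 1UL);
}
```

The member layout is identical in both structs; only the required alignment of the object differs, which is what keeps whole-word accesses to the bitmap naturally aligned.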

No functional change outside tests.

Signed-off-by: Tushar Dave <[email protected]>
…ytes of mpam_props

Aligning struct mpam_props introduces potential tail padding beyond the
last field. The test previously used memcmp over the entire struct, which
now fails due to padding differences rather than content.

Compare only up to the last meaningful field (via offsetof + sizeof) to
avoid false negatives. No behavioral change to driver logic.
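
The bounded comparison can be sketched as follows. The struct is a hypothetical stand-in for struct mpam_props, and the names `PROPS_CMP_LEN` and `props_equal` are illustrative; the technique taken from the message above is limiting memcmp to offsetof plus sizeof of the last meaningful field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: alignment adds tail padding after 'last'. */
struct props {
    unsigned long features;
    uint16_t width;
    uint8_t last;               /* last meaningful field */
} __attribute__((packed, aligned(__alignof__(unsigned long))));

/* Number of bytes that carry content, excluding tail padding. */
#define PROPS_CMP_LEN \
    (offsetof(struct props, last) + sizeof(((struct props *)0)->last))

/* Compare only the meaningful bytes of two structs. */
bool props_equal(const struct props *a, const struct props *b)
{
    assert(a && b);
    return memcmp(a, b, PROPS_CMP_LEN) == 0;
}
```

Two objects whose fields match compare equal here even when their padding bytes differ, whereas a memcmp over sizeof(struct props) would report a mismatch.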

Signed-off-by: Tushar Dave <[email protected]>
@jamieNguyenNVIDIA

Thanks, Tushar!

Acked-by: Jamie Nguyen <[email protected]>

@nvmochs nvmochs self-requested a review December 16, 2025 03:24

@nvmochs nvmochs left a comment

Verified the base branch is the same as PR 230 and that the requested fix commit that was added matches the upstream source.

The annotations and kunit fixes LGTM.

Acked-by: Matthew R. Ochs <[email protected]>

@clsotog clsotog self-requested a review December 16, 2025 16:53

@clsotog clsotog left a comment

Acked-by: Carol L Soto <[email protected]>

@nvidia-bfigg nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from c7fca69 to 6a9a932 Compare December 18, 2025 13:01