Skip to content

Conversation

cxdong
Copy link
Contributor

@cxdong cxdong commented May 16, 2025

This serial is to add FPU isolation for the protected VM. The pkvm hypervisor will restore pVM's FPU registers before entering the guest, and save pVM's FPU registers after exiting the guest. Before returning to the host, the FPU registers will be restored to the initial state. The host reponses to save/restore its own FPU registers.

For the non-protected VM, the host does the FPU switching between the npVM and the host.

cxdong added 15 commits May 16, 2025 08:37
The vcpu_after_set_cpuid PV interface will add a new cpuid entries to
the pkvm hypervisor if it is allowed. In this case, the previous cpuid
entries memory physical addresses will be returned back to the host. And
such memory is freed with the size of the new cpuid entries, which is
incorrect. To resolve this, the pkvm hypervisor returns such memory
physical addresses via the pkvm memcache format so that the size will
also be returned to the host.

Fixes: 241210f ("pkvm: x86: Add vcpu_after_set_cpuid PV interface")
Signed-off-by: Chuanxiao Dong <[email protected]>
To support the pkvm hypervisor doing the FPU isolation for the pVM, add
the fpu_kernel_cfg and fpu_user_cfg symbols to the pkvm hypervisor which
will be used by the pkvm hypervisor to initialize the guest fpstate. The
fpu configs are initialized according to the corresponding host symbols
before the deprivilege the host.

Signed-off-by: Chuanxiao Dong <[email protected]>
Setup the xstate offset and size in the pkvm hypervisor. These could be
used by the pkvm hypervisor to locate the FPU register offset in the
fpstate regs memory.

Signed-off-by: Chuanxiao Dong <[email protected]>
Add a percpu fpu to facilitate the FPU isolation. The percpu fpu is
implemented by leveraging the pcpu_hot->current_task->thread.fpu and
initialized with the initial fpstate, which is used to restore the FPU
to the initial state to isolation the FPU from the host for the pVM. The
permission is set to the maximum features to allow the guest to use all
the supported features.

Signed-off-by: Chuanxiao Dong <[email protected]>
The pkvm hypervisor needs to save/restore the FPU registers to/from the
memory for the guest to support the FPU isolation. The fpstate memory
is allocated by the host and send to the pkvm hypervisor via the
vcpu_create PV interfaces, which will be donated to pkvm hypervisor for
protection. It will be undoated when destroy the VM.

For the pVM, as the fpstate memory will be used by the pkvm hypervisor
to save/restore the pVM's FPU registers for the isolation purpose, its
size is counted according to the guest fpu allocated by the host.

For the npVM, this fpstate memory doesn't have to be used in the same as
no FPU isolation required. Instead, the fpstate memory owned by the host
can be used to save/restore the FPU registers for the npVM. So the size
for this fpstate memory is just size of the structure fpstate except for
the regs, which is large enough for the pkvm hypervisor to emulate the
IA32_XFD msr.

Signed-off-by: Chuanxiao Dong <[email protected]>
Initialize the guest fpu and mark it as not in use. The fpstate regs
memory is initialized according to the supported fpu features. The
xfeatures is set to the permitted default xfeatures.

Signed-off-by: Chuanxiao Dong <[email protected]>
With expoing the FPU dynamica feature via the cpuid, the fpstate
allocated when creating the vcpu may not be sufficient for the guest.
As the pVM's FPU state is managed by the pkvm hypervisor while the
npVM's FPU state is managed by the host, re-allocating the fpstate is
only necessary for the pVM, and should be done before adding the new
cpuid entries to the pkvm hypervisor. This is only permitted before the
pVM has started running. The old fpstate physical addresses are returned
back to the host via the pkvm memcache.

Signed-off-by: Chuanxiao Dong <[email protected]>
Enable the guest xfd features if the vcpuid expose the feature to the
guest. The fpstate will be reallocated for the pkvm hypervisor before
the cpuid entries are ready. So enabling the xfd feature for the guest
is just to update the xfeatures and xfd fields accordingly if the
fpstate size is sufficient.

Signed-off-by: Chuanxiao Dong <[email protected]>
Add the fpu_swap_kvm_fpstate() for the pkvm hypervisor to swap the FPU
registers to guarantee the FPU isolation. The implementation is ported
from the linux kernel source code with minor modifications that, only
saving/restoring the FPU registers for the pVM, and wipe the FPU
registers by restoring with the initial FPU statue before returning to
the host. For the npVM, no FPU registers swapping but only updates the
percpu fpu to indicate the current active fpstate.

Signed-off-by: Chuanxiao Dong <[email protected]>
To support the FPU isolation for the pVM, mark the pVM's fpstate as
confidential and do the FPU registers swapping around the vcpu run loop.

Signed-off-by: Chuanxiao Dong <[email protected]>
Separate the vcpu run loop core to a dedicated function. No function
change.

Signed-off-by: Chuanxiao Dong <[email protected]>
With the FPU swapping support in the pkvm hypervisor, the XFD and
XFD_ERR MSR can be handled inside the pkvm hypervisor. Add code to do
the XFD/XFD_ERR MSR switching, and handle the #NM exception inside the
pkvm hypervisor.

Signed-off-by: Chuanxiao Dong <[email protected]>
The host_pkru should be saved when load a vcpu so that the pkvm
hypervisor can do the xsave state switching for running a guest.

Fixes: 3729703 ("pkvm: vmx: Implement vcpu_run")
Signed-off-by: Chuanxiao Dong <[email protected]>
Doing vcpu reset requires to clear the BNDREGS/BNDCSR components in the
fpstate. Port the code fpstate_clear_xstate_component() from the linux
kernel to locate the position in the fpstate and do the memory clearing.

Signed-off-by: Chuanxiao Dong <[email protected]>
vineethrp pushed a commit to vineethrp/pKVM-IA that referenced this pull request Oct 1, 2025
…er dereference

commit 1bb3363da862e0464ec050eea2fb5472a36ad86b upstream.

A malicious HID device with quirk APPLE_MAGIC_BACKLIGHT can trigger a NULL
pointer dereference whilst the power feature-report is toggled and sent to
the device in apple_magic_backlight_report_set(). The power feature-report
is expected to have two data fields, but if the descriptor declares one
field then accessing field[1] and dereferencing it in
apple_magic_backlight_report_set() becomes invalid
since field[1] will be NULL.

An example of a minimal descriptor which can cause the crash is something
like the following where the report with ID 3 (power report) only
references a single 1-byte field. When hid core parses the descriptor it
will encounter the final feature tag, allocate a hid_report (all members
of field[] will be zeroed out), create field structure and populate it,
increasing the maxfield to 1. The subsequent field[1] access and
dereference causes the crash.

  Usage Page (Vendor Defined 0xFF00)
  Usage (0x0F)
  Collection (Application)
    Report ID (1)
    Usage (0x01)
    Logical Minimum (0)
    Logical Maximum (255)
    Report Size (8)
    Report Count (1)
    Feature (Data,Var,Abs)

    Usage (0x02)
    Logical Maximum (32767)
    Report Size (16)
    Report Count (1)
    Feature (Data,Var,Abs)

    Report ID (3)
    Usage (0x03)
    Logical Minimum (0)
    Logical Maximum (1)
    Report Size (8)
    Report Count (1)
    Feature (Data,Var,Abs)
  End Collection

Here we see the KASAN splat when the kernel dereferences the
NULL pointer and crashes:

  [   15.164723] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN NOPTI
  [   15.165691] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
  [   15.165691] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0 intel-staging#31 PREEMPT(voluntary)
  [   15.165691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  [   15.165691] RIP: 0010:apple_magic_backlight_report_set+0xbf/0x210
  [   15.165691] Call Trace:
  [   15.165691]  <TASK>
  [   15.165691]  apple_probe+0x571/0xa20
  [   15.165691]  hid_device_probe+0x2e2/0x6f0
  [   15.165691]  really_probe+0x1ca/0x5c0
  [   15.165691]  __driver_probe_device+0x24f/0x310
  [   15.165691]  driver_probe_device+0x4a/0xd0
  [   15.165691]  __device_attach_driver+0x169/0x220
  [   15.165691]  bus_for_each_drv+0x118/0x1b0
  [   15.165691]  __device_attach+0x1d5/0x380
  [   15.165691]  device_initial_probe+0x12/0x20
  [   15.165691]  bus_probe_device+0x13d/0x180
  [   15.165691]  device_add+0xd87/0x1510
  [...]

To fix this issue we should validate the number of fields that the
backlight and power reports have and if they do not have the required
number of fields then bail.

Fixes: 394ba61 ("HID: apple: Add support for magic keyboard backlight on T2 Macs")
Cc: [email protected]
Signed-off-by: Qasim Ijaz <[email protected]>
Reviewed-by: Orlando Chamberlain <[email protected]>
Tested-by: Aditya Garg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant