Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for VMGenID device on x86 microVMs #4487

Merged
merged 8 commits into from
Apr 9, 2024

Conversation

bchalios
Copy link
Contributor

@bchalios bchalios commented Mar 6, 2024

Changes

Add support for Virtual Machine Generation ID for x86 microVMs. This also adds support for the ACPI Generic Event Device which routes interrupts for ACPI devices. VMGenID provides to the guest a 16-byte cryptographically random number which changes every time the microVM resumes from a snapshot.

Reason

VMGenID is used by the Linux kernel to re-seed its internal PRNG whenever a VM resumes from a snapshot. With Linux 6.8, Linux VMGenID driver will also emit a uevent to guest user-space every time the generation ID changes (i.e. the VM resumes from a snapshot).

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@bchalios
Copy link
Contributor Author

bchalios commented Mar 6, 2024

This is still a draft, since it depends on #4428

Copy link

codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 95.69378% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 82.14%. Comparing base (9f8b0b2) to head (90db35b).

Files Patch % Lines
src/vmm/src/devices/acpi/vmgenid.rs 94.04% 5 Missing ⚠️
src/vmm/src/device_manager/acpi.rs 96.66% 2 Missing ⚠️
src/vmm/src/device_manager/persist.rs 92.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4487      +/-   ##
==========================================
+ Coverage   82.04%   82.14%   +0.09%     
==========================================
  Files         253      255       +2     
  Lines       31072    31278     +206     
==========================================
+ Hits        25492    25692     +200     
- Misses       5580     5586       +6     
Flag Coverage Δ
4.14-c5n.metal 79.63% <95.69%> (+0.12%) ⬆️
4.14-c7g.metal ?
4.14-m5n.metal 79.62% <95.69%> (+0.12%) ⬆️
4.14-m6a.metal 78.84% <95.69%> (+0.13%) ⬆️
4.14-m6g.metal 76.69% <100.00%> (+<0.01%) ⬆️
4.14-m6i.metal 79.62% <95.69%> (+0.12%) ⬆️
4.14-m7g.metal 76.69% <100.00%> (?)
5.10-c5n.metal 82.15% <95.69%> (+0.10%) ⬆️
5.10-c7g.metal ?
5.10-m5n.metal 82.14% <95.69%> (+0.11%) ⬆️
5.10-m6a.metal 81.44% <95.69%> (+0.11%) ⬆️
5.10-m6g.metal 79.46% <100.00%> (+<0.01%) ⬆️
5.10-m6i.metal 82.13% <95.69%> (+0.11%) ⬆️
5.10-m7g.metal 79.46% <100.00%> (?)
6.1-c5n.metal 82.15% <95.69%> (+0.11%) ⬆️
6.1-m5n.metal 82.14% <95.69%> (+0.11%) ⬆️
6.1-m6a.metal 81.44% <95.69%> (+0.11%) ⬆️
6.1-m6g.metal 79.46% <100.00%> (+<0.01%) ⬆️
6.1-m6i.metal 82.13% <95.69%> (+0.11%) ⬆️
6.1-m7g.metal 79.46% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bchalios bchalios force-pushed the vmgenid_x86_64 branch 7 times, most recently from e488135 to 9f43105 Compare March 13, 2024 09:22
@bchalios bchalios force-pushed the vmgenid_x86_64 branch 2 times, most recently from 3f3190c to af19d42 Compare April 4, 2024 15:49
@bchalios bchalios marked this pull request as ready for review April 4, 2024 16:08
@bchalios bchalios self-assigned this Apr 4, 2024
@bchalios bchalios added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Apr 4, 2024
src/vmm/src/builder.rs Outdated Show resolved Hide resolved
src/vmm/src/builder.rs Outdated Show resolved Hide resolved
src/vmm/src/device_manager/acpi.rs Outdated Show resolved Hide resolved
tests/integration_tests/security/test_vulnerabilities.py Outdated Show resolved Hide resolved
docs/snapshotting/snapshot-support.md Outdated Show resolved Hide resolved
docs/snapshotting/snapshot-support.md Outdated Show resolved Hide resolved
docs/snapshotting/random-for-clones.md Show resolved Hide resolved
Copy link
Contributor

@wearyzen wearyzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still going through the code but here are some initial comments

CHANGELOG.md Show resolved Hide resolved
docs/snapshotting/random-for-clones.md Outdated Show resolved Hide resolved
docs/snapshotting/snapshot-support.md Outdated Show resolved Hide resolved
docs/snapshotting/snapshot-support.md Show resolved Hide resolved
@bchalios
Copy link
Contributor Author

bchalios commented Apr 5, 2024

Changes from v1 to v2:

  • Fixed various references in the docs that VMGenID was added in Linux 6.1 (it was actually added in 5.18).
  • MMIODevManagerConstructorArgs now keeps a reference to guest memory, so we can avoid one clone() operation.
  • Removed comment that we need to inject the VMGenID notification after calling start_vcpus, as it was wrong.
  • Added explicit comment that VMGenID in kernels before 5.18 will have no effect in the guest.
  • dropped left over enable_console() call in tests.
  • Various typo fixes in documentation

src/vmm/src/builder.rs Outdated Show resolved Hide resolved
docs/snapshotting/snapshot-support.md Outdated Show resolved Hide resolved
@bchalios bchalios force-pushed the vmgenid_x86_64 branch 3 times, most recently from 93d08c1 to 109355e Compare April 5, 2024 15:18
@bchalios
Copy link
Contributor Author

bchalios commented Apr 5, 2024

Changes from v2 to v3:

  • Re-worked the logic to create VMGenID devices. Now, the resource allocation happens inside the VmGenId "constructor". Like that, we hide the logic of allocating resources which are specific to VMGenID devices.
  • Re-worked the logic for (de)serializing the VMGenID state, by moving VMGenID specific stuff under the VMGenID logic (similar to what we do with VirtIO devices).

Copy link
Contributor

@ShadowCurse ShadowCurse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall PR seems fine to me. The only thing to make it a bit better would be to split first commit into several commits smth like:

  • update mmio_restore_args
  • add acpi_device_manager
  • add vmgenid
  • ...

src/vmm/src/lib.rs Show resolved Hide resolved
tests/framework/artifacts.py Outdated Show resolved Hide resolved
@bchalios
Copy link
Contributor Author

bchalios commented Apr 8, 2024

@ShadowCurse regarding splitting the first commit in parts, I believe that splitting out the MMIO restore argument changes it makes sense. I don't see much value in splitting out the ACPI device manager though. Without the VMGenID changes, this would be just adding an empty struct (without any implementation at all). I would rather keep them together.

@bchalios
Copy link
Contributor Author

bchalios commented Apr 8, 2024

Changes from v3 to v4:

  • Split out in its own commit the change that stores in MMIO constructor arguments a reference rather than the actual object.
  • Inlined one-liner function in tests that was only used in one place.
  • Bumped the snapshot version to 2.0.0 since VMGenID makes the snapshot format backwards incompatible with previous snapshots.

ShadowCurse
ShadowCurse previously approved these changes Apr 8, 2024
Use a reference to the GuestMemoryMmap rather than the object itself
when restoring MMIO devices. This will avoid us a few clone() operations
later on when we add support for restoring VMGenID, where we will need
to pass information about the memory as well.

Signed-off-by: Babis Chalios <[email protected]>
Add support for the Virtual Machine Generation ID (VMGenID) device that
allows notifying the guest about snapshot resume events. The device
itself allocates one page of guest memory and an IRQ line. It stores a
16 bytes cryptographically random number (generation ID) at the
beginning of this page. Once the microVM resumes from a snapshot, it
writes a new 16-byte generation ID and sends an interrupt to the device.

Also add support for the Generic Event Device, an ACPI device which
handles IRQ lines allocated to ACPI devices and routes them as
ACPI notifications to the devices the belong to.

VMGenID state is saved in Firecracker snapshots. This renders the
current snapshot format incompatible with all previous snapshot
versions. Bump snapshot version to 2.0.0 to reflect that.

Signed-off-by: Babis Chalios <[email protected]>
We now reserve one IRQ for the VMGenID device, so we have one less IRQ
available for VirtIO devices.

Signed-off-by: Babis Chalios <[email protected]>
wearyzen
wearyzen previously approved these changes Apr 9, 2024
Copy link
Contributor

@wearyzen wearyzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I would however like to get a 2nd opinion on the kernels_unfiltered from @pb8o

At the moment, we filter out 6.1 guest kernel in all tests apart from
PTP on Graviton. Add a new filter that allows select any kernel that
exists in our CI artifacts folder and use it to create a pytest fixture
for guest kernel 6.1, which works independently of the platform we run
on.

Signed-off-by: Babis Chalios <[email protected]>
Add an integration test that launches a microVM, snapshots it repeatedly
and checks for the existence of the message that the kernel emits every
time it receives the VMGenID notification.

Signed-off-by: Babis Chalios <[email protected]>
In test_vulnerabilities.py we have various tests that check whether a
condition holds after resuming from a snapshot. These checks seem to
consistently fail if we take a snapshot before letting the guest kernel
boot.

Introduce an ssh command to ensure that the guest has booted before
taking the snapshot so that we avoid the issue.

Signed-off-by: Babis Chalios <[email protected]>
Extend our current documentation for snapshotting and entropy
recommendations with context about VMGenID. Mention the available
VMGenID features depending on Linux version and also provide
recommendations for entropy on VM clones based on VMGenID availability.

Also, add CHANGELOG entry for VMGenID support.

Signed-off-by: Babis Chalios <[email protected]>
@bchalios bchalios merged commit 092d90a into firecracker-microvm:main Apr 9, 2024
7 checks passed
@bchalios bchalios deleted the vmgenid_x86_64 branch April 9, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants