Firecracker networking improvements #4364

Closed
3 tasks done
bchalios opened this issue Jan 15, 2024 · 3 comments

@bchalios
Contributor

bchalios commented Jan 15, 2024

This is a tracker for investigating improvements in the Firecracker networking stack. We are mainly looking to improve the performance (throughput, latency and CPU utilization) of the emulation logic. We also want to keep an eye on simplifying the control plane.

Currently, we want to focus our efforts on a vhost-user-net data plane. However, we will evaluate other alternatives along the way.

Additional context

Related issues:

Macvtap: #1933
vhost-net: #3707, #4312

Checks

  • Have you searched the Firecracker Issues database for similar requests?
  • Have you read all the existing relevant Firecracker documentation?
  • Have you read and understood Firecracker's core tenets?
@sagoresarker

I think it can extend its functionality by not relying solely on TUN/TAP.

@zulinx86 zulinx86 moved this from Researching to We're Working On It in Firecracker Roadmap Jul 1, 2024
@zulinx86 zulinx86 added the Status: WIP Indicates that an issue is currently being worked on or triaged label Jul 1, 2024
@DemiMarie

I think the most important change here is to move away from the Linux tap interface to something with much better performance. Tap devices cannot support a large number of packets per second, whereas vhost-net, XDP, and other solutions can.

@bchalios
Contributor Author

Hello all.

I wanted to give an update on our work on this front and present some results regarding the improvements we shipped with release v1.10.

We implemented three optimizations:

  • Generic VirtIO queue optimizations: We removed a number of redundant safety checks in our emulation code. These checks are present in the upstream vm-memory crate, but they’re redundant in our logic, since we already make sure that all the safety properties are upheld in downstream Firecracker code.
  • Support VirtIO net mergeable buffers: This VirtIO feature allows the guest driver to perform smaller memory allocations, wasting less memory in the VirtIO queue, and increasing efficiency of memory allocations on the guest side.
  • Avoid a memory copy in the RX path: Before this optimization, we were performing two memory copies for every Ethernet frame we received: one copy from the TAP device to a Firecracker buffer and a second one from the Firecracker buffer to guest memory. Now, we use the readv system call to read directly from the TAP device into guest memory.
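
To illustrate the RX-path change described in the last bullet, here is a minimal sketch of a scatter read with readv. It is not Firecracker's code (which is written in Rust): the pipe standing in for the TAP device, the bytearrays standing in for guest memory regions, and both function names are assumptions made for illustration only.

```python
import os


def read_frame_scattered(fd, buffers):
    """Read one frame from fd straight into the destination buffers.

    os.readv wraps the readv(2) system call: the kernel fills the buffers
    in order, so no intermediate buffer is needed.
    """
    return os.readv(fd, buffers)


def read_frame_two_copies(fd, buffers, scratch_size=65536):
    """Two-copy approach: read into a scratch buffer, then copy it out."""
    scratch = os.read(fd, scratch_size)  # copy 1: fd -> scratch buffer
    offset = 0
    for buf in buffers:
        chunk = scratch[offset:offset + len(buf)]
        buf[:len(chunk)] = chunk  # copy 2: scratch buffer -> destination
        offset += len(chunk)
        if offset == len(scratch):
            break
    return offset


if __name__ == "__main__":
    # Hypothetical stand-ins: a pipe plays the TAP device and two
    # bytearrays play the guest buffers described by a descriptor chain.
    read_end, write_end = os.pipe()
    frame = bytes(range(100))
    os.write(write_end, frame)

    guest_buffers = [bytearray(64), bytearray(64)]
    received = read_frame_scattered(read_end, guest_buffers)
    print(f"received {received} bytes across {len(guest_buffers)} buffers")

    os.close(read_end)
    os.close(write_end)
```

Both functions leave the same bytes in the destination buffers; the scattered version simply skips the intermediate copy. os.readv is only available on Unix-like systems.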

Results

With these optimizations applied, our TCP throughput test reports an average increase of 20% in RX throughput and 10% in TX throughput across all the combinations of instance types, guest kernels, and host kernels that Firecracker currently supports.

In more detail, the per-instance type improvements are:

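| Instance type | RX throughput improvement (%) | TX throughput improvement (%) |
|---------------|-------------------------------|-------------------------------|
| m5n           | 12.0                          | 9.6                           |
| c5n           | 8.2                           | 7.6                           |
| m6i           | 15.2                          | 18.0                          |
| m6g           | 38.1                          | 2.1                           |
| m7g           | 24.7                          | 11.3                          |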
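
As a quick sanity check, the per-instance figures above can be averaged to see how they relate to the headline 20% and 10% numbers. This is only a sketch: it takes a plain unweighted mean over the five instance types, which is an assumption, since the headline figures are described as averages over all instance, guest kernel, and host kernel combinations.

```python
from statistics import mean

# Per-instance improvements copied from the table: (RX %, TX %).
improvements = {
    "m5n": (12.0, 9.6),
    "c5n": (8.2, 7.6),
    "m6i": (15.2, 18.0),
    "m6g": (38.1, 2.1),
    "m7g": (24.7, 11.3),
}

# Unweighted mean across instance types (an assumption, see above).
rx_mean = mean(rx for rx, _ in improvements.values())
tx_mean = mean(tx for _, tx in improvements.values())

print(f"mean RX improvement: {rx_mean:.1f}%")
print(f"mean TX improvement: {tx_mean:.1f}%")
```

The unweighted means come out close to the reported 20% (RX) and 10% (TX).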

Here, you can see a comparison of the absolute numbers for TCP throughput between v1.9.1 and v1.10.0 Firecracker releases, for RX:

[Figure: RX TCP throughput, v1.9.1 vs. v1.10.0]

and TX:

[Figure: TX TCP throughput, v1.9.1 vs. v1.10.0]

These are the results from our throughput test. Briefly, this test uses iperf3 to measure TCP throughput between a guest and the host it is running on, for both TX and RX. It runs on all our supported instance types and kernel configurations, using microVM configurations with 1 and 2 vCPUs.
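
For readers who want to try a similar measurement, the following is a rough sketch of driving iperf3 and reading its JSON report. It is not the Firecracker test itself: the helper names, the 10-second default duration, and the choice of which endpoint runs the client are all assumptions. It only relies on standard iperf3 options (`--client`, `--time`, `--json`, `--reverse`) and on the `end.sum_received.bits_per_second` field of the TCP JSON report.

```python
import json


def iperf3_command(server, seconds=10, reverse=False):
    """Build an iperf3 client command line (hypothetical helper).

    With reverse=True the server sends and the client receives, which
    lets one endpoint measure both directions.
    """
    cmd = ["iperf3", "--client", server, "--time", str(seconds), "--json"]
    if reverse:
        cmd.append("--reverse")
    return cmd


def received_gbps(report_text):
    """Extract the receiver-side TCP throughput in Gbit/s from a JSON report."""
    report = json.loads(report_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9
```

The command list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the captured stdout to `received_gbps`. Whether a given direction counts as RX or TX depends on whether the client runs in the guest or on the host.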

Apart from these optimizations, we have previously considered vhost-net (#3707) and vhost-user-net. We discarded the former due to security concerns: it exposes a direct guest-to-host-kernel attack surface, which does not meet our security bar. For the latter, we realized that the bulk of the work lies in implementing the back-end, a responsibility that lives outside Firecracker, which limits the impact we can make for our users. For this reason, we decided to de-prioritize it, at least for the time being.

As a result, we focused our efforts on improving the current back-end, which seemed more feasible according to our time frame.

I will close this issue as we do not plan to work on a specific optimization on this front in the near future. As always, we welcome the community to open issues or PRs with specific requests and ideas.
