Tx packet drops on tun device #132
Here's some data on the Tx packet loss for 3 values of txqueuelen (500, 1000 and 10000), measured over 20 x 20MB file downloads.

txqueuelen = 500: Tx packet loss 6.6%
# ifconfig -a delay-6579
delay-6579: flags=81<UP,POINTOPOINT,RUNNING> mtu 1500
inet 100.64.0.1 netmask 255.255.255.255 destination 100.64.0.2
inet6 fe80::c521:b968:c2fa:a2b9 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 500 (UNSPEC)
RX packets 150163 bytes 10001048 (9.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 282961 bytes 424320204 (404.6 MiB)
TX errors 0 dropped 18800 overruns 0 carrier 0 collisions 0
txqueuelen = 1000: Tx packet loss 6.9%
# ifconfig delay-3458
delay-3458: flags=81<UP,POINTOPOINT,RUNNING> mtu 1500
inet 100.64.0.1 netmask 255.255.255.255 destination 100.64.0.2
inet6 fe80::f444:d325:75cd:61c5 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 141137 bytes 9434712 (8.9 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 282967 bytes 424330652 (404.6 MiB)
TX errors 0 dropped 19656 overruns 0 carrier 0 collisions 0
txqueuelen = 10000: Tx packet loss 4.4%
# ifconfig delay-5172
delay-5172: flags=81<UP,POINTOPOINT,RUNNING> mtu 1500
inet 100.64.0.1 netmask 255.255.255.255 destination 100.64.0.2
inet6 fe80::5c41:3015:9c31:ff13 prefixlen 64 scopeid 0x20<link>
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 10000 (UNSPEC)
RX packets 135786 bytes 8628744 (8.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 282997 bytes 424374204 (404.7 MiB)
TX errors 0 dropped 12679 overruns 0 carrier 0 collisions 0
Thanks for this detailed report! I think this is sort of inevitable. At the beginning of a connection, TCP increases its window exponentially until it reaches the receiver's advertised window or a segment gets lost.

Bottom line: if you are expecting to use mm-delay without another bottleneck in the path (either a real bottleneck, or mm-link), I would treat it as imposing a constraint of around 300 Mbps itself as currently implemented. This could certainly be improved a ton (probably just by batching the syscalls -- e.g. not calling poll every time if there is still pending data to read from the tun device) if you need it. But even if we improve it from 300 Mbps to 3 Gbps or more, there still has to be *some* bottleneck, and if mm-delay is the only thing in the path, it has to be the bottleneck.
I was hoping to run some higher speed tests - up to 1 Gbps - but this additional packet loss is causing issues. The intent of adding only delay, with no throughput bound and no packet loss, was to explore the maximum throughput possible. So far I'm seeing it top out at around 150 to 200 Mbps.

If I set txqueuelen to 10000 and move the Rx and Tx queues for the tun device onto separate CPUs, it does help reduce the packet loss, but it's still significant (e.g. from 6% down to 3%). So maybe there is some contention between the packet reads and writes? I don't see any loss on the Rx path - only Tx.

# echo 1 > /sys/class/net/delay-27263/queues/tx-0/xps_cpus

Any thoughts on things to try here? I could try something out if you have some pointers on what to change in the code.
Well, here's one thing we do that's inefficient: the mahimahi link emulators only read one IP datagram each time that poll() returns that there is data available. See https://github.com/ravinet/mahimahi/blob/master/src/packet/packetshell.cc#L175

If this were changed so that the interface to the TUN device was nonblocking and the handler read from the TUN device in a loop until it got EAGAIN, I bet mm-delay would be able to keep up with a much larger TCP window size before the first drop!

I'm not sure looking at the loss percentage is the best way to analyze this -- at the start of a TCP connection, the window grows exponentially, and will quickly reach a point where packets have to get dropped because mm-delay just can't read them fast enough. Once that happens, though, TCP will cut the window in half and then will only grow the window linearly, so it can take a looooong time (like, minutes) for it to get back to the first high level and further losses to occur. So my guess is that "loss percentage" is a pretty noisy indicator -- you would have to average over a long time to make that number reliable.
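To make that concrete, here is a minimal sketch of the suggested change, not mahimahi's actual code (set_nonblocking, drain_tun and handle_datagram are placeholder names): put the TUN fd into nonblocking mode, and when poll() reports it readable, keep calling read() until EAGAIN so one wakeup consumes every queued datagram.

#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>
#include <system_error>

/* Placeholder for whatever the emulator does with one IP datagram. */
void handle_datagram( const std::string & datagram );

/* Put the TUN fd into nonblocking mode once, right after it is opened. */
void set_nonblocking( const int fd )
{
    const int flags = fcntl( fd, F_GETFL );
    if ( flags < 0 || fcntl( fd, F_SETFL, flags | O_NONBLOCK ) < 0 ) {
        throw std::system_error( errno, std::generic_category(), "fcntl" );
    }
}

/* Called when poll() says the TUN fd is readable: drain every pending
   datagram instead of reading just one per wakeup. */
void drain_tun( const int fd )
{
    char buffer[ 65536 ];                      /* bigger than any one datagram */

    while ( true ) {
        const ssize_t len = read( fd, buffer, sizeof( buffer ) );
        if ( len < 0 ) {
            if ( errno == EAGAIN || errno == EWOULDBLOCK ) {
                break;                         /* queue drained; back to poll() */
            }
            throw std::system_error( errno, std::generic_category(), "read" );
        }
        /* Each read() on a TUN fd returns exactly one IP datagram. */
        handle_datagram( std::string( buffer, static_cast<size_t>( len ) ) );
    }
}

In practice you would probably also want to cap how many datagrams one wakeup may drain, so timers and the other direction of the shell still get serviced promptly.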
Start a delay shell (no loss configured) and run a few downloads through it
$ mm-delay 20
$ curl -o /dev/null http://A.B.C.D/20MB.dat
Wireshark analysis of the packet capture will show TCP retransmissions, indicating there is some packet loss.
If you check the configured tuntap device, there are dropped Tx packets (typically > 3% on my machine).
I saw a couple of hints on the Internet suggesting that increasing the txqueuelen on the tun device from the default of 500 would help with the packet loss.
Here's a patch I used to increase the txqueuelen on the tun interface
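As an illustration only (not necessarily what that patch did), txqueuelen can be raised programmatically with the standard SIOCSIFTXQLEN ioctl; the function and device names below are made up for the example, and `ip link set dev <tun-device> txqueuelen 10000` does the same from the shell.

#include <cerrno>
#include <cstring>
#include <net/if.h>            /* struct ifreq, IFNAMSIZ, ifr_qlen */
#include <linux/sockios.h>     /* SIOCSIFTXQLEN */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <system_error>

/* Set the transmit queue length of an interface (e.g. the tun device). */
void set_txqueuelen( const char * ifname, const int qlen )
{
    /* Any ordinary socket can carry interface ioctls. */
    const int sock = socket( AF_INET, SOCK_DGRAM, 0 );
    if ( sock < 0 ) {
        throw std::system_error( errno, std::generic_category(), "socket" );
    }

    struct ifreq ifr;
    memset( &ifr, 0, sizeof( ifr ) );
    strncpy( ifr.ifr_name, ifname, IFNAMSIZ - 1 );
    ifr.ifr_qlen = qlen;

    if ( ioctl( sock, SIOCSIFTXQLEN, &ifr ) < 0 ) {
        const int saved_errno = errno;
        close( sock );
        throw std::system_error( saved_errno, std::generic_category(), "ioctl( SIOCSIFTXQLEN )" );
    }

    close( sock );
}

/* Example (device name is hypothetical): set_txqueuelen( "delay-6579", 10000 ); */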
I tried increasing txqueuelen to 1000 and 10000 but still get tx drops. Any thoughts on other things to try?