Tx packet drops on tun device #132

Open
pmcgleenon opened this issue Nov 27, 2018 · 4 comments

pmcgleenon commented Nov 27, 2018

Start a delay shell (no loss configured) and run a few downloads through it:

$ mm-delay 20
$ curl -o /dev/null http://A.B.C.D/20MB.dat

Wireshark analysis of the packet capture shows TCP retransmissions, indicating that there is some packet loss.

If you check the configured tuntap device, there are dropped Tx packets (typically > 3% on my machine):

# ifconfig delay-31431
delay-31431: flags=81<UP,POINTOPOINT,RUNNING>  mtu 1500
        inet 100.64.0.1  netmask 255.255.255.255  destination 100.64.0.2
        inet6 fe80::edd3:27b4:2218:c58a  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 6738  bytes 417912 (408.1 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14157  bytes 21217095 (20.2 MiB)
        TX errors 0  dropped 1013 overruns 0  carrier 0  collisions 0

# ip -s -s link ls dev delay-31431
10: delay-31431: <POINTOPOINT,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10000
    link/none 
    RX: bytes  packets  errors  dropped overrun mcast   
    417912     6738     0       0       0       0       
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    21217095   14157    0       1013    0       0       
    TX errors: aborted  fifo   window heartbeat transns
               0        0       0       0       0       

I saw a couple of hints on the Internet suggesting that increasing the txqueuelen on the tun device from the default of 500 would help with the packet loss.

Here's a patch I used to increase the txqueuelen on the tun interface:

diff --git a/src/util/netdevice.cc b/src/util/netdevice.cc
index 6ae7955..a350182 100644
--- a/src/util/netdevice.cc
+++ b/src/util/netdevice.cc
@@ -26,6 +26,10 @@ TunDevice::TunDevice( const string & name,
     interface_ioctl( *this, TUNSETIFF, name,
                      [] ( ifreq &ifr ) { ifr.ifr_flags = IFF_TUN; } );
 
+    /* increase txqueuelen from default 500 */
+    interface_ioctl( SIOCSIFTXQLEN, name,
+                     [] ( ifreq &ifr ) { ifr.ifr_qlen = 1000; } );
+
     assign_address( name, addr, peer );
 }

I tried increasing txqueuelen to 1000 and 10000 but still get Tx drops. Any thoughts on other things to try?

pmcgleenon (Author) commented

Here's some data on the Tx packet loss for three values of txqueuelen (500, 1000, and 10000), measured over 20 downloads of a 20 MB file.

txqueuelen = 500: Tx packet loss 6.6%

# ifconfig -a delay-6579
delay-6579: flags=81<UP,POINTOPOINT,RUNNING>  mtu 1500
        inet 100.64.0.1  netmask 255.255.255.255  destination 100.64.0.2
        inet6 fe80::c521:b968:c2fa:a2b9  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 150163  bytes 10001048 (9.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 282961  bytes 424320204 (404.6 MiB)
        TX errors 0  dropped 18800 overruns 0  carrier 0  collisions 0

txqueuelen = 1000: Tx packet loss 6.9%

# ifconfig delay-3458
delay-3458: flags=81<UP,POINTOPOINT,RUNNING>  mtu 1500
        inet 100.64.0.1  netmask 255.255.255.255  destination 100.64.0.2
        inet6 fe80::f444:d325:75cd:61c5  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 1000  (UNSPEC)
        RX packets 141137  bytes 9434712 (8.9 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 282967  bytes 424330652 (404.6 MiB)
        TX errors 0  dropped 19656 overruns 0  carrier 0  collisions 0

txqueuelen = 10000: Tx packet loss 4.4%

# ifconfig delay-5172
delay-5172: flags=81<UP,POINTOPOINT,RUNNING>  mtu 1500
        inet 100.64.0.1  netmask 255.255.255.255  destination 100.64.0.2
        inet6 fe80::5c41:3015:9c31:ff13  prefixlen 64  scopeid 0x20<link>
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 10000  (UNSPEC)
        RX packets 135786  bytes 8628744 (8.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 282997  bytes 424374204 (404.7 MiB)
        TX errors 0  dropped 12679 overruns 0  carrier 0  collisions 0

keithw (Collaborator) commented Dec 2, 2018

Thanks for this detailed report! I think this is sort of inevitable. At the beginning of a connection, TCP increases its window exponentially until it reaches the receiver's advertised window or a segment gets lost. If mm-delay is the only bottleneck, then the only place for a segment to get lost is in the input buffer to mm-delay, so it will inevitably happen at some point.

If you run wget -O /dev/null http://$MAHIMAHI_BASE/bigfile inside the container in one window (fetching a large file from a local webserver), and watch -n 0.25 ip -s -s link ls dev delay-NNNNN in another window, you can see that there are no dropped packets at the beginning as the wget speed ramps up; then at some point there is a burst of dropped packets and wget's speed levels off, indicating TCP has entered the congestion-avoidance phase, i.e. the end of exponential growth of the window.

Or, bottom line: if you are expecting to use mm-delay without another bottleneck in the path (either a real bottleneck, or mm-link), I would treat mm-delay, as currently implemented, as itself imposing a constraint of around 300 Mbps. This could certainly be improved a ton (probably just by batching the syscalls -- e.g. not calling poll again when there is still pending data to read from the tun device) if you need it. But even if we improve it from 300 Mbps to 3 Gbps or more, there still has to be *some* bottleneck, and if mm-delay is the only thing in the path, it has to be the bottleneck.

pmcgleenon (Author) commented Dec 5, 2018

I was hoping to run some higher-speed tests - up to 1 Gbps - but this additional packet loss is causing issues. The intent of adding only delay, with no throughput bound and no configured packet loss, was to explore the maximum throughput possible. So far I'm seeing it top out at around 150 to 200 Mbps.

If I set txqueuelen to 10000 and move the Rx and Tx queues for the tun device onto separate CPUs, it does help reduce the packet loss, but the loss is still significant (e.g. from 6% down to 3%). So maybe there is some contention between the packet reads and writes? I don't see any loss on the Rx path - only Tx.

# echo 1 > /sys/class/net/delay-27263/queues/tx-0/xps_cpus 

Any thoughts on things to try here? I could try something out if you have some pointers on what to change in the code.

keithw (Collaborator) commented Dec 5, 2018

Well, here's one thing we do that's inefficient: the mahimahi link emulators only read one IP datagram each time that poll() returns that there is data available. See https://github.com/ravinet/mahimahi/blob/master/src/packet/packetshell.cc#L175

If this were changed so that the interface to the TUN device was nonblocking and the handler read from the TUN device in a loop until it got EAGAIN, I bet mm-delay would be able to keep up with a much larger TCP window size before the first drop!
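For concreteness, here is a rough sketch of the kind of drain loop I mean -- this is not mahimahi's actual code, and tun_fd and handle_datagram are just placeholder names for the tun file descriptor and whatever hands the datagram to the delay queue:

/* Sketch only -- not mahimahi's actual code. Drain a nonblocking tun fd
   until read() returns EAGAIN, then go back to poll(). The fd must have
   O_NONBLOCK set once up front, e.g.
   fcntl( tun_fd, F_SETFL, fcntl( tun_fd, F_GETFL ) | O_NONBLOCK ). */

#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>
#include <unistd.h>

/* placeholder: hand the datagram to the delay queue */
void handle_datagram( const std::string & /* packet */ ) {}

void drain_tun( const int tun_fd )
{
    std::vector<char> buf( 65536 );  /* big enough for one tun frame */

    while ( true ) {
        const ssize_t n = read( tun_fd, buf.data(), buf.size() );

        if ( n > 0 ) {
            handle_datagram( std::string( buf.data(), n ) );  /* got one datagram */
        } else if ( n == 0 ) {
            break;       /* should not happen on a tun fd */
        } else if ( errno == EAGAIN or errno == EWOULDBLOCK ) {
            break;       /* queue drained -- return to poll() */
        } else if ( errno == EINTR ) {
            continue;    /* interrupted by a signal -- retry */
        } else {
            throw std::runtime_error( std::string( "read from tun: " )
                                      + std::strerror( errno ) );
        }
    }
}

The idea is that this loop would replace the single read that currently happens per poll() wakeup, so a burst of datagrams queued behind the tun device gets consumed in one go instead of one datagram per event-loop iteration.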

I'm not sure looking at the loss percentage is the best way to analyze this -- at the start of a TCP connection, the window grows exponentially, and will quickly reach a point where packets have to get dropped because mm-delay just can't read them fast enough. Once that happens, though, TCP will cut the window in half and then will only grow the window linearly, so it can take a looooong time (like, minutes) for it to get back to the first high level and further losses to occur. So my guess is that "loss percentage" is a pretty noisy indicator -- you would have to average over a long time to make that number reliable.
