
Etherate not measuring latency #54

Open
aapocketz opened this issue Jul 16, 2018 · 4 comments
@aapocketz

ISSUE TYPE
  • Bug Report
  • Enhancement/Feature Request
  • Question
VERSION DETAILS
$uname -a
Linux 4.15.0-24-generic #26~16.04.1-Ubuntu SMP Fri Jun 15 14:35:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$cat /etc/redhat-release || lsb_release -a || cat /etc/issue
cat: /etc/redhat-release: No such file or directory
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.4 LTS
Release:	16.04
Codename:	xenial
$lspci | grep -i eth
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:ab:35:4c brd ff:ff:ff:ff:ff:ff

$sudo ./etherate -V
Etherate version 1.18 2018-06
$
$sudo ./etherate -l
Device lo, address 00:00:00:00:00:00, txqueuelen 1000, has interface index 1
Device enp0s3, address 08:00:27:ab:35:4c, txqueuelen 1000, has interface index 2

SUMMARY

I have two Ubuntu 16.04 VMs, each running Etherate, on a private network (vboxnet0). I use netem to simulate delay in Linux, and I expect to see a higher latency measurement corresponding to this delay. I observe no change in the latency measurement when running the test.

Looking at the source code and the FAQ about it (https://null.53bits.co.uk/index.php?page=etherate-faqs#how-does-etherate-measure-delay), the strategy appears to be to repeatedly measure the difference between transmission and receive times for pairs of transmitted frames. I can see how this might measure some sort of packet jitter, but I don't see how it's possible to measure latency in a unidirectional manner without a precisely synchronised clock shared between the systems (for example a GPSDO).

STEPS TO REPRODUCE

This is an example run using netem and a delay of 100ms

syncopated@ubuntu-vm:~/src/etherate$ ping 192.168.56.102
PING 192.168.56.102 (192.168.56.102) 56(84) bytes of data.
64 bytes from 192.168.56.102: icmp_seq=1 ttl=64 time=0.319 ms
64 bytes from 192.168.56.102: icmp_seq=2 ttl=64 time=0.587 ms
64 bytes from 192.168.56.102: icmp_seq=3 ttl=64 time=0.620 ms
64 bytes from 192.168.56.102: icmp_seq=4 ttl=64 time=0.621 ms
^C
--- 192.168.56.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3054ms
rtt min/avg/max/mdev = 0.319/0.536/0.621/0.129 ms
syncopated@ubuntu-vm:~/src/etherate$ sudo tc qdisc add dev enp0s3 root netem delay 100ms
syncopated@ubuntu-vm:~/src/etherate$ ping 192.168.56.102
PING 192.168.56.102 (192.168.56.102) 56(84) bytes of data.
64 bytes from 192.168.56.102: icmp_seq=1 ttl=64 time=100 ms
64 bytes from 192.168.56.102: icmp_seq=2 ttl=64 time=100 ms
64 bytes from 192.168.56.102: icmp_seq=3 ttl=64 time=100 ms
64 bytes from 192.168.56.102: icmp_seq=4 ttl=64 time=100 ms
^C
--- 192.168.56.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3006ms
rtt min/avg/max/mdev = 100.724/100.802/100.868/0.322 ms
syncopated@ubuntu-vm:~/src/etherate$ sudo ./etherate -r
Running in RX mode
Using device enp0s3 with address 08:00:27:ab:35:4c, interface index 2
Entering promiscuous mode
Source MAC 00:00:5e:00:00:02
Destination MAC 00:00:5e:00:00:01
Sending gratuitous broadcasts...
Done.

Waiting for settings from TX host
Settings have been synchronised

Calculating delay between hosts...
Tx to Rx delay calculated as 0.000006720 seconds

Starting test on Mon Jul 16 15:26:41 2018

Frame size is 1514 bytes
Seconds		Mbps Rx		MBs Rx		FrmRx/s		Frames Rx
1		429.18		51		35434		35434
2		515.51		112		42562		77996
3		423.41		163		34958		112954
4		501.13		222		41375		154329
5		508.24		283		41962		196291
6		482.71		340		39854		236145
7		483.79		398		39943		276088
8		462.64		453		38197		314285
9		530.35		517		43787		358072
10		526.57		579		43475		401547
11		498.13		639		41127		442674
12		542.45		703		44786		487460
13		519.00		765		42850		530310
14		544.63		830		44966		575276
15		523.60		893		43230		618506
16		409.34		941		33796		652302
17		398.86		989		32931		685233
18		527.85		1052		43581		728814
19		449.20		1105		37087		765901
20		431.50		1157		35626		801527
21		431.59		1208		35633		837160
22		453.36		1262		37431		874591
23		506.28		1323		41800		916391
24		515.00		1384		42520		958911
25		463.80		1439		38293		997204
26		488.97		1498		40371		1037575
27		513.78		1559		42419		1079994
28		517.00		1620		42685		1122679
29		531.18		1684		43856		1166535
30		484.49		1742		40001		1206536
Speed test complete
Test frames transmitted: 0
Test frames received: 1206536
Non test frames received: 1
In order test frames received: 636
Out of order test frames received early: 1205900
Out of order test frames received late: 0
Maximum speed during test: 544.63Mbps, 44966Fps
Average speed during test: 487.12Mbps, 40217Fps
Data received during test: 1742MBs
Ending test on Mon Jul 16 15:27:11 2018


@jwbensley jwbensley self-assigned this Aug 3, 2018
@jwbensley
Owner

Hi @aapocketz - thanks for reporting this issue, and I am very sorry for such a long delay in responding; I have been changing jobs so life is crazy right now :)

At first glance this looks like an issue with netem. I'm looking at this URL; please correct me if I am wrong, as I am not a netem expert. It seems that netem uses the queuing discipline layer of the network stack within the kernel to implement features like extra latency.

Example from that URL:

# tc qdisc add dev eth0 root netem delay 100ms

Etherate bypasses this part of the networking stack to reduce latency and improve efficiency (simply put, it's less code to run within the Kernel for each packet that is sent). This is the relevant section within the Etherate source:

https://github.com/jwbensley/Etherate/blob/master/defaults.c#L239-L249

If you are able to, could you comment out that section of code, recompile Etherate, and try again? Or I can add a CLI flag to disable this bypass technique if you like?
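For reference, here is a minimal sketch of the bypass technique on a Linux AF_PACKET socket. This is illustrative only, not the actual Etherate code (see the defaults.c link above for the real thing); the option shown is PACKET_QDISC_BYPASS, available since kernel 3.14:

/* Illustrative sketch: open a raw socket and ask the kernel to send
   frames past the qdisc layer. Not Etherate's actual code. */
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons() */
#include <sys/socket.h>
#include <linux/if_packet.h>  /* SOL_PACKET, PACKET_QDISC_BYPASS */
#include <linux/if_ether.h>   /* ETH_P_ALL */

int main(void) {
    int sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sock == -1) { perror("socket"); return 1; }

    /* With this option set, frames sent on the socket skip the qdisc
       layer entirely, so netem's delay/loss is never applied to them.
       Omitting this setsockopt() restores the normal qdisc path. */
    int bypass = 1;
    if (setsockopt(sock, SOL_PACKET, PACKET_QDISC_BYPASS,
                   &bypass, sizeof(bypass)) == -1)
        perror("setsockopt PACKET_QDISC_BYPASS");

    close(sock);
    return 0;
}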

@jwbensley
Owner

I forgot to mention the latency...

@aapocketz :

I could see how this could maybe measure some sort of packet jitter but I don't see how its possible to measure latency in a unidirectional manner without some sort of precisely synchronized clock between the systems (for example a GPSDO).

I mainly use Etherate in the lab, which is somewhere without Internet connectivity, so there is no accurate or guaranteed time source (e.g. NTP). Etherate has to use a technique that doesn't rely on NTP or PTP to calculate the one-way delay from the Tx to the Rx host. To calculate one-way delay there is no need for both hosts to have their time sources in sync with each other (which, in a lab with no Internet/NTP access, is vital!).

It works like this:

  • Tx sends its uptime value to Rx twice (two frames).
  • Rx records the time at which it receives each frame.
  • Rx subtracts the two time values of when it received the two Tx uptime frames; the difference between these two Rx times is the Tx application/processing time (the time it takes Tx to get its uptime, put that in a frame, send it down to the NIC for transmission, etc.) and the time it took the frame to travel across the network to Rx.
  • Rx subtracts the two Tx uptime values it received; the difference between these is the time it took Tx to get its uptime value, put that in a frame, and send it down the networking stack to the NIC. This is roughly the time Tx spent processing application/kernel code before the frames were transmitted across the network. This value is then subtracted from the value above, so that you are left with just the network delay from Tx to Rx, with no application processing time.

This provides a best-case figure for the network delay from Tx to Rx. The ping output you have provided also includes application processing time (ping gets the time values, puts them in a packet, gives it to the kernel to send, the kernel sends it down to the NIC, etc.). So ping is showing (in your output) an average application-level delay of 0.536ms and Etherate is showing a network-level one-way delay of 0.006ms.
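To make the arithmetic concrete, here is a minimal sketch of the calculation described above; the timestamps are made up and the variable names are illustrative, not Etherate's:

/* Sketch of the delay arithmetic described above.
   All values are invented and in seconds. */
#include <stdio.h>

int main(void) {
    /* Tx uptime values carried inside the two frames */
    double tx_uptime_1 = 100.000000000;
    double tx_uptime_2 = 100.000001500;

    /* Rx local clock readings when each frame arrived */
    double rx_time_1 = 55.000006800;
    double rx_time_2 = 55.000008420;

    /* Time Tx spent between preparing/sending frame 1 and frame 2 */
    double tx_delta = tx_uptime_2 - tx_uptime_1;

    /* Gap between the two arrivals as seen by Rx */
    double rx_delta = rx_time_2 - rx_time_1;

    /* Subtract the Tx processing time; the remainder is what is
       reported as the Tx-to-Rx network delay */
    printf("Estimated Tx to Rx delay: %.9f seconds\n", rx_delta - tx_delta);
    return 0;
}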

@aapocketz
Author

So I disabled the qdisc bypass, which certainly had an effect, as the test took a long time to complete when adding just 10ms of delay. Unfortunately the latency calculation doesn't reflect this increased amount of delay.

syncopated@ubuntu-vm:~/src/etherate$ sudo tc qdisc add dev enp0s3 root netem delay 10ms
syncopated@ubuntu-vm:~/src/etherate$ sudo ./etherate
Running in TX mode
Using device enp0s3 with address 08:00:27:ab:35:4c, interface index 2
Entering promiscuous mode
Source MAC 00:00:5e:00:00:01
Destination MAC 00:00:5e:00:00:02
Sending gratuitous broadcasts...
Done.

Synchronising settings with RX host
Settings have been synchronised

Calculating delay between hosts...
Tx to Rx delay calculated as 0.000000197 seconds

Starting test on Fri Aug  3 10:33:29 2018

Frame size is 1514 bytes
Seconds		Mbps Tx		MBs Tx		FrmTx/s		Frames Tx
Speed test Tx error : No buffer space available
Speed test complete
Test frames transmitted: 999
Test frames received: 0
Non test frames received: 0
In order ACK frames received: 0
Out of order ACK frames received early: 0
Out of order ACK frames received late: 0
Maximum speed during test: 0.00Mbps, 0Fps
Average speed during test: 0.00Mbps, 0Fps
Data transmitted during test: 1MBs
Ending test on Fri Aug  3 10:33:29 2018


Leaving promiscuous mode

I guess I am really having trouble buying that your algorithm is effective for calculating one-way latency.

Rx subtracts the two time values of when it received the two Tx uptime frames; the difference between these two Rx times is the Tx application/processing time (the time it takes Tx to get its uptime, put that in a frame, send it down to the NIC for transmission, etc.) and the time it took the frame to travel across the network to Rx.

I don't see how ΔRx includes the propagation delay, as these are frames that are sent back to back. As a thought experiment, imagine two supercomputers on either side of the galaxy. The sending computer can generate a frame and transmit it nearly instantly. Similarly, the receiving computer can receive and process a frame nearly instantly. The sender sends two frames back to back, so ΔTx = 0. It takes 100 million years for each message to propagate across the galaxy. The receiver receives both frames, back to back, so ΔRx = 0. The 100 million years isn't embedded anywhere in the variables you are measuring.

Here is an old post on Stack Exchange about a similar topic; the top answer seems to coincide with my reasoning:
https://cs.stackexchange.com/questions/602/measuring-one-way-network-latency
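To put the thought experiment in code: assuming a constant propagation delay D, D cancels out of the subtraction entirely (values are invented for illustration):

/* The thought experiment above: a constant propagation delay D
   cancels out of the subtraction, so it never appears in the result. */
#include <stdio.h>

int main(void) {
    double D = 3.15e15;             /* ~100 million years, in seconds */
    double tx_1 = 0.0, tx_2 = 0.0;  /* two frames sent back to back */
    double rx_1 = tx_1 + D;         /* each frame arrives D later */
    double rx_2 = tx_2 + D;

    /* (rx_2 - rx_1) - (tx_2 - tx_1) == 0, no matter how large D is */
    printf("Estimated delay: %.9f s (actual one-way delay: %.2e s)\n",
           (rx_2 - rx_1) - (tx_2 - tx_1), D);
    return 0;
}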

I mainly use Etherate in the lab, which is somewhere without Internet connectivity, so there is no accurate or guaranteed time source (e.g. NTP).

This isn't totally relevant, but there isn't really anything stopping you from setting up NTP in a lab with no Internet or GPS. Just make one computer use its own clock as the master time source and configure the other machines as clients that sync from it, as sketched below. That said, NTP is only so accurate, and its accuracy is itself affected by network latency, so it may be unsuitable for highly accurate latency measurements, at least in an environment with high latency.
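As a concrete (hypothetical) example, with classic ntpd the master's /etc/ntp.conf could serve the machine's own clock via the local clock driver:

# Lab "master": serve this machine's own clock at stratum 10, so any
# real upstream source would still be preferred if one ever appeared.
server 127.127.1.0
fudge 127.127.1.0 stratum 10

# On each client, point at the master's lab address, e.g. (address is
# made up for illustration):
# server 192.168.56.1 iburst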

@jwbensley
Owner

@aapocketz very quick reply - Etherate is not meant for use between VMs; it is meant for real physical interfaces. Depending on the type of virtual interface being used, all sorts of weird stuff can happen, so I will update the documentation to make this clearer.

As for the latency issue I haven't had time to look into this yet - just wanted to let you know I'm not ignoring you ;)
