Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket read error 104 occurred on TCP receiver running in a VM (Sender from a remote VM) #76

Open
wilsonNg007 opened this issue May 24, 2022 · 2 comments

Comments

@wilsonNg007
Copy link

wilsonNg007 commented May 24, 2022

Running Ntttcp (TCP sender) on a linux-based VM to another ntttcp (TCP receiver) on another linux-based VM on a remote host machine. The cmdlines are default.

After running for ~40mins, TCP receiver started error out with "Socket read error 104" (104 is Connection reset by peer). Then within 2 hrs, got 25 more socket read errors, and no traffic flowing from the sender. The sender did not output any error, so not sure why the sender stop transmitting becoz at least I can tell the 4 sockets (23,25,41,49) created on port 6051 are not getting any error, so that port 6051 should still be good at sending packets.

Both VMs continue to be functioning - meaning I can stream web traffic from a browser app running on VM.

This is just a basic default setup TCP un-directional scenario. I'm not sure if this is VM kernel issue or a Host kernel network issue. Is this something that has been seen before running on Azure VM?

---------------- Receiver ------------------
ntttcp -r -V --show-tcp-retrans --show-nic-packets wlan0 -p 6051 -t 86400 <
NTTTCP for Linux 1.4.0

*** receiver role
ports: 16
cpu affinity: *
server address: 0.0.0.0
domain: IPv4
protocol: TCP
server port starting at: 6051
receiver socket buffer (bytes): 65536
test warm-up (sec): no
test duration (sec): 86400
test cool-down (sec): no
show system tcp retransmit: yes
show packets for: wlan0
quiet mode: disabled
verbose mode: enabled

17:11:02 DBG : user limits for maximum number of open files: soft: 32768; hard: 32768
17:11:02 INFO: 17 threads created
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6051
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6050
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6066
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6065
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6064
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6063
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6062
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6061
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6060
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6059
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6058
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6057
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6056
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6055
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6054
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6053
17:11:02 DBG : ntttcp server is listening on 0.0.0.0:6052
17:11:05 DBG : Sync connection: 10.231.111.24:60992 --> local:6050 [socket 22]
17:11:05 DBG : New connection: 10.231.111.24:60946 --> local:6051 [socket 23]
17:11:05 DBG : New connection: 10.231.111.24:60908 --> local:6053 [socket 24]
17:11:05 DBG : New connection: 10.231.111.24:60950 --> local:6051 [socket 25]
17:11:05 DBG : New connection: 10.231.111.24:60886 --> local:6052 [socket 26]
17:11:05 DBG : New connection: 10.231.111.24:60888 --> local:6052 [socket 27]
17:11:05 DBG : New connection: 10.231.111.24:60914 --> local:6053 [socket 28]
17:11:05 DBG : New connection: 10.231.111.24:60794 --> local:6054 [socket 29]
17:11:05 DBG : New connection: 10.231.111.24:60798 --> local:6054 [socket 30]
17:11:05 DBG : New connection: 10.231.111.24:60992 --> local:6055 [socket 31]
17:11:05 DBG : New connection: 10.231.111.24:60988 --> local:6055 [socket 32]
17:11:05 DBG : New connection: 10.231.111.24:60858 --> local:6056 [socket 33]
17:11:05 DBG : New connection: 10.231.111.24:60756 --> local:6057 [socket 34]
17:11:05 DBG : New connection: 10.231.111.24:60762 --> local:6057 [socket 35]
17:11:05 DBG : New connection: 10.231.111.24:60862 --> local:6056 [socket 36]
17:11:05 DBG : New connection: 10.231.111.24:60916 --> local:6058 [socket 37]
17:11:05 DBG : New connection: 10.231.111.24:60912 --> local:6058 [socket 38]
17:11:05 DBG : New connection: 10.231.111.24:60938 --> local:6059 [socket 39]
17:11:05 DBG : New connection: 10.231.111.24:60890 --> local:6052 [socket 40]
17:11:05 DBG : New connection: 10.231.111.24:60948 --> local:6051 [socket 41]
17:11:05 DBG : New connection: 10.231.111.24:60710 --> local:6060 [socket 42]
17:11:05 DBG : New connection: 10.231.111.24:60758 --> local:6057 [socket 48]
17:11:05 DBG : New connection: 10.231.111.24:60864 --> local:6056 [socket 51]
17:11:05 DBG : New connection: 10.231.111.24:60944 --> local:6051 [socket 49]
17:11:05 DBG : New connection: 10.231.111.24:60910 --> local:6053 [socket 47]
17:11:05 DBG : New connection: 10.231.111.24:60716 --> local:6060 [socket 50]
17:11:05 DBG : New connection: 10.231.111.24:60994 --> local:6055 [socket 45]
17:11:05 DBG : New connection: 10.231.111.24:60942 --> local:6059 [socket 43]
17:11:05 DBG : New connection: 10.231.111.24:60800 --> local:6054 [socket 44]
17:11:05 DBG : New connection: 10.231.111.24:60910 --> local:6063 [socket 55]
17:11:05 DBG : New connection: 10.231.111.24:60884 --> local:6052 [socket 46]
17:11:05 DBG : New connection: 10.231.111.24:60912 --> local:6063 [socket 59]
17:11:05 DBG : New connection: 10.231.111.24:60918 --> local:6058 [socket 54]
17:11:05 DBG : New connection: 10.231.111.24:60914 --> local:6058 [socket 61]
17:11:05 DBG : New connection: 10.231.111.24:60860 --> local:6056 [socket 52]
17:11:05 DBG : New connection: 10.231.111.24:60728 --> local:6061 [socket 53]
17:11:05 DBG : New connection: 10.231.111.24:60700 --> local:6064 [socket 58]
17:11:05 DBG : New connection: 10.231.111.24:60712 --> local:6060 [socket 57]
17:11:05 DBG : New connection: 10.231.111.24:60908 --> local:6063 [socket 60]
17:11:05 DBG : New connection: 10.231.111.24:60826 --> local:6062 [socket 56]
17:11:05 DBG : New connection: 10.231.111.24:60796 --> local:6054 [socket 72]
17:11:05 DBG : New connection: 10.231.111.24:60990 --> local:6055 [socket 62]
17:11:05 DBG : New connection: 10.231.111.24:60732 --> local:6061 [socket 68]
17:11:05 DBG : New connection: 10.231.111.24:60702 --> local:6064 [socket 70]
17:11:05 DBG : New connection: 10.231.111.24:60730 --> local:6061 [socket 73]
17:11:05 DBG : New connection: 10.231.111.24:60824 --> local:6062 [socket 67]
17:11:05 DBG : New connection: 10.231.111.24:60714 --> local:6060 [socket 65]
17:11:05 DBG : New connection: 10.231.111.24:60822 --> local:6062 [socket 75]
17:11:05 DBG : New connection: 10.231.111.24:60768 --> local:6065 [socket 66]
17:11:05 DBG : New connection: 10.231.111.24:60916 --> local:6063 [socket 69]
17:11:05 DBG : New connection: 10.231.111.24:60944 --> local:6059 [socket 63]
17:11:05 DBG : New connection: 10.231.111.24:60912 --> local:6053 [socket 71]
17:11:05 DBG : New connection: 10.231.111.24:60998 --> local:6064 [socket 74]
17:11:05 DBG : New connection: 10.231.111.24:60760 --> local:6057 [socket 64]
17:11:05 DBG : New connection: 10.231.111.24:60704 --> local:6064 [socket 79]
17:11:05 DBG : New connection: 10.231.111.24:60834 --> local:6062 [socket 76]
17:11:05 DBG : New connection: 10.231.111.24:60734 --> local:6061 [socket 77]
17:11:05 DBG : New connection: 10.231.111.24:60940 --> local:6059 [socket 78]
17:11:05 DBG : New connection: 10.231.111.24:60770 --> local:6065 [socket 80]
17:11:05 DBG : New connection: 10.231.111.24:60766 --> local:6065 [socket 81]
17:11:05 DBG : New connection: 10.231.111.24:60846 --> local:6066 [socket 82]
17:11:05 DBG : New connection: 10.231.111.24:60840 --> local:6066 [socket 83]
17:11:05 DBG : New connection: 10.231.111.24:60792 --> local:6065 [socket 84]
17:11:05 DBG : New connection: 10.231.111.24:60844 --> local:6066 [socket 85]
17:11:05 DBG : New connection: 10.231.111.24:60842 --> local:6066 [socket 86]
17:11:05 INFO: Network activity progressing...
socket read error: 104466.65Kbps
17:53:07 DBG : socket closed: 36
socket read error: 104
17:53:07 DBG : socket closed: 29
Real-time throughput: 0.00bps
socket read error: 1040.00bps
socket read error: 104
socket read error: 104
18:59:39 DBG : socket closed: 86
18:59:39 DBG : socket closed: 70
18:59:39 DBG : socket closed: 59
socket read error: 104
18:59:39 DBG : socket closed: 83
socket read error: 104
socket read error: 104
18:59:39 DBG : socket closed: 58
18:59:39 DBG : socket closed: 62
socket read error: 1040.00bps
19:04:41 DBG : socket closed: 50
socket read error: 104
19:04:41 DBG : socket closed: 79
socket read error: 104
19:04:41 DBG : socket closed: 40
socket read error: 104
socket read error: 104
socket read error: 104
19:04:41 DBG : socket closed: 32
19:04:41 DBG : socket closed: 81
19:04:41 DBG : socket closed: 28
socket read error: 104
19:04:41 DBG : socket closed: 57
socket read error: 1040.00bps
19:04:43 DBG : socket closed: 38
socket read error: 104
19:04:43 DBG : socket closed: 72
socket read error: 104
19:04:43 DBG : socket closed: 61
socket read error: 104
socket read error: 104
19:04:43 DBG : socket closed: 80
19:04:43 DBG : socket closed: 37
socket read error: 104
19:04:43 DBG : socket closed: 77
socket read error: 104
19:04:43 DBG : socket closed: 53
socket read error: 1040.00bps
19:11:51 DBG : socket closed: 47
socket read error: 104
19:11:51 DBG : socket closed: 24
socket read error: 104
socket read error: 104
19:11:51 DBG : socket closed: 85
19:11:51 DBG : socket closed: 69
socket read error: 104
19:11:51 DBG : socket closed: 82
Real-time throughput: 0.00bps

-------------------Sender --------------------
ntttcp -s 10.231.111.166 -V --show-tcp-retrans --show-nic-packets wlan0 -p 6051 -t 86400 <
NTTTCP for Linux 1.4.0

*** sender role
connections: 16 X 4 X 1
cpu affinity: *
server address: 10.231.111.166
domain: IPv4
protocol: TCP
server port starting at: 6051
sender socket buffer (bytes): 131072
test warm-up (sec): no
test duration (sec): 86400
test cool-down (sec): no
show system tcp retransmit: yes
show packets for: wlan0
quiet mode: disabled
verbose mode: enabled

17:11:07 DBG : user limits for maximum number of open files: soft: 32768; hard: 32768
17:11:07 DBG : Sync connection: local:60992 [socket:3] --> 10.231.111.166:6050
17:11:07 DBG : New connection: local:60946 [socket:5] --> 10.231.111.166:6051
17:11:07 DBG : New connection: local:60950 [socket:7] --> 10.231.111.166:6051
17:11:07 DBG : New connection: local:60908 [socket:13] --> 10.231.111.166:6053
17:11:07 DBG : New connection: local:60886 [socket:9] --> 10.231.111.166:6052
17:11:07 DBG : New connection: local:60888 [socket:10] --> 10.231.111.166:6052
17:11:07 DBG : New connection: local:60914 [socket:16] --> 10.231.111.166:6053
17:11:07 DBG : New connection: local:60794 [socket:17] --> 10.231.111.166:6054
17:11:07 DBG : New connection: local:60798 [socket:19] --> 10.231.111.166:6054
17:11:07 DBG : New connection: local:60992 [socket:22] --> 10.231.111.166:6055
17:11:07 DBG : New connection: local:60988 [socket:21] --> 10.231.111.166:6055
17:11:07 DBG : New connection: local:60858 [socket:25] --> 10.231.111.166:6056
17:11:07 DBG : New connection: local:60756 [socket:29] --> 10.231.111.166:6057
17:11:07 DBG : New connection: local:60762 [socket:32] --> 10.231.111.166:6057
17:11:07 DBG : New connection: local:60862 [socket:27] --> 10.231.111.166:6056
17:11:07 DBG : New connection: local:60912 [socket:33] --> 10.231.111.166:6058
17:11:07 DBG : New connection: local:60916 [socket:35] --> 10.231.111.166:6058
17:11:07 DBG : New connection: local:60938 [socket:37] --> 10.231.111.166:6059
17:11:07 DBG : New connection: local:60942 [socket:39] --> 10.231.111.166:6059
17:11:07 DBG : New connection: local:60948 [socket:6] --> 10.231.111.166:6051
17:11:07 DBG : New connection: local:60944 [socket:4] --> 10.231.111.166:6051
17:11:07 DBG : New connection: local:60890 [socket:12] --> 10.231.111.166:6052
17:11:07 DBG : New connection: local:60910 [socket:14] --> 10.231.111.166:6053
17:11:07 DBG : New connection: local:60884 [socket:8] --> 10.231.111.166:6052
17:11:07 DBG : New connection: local:60944 [socket:40] --> 10.231.111.166:6059
17:11:07 DBG : New connection: local:60800 [socket:20] --> 10.231.111.166:6054
17:11:07 DBG : New connection: local:60710 [socket:41] --> 10.231.111.166:6060
17:11:07 DBG : New connection: local:60712 [socket:42] --> 10.231.111.166:6060
17:11:07 DBG : New connection: local:60994 [socket:24] --> 10.231.111.166:6055
17:11:07 DBG : New connection: local:60912 [socket:15] --> 10.231.111.166:6053
17:11:07 DBG : New connection: local:60990 [socket:23] --> 10.231.111.166:6055
17:11:07 DBG : New connection: local:60864 [socket:28] --> 10.231.111.166:6056
17:11:07 DBG : New connection: local:60716 [socket:44] --> 10.231.111.166:6060
17:11:07 DBG : New connection: local:60758 [socket:30] --> 10.231.111.166:6057
17:11:07 DBG : New connection: local:60796 [socket:18] --> 10.231.111.166:6054
17:11:07 DBG : New connection: local:60730 [socket:46] --> 10.231.111.166:6061
17:11:07 DBG : New connection: local:60860 [socket:26] --> 10.231.111.166:6056
17:11:07 DBG : New connection: local:60940 [socket:38] --> 10.231.111.166:6059
17:11:07 DBG : New connection: local:60826 [socket:52] --> 10.231.111.166:6062
17:11:07 DBG : New connection: local:60714 [socket:43] --> 10.231.111.166:6060
17:11:07 DBG : New connection: local:60728 [socket:45] --> 10.231.111.166:6061
17:11:07 INFO: 64 threads created
17:11:07 DBG : New connection: local:60732 [socket:47] --> 10.231.111.166:6061
17:11:07 DBG : New connection: local:60822 [socket:49] --> 10.231.111.166:6062
17:11:07 DBG : New connection: local:60910 [socket:53] --> 10.231.111.166:6063
17:11:07 DBG : New connection: local:60760 [socket:31] --> 10.231.111.166:6057
17:11:07 DBG : New connection: local:60734 [socket:48] --> 10.231.111.166:6061
17:11:07 DBG : New connection: local:60918 [socket:36] --> 10.231.111.166:6058
17:11:07 DBG : New connection: local:60844 [socket:66] --> 10.231.111.166:6066
17:11:07 DBG : New connection: local:60914 [socket:34] --> 10.231.111.166:6058
17:11:07 DBG : New connection: local:60824 [socket:50] --> 10.231.111.166:6062
17:11:07 DBG : New connection: local:60792 [socket:64] --> 10.231.111.166:6065
17:11:07 DBG : New connection: local:60834 [socket:54] --> 10.231.111.166:6062
17:11:07 DBG : New connection: local:60700 [socket:58] --> 10.231.111.166:6064
17:11:07 DBG : New connection: local:60912 [socket:55] --> 10.231.111.166:6063
17:11:07 DBG : New connection: local:60908 [socket:51] --> 10.231.111.166:6063
17:11:07 DBG : New connection: local:60916 [socket:56] --> 10.231.111.166:6063
17:11:07 DBG : New connection: local:60702 [socket:59] --> 10.231.111.166:6064
17:11:07 DBG : New connection: local:60998 [socket:57] --> 10.231.111.166:6064
17:11:07 DBG : New connection: local:60770 [socket:63] --> 10.231.111.166:6065
17:11:07 DBG : New connection: local:60704 [socket:60] --> 10.231.111.166:6064
17:11:07 DBG : New connection: local:60766 [socket:61] --> 10.231.111.166:6065
17:11:07 DBG : New connection: local:60846 [socket:67] --> 10.231.111.166:6066
17:11:07 DBG : New connection: local:60840 [socket:65] --> 10.231.111.166:6066
17:11:07 DBG : New connection: local:60768 [socket:62] --> 10.231.111.166:6065
17:11:07 DBG : New connection: local:60842 [socket:68] --> 10.231.111.166:6066
17:11:07 INFO: 64 connections created in 103018 microseconds
17:11:07 INFO: Network activity progressing...
Real-time throughput: 0.00bps

@wilsonNg007
Copy link
Author

Looking at tcpstream.c, the "socket read error" is returned from recv API in n_recv function.

/proc/net/dev does not report any error or dropped packets from VM kernel (on both sender and receiver).

The sender is the one that is not explained in terms of if there is any problem sending packets or not. I noticed the n_send function commented out the "socket write error" printf if getting error from send API. @simonxiaoss, do you know why this is commented out? That should be uncommented by default; otherwise, it's limiting our debug-ability as it's present in this case.

@wilsonNg007
Copy link
Author

wilsonNg007 commented May 25, 2022

I reduced the # of sender threads from 64 to 16 (-n 1). I still noticed a few socket read errors, and occasionally sender would just stop for a while (can be longer than 5-10 mins or so) not sending any packet out (monitored via /proc/net/dev), or the sender would just transmit just a very few 10s of bytes out (vs. ~700+Mbps normally), but then resume afterwards. So it's better than original setup (where the sender & receiver just stopped totally after 2hrs).

Updated: Checked the next day, and both sender/receiver stopped, more socket read errors total 15.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant