Observing significant delays (>30 seconds) on "handover" to subflow #554
Comments
This configuration is a bit self-inconsistent: if you want a 'fullmesh' topology, you should also increase the subflows limit accordingly.
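For illustration (a sketch, not from the thread): with two fullmesh endpoints on the client and two server addresses (the initial one plus one announced), a full mesh would mean up to three additional subflows on top of the initial one, so the client-side limit would need to be raised accordingly, e.g.:

ip mptcp limits set subflows 3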
Do you always observe this sequence of events? I suspect you should also see SF_ESTABLISHED with different address pairs. In any case, could you please share a pcap trace for the whole connection with a long handover time? Thanks
OK, I wasn't sure if I understood the options correctly; "fullmesh" was just a left-over from some earlier tests. I removed it again, so now I have the following configuration:
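Presumably this is the original configuration with only the fullmesh flag dropped, i.e. something like (a reconstruction, not the exact output):

server:
add_addr_accepted 0 subflows 1
10.100.4.129 id 1 signal dev media1
192.168.42.129 id 2 signal dev media2

client:
add_addr_accepted 2 subflows 2
192.168.42.130 id 1 subflow dev media2
10.100.4.130 id 2 subflow dev media1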
From my understanding, the "fullmesh" option didn't make sense anyhow, because both paths, while on the same switch, are in different subnets without any routing between them. IMO this also explains the output I see with ip mptcp monitor. Anyhow, with these corrected settings I still observe the same behaviour. I attached a Wireshark trace of a "bad handover" case. It was taken from a mirror port of my switch, mirroring both ports connected to the interfaces of the "server" device. You can see that there is a huge gap between packet No. 2087 and packet No. 4044 without any MPTCP communication. Some more information about our system that might be important:
OK, regarding the last bullet point of my previous comment: I also tested the exact same setup between two other embedded devices that do not have this switch and obtained the same results, so the switch does not seem to have any influence.
I now also tested the same setup, but defined the "secondary" endpoint (the link with the 192.168.* addresses) as "backup".
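On the client, that would correspond to something like the following (a sketch, not the exact commands used; the endpoint id is illustrative):

ip mptcp endpoint delete id 1
ip mptcp endpoint add 192.168.42.130 dev media2 id 1 subflow backup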
The behaviour is different in this case: I get this "stuck" connection every time I remove the 10.100.4.* link. Additionally, it looks like one or two single packets come through every few seconds before the connection recovers over the backup path after ~10 to 30 seconds.
OK, I think I have some clue about what is happening. But because I only want two paths established, as far as I understand I can simply reduce the limits (see the sketch below). If I reduce the subflows limit from 2 to 1, I can no longer observe the issue on my test setup. In the Wireshark trace I attached in my previous post you can see that there are TCP SYN retransmissions for a connection 192.168.42.130 <--> 10.100.4.129. This connection is not possible because there is no routing between these networks, but the retransmissions go on for a long time (at least 40 seconds). Could it be that this unestablished path disturbs the scheduler or path manager in some way?
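Presumably that means lowering the client-side limit along these lines (a sketch):

ip mptcp limits set subflows 1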
Hi @FrankLorenz, Thank you for your different replies! Just to make sure I understand your issue, is the following correct?
The behaviour of the (default) in-kernel path-manager is described on our website. In short for your case:
Some notes:
Does this help to better understand the PM and fix your issue?
Hi @matttbe, thanks for the detailed reply.
Yes, correct. Our goal is to have a redundant connection between the devices over two completely independent networks. I removed the second endpoint on the client side and this seems to solve the issue. After a connection is established, I now see an "implicit" endpoint. When I now disconnect the primary link, the handover happens in less than a second, which is fine. I did read the docs for the Path Manager before doing my tests, but I misinterpreted the statement:
I interpreted this as "It is necessary to set an 'ip mptcp endpoint [...] subflow' on the secondary interface of the client to enable this interface for MPTCP paths/subflows in general". The term "subflow" is, IMO, a bit ambiguous throughout the documentation, because e.g. in the top picture of the Path Manager doc the "primary path" is also called "initial subflow", while this "initial subflow" does not count towards the limits you set with ip mptcp limits. For me the issue is solved and IMO you can close the ticket, but it would be great if you could answer two questions, one about the packet schedulers and one about re-using announced addresses, for which I did not find reliable information:
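For reference, the client side after the endpoint removal described above would presumably look something like this (a sketch: which endpoint id was removed, and the ids in the listing, are assumptions, based on the unroutable SYNs coming from the 192.168.* subflow endpoint):

ip mptcp endpoint delete id 1

# after the next connection is established, the locally used address without a
# configured endpoint shows up with the "implicit" flag:
ip mptcp endpoint show
10.100.4.130 id 2 subflow dev media1
192.168.42.130 id 3 implicit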
The 'subflow' endpoint description might be confusing: is it needed to be specified to create subflows to additional IPs announced by the other peer? Mentioning that it is only used to create subflows to the other peer's IP address should help clarify this.
Link: multipath-tcp/mptcp_net-next#554
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Good news! For your case, I guess a feature like #503 (not implemented yet) would be useful: you could easily tell that the client's second address can be used to create additional subflows with additional addresses from the server.
OK, indeed, that's not very clear. This behaviour is explained in more detail a bit below. We can always improve the doc. Do you think the following modification would help? multipath-tcp/mptcp.dev#51
OK. Do you think the other modification from multipath-tcp/mptcp.dev#51 can help?
Only one scheduler for the moment. We are working on allowing the creation of additional schedulers in BPF (#75), but this is not ready yet. Note that it is technically possible to have a redundant packet scheduler like we had in the old fork, but both implementations would suffer from some protocol limitations that might make this feature less interesting, see here. But I can understand that MPTCP might be easy to deploy, and the limitation might not be a problem in some cases.
Mmh, currently the in-kernel PM doesn't explicitly remember the addresses announced by the other peer. If, when an address is announced, the peer cannot do anything with it, it will not be used later on. I guess that's what you have here, right? A userspace daemon could support that. Or the in-kernel PM could be modified to support that (linked to #496). If all the questions have been answered, please close this ticket.
Hi Matt, my questions are answered and therefore I am closing this ticket.
This is also similar to an issue I had a year ago (in 2024), when I was doing some minimal MPTCP research for my thesis. IIUC, in enterprise DCNs it is perfectly fine and expected for most subnets to have no routes between them or to be firewalled off from each other (especially with the Zero Trust security model). So this scenario seems like a very fair request to me. My support here. :)
PS, @FrankLorenz: you mentioned |
Pre-requisites
What did you do?
I am currently evaluating MPTCP for redundancy usage. At the moment, we are using the old "out-of-tree" MPTCP with the redundant scheduler to provide redundant network connectivity via two separate networks. Because we are now migrating to a newer kernel (6.6), we also need to migrate to the new in-kernel MPTCP.
I have a simple setup where two of our devices are connected with two network interfaces each to a switch, each interface in a different subnet. I use a simple test application where one device (client) sends a 200-byte packet every 100 ms to the other device (server), which echoes it back.
server client
---------------------------------------
10.100.4.129 <-> 10.100.4.130
192.168.42.129 <-> 192.168.42.130
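The test application itself is not shown here; an equivalent test could be sketched with standard tools forced onto an MPTCP socket via mptcpize (a sketch only, not the actual application; port and payload are illustrative):

# server: simple TCP echo, forced to use MPTCP via LD_PRELOAD
mptcpize run socat TCP-LISTEN:40675,fork,reuseaddr EXEC:cat

# client: send ~200 bytes every 100 ms over MPTCP
while true; do head -c 200 /dev/urandom; sleep 0.1; done | mptcpize run socat - TCP:10.100.4.129:40675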
The MPTCP configuration, set with ip mptcp, is:
server:
add_addr_accepted 0 subflows 1
10.100.4.129 id 1 signal dev media1
192.168.42.129 id 2 signal dev media2
client:
add_addr_accepted 2 subflows 2
192.168.42.130 id 1 subflow fullmesh dev media2
10.100.4.130 id 2 subflow fullmesh dev media1
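For reference, this configuration corresponds to commands along these lines (a sketch):

# server
ip mptcp limits set add_addr_accepted 0 subflows 1
ip mptcp endpoint add 10.100.4.129 dev media1 id 1 signal
ip mptcp endpoint add 192.168.42.129 dev media2 id 2 signal

# client
ip mptcp limits set add_addr_accepted 2 subflows 2
ip mptcp endpoint add 192.168.42.130 dev media2 id 1 subflow fullmesh
ip mptcp endpoint add 10.100.4.130 dev media1 id 2 subflow fullmesh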
When starting the application, I can see via ip mptcp monitor on the server that the subflow is established:
[LISTENER_CREATED] saddr4=0.0.0.0 sport=40675
[ CREATED] token=03d621d1 remid=0 locid=0 saddr4=10.100.4.129 daddr4=10.100.4.130 sport=40675 dport=45436
[ ESTABLISHED] token=03d621d1 remid=0 locid=0 saddr4=10.100.4.129 daddr4=10.100.4.130 sport=40675 dport=45436
[ SF_ESTABLISHED] token=03d621d1 remid=1 locid=2 saddr4=192.168.42.129 daddr4=192.168.42.130 sport=40675 dport=41095 backup=0
What happened?
What I observe is that when I disconnect the network cable of the "primary" link on the client (10.100.4.130), it takes a random amount of time until the communication gets up and running again on the subflow (192.168.42.130). Sometimes it is quite smooth, within less than a second, but I have also observed cases where it took nearly 60 seconds until it worked again.
When I re-plug the cable, communication continues without any delay.
Is this expected behaviour? My assumption was that the scheduler and path manager would detect the broken connection in a short amount of time (the RTO of a normal TCP connection is around 200 ms according to the "ss" command, but I cannot find these measures for MPTCP connections), so to me these randomly long delays look like some misconfiguration or a bug.
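For reference, per-subflow TCP metrics such as the rto can usually be read off the individual TCP subflows, and the MPTCP-level socket can be listed with ss -M (a sketch; the port filter is illustrative):

# MPTCP-level sockets
ss -nMi

# the underlying TCP subflows carry the usual per-connection TCP info, including rto
ss -nti 'sport = :40675 or dport = :40675'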
What did you expect to have?
My expectation would be to have a more or less seamless handover to the remaining network path in less than a second.
System info: Client
System info: Server
Additional context
No response