-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] srt-live-transmit Fails to Recover from Malformed NAK Packet #3088
Comments
I was able to fix the issue by adding the following two lines to the updateBrokenConnection();
completeBrokenConnectionDependencies(SRT_ECONNFAIL); // LOCKS! Original Code:Before the modification, the code looked like this: if (!secure)
{
LOGC(inlog.Warn,
log << CONID() << "out-of-band LOSSREPORT received; BUG or ATTACK - last sent %" << m_iSndCurrSeqNo
<< " vs loss %" << wrong_loss);
// this should not happen: attack or bug
m_bBroken = true;
m_iBrokenCounter = 0;
return;
} Updated Code:After applying the fix, the updated code is as follows: if (!secure)
{
LOGC(inlog.Warn,
log << CONID() << "out-of-band LOSSREPORT received; BUG or ATTACK - last sent %" << m_iSndCurrSeqNo
<< " vs loss %" << wrong_loss);
// this should not happen: attack or bug
m_bBroken = true;
m_iBrokenCounter = 0;
updateBrokenConnection();
completeBrokenConnectionDependencies(SRT_ECONNFAIL); // LOCKS!
return;
} Does this seem like a reasonable fix, or could this change introduce any unintended side effects? |
Furthermore, I found similar code in the // Check the validation of the ack
if (CSeqNo::seqcmp(ackdata_seqno, CSeqNo::incseq(m_iSndCurrSeqNo)) > 0)
{
// this should not happen: attack or bug
LOGC(gglog.Error,
log << CONID() << "ATTACK/IPE: incoming ack seq " << ackdata_seqno << " exceeds current "
<< m_iSndCurrSeqNo << " by " << (CSeqNo::seqoff(m_iSndCurrSeqNo, ackdata_seqno) - 1) << "!");
m_bBroken = true;
m_iBrokenCounter = 0;
return;
} I haven’t attempted to reproduce this issue, but I suspect it might be vulnerable to a similar type of malformed packet. |
These two calls are prone to introduce deadlocks, hence one needs to be extremely careful with this. Would you be able to compile it using thread sanitizer and try to reproduce the problem on that fixed version and see what problems it reports? Note that some of the reports will come, but don't worry, many are worked around and or are false-positives. NOTE: |
Here are the results when running with Listener:
Caller:
|
Many thanks! I see no dangers introduced by this fix; we may need to test it also more intensively, but so far it looks good. @maxsharabayko please take a look. |
Update: I have been able to recreate this issue using a malformed ACK. As mentioned in my comment previously.
I updates the proxy to support this (see revision 2): Link to Proxy Script. The same replication steps apply other than changing the startup command of the proxy to: Unfortunately, Applying the same fix as in the NAK case seem to cause a potential deadlock according to the thread sanitizer. ThreadSanitizer output:
I do not know how to continue the investigation. |
Ok, that doesn't sound good. Could you please describe the exact reproduction procedure, using your helper script? (I would also save this script in our repository, if you don't mind - bug detection helper scripts are never too many.) I'll be investigating it myself. |
You are allowed to save the script in the repository. Steps to Reproduce ACK
After starting all programs, wait for a few seconds (as the proxy in this configuration only drops the 500th ACK) and observe that the system breaks. When the proxy drops the ACK it will print to standard out. Running an unpatched version of srt-live-transmitt, the connection will break and the caller is not able to reconnect (as the listener does not know the connection is broken). If one applies the following patch: if (CSeqNo::seqcmp(ackdata_seqno, CSeqNo::incseq(m_iSndCurrSeqNo)) > 0)
{
// this should not happen: attack or bug
LOGC(gglog.Error,
log << CONID() << "ATTACK/IPE: incoming ack seq " << ackdata_seqno << " exceeds current "
<< m_iSndCurrSeqNo << " by " << (CSeqNo::seqoff(m_iSndCurrSeqNo, ackdata_seqno) - 1) << "!");
m_bBroken = true;
m_iBrokenCounter = 0;
updateBrokenConnection();
completeBrokenConnectionDependencies(SRT_ECONNFAIL); // LOCKS!
return;
} Adding the The behavior changes to a deadlock. Please add a comment if anything is unclear or does not work on your machine. |
Both these functions you call here do locking. What you can try is that the lock on That's my first thought, I'll still be trying to repro it myself. |
Ok, after reproduction, I can see something weird here: after the rogue ACK has come, all further ACKs are rejected. The transmission is no longer possible, but the connection doesn't get broken. The fix you proposed simply breaks the connection (the rejection code is wrong, but I'll choose the right one). I'm not sure if that is exctly what should happen, though. I need to dig a bit deeper. |
... and yes, when applying the fix as you have shown, with my correction for the forcefully unlocked mutex, seems to work at least as expected - that is, the connection breaks and gets reconnected. We just need to check several other things, but your proposed fix seems to be the right approach. |
Please check on the attached PR. I have checked it myself for both solving the problem and lack of any disturbing thread sanity reports. |
Issue: srt-live-transmit Fails to Recover from Malformed NAK Packet
Describe the Bug
The
srt-live-transmit
application fails to recover when it receives a malformed NAK (Negative Acknowledgment) packet with the "Up to sequence number" field in the Loss List set to-1
(0xFFFFFFFF). Once this state is triggered, the listener enters an error state where all received bytes are dropped. Additionally, the caller cannot reconnect, even after a restart, unless the listener is manually restarted.Packet Structure:
Below is the structure of the relevant SRT header and Loss List (NAK packet):
Example Packets:
Steps to Reproduce
Use a two-way UDP proxy to drop three consecutive packets and modify the NAK packet on its return path.
UDP proxy script: Link to Proxy Script.
Run the following setup:
Listener (Source):
./srt-live-transmit udp://127.0.0.1:1111 "srt://127.0.0.1:3333?adapter=127.0.0.1&latency=120&maxbw=-1&mode=listener"
Caller (Target):
./srt-live-transmit "srt://127.0.0.1:2222?latency=120&conntimeo=31393&mode=caller" udp://127.0.0.1:4444
UDP Proxy:
UDP Stream Input:
(Optional) UDP Stream Monitor:
Observe logs on the listener when the malformed NAK packet is received.
Listener Logs Example:
The listener reports repeated byte losses and fails to recover:
Expected Behavior
The
srt-live-transmit
listener should handle malformed NAK packets gracefully, recover from the error state, and allow new callers to connect without requiring a restart.System Information
The text was updated successfully, but these errors were encountered: