Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept use-candidate for ice-lite #739

Merged
merged 1 commit into from
Oct 31, 2024
Merged

Conversation

boks1971
Copy link
Contributor

@boks1971 boks1971 commented Oct 30, 2024

There could be a mismatch between the two ends in candidate priority
when using peer reflexive. It happens in the following scenario

  1. Client has two srflx candidates.
    a. The first one gets discovered by LiveKit server as prflx.
    b. The second one gets added via ice-trickle first and then
    gets a STUN ping. So, it is srflx remote candidate from
    server's point-of-view.
  2. This leads to a priority issue.
    a. Both candidates have same priority from client's point-of-view
    (both are srflx).
    b. But, from server's point-of-view, the first candidate has
    higher priority (prflx).
  3. The first candidate establishes connectivity and becomes
    the selected pair (client is ICE controlling and server is
    ICE controlled, server is in ICE lite).
  4. libwebrtc does a sort and switch some time later based on RTT.
    As client side has both at same priority, RTT based sorting
    could make the second candidate the preferred one.
    So, the client sends useCandidate=1 for the second candidate.
    pion/ice does not switch because the selected pair is at
    higher priority due to prflx candidate.
  5. STUN pings do not happen and the ICE connection eventually fails.

ICE controlled agent should accept use-candidate unconditionally if
it is an ICE lite agent.
Just in case existing behaviour is needed, it can be configured
using EnableUseCandidateCheckPriority.

NOTE: With aggressive nomination, the selected pair could change
a few times, but should eventually settle on what the controlling
side wants.

Copy link

codecov bot commented Oct 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.61%. Comparing base (166b1b7) to head (15ab10a).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #739      +/-   ##
==========================================
+ Coverage   78.53%   78.61%   +0.07%     
==========================================
  Files          41       41              
  Lines        4762     4764       +2     
==========================================
+ Hits         3740     3745       +5     
+ Misses        788      785       -3     
  Partials      234      234              
Flag Coverage Δ
go 78.61% <100.00%> (+0.07%) ⬆️
wasm 22.49% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nils-ohlmeier
Copy link

Interesting scenario.
I assume 1.a gets added because the client uses it's host candidate to send a binding request. And 1.b takes longer because of the additional round trip time for the STUN checks to discovers it's srflx and transmitting it to the server.

The part which concerns me here is the proposed solution: I don't think that the right solution here should be to disable peer reflexive altogether. This will slow down ICE connectivity in certain cases.
In 3. you say that Pion in ICE controlled (because it is using ICE lite). So when the ICE controller sends a USE_CANDIDATE Pion needs to follow that. Priorities should only matter for the order in which candidates get tried. The controlled can not use priorities to ignore a USE_CANDIDATE from the controller.

@boks1971
Copy link
Contributor Author

Interesting scenario. I assume 1.a gets added because the client uses it's host candidate to send a binding request. And 1.b takes longer because of the additional round trip time for the STUN checks to discovers it's srflx and transmitting it to the server.

The part which concerns me here is the proposed solution: I don't think that the right solution here should be to disable peer reflexive altogether. This will slow down ICE connectivity in certain cases. In 3. you say that Pion in ICE controlled (because it is using ICE lite). So when the ICE controller sends a USE_CANDIDATE Pion needs to follow that. Priorities should only matter for the order in which candidates get tried. The controlled can not use priorities to ignore a USE_CANDIDATE from the controller.

Thank you @nils-ohlmeier for the feedback.

That was my original inclination too (accepting USE-CANDIDATE - see a bit of discussion in Pion Slack). But, wanted to do something that did not change existing behaviour. But, after discussion with @Sean-Der in Slack and your feedback here, convinced that it is the right approach. So, I am going to change this PR to accept USE-CANDIDATE (will still have a flag to fall back to older behaviour just in case somebody wants it). Will tag you for review once I finish the changes.

There is this long standing FIrefox issue where it does aggressive nomination even if the peer is ice-lite. I guess in that case the selected pair will change a bunch of times and settle on the final one. Is that correct?

@boks1971 boks1971 changed the title Add option to disble peer reflexive discovery Accept use-candidate Oct 31, 2024
@boks1971
Copy link
Contributor Author

@nils-ohlmeier Please have a look at this PR when you get a chance and let me know if it looks okay.

@cnderrauber
Copy link
Member

I think we'd better to make it as a default behavior for controlled ice-lite agent insead of all controlled agent. The failure case is chrome/firefox does renomiantion/aggressive-nomination when remote peer is ice lite, and there is no problem for full agent. So the current behavior of pion comply with the rfc, we don't need to break it for the bad behavior cause by remote peer.

@cnderrauber
Copy link
Member

cnderrauber commented Oct 31, 2024

Compare priority is ok for both nomiation type in rfc5245:

  • regular: there are only 1 nomiate so it is ok
  • aggressive: ICE always selects the highest-priority nominated candidate pair from the valid list as the one used for media , it doesn't mention only controlled needs to compare priority so I think both ends would do the check.

Checked the libwebrtc code, it also compares the nominated connections from remote side when it picks controlled role, the conditions are complex but contains the priority.

@boks1971 boks1971 changed the title Accept use-candidate Accept use-candidate for ice-lite Oct 31, 2024
@boks1971
Copy link
Contributor Author

@cnderrauber Thank you for the review. I have updated the code to apply it only for ice-lite agent. Can you please re-review when you get a chance?

Copy link
Member

@cnderrauber cnderrauber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

selection.go Outdated
@@ -241,7 +241,7 @@ func (s *controlledSelector) HandleSuccessResponse(m *stun.Message, local, remot
s.log.Tracef("Found valid candidate pair: %s", p)
if p.nominateOnBindingSuccess {
if selectedPair := s.agent.getSelectedPair(); selectedPair == nil ||
(selectedPair != p && selectedPair.priority() <= p.priority()) {
(selectedPair != p && ((!s.agent.enableUseCandidateCheckPriority && s.agent.lite) || selectedPair.priority() <= p.priority())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about make it a agent method needsToCheckPriorityOnNominated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call @cnderrauber . Updated with a helper function.

There could be a mismatch between the two ends in candidate priority
when using peer reflexive. It happens in the following scenario

1. Client has two srflx candidates.
   a. The first one gets discovered by LiveKit server as prflx.
   b. The second one gets added via ice-trickle first and then
      gets a STUN ping. So, it is srflx remote candidate from
      server's point-of-view.
2. This leads to a priority issue.
   a. Both candidates have same priority from client's point-of-view
      (both are srflx).
   b. But, from server's point-of-view, the first candidate has
      higher priority (prflx).
3. The first candidate establishes connectivity and becomes
   the selected pair (client is ICE controlling and server is
   ICE controlled, server is in ICE lite).
4. libwebrtc does a sort and switch some time later based on RTT.
   As client side has both at same priority, RTT based sorting
   could make the second candidate the preferred one.
   So, the client sends useCandidate=1 for the second candidate.
   pion/ice does not switch because the selected pair is at
   higher priority due to prflx candidate.
5. STUN pings do not happen and the ICE connection eventually fails.

ICE controlled agent should accept use-candidate unconditionally if
it is an ICE lite agentt.
Just in case existing behaviour is needed, it can be configured
using `EnableUseCandidateCheckPriority`.

NOTE: With aggressive nomination, the selected pair could change
a few times, but should eventually settle on what the controlling
side wants.
@boks1971
Copy link
Contributor Author

@Sean-Der @davidzhao @nils-ohlmeier going to merge this based on discussions in Slack/this PR and @cnderrauber 's review. Want to port this to v2 and start using it in LiveKit project. But, please do review when you get a chance and let me know if you have feedback and I will address it in a subsequent PR.

@boks1971 boks1971 merged commit 9407bb0 into master Oct 31, 2024
15 checks passed
@boks1971 boks1971 deleted the raja_disable_peer_reflexive branch October 31, 2024 05:35
boks1971 added a commit that referenced this pull request Oct 31, 2024
There could be a mismatch between the two ends in candidate priority
when using peer reflexive. It happens in the following scenario

1. Client has two srflx candidates.
   a. The first one gets discovered by LiveKit server as prflx.
   b. The second one gets added via ice-trickle first and then
      gets a STUN ping. So, it is srflx remote candidate from
      server's point-of-view.
2. This leads to a priority issue.
   a. Both candidates have same priority from client's point-of-view
      (both are srflx).
   b. But, from server's point-of-view, the first candidate has
      higher priority (prflx).
3. The first candidate establishes connectivity and becomes
   the selected pair (client is ICE controlling and server is
   ICE controlled, server is in ICE lite).
4. libwebrtc does a sort and switch some time later based on RTT.
   As client side has both at same priority, RTT based sorting
   could make the second candidate the preferred one.
   So, the client sends useCandidate=1 for the second candidate.
   pion/ice does not switch because the selected pair is at
   higher priority due to prflx candidate.
5. STUN pings do not happen and the ICE connection eventually fails.

ICE controlled agent should accept use-candidate unconditionally if
it is an ICE lite agentt.
Just in case existing behaviour is needed, it can be configured
using `EnableUseCandidateCheckPriority`.

NOTE: With aggressive nomination, the selected pair could change
a few times, but should eventually settle on what the controlling
side wants.
@Sean-Der
Copy link
Member

@boks1971 sounds good to me!

What do you need to move to /v3? If I port livekit/livekit would that be a good start?

@boks1971
Copy link
Contributor Author

Nothing specifically needed @Sean-Der as the default behaviour is the desired behaviour. I cherry-picked the change to v2 and tagged it v2.3.37 and have updated livekit dependencies to use that tag.

@boks1971
Copy link
Contributor Author

I was debating adding a settings engine setting to enable the priority check (the flag added in this PR) and thread it through. But, as we discussed the default behaviour added in this PR looks like the right one and the flag is just a safety net. So, decided to wait to see if there are any ill effects before adding another things to SettingsEngine.

@nils-ohlmeier
Copy link

There is this long standing FIrefox issue where it does aggressive nomination even if the peer is ice-lite. I guess in that case the selected pair will change a bunch of times and settle on the final one. Is that correct?

Yeah that one sucks. The problem is that Firefox uses its own ICE stack written in C with lots of pointer magic. Most of the developers don't want to touch it with a ten foot pole. The team is aware of the problem. But my guess is that it is rather unlikely this will get fixed anytime soon.

The best known workaround is to implement on the ice-lite server side to use the candidate pair where it receives traffic from. I'm not sure if a workaround like that should go into Pion to be more compatible with Firefox. @Sean-Der do you have thoughts on this?

@boks1971
Copy link
Contributor Author

boks1971 commented Nov 1, 2024

I did that approach in a previous life and it worked well. But, that was a few years ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants