Various changes of RTP timeout handling to handle timeouts on single streams #1403

arnd-s · 2021-11-22T08:29:39Z

- Introduced the timeout-mode parameter to control the mode of operation (off/any/all) during a session:
  off - don't monitor timeouts / turn monitoring off
  any - react if a timeout occurs on any RTP stream
  all - react only if a timeout occurs on all streams (old behavior)
- Don't monitor RTCP streams for timeouts (to prevent false alarms if the RTP timeout is very low)
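
To illustrate the difference between the three modes, here is a minimal sketch of the decision. The enum, the function name and the boolean array are hypothetical and chosen only for illustration; they are not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, not taken from the patch. */
enum timeout_mode {
	TIMEOUT_MODE_OFF, /* don't monitor timeouts */
	TIMEOUT_MODE_ANY, /* react if any RTP stream timed out */
	TIMEOUT_MODE_ALL, /* react only if all RTP streams timed out (old behavior) */
};

/* stream_timed_out[i] is true if RTP stream i has exceeded its timeout. */
static bool call_timed_out(enum timeout_mode mode,
		const bool *stream_timed_out, size_t n_streams)
{
	size_t expired = 0;

	assert(stream_timed_out != NULL || n_streams == 0);

	if (mode == TIMEOUT_MODE_OFF || n_streams == 0)
		return false;

	for (size_t i = 0; i < n_streams; i++) {
		if (stream_timed_out[i])
			expired++;
	}

	if (mode == TIMEOUT_MODE_ANY)
		return expired > 0;
	return expired == n_streams; /* TIMEOUT_MODE_ALL */
}
```

With one silent stream out of two, this sketch reports a timeout in "any" mode but not in "all" mode.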

This patch primarily introduces the ability to react to RTP timeouts on a single stream.

During call establishment / early media, often only one stream direction is active. To prevent false alarms
in this case, the timeout-mode parameter was added to the call interface. The idea is that a call starts with
timeout monitoring disabled, and once the session is established in both directions (for example a 200 OK reply for
an answer), the timeout is activated.

When activated, the timeout_activated element of struct call is set to the earliest possible timeout occurrence
(activation time + timeout time). This should prevent false alarms immediately after session establishment.
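
As a rough sketch of the activation logic described above: the struct below is a hypothetical stand-in that only mirrors the timeout_activated field, and both helper functions are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the relevant part of struct call. */
struct call_sketch {
	time_t timeout_activated; /* 0 = monitoring not active */
};

/* Called when the session is established in both directions. */
static void activate_timeout(struct call_sketch *c, time_t now, int timeout_secs)
{
	assert(timeout_secs > 0);
	/* earliest possible timeout = activation time + timeout time */
	c->timeout_activated = now + timeout_secs;
}

/* The timer only evaluates stream timeouts once this returns true. */
static bool timeout_check_allowed(const struct call_sketch *c, time_t now)
{
	return c->timeout_activated != 0 && now >= c->timeout_activated;
}
```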

There is still an unresolved problem at the moment. Even though timeout_activated is used, a timeout is
sometimes detected on the first check. This happens even when the timeout value (and with it timeout_activated)
is increased.
To work around this, the missed_packet_counter of struct packet_stream is used, so that the check only fails on
the third missed packet in a row.
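
The workaround can be pictured roughly as follows. This is only a sketch: the struct is a hypothetical stand-in for struct packet_stream, and the limit of 3 is taken from the description above rather than from the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed from "third missed packet in a row". */
#define MISSED_CHECKS_LIMIT 3

/* Hypothetical stand-in for the relevant part of struct packet_stream. */
struct stream_sketch {
	unsigned int missed_packet_counter;
};

/*
 * Called once per timer run. packet_seen says whether the stream received
 * a packet within the timeout window. Returns true only when the stream has
 * been silent on MISSED_CHECKS_LIMIT consecutive checks.
 */
static bool stream_check(struct stream_sketch *ps, bool packet_seen)
{
	assert(ps != NULL);

	if (packet_seen) {
		ps->missed_packet_counter = 0; /* any packet resets the streak */
		return false;
	}
	if (ps->missed_packet_counter < MISSED_CHECKS_LIMIT)
		ps->missed_packet_counter++;
	return ps->missed_packet_counter >= MISSED_CHECKS_LIMIT;
}
```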

Signed-off-by: Arnd Schmitter [email protected]

rfuchs · 2021-11-23T14:52:50Z

I'm not sure about your exact use case, but I'm wondering if this can be solved in a more automatic way? (Not opposed to having this as a settable value, but we could do both...)

Since rtpengine has a vague concept of which sides/parties to expect RTP from, I'm wondering if the timeout check can simply be made conditional on that? (There are some exceptions here, e.g. there's no distinction between an answer from a 18x and an answer from a 200, so that is something else that could be newly added.)

> There is still an unresolved problem at the moment. Even though timeout_activated is used, a timeout is sometimes detected on the first check. This happens even when the timeout value (and with it timeout_activated) is increased. To work around this, the missed_packet_counter of struct packet_stream is used, so that the check only fails on the third missed packet in a row.

I did come across this issue in the past and thought that resetting the last_packet timestamp to the current time during a signalling event could solve this. Or perhaps taking the last_signal timestamp into account somehow. What do you think?
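
One possible reading of the second idea, as a sketch only: measure the silence from whichever is more recent, the last packet or the last signalling event. The function and its parameters are hypothetical; last_packet and last_signal here are plain arguments, not the actual struct members.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: a stream counts as timed out only if neither a packet
 * nor a signalling event has been seen for at least timeout_secs.
 */
static bool stream_timed_out(time_t now, time_t last_packet,
		time_t last_signal, int timeout_secs)
{
	assert(timeout_secs > 0);

	/* use the more recent of the two timestamps as the reference point */
	time_t ref = last_packet > last_signal ? last_packet : last_signal;
	return now - ref >= timeout_secs;
}
```

Resetting last_packet to the current time on each signalling event would have the same effect on this check, at the cost of overwriting the real packet timestamp.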

rfuchs · 2021-11-23T14:26:54Z

daemon/call.c

@@ -206,6 +213,16 @@ static void call_timer_iterator(struct call *c, struct iterator_helper *hlp) {

 		/* valid stream */

+		// ignore RTCP Streams
+		if (PS_ISSET(ps, RTCP))


I think it might be more appropriate to use if (!PS_ISSET(ps, RTP)) here? (Streams can be both RTP and RTCP in case of RTCP-mux)

Yes, that sounds better. I didn't have RTCP-mux in mind, because it's not used in our case.
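
To make the RTCP-mux case concrete, here is a small sketch comparing the two tests. The flag bits and helper names are hypothetical and do not reflect the real PS_ISSET flag layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_RTP  (1u << 0)
#define SKETCH_FLAG_RTCP (1u << 1)

/* Original test: skip the stream if it carries RTCP. */
static bool skip_if_rtcp(unsigned int flags)
{
	return (flags & SKETCH_FLAG_RTCP) != 0;
}

/* Suggested test: skip the stream only if it does not carry RTP. */
static bool skip_if_not_rtp(unsigned int flags)
{
	return (flags & SKETCH_FLAG_RTP) == 0;
}

/* Shows how both tests treat a muxed stream that carries RTP and RTCP. */
static void check_mux_case(void)
{
	unsigned int mux = SKETCH_FLAG_RTP | SKETCH_FLAG_RTCP;

	assert(skip_if_rtcp(mux));     /* muxed stream would go unmonitored */
	assert(!skip_if_not_rtp(mux)); /* muxed stream stays monitored */
}
```

For pure RTP and pure RTCP streams both tests agree; they only differ when a stream has both flags set.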

rfuchs · 2021-11-23T14:29:06Z

daemon/call.c

@@ -4022,6 +4047,7 @@ int call_delete_branch(const str *callid, const str *branch,
 				"(via-branch '" STR_FORMAT_M "') in %d seconds",
 				STR_FMT_M(&ml->tag), STR_FMT0_M(branch), delete_delay);
 		ml->deleted = rtpe_now.tv_sec + delete_delay;
+		ml->mark_deleted = 1;


I'm not 100% sure, but could ml->deleted != 0 be used as a test for this instead of adding a new flag?

As far as I remember, I first tried it this way, but there was an issue with it; I currently don't remember what it was.
I need to run some tests and see if I can reproduce the issues I had.

arnd-s · 2021-11-23T15:11:27Z

> Since rtpengine has a vague concept of which sides/parties to expect RTP from, I'm wondering if the timeout check can simply be made conditional on that? (There are some exceptions here, e.g. there's no distinction between an answer from a 18x and an answer from a 200, so that is something else that could be newly added.)

In our use case, the UAC often doesn't send any RTP traffic until it receives a 200 answer. So if the SDP answer arrives via a 18x, a timeout will occur.
In my experience it is also useful to have more control in call forking scenarios with multiple early media streams, or to disable timeouts during call hold.

> There is still an unresolved problem at the moment. Even though timeout_activated is used, a timeout is sometimes detected on the first check. This happens even when the timeout value (and with it timeout_activated) is increased. To work around this, the missed_packet_counter of struct packet_stream is used, so that the check only fails on the third missed packet in a row.

> I did come across this issue in the past and thought that resetting the last_packet timestamp to the current time during a signalling event could solve this. Or perhaps taking the last_signal timestamp into account somehow. What do you think?

I'll look into it and see if I can find a better solution.

Please be aware that, in my opinion, this PR should get extensive testing from users with other use cases before merging, especially with forking scenarios or more than one stream in each direction. I also haven't tested it in WebRTC scenarios.

arnd-s mentioned this pull request Nov 22, 2021

RTP Timeout per caller/callee stream #1402

Closed
