Simplify Opus RTP timestamp mangling #1563
Open
+32
−250
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Hi @dgreif,
Here are some minor enhancements that improve Opus audio to the point where I believe it is nearly on-par with the native Ring app. These were developed with some diving into the HAP Spec and RFCs and, in my testing with these changes, the audio is nearly as good as the Ring app. I've tested with iPad Pro, iPhone, and Apple Watch, both local and remote, and it seems to work well, although it is not as tolerant of packet loss/poor network conditions as native Ring app. I believe this gap could be narrowed by implementing HAP spec compliant RTCP sender reports, but I don't think the difference is that significant, however, I may try to add this (Scrypted does have an implementation).
These are the major changes (mostly here for reference so I don't forget why I did stuff). While it sounds like a lot, they actual changes are pretty minor overall, but make the RTP streams much more compliant with the defined specs/RFCs and is overall simpler to follow:
Remove OpusRepacketizer
From what I can tell, this isn't needed at all. I think it was born from the idea that most Opus encoders produce 20ms frames by default, while Homekit may requests 20, 40 or 60ms frames (only 20 and 60 seem to be observed in practice). The general idea of OpusRepacketizer seems to be that it could combine 3x 20ms frames into a single RTP packet to create the 60ms frame requested by Homekit.
Combining frames this way is supported by RFC 7587, however, this isn't required for our case because we specifically request FFmpeg to produce Opus frames of the desired duration and actually, when 60ms frames are requested, the current code ends up combining these into 180ms frames, which are not even spec compliant because the RFC defines the maximum duration as 120ms, so a maximum of 2x 60ms frames are allowed.
As far as I can tell, Homekit works best when you simply deliver a single frame of the requested size per RTP packet, so, since FFmpeg already does this for us, there's never a reason to repacketize this.
Optimized Timestamp Mangling
While we don't need to repacketize the frames from FFmpeg, we do need to replace the RTP timestamps since Homekit requires RFC 3550 timestamps instead of the standard timestamps defined in RFC 7587. RFC 3550 is pretty straightforward, timestamps increment at a constant rate of sampleRate * duration, so a sampleRate of 48000 with a duration of 20ms, would be 48000 * .020 = 960. The updated code simple calculates this constant value and increments the timestamp for each packet vs performing multiplication on a packet count each time. It's not a huge performance difference, but the logic is much more clear and similar to other RFC 3550 timestamp implementations.
Transcode return audio using the Ring optimized values
Previously, return audio was transcoded using the Homekit maximum 24Khz sample rate and the default FFmpeg frame duration (20ms), but it makes more sense to transcode the audio to Ring optimized values, which are 48Khz, 2ch, 60ms frames. This reduces packet count by 3x and improves the quality of return audio quite a bit.