If the number of handlers is large, split Conference.sendOut on a task queue. #1062

Draft · wants to merge 11 commits into master
Conversation

JonathanLennox (Member):
In the spirit of #1058, this is a quick-and-dirty hack to test out splitting Conference.sendOut among multiple tasks on the work queue. Should this perform better, we can look at a bigger refactor to achieve this. DO NOT MERGE.

/**
 * The maximum number of packet handlers we want to execute in one task.
 * TODO: this number is pulled out of the air; tune it.
 */
public static final int MAX_HANDLERS_PER_TASK = 5;
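For concreteness, a minimal sketch of the batching this constant drives. The wants/send methods match the handler interface touched later in this diff, but the PacketInfo stub, the class and field names, and the sendOut signature here are placeholders, not the actual patch:

import java.util.List;
import java.util.concurrent.ExecutorService;

class PacketInfo { }  // stub standing in for the real PacketInfo

interface PotentialPacketHandler
{
    boolean wants(PacketInfo packetInfo);
    void send(PacketInfo packetInfo);
}

class SendOutSketch
{
    public static final int MAX_HANDLERS_PER_TASK = 5;

    static void sendOut(PacketInfo packetInfo,
                        List<PotentialPacketHandler> handlers,
                        ExecutorService cpuPool)
    {
        for (int start = 0; start < handlers.size(); start += MAX_HANDLERS_PER_TASK)
        {
            // Each task walks a slice of at most MAX_HANDLERS_PER_TASK handlers.
            List<PotentialPacketHandler> batch = handlers.subList(
                start,
                Math.min(start + MAX_HANDLERS_PER_TASK, handlers.size()));
            cpuPool.submit(() -> {
                for (PotentialPacketHandler handler : batch)
                {
                    if (handler.wants(packetInfo))
                    {
                        handler.send(packetInfo);
                    }
                }
            });
        }
    }
}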
Member:

I suspect this may depend on the number of cores a machine has. Shall we make it configurable?

Member Author:

Possibly; this was just a quick and dirty check to see how much difference this made.

Member:

We already populate the CPU_POOL based on the number of cores, so if anything maybe we'd want to base this on the size of the cpu pool, so we only make this decision in one place.
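As a sketch of keeping that decision in one place, the batch size could be derived from the pool's size instead of a fixed constant. handlersPerTask is a hypothetical helper, and poolSize is assumed to come from however CPU_POOL is configured:

class BatchSizing
{
    static int handlersPerTask(int handlerCount, int poolSize)
    {
        // Roughly one task per pool thread, rounded up, so the
        // parallelism decision lives entirely in the pool's configuration.
        return Math.max(1, (handlerCount + poolSize - 1) / poolSize);
    }
}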

Member Author:

Looking at YourKit, AbstractExecutorService.submit seems to spend a fair amount of its time in LinkedBlockingQueue.offer, which has to acquire a lock, so we may well want more tasks than cores here.

Member:

Hm, we probably just shouldn't be using offer there.

Member Author:

That's inside the Java implementation -- and never mind, I was looking at putting tasks onto the executor queue, not taking them off for execution.

Member:

Oh, I see what you mean now. 👍

@@ -386,7 +386,7 @@ public double getRtt()
      * {@inheritDoc}
      */
     @Override
-    public boolean wants(PacketInfo packetInfo)
+    public synchronized boolean wants(PacketInfo packetInfo)
Member:

Why does this need to be synchronized now? We're only parallelizing the send out of a single packet at a time, right?

Member Author:

Since the receive pipeline farms out the parallelized sendOut calls and then returns without waiting for the executor tasks, there's nothing stopping the next packet's executor tasks from starting while the previous packet's are still running.

Member:

Hm, I think this could result in reordering the packets: I don't think we'd guarantee that the lock on Endpoint A will be taken when processing packet "1" before a thread handling packet "2" for Endpoint A is scheduled.

Member Author:

That's true. It's not the end of the world if it happens; worst case the destination sends a spurious NACK.

Alternatively, we could synchronize the sender to wait for all its tasks.
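A hedged sketch of that "wait for all its tasks" variant, reusing the stub types from the earlier sketch. invokeAll blocks until every task completes; as a later reply points out, blocking like this on the same pool that runs the tasks can deadlock:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;

class BlockingSendOutSketch
{
    static void sendOutAndWait(PacketInfo packetInfo,
                               List<PotentialPacketHandler> handlers,
                               ExecutorService cpuPool)
        throws InterruptedException
    {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (PotentialPacketHandler handler : handlers)
        {
            tasks.add(() -> {
                if (handler.wants(packetInfo))
                {
                    handler.send(packetInfo);
                }
                return null;
            });
        }
        // Returns only after every handler task has finished, so packets
        // from this sender stay ordered relative to each other.
        cpuPool.invokeAll(tasks);
    }
}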

Member:

Yeah, I'm thinking something more like the latter, since maintaining packet order within our own pipeline would be desirable, I think. Depending on how fancy we want or need to get, we could also look at per-source-endpoint mailboxes for the packets, where we only process one packet from a given endpoint at a time (but could still parallelize across packets from multiple endpoints).

Member Author:

Unfortunately, I discovered that synchronizing to wait for the tasks won't work, because they're executed on the same task pool as the RTP receiver, and an earlier task waiting for a later task is a deadlock.

I don't quite follow your description of the fancier methods, but they potentially sound promising if this turns out to be needed.

Member:

My thought was that we could do something like:

  • Have a Map<String, PacketInfoQueue>, where the key is an endpoint ID in the conference.
  • Receiver thread drops the 'fully processed' incoming packet from its endpoint into the appropriate queue.

Now, this functions the same way our other queues do (we process the packets within a single queue serially), we just have N of them. Basically mirror the input queue we have for each endpoint already. This way we can parallelize the work across endpoints, but not within a single endpoint.
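A sketch of that per-endpoint mailbox idea, again with the stub types from above. EndpointQueue here is a stand-in for something like the existing PacketInfoQueue, not its real API: each endpoint's packets drain serially, while different endpoints' queues run in parallel on the pool.

import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicBoolean;

class EndpointQueue
{
    private final ConcurrentLinkedQueue<PacketInfo> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean drainScheduled = new AtomicBoolean(false);
    private final ExecutorService pool;
    private final PotentialPacketHandler handler;

    EndpointQueue(ExecutorService pool, PotentialPacketHandler handler)
    {
        this.pool = pool;
        this.handler = handler;
    }

    /** Called by the receiver thread with a fully processed packet. */
    void add(PacketInfo packetInfo)
    {
        queue.add(packetInfo);
        // Schedule at most one drain task at a time, so this endpoint's
        // packets are handled serially.
        if (drainScheduled.compareAndSet(false, true))
        {
            pool.submit(this::drain);
        }
    }

    private void drain()
    {
        PacketInfo packetInfo;
        while ((packetInfo = queue.poll()) != null)
        {
            if (handler.wants(packetInfo))
            {
                handler.send(packetInfo);
            }
        }
        drainScheduled.set(false);
        // Re-check: a packet may have arrived between poll() and set(false).
        if (!queue.isEmpty() && drainScheduled.compareAndSet(false, true))
        {
            pool.submit(this::drain);
        }
    }
}

// The conference would then hold one such queue per endpoint, e.g. a
// java.util.Map<String, EndpointQueue> keyed by endpoint ID.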

Member Author:

Ah, I see.

The disadvantage (over what I have now) is that the packet cloning optimization is weaker -- right now as long as at least one of the last batch of packet handlers wants the packet, we get the optimization, whereas with this the very last one would need to.

If we do have these queues, I'm not sure there's an architectural difference between them and the RTP sender queues. Is there any way we could just inject these as a leading node (or other process) on the RTP sender rather than having a separate, third set of queues?

Member:

> The disadvantage (over what I have now) is that the packet cloning optimization is weaker -- right now as long as at least one of the last batch of packet handlers wants the packet, we get the optimization, whereas with this the very last one would need to.

Hmm, maybe I'm missing something here. The optimization should be the same, as we're still looking at every potential "sender" for a given ingress packet, so we know how many there are. We just don't do anything with the next packet for that same "receiver" until we're done with the previous one.

Member Author:

As long as there's more than one reference outstanding in the PacketInfoDistributor, it has to clone, rather than hand out its reference. It's only on the very last reference that it can optimize by handing out a non-cloned buffer. If the last reference turns out not to want it, we lose the optimization.

If every projection's processing runs in parallel, we have to hold a reference for every sender, rather than one reference per batch of senders.
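To make the cloning argument concrete, here's a toy model of the distributor being described. PacketInfoDistributor's real API isn't shown in this thread, so the names and the clonePacket method here are hypothetical:

import java.util.concurrent.atomic.AtomicInteger;

class ClonablePacketInfo
{
    ClonablePacketInfo clonePacket()
    {
        return new ClonablePacketInfo();  // stands in for a real deep copy
    }
}

class PacketInfoDistributorSketch
{
    private final ClonablePacketInfo original;
    private final AtomicInteger referencesLeft;

    PacketInfoDistributorSketch(ClonablePacketInfo original, int referenceCount)
    {
        this.original = original;
        this.referencesLeft = new AtomicInteger(referenceCount);
    }

    ClonablePacketInfo get()
    {
        // Only the final outstanding reference gets the original buffer;
        // every earlier caller gets a clone. If that last consumer turns
        // out not to want the packet, the saved clone bought nothing,
        // which is the weakening described above. Parallelizing every
        // projection means one reference per sender instead of one per
        // batch of senders.
        if (referencesLeft.decrementAndGet() == 0)
        {
            return original;
        }
        return original.clonePacket();
    }
}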
