feat(facade): bound IoLoopV2 dispatch_q_ quota to prevent starvation #7234
glevkovich wants to merge 1 commit into main
Conversation
🤖 Augment PR Summary
Summary: This PR prevents PubSub/control-path floods from starving pipelined command execution in IoLoopV2.
Pull request overview
Adds fairness to the V2 connection I/O loop by bounding how many control-path (dispatch queue) messages are processed per iteration, preventing PubSub floods from starving pipelined command execution; expands Python integration tests to run key cases against both IoLoop V1 and V2.
Changes:
- Bound IoLoopV2 dispatch-queue draining using FLAGS_async_dispatch_quota, falling through to the data path when the quota is hit.
- Update/parameterize existing connection tests to run with experimental_io_loop_v2 enabled and disabled.
- Add a V2-focused regression test to ensure conditional flushing does not stall replies.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/dragonfly/connection_test.py | Parameterizes tests across V1/V2, adjusts reply-count expectations, and adds a V2 conditional-flush regression test. |
| src/facade/dragonfly_connection.cc | Implements quota-bounded dispatch queue draining in IoLoopV2 to prevent pipeline starvation under heavy control-path load. |
Previously, IoLoopV2 drained dispatch_q_ with an unbounded while loop. Under a PubSub flood, this trapped the fiber in the control path, starving pipelined commands (GET/SET) and causing client timeouts.

Key changes:
- Bounded dispatch: process at most FLAGS_async_dispatch_quota messages per iteration; if the quota is hit, fall through to the data path so pipeline commands get a turn. Mirrors V1's async_dispatch_quota / prefer_pipeline_execution mechanism in AsyncFiber.
- Deferred flush: the quota-hit path falls through to ParseLoop, which reaches the idle-await flush, coalescing PubSub and command replies into a single sendmsg syscall.
- Batched backpressure: pubsub_ec.notifyAll() is now called once per quota chunk instead of once per message.
- Testing: parameterized test_pubsub_pipeline_starvation for both V1 and V2 to prevent regressions.

Signed-off-by: Gil Levkovich <69595609+glevkovich@users.noreply.github.com>
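A minimal sketch of the bounded drain and batched backpressure described above, assuming hypothetical names (Msg, HandleDispatch, EventCount, kAsyncDispatchQuota) in place of Dragonfly's actual types and flags:

```cpp
// Sketch only: illustrates the bounded-drain idea from the commit message.
// Msg, HandleDispatch(), EventCount and kAsyncDispatchQuota are illustrative
// placeholders, not Dragonfly's real identifiers.
#include <cstdint>
#include <deque>

struct Msg {};

struct EventCount {
  void notifyAll() {}  // stand-in for the pubsub_ec backpressure wakeup
};

constexpr uint32_t kAsyncDispatchQuota = 32;  // stands in for FLAGS_async_dispatch_quota

void HandleDispatch(const Msg&) {}  // placeholder for per-message processing

// Drains at most `quota` messages and wakes blocked publishers once per chunk.
// Returns true when the quota was hit with messages still queued, i.e. the
// caller should fall through to the data path instead of looping back.
bool DrainDispatchQueue(std::deque<Msg>& dispatch_q, EventCount& pubsub_ec,
                        uint32_t quota) {
  uint32_t dispatched = 0;
  while (!dispatch_q.empty() && dispatched < quota) {
    HandleDispatch(dispatch_q.front());
    dispatch_q.pop_front();
    ++dispatched;
  }
  pubsub_ec.notifyAll();  // batched backpressure: once per quota chunk
  return !dispatch_q.empty();
}
```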
// at the top of the loop, allowing PubSub and command replies to be coalesced into
// one sendmsg syscall.
if (!quota_reached) {
  continue;
I do not understand the flow here.
If the quota was not reached, why go back to the top instead of falling through? I see the continue existed before, but I still do not understand what it does.
The continue is not just an optimization; it is needed for correctness and to keep latency low. I'll explain:
Reason 1:
When we finish processing the dispatch_q here, it took time, in some cases a long time, and during that time new client messages (e.g. a GET/SET command) may have arrived on the socket. Since we haven't called ReadPendingInput (worst-case scenario), io_buf_ is completely empty. If we just fall through to the data path, it checks io_buf_.InputLen(), concludes there is no data from the client, and skips parsing entirely, making a decision based on stale information.
When we jump back to the top, these are the only places in the hot path where we flush and read (ignoring the special-case flushes further down), so we read and pull in more data:
if (pending_input_) {
ReadPendingInput();
}
Now, when the loop eventually reaches the data path, it is working with fresh, up-to-date network data.
Reason 2:
When we process PubSub messages, their replies accumulate in reply_builder_. Flush() only happens at the top of the loop (in the idle-await block). If we fall through, we traverse the entire data path section doing nothing useful, then loop back to the top to flush. The continue skips that dead code and reaches the flush immediately - one fewer loop iteration, lower latency.
In short: We use continue to guarantee low latency and fresh socket reads when the queue is naturally empty. We only use the fall-through as an emergency case (when quota_reached is true) to force the data path to run so it doesn't get starved by a never-ending flood of PubSub messages.
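To make the two paths concrete, here is a rough skeleton of the loop shape this explanation assumes. It is a sketch only: Flush, ReadPendingInput, ParseAndExecuteInput, DrainDispatchQueue and the member names are placeholders, not the actual IoLoopV2 code.

```cpp
// Simplified shape of the loop under discussion. Every identifier is a
// placeholder for the real IoLoopV2 members; this is not Dragonfly's code.
#include <cstdint>
#include <deque>

struct LoopSketch {
  std::deque<int> dispatch_q;
  bool pending_input = false;
  bool reply_pending = false;
  bool should_exit = false;
  static constexpr uint32_t kQuota = 32;  // stand-in for FLAGS_async_dispatch_quota

  void Flush() {}                 // stand-in: sends accumulated replies (one sendmsg)
  void ReadPendingInput() {}      // stand-in: refills io_buf_ from the socket
  void ParseAndExecuteInput() {}  // stand-in: the "data path"

  // Pops at most kQuota control messages; true means the quota was hit.
  bool DrainDispatchQueue() {
    uint32_t dispatched = 0;
    while (!dispatch_q.empty() && dispatched < kQuota) {
      dispatch_q.pop_front();  // stand-in for handling one control message
      ++dispatched;
    }
    return !dispatch_q.empty();
  }

  void IoLoop() {
    while (!should_exit) {
      // Top of the loop: flush accumulated replies and read fresh socket data.
      if (reply_pending) Flush();
      if (pending_input) ReadPendingInput();

      if (!dispatch_q.empty()) {
        bool quota_reached = DrainDispatchQueue();
        if (!quota_reached) {
          // Queue drained: loop back so replies are flushed and the socket is
          // re-read before the data path inspects the input buffer.
          continue;
        }
        // Quota hit: fall through so pipelined commands get a turn.
      }

      // Data path: parse and execute whatever input is currently buffered.
      ParseAndExecuteInput();
    }
  }
};
```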
Ok, then the comment above focuses only on the "quota_reached" path, but it's not clear why the continue is needed in the first place. I would add an additional comment around the continue to explain why it's needed. Maybe structuring the code differently would provide a more natural flow, but I might be wrong too, and a comment will at least close the gap.
// - This mirrors V1's async_dispatch_quota / prefer_pipeline_execution mechanism in AsyncFiber.
if (!dispatch_q_.empty()) {
  uint32_t dispatched{};
  bool quota_reached = false;
Another suggestion: consider extracting this inner while loop into a helper function, e.g.:
bool quota_reached = ProcessControlCommands(async_dispatch_quota);