feat(facade): bound IoLoopV2 dispatch_q_ quota to prevent starvation #7234
```diff
@@ -2948,6 +2948,8 @@ variant<error_code, Connection::ParserStatus> Connection::IoLoopV2() {
   // to global pipeline-backpressure relief notifications.
   util::fb2::detail::Waiter backpressure_waiter{ioevent_cb};
 
+  const uint32_t async_dispatch_quota = GetFlag(FLAGS_async_dispatch_quota);
+
   do {
     HandleMigrateRequest();
```
```diff
@@ -2992,12 +2994,29 @@ variant<error_code, Connection::ParserStatus> Connection::IoLoopV2() {
     phase_ = PROCESS;
     bool reached_capacity = io_buf_.AppendLen() == 0;
 
-    // Temporary: Handle dispatch queue items (Control Path) one by one blocking command execution
+    // Handle dispatch queue items (Control Path) with a bounded quota to prevent
+    // starvation of the data path:
+    // - Under a PubSub flood, dispatch_q_ can accumulate thousands of messages.
+    // - Without a quota, the fiber would drain them all before parsing any new data from the
+    //   socket, starving GET/SET and other pipeline commands.
+    // - This mirrors V1's async_dispatch_quota / prefer_pipeline_execution mechanism in AsyncFiber.
     if (!dispatch_q_.empty()) {
+      uint32_t dispatched{};
+      bool quota_reached = false;
```
Collaborator: Another suggestion: consider extracting this inner while loop into a helper function.
```diff
       while (!dispatch_q_.empty()) {
+        // Quota reached: stop draining the dispatch queue and fall through to the data path.
+        if ((async_dispatch_quota > 0) && (dispatched >= async_dispatch_quota)) {
+          quota_reached = true;
+          LOG_EVERY_T(INFO, 1) << "[" << id_ << "] V2 dispatch_q_ quota reached (" << dispatched
+                               << "/" << async_dispatch_quota << "), falling through to data path";
+          break;
+        }
+
         auto msg = std::move(dispatch_q_.front());
         dispatch_q_.pop_front();
         UpdateDispatchStats(msg, false /* subtract */);
+        dispatched++;
 
         // If a MigrationRequestMessage arrives via the dispatch queue, stop processing
         // and let the loop iterate back to HandleMigrateRequest() at the top.
```
```diff
@@ -3006,15 +3025,19 @@ variant<error_code, Connection::ParserStatus> Connection::IoLoopV2() {
         }
 
         std::visit(AsyncOperations{reply_builder_.get(), this}, msg.handle);
       }
 
+      // Note: No flush needed here: the `continue` below re-enters the loop, which either
+      // hits the data path (ParseLoop flushes via ReplyBatch) or the idle-await block
+      // (Flush 1), which always flushes before sleeping.
     } // while
 
       // TODO: Properly handle backpressure
       GetQueueBackpressure().pubsub_ec.notifyAll();
-      continue;
 
+      // Starvation prevention: if quota was hit, fall through to the data path so
+      // pipelined commands get a turn. Otherwise `continue` back to the top.
+      //
+      // No flush here: both paths reach a flush before sleeping via the idle-await block
+      // at the top of the loop, allowing PubSub and command replies to be coalesced into
+      // one sendmsg syscall.
+      if (!quota_reached) {
+        continue;
```
Collaborator: I do not understand the flow here.

Contributor (Author): The `continue` is not just an optimization; it is needed for correctness and to keep latency low. I'll explain:

Reason 1: When we jump to the top, we hit the only areas in the hot path where we flush and read (ignoring the special-case flushes further down). So we flush replies and pull more data, and when the loop eventually reaches the data path, it is working with fresh, up-to-date network data.

Reason 2, in short: We use `continue` to guarantee low latency and fresh socket reads when the queue drains naturally. We only use the fall-through as an emergency case (when `quota_reached` is true) to force the data path to run so it doesn't get starved by a never-ending flood of PubSub messages.
Collaborator: OK, but then the comment above focuses only on the `quota_reached` path; it's not clear why `continue` is needed in the first place. I would add a comment around the `continue` to explain why it's needed. Maybe the code structured differently would provide a more natural flow, but I might be wrong too, and a comment will at least close the gap.
```diff
       }
     }
 
     // Handle Parsed Commands Queue (Data Path)
```