Buffered protocol changes #24839

mmaslankaprv · 2025-01-16T15:50:07Z

When a request is buffered in the buffered protocol queue, we do not
want to count the time it spends there toward the overall timeout.

Backports Required

Release Notes

none

micheleRP

looks good from docs

vbotbuildovich · 2025-01-16T19:17:20Z

CI test results

test results on build#60851

test_id	test_kind	job_url	test_status	passed
idempotency_tests_rpunit.idempotency_tests_rpunit	unit	https://buildkite.com/redpanda/redpanda/builds/60851#01946fd0-d399-4794-95da-0a1193035581	FLAKY	1/2
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_higher_level_migration_api	ducktape	https://buildkite.com/redpanda/redpanda/builds/60851#01947013-69c9-49a2-811b-31318b51ec2d	FLAKY	5/6

dotnwat

When a request is buffered in the buffered protocol queue, we do not
want to count the time it spends there toward the overall timeout.

why?

mmaslankaprv · 2025-01-20T06:37:50Z

When a request is buffered in the buffered protocol queue, we do not
want to count the time it spends there toward the overall timeout.

why?

It does more harm than good. In a scenario with an overloaded cluster, buffered requests time out immediately when sent through the RPC layer. This makes the cluster unstable, as leaders have more work and followers are unable to receive requests.

dotnwat · 2025-01-20T18:37:19Z

It does more harm than good. In a scenario with an overloaded cluster, buffered requests time out immediately when sent through the RPC layer. This makes the cluster unstable, as leaders have more work and followers are unable to receive requests.

should the timeout be longer then? i mean generally it sounds fragile to just ignore real time that a request is alive, whether it is buffered or not. but maybe i don't fully understand the issue.

src/v/raft/buffered_protocol.cc

bashtanov

Otherwise LGTM

mmaslankaprv · 2025-01-23T06:53:59Z

It does more harm than good. In a scenario with an overloaded cluster, buffered requests time out immediately when sent through the RPC layer. This makes the cluster unstable, as leaders have more work and followers are unable to receive requests.

should the timeout be longer then? i mean generally it sounds fragile to just ignore real time that a request is alive, whether it is buffered or not. but maybe i don't fully understand the issue.

It is indeed a buffer. I guess I could adjust the timeout at the caller, but what I observed is that this is very fragile. The timeout is there to bound the network latency. This is why I decided to adjust the timeout. There are also situations in which the dispatch loop sends a request that has only 5 ms left before it times out. This case is the worst: the request will error out at the requester but will still be sent, which forces the leader to resend the message.
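
To make the difference concrete, here is a minimal, self-contained sketch of the two ways the deadline could be computed. It is not the code from src/v/raft/buffered_protocol.cc: `buffered_request`, `deadline_from_enqueue`, `deadline_from_dispatch`, and `remaining_at_dispatch` are hypothetical names, and the sketch assumes a plain `std::chrono::steady_clock` rather than whatever clock the RPC layer really uses.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical aliases; the real code may use a different clock.
using rpc_clock = std::chrono::steady_clock;
using rpc_duration = rpc_clock::duration;
using rpc_time_point = rpc_clock::time_point;

// Hypothetical stand-in for a request waiting in the buffered protocol queue.
struct buffered_request {
    rpc_time_point enqueued_at; // when the caller handed the request to the queue
    rpc_duration timeout;       // budget meant to bound the network round trip
};

// Deadline fixed at enqueue time: time spent waiting in the queue
// eats into the budget.
inline rpc_time_point deadline_from_enqueue(const buffered_request& r) {
    return r.enqueued_at + r.timeout;
}

// Deadline fixed when the dispatch loop actually sends the request:
// queue time is not counted toward the timeout.
inline rpc_time_point
deadline_from_dispatch(const buffered_request& r, rpc_time_point dispatched_at) {
    assert(dispatched_at >= r.enqueued_at);
    return dispatched_at + r.timeout;
}

// Budget that is left for the network at the moment of dispatch.
inline rpc_duration
remaining_at_dispatch(rpc_time_point deadline, rpc_time_point dispatched_at) {
    return deadline - dispatched_at;
}
```

With an illustrative 100 ms timeout and a request that waits 95 ms in the queue, the enqueue-based deadline leaves 5 ms at dispatch, while the dispatch-based deadline leaves the full 100 ms for the network round trip.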

dotnwat

The offline discussion I had with Michal made it clearer what this change is doing. The timeout in question is not one chosen as part of initiating a raft replicate request; in that case, yes, the buffering time should be accounted for in processing. Rather, this timeout is intended to detect problems with followers, network partitions, etc., so it makes sense that buffering time should not count toward this particular timeout.

mmaslankaprv requested a review from a team as a code owner January 16, 2025 15:50

github-actions bot added the area/redpanda label Jan 16, 2025

micheleRP previously approved these changes Jan 16, 2025

dotnwat reviewed Jan 17, 2025

mmaslankaprv requested review from bharathv, bashtanov and ztlpn January 20, 2025 06:38

bashtanov reviewed Jan 22, 2025

bashtanov reviewed Jan 22, 2025

mmaslankaprv added 2 commits January 22, 2025 10:08

r/buffered_protocol: do not account time spent in queue for timeout

d00a5b6

When the request is buffered in the buffered protocol queue we do not want to account that time for the overall timeout

Signed-off-by: Michał Maślanka <[email protected]>

config: changed the default buffered bytes for buffered protocol

7cc196d

The default buffer size of 5 MiB made the buffer grow very large. Changed the default to minimize the impact of buffering on producer latency in saturated clusters.

Signed-off-by: Michał Maślanka <[email protected]>
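
A rough back-of-the-envelope sketch of why a smaller cap helps producer latency, assuming the buffer drains at a constant rate. The function name, the 50 MiB/s drain rate, and the 1 MiB comparison value are all assumptions for illustration only; the actual new default is whatever this commit sets and is not restated here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

constexpr std::uint64_t mib = 1024 * 1024;

// Worst-case time a newly buffered request waits behind a full buffer,
// assuming (hypothetically) a constant drain rate in bytes per second.
inline std::chrono::milliseconds worst_case_queue_delay(
  std::uint64_t max_buffered_bytes, std::uint64_t drain_bytes_per_sec) {
    assert(drain_bytes_per_sec > 0);
    return std::chrono::milliseconds(
      max_buffered_bytes * 1000 / drain_bytes_per_sec);
}
```

Under these assumptions, a full 5 MiB buffer draining at 50 MiB/s adds up to 100 ms of queueing delay, while a hypothetical 1 MiB cap adds up to 20 ms.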

mmaslankaprv dismissed micheleRP’s stale review via 7cc196d January 22, 2025 09:08

mmaslankaprv force-pushed the buffered-protocol-changes branch from 9c2b2d4 to 7cc196d January 22, 2025 09:08

mmaslankaprv requested review from bashtanov and dotnwat January 22, 2025 09:12

bashtanov approved these changes Jan 22, 2025

mmaslankaprv merged commit 0b1b5a4 into redpanda-data:dev Jan 23, 2025
17 checks passed

mmaslankaprv deleted the buffered-protocol-changes branch January 23, 2025 20:11

dotnwat reviewed Jan 23, 2025
