Naïve question about performance bottlenecks #3160
-
hi @peterbourgon, good question! For context, remember that what we are trying to do is not build a Kafka++ but rather evolve the conversation around streaming, e.g. embedding computational engines (V8, Wasmer, WAVM, etc.) into redpanda for things like customizable compaction strategies, partition placement strategies, etc. Anything that you think should have an API, at some point will.

For raft, there are a few networking bottlenecks (heartbeats + data RPCs). For heartbeats, we built a custom "lossy" compressor to reduce the heartbeat size on the wire. The second one is data. It turns out this is not as trivial as saying the network is the bottleneck (which it can be); it's more nuanced. Take the i3en.12xlarge (50 Gbps) vs the i3en.6xlarge (25 Gbps) instances: the former is IOPS/disk + CPU (compression) bound, the latter is network bound. Assuming we can go as fast as fio, say 1.1 GB/s on an xfs raid0 (software), at the 12xlarge you really have a lot of network wiggle room (which is quickly consumed by other things like tiered storage, etc.), but for the raft part specifically it ultimately depends on the hardware it is running on. The bottleneck will shift from subsystem to subsystem.

The TpC design is all about giving us tools to saturate the underlying devices by ultimately reducing coordination. Reducing coordination is not just at the filesystem level or the CPU level; we spend a lot of time thinking about coalescing, debouncing, batching, pipelining, removing barriers, etc. No panacea, but for systems like redpanda, TpC is a good foundation to help us build for the future we intend to see in streaming (see the initial sentence on context). hope this helps
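To make the instance comparison above concrete, here is a back-of-envelope sketch. The only number taken from the discussion is the ~1.1 GB/s fio result; the replication fan-out and the per-byte consumer/tiered-storage costs are illustrative assumptions, not measurements of redpanda:

```python
# Back-of-envelope: does the NIC or the disk saturate first?
# Assumptions (illustrative): replication factor 3 (leader ships each byte
# to 2 followers), 1x consumer egress, 1x tiered-storage upload.

def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert network line rate (gigabits/s) to gigabytes/s."""
    return gbps / 8.0

disk_gb_s = 1.1                    # fio, xfs raid0 of local NVMe (from the thread)
net_12xl = gbps_to_gb_per_s(50)    # i3en.12xlarge NIC -> 6.25 GB/s
net_6xl = gbps_to_gb_per_s(25)     # i3en.6xlarge NIC  -> 3.125 GB/s

def network_cost(ingress_gb_s: float, fanout=2, consumers=1, tiered=1) -> float:
    """Approximate NIC bandwidth a leader needs to sustain a given
    producer ingress rate, under the assumptions above."""
    return ingress_gb_s * (fanout + consumers + tiered)

# Writing at full disk speed needs roughly 4.4 GB/s of network:
needed = network_cost(disk_gb_s)
print(needed)              # 4.4
print(needed < net_12xl)   # True  -> 12xlarge: disk/CPU bound first
print(needed > net_6xl)    # True  -> 6xlarge: network bound first
```

Under these (assumed) multipliers, the 12xlarge has NIC headroom left when the disk is saturated, while the 6xlarge runs out of NIC before the disk, matching the "bottleneck shifts from subsystem to subsystem" point.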
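On the heartbeat side, one way to picture the "reduce heartbeat bytes on the wire" idea is coalescing: with thousands of raft groups (partitions), sending one RPC per group per interval is wasteful, so per-group heartbeats destined for the same peer can be batched into a single message. This is only a sketch of the general technique, not redpanda's actual wire format or compressor:

```python
# Sketch: coalesce per-raft-group heartbeats by destination node, so each
# node pair exchanges one batched message per interval instead of one RPC
# per partition. Names and fields here are hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Heartbeat:
    group_id: int     # raft group (one per partition)
    term: int
    target_node: int  # peer that should receive this heartbeat

def coalesce(heartbeats):
    """Return {target_node: [(group_id, term), ...]} — one batch per peer."""
    batches = defaultdict(list)
    for hb in heartbeats:
        batches[hb.target_node].append((hb.group_id, hb.term))
    return dict(batches)

hbs = [Heartbeat(1, 5, 2), Heartbeat(2, 7, 2), Heartbeat(3, 5, 3)]
batches = coalesce(hbs)
# node 2 gets one message carrying heartbeats for groups 1 and 2
```

A "lossy" compressor can then shrink each batch further, e.g. by eliding fields that match the common case and only encoding deltas, at the cost of occasionally resending full state.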
-
I watched Alex's presentation Co-Designing Raft + Thread-per-Core Execution Model with some interest and learned a lot. But I'm left with kind of a high-level and probably naïve question. My intuition is that the bottleneck in any Raft-like (i.e. CP) system is always going to be the inter-node communication required for consensus, and that this would dominate by orders of magnitude any gains you could get by speeding up local I/O performance. What about this system am I missing, such that the optimizations discussed in the presentation have meaningful impact? I'm sure it's something! :)