[NEW] Primary replica role at the slot level #1372
Comments
Even if we support the idea directionally, there are many things to consider. I opened this issue to hear initial feedback on the direction, to see whether there is community interest, and to learn whether this was discussed before.
I've heard a lot of feedback that folks like the way Hazelcast supports "homogeneous clusters". The idea is that all nodes own some set of data and replicate it to others, which sounds very similar to what you're proposing. Overall I really like this, and it would dramatically simplify a lot of setups. We did also recently introduce the concept of a "shard", which makes this flexible slot ownership more difficult. We could let a shard have multiple primaries over different parts of the data, or allow nodes to be part of multiple shards. It would also hopefully keep the replication stream pretty straightforward. We also effectively need per-slot replication for atomic slot migration.
Interesting, I hadn't thought of non-homogeneous hardware explicitly. I was thinking of a mirror of the same problem: non-homogeneous slots (in CPU/memory usage). It is essentially the same technical problem but with a different user/business problem (utilizing non-homogeneous hardware vs. handling unbalanced slots).
+1 to this idea. I think this is the right long-term vision for a feature like atomic slot migration. The main challenge is that the code is tightly coupled to the idea of a node being either just a primary or just a replica, and this assumption shapes how some things are designed (e.g., we assume that serving traffic while full syncing isn't very important, since the node is just a replica). Brainstorming some incremental improvements that we can make to reach this goal:
Regarding appcompat - I think probably the best path forward is to add a CLIENT CAPA for "replication-at-slot-level" or something like that. If it isn't supported, we can sanitize the CLUSTER SLOTS output as follows:
We can still have features like atomic slot migration at that point, but clients wouldn't be able to observe the intermediary state of "node B owns slot N and is a replica of slot M in node A" and instead would just see an immediate transition from "node B owns slot N and node A owns slot M" to "node B owns slot N and M and node A owns nothing".
@madolson can you please add the label "Client support required"? It will make it easier for client maintainers to track these changes.
The problem/use-case that the feature addresses
Currently, replicas provide availability, increased durability (from a domain failure perspective), and performance improvements when using Read from Replica (RFR). However, that performance scaling is limited to stale reads and does not extend to regular reads and writes. Many customers do not use RFR for various reasons. Additionally, when a primary node fails, all write traffic to that node fails, potentially requiring application-level logic to handle the failure.
Description of the feature
We propose redefining role assignments from the node level to the slot level. In this model, a node can be the primary for certain slots and a replica for others. This involves adjusting the codebase so that any primary/replica designations are applied to slots rather than nodes. Essentially, the node becomes a logical container of compute, memory, and services that manages atomic data entities (slots).
With this approach, we can scale the performance of both writes and reads based on the number of nodes in a shard, eliminating the concept of a replica node. If a node fails, only the slots for which it was the primary are directly impacted, improving fault granularity and isolation.
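To make the slot-level model concrete, here is a minimal sketch in Python. It is illustrative only and not taken from the codebase: the `SlotRoles` type, the helper names, and the node names `A`, `B`, `C` are all hypothetical. It shows roles attached to slots rather than nodes, how write ownership spreads across the nodes of a shard, and which slots are directly impacted when one node fails.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SlotRoles:
    """Hypothetical per-slot role record: one primary node plus replica nodes."""
    primary: str
    replicas: List[str] = field(default_factory=list)


def primaries_per_node(assignment: Dict[int, SlotRoles]) -> Dict[str, int]:
    """Count how many slots each node serves as primary (its share of writes)."""
    return dict(Counter(roles.primary for roles in assignment.values()))


def slots_losing_primary(assignment: Dict[int, SlotRoles], failed_node: str) -> List[int]:
    """Return the slots whose primary is the failed node, i.e. the directly impacted slots."""
    return sorted(s for s, roles in assignment.items() if roles.primary == failed_node)


# A hypothetical shard of three nodes, each primary for some slots and replica for others.
assignment = {
    0: SlotRoles("A", ["B", "C"]),
    1: SlotRoles("B", ["C", "A"]),
    2: SlotRoles("C", ["A", "B"]),
    3: SlotRoles("A", ["B", "C"]),
    4: SlotRoles("B", ["A", "C"]),
    5: SlotRoles("C", ["A", "B"]),
}

print(primaries_per_node(assignment))
print(slots_losing_primary(assignment, "A"))
```

In this toy layout every node takes writes for two slots, and a failure of node `A` affects only slots 0 and 3; the other four slots keep their primaries.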
The recent introduction of dict-per-slot has shifted many processes to operate at the slot level, which facilitates the transition to this model. As part of this feature, we will need to continue down this path for other flows in the system, such as bgsave.
This change would require client support. For clients that lack that support, we can initially implement the feature in a degenerate form where all slots in a shard have the same primary node, maintaining backward compatibility.
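As a rough illustration of the degenerate form, the sketch below models a shard as a plain mapping from slot number to primary node. The function names and node names are hypothetical; this is one way to picture the fallback, not a proposed implementation.

```python
from typing import Dict


def is_degenerate(slot_primary: Dict[int, str]) -> bool:
    """True when every slot in the shard has the same primary node."""
    return len(set(slot_primary.values())) <= 1


def to_degenerate(slot_primary: Dict[int, str], shard_primary: str) -> Dict[int, str]:
    """Collapse a slot-level assignment so one node is primary for all slots."""
    return {slot: shard_primary for slot in slot_primary}


# Hypothetical shard where primaries are spread across nodes A, B and C.
mixed = {0: "A", 1: "B", 2: "C", 3: "A"}
legacy = to_degenerate(mixed, "A")

print(is_degenerate(mixed))
print(is_degenerate(legacy))
```

The collapsed mapping matches today's node-level model, where one node is the primary for everything its shard owns.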
Additional information
An added benefit of this approach is the potential to reduce code complexity by unifying the code paths of replication and slot migration, which are currently two similar processes for maintaining data consistency between nodes.
For Cluster Mode Disabled (CMD), we can consider all data to reside in slot 0. In the long term, we might consider enabling slots (or logical grouping) for CMD, allowing customers to gain the benefits of this model without adopting Cluster Mode Enabled (CME).
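A small sketch of the slot 0 idea, under stated assumptions: in cluster mode a key maps to CRC16(key) mod 16384 (the XMODEM variant, which Python exposes as `binascii.crc_hqx` with an initial value of 0), while with cluster mode disabled every key is treated as living in slot 0. The `key_slot` function and its `cluster_enabled` parameter are hypothetical, and hash-tag handling (`{...}`) is deliberately left out for brevity.

```python
import binascii

CLUSTER_SLOTS = 16384  # number of hash slots in cluster mode


def key_slot(key: bytes, cluster_enabled: bool) -> int:
    """Map a key to a slot; with cluster mode disabled, everything is slot 0.

    Simplification: hash tags ("{...}") are ignored in this sketch.
    """
    if not cluster_enabled:
        return 0
    # crc_hqx with initial value 0 computes CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % CLUSTER_SLOTS


print(key_slot(b"foo", cluster_enabled=True))
print(key_slot(b"foo", cluster_enabled=False))
```

With a single slot, slot-level role assignment in CMD reduces to the existing node-level primary/replica model.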