Rack-aware load balancing #31
Comments
@nyh, I don't know the details of the Alternator implementation, but I know that it uses LWT under the hood. Recently we have been asked to make sure that regular gocql drivers avoid load balancing logic for [...] If so, then we need to do token-aware load balancing as well to get more performance out of the cluster.
@dkropachev this is true - we have #11 for a token-aware load balancer, but as I noted there, there is a difficulty: it means we'll need to monkey-patch the AWS SDK at a different place than we do today, to let it see the full query - and parse it (unfortunately) - to decide where to route a write request (for reads, LWT is not relevant). We also have scylladb/scylladb#5703 on the Scylla side, which says that if the AWS SDK isn't token-aware (as it isn't today), we can rescue the contention problem by forwarding writes to the "right" node. But you're right - if the load balancer is rack-aware (as this issue proposes), and different racks send writes to different nodes, we will end up with more LWT contention. I don't know what to do about this, other than making rack-aware load balancing optional. Personally, I think the LWT whole-partition-contention problems need to be fixed (scylladb/scylladb#16261) instead of trying to work around them in the load balancer.
CC @kostja - thoughts?
If we have plans to enable LWT on tablets, then either we need to bring drivers to the Alternator load balancers, or we need to expose routing info via an API.
@dkropachev - any progress on this? (regardless of tablets)
@mykaul, it is scheduled for the next sprint.
It'd be great if it can be prioritized and delivered sooner. It has a material impact. |
@dkropachev , @roydahan - what's the status of this? |
@mykaul do you know which of the languages and SDK versions that we support you want this feature to appear in first? |
Java for sure. Not sure about SDK version. |
The first PRs are only for Java. |
Java, both SDK versions. |
Implemented in Java, see pull request #40. |
Currently, all our Alternator load-balancing implementations in this repository ignore rack (a.k.a. Amazon availability zone, AZ) information: we use the "/localnodes" API to get a list of all live Scylla servers in this data center (a.k.a. Amazon region), and send the request to one of them.
But when the Scylla DC has multiple racks on different Amazon AZs, cross-AZ traffic costs money. It is cheaper for the client running on a specific AZ to send the request to a random node on the same AZ - and not to nodes on other AZs. This issue requests that the load balancers do this: Prefer to send requests to a node on the client's rack, not a node on other racks.
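To make the requested behavior concrete, here is a minimal Java sketch of the selection step only. It is not the code from any pull request in this repository. The class `RackAwarePicker`, its method names, and the idea that the client already holds a node-to-rack map are all assumptions made for illustration. How that map gets filled (for example from "/localnodes") is left out, and falling back to all nodes when the client's rack has no live node is an assumed policy.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of this repository or of the AWS SDK.
final class RackAwarePicker {
    private RackAwarePicker() {}

    // Returns the live nodes on the client's rack. If there are none
    // (or the client's rack is unknown), returns all live nodes so that
    // requests still go through, at the cost of cross-AZ traffic.
    static List<URI> candidates(Map<URI, String> rackByNode, String clientRack) {
        List<URI> sameRack = new ArrayList<>();
        for (Map.Entry<URI, String> e : rackByNode.entrySet()) {
            if (clientRack != null && clientRack.equals(e.getValue())) {
                sameRack.add(e.getKey());
            }
        }
        return sameRack.isEmpty() ? new ArrayList<>(rackByNode.keySet()) : sameRack;
    }

    // Picks one candidate uniformly at random for the next request.
    static URI pick(Map<URI, String> rackByNode, String clientRack) {
        List<URI> nodes = candidates(rackByNode, clientRack);
        if (nodes.isEmpty()) {
            throw new IllegalStateException("no live nodes known");
        }
        return nodes.get(ThreadLocalRandom.current().nextInt(nodes.size()));
    }
}
```

With a map such as `{http://10.0.0.1:8000 -> "az-a", http://10.0.0.2:8000 -> "az-b"}`, a client on "az-a" would always be routed to the first node, while a client on an unlisted rack would be spread over both. As the comments note, this can increase LWT contention on writes, which is one reason to keep it optional.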
See scylladb/scylladb#12147 on a server-side modification to "/localnodes" that can help us get the list of nodes in the current AZ.
Beyond server-side modifications, the following two points will also need to be considered: