add cluster support #122

zklgame · 2024-03-15T13:43:04Z

Why make this pull request?

Add cluster option for the Async service.

How to test this pull request?

go run cmd/server/main.go --config ./config/development-postgres-cluster.yaml

TODO in next MRs

- Recovery for a failed async server: when there is any member change, use delegates to call a new API on each async server to add or remove queues.
- Remove all uses of defaultShard and store tasks of the same execution in the same shard.
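
As a rough illustration of the first TODO item, the queues to add or remove after a member change could be derived by comparing the shards a node owned before and after the change. This is only a sketch: `diffShards` and the `int32` shard ID type are hypothetical and are not part of this PR.

```go
package main

import "sort"

// diffShards compares the shards a node owned before and after a membership
// change and reports which shards need queues added or removed.
// The name and the int32 shard ID type are assumptions for illustration.
func diffShards(before, after []int32) (added, removed []int32) {
	prev := make(map[int32]bool, len(before))
	for _, id := range before {
		prev[id] = true
	}
	next := make(map[int32]bool, len(after))
	for _, id := range after {
		next[id] = true
	}
	// Shards owned now but not before need new queues.
	for id := range next {
		if !prev[id] {
			added = append(added, id)
		}
	}
	// Shards owned before but not now should have their queues stopped.
	for id := range prev {
		if !next[id] {
			removed = append(removed, id)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(added, func(i, j int) bool { return added[i] < added[j] })
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	return added, removed
}
```

A member-change callback could call `diffShards` with the old and new ownership lists and then start or stop the matching queues.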

codecov · 2024-03-15T16:58:25Z

Codecov Report

Attention: Patch coverage is 34.32836%, with 220 lines in your changes missing coverage. Please review.

Project coverage is 60.11%. Comparing base (c5da3b2) to head (59eb787).

Files	Patch %	Lines
service/async/service_impl.go	30.12%	109 Missing and 7 partials ⚠️
cluster/event_delegate.go	0.00%	30 Missing ⚠️
config/config.go	15.62%	25 Missing and 2 partials ⚠️
cluster/delegate.go	0.00%	26 Missing ⚠️
service/async/default_server.go	69.38%	11 Missing and 4 partials ⚠️
cmd/server/bootstrap/xcherry.go	55.55%	3 Missing and 1 partial ⚠️
engine/immediate_task_queue.go	0.00%	1 Missing ⚠️
engine/timer_task_queue.go	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #122      +/-   ##
==========================================
- Coverage   61.38%   60.11%   -1.28%     
==========================================
  Files          88       91       +3     
  Lines        6936     7200     +264     
==========================================
+ Hits         4258     4328      +70     
- Misses       2404     2591     +187     
- Partials      274      281       +7


longquanzheng

For the hierarchy of the cluster mode, it would be nice to have:

- Each async service process is a standard deployment in K8s (Marvin), so the number of ports needs to be static.
- Each async service process is composed of several shards, and each shard is composed of an immediateQueue + delayQueue.

longquanzheng · 2024-03-18T16:28:13Z

.github/workflows/ci-postgres14-cluster.yaml

+        run: sleep 5 && make install-schema-postgres
+
+      - name: Start Postgres server # then run in the background
+        run: ./xcherry-server --config ./config/development-postgres-cluster.yaml&


You can split this into multiple PRs. The end goal is to run multiple processes as different nodes of the cluster:

./xcherry-server --config ./config/development-postgres-cluster-api-node-a.yaml &
./xcherry-server --config ./config/development-postgres-cluster-async-node-a.yaml &
./xcherry-server --config ./config/development-postgres-cluster-async-node-b.yaml &
./xcherry-server --config ./config/development-postgres-cluster-async-node-c.yaml &

It includes:

- an API node
- three async nodes: node A, node B, and node C

longquanzheng · 2024-03-18T16:32:36Z

cluster/delegate.go

+	"log"
+)
+
+type ClusterDelegate struct {


What is this for? It doesn't seem to be used yet. Maybe add a comment if you want to keep it for later PRs.

It's used to save the metadata. We use the metadata to get the target client address of a node.
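
To make the reply concrete, here is a minimal sketch of how node metadata could carry a client address. The `NodeMeta(limit int) []byte` signature is the one memberlist expects from a `Delegate`, and the names `ParseClusterDelegateMetaData` and `ServerAddress` appear in this PR's diff. The JSON encoding, the struct layout, and the function bodies are assumptions, and the other `Delegate` methods are omitted.

```go
package main

import "encoding/json"

// ClusterDelegateMetaData is the payload each node advertises to its peers.
// The JSON encoding is an assumption; the PR may encode it differently.
type ClusterDelegateMetaData struct {
	ServerAddress string `json:"serverAddress"`
}

// ClusterDelegate holds the metadata of the local node.
type ClusterDelegate struct {
	Meta ClusterDelegateMetaData
}

// NodeMeta returns the encoded metadata, or nil if it would exceed limit.
func (d *ClusterDelegate) NodeMeta(limit int) []byte {
	b, err := json.Marshal(d.Meta)
	if err != nil || len(b) > limit {
		return nil // assumed fallback: advertise no metadata
	}
	return b
}

// ParseClusterDelegateMetaData decodes metadata received from a peer.
// Malformed input yields a zero value (assumed behaviour).
func ParseClusterDelegateMetaData(data []byte) ClusterDelegateMetaData {
	var meta ClusterDelegateMetaData
	if err := json.Unmarshal(data, &meta); err != nil {
		return ClusterDelegateMetaData{}
	}
	return meta
}
```

A peer that receives a join event can then call `ParseClusterDelegateMetaData(node.Meta)` and read `ServerAddress` to find where to send requests.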

longquanzheng · 2024-03-18T16:38:52Z

cluster/event_delegate.go

+
+type ClusterEventDelegate struct {
+	consistent    *hashring.HashRing
+	ServerAddress string


Is ServerAddress the address of the current node, or of other nodes?

It may be better to have a constructor function:

func NewClusterEventDelegate
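
A minimal sketch of such a constructor follows. It assumes `ServerAddress` is the local node's address, which is exactly the open question above, and it leaves out the hash ring field from the diff. The parameter name and the validation step are illustrative only.

```go
package main

import (
	"fmt"
	"net"
)

// ClusterEventDelegate mirrors the struct in the diff, minus the hash ring.
type ClusterEventDelegate struct {
	// ServerAddress is assumed here to be the address of the local node.
	ServerAddress string
}

// NewClusterEventDelegate validates the local address and builds the delegate.
// Documenting the parameter makes the meaning of ServerAddress explicit.
func NewClusterEventDelegate(localServerAddress string) (*ClusterEventDelegate, error) {
	// Require a host:port pair so a misconfigured address fails early.
	if _, _, err := net.SplitHostPort(localServerAddress); err != nil {
		return nil, fmt.Errorf("invalid server address %q: %w", localServerAddress, err)
	}
	return &ClusterEventDelegate{ServerAddress: localServerAddress}, nil
}
```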

longquanzheng · 2024-03-18T16:44:05Z

cluster/event_delegate.go

+	meta := ParseClusterDelegateMetaData(node.Meta)
+
+	hostPort := BuildHostAddress(node)
+	log.Printf("ClusterEvent JOIN %s: advertise address %s, server address %s", d.ServerAddress, hostPort, meta.ServerAddress)


We should use our logger (unless that is impossible, for example during startup before the logger is available). This log call only prints to stderr/stdout. Our logger can be configured with different implementations and uses a better format.
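
One way to do this is to inject a logger into the delegate. The sketch below uses a hypothetical `Logger` interface as a stand-in, since the project's real logger API is not shown in this thread. `memoryLogger` exists only to make the example checkable.

```go
package main

// Logger is a hypothetical stand-in for the project's logger interface.
type Logger interface {
	Info(msg string, keyvals ...any)
}

// logEntry is one recorded log call.
type logEntry struct {
	msg     string
	keyvals []any
}

// memoryLogger records entries in memory; useful in tests.
type memoryLogger struct {
	entries []logEntry
}

func (m *memoryLogger) Info(msg string, keyvals ...any) {
	m.entries = append(m.entries, logEntry{msg: msg, keyvals: keyvals})
}

// ClusterEventDelegate carries an injected logger instead of using the
// standard library log package directly.
type ClusterEventDelegate struct {
	ServerAddress string
	logger        Logger
}

// logJoin emits the same information as the log.Printf call in the diff,
// but as structured key/value pairs through the injected logger.
func (d *ClusterEventDelegate) logJoin(advertiseAddress, peerServerAddress string) {
	d.logger.Info("ClusterEvent JOIN",
		"self", d.ServerAddress,
		"advertiseAddress", advertiseAddress,
		"serverAddress", peerServerAddress,
	)
}
```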

longquanzheng · 2024-03-18T16:47:59Z

service/async/service_impl.go

@@ -6,66 +6,117 @@ package async
 import (
 	"context"
 	"fmt"
+	"github.com/hashicorp/memberlist"


Looks like a lot of the membership implementation code is added into this file.

It would be great if we could abstract that code into a ClusterMembership interface and implementation in a separate place, so that this service_impl just uses it.
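
A rough sketch of what that abstraction might look like. Only the name `ClusterMembership` comes from the comment above; the method set is a guess. `staticMembership` is a fixed-list implementation for tests that uses simple modulo placement, not the consistent hashing used in the PR.

```go
package main

import (
	"errors"
	"sort"
)

// ClusterMembership hides how members are discovered and how shards map to
// nodes. The method set is hypothetical.
type ClusterMembership interface {
	// Members returns the addresses of the known nodes in sorted order.
	Members() []string
	// OwnerOf returns the address of the node that owns the given shard.
	OwnerOf(shardID int32) (string, error)
}

// staticMembership is a fixed-list implementation for tests. A
// memberlist-backed implementation could satisfy the same interface.
type staticMembership struct {
	nodes []string
}

// NewStaticMembership copies and sorts the node list so placement is stable.
func NewStaticMembership(nodes []string) ClusterMembership {
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	return &staticMembership{nodes: sorted}
}

func (s *staticMembership) Members() []string {
	return append([]string(nil), s.nodes...)
}

func (s *staticMembership) OwnerOf(shardID int32) (string, error) {
	if len(s.nodes) == 0 {
		return "", errors.New("cluster has no members")
	}
	// Modulo placement keeps the sketch short; it is not consistent hashing.
	return s.nodes[uint32(shardID)%uint32(len(s.nodes))], nil
}
```

With this split, the service implementation would depend only on the interface, and the memberlist details would live in the cluster package.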

longquanzheng · 2024-03-18T16:50:09Z

service/async/default_server.go

+	serverAddresses := strings.Split(cfg.AsyncService.ClientAddress, ",")
+	advertiseAddresses := []string{""}
+
+	if cfg.AsyncService.Mode == config.AsyncServiceModeConsistentHashingCluster {


btw, we can rename AsyncServiceModeConsistentHashingCluster to AsyncServiceModeCluster :D

longquanzheng · 2024-03-18T17:00:13Z

config/config.go

+		// Used in AsyncServiceConfig with AsyncServiceModeConsistentHashingCluster mode only.
+		// Multiple Address seperated by comma.
+		ClusterAddresses string `yaml:"clusterAddresses"`
+		// Used in AsyncServiceConfig with AsyncServiceModeConsistentHashingCluster mode only.
+		// These are addresses used by memberlist.
+		ClusterAdvertiseAddresses string `yaml:"clusterAdvertiseAddresses"`


I would suggest putting them under AsyncServiceConfig.ClusterConfig.

Also, let's add more detail on what these configs are for and how to use them.
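
For illustration, the nested layout could look like the sketch below. The two field names and their yaml tags are taken from the diff. The `mode` and `clusterConfig` tag names, the mode value `"cluster"`, and the validation rule are assumptions.

```go
package main

import (
	"errors"
	"strings"
)

// AsyncServiceModeCluster is a placeholder value; the real constant lives in
// config/config.go.
const AsyncServiceModeCluster = "cluster"

// ClusterConfig groups the settings that only apply in cluster mode.
type ClusterConfig struct {
	// ClusterAddresses holds multiple addresses separated by commas.
	ClusterAddresses string `yaml:"clusterAddresses"`
	// ClusterAdvertiseAddresses holds the addresses used by memberlist.
	ClusterAdvertiseAddresses string `yaml:"clusterAdvertiseAddresses"`
}

// AsyncServiceConfig shows only the fields relevant to this sketch.
type AsyncServiceConfig struct {
	Mode          string         `yaml:"mode"`          // tag name assumed
	ClusterConfig *ClusterConfig `yaml:"clusterConfig"` // tag name assumed
}

// splitAddresses splits a comma-separated list and drops empty items.
func splitAddresses(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// Validate rejects a cluster-mode config without cluster addresses
// (an assumed rule, shown only to illustrate the nesting).
func (c AsyncServiceConfig) Validate() error {
	if c.Mode != AsyncServiceModeCluster {
		return nil
	}
	if c.ClusterConfig == nil {
		return errors.New("clusterConfig is required in cluster mode")
	}
	if len(splitAddresses(c.ClusterConfig.ClusterAddresses)) == 0 {
		return errors.New("clusterAddresses must list at least one address")
	}
	return nil
}
```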

longquanzheng · 2024-03-18T17:03:58Z

service/async/default_server.go

+	var servers []Server
+	addressToServerMap := map[string]Server{}
+
+	serverAddresses := strings.Split(cfg.AsyncService.ClientAddress, ",")


cfg.AsyncService.ClientAddress is the address the API service uses to call the async service. The API service needs to notify a specific node/shard of the async service, but it doesn't need to use memberlist for the lookup. Instead, the API service sends the request to a random node, and the receiving async service node may forward the request to the node that owns the shard.

Because the API service doesn't need to know which node owns the shard, it can send requests to the async service behind a load balancer (LBS).

The current flow is:

- the API server sends the internal request to a random async server
- that async server forwards the request to the target async server

This is the same as what you described above. See the code near the comment // randomly send the request to an async service.

zklgame force-pushed the add_memberlist branch 4 times, most recently from a418028 to d637ea9 Compare March 15, 2024 16:52

add cluster support

01581eb

zklgame force-pushed the add_memberlist branch from d637ea9 to 01581eb Compare March 15, 2024 16:55

zklgame marked this pull request as ready for review March 15, 2024 16:58

zklgame requested a review from longquanzheng March 15, 2024 16:59

zklgame force-pushed the add_memberlist branch 7 times, most recently from 46f0bbb to 931d428 Compare March 17, 2024 11:53

add CI test for cluster mode

00d59a5

zklgame force-pushed the add_memberlist branch from 931d428 to 00d59a5 Compare March 17, 2024 12:52

add delegate

59eb787

longquanzheng reviewed Mar 18, 2024

zklgame closed this Mar 19, 2024
