
feed downstream by relay log at startup if need #883

Merged
merged 16 commits into pingcap:master from relay
Feb 7, 2020

Conversation

july2993
Contributor

@july2993 july2993 commented Jan 14, 2020

What problem does this PR solve?

Support recovering at startup from the relay log.
Follows #847 and #849.

What is changed and how it works?

  • add a status field to the checkpoint
    • from this status we can tell whether dirty data may have been written to the downstream past the commit ts
  • group the relay-related config items
    • remove the relay-read-buf-size item
    • rename relay-log-size to max-file-size
  • make relay.Reader return the original binlog format directly
    • this gives users of the reader more flexibility
  • feed the downstream from the relay log at startup if needed (see the sketch after this list)
    • mainly implemented in drainer/relay.go
    • this path is tried only when initializing the PD client fails; otherwise drainer assumes the upstream cluster is normal and works as before
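
A minimal sketch of that startup decision, assuming the helper is called feedByRelayLogIfNeed and the checkpoint exposes a Status() accessor as in the diffs below; the signatures here are illustrative, not the merged code:

```go
package drainer

// Checkpoint is a minimal stand-in for the real checkpoint type, for this sketch only.
type Checkpoint interface {
	Status() int
	Close() error
}

// StatusNormal (renamed to StatusConsistent later in this PR) means the previous
// run quit normally and everything up to the checkpoint ts is already downstream.
const StatusNormal = 0

// feedByRelayLogIfNeed decides at startup whether to replay the relay log.
// pdErr is the error from initializing the PD client (nil on success) and
// feed replays the relay log into the downstream; both are assumptions for illustration.
func feedByRelayLogIfNeed(pdErr error, cp Checkpoint, feed func() error) error {
	if pdErr == nil {
		// The upstream cluster is reachable: work exactly as before this PR.
		return nil
	}
	defer cp.Close()

	if cp.Status() == StatusNormal {
		// The previous run stopped at a consistent point; nothing to recover.
		return nil
	}

	// Otherwise replay the relay log (in safe mode) until the downstream is
	// consistent, then the checkpoint status can be set back to consistent.
	return feed()
}
```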

Check List

Tests

  • Unit test
  • Manual test (add detailed scripts or steps below)
  • kill -9 drainer & pd while inserting data into the upstream
  • check that the checkpoint status is Running
  • start drainer again
  • check that it recovers and the status is updated to Normal

Side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to update the tidb-ansible repository
  • Need to be included in the release note

@july2993 july2993 changed the title from Relay to feed downstream by relay log at startup if need Jan 14, 2020
@july2993 july2993 requested review from suzaku and zier-one January 14, 2020 08:22
@july2993
Contributor Author

/run-all-tests

@suzaku
Contributor

suzaku commented Feb 4, 2020

@july2993 Please resolve conflicts.

@july2993
Contributor Author

july2993 commented Feb 4, 2020

> @july2993 Please resolve conflicts.

resolved

Resolved review threads (now collapsed) on: drainer/checkpoint/util.go, drainer/config.go, drainer/sync/mysql.go, drainer/relay/reader.go, and drainer/relay.go.

var _ = check.Suite(&relaySuite{})

type noOpLoader struct {
Contributor

Embed the interface so that we don't have to repeat methods we don't need in this test?

Contributor Author

All of them are needed except SaveMode, so I will keep this.
Instead, we should make pkg/Loader easy to test (rather than turning it into an interface just for testing); we could use the struct directly and add a method that makes it silently accept input and return success.
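
For readers outside this thread, a tiny hypothetical illustration of the embedding suggestion; the Loader interface and methods here are placeholders, not the real pkg/loader API:

```go
// Loader is a placeholder interface standing in for the loader used by drainer.
type Loader interface {
	Run() error
	Close()
	SetSafeMode(bool)
}

// noOpLoader embeds the interface, so only the methods the test actually calls
// need to be overridden; calling an unimplemented one would panic, which is an
// acceptable trade-off in tests.
type noOpLoader struct {
	Loader
}

// Run is the only method this test double overrides.
func (noOpLoader) Run() error { return nil }
```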

@@ -19,6 +19,14 @@ import (
"go.uber.org/zap"
)

const (
// StatusNormal means server quit normally, data <= ts is synced to downstream
StatusNormal int = 0
Collaborator

@IANTHEREAL IANTHEREAL Feb 4, 2020

Worth a better name, like StatusConsistent or StatusDrained.

In fact, I'm not sure what it means: does it mean the binlog is drained, or that a consistent replication state has been reached? Which one is right?

Collaborator

Good Change!

Contributor Author

Renamed to StatusConsistent: 002d449
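
A minimal sketch of how the status constants might read after the rename; StatusConsistent = 0 follows the original StatusNormal value above, while the StatusRunning name and value are assumptions based on the manual-test steps in the description:

```go
const (
	// StatusConsistent (formerly StatusNormal) means drainer quit normally;
	// everything with commit ts <= checkpoint ts is synced to the downstream.
	StatusConsistent int = 0

	// StatusRunning (name and value assumed here) means drainer is running or was
	// killed abruptly; the downstream may contain dirty data past the checkpoint ts.
	StatusRunning int = 1
)
```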

@@ -68,6 +74,16 @@ func newMysql(cfg *Config) (CheckPoint, error) {
		return nil, errors.Annotatef(err, "exec failed, sql: %s", sql)
	}

	if sp.clusterID == 0 {
		id, err := getClusterID(db, sp.schema, sp.table)
Collaborator

@IANTHEREAL IANTHEREAL Feb 4, 2020

This logic is very weird.
To handle the situation where the upstream cluster is completely down, drainer needs a correct way to fetch the cluster ID, but the current implementation is not satisfying; for example, drainer can't allow multiple entries to be stored in the checkpoint table.

Collaborator

I filed an issue about it: #889

	}{
		{"no row", nil, 0, true, ErrNoCheckpointItem},
		{"on row", []uint64{1}, 1, false, nil},
		{"multi row", []uint64{1, 2}, 0, true, nil},
Collaborator

Should the checkSpecifiedErr be nil here?

Contributor Author

Yes; true, nil means the call will return some error, but we don't check what kind of error it is.
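
To make the three cases above concrete, here is a hedged sketch of the behaviour they describe; getClusterID and ErrNoCheckpointItem appear in the diff and test table, but the body, the SQL, and the clusterID column name are assumptions for illustration only:

```go
package checkpoint

import (
	"database/sql"
	"fmt"

	"github.com/pingcap/errors"
)

// ErrNoCheckpointItem is returned when the checkpoint table contains no row at all.
var ErrNoCheckpointItem = errors.New("no checkpoint item found")

// getClusterID reads the cluster ID recorded in the checkpoint table, so drainer
// can still identify the upstream cluster when PD is unreachable (sketch only).
func getClusterID(db *sql.DB, schema, table string) (uint64, error) {
	rows, err := db.Query(fmt.Sprintf("SELECT clusterID FROM `%s`.`%s`", schema, table))
	if err != nil {
		return 0, errors.Trace(err)
	}
	defer rows.Close()

	var id uint64
	count := 0
	for rows.Next() {
		if err := rows.Scan(&id); err != nil {
			return 0, errors.Trace(err)
		}
		count++
	}
	if err := rows.Err(); err != nil {
		return 0, errors.Trace(err)
	}

	switch count {
	case 0:
		// "no row": nothing has ever been checkpointed.
		return 0, ErrNoCheckpointItem
	case 1:
		// "on row" (one row): exactly one cluster is recorded, use it.
		return id, nil
	default:
		// "multi row": ambiguous, refuse to guess (see issue #889).
		return 0, errors.New("multiple entries found in the checkpoint table")
	}
}
```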

@@ -139,9 +148,6 @@ func NewConfig() *Config {
	fs.StringVar(&cfg.SyncerCfg.IgnoreSchemas, "ignore-schemas", "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql", "disable sync those schemas")
	fs.IntVar(&cfg.SyncerCfg.WorkerCount, "c", 16, "parallel worker count")
	fs.StringVar(&cfg.SyncerCfg.DestDBType, "dest-db-type", "mysql", "target db type: mysql or tidb or file or kafka; see syncer section in conf/drainer.toml")
	fs.StringVar(&cfg.SyncerCfg.RelayLogDir, "relay-log-dir", "", "path to relay log of syncer")
Collaborator

Can we keep them?

They may improve ease of use, for example in K8s, since configuration files are always cumbersome.

Contributor Author

reverted

drainer/relay.go Outdated

	defer cp.Close()

	if cp.Status() == checkpoint.StatusNormal {
Collaborator

Does it mean the relay log is only used to bring the downstream to a consistent state?

drainer/relay.go (outdated review thread, resolved)

	log.Info("finish feed by relay log")

	readerErr := <-r.Error()
Collaborator

How about putting the code at L136 ~ L147 before L59? There may be some benefits: no matter why feedByRelayLog exits, the objects get released (e.g. calling r.Close, ld.Close()) in feedByRelayLogIfNeed.

Contributor Author

feedByRelayLog needs to close ld once it has put all the txns into it, and r.Error() is better kept here together with the corresponding r.Run() and r.Txns().

	go func() {
		ld.SetSafeMode(true)
		loaderErr = ld.Run()
		close(loaderQuit)
Collaborator

If ld.Run exits with an error before close(loaderQuit) is executed, could the logic at L120 cause loaderErr to be missed?

Contributor Author

120         case success, ok := <-successTxnC:
121             if !ok {
122                 successTxnC = nil
123                 log.Info("success closed")
124                 continue
125             }
126             lastSuccessTS = success.Metadata.(int64)

It will continue the loop and then run into the loaderQuit case in the next select.

break
}

select {
Contributor

The logic in this select is a bit hard to follow and reason about. Would it be simpler if we just used two separate goroutines for reading and writing?

Contributor Author

Actually, I tried the multi-goroutine style first, but eventually gave up on it; I will keep the current code style.
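
For anyone reading this thread later, a condensed, hypothetical sketch of the single-select pattern being discussed: one goroutine runs the loader while the main loop pushes relay-log txns into it and tracks the last successful commit ts. The channel and method names follow the snippets quoted above; everything else (types, signatures, the translation from relay entries to txns) is assumed for illustration:

```go
// Txn, Reader and Loader are minimal stand-ins for the real drainer/loader types.
type Txn struct{ Metadata interface{} }

type Reader interface {
	Txns() <-chan *Txn // txns decoded from the relay log (translation omitted here)
	Error() <-chan error
}

type Loader interface {
	SetSafeMode(bool)
	Run() error
	Close()
	Input() chan<- *Txn
	Successes() <-chan *Txn
}

func feedByRelayLog(r Reader, ld Loader) (int64, error) {
	var loaderErr error
	loaderQuit := make(chan struct{})

	go func() {
		// Safe mode: replaying the relay log may re-apply txns already downstream.
		ld.SetSafeMode(true)
		loaderErr = ld.Run()
		close(loaderQuit)
	}()

	var lastSuccessTS int64
	txnC, successTxnC := r.Txns(), ld.Successes()

	for txnC != nil || successTxnC != nil {
		select {
		case <-loaderQuit:
			// Loader quit early (usually with an error): stop feeding it.
			return lastSuccessTS, loaderErr
		case txn, ok := <-txnC:
			if !ok {
				// Relay log drained: close the loader input and wait for it to finish.
				txnC = nil
				ld.Close()
				continue
			}
			select {
			case ld.Input() <- txn:
			case <-loaderQuit:
				return lastSuccessTS, loaderErr
			}
		case success, ok := <-successTxnC:
			if !ok {
				successTxnC = nil
				continue
			}
			lastSuccessTS = success.Metadata.(int64)
		}
	}

	<-loaderQuit
	if loaderErr != nil {
		return lastSuccessTS, loaderErr
	}
	return lastSuccessTS, <-r.Error()
}
```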

@july2993 july2993 requested review from IANTHEREAL and suzaku February 5, 2020 15:56
Contributor

@suzaku suzaku left a comment

LGTM

Collaborator

@IANTHEREAL IANTHEREAL left a comment

LGTM

@IANTHEREAL IANTHEREAL merged commit db9bb50 into pingcap:master Feb 7, 2020
@july2993 july2993 deleted the relay branch February 7, 2020 07:46
july2993 added a commit to july2993/tidb-binlog that referenced this pull request Feb 10, 2020
@july2993 july2993 mentioned this pull request Feb 10, 2020