
Commit b1a22d7

pingyu, marsishandsome, peng1999, andylokandy, and JmPotato authored
Update RFC of API v2 according to latest codes (#79)
* RFC: RawKV Batch Export (#76) Signed-off-by: pingyu <[email protected]>
* rawkv bulk load: add description for pause merge (#74)
* rawkv bulk load: add description for pause merge Signed-off-by: Peng Guanwen <[email protected]>
* Update text/0072-online-bulk-load-for-rawkv.md Co-authored-by: Liangliang Gu <[email protected]> Signed-off-by: Peng Guanwen <[email protected]>
* Add future improvements Signed-off-by: Peng Guanwen <[email protected]> Co-authored-by: Liangliang Gu <[email protected]> Signed-off-by: pingyu <[email protected]>
* ref pd#4112: implementation detail of PD Signed-off-by: pingyu <[email protected]>
* ref pd#4112: implementation detail of PD Signed-off-by: pingyu <[email protected]>
* remove raw cf Signed-off-by: Andy Lok <[email protected]> Signed-off-by: pingyu <[email protected]>
* update Signed-off-by: Andy Lok <[email protected]> Signed-off-by: pingyu <[email protected]>
* update pd design Signed-off-by: andylokandy <[email protected]> Signed-off-by: pingyu <[email protected]>
* revert to keyspace_next_id Signed-off-by: andylokandy <[email protected]> Signed-off-by: pingyu <[email protected]>
* RFC: Improve the Scalability of TSO Service (#78) Signed-off-by: pingyu <[email protected]>
* make region size dynamic (#82) Signed-off-by: Jay Lee <[email protected]> Signed-off-by: pingyu <[email protected]>
* update pd url Signed-off-by: andylokandy <[email protected]> Signed-off-by: pingyu <[email protected]>
* address comment Signed-off-by: andylokandy <[email protected]> Signed-off-by: pingyu <[email protected]>
* resolve pd flashback problem Signed-off-by: andylokandy <[email protected]> Signed-off-by: pingyu <[email protected]>
* update rfcs Signed-off-by: Andy Lok <[email protected]> Signed-off-by: pingyu <[email protected]>
* RFC: In-memory Pessimistic Locks (#77)
* RFC: In-memory Pessimistic Locks Signed-off-by: Yilin Chen <[email protected]>
* clarify where to delete memory locks after writing a lock CF KV Signed-off-by: Yilin Chen <[email protected]>
* Elaborate transfer leader handlings and add correctness section Signed-off-by: Yilin Chen <[email protected]>
* add an addition step of proposing pessimistic locks before transferring leader Signed-off-by: Yilin Chen <[email protected]>
* clarify about new leaders of region split Signed-off-by: Yilin Chen <[email protected]>
* Add tracking issue link Signed-off-by: Yilin Chen <[email protected]>
* update design and correctness analysis of lock migration Signed-off-by: Yilin Chen <[email protected]>
* add configurations Signed-off-by: Yilin Chen <[email protected]> Signed-off-by: pingyu <[email protected]>
* propose online unsafe recovery (#91) Signed-off-by: Connor1996 <[email protected]> Signed-off-by: pingyu <[email protected]>
* physical isolation between region (#93) Signed-off-by: Jay Lee <[email protected]> Signed-off-by: pingyu <[email protected]>
* wip Signed-off-by: pingyu <[email protected]>
* update Signed-off-by: pingyu <[email protected]>
* update Signed-off-by: pingyu <[email protected]>
* Apply suggestions from code review Co-authored-by: Xiaoguang Sun <[email protected]> Signed-off-by: pingyu <[email protected]>
* fix case Signed-off-by: pingyu <[email protected]>

Signed-off-by: pingyu <[email protected]>
Signed-off-by: Andy Lok <[email protected]>
Signed-off-by: andylokandy <[email protected]>
Signed-off-by: Jay Lee <[email protected]>
Signed-off-by: Yilin Chen <[email protected]>
Signed-off-by: Connor1996 <[email protected]>
Co-authored-by: Liangliang Gu <[email protected]>
Co-authored-by: Peng Guanwen <[email protected]>
Co-authored-by: Andy Lok <[email protected]>
Co-authored-by: JmPotato <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Yilin Chen <[email protected]>
Co-authored-by: Connor <[email protected]>
Co-authored-by: Xiaoguang Sun <[email protected]>
1 parent fe50384 commit b1a22d7


1 file changed: +131 -138 lines changed


text/0069-api-v2.md

+131 -138
@@ -2,216 +2,200 @@
22

33
## Motivation
44

5-
`API V2` is a set of breaking changes that aim to solve several issues with the current RawKV (hereafter referred to as `API V1`):
5+
`API V2` is a set of breaking changes that aims to solve several issues with the current RawKV (hereafter referred to as `API V1`):
66

7-
1. RawKV is not safe to use along with TxnKV. By solving this, TiDB will be able to support RawKV as the table's storage engine, which will enrich TiDB's use cases.
8-
2. RawKV TTL is controlled by Store configuration. Switching the configuration will cause data corruption in silence.
9-
3. RawKV TTL is encoded into the value by appending an 8-byte UNIX timestamp to the end of the value, which makes it hard to introduce other encodings afterward.
10-
4. It would be nice to be able to deploy multiple applications on one TiKV cluster.
7+
1. RawKV is not safe to use along with TxnKV & TiDB. If this is solved, the three modes can be used in the same cluster, reducing resource and maintenance costs. It even makes it possible for TiDB to support RawKV as the table's storage engine, enriching TiDB's use cases.
8+
2. RawKV TTL is controlled by TiKV configuration. Switching the configuration will silently corrupt data.
9+
3. Keys and values of RawKV are just raw bytes, so it's hard to add more metadata afterward to support more features, such as a keyspace to support multi-tenancy, or a timestamp to support [Change Data Capture].
1110

1211
## Detailed Design
1312

1413
### New key-value codec
1514

16-
This RFC introduces a new key encoding to RawKV and TxnKV, and a new value encoding to RawKV, which will allow RawKV to be used along with TxnKV and also allow TiKV to flexibly add metadata, e.g. the TTL, to a RawKV value.
15+
This RFC introduces a new key encoding to RawKV and TxnKV, and a new value encoding to RawKV, which allow RawKV to be used along with TxnKV and make it flexible to add more fields to the value.
1716

18-
In addition, keys will be contained in keyspaces, where the keys in different keyspaces are totally independent. If keyspace is not specified, the keyspace 'default' will be used.
17+
Since `API V2` changes the storage encoding, switching between `API V1` and `API V2` is not supported while there is non-TiDB data in TiKV. TiDB data is treated specially so that it is not affected by this change.
1918

20-
The `API V2` is enabled by a switch on PD. Since it changes the storage encoding, switching between `API V1` and `API V2` is not supported while there is non-TiDB data in TiKV. TiDB data is treated specially so that it is not affected by the change.
19+
#### Key Encoding
2120

22-
#### Key Encode
21+
Once `API V2` is enabled, keys will start with either:
2322

24-
Once the `API V2` is enabled, the key will start with either:
23+
1. `m` or `t`: TiDB keys.
24+
2. `x`: TxnKV keys.
25+
3. `r`: RawKV keys.
2526

26-
1. `m` or `t`: TxnKV key. Used by TiDB.
27-
2. `k{keyspace prefix id}x`: TxnKV key.
28-
3. `k{keyspace prefix id}r`: RawKV key.
27+
`x` and `r` are mode prefixes that indicate which mode a key belongs to. The mode prefix is followed by 3 bytes for the keyspace. So in API V2, RawKV & TxnKV keys will be encoded as `MCE(mode-prefix + keyspace + user-key) + timestamp`. `MCE` is the abbreviation of [Memory Comparable Encoding], which is necessary to keep the encoded keys in the same order as the user keys.
2928

30-
The `{keyspace prefix id}` is the [keyspace](https://github.com/tikv/rfcs/pull/39) prefix for separating keys of different keyspaces. It should be a variable-length integer whose highest bit of every byte denotes whether the next byte is still part of the integer. The client will fetch the prefix from PD by the keyspace name specified by the user when initializing the client, so the keyspace prefix stays valid during the session; in other words, changes to the keyspace on PD will not take effect on running sessions.
29+
`Keyspace` is a fixed-length field of 3 bytes in network byte order (big-endian), and will be introduced in another RFC.
3130

32-
Note that in TxnKV, the key will be encoded by `Memory Comparable Encoding`. But since the `Memory Comparable Encoding` will not change the starting bytes but only add paddings, there won't be any overlap between RawKV and TxnKV.
31+
The `Timestamp` in RawKV entries is necessary to implement the [Change Data Capture] feature, as it indicates what data changed and when.
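As an illustration of `MCE(mode-prefix + keyspace + user-key) + timestamp`, here is a minimal Python sketch, not TiKV's implementation. It assumes the 8-byte-group variant of memcomparable encoding used by the TiKV/TiDB codecs (each group padded with `0x00` and followed by a marker byte `0xFF - padding`), and it assumes the timestamp is appended as the big-endian bitwise complement of a `u64`, so that newer versions of a key sort first. All function names are hypothetical.

```python
import struct
from typing import Optional

GROUP_SIZE = 8
PAD_BYTE = 0x00
MARKER = 0xFF
U64_MASK = (1 << 64) - 1

MODE_TXN = b"x"
MODE_RAW = b"r"


def memcomparable_encode(data: bytes) -> bytes:
    """Encode bytes so that byte-wise order of the output matches the input order."""
    out = bytearray()
    # The range includes len(data) itself, so an input whose length is a multiple
    # of 8 still gets a final all-padding group that terminates the key.
    for i in range(0, len(data) + 1, GROUP_SIZE):
        group = data[i:i + GROUP_SIZE]
        pad = GROUP_SIZE - len(group)
        out += group + bytes([PAD_BYTE]) * pad
        out.append(MARKER - pad)
    return bytes(out)


def encode_keyspace(keyspace: int) -> bytes:
    """Keyspace: fixed 3 bytes in network byte order (big-endian)."""
    if not 0 <= keyspace < 1 << 24:
        raise ValueError("keyspace must fit in 3 bytes")
    return keyspace.to_bytes(3, "big")


def encode_key(mode: bytes, keyspace: int, user_key: bytes, ts: Optional[int] = None) -> bytes:
    """MCE(mode-prefix + keyspace + user-key), optionally followed by a timestamp."""
    encoded = memcomparable_encode(mode + encode_keyspace(keyspace) + user_key)
    if ts is not None:
        # Assumption: descending u64 (bitwise NOT, big-endian), so newer versions sort first.
        encoded += struct.pack(">Q", ~ts & U64_MASK)
    return encoded
```

Because the mode prefix and keyspace are copied verbatim into the first encoded group, every key of one mode and keyspace shares the same 4 leading bytes, so RawKV keys, TxnKV keys, and different keyspaces occupy disjoint ranges while user-key order is preserved inside each range.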
3332

34-
#### RawKV Value Encode
33+
##### Timestamp requirement
3534

36-
If the key has RawKV prefix, which is `k{keyspace id prefix}r`, then the value can be either:
35+
Among requests on a single key, the timestamp must be monotonic with the order in which data is flushed to disk in TiKV.
3736

38-
1. `{0x0}{data}`
39-
2. `{0x1}{TTL expire timestamp}{data}`
37+
In general, if request `A` [Happened Before] `B`, then `Timestamp(A)` < `Timestamp(B)`. For RawKV, we provide [Causal Consistency] by keeping the order of timestamps the same as the order in which data is flushed to disk.
4038

41-
### Keyspace Management
39+
At the same time, as RawKV provides neither cross-row transactions nor snapshot isolation, we allow concurrent updates to different keys to improve efficiency, which means that the timestamp order of two different keys may not be consistent with the order of data flush.
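The requirement can be stated as a check over a log of flushed writes. In this hypothetical Python sketch, `flush_log` is a sequence of `(key, timestamp)` pairs in the order TiKV flushed them; only the per-key order is checked, because the cross-key order is intentionally relaxed.

```python
from typing import Dict, Iterable, Tuple


def satisfies_per_key_order(flush_log: Iterable[Tuple[bytes, int]]) -> bool:
    """Return True if, for every key, timestamps strictly increase in flush order.

    Ordering across different keys is deliberately not checked: concurrent
    updates of different keys may be flushed out of timestamp order.
    """
    last_ts: Dict[bytes, int] = {}
    for key, ts in flush_log:
        prev = last_ts.get(key)
        if prev is not None and ts <= prev:
            return False
        last_ts[key] = ts
    return True
```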
4240

43-
Add a new http interface to PD for adding, renaming, deleting and querying the mapping from keyspace name to prefix id:
41+
##### Timestamp Generation
4442

45-
```javascript
46-
// list all keyspaces
47-
GET /keyspaces
48-
[
49-
{
50-
name: "default",
51-
id: "0",
52-
properties: {
53-
"description": "this is default keyspace",
54-
"default-config": {
55-
"raw-client": {
56-
"ttl-secs": 30000000
57-
},
58-
"txn-client: {
59-
"enable-async-commit": true
60-
}
61-
}
62-
}
63-
},
64-
{
65-
name: "redis",
66-
id: "1"
67-
}
68-
]
43+
Timestamps are generated by PD (i.e. TSO), the same as for TiDB & TxnKV. Unlike them, however, TSO is acquired by TiKV internally, for better overall performance and client compatibility.
6944

70-
// add new keyspace
71-
POST /keyspaces
72-
{
73-
name: "foo",
74-
}
45+
To reduce latency and improve availability, TiKV will prefetch and cache a batch of TSOs locally. Users can specify how long the TSO cache is required to tolerate a PD failure, and TiKV will then calculate the batch size according to recent QPS.
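A minimal sketch of the sizing rule, under stated assumptions: a batch must cover `recent_qps * tolerate_secs` timestamps, and `fetch_batch` is a hypothetical stand-in for a PD TSO request that allocates `n` consecutive timestamps and returns the first one. For brevity the cache below refills only when it runs dry; a real implementation would prefetch in the background before exhaustion.

```python
import math
from typing import Callable


def batch_size(recent_qps: float, tolerate_secs: float) -> int:
    """TSOs to prefetch so the cache lasts `tolerate_secs` of PD outage at `recent_qps`."""
    return max(1, math.ceil(recent_qps * tolerate_secs))


class TsoCache:
    """Hypothetical local TSO cache refilled in batches from PD."""

    def __init__(self, fetch_batch: Callable[[int], int], tolerate_secs: float):
        # fetch_batch(n) stands in for a PD request allocating n consecutive
        # timestamps and returning the first one.
        self._fetch_batch = fetch_batch
        self._tolerate_secs = tolerate_secs
        self._next = 0
        self._end = 0  # exclusive upper bound of the cached range

    def get_ts(self, recent_qps: float) -> int:
        if self._next >= self._end:
            n = batch_size(recent_qps, self._tolerate_secs)
            self._next = self._fetch_batch(n)
            self._end = self._next + n
        ts = self._next
        self._next += 1
        return ts
```

For example, with 1,000 recent QPS and a 3-second tolerance, `batch_size(1000, 3)` asks PD for 3,000 timestamps per batch.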
7546

76-
// delete a keyspace
77-
DELETE /keyspaces/{keyspace_name}
78-
{}
47+
Note that the TSO cache brings another issue. If subsequent writes of a single key happen on another TiKV store (caused by leader transfer), the TSO cache of that store must be renewed to ensure that its timestamps are larger than those of the original store. TiKV observes `leader transfer` events and then flushes the cache. Another event that must be observed is `region merge`, as the leader of the merged region may be on a different store from the region being merged from.
7948

80-
// recover the latest deleted keyspace
81-
POST /keyspaces?action=flashback
82-
{
83-
new_name: "bar",
84-
}
85-
```
49+
In the current implementation, the TSO cache is flushed asynchronously to avoid blocking leader transfer and region merge. Clients will get a `MaxTimestampNotSynced` error until the flush is done.
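The flush protocol can be sketched as a small state machine. This Python sketch is hypothetical (the class and method names are not TiKV's): a leader-transfer or region-merge event drops the cached TSOs and marks the cache unsynced, and requests fail with `MaxTimestampNotSynced` until an asynchronous flush installs a batch fetched from PD after the event. Because TSO is monotonic, that batch is larger than any timestamp the previous store could have issued.

```python
import threading


class MaxTimestampNotSynced(Exception):
    """Returned to clients while the TSO cache is being renewed."""


class StoreTsoCache:
    """Hypothetical store-level TSO cache guarded against stale timestamps."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._synced = False  # nothing cached until the first batch arrives
        self._next = 0
        self._end = 0

    def on_leader_transfer_or_region_merge(self) -> None:
        # Drop cached TSOs: they may not exceed timestamps issued by the old store.
        with self._lock:
            self._synced = False
            self._next = self._end = 0

    def install_batch(self, start: int, count: int) -> None:
        # Called when the asynchronous flush finishes; the batch was fetched
        # from PD after the event, so it exceeds every earlier TSO.
        with self._lock:
            self._next, self._end = start, start + count
            self._synced = True

    def get_ts(self) -> int:
        with self._lock:
            if not self._synced:
                raise MaxTimestampNotSynced("TSO cache flush in progress")
            if self._next >= self._end:
                # Simplified: a real cache would refill from PD here.
                raise RuntimeError("TSO cache exhausted")
            ts = self._next
            self._next += 1
            return ts
```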
8650

87-
The keyspaces are stored in etcd and there is no limitation on the id number.
51+
*The alternative for timestamp generation is `HLC` ([Hybrid Logical Clock]). The advantage of `HLC` is that it is independent of PD availability; the drawback is that it depends on the local clock and [NTP], and it's also not easy to get right (refer to [strict monotonicity](https://github.com/cockroachdb/cockroach/blob/13c5a25238ce75cfb7ff151d620e82aa44c72e27/pkg/util/hlc/doc.go#L150) in CockroachDB). All things considered, as PD is designed to be highly available, and a PD failure affects not only TSO but also other critical components (e.g., region metadata), we prefer to use TSO as the timestamp.*
8852

89-
1. Adding keyspace: newly added keyspaces are only visible to new clients.
90-
2. Deleting keyspace:
91-
2.1. Deleting keyspace only marks the metadata as invisible in PD; the data in the keyspace and the metadata in PD will not be deleted automatically. Garbage collecting the data in deleted keyspaces is left for the future since it increases the complexity of this RFC. To ensure no data is left, the user should clean up the keyspace before deleting it at present.
92-
2.2. Every client syncs the keyspace information with the PD leader every 5 minutes. When a keyspace is deleted, the client should become aware of the deletion within 5 minutes, after which the keyspace can not be accessed by any living client.
93-
3. Flashbacking keyspace:
94-
3.1. Flashbacking keyspace only affects the metadata, making the keyspace name and id mapping visible to clients again.
95-
3.2. If there are multiple deleted keyspaces with the same name, only the last deleted keyspace is flashbacked. Users can flashback all these deleted keyspaces by calling the flashback API multiple times with different new keyspace names.
53+
#### RawKV Value Encoding
9654

97-
#### pd-ctl
55+
The value of RawKV in `API V2` can currently be one of:
9856

99-
To enable API V2, which also enables the keyspace API:
57+
1. `{data}{0x0}`, for values without TTL
58+
2. `{data}{TTL expire timestamp}{0x1}`, for values with TTL
59+
3. `{0x2}`, for deleted values
10060

101-
```bash
102-
>> config set api-version 2
103-
```
61+
The last byte of the value is used as meta flags. Bit `0` of the flags is for TTL: if it's set, the `8` bytes just before the meta flags are the TTL expire timestamp. Bit `1` is the deleted mark: if it's set, the entry is logically deleted (used for [Change Data Capture]).
10462

105-
To manage keyspace:
63+
Extra fields in the future can use other bits of the meta flags, and will be inserted between the user value and the meta flags in reverse order. The most significant bit of the meta flags is reserved for extended meta flags in case there are more than 7 fields.
10664

107-
```bash
108-
>> config keyspaces show
109-
>> config keyspaces create <keyspace name>
110-
>> config keyspaces delete <keyspace name>
65+
`{user value}{field of bit n}...{extended meta flags}...{field of bit 2}{field of bit 0 (TTL)}{meta flags}`
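A minimal Python sketch of encoding and decoding values under the meta-flag layout described above, covering only bit `0` (TTL) and bit `1` (deleted mark). Assumptions: the TTL expire timestamp is an unsigned 8-byte big-endian integer, and a deleted entry is encoded as the meta flags byte alone with bit `1` set. The function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

FLAG_TTL = 0b0000_0001      # bit 0: an 8-byte expire timestamp precedes the flags
FLAG_DELETED = 0b0000_0010  # bit 1: logically deleted entry


class RawValue(NamedTuple):
    data: bytes
    expire_ts: Optional[int]
    deleted: bool


def encode_value(data: bytes, expire_ts: Optional[int] = None, deleted: bool = False) -> bytes:
    if deleted:
        return bytes([FLAG_DELETED])  # deleted entries carry no user data
    flags = 0
    out = bytearray(data)
    if expire_ts is not None:
        out += struct.pack(">Q", expire_ts)  # assumption: big-endian u64
        flags |= FLAG_TTL
    out.append(flags)  # meta flags are always the last byte
    return bytes(out)


def decode_value(raw: bytes) -> RawValue:
    if not raw:
        raise ValueError("a value must contain at least the meta flags byte")
    flags, body = raw[-1], raw[:-1]
    expire_ts: Optional[int] = None
    if flags & FLAG_TTL:
        if len(body) < 8:
            raise ValueError("TTL flag set but expire timestamp is missing")
        (expire_ts,) = struct.unpack(">Q", body[-8:])
        body = body[:-8]
    return RawValue(bytes(body), expire_ts, bool(flags & FLAG_DELETED))
```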
11166

112-
# example: config keyspaces set-property foo default-config.raw-client.ttl-secs 100000
113-
>> config keyspaces set-property <keyspace name> <property-path> <property-value>
67+
### How to safely enable API V2
11468

115-
>> config keyspaces delete-property <keyspace name> <property-path>
116-
>> config keyspaces flashback <new keyspace name>
117-
```
69+
#### Upgrade from `API V1` to `API V2`
11870

119-
#### Keyspace metadata
71+
1. Upgrade TiKV, TiDB, and PD to the version that supports `API V2`.
72+
2. Ensure that all the keys in TiKV are written by TiDB, i.e. prefixed with `m` or `t`. Any other data should be migrated out, otherwise step 3 will fail.
73+
3. Enable `API V2` in the TiKV config file and restart TiKV (users are also responsible for enabling `API V2` on all TiKV clients other than TiDB).
12074

121-
```json
122-
{
123-
name: string,
124-
id: int64,
125-
created_at: timestamp,
126-
deleted_at: timestamp, // if set, the keyspace is not visible to users
127-
flashbacked_at: timestamp,
128-
properties: object
129-
}
130-
```
75+
#### Downgrade from `API V2` to `API V1`
13176

132-
### How to safely enable API V2
77+
1. Ensure that all the keys in TiKV are written by TiDB, which are prefixed with `m` or `t`. Any other data should be migrated out.
78+
2. Disable `API V2` in the TiKV config file and restart TiKV (users are also responsible for switching all TiKV clients other than TiDB back to `API V1`).
13379

134-
#### Upgrade
80+
#### Data migration
13581

136-
Upgrade from `API V1` to `API V2` is a simple process:
82+
A backup and restore tool will be provided to export data from a TiKV cluster running `API V1` and convert it to the `API V2` encoding. The backup data can then be imported into another TiKV cluster running `API V2`.
13783

138-
1. Update TiKV, TiDB, and PD to the version that supports `API V2`.
139-
2. Ensure that all the keys in TiKV are written by TiDB, which are prefixed with `m` or `t`. Delete if any. Or else the step 4 will fail.
140-
3. Use `pd-ctl` to enable `API V2`.
141-
4. Enable `API V2` in the TiKV config file and restart TiKV (users should take responsibility for taking all TiKV clients other than TiDB offline), or set it via an online config change API (not proposed in this RFC, but good to have).
84+
## Implementation Details
14285

143-
#### Downgrade
86+
### kvproto
14487

145-
Downgrade from `API V2` to `API V1` is also simple:
88+
```proto
89+
// kvrpcpb.proto
14690
147-
1. Ensure that all the keys in TiKV are written by TiDB, which are prefixed with `m` or `t`. Delete if any.
148-
2. Use `pd-ctl` to disable `API V2`.
149-
3. Disable `API V2` in TiKV config file and restart TiKV (User should take the responsibility to offline all tikv clients excluding TiDB). Or set by online config change API (Not proposed in this RFC, but is good to have).
91+
message Context {
92+
// ... other fields omitted
15093
151-
#### Data migration
94+
// The API version determines the encoding of keys and values.
95+
APIVersion api_version = 21;
96+
}
15297
153-
It's reasonable to provide a way to import and export non-TiDB data in TiKV during the upgrade or downgrade. On TiKV before 4.0, the only way to do that is `scan` and `batch_put` on the client. Since 4.0, TiKV supports importing SST files into TxnKV, and since 5.1, importing into RawKV is also supported. You can find more information in [`RFC: Online Bulk Load for RawKV`](https://github.com/tikv/rfcs/pull/72).
98+
// The API version the server and the client are using.
99+
// See more details in https://github.com/tikv/rfcs/blob/master/text/0069-api-v2.md.
100+
enum APIVersion {
101+
V1 = 0;
102+
V1TTL = 1;
103+
V2 = 2;
104+
}
105+
```
154106

155-
### Implementation Details
107+
```proto
108+
// raft_serverpb.proto
156109
157-
#### PD
110+
message StoreIdent {
111+
// ... other fields omitted
112+
113+
kvrpcpb.APIVersion api_version = 3;
114+
}
115+
```
158116

159-
Add the new APIs described [above](#Keyspace-Management).
117+
```proto
118+
// brpb.proto
160119
161-
#### TiKV Server
120+
message BackupMeta {
121+
// ... omited other fields
162122
163-
In TiKV config file, add a new configuration `storage.api_version`. When enabled, `storage.enable_ttl` must also be enabled.
123+
kvrpcpb.APIVersion api_version = 18;
124+
}
164125
165-
In kvdb, add a store meta `api_version`. When the store meta mismatches the config `storage.enable_ttl`, it means that the user is switching the API version; then check that no non-TiDB data exists, and save the new API version in store meta.
126+
message BackupResponse {
127+
// ... omited other fields
128+
129+
kvrpcpb.APIVersion api_version = 5;
130+
}
131+
```
166132

167-
In kvproto message `SSTMeta`, add `api_version`. Reject the SST file if the version is mismatched.
133+
### TiKV Server
168134

169-
In TiKV gRPC's context, add a field `api_version`.
135+
In TiKV config file, add a new configuration `storage.api-version`.
170136

171-
If `storage.api_version=2`:
137+
If the API version in `StoreIdent` mismatches `storage.api-version` in the config, the user is switching the API version; TiKV will therefore check whether there is any non-TiDB data in storage, and then save the new API version in `StoreIdent`.
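The startup check can be sketched as follows. This is a hypothetical Python illustration, not TiKV code: `keys` stands in for a full scan of the store, the exception type is invented for the example, and TiDB data is identified by the `m`/`t` prefixes.

```python
from typing import Iterable

TIDB_PREFIXES = (b"m", b"t")


class ApiVersionSwitchError(Exception):
    """Raised when non-TiDB data blocks an API version switch."""


def check_api_version_on_startup(stored_version: int, configured_version: int,
                                 keys: Iterable[bytes]) -> int:
    """Return the API version to persist in StoreIdent.

    stored_version: value read from StoreIdent.
    configured_version: value of storage.api-version in the config file.
    keys: every key currently in storage (a stand-in for a real scan).
    """
    if stored_version == configured_version:
        return configured_version  # not switching, nothing to check
    for key in keys:
        if not key.startswith(TIDB_PREFIXES):
            raise ApiVersionSwitchError(
                f"non-TiDB key {key!r} found; migrate it out before switching API version")
    return configured_version
```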
172138

173-
- Run TTL compaction filter only on the keys that start with RawKV prefix.
139+
If `storage.api-version=2`:
174140

175-
- Use the `API V2` Value encode in `RawStore`, `TTLStore` and `sst_importer`.
141+
- Use the new value encoding in `RawStore` and `sst_importer`.
176142

177-
- If the request's context has `api_version=1`:
143+
- Only allow RawKV to access `default` CF.
144+
145+
- If the request's context has `api-version=1`:
178146
- Reject the request unless it's a TxnKV request and its keys start with `m` or `t`.
179147

180-
- If the request's context has `api_version=2`:
148+
- If the request's context has `api-version=2`:
181149
- Only allow the key that has RawKV prefix for RawKV requests.
182150
- Only allow the key that has TxnKV prefix for TxnKV requests.
183151

184-
If `storage.api_version=1`:
152+
If `storage.api-version=1` & `storage.enable-ttl=true`:
153+
154+
- Reject all requests with `api-version=2` in the context.
155+
- Reject all transactional requests, since otherwise the raw TTL encoding in V1TTL would corrupt transaction data.
156+
157+
If `storage.api-version=1` & `storage.enable-ttl=false`:
185158

186-
- Reject all requests with `api_version=2` in the context.
159+
- Reject all requests with `api-version=2` in the context.
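The admission rules above can be summarized as a single predicate. This is a hypothetical Python sketch, not TiKV's code: the function name and parameters are illustrative, the server is described by `storage.api-version` and `storage.enable-ttl`, only API versions 1 and 2 are modeled, and `cf` defaults to `default`.

```python
TIDB_PREFIXES = (b"m", b"t")


def admit(storage_api_version: int, enable_ttl: bool, req_api_version: int,
          is_raw: bool, key: bytes, cf: str = "default") -> bool:
    """Return True if TiKV should serve the request under the rules above."""
    if storage_api_version not in (1, 2) or req_api_version not in (1, 2):
        raise ValueError("this sketch only models API versions 1 and 2")
    if storage_api_version == 2:
        if is_raw and cf != "default":
            return False  # RawKV may only access the default CF
        if req_api_version == 1:
            # Only TiDB's transactional keys are accepted from V1 requests.
            return not is_raw and key.startswith(TIDB_PREFIXES)
        # V2 request: the key prefix must match the request's mode.
        return key.startswith(b"r" if is_raw else b"x")
    # storage.api-version = 1, with or without enable-ttl
    if req_api_version == 2:
        return False
    if enable_ttl and not is_raw:
        return False  # raw TTL encoding in V1TTL would corrupt transaction data
    return True
```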
187160

188-
#### TiKV Client
161+
### TiKV Client
189162

190163
Provide two modes for users:
191164

192165
- V2:
193-
- Fetch keyspace prefix by keyspace name from PD and then prepend `k{keyspace prefix}x` on TxnKV keys or prepend `k{keyspace prefix}r` on RawKV keys.
194-
- Set `api_version=2` in TiKV gRPC's `Context`.
195-
- Disallow specifying CF in `RawClient`.
196-
- Allow users to specify a keyspace for a session of `RawClient` or `TxnClient`. The default keyspace is named `default`.
197-
- Fetch keyspace information from PD every 5 minutes. Destroy the client session if the keyspace it's using is deleted.
166+
- Prepend `x{keyspace}` before TxnKV keys or prepend `r{keyspace}` before RawKV keys.
167+
- `Keyspace` is optional and defaults to `0` for backward compatibility.
168+
- Set `api_version=2` in `kvrpcpb.Context`.
169+
- Disallow specifying `cf` in `RawClient`.
198170

199171
- V1:
200-
- Behaves just like the current client.
201-
- Set `api_version=1` in TiKV gRPC's `Context`.
172+
- Set `api_version=1` in `kvrpcpb.Context`.
173+
- Otherwise, behaves just like the current client.
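The client-side prefixing in V2 mode can be sketched as below. This is a hypothetical Python illustration, not the API of any TiKV client: it builds the `r{keyspace}` / `x{keyspace}` prefix with the 3-byte big-endian keyspace, defaulting to `0`, and adds two assumed helpers, one stripping the prefix from keys returned by the server (e.g. in a scan) and one computing the key range of a whole keyspace. Setting `api_version=2` in `kvrpcpb.Context` and rejecting a `cf` argument are left to the surrounding client code.

```python
from typing import Tuple

RAW_MODE = b"r"
TXN_MODE = b"x"
MAX_KEYSPACE = (1 << 24) - 1
PREFIX_LEN = 4  # 1-byte mode prefix + 3-byte keyspace


def _mode(raw: bool) -> bytes:
    return RAW_MODE if raw else TXN_MODE


def _keyspace_bytes(keyspace: int) -> bytes:
    if not 0 <= keyspace <= MAX_KEYSPACE:
        raise ValueError("keyspace must fit in 3 bytes")
    return keyspace.to_bytes(3, "big")


def build_v2_key(user_key: bytes, raw: bool, keyspace: int = 0) -> bytes:
    """Prepend `r{keyspace}` (RawKV) or `x{keyspace}` (TxnKV) to a user key."""
    return _mode(raw) + _keyspace_bytes(keyspace) + user_key


def strip_v2_prefix(key: bytes) -> bytes:
    """Recover the user key from a prefixed key returned by the server."""
    return key[PREFIX_LEN:]


def keyspace_range(raw: bool, keyspace: int = 0) -> Tuple[bytes, bytes]:
    """Half-open range [start, end) covering every key of one mode and keyspace."""
    mode = _mode(raw)
    start = mode + _keyspace_bytes(keyspace)
    if keyspace == MAX_KEYSPACE:
        end = bytes([mode[0] + 1])  # the next mode byte bounds the last keyspace
    else:
        end = mode + _keyspace_bytes(keyspace + 1)
    return start, end
```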
202174

203175
Listed below is the compatibility matrix:
204176

205-
| | V1 Server | V2 Server |
206-
| ------------ | --------- | --------- |
207-
| V1 RawClient | Raw Data | Forbidden |
208-
| V1 TxnClient | Txn Data | TiDB Data |
209-
| V2 RawClient | Forbidden | Raw Data |
210-
| V2 TxnClient | Forbidden | Txn Data |
177+
| | V1 Server | V1TTL Server | V2 Server |
178+
| --------------------- | --------- | ------------ | --------- |
179+
| V1 RawClient | Raw | Raw | Error |
180+
| V1 RawClient with TTL | Error | Raw | Error |
181+
| V1 TxnClient | Txn | Error | Error |
182+
| V1 TiDB | TiDB Data | Error | TiDB Data |
183+
| V2 RawClient | Error | Error | Raw |
184+
| V2 TxnClient | Error | Error | Txn |
211185

212-
### CDC / BR
186+
### Garbage Collection
213187

214-
Since all access to TiDB is unchanged during the upgrade, CDC and BR should work the same after upgrade/downgrade.
188+
*To be supplemented in another PR*
189+
190+
### Backup and Restore
191+
192+
*To be supplemented in another PR*
193+
194+
### Change Data Capture
195+
196+
The details of CDC will be introduced in another RFC.
197+
198+
*TODO: add link here.*
215199

216200
### tikv-ctl
217201

@@ -222,3 +206,12 @@ Read `api_version` in kvdb and decode data using the corresponding version.
222206
Upgrade to the latest TiKV Go Client and use `V1` mode.
223207

224208
## Unresolved questions
209+
210+
*TBD*
211+
212+
[Change Data Capture]: https://en.wikipedia.org/wiki/Change_data_capture
213+
[Memory Comparable Encoding]: https://github.com/facebook/mysql-5.6/wiki/MyRocks-record-format#memcomparable-format
214+
[Causal Consistency]: https://en.wikipedia.org/wiki/Causal_consistency
215+
[Happened Before]: https://en.wikipedia.org/wiki/Happened-before
216+
[Hybrid Logical Clock]: https://cse.buffalo.edu/tech-reports/2014-04.pdf
217+
[NTP]: https://en.wikipedia.org/wiki/Network_Time_Protocol
