## Motivation

`API V2` is a set of breaking changes that aims to solve several issues with the current RawKV (hereafter referred to as `API V1`):

1. RawKV is not safe to use along with TxnKV and TiDB. Once this is solved, the three modes can be used in the same cluster, which reduces resource and maintenance costs. It would even make it possible for TiDB to support RawKV as a table's storage engine, which would enrich TiDB's use cases.
2. RawKV TTL is controlled by TiKV configuration. Switching the configuration silently corrupts data.
3. Keys and values of RawKV are just raw bytes, so it is hard to add more metadata afterward to support more features, such as a keyspace to support multi-tenancy, or a timestamp to support [Change Data Capture].
## Detailed Design

### New key-value codec

This RFC introduces a new key encoding for RawKV and TxnKV, and a new value encoding for RawKV. Together they allow RawKV to be used along with TxnKV and make it easy to add more fields to the value.

Since `API V2` changes the storage encoding, switching between `API V1` and `API V2` is not compatible with any non-TiDB data already in TiKV. TiDB data is treated specially so that it is not affected by this change.

#### Key Encoding

Once `API V2` is enabled, keys will start with one of the following prefixes:

1. `m` or `t`: TiDB keys.
2. `x`: TxnKV keys.
3. `r`: RawKV keys.

`x` and `r` are mode prefixes that indicate which mode a key belongs to. The mode prefix is followed by 3 bytes for the keyspace. So in API V2, RawKV and TxnKV keys are encoded as `MCE(mode-prefix + keyspace + user-key) + timestamp`. `MCE` is an abbreviation of [Memory Comparable Encoding], which is necessary to keep the encoded keys in the same order as the user keys.

`Keyspace` is a fixed-length field of 3 bytes in network byte order, and will be introduced in another RFC.

The `Timestamp` in RawKV entries is necessary to implement the [Change Data Capture] feature, as it indicates what data changed and when.
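
A minimal Python sketch of this key layout follows, for illustration only. The function names are hypothetical. `MCE` is assumed to be the group-based memcomparable format used by TiKV's byte codec: 8-byte groups padded with `0x00`, each followed by a marker byte equal to `0xFF` minus the padding length. The timestamp is assumed to be appended as a bitwise-inverted big-endian `u64`, as TiKV does for MVCC keys, so that newer versions of a key sort first.

```python
import struct

ENC_GROUP_SIZE = 8
ENC_MARKER = 0xFF
ENC_PAD = 0x00


def mce_encode(data: bytes) -> bytes:
    """Memory Comparable Encoding (assumed TiKV-style group encoding).

    Each 8-byte group is padded with 0x00 and followed by a marker byte equal
    to 0xFF minus the number of padding bytes. A trailing, fully padded group
    is emitted when the input length is a multiple of 8, so the byte-wise order
    of encoded keys matches the byte-wise order of the raw keys.
    """
    out = bytearray()
    for i in range(0, len(data) + 1, ENC_GROUP_SIZE):
        group = data[i:i + ENC_GROUP_SIZE]
        pad = ENC_GROUP_SIZE - len(group)
        out += group + bytes([ENC_PAD]) * pad
        out.append(ENC_MARKER - pad)
    return bytes(out)


def encode_v2_key(mode: bytes, keyspace: int, user_key: bytes, ts: int) -> bytes:
    """MCE(mode-prefix + keyspace + user-key) + timestamp (hypothetical helper)."""
    if mode not in (b"x", b"r"):
        raise ValueError("mode prefix must be b'x' (TxnKV) or b'r' (RawKV)")
    if not 0 <= keyspace < 1 << 24:
        raise ValueError("keyspace must fit in 3 bytes")
    prefix = mode + keyspace.to_bytes(3, "big")  # 3 bytes, network byte order
    # Assumption: timestamp stored inverted (descending), like TiKV MVCC keys.
    return mce_encode(prefix + user_key) + struct.pack(">Q", ~ts & 0xFFFFFFFFFFFFFFFF)
```

Because MCE copies the first bytes of each group through unchanged, every encoded RawKV key still begins with `r` and every TxnKV key with `x`, so these ranges cannot overlap with each other or with TiDB's `m`/`t` keys.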

##### Timestamp Requirement

Among requests on a single key, timestamps must increase monotonically in the order in which the data is flushed to disk in TiKV.

In general, if request `A` [Happened Before] `B`, then `Timestamp(A)` < `Timestamp(B)`. For RawKV, we provide [Causal Consistency] by keeping the timestamp order the same as the order in which data is flushed to disk.

At the same time, since RawKV provides neither cross-row transactions nor snapshot isolation, we allow concurrent updates to different keys to improve efficiency. As a result, the timestamp order of writes to two different keys may not be consistent with the order in which they are flushed to disk.
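
The requirement can be stated as a small checker. This Python sketch is illustrative only; the function name and the flush-log representation are assumptions. It walks writes in disk-flush order and requires strictly increasing timestamps per key, while it deliberately permits timestamp inversions across different keys.

```python
from typing import Dict, Iterable, Tuple


def satisfies_timestamp_requirement(flush_log: Iterable[Tuple[bytes, int]]) -> bool:
    """Check the per-key timestamp requirement over a flush-ordered log.

    `flush_log` lists (key, timestamp) pairs in the order the writes were
    flushed to disk. Timestamps must strictly increase for each key; no
    ordering is required between different keys.
    """
    last_ts: Dict[bytes, int] = {}
    for key, ts in flush_log:
        if key in last_ts and ts <= last_ts[key]:
            return False  # a later flush of the same key carries an older timestamp
        last_ts[key] = ts
    return True
```

For example, the log `[(b"a", 10), (b"b", 5), (b"a", 11)]` is accepted even though `b` was flushed after `a` with a smaller timestamp, because only writes to the same key are compared.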

##### Timestamp Generation

The timestamp is generated by PD (i.e. TSO), the same as for TiDB and TxnKV. Unlike them, however, the TSO is acquired by TiKV internally, to achieve better overall performance and client compatibility.

To reduce latency and improve availability, TiKV prefetches and caches a batch of TSOs locally. Users can specify how long the TSO cache must be able to tolerate a PD failure, and TiKV then calculates the batch size from the recent QPS.
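
The batching idea can be sketched as follows. The RFC only says that the batch size is derived from the recent QPS and the configured tolerance; the formula `ceil(qps * tolerance_secs)`, the class name, and the `fetch_batch` callback (a stand-in for a PD TSO request) are all assumptions for illustration.

```python
import math
from collections import deque
from typing import Callable, Deque, List


class TsoBatchCache:
    """Hypothetical local TSO cache; not TiKV's actual implementation.

    `fetch_batch(n)` stands in for a PD TSO request returning n ascending
    timestamps. The batch is sized so that, at the recent QPS, the cache
    lasts for `tolerance_secs` seconds of PD unavailability.
    """

    def __init__(self, fetch_batch: Callable[[int], List[int]], tolerance_secs: float):
        self._fetch_batch = fetch_batch
        self._tolerance_secs = tolerance_secs
        self._cache: Deque[int] = deque()

    def batch_size(self, recent_qps: float) -> int:
        # Assumed sizing rule: enough timestamps to cover the tolerance window.
        return max(1, math.ceil(recent_qps * self._tolerance_secs))

    def refill(self, recent_qps: float) -> None:
        # New TSOs from PD are larger than any already cached, so order is kept.
        self._cache.extend(self._fetch_batch(self.batch_size(recent_qps)))

    def get_ts(self) -> int:
        if not self._cache:
            raise RuntimeError("TSO cache exhausted; PD must be reachable to refill")
        return self._cache.popleft()
```

With a 3-second tolerance and a recent rate of 1000 QPS, this rule would prefetch 3000 timestamps per batch.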

Note that the TSO cache introduces another issue. If subsequent writes to a single key happen on another TiKV store (caused by a leader transfer), the TSO cache of that store must be renewed to ensure that its timestamps are larger than those issued by the original store. TiKV observes `leader transfer` events and then flushes the cache. Another event that must be observed is `region merge`, because the leader of the merged region may be on a different store from the leader of the region being merged.

In the current implementation, the TSO cache is flushed asynchronously to avoid blocking leader transfer and region merge. Clients get a `MaxTimestampNotSynced` error until the flush is done.
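
The flush-and-renew behavior can be modeled with a small state machine. In this Python sketch the class and method names are hypothetical and `fetch_batch` stands in for a PD TSO request. The cache is cleared and renewed on a background thread, so the triggering event is not blocked, and `get_ts` raises an error named after `MaxTimestampNotSynced` until the renewal lands. A generation counter discards a renewal that was superseded by a newer event.

```python
import threading
from collections import deque
from typing import Callable, Deque, List


class MaxTimestampNotSynced(Exception):
    """Illustrative stand-in for the `MaxTimestampNotSynced` error."""


class StoreTsoCache:
    """Hypothetical per-store TSO cache, flushed on leader transfer or region
    merge and renewed from PD in the background. Starts unsynced."""

    def __init__(self, fetch_batch: Callable[[], List[int]]):
        self._fetch_batch = fetch_batch  # stands in for a batched PD TSO request
        self._lock = threading.Lock()
        self._cache: Deque[int] = deque()
        self._synced = False
        self._generation = 0

    def flush_and_renew(self) -> threading.Thread:
        """Call on startup, leader transfer, or region merge; does not block."""
        with self._lock:
            self._generation += 1
            generation = self._generation
            # Cached TSOs may be smaller than ones the previous leader issued.
            self._cache.clear()
            self._synced = False
        worker = threading.Thread(target=self._renew, args=(generation,), daemon=True)
        worker.start()
        return worker

    def _renew(self, generation: int) -> None:
        # TSOs fetched from PD after the event are larger than any TSO issued
        # before it, including those used by the original store.
        batch = self._fetch_batch()
        with self._lock:
            if generation != self._generation:
                return  # a newer event superseded this renewal
            self._cache.extend(batch)
            self._synced = True

    def get_ts(self) -> int:
        with self._lock:
            if not self._synced:
                raise MaxTimestampNotSynced("TSO cache is being renewed, retry later")
            if not self._cache:
                raise RuntimeError("TSO cache exhausted")
            return self._cache.popleft()
```

A client receiving the error would simply retry once the renewal completes.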

*The alternative for timestamp generation is `HLC` ([Hybrid Logical Clock]). The advantage of `HLC` is that it is independent of PD availability. The drawbacks are that it depends on the local clock and [NTP], and that it is not easy to get right (refer to [strict monotonicity](https://github.com/cockroachdb/cockroach/blob/13c5a25238ce75cfb7ff151d620e82aa44c72e27/pkg/util/hlc/doc.go#L150) in CockroachDB). All things considered, since PD is designed to be highly available, and a PD failure affects not only TSO but also other critical components (e.g. region metadata), we prefer to use TSO as the timestamp.*

#### RawKV Value Encoding

Currently, a RawKV value in `API V2` can be one of:

1. `{data}{0x0}`, for values without TTL.
2. `{data}{TTL expire timestamp}{0x1}`, for values with TTL.
3. `{0x2}`, for deleted values.

The last byte of the value is used as meta flags. Bit `0` of the flags is for TTL: if it is set, the `8` bytes just before the meta flags are the TTL expire timestamp. Bit `1` is the deletion mark: if it is set, the entry is logically deleted (used for [Change Data Capture]).

Future fields can use the other bits of the meta flags, and will be inserted between the user value and the meta flags in reverse order. The most significant bit of the meta flags is reserved for extended meta flags in case there are more than 7 fields.

`{user value}{field of bit n}...{extended meta flags}...{field of bit 2}{field of bit 0 (TTL)}{meta flags}`
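
The value layout can be illustrated with a small encoder and decoder. This is a Python sketch, not TiKV's code. The names are hypothetical, the 8-byte TTL expire timestamp is assumed to be a big-endian `u64`, and any flag bit other than bits `0` and `1` (including the extended-flags bit) is rejected because no such fields are defined yet.

```python
import struct
from dataclasses import dataclass
from typing import Optional

FLAG_TTL = 0b0000_0001      # bit 0: an 8-byte TTL expire timestamp precedes the flags
FLAG_DELETED = 0b0000_0010  # bit 1: the entry is logically deleted
KNOWN_FLAGS = FLAG_TTL | FLAG_DELETED
# Other bits (including bit 7, reserved for extended meta flags) carry future fields.


@dataclass
class RawValueV2:
    user_value: bytes
    expire_ts: Optional[int] = None  # TTL expire timestamp; None means no TTL
    deleted: bool = False


def encode_value(v: RawValueV2) -> bytes:
    if v.deleted:
        return bytes([FLAG_DELETED])  # a deleted value is just the meta flags byte
    out = bytearray(v.user_value)
    flags = 0
    if v.expire_ts is not None:
        out += struct.pack(">Q", v.expire_ts)  # assumed big-endian u64
        flags |= FLAG_TTL
    out.append(flags)
    return bytes(out)


def decode_value(raw: bytes) -> RawValueV2:
    if not raw:
        raise ValueError("an API V2 RawKV value must end with a meta flags byte")
    flags, body = raw[-1], raw[:-1]
    if flags & ~KNOWN_FLAGS:
        raise NotImplementedError(f"unsupported meta flags: {flags:#04x}")
    if flags & FLAG_DELETED:
        return RawValueV2(b"", deleted=True)
    expire_ts = None
    if flags & FLAG_TTL:
        if len(body) < 8:
            raise ValueError("value is too short to hold a TTL expire timestamp")
        (expire_ts,) = struct.unpack(">Q", body[-8:])
        body = body[:-8]
    return RawValueV2(bytes(body), expire_ts)
```

Keeping the flags in the last byte means a decoder always knows where to start parsing, regardless of the user value's length.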

### How to safely enable API V2

#### Upgrade from `API V1` to `API V2`

1. Upgrade TiKV, TiDB, and PD to a version that supports `API V2`.
2. Ensure that all the keys in TiKV are written by TiDB, i.e. prefixed with `m` or `t`. Any other data should be migrated out, or else step 3 will fail.
3. Enable `API V2` in the TiKV config file and restart TiKV (users are also responsible for enabling `API V2` for all TiKV clients other than TiDB).

#### Downgrade from `API V2` to `API V1`

1. Ensure that all the keys in TiKV are written by TiDB, i.e. prefixed with `m` or `t`. Any other data should be migrated out.
2. Disable `API V2` in the TiKV config file and restart TiKV (users are also responsible for switching all TiKV clients other than TiDB back to `API V1`).

#### Data migration

A backup and restore tool will be provided to export data from an `API V1` TiKV cluster and convert it to the `API V2` encoding. The backup data can then be imported into another TiKV cluster running `API V2`.
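
The per-entry conversion such a tool performs can be sketched as follows, under several assumptions. The helper is hypothetical and handles only plain `API V1` RawKV entries without TTL. It produces the logical `API V2` form (the `r` mode prefix plus a 3-byte big-endian keyspace before the key, and a `0x00` meta flags byte after the value), and it assumes that MCE and the timestamp suffix are applied by TiKV when the data is imported.

```python
from typing import Iterable, Iterator, Tuple

RAW_MODE_PREFIX = b"r"
DEFAULT_KEYSPACE = 0


def convert_v1_raw_pairs(
    pairs: Iterable[Tuple[bytes, bytes]], keyspace: int = DEFAULT_KEYSPACE
) -> Iterator[Tuple[bytes, bytes]]:
    """Convert exported API V1 RawKV pairs (no TTL) to API V2 logical pairs.

    Hypothetical helper: keys gain the `r` mode prefix and a 3-byte big-endian
    keyspace; values gain a trailing meta flags byte of 0x00 (no TTL).
    """
    prefix = RAW_MODE_PREFIX + keyspace.to_bytes(3, "big")
    for key, value in pairs:
        yield prefix + key, value + b"\x00"
```

Streaming the pairs through a generator keeps memory usage flat when converting large exports.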

## Implementation Details

### kvproto

```proto
// kvrpcpb.proto

message Context {
    // ... other fields omitted

    // API version implies the encoding of the key and value.
    APIVersion api_version = 21;
}

// The API version the server and the client are using.
// See more details in https://github.com/tikv/rfcs/blob/master/text/0069-api-v2.md.
enum APIVersion {
    V1 = 0;
    V1TTL = 1;
    V2 = 2;
}
```

```proto
// raft_serverpb.proto

message StoreIdent {
    // ... other fields omitted

    kvrpcpb.APIVersion api_version = 3;
}
```

```proto
// brpb.proto

message BackupMeta {
    // ... other fields omitted

    kvrpcpb.APIVersion api_version = 18;
}

message BackupResponse {
    // ... other fields omitted

    kvrpcpb.APIVersion api_version = 5;
}
```

### TiKV Server

Add a new configuration item `storage.api-version` to the TiKV config file.

If the API version in `StoreIdent` mismatches `storage.api-version` in the config, the user is switching the API version. TiKV then checks whether there is any non-TiDB data in storage, and finally saves the new API version in `StoreIdent`.
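
The startup check can be sketched as below. The function and exception names are hypothetical, and keys are assumed to be user-level keys. TxnKV keys written under `API V1` are MCE-encoded, but MCE leaves the first byte unchanged, so checking the first byte for `m` or `t` is enough to recognize TiDB data.

```python
from typing import Iterable

TIDB_PREFIXES = (b"m", b"t")


class NonTiDBDataError(Exception):
    """Raised when switching API versions while non-TiDB data exists (illustrative)."""


def reconcile_api_version(stored_version: int, configured_version: int, keys: Iterable[bytes]) -> int:
    """Return the API version to persist in StoreIdent (hypothetical helper).

    If the configured version differs from the one recorded in StoreIdent,
    the switch is only allowed when every key belongs to TiDB, i.e. starts
    with `m` or `t`.
    """
    if stored_version == configured_version:
        return stored_version  # not switching; no scan needed
    for key in keys:
        if not key.startswith(TIDB_PREFIXES):
            raise NonTiDBDataError(f"found non-TiDB key {key!r}; migrate it out first")
    return configured_version
```

Scanning only when the versions differ keeps normal restarts cheap; the full scan is paid once, at the moment of switching.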

If `storage.api-version=2`:

- Use the new value encoding in `RawStore` and `sst_importer`.
- Only allow RawKV to access the `default` CF.
- If the request's context has `api-version=1`:
  - Reject the request unless it is a TxnKV request and its keys start with `m` or `t`.
- If the request's context has `api-version=2`:
  - Only allow keys with the RawKV prefix for RawKV requests.
  - Only allow keys with the TxnKV prefix for TxnKV requests.

If `storage.api-version=1` & `storage.enable-ttl=true`:

- Reject all requests with `api-version=2` in the context.
- Reject all transactional requests, because the raw TTL encoding in `V1TTL` would otherwise corrupt transaction data.

If `storage.api-version=1` & `storage.enable-ttl=false`:

- Reject all requests with `api-version=2` in the context.
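
The admission rules above can be condensed into a single predicate. This Python sketch is illustrative: `StorageConfig`, `is_request_allowed`, and the way a request is described (its context API version, whether it is transactional, its keys, and its CF) are assumptions, not TiKV's actual types.

```python
from dataclasses import dataclass
from typing import Sequence

TIDB_PREFIXES = (b"m", b"t")
TXN_PREFIX = b"x"
RAW_PREFIX = b"r"


@dataclass(frozen=True)
class StorageConfig:
    api_version: int  # `storage.api-version`: 1 or 2
    enable_ttl: bool  # `storage.enable-ttl`; api-version 1 with TTL is V1TTL


def is_request_allowed(
    cfg: StorageConfig,
    ctx_api_version: int,
    is_txn: bool,
    keys: Sequence[bytes],
    cf: str = "default",
) -> bool:
    """Hypothetical admission check following the rules listed above."""
    if cfg.api_version == 2:
        if not is_txn and cf != "default":
            return False  # RawKV may only access the `default` CF
        if ctx_api_version == 1:
            # Only TiDB's transactional traffic is accepted from API V1 clients.
            return is_txn and all(k.startswith(TIDB_PREFIXES) for k in keys)
        if ctx_api_version == 2:
            prefix = TXN_PREFIX if is_txn else RAW_PREFIX
            return all(k.startswith(prefix) for k in keys)
        return False
    # storage.api-version=1
    if ctx_api_version == 2:
        return False
    if cfg.enable_ttl and is_txn:
        return False  # raw TTL encoding in V1TTL would corrupt transaction data
    return True
```

For instance, with `storage.api-version=2`, a RawKV request whose key starts with `x` is rejected, because TxnKV-prefixed keys are reserved for transactional requests.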

### TiKV Client

Provide two modes for users:

- V2:
  - Prepend `x{keyspace}` to TxnKV keys or `r{keyspace}` to RawKV keys.
    - `Keyspace` is optional and defaults to `0` for backward compatibility.