PD: supports multiple level meta data space #87
Signed-off-by: zhangjinpeng1987 <[email protected]>
1. Multiple TiKV clusters share the same PD cluster. The minimal deployment of a TiKV cluster is 3 TiKV and 3 PD nodes,
but it is not cost-effective if every small cluster has 3 dedicated meta data nodes.
2. There are multiple tenants in the same TiKV cluster; each tenant has its own meta data, and each tenant's key range can
Does the keyspace in API v2 match this?
No, API v2 cannot satisfy multiple TiDB tenants.
When there are multiple TiDB tenants, each TiDB should have its own `ddl-owner`, `gc-safepoint`, and other meta data, and this meta data should be stored in PD separately. This RFC is more about how PD stores multiple users' meta data.
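To make "stored in PD separately" concrete, here is a minimal in-memory sketch of per-tenant meta data spaces. It is not PD's actual storage layout: `TenantId`, `MetaStore`, `put`, and `get` are hypothetical names, and the RFC does not specify a tenant id type or a storage API.

```rust
use std::collections::HashMap;

/// Hypothetical tenant identifier; the RFC does not fix a concrete type.
pub type TenantId = u32;

/// Toy in-memory stand-in for PD's meta data storage (an assumption, not
/// PD's real layout): every tenant gets its own isolated key space.
#[derive(Default)]
pub struct MetaStore {
    spaces: HashMap<TenantId, HashMap<String, Vec<u8>>>,
}

impl MetaStore {
    /// Store `value` under `key` inside the given tenant's space.
    pub fn put(&mut self, tenant: TenantId, key: &str, value: &[u8]) {
        self.spaces
            .entry(tenant)
            .or_default()
            .insert(key.to_string(), value.to_vec());
    }

    /// Read `key` from the given tenant's space only; other tenants'
    /// entries with the same key are never visible here.
    pub fn get(&self, tenant: TenantId, key: &str) -> Option<&[u8]> {
        self.spaces.get(&tenant)?.get(key).map(|v| v.as_slice())
    }
}
```

With this shape, two tenants can each keep their own `gc-safepoint` entry without either seeing or overwriting the other's.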
## Alternatives

In the multi-tenant scenario, a tenant can add a {tenant-id} prefix to each data key, but the tenant-id
is essentially meta data; giving each data key a tenant-id prefix may cost more disk space & memory
Any perf stats to show the cost?
Insert QPS with the prefix shows a 4% regression compared with no prefix.
A bigger key size consumes more Raft log or WAL space, and more CPU when comparing keys.
What prefix was used for testing? Note that a two-byte prefix can already support 32768 tenants.
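For reference, a two-byte prefix scheme could look like the sketch below. This is an assumption, not the encoding used in the test mentioned above: `prefixed_key` and `split_key` are hypothetical helpers, and a big-endian `u16` is chosen only so that all keys of one tenant sort together.

```rust
/// Hypothetical helper: prepend a two-byte big-endian tenant id to a raw key.
/// Big-endian keeps byte-wise ordering grouped by tenant id.
pub fn prefixed_key(tenant_id: u16, raw_key: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(2 + raw_key.len());
    key.extend_from_slice(&tenant_id.to_be_bytes());
    key.extend_from_slice(raw_key);
    key
}

/// Split a prefixed key back into (tenant id, raw key).
/// Returns `None` if the key is too short to carry the prefix.
pub fn split_key(key: &[u8]) -> Option<(u16, &[u8])> {
    if key.len() < 2 {
        return None;
    }
    let (prefix, raw) = key.split_at(2);
    Some((u16::from_be_bytes([prefix[0], prefix[1]]), raw))
}
```

Under this scheme every key grows by exactly two bytes, which is the per-key overhead the thread above is weighing against the cost of managing explicit meta data.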
1. Multiple TiKV clusters share the same PD cluster. The minimal deployment of a TiKV cluster is 3 TiKV and 3 PD nodes,
but it is not cost-effective if every small cluster has 3 dedicated meta data nodes.
2. There are multiple tenants in the same TiKV cluster; each tenant has its own meta data, and each tenant's key range can
contain any key in the range of [min-key, max-key].
To make it practical, every API needs to accept a user prefix. And each user's data obviously can't be stored in the same RocksDB. This also requires PD to have knowledge of the underlying storage engine and to avoid scheduling replicas from different users to the same storage engine. And TiKV needs to split all in-memory meta data by user. For example, the range index becomes `HashMap<UserKey, BTreeMap<Vec<u8>, u64>>`.
In my opinion, using a prefix is more straightforward and simpler.
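The two index shapes being compared can be sketched side by side. This is illustrative only and not TiKV code: `UserKey` is assumed to be a `u16`, the meaning of the `u64` value is not stated in the comment, and `lookup_per_user` / `lookup_prefixed` are hypothetical helpers.

```rust
use std::collections::{BTreeMap, HashMap};

/// Assumed user identifier type; the comment does not define `UserKey`.
pub type UserKey = u16;

/// Shape from the comment: one ordered range index per user.
pub type PerUserIndex = HashMap<UserKey, BTreeMap<Vec<u8>, u64>>;

/// Prefix alternative: a single ordered index whose keys start with the user id.
pub type PrefixedIndex = BTreeMap<Vec<u8>, u64>;

/// Lookup with explicit user meta data: pick the user's map, then the key.
pub fn lookup_per_user(index: &PerUserIndex, user: UserKey, key: &[u8]) -> Option<u64> {
    index.get(&user)?.get(key).copied()
}

/// Lookup with a prefix: build the full key, then do one ordinary lookup.
pub fn lookup_prefixed(index: &PrefixedIndex, user: UserKey, key: &[u8]) -> Option<u64> {
    let mut full = user.to_be_bytes().to_vec();
    full.extend_from_slice(key);
    index.get(&full).copied()
}
```

In the per-user shape every component that touches the index has to carry the user id separately, while in the prefixed shape the existing single-map code path stays unchanged and only the key bytes differ.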
> And each users' data can't be stored in the same rocksdb obviously

This is what I expected. After TiKV implements the Multiple-RocksDB feature, data from different tenants should be stored in different RocksDB instances. The tenant is meta data, and including meta data in every row of data is redundant; we can store the tenant id in the RocksDB instance's directory name, like `u0001_rangeid`. Moreover, the table id is essentially also meta data, so it can be stored in the directory name like `u0001_rangeid_tableid`, and the data key in RocksDB is then `row_id`. In this way, we can satisfy the compatibility requirement with the old cluster's data.
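As a rough illustration of the directory-name idea, the sketch below encodes and decodes a name of the form `u0001_rangeid_tableid`. The exact format is an assumption: the comment only gives the example names, so the zero-padded decimal tenant id, the numeric range and table ids, and the `EngineDir`, `dir_name`, and `parse_dir_name` names are all hypothetical.

```rust
/// Hypothetical parsed form of an engine directory name such as
/// `u0001_rangeid_tableid`; the field types are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub struct EngineDir {
    pub tenant_id: u32,
    pub range_id: u64,
    pub table_id: u64,
}

/// Build the directory name: `u` + zero-padded tenant id, then range id
/// and table id separated by underscores.
pub fn dir_name(d: &EngineDir) -> String {
    format!("u{:04}_{}_{}", d.tenant_id, d.range_id, d.table_id)
}

/// Recover the meta data from a directory name.
/// Returns `None` for names that do not match the assumed format.
pub fn parse_dir_name(name: &str) -> Option<EngineDir> {
    let mut parts = name.strip_prefix('u')?.split('_');
    let tenant_id = parts.next()?.parse().ok()?;
    let range_id = parts.next()?.parse().ok()?;
    let table_id = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // reject trailing components
    }
    Some(EngineDir { tenant_id, range_id, table_id })
}
```

Because the tenant and table ids live in the directory name, keys inside that RocksDB instance would only need the `row_id`, which is the space saving the comment argues for.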
Using a prefix can also achieve the same improvement. The difference between using a prefix and using separate explicit meta data is that PD/TiKV/TiDB need to take good care of the meta data in the latter case.
Signed-off-by: zhangjinpeng1987 <[email protected]>
Another usage scenario: multiple TiDB clusters share the same PD cluster to reduce the overhead of PD in TiDB Cloud.
Signed-off-by: zhangjinpeng1987 [email protected]
PD supports multiple level meta data space.