RFC for QoS: Quality of Service #58
gregwebs commented Sep 24, 2020 (edited)
#### QoS Policy stored in PD
A QoS policy is set by an administrator in PD. It is a combination of a region group and a QoS value. The main region group is a key space. A smaller group within a key space, such as a table, may also be specified, and its QoS setting takes precedence over that of the key space. These groups are dynamic (new regions can be added) and are translated to regions by PD, which has knowledge of tenant and table groupings.
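The precedence rule described here could be sketched roughly as follows. This is only an illustration, not PD's data model: `QosPolicy`, `resolve_qos`, the string group names, and the integer QoS values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QosPolicy:
    """Hypothetical policy record: a region group plus a QoS value."""
    key_space: str
    qos: int
    table: Optional[str] = None  # None means the policy covers the whole key space


def resolve_qos(policies, key_space, table=None, default=0):
    """Return the QoS value for a (key space, table) pair.

    A table-level policy takes precedence over the key-space policy.
    `default` is an assumed fallback when no policy matches.
    """
    key_space_qos = None
    for p in policies:
        if p.key_space != key_space:
            continue
        if p.table is not None and p.table == table:
            return p.qos  # the most specific match wins
        if p.table is None:
            key_space_qos = p.qos
    return default if key_space_qos is None else key_space_qos
```

With a key-space policy of 5 and a table policy of 9 for one table, requests to that table resolve to 9 while the rest of the key space resolves to 5.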
Presumably when a region splits, the new regions inherit the QoS parameters of the parent. What happens when two regions with different QoS are merged?
Does PD have knowledge of how tables/tenants are represented within a key space? My assumption is that only TiDB knows this.
I am a little fuzzy on this detail, but I know we can now prevent tables from sharing regions, so it should be possible for PD to know this.
Perhaps QoS policy should bind to a range instead of some regions, just like placement rules.
> Perhaps QoS policy should bind to a range instead of some regions, just like placement rules.
This proposal doesn't closely specify how regions will be grouped. Grouping by key range could be a great solution. The problem with this approach is that a region could span multiple key ranges. We could use key ranges as the underlying primitive but also offer APIs that talk in terms of key spaces. We could also reject key ranges that don't fully enclose their existing regions. It is noted in placement rules that the key range of a table can change due to DDL commands. So I am thinking that for the first version of QoS, PD can understand key spaces but won't understand tables, and may need to accept key ranges.
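As a rough sketch of the "reject key ranges that don't fully enclose their regions" idea: the check below assumes half-open `[start, end)` ranges over byte-string keys with explicit, non-empty bounds. The function names are hypothetical and this is not how PD validates ranges today.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def straddling_regions(range_start, range_end, regions):
    """Return the regions that overlap the key range without being fully inside it.

    `regions` is an iterable of (start_key, end_key) pairs. A non-empty result
    means the key range cuts through a region and could be rejected.
    """
    bad = []
    for start, end in regions:
        inside = range_start <= start and end <= range_end
        if overlaps(range_start, range_end, start, end) and not inside:
            bad.append((start, end))
    return bad
```

A policy on `[b"c", b"k")` would be accepted if every overlapping region lies inside it, while `[b"d", b"k")` would be rejected if a region covers `[b"c", b"f")`.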
Ti Components are loosely coupled:
* PD stores policies and communicates them to TiKV
* TiKV performs query admission, providing localized back pressure
Presumably TiFlash would work in the same way as TiKV.
I haven't thought about TiFlash. Managing QoS for the OLTP workload path is critical. For OLAP it is less important. TiFlash is also gaining some direct write support, but I have no idea how that works. As the MPP support for TiFlash improves, it will be easier to handle TiFlash load by scaling out. Additionally, applications that benefit from TiFlash would generally be big enough to have their own TiDB cluster. This proposal will most benefit smaller applications that must use a shared TiDB cluster until they grow larger.
The amount of inhibition required depends on the number of requests and the amount of resources being requested. Effectively, when resources are highly utilized we build up a queue of pending requests with a limited size, where the overflow is rejected.
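The bounded-queue behavior could look something like the sketch below. It is a simplification under stated assumptions: `AdmissionQueue`, the `highly_utilized` flag, and the string outcomes are hypothetical, and a real implementation would measure utilization rather than take it as an argument.

```python
from collections import deque


class AdmissionQueue:
    """Hypothetical bounded queue: requests beyond `max_pending` are rejected."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self._pending = deque()

    def submit(self, request, highly_utilized):
        """Return "run", "queued", or "rejected" for an incoming request."""
        if not highly_utilized and not self._pending:
            return "run"  # resources are free and nothing is waiting
        if len(self._pending) >= self.max_pending:
            return "rejected"  # overflow: back pressure to the caller
        self._pending.append(request)
        return "queued"

    def take(self):
        """Pop the oldest pending request, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None
```

Rejecting at a fixed queue length keeps the back pressure local: the caller learns immediately that the store is saturated instead of waiting on an unbounded backlog.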
Policy application is allowed to take into account resources that will be used
Here you talk about prioritisation of queries, but in the above section it sounds like TiKV just makes a binary run/reject decision for queries.
Yes, this was not written clearly. To decide which queries to admit, we need to apply QoS policies. But admission can also take into account the resources being used. Once admitted, I think we will just do policy (region-based) prioritization. This should be fleshed out more.
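One possible shape for the post-admission step, purely as a sketch: admitted requests are ordered by the QoS value of their region, with ties served in arrival order. `PriorityScheduler` and the convention that a higher QoS value means higher priority are assumptions, not part of the RFC.

```python
import heapq
import itertools


class PriorityScheduler:
    """Hypothetical scheduler: serves admitted requests by region QoS value."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def admit(self, request, region_qos):
        # heapq is a min-heap, so negate the QoS value (assumed: higher = more important).
        heapq.heappush(self._heap, (-region_qos, next(self._seq), request))

    def next_request(self):
        """Return the highest-priority admitted request, or None if none remain."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```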