docs: Document Query Throttler feature #2046
Merged
284 changes: 284 additions & 0 deletions
content/en/docs/24.0/reference/features/query-throttler.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,284 @@ | ||
| --- | ||
| title: Query Throttler | ||
| weight: 22 | ||
| --- | ||
|
|
||
| VTTablet runs a query throttler that protects tablets from being overloaded by incoming queries. Unlike the [Tablet Throttler](../tablet-throttler/), which manages outgoing operations like VReplication and OnlineDDL, the query throttler manages incoming user queries to prevent database overload. | ||
|
|
||
| ## Why throttle incoming queries? | ||
|
|
||
| When tablets experience high load from incoming queries, they can become overloaded. This can cause: | ||
|
|
||
| - **Increased query latency**: High query volume increases query execution times as the database struggles to process all requests. | ||
| - **Resource exhaustion**: Too many concurrent queries can consume all available connections, memory, or CPU resources. | ||
| - **Cascading failures**: An overloaded tablet can increase replication lag, which affects the entire shard and can lead to system-wide issues. | ||
| - **Degraded user experience**: When tablets are overwhelmed, every user sees poor performance, not only lower-priority workloads. | ||
|
|
||
| The query throttler monitors tablet health metrics and selectively rejects queries when the tablet is under stress. This keeps critical queries running with acceptable performance while temporarily rejecting lower-priority traffic. | ||
|
|
||
| ## How it works | ||
|
|
||
| The query throttler evaluates each incoming query before execution. When enabled, it: | ||
|
|
||
| 1. Checks the query's priority (if specified via the `PRIORITY` comment directive) | ||
| 2. Determines which throttling strategy to apply based on configuration | ||
| 3. Evaluates current system metrics (replication lag, load average, running threads, etc.) | ||
| 4. Decides whether to allow or reject the query based on configured thresholds | ||
| 5. Returns a `RESOURCE_EXHAUSTED` error if the query should be throttled | ||
|
|
||
| The throttler adds minimal overhead in healthy conditions (typically less than 5% latency increase) through fast-path optimization and aggressive caching. | ||
|
|
||
| ## Architecture | ||
|
|
||
| The query throttler uses a pluggable strategy architecture. | ||
|
|
||
| ### Strategies | ||
|
|
||
| The throttler supports different throttling strategies, which can be selected via configuration: | ||
|
|
||
| - **NoOp**: The default strategy. Does not throttle any queries. This is a safe fallback that ensures queries are never blocked if configuration is missing or invalid. | ||
| - **TabletThrottler**: A production-ready strategy that uses the existing tablet throttler's metrics to make throttling decisions. This strategy can be configured with detailed rules for different tablet types and SQL statement types. | ||
| - **Custom strategies**: The architecture supports custom throttling strategies through a registry system, allowing you to implement your own logic. | ||
|
|
||
| ### Configuration | ||
|
|
||
| The throttler loads configuration from a JSON file at `/config/throttler-config.json` by default. Configuration is refreshed periodically (default: every 1 minute) without requiring tablet restarts. | ||
|
|
||
| ### Basic configuration | ||
|
|
||
| The throttler uses JSON configuration: | ||
|
|
||
| ```json | ||
| { | ||
| "enabled": true, | ||
| "strategy": "NoOp" | ||
| } | ||
| ``` | ||
|
|
||
| ### TabletThrottler strategy | ||
|
|
||
| When using the `TabletThrottler` strategy, you can define rules for different tablet types and SQL statement types: | ||
|
|
||
| ```json | ||
| { | ||
| "enabled": true, | ||
| "strategy": "TabletThrottler", | ||
| "tablet_strategy_config": { | ||
| "tablet_rules": { | ||
| "PRIMARY": { | ||
| "INSERT": { | ||
| "lag": { | ||
| "thresholds": [ | ||
| {"above": 10.0, "throttle": 25} | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| This configuration: | ||
|
|
||
| - Enables the throttler | ||
| - Uses the `TabletThrottler` strategy | ||
| - Applies a rule to `PRIMARY` tablets for `INSERT` statements | ||
| - When replication lag exceeds 10 seconds, throttles 25% of `INSERT` queries | ||
|
|
||
| ### Advanced configuration | ||
|
|
||
| You can define multiple thresholds for graduated throttling along with monitoring multiple metrics: | ||
|
|
||
| ```json | ||
| { | ||
| "enabled": true, | ||
| "strategy": "TabletThrottler", | ||
| "tablet_strategy_config": { | ||
| "tablet_rules": { | ||
| "PRIMARY": { | ||
| "INSERT": { | ||
| "lag": { | ||
| "thresholds": [ | ||
| {"above": 5.0, "throttle": 10}, | ||
| {"above": 15.0, "throttle": 25}, | ||
| {"above": 30.0, "throttle": 50} | ||
| ] | ||
| }, | ||
| "threads_running": { | ||
| "thresholds": [ | ||
| {"above": 50, "throttle": 15}, | ||
| {"above": 100, "throttle": 35} | ||
| ] | ||
| } | ||
| }, | ||
| "UPDATE": { | ||
| "lag": { | ||
| "thresholds": [ | ||
| {"above": 10.0, "throttle": 20} | ||
| ] | ||
| } | ||
| } | ||
| }, | ||
| "REPLICA": { | ||
| "SELECT": { | ||
| "lag": { | ||
| "thresholds": [ | ||
| {"above": 60.0, "throttle": 20} | ||
| ] | ||
| }, | ||
| "loadavg": { | ||
| "thresholds": [ | ||
| {"above": 4.0, "throttle": 25}, | ||
| {"above": 8.0, "throttle": 50} | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| This configuration: | ||
|
|
||
| - Sets different rules for `PRIMARY` and `REPLICA` tablets | ||
| - Uses graduated thresholds (higher metric values trigger more aggressive throttling) | ||
| - Monitors multiple metrics simultaneously (`lag`, `threads_running`, `loadavg`) | ||
| - Applies different rules for different SQL statement types (`INSERT`, `UPDATE`, `SELECT`) | ||
|
|
||
| ### Supported metrics | ||
|
|
||
| The `TabletThrottler` strategy can monitor the same metrics as the [Tablet Throttler](../tablet-throttler/): | ||
|
|
||
| - `lag`: Replication lag in seconds | ||
| - `threads_running`: MySQL's `Threads_running` status value | ||
| - `loadavg`: Load average per core on the tablet server | ||
| - `mysqld-loadavg`: Load average per core on the MySQL server | ||
| - `custom`: Custom query results | ||
| - `mysqld-datadir-used-ratio`: Disk space usage (0.0 to 1.0) | ||
| - `history_list_length`: InnoDB's history list length | ||
|
|
||
| ## Priority-based throttling | ||
|
|
||
| The query throttler supports priority-based query execution using the `PRIORITY` comment directive. This ensures critical queries are never throttled while allowing lower-priority queries to be throttled more aggressively. | ||
|
|
||
| ### How priority works | ||
|
|
||
| Priority is specified as a value from 0 to 100, with 0 being the highest priority and 100 the lowest. The value determines whether or not the query is *potentially* throttled based on the current configuration and system state: | ||
|
|
||
| - **Priority 0**: Never throttled. Reserved for the most critical queries. | ||
| - **Priority 1-99**: Probabilistically throttled based on the priority value. Higher numbers mean it's more likely to be throttled. | ||
| - **Priority 100**: Always evaluated for potential throttling. | ||
|
|
||
| If no priority is specified, queries default to priority 100. | ||
|
|
||
| ### Using priority in queries | ||
|
|
||
| Specify priority using the `PRIORITY` comment directive: | ||
|
|
||
| ```sql | ||
| SELECT /*vt+ PRIORITY=0 */ * FROM critical_table; | ||
| SELECT /*vt+ PRIORITY=50 */ * FROM normal_table; | ||
| SELECT /*vt+ PRIORITY=100 */ * FROM batch_table; | ||
| ``` | ||
|
|
||
| ### Priority evaluation | ||
|
|
||
| The throttler uses probabilistic priority checking: | ||
|
|
||
| 1. Generate a random number between 0 and 99 | ||
| 2. If the random number is less than the query's priority, evaluate throttling rules | ||
| 3. If the random number is greater than or equal to the priority, allow the query without checking metrics | ||
|
|
||
| This means: | ||
|
|
||
| - Priority 0 queries always skip throttling (random 0-99 is never < 0) | ||
| - Priority 50 queries are checked 50% of the time | ||
| - Priority 100 queries are always checked | ||
|
|
||
| ## Workload classification | ||
|
|
||
| The query throttler can track metrics by workload using the `WORKLOAD_NAME` comment directive. This lets you monitor which workloads are being throttled most frequently. | ||
|
|
||
| ```sql | ||
| SELECT /*vt+ WORKLOAD_NAME=analytics */ * FROM large_table; | ||
| ``` | ||
|
|
||
| When combined with the `--enable-per-workload-table-metrics` flag on vttablet, you can track throttling behavior per workload in the `QueryThrottlerRequests` and `QueryThrottlerThrottled` metrics. | ||
|
|
||
| ## Monitoring | ||
|
|
||
| The query throttler emits several metrics: | ||
|
|
||
| - `QueryThrottlerRequests`: Total number of queries evaluated by the throttler | ||
| - `QueryThrottlerThrottled`: Number of queries that were throttled | ||
| - `QueryThrottlerTotalLatencyNs`: Total latency added by throttler evaluation | ||
| - `QueryThrottlerEvaluateLatencyNs`: Latency of the throttling decision evaluation | ||
|
|
||
| These metrics include labels for: | ||
|
|
||
| - `strategy`: The throttling strategy used (`NoOp`, `TabletThrottler`) | ||
| - `workload`: The workload name (if specified via `WORKLOAD_NAME` directive) | ||
| - `priority`: The query priority (if specified via `PRIORITY` directive) | ||
|
|
||
| See [Query Serving Metrics](../../query-serving/metrics/) for details. | ||
|
|
||
| ## Error messages | ||
|
|
||
| When a query is throttled, the query throttler returns a `RESOURCE_EXHAUSTED` error with details about why the query was rejected: | ||
|
|
||
| ``` | ||
| vttablet: rpc error: code = ResourceExhausted desc = [VTTabletThrottler] Query throttled: metric=lag value=15.23 breached threshold=10.00 throttle=25% | ||
| ``` | ||
|
|
||
| The error message includes: | ||
|
|
||
| - The metric that triggered throttling | ||
| - The current metric value | ||
| - The configured threshold | ||
| - The throttle percentage applied | ||
|
|
||
| ## Differences from tablet throttler | ||
|
|
||
| The query throttler differs from the [Tablet Throttler](../tablet-throttler/) in the following ways: | ||
|
|
||
| | Feature | Tablet Throttler | Query Throttler | | ||
| |---------|------------------|-----------------| | ||
| | **Purpose** | Throttles outgoing operations (VReplication, OnlineDDL) | Throttles incoming user queries | | ||
| | **What it protects** | Prevents background jobs from overloading the database | Prevents user queries from overloading the database | | ||
| | **Default behavior** | Enabled by default | Disabled by default (`NoOp` strategy) | | ||
| | **Strategies** | Single strategy | Pluggable strategies (`NoOp`, `TabletThrottler`, or custom) | | ||
|
|
||
| Both throttlers can monitor the same set of metrics and can coexist in the same cluster. | ||
|
|
||
| ## Performance considerations | ||
|
|
||
| The query throttler adds minimal overhead: | ||
|
|
||
| - **Healthy systems**: Less than 5% latency increase due to fast-path optimization | ||
| - **Cache hit rate**: Greater than 95% in normal operations, reducing the need for metric collection | ||
| - **Under load**: Graduated throttling (10-50% throttle rates) prevents complete overload while allowing some queries through | ||
| - **Priority 0 queries**: Skip throttling rule evaluation entirely, so critical queries bypass all metric checks | ||
|
|
||
| ## Best practices | ||
|
|
||
| 1. **Start with NoOp**: Begin with the `NoOp` strategy to confirm the throttler is wired up correctly without impacting traffic | ||
| 2. **Use priority carefully**: Reserve priority 0 for truly critical queries only | ||
| 3. **Set graduated thresholds**: Use multiple threshold levels to gradually increase throttling as metrics worsen | ||
| 4. **Monitor metrics**: Watch the `QueryThrottlerRequests` and `QueryThrottlerThrottled` metrics to understand throttling behavior | ||
| 5. **Test in development**: Test throttling configurations in non-production environments first | ||
| 6. **Combine with workload names**: Use `WORKLOAD_NAME` to track which workloads are being throttled most | ||
| 7. **Adjust thresholds**: Start with conservative thresholds and adjust based on observed behavior | ||
|
|
||
| ## Flags | ||
|
|
||
| The query throttler behavior can be configured with `vttablet` flags: | ||
|
|
||
| - `--query-throttler-config-refresh-interval`: How frequently to refresh configuration (default: 1 minute) | ||
|
|
||
| ## See also | ||
|
|
||
| - [Tablet Throttler](../tablet-throttler/): For throttling outgoing operations | ||
| - [Comment Directives](../../../user-guides/configuration-advanced/comment-directives/): For using `PRIORITY` and `WORKLOAD_NAME` | ||
| - [Query Serving Metrics](../../query-serving/metrics/): For monitoring throttler behavior | ||
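
The priority check and graduated thresholds described in the document can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the Vitess implementation: the function names (`should_evaluate`, `throttle_percent`, `is_throttled`) are hypothetical, and three details are assumptions the document does not state: that `above` is a strict comparison, that the highest breached threshold wins, and that the throttle percentage is applied by a second random draw.

```python
import random

# Thresholds copied from the "Advanced configuration" example for the lag metric.
LAG_THRESHOLDS = [
    {"above": 5.0, "throttle": 10},
    {"above": 15.0, "throttle": 25},
    {"above": 30.0, "throttle": 50},
]


def should_evaluate(priority: int, rng: random.Random) -> bool:
    """Return True when throttling rules should be checked for this query.

    Mirrors the documented steps: draw a number in 0..99 and check the rules
    only if the draw is less than the query's priority.
    """
    roll = rng.randrange(100)  # integer in 0..99
    return roll < priority


def throttle_percent(value: float, thresholds: list) -> int:
    """Return the throttle percentage of the highest breached threshold, or 0.

    Assumption: "above" is strict, and the highest breached threshold wins.
    """
    breached = [t for t in thresholds if value > t["above"]]
    if not breached:
        return 0
    return max(breached, key=lambda t: t["above"])["throttle"]


def is_throttled(priority: int, value: float, thresholds: list,
                 rng: random.Random) -> bool:
    """Combine the priority check with the threshold lookup.

    Assumption: the throttle percentage is applied with a second random draw.
    """
    if not should_evaluate(priority, rng):
        return False  # allowed without checking metrics
    return rng.randrange(100) < throttle_percent(value, thresholds)
```

With lag at 20 seconds, the 15-second threshold is the highest one breached, so `throttle_percent(20.0, LAG_THRESHOLDS)` returns 25. A priority 0 query is never throttled in this sketch because no draw in 0..99 is less than 0.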
This will go away with vitessio/vitess#18979
I guess we'll run promptless again post that PR is merged :D