title | summary |
---|---|
Wrong Index Solution |
Learn how to solve the wrong index issue. |
If you find that the execution speed of some query does not reach the expectation, the optimizer might choose the wrong index to run the query.
There are multiple reasons why the optimizer might choose an unexpected index:
- Outdated statistics: the optimizer relies on statistics to estimate query costs. If the statistics are outdated, the optimizer might make suboptimal choices.
- Statistics mismatch: even if statistics are current, they might not accurately reflect the data distribution, leading to incorrect cost estimations.
- Incorrect cost calculation: the optimizer might miscalculate the cost of using an index due to complex query structures or data distribution.
- Inappropriate engine selection: in some cases, the optimizer might select a storage engine that is not optimal for the query.
- Function pushdown limitations: certain functions or operations might not be pushed down to storage engines, potentially affecting query performance.
You can first view the health state of tables in the statistics, and then solve this issue according to the different health states.
The low health state means TiDB has not performed theANALYZE
statement for a long time. You can update the statistics by running the ANALYZE
command. After the update, if the optimizer still uses the wrong index, refer to the next section.
The near 100% health state suggests that the ANALYZE
statement is just completed or was completed a short time ago. In this case, the wrong index issue might be related to TiDB's estimation logic for the number of rows.
For equivalence queries, the cause might be Count-Min Sketch. You can check whether Count-Min Sketch is the cause and take corresponding solutions.
If the cause above does not apply to your problem, you can force-select indexes by using the USE_INDEX
or use index
optimzer hint (see USE_INDEX for details). Also, you can change the query behavior by using SQL Plan Management in a non-intrusive way.
Apart from the aforementioned situations, the wrong index issue might also be caused by data updates which renders all the indexes no longer applicable. In such cases, you need to perform analysis on the conditions and data distribution to see whether new indexes can speed up the query. If so, you can add new indexes by running the ADD INDEX
command.
When data distribution is highly skewed, the statistics might not accurately reflect the actual data. In such cases, try configuring the options of the ANALYZE TABLE
statement. It might help improve the accuracy of statistics and better match the index.
For example, suppose you have an orders
table with an index on the customer_id
column, and more than 50% of the orders share the same customer_id
. In this case, the statistics might not represent the data distribution well, affecting query performance.
To view detailed information on execution costs, you can execute the EXPLAIN
and EXPLAIN ANALYZE
statements with the FORMAT=verbose
option. According to the information, you can see cost differences between different execution paths.
By default, TiDB chooses TiKV or TiFlash for table access based on cost estimation. You can experiment with different engines for the same query by applying engine isolation.
For more information, see Engine isolation.
To enhance query performance, TiDB can push down certain functions to the TiKV or TiFlash storage engine for execution. However, some functions do not support pushdown, which might limit available execution plans and potentially affect query performance.
For expressions that support pushdown, see TiKV supported pushdown calculations and TiFlash supported pushdown calculations.
Note that you can also disable the pushdown of specific expressions. For more information, see Blocklist of optimization rules and expression pushdown.