Skip to content

Latest commit

 

History

History
260 lines (194 loc) · 11.8 KB

dev-guide-optimize-sql.md

File metadata and controls

260 lines (194 loc) · 11.8 KB
title summary
SQL Performance Tuning
Introduces TiDB's SQL performance tuning scheme and analysis approach.

SQL Performance Tuning

This document introduces some common reasons for slow SQL statements and techniques for tuning SQL performance.

Before you begin

You can use tiup demo import to prepare data:

tiup demo bookshop prepare --host 127.0.0.1 --port 4000 --books 1000000

Or using the Import feature of TiDB Cloud to import the pre-prepared sample data.

Issue: Full table scan

The most common reason for slow SQL queries is that the SELECT statements perform full table scan or use incorrect indexes.

When TiDB retrieves a small number of rows from a large table based on a column that is not the primary key or in the secondary index, the performance is usually poor:

SELECT * FROM books WHERE title = 'Marian Yost';
+------------+-------------+-----------------------+---------------------+-------+--------+
| id         | title       | type                  | published_at        | stock | price  |
+------------+-------------+-----------------------+---------------------+-------+--------+
| 65670536   | Marian Yost | Arts                  | 1950-04-09 06:28:58 | 542   | 435.01 |
| 1164070689 | Marian Yost | Education & Reference | 1916-05-27 12:15:35 | 216   | 328.18 |
| 1414277591 | Marian Yost | Arts                  | 1932-06-15 09:18:14 | 303   | 496.52 |
| 2305318593 | Marian Yost | Arts                  | 2000-08-15 19:40:58 | 398   | 402.90 |
| 2638226326 | Marian Yost | Sports                | 1952-04-02 12:40:37 | 191   | 174.64 |
+------------+-------------+-----------------------+---------------------+-------+--------+
5 rows in set
Time: 0.582s

To understand why this query is slow, you can use EXPLAIN to see the execution plan:

EXPLAIN SELECT * FROM books WHERE title = 'Marian Yost';
+---------------------+------------+-----------+---------------+-----------------------------------------+
| id                  | estRows    | task      | access object | operator info                           |
+---------------------+------------+-----------+---------------+-----------------------------------------+
| TableReader_7       | 1.27       | root      |               | data:Selection_6                        |
| └─Selection_6       | 1.27       | cop[tikv] |               | eq(bookshop.books.title, "Marian Yost") |
|   └─TableFullScan_5 | 1000000.00 | cop[tikv] | table:books   | keep order:false                        |
+---------------------+------------+-----------+---------------+-----------------------------------------+

As can be seen from TableFullScan_5 in the execution plan, TiDB performs a full table scan on the books table and checks whether title satisfies the condition for each row. The estRows value of TableFullScan_5 is 1000000.00, which means the optimizer estimates that this full table scan takes 1000000.00 rows of data.

For more information about the usage of EXPLAIN, see EXPLAIN Walkthrough.

Solution: Use secondary index

To speed up this query above, add a secondary index on the books.title column:

CREATE INDEX title_idx ON books (title);

The query execution is much faster:

SELECT * FROM books WHERE title = 'Marian Yost';
+------------+-------------+-----------------------+---------------------+-------+--------+
| id         | title       | type                  | published_at        | stock | price  |
+------------+-------------+-----------------------+---------------------+-------+--------+
| 1164070689 | Marian Yost | Education & Reference | 1916-05-27 12:15:35 | 216   | 328.18 |
| 1414277591 | Marian Yost | Arts                  | 1932-06-15 09:18:14 | 303   | 496.52 |
| 2305318593 | Marian Yost | Arts                  | 2000-08-15 19:40:58 | 398   | 402.90 |
| 2638226326 | Marian Yost | Sports                | 1952-04-02 12:40:37 | 191   | 174.64 |
| 65670536   | Marian Yost | Arts                  | 1950-04-09 06:28:58 | 542   | 435.01 |
+------------+-------------+-----------------------+---------------------+-------+--------+
5 rows in set
Time: 0.007s

To understand why the performance is improved, use EXPLAIN to see the new execution plan:

EXPLAIN SELECT * FROM books WHERE title = 'Marian Yost';
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+
| id                        | estRows | task      | access object                       | operator info                                         |
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+
| IndexLookUp_10            | 1.27    | root      |                                     |                                                       |
| ├─IndexRangeScan_8(Build) | 1.27    | cop[tikv] | table:books, index:title_idx(title) | range:["Marian Yost","Marian Yost"], keep order:false |
| └─TableRowIDScan_9(Probe) | 1.27    | cop[tikv] | table:books                         | keep order:false                                      |
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+

As can be seen from IndexLookup_10 in the execution plan, TiDB queries the data by the title_idx index. Its estRows value is 1.27, which means that the optimizer estimates that only 1.27 rows are scanned. The estimated rows scanned are much fewer than the 1000000.00 rows of data in the full table scan.

The IndexLookup_10 execution plan is to first use the IndexRangeScan_8 operator to read the index data that meets the condition through the title_idx index, and then use the TableLookup_9 operator to query the corresponding rows according to the Row ID stored in the index data.

For more information on the TiDB execution plan, see TiDB Query Execution Plan Overview.

Solution: Use covering index

If the index is a covering index, which contains all the columns queried by the SQL statements, scanning the index data is sufficient for the query.

For example, in the following query, you only need to query the corresponding price based on title:

SELECT title, price FROM books WHERE title = 'Marian Yost';
+-------------+--------+
| title       | price  |
+-------------+--------+
| Marian Yost | 435.01 |
| Marian Yost | 328.18 |
| Marian Yost | 496.52 |
| Marian Yost | 402.90 |
| Marian Yost | 174.64 |
+-------------+--------+
5 rows in set
Time: 0.007s

Because the title_idx index only contains data in the title column, TiDB still needs to first scan the index data and then query the price column from the table.

EXPLAIN SELECT title, price FROM books WHERE title = 'Marian Yost';
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+
| id                        | estRows | task      | access object                       | operator info                                         |
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+
| IndexLookUp_10            | 1.27    | root      |                                     |                                                       |
| ├─IndexRangeScan_8(Build) | 1.27    | cop[tikv] | table:books, index:title_idx(title) | range:["Marian Yost","Marian Yost"], keep order:false |
| └─TableRowIDScan_9(Probe) | 1.27    | cop[tikv] | table:books                         | keep order:false                                      |
+---------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------+

To optimize the performance, drop the title_idx index and create a new covering index title_price_idx:

ALTER TABLE books DROP INDEX title_idx;
CREATE INDEX title_price_idx ON books (title, price);

Because the price data is stored in the title_price_idx index, the following query only needs to scan the index data:

EXPLAIN SELECT title, price FROM books WHERE title = 'Marian Yost';
--------------------+---------+-----------+--------------------------------------------------+-------------------------------------------------------+
| id                 | estRows | task      | access object                                    | operator info                                         |
+--------------------+---------+-----------+--------------------------------------------------+-------------------------------------------------------+
| IndexReader_6      | 1.27    | root      |                                                  | index:IndexRangeScan_5                                |
| └─IndexRangeScan_5 | 1.27    | cop[tikv] | table:books, index:title_price_idx(title, price) | range:["Marian Yost","Marian Yost"], keep order:false |
+--------------------+---------+-----------+--------------------------------------------------+-------------------------------------------------------+

Now this query runs faster:

SELECT title, price FROM books WHERE title = 'Marian Yost';
+-------------+--------+
| title       | price  |
+-------------+--------+
| Marian Yost | 174.64 |
| Marian Yost | 328.18 |
| Marian Yost | 402.90 |
| Marian Yost | 435.01 |
| Marian Yost | 496.52 |
+-------------+--------+
5 rows in set
Time: 0.004s

Since the books table will be used in later examples, drop the title_price_idx index:

ALTER TABLE books DROP INDEX title_price_idx;

Solution: Use primary index

If a query uses the primary key to filter data, the query runs fast. For example, the primary key of the books table is the id column, so you can use the id column to query data:

SELECT * FROM books WHERE id = 896;
+-----+----------------+----------------------+---------------------+-------+--------+
| id  | title          | type                 | published_at        | stock | price  |
+-----+----------------+----------------------+---------------------+-------+--------+
| 896 | Kathryne Doyle | Science & Technology | 1969-03-18 01:34:15 | 468   | 281.32 |
+-----+----------------+----------------------+---------------------+-------+--------+
1 row in set
Time: 0.004s

Use EXPLAIN to see the execution plan:

EXPLAIN SELECT * FROM books WHERE id = 896;
+-------------+---------+------+---------------+---------------+
| id          | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| Point_Get_1 | 1.00    | root | table:books   | handle:896    |
+-------------+---------+------+---------------+---------------+

Point_Get is a very fast execute plan.

Use the right join type

See JOIN Execution Plan.

See also

Need help?

Ask the community on Discord or Slack, or submit a support ticket.

Ask the community on Discord or Slack, or submit a support ticket.