Skip to content

Commit ac13080

Browse files
ran-huangTomShawn
andauthored
perf-tuning: add doc for max-min-eliminate (pingcap#3272)
* perf-tuning: add doc for max-min-eliminate * update max-min-eliminate * fix wrong anchors * Apply suggestions from code review Co-authored-by: TomShawn <[email protected]> Co-authored-by: TomShawn <[email protected]>
1 parent 7a25176 commit ac13080

File tree

3 files changed

+119
-1
lines changed

3 files changed

+119
-1
lines changed

TOC.md

+1
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,7 @@
110110
+ [Subquery Related Optimizations](/subquery-optimization.md)
111111
+ [Column Pruning](/column-pruning.md)
112112
+ [Decorrelation of Correlated Subquery](/correlated-subquery-optimization.md)
113+
+ [Eliminate Max/Min](/max-min-eliminate.md)
113114
+ [Predicates Push Down](/predicates-push-down.md)
114115
+ [TopN and Limit Push Down](/topn-limit-push-down.md)
115116
+ [Join Reorder](/join-reorder.md)

max-min-eliminate.md

+117
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
---
2+
title: Eliminate Max/Min
3+
summary: Introduce the rules for eliminating Max/Min functions.
4+
---
5+
6+
# Eliminate Max/Min
7+
8+
When a SQL statement contains `max`/`min` functions, the query optimizer tries to convert the `max`/`min` aggregate functions to the TopN operator by applying the `max`/`min` optimization rule. In this way, TiDB can perform the query more efficiently through indexes.
9+
10+
This optimization rule is divided into the following two types according to the number of `max`/`min` functions in the `select` statement:
11+
12+
- [The statement with only one `max`/`min` function](#one-maxmin-function)
13+
- [The statement with multiple `max`/`min` functions](#multiple-maxmin-functions)
14+
15+
## One `max`/`min` function
16+
17+
When a SQL statement meets the following conditions, this rule is applied:
18+
19+
- The statement contains only one aggregate function, which is `max` or `min`.
20+
- The aggregate function has no related `group by` clause.
21+
22+
For example:
23+
24+
{{< copyable "sql" >}}
25+
26+
```sql
27+
select max(a) from t
28+
```
29+
30+
The optimization rule rewrites the statement as follows:
31+
32+
{{< copyable "sql" >}}
33+
34+
```sql
35+
select max(a) from (select a from t where a is not null order by a desc limit 1) t
36+
```
37+
38+
When column `a` has an index, or when column `a` is the prefix of some composite index, with the help of index, the new SQL statement can find the maximum or minimum value by scanning only one row of data. This optimization avoids full table scan.
39+
40+
The example statement has the following execution plan:
41+
42+
```sql
43+
mysql> explain select max(a) from t;
44+
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
45+
| id | estRows | task | access object | operator info |
46+
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
47+
| StreamAgg_13 | 1.00 | root | | funcs:max(test.t.a)->Column#4 |
48+
| └─Limit_17 | 1.00 | root | | offset:0, count:1 |
49+
| └─IndexReader_27 | 1.00 | root | | index:Limit_26 |
50+
| └─Limit_26 | 1.00 | cop[tikv] | | offset:0, count:1 |
51+
| └─IndexFullScan_25 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, desc, stats:pseudo |
52+
+------------------------------+---------+-----------+-------------------------+-------------------------------------+
53+
5 rows in set (0.00 sec)
54+
```
55+
56+
## Multiple `max`/`min` functions
57+
58+
When a SQL statement meets the following conditions, this rule is applied:
59+
60+
- The statement contains multiple aggregate functions, which are all `max` or `min` functions.
61+
- None of the aggregate functions has a related `group by` clause.
62+
- The columns in each `max`/`min` function has indexes to preserve the order.
63+
64+
For example:
65+
66+
{{< copyable "sql" >}}
67+
68+
```sql
69+
select max(a) - min(a) from t
70+
```
71+
72+
The optimization rule first checks whether column `a` has an index to preserve its order. If yes, the SQL statement is rewritten as the Cartesian product of two subqueries:
73+
74+
{{< copyable "sql" >}}
75+
76+
```sql
77+
select max_a - min_a
78+
from
79+
(select max(a) as max_a from t) t1,
80+
(select min(a) as min_a from t) t2
81+
```
82+
83+
Through the rewrite, the optimizer can apply the rule for statements with only one `max`/`min` function to the two subqueries respectively. The statement is then rewritten as follows:
84+
85+
{{< copyable "sql" >}}
86+
87+
```sql
88+
select max_a - min_a
89+
from
90+
(select max(a) as max_a from (select a from t where a is not null order by a desc limit 1) t) t1,
91+
(select min(a) as min_a from (select a from t where a is not null order by a asc limit 1) t) t2
92+
```
93+
94+
Similarly, if column `a` has an index to preserve its order, the optimized execution only scans two rows of data instead of the whole table. However, if column `a` does not have an index to preserve its order, this rule results in two full table scans, but the execution only needs one full table scan if it is not rewritten. Therefore, in such cases, this rule is not applied.
95+
96+
The final execution plan is as follows:
97+
98+
```sql
99+
mysql> explain select max(a)-min(a) from t;
100+
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
101+
| id | estRows | task | access object | operator info |
102+
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
103+
| Projection_17 | 1.00 | root | | minus(Column#4, Column#5)->Column#6 |
104+
| └─HashJoin_18 | 1.00 | root | | CARTESIAN inner join |
105+
| ├─StreamAgg_45(Build) | 1.00 | root | | funcs:min(test.t.a)->Column#5 |
106+
| │ └─Limit_49 | 1.00 | root | | offset:0, count:1 |
107+
| │ └─IndexReader_59 | 1.00 | root | | index:Limit_58 |
108+
| │ └─Limit_58 | 1.00 | cop[tikv] | | offset:0, count:1 |
109+
| │ └─IndexFullScan_57 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, stats:pseudo |
110+
| └─StreamAgg_24(Probe) | 1.00 | root | | funcs:max(test.t.a)->Column#4 |
111+
| └─Limit_28 | 1.00 | root | | offset:0, count:1 |
112+
| └─IndexReader_38 | 1.00 | root | | index:Limit_37 |
113+
| └─Limit_37 | 1.00 | cop[tikv] | | offset:0, count:1 |
114+
| └─IndexFullScan_36 | 1.00 | cop[tikv] | table:t, index:idx_a(a) | keep order:true, desc, stats:pseudo |
115+
+------------------------------------+---------+-----------+-------------------------+-------------------------------------+
116+
12 rows in set (0.01 sec)
117+
```

sql-optimization-concepts.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Based on rules, logical optimization applies some optimization rules to the inpu
1515
- Column pruning
1616
- Eliminate projection
1717
- Decorrelate correlated subqueries
18-
- Eliminate Max/Min
18+
- [Eliminate Max/Min](/max-min-eliminate.md)
1919
- Push down predicates
2020
- Partition pruning
2121
- Push down TopN and Limit

0 commit comments

Comments
 (0)