Skip to content

Commit 4b93cc3

Browse files
zz-jasonyikekeSunRunAwayyikeke
authored
add distinct optimization (pingcap#3077)
* add distinct optimization * Apply suggestions from code review Co-authored-by: Keke Yi <[email protected]> Co-authored-by: Feng Liyuan <[email protected]> * Update agg-distinct-optimization.md Co-authored-by: Feng Liyuan <[email protected]> * category meta is not needed any longer * fix a link Co-authored-by: Keke Yi <[email protected]> Co-authored-by: Feng Liyuan <[email protected]> Co-authored-by: yikeke <[email protected]>
1 parent d7a9cf4 commit 4b93cc3

File tree

2 files changed

+59
-0
lines changed

2 files changed

+59
-0
lines changed

TOC.md

+1
Original file line numberDiff line numberDiff line change
@@ -112,6 +112,7 @@
112112
+ [Join Reorder](/join-reorder.md)
113113
+ Physical Optimization
114114
+ [Statistics](/statistics.md)
115+
+ [Distinct Optimization](/agg-distinct-optimization.md)
115116
+ [Prepare Execution Plan Cache](/sql-prepare-plan-cache.md)
116117
+ Control Execution Plan
117118
+ [Optimizer Hints](/optimizer-hints.md)

agg-distinct-optimization.md

+58
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
---
2+
title: Distinct Optimization
3+
summary: Introduce the `distinct` optimization in the TiDB query optimizer.
4+
---
5+
6+
# Distinct Optimization
7+
8+
This document introduces the `distinct` optimization in the TiDB query optimizer, including `SELECT DISTINCT` and `DISTINCT` in the aggregate functions.
9+
10+
## `DISTINCT` modifier in `SELECT` statements
11+
12+
The `DISTINCT` modifier specifies removal of duplicate rows from the result set. `SELECT DISTINCT` is transformed to `GROUP BY`, for example:
13+
14+
```sql
15+
mysql> explain SELECT DISTINCT a from t;
16+
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
17+
| id | estRows | task | access object | operator info |
18+
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
19+
| HashAgg_6 | 2.40 | root | | group by:test.t.a, funcs:firstrow(test.t.a)->test.t.a |
20+
| └─TableReader_11 | 3.00 | root | | data:TableFullScan_10 |
21+
| └─TableFullScan_10 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
22+
+--------------------------+---------+-----------+---------------+-------------------------------------------------------+
23+
3 rows in set (0.00 sec)
24+
```
25+
26+
## `DISTINCT` option in aggregate functions
27+
28+
Usually, aggregate functions with the `DISTINCT` option is executed in the TiDB layer in a single-threaded execution model.
29+
30+
The [`tidb_opt_distinct_agg_push_down`](/system-variables.md#tidb_opt_distinct_agg_push_down) system variable or the [`distinct-agg-push-down`](/tidb-configuration-file.md#distinct-agg-push-down) configuration item in TiDB controls whether to rewrite the distinct aggregate queries and push them to the TiKV/TiFlash Coprocessor.
31+
32+
Take the following queries as an example of this optimization. `tidb_opt_distinct_agg_push_down` is disabled by default, which means the aggregate functions are executed in the TiDB layer. After enabling this optimization by setting its value to `1`, the `distinct a` part of `count(distinct a)` is pushed to TiKV/TiFlash Coprocessor: there is a HashAgg_5 to remove the duplicated values on column a in the TiKV Coprocessor. It might reduce the computation overhead of `HashAgg_8` in the TiDB layer.
33+
34+
```sql
35+
mysql> desc select count(distinct a) from test.t;
36+
+-------------------------+----------+-----------+---------------+------------------------------------------+
37+
| id | estRows | task | access object | operator info |
38+
+-------------------------+----------+-----------+---------------+------------------------------------------+
39+
| StreamAgg_6 | 1.00 | root | | funcs:count(distinct test.t.a)->Column#4 |
40+
| └─TableReader_10 | 10000.00 | root | | data:TableFullScan_9 |
41+
| └─TableFullScan_9 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
42+
+-------------------------+----------+-----------+---------------+------------------------------------------+
43+
3 rows in set (0.01 sec)
44+
45+
mysql> set session tidb_opt_distinct_agg_push_down = 1;
46+
Query OK, 0 rows affected (0.00 sec)
47+
48+
mysql> desc select count(distinct a) from test.t;
49+
+---------------------------+----------+-----------+---------------+------------------------------------------+
50+
| id | estRows | task | access object | operator info |
51+
+---------------------------+----------+-----------+---------------+------------------------------------------+
52+
| HashAgg_8 | 1.00 | root | | funcs:count(distinct test.t.a)->Column#3 |
53+
| └─TableReader_9 | 1.00 | root | | data:HashAgg_5 |
54+
| └─HashAgg_5 | 1.00 | cop[tikv] | | group by:test.t.a, |
55+
| └─TableFullScan_7 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
56+
+---------------------------+----------+-----------+---------------+------------------------------------------+
57+
4 rows in set (0.00 sec)
58+
```

0 commit comments

Comments
 (0)