diff --git a/TOC.md b/TOC.md index bf75a8f53fdf9..e313a1b3c33ac 100644 --- a/TOC.md +++ b/TOC.md @@ -882,6 +882,7 @@ - [`SET ROLE`](/sql-statements/sql-statement-set-role.md) - [`SET TRANSACTION`](/sql-statements/sql-statement-set-transaction.md) - [`SET `](/sql-statements/sql-statement-set-variable.md) + - [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) - [`SHOW ANALYZE STATUS`](/sql-statements/sql-statement-show-analyze-status.md) - [`SHOW [BACKUPS|RESTORES]`](/sql-statements/sql-statement-show-backups.md) - [`SHOW BINDINGS`](/sql-statements/sql-statement-show-bindings.md) @@ -1002,6 +1003,7 @@ - [Temporary Tables](/temporary-tables.md) - [Cached Tables](/cached-tables.md) - [FOREIGN KEY Constraints](/foreign-key.md) + - [Table-Level Data Affinity](/table-affinity.md) - Character Set and Collation - [Overview](/character-set-and-collation.md) - [GBK](/character-set-gbk.md) diff --git a/information-schema/information-schema-partitions.md b/information-schema/information-schema-partitions.md index 044474113732e..61510949922ab 100644 --- a/information-schema/information-schema-partitions.md +++ b/information-schema/information-schema-partitions.md @@ -45,8 +45,9 @@ The output is as follows: | TABLESPACE_NAME | varchar(64) | YES | | NULL | | | TIDB_PARTITION_ID | bigint(21) | YES | | NULL | | | TIDB_PLACEMENT_POLICY_NAME | varchar(64) | YES | | NULL | | +| TIDB_AFFINITY | varchar(128) | YES | | NULL | | +-------------------------------+--------------+------+------+---------+-------+ -27 rows in set (0.00 sec) +28 rows in set (0.00 sec) ``` ```sql @@ -85,6 +86,7 @@ SUBPARTITION_ORDINAL_POSITION: NULL TABLESPACE_NAME: NULL TIDB_PARTITION_ID: 89 TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_AFFINITY: NULL *************************** 2. 
row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test @@ -113,6 +115,7 @@ SUBPARTITION_ORDINAL_POSITION: NULL TABLESPACE_NAME: NULL TIDB_PARTITION_ID: 90 TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_AFFINITY: NULL 2 rows in set (0.00 sec) ``` diff --git a/information-schema/information-schema-tables.md b/information-schema/information-schema-tables.md index 8ae5df9a46bf1..77eb125c82bde 100644 --- a/information-schema/information-schema-tables.md +++ b/information-schema/information-schema-tables.md @@ -41,8 +41,12 @@ DESC tables; | TABLE_COMMENT | varchar(2048) | YES | | NULL | | | TIDB_TABLE_ID | bigint(21) | YES | | NULL | | | TIDB_ROW_ID_SHARDING_INFO | varchar(255) | YES | | NULL | | +| TIDB_PK_TYPE | varchar(64) | YES | | NULL | | +| TIDB_PLACEMENT_POLICY_NAME | varchar(64) | YES | | NULL | | +| TIDB_TABLE_MODE | varchar(16) | YES | | NULL | | +| TIDB_AFFINITY | varchar(128) | YES | | NULL | | +---------------------------+---------------+------+------+----------+-------+ -23 rows in set (0.00 sec) +27 rows in set (0.00 sec) ``` {{< copyable "sql" >}} @@ -72,10 +76,14 @@ SELECT * FROM tables WHERE table_schema='mysql' AND table_name='user'\G CHECK_TIME: NULL TABLE_COLLATION: utf8mb4_bin CHECKSUM: NULL - CREATE_OPTIONS: - TABLE_COMMENT: + CREATE_OPTIONS: + TABLE_COMMENT: TIDB_TABLE_ID: 5 TIDB_ROW_ID_SHARDING_INFO: NULL + TIDB_PK_TYPE: CLUSTERED +TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_TABLE_MODE: Normal + TIDB_AFFINITY: NULL 1 row in set (0.00 sec) ``` @@ -115,7 +123,7 @@ The description of columns in the `TABLES` table is as follows: * `CREATE_OPTIONS`: Creates options. * `TABLE_COMMENT`: The comments and notes of the table. -Most of the information in the table is the same as MySQL. Only two columns are newly defined by TiDB: +Most of the information in the table is the same as MySQL. The following columns are newly defined by TiDB: * `TIDB_TABLE_ID`: to indicate the internal ID of a table. This ID is unique in a TiDB cluster. 
* `TIDB_ROW_ID_SHARDING_INFO`: to indicate the sharding type of a table. The possible values are as follows: @@ -123,4 +131,8 @@ Most of the information in the table is the same as MySQL. Only two columns are - `"NOT_SHARDED(PK_IS_HANDLE)"`: the table that defines an integer Primary Key as its row id is not sharded. - `"PK_AUTO_RANDOM_BITS={bit_number}"`: the table that defines an integer Primary Key as its row id is sharded because the Primary Key is assigned with `AUTO_RANDOM` attribute. - `"SHARD_BITS={bit_number}"`: the table is sharded using `SHARD_ROW_ID_BITS={bit_number}`. - - NULL: the table is a system table or view, and thus cannot be sharded. + - `NULL`: the table is a system table or view, and thus cannot be sharded. +* `TIDB_PK_TYPE`: the primary key type of the table. Possible values include `CLUSTERED` (clustered primary key) and `NONCLUSTERED` (non-clustered primary key). +* `TIDB_PLACEMENT_POLICY_NAME`: the name of the placement policy applied to the table. +* `TIDB_TABLE_MODE`: the mode of the table, for example, `Normal`, `Import`, or `Restore`. +* `TIDB_AFFINITY`: the affinity level of the table. The value is `table` when affinity is enabled for a non-partitioned table, `partition` when affinity is enabled for each partition of a partitioned table, and `NULL` when affinity is not enabled. diff --git a/pd-configuration-file.md b/pd-configuration-file.md index efef36dff0a4e..c128d49a9def1 100644 --- a/pd-configuration-file.md +++ b/pd-configuration-file.md @@ -293,6 +293,13 @@ Configuration items related to scheduling + Specifies the upper limit of the `Region Merge` key. When the Region key is greater than the specified value, the PD does not merge the Region with its adjacent Regions. + Default value: `540000`. Before v8.4.0, the default value is `200000`. Starting from v8.4.0, the default value is `540000`. +### `max-affinity-merge-region-size` New in v8.5.5 and v9.0.0 + ++ Controls the threshold for automatically merging small adjacent Regions that belong to the same [affinity](/table-affinity.md) group.
When a Region belongs to an affinity group and its size is smaller than this threshold, PD attempts to merge this Region with other small adjacent Regions in the same affinity group to reduce the number of Regions and maintain the affinity effect. ++ Setting it to `0` disables the automatic merging of small adjacent Regions within an affinity group. ++ Default value: `256` ++ Unit: MiB + ### `patrol-region-interval` + Controls the running frequency at which the checker inspects the health state of a Region. The smaller this value is, the faster the checker runs. Normally, you do not need to adjust this configuration. @@ -373,6 +380,11 @@ Configuration items related to scheduling + The number of the `Region Merge` scheduling tasks performed at the same time. Set this parameter to `0` to disable `Region Merge`. + Default value: `8` +### `affinity-schedule-limit` New in v8.5.5 and v9.0.0 + ++ Controls the number of [affinity](/table-affinity.md) scheduling tasks that can be performed concurrently. Setting it to `0` disables affinity scheduling. ++ Default value: `0` + ### `high-space-ratio` + The threshold ratio below which the capacity of the store is sufficient. If the space occupancy ratio of the store is smaller than this threshold value, PD ignores the remaining space of the store when performing scheduling, and balances load mainly based on the Region size. This configuration takes effect only when `region-score-formula-version` is set to `v1`. 
diff --git a/sql-statements/sql-statement-alter-table.md b/sql-statements/sql-statement-alter-table.md index eb70bd699d43c..4c88f0bc749f2 100644 --- a/sql-statements/sql-statement-alter-table.md +++ b/sql-statements/sql-statement-alter-table.md @@ -55,6 +55,7 @@ AlterTableSpec ::= | TTLEnable EqOpt ( 'ON' | 'OFF' ) | TTLJobInterval EqOpt stringLit ) +| 'AFFINITY' EqOpt stringLit | PlacementPolicyOption PlacementPolicyOption ::= @@ -182,6 +183,8 @@ The following major restrictions apply to `ALTER TABLE` in TiDB: - Changes of some data types (for example, some TIME, Bit, Set, Enum, and JSON types) are not supported due to the compatibility issues of the `CAST` function's behavior between TiDB and MySQL. +- The `AFFINITY` option is a TiDB extension to MySQL syntax. After `AFFINITY` is enabled for a table, you cannot modify the partition scheme of that table, such as adding, dropping, reorganizing, or swapping partitions. To modify the partition scheme, you must first remove `AFFINITY`. + - Spatial data types are not supported. - `ALTER TABLE t CACHE | NOCACHE` is a TiDB extension to MySQL syntax. For details, see [Cached Tables](/cached-tables.md). diff --git a/sql-statements/sql-statement-create-table.md b/sql-statements/sql-statement-create-table.md index 97d0d7b890696..3050ee43ee1f5 100644 --- a/sql-statements/sql-statement-create-table.md +++ b/sql-statements/sql-statement-create-table.md @@ -118,6 +118,7 @@ TableOption ::= | 'UNION' EqOpt '(' TableNameListOpt ')' | 'ENCRYPTION' EqOpt EncryptionOpt | 'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))? (TTLJobInterval EqOpt stringLit)? +| 'AFFINITY' EqOpt StringName | PlacementPolicyOption OnCommitOpt ::= @@ -170,13 +171,16 @@ The following *table_options* are supported. Other options such as `AVG_ROW_LENG |`AUTO_ID_CACHE`| To set the auto ID cache size in a TiDB instance.
By default, TiDB automatically changes this size according to allocation speed of auto ID |`AUTO_ID_CACHE` = 200 | |`AUTO_RANDOM_BASE`| To set the initial incremental part value of auto_random. This option can be considered as a part of the internal interface. Users can ignore this parameter |`AUTO_RANDOM_BASE` = 0| | `CHARACTER SET` | To specify the [character set](/character-set-and-collation.md) for the table | `CHARACTER SET` = 'utf8mb4' | +| `COLLATE` | To specify the character set collation for the table | `COLLATE` = 'utf8mb4_bin' | | `COMMENT` | The comment information | `COMMENT` = 'comment info' | +| `AFFINITY` | To enable affinity scheduling for a table or partition. It can be set to `'table'` for non-partitioned tables and `'partition'` for partitioned tables. Setting it to `'none'` or leaving it empty disables affinity scheduling. | `AFFINITY` = 'table' | > **Note:** > -> The `split-table` configuration option is enabled by default. When it is enabled, a separate Region is created for each newly created table. For details, see [TiDB configuration file](/tidb-configuration-file.md). +> - The `split-table` configuration option is enabled by default. When it is enabled, a separate Region is created for each newly created table. For details, see [TiDB configuration file](/tidb-configuration-file.md). +> - Before using `AFFINITY`, note that modifying the partitioning scheme (such as adding, dropping, reorganizing, or swapping partitions) of a table with affinity enabled is not supported, and configuring `AFFINITY` on temporary tables or views is not supported. @@ -184,7 +188,8 @@ The following *table_options* are supported. Other options such as `AVG_ROW_LENG > **Note:** > -> TiDB creates a separate Region for each newly created table. +> - TiDB creates a separate Region for each newly created table. 
+> - Before using `AFFINITY`, note that modifying the partitioning scheme (such as adding, dropping, reorganizing, or swapping partitions) of a table with affinity enabled is not supported, and configuring `AFFINITY` on temporary tables or views is not supported. diff --git a/sql-statements/sql-statement-show-affinity.md b/sql-statements/sql-statement-show-affinity.md new file mode 100644 index 0000000000000..7334e5eefa2a5 --- /dev/null +++ b/sql-statements/sql-statement-show-affinity.md @@ -0,0 +1,61 @@ +--- +title: SHOW AFFINITY +summary: An overview of the usage of SHOW AFFINITY for the TiDB database. +--- + +# SHOW AFFINITY New in v8.5.5 and v9.0.0 + +The `SHOW AFFINITY` statement shows [affinity](/table-affinity.md) scheduling information for tables configured with the `AFFINITY` option, as well as the target replica distribution currently recorded by PD. + +## Synopsis + +```ebnf+diagram +ShowAffinityStmt ::= + "SHOW" "AFFINITY" ShowLikeOrWhereOpt +``` + +`SHOW AFFINITY` supports filtering table names using `LIKE` or `WHERE` clauses. 
+ +## Examples + +The following examples create two tables with affinity scheduling enabled and show how to view their scheduling information: + +```sql +CREATE TABLE t1 (a INT) AFFINITY = 'table'; +CREATE TABLE tp1 (a INT) AFFINITY = 'partition' PARTITION BY HASH(a) PARTITIONS 2; + +SHOW AFFINITY; +``` + +The example output is as follows: + +```sql ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +| Db_name | Table_name | Partition_name | Leader_store_id | Voter_store_ids | Status | Region_count | Affinity_region_count| ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +| test | t1 | NULL | 1 | 1,2,3 | Stable | 8 | 8 | +| test | tp1 | p0 | 4 | 4,5,6 | Preparing| 4 | 2 | +| test | tp1 | p1 | 4 | 4,5,6 | Preparing| 3 | 2 | ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +``` + +The meaning of each column is as follows: + +- `Leader_store_id`, `Voter_store_ids`: the IDs of TiKV stores recorded by PD, indicating which stores host the target Leader and Voter replicas for the table or partitions. If the target replica locations for the affinity group are not determined, or if [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855-and-v900) is set to `0`, the value is displayed as `NULL`. +- `Status`: indicates the current status of affinity scheduling. Possible values are: + - `Pending`: PD has not started affinity scheduling for the table or partition, such as when Leaders or Voters are not yet determined. + - `Preparing`: PD is scheduling Regions to meet affinity requirements. + - `Stable`: all Regions have reached the target distribution. +- `Region_count`: the current number of Regions in the affinity group. 
+- `Affinity_region_count`: the number of Regions that currently meet the affinity replica distribution requirements. + - When `Affinity_region_count` is less than `Region_count`, it indicates that some Regions have not yet completed replica scheduling based on affinity. + - When `Affinity_region_count` equals `Region_count`, it indicates that replica scheduling based on affinity is complete, meaning the distribution of all related Regions meets the affinity requirements. However, this does not indicate that related Region merge operations are complete. + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +- [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) +- [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) \ No newline at end of file diff --git a/table-affinity.md b/table-affinity.md new file mode 100644 index 0000000000000..f0a93e3daf973 --- /dev/null +++ b/table-affinity.md @@ -0,0 +1,109 @@ +--- +title: Table-Level Data Affinity +summary: Learn how to configure affinity constraints for tables or partitions to control Region replica distribution and how to view the scheduling status. +--- + +# Table-Level Data Affinity New in v8.5.5 and v9.0.0 + +> **Warning:** +> +> This feature is experimental. It is not recommended that you use it in the production environment. It might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +Table-level data affinity is a PD mechanism for scheduling data distribution at the table level. This mechanism controls how Leader and Voter replicas for Regions of the same table or partition are distributed across a TiKV cluster. + +When you enable PD affinity scheduling and set the `AFFINITY` option of a table to `table` or `partition`, PD groups Regions belonging to the same table or partition into the same affinity group. 
During scheduling, PD prioritizes placing the Leader and Voter replicas of these Regions on the same subset of a few TiKV nodes. This reduces network latency caused by cross-node access during queries, thereby improving query performance. + +## Limitations + +Before using table-level data affinity, note the following limitations: + +- This feature does not take effect in [PD Microservices Mode](/pd-microservices.md). +- This feature does not work with [temporary tables](/temporary-tables.md) or [views](/views.md). +- After data affinity is configured for a [partitioned table](/partitioned-table.md), **modifying the table partitioning scheme is not supported**, including adding, dropping, reorganizing, or swapping partitions. To change the partitioning scheme, you must first remove the affinity configuration for that table. +- **Evaluate disk capacity in advance for large data volumes**: after affinity is enabled, PD prioritizes scheduling Regions of a table or partition to the same subset of a few TiKV nodes. For tables or partitions with large data volumes, this might significantly increase disk usage on these nodes. It is recommended that you evaluate disk capacity in advance and monitor disk usage on these nodes. +- Data affinity affects only the distribution of Leader and Voter replicas. If a table has Learner replicas (such as TiFlash), their distribution is not affected by affinity settings. + +## Prerequisites + +PD affinity scheduling is disabled by default. Before setting affinity for tables or partitions, you must enable and configure this feature. + +1. Set the PD configuration item [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855-and-v900) to a value greater than `0` to enable affinity scheduling. + + For example, the following command sets the value to `4`, allowing PD to run up to four affinity scheduling tasks concurrently: + + ```bash + pd-ctl config set schedule.affinity-schedule-limit 4 + ``` + +2.
(Optional) Modify the PD configuration item [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855-and-v900) as needed. The default value is `256` MiB. It controls the size threshold for automatically merging adjacent small Regions within the same affinity group. Setting it to `0` disables the automatic merging of adjacent small Regions within affinity groups. + +## Usage + +This section describes how to configure affinity for tables or partitions and how to view affinity scheduling status. + +### Configure table or partition affinity + +You can configure table or partition affinity using the `AFFINITY` option in `CREATE TABLE` or `ALTER TABLE` statements. + +| Affinity level | Scope | Effect | +|---|---|---| +| `AFFINITY='table'` | Non-partitioned table | Enables affinity for the table. PD creates a single affinity group for all Regions of the table. | +| `AFFINITY='partition'` | Partitioned table | Enables affinity for each partition in the table. PD creates a separate affinity group for the Regions of each partition. For example, for a table with four partitions, PD creates four independent affinity groups. | +| `AFFINITY=''` or `AFFINITY='none'` | Tables configured with `AFFINITY='table'` or `AFFINITY='partition'` | Disables affinity for the table or partitions. When you disable affinity, PD deletes the corresponding affinity group for the target table or partition, so Regions of that table or partition are no longer subject to affinity scheduling constraints. Automatic Region splitting in TiKV reverts to the default behavior within a maximum of 10 minutes. 
| + +**Examples** + +Enable affinity when creating a non-partitioned table: + +```sql +CREATE TABLE t1 (a INT) AFFINITY = 'table'; +``` + +Enable affinity for each partition when creating a partitioned table: + +```sql +CREATE TABLE tp1 (a INT) + AFFINITY = 'partition' + PARTITION BY HASH(a) PARTITIONS 4; +``` + +Enable affinity for an existing non-partitioned table: + +```sql +CREATE TABLE t2 (a INT); +ALTER TABLE t2 AFFINITY = 'table'; +``` + +Disable table affinity: + +```sql +ALTER TABLE t1 AFFINITY = ''; +``` + +### View affinity information + +You can view table or partition affinity information in the following ways: + +- Execute the [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) statement. In the `Status` column, you can view tables or partitions with affinity enabled and their scheduling status. The meanings of the values in the `Status` column are as follows: + + - `Pending`: PD has not started affinity scheduling for the table or partition, such as when Leaders or Voters are not yet determined. + - `Preparing`: PD is scheduling Regions to meet affinity requirements. + - `Stable`: all Regions have reached the target distribution. + +- Query the [`INFORMATION_SCHEMA.TABLES`](/information-schema/information-schema-tables.md) table and check the `TIDB_AFFINITY` column for the affinity level of a table. +- Query the [`INFORMATION_SCHEMA.PARTITIONS`](/information-schema/information-schema-partitions.md) table and check the `TIDB_AFFINITY` column for the affinity level of a partition. + +## Notes + +- **Automatic splitting of Regions**: when a Region belongs to an affinity group and affinity is in effect, automatic splitting of that Region is disabled by default to avoid the creation of too many Regions that could weaken the affinity effect. 
Automatic splitting is triggered only when the Region size exceeds four times the value of [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855-and-v900). Note that splits triggered by components other than TiKV or PD (such as manual splits triggered by [`SPLIT TABLE`](/sql-statements/sql-statement-split-region.md)) are not subject to this restriction. + +- **Degradation and expiration mechanism**: if the TiKV nodes hosting the target Leaders or Voters in an affinity group become unavailable (for example, due to node failure or insufficient disk space), if a Leader is evicted, or if there is a conflict with existing placement rules, PD marks the affinity group as degraded. During degradation, affinity scheduling for the corresponding table or partition is paused. + + - If the affected nodes recover within 10 minutes, PD resumes scheduling based on the original affinity settings. + - If the affected nodes do not recover within 10 minutes, the affinity group is marked as expired. At this point, PD restores normal scheduling behavior (the status in [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) returns to `Pending`) and automatically updates the Leaders and Voters in the affinity group to re-enable affinity scheduling. + +## Related statements and configurations + +- `AFFINITY` option in [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) and [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) +- [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) +- PD configuration items: [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855-and-v900) and [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855-and-v900) \ No newline at end of file