Skip to content

Conversation

@qiancai
Copy link
Owner

@qiancai qiancai commented Nov 1, 2024

…ingcap#19275)

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Summary by CodeRabbit

  • New Features

    • Added system variables tidb_stats_update_during_ddl (controls ANALYZE during DDL) and tidb_opt_selectivity_factor (default optimizer selectivity when stats are missing).
  • Documentation

    • Added guide on ANALYZE embedded in DDL and updated TOC entries.
    • Updated restore-mode clarification: tables block reads/writes during restore then resume.
    • Updated Data Migration MySQL compatibility info (MySQL 8.1–9.x).

@coderabbitai
Copy link

coderabbitai bot commented Nov 1, 2024

Walkthrough

Adds two new system variables (a DDL-embedded ANALYZE toggle and an optimizer selectivity factor), introduces documentation for ANALYZE embedded in DDL, updates TOC entries, clarifies restore-mode behavior in BR docs, expands DM MySQL compatibility rows, and refactors OWNERS/OWNERS_ALIASES to filter-based ownership groups.

Changes

Cohort / File(s) Summary
System Configuration
system-variables.md
Added tidb_stats_update_during_ddl (GLOBAL, default OFF) to control DDL-embedded ANALYZE; added tidb_opt_selectivity_factor (SESSION
New Feature Documentation
ddl_embedded_analyze.md
New document describing ANALYZE embedded in DDL (index creation/reorg), behavior when tidb_stats_update_during_ddl is ON, timing/timeout semantics, verification commands, and examples
Table of Contents
TOC.md, TOC-tidb-cloud.md
Inserted TOC entries linking to the new "ANALYZE Embedded in DDL Statements" doc in Reference → SQL Statements sections
Backup & Restore Documentation
br/br-pitr-guide.md, br/br-snapshot-guide.md
Added notes that target tables enter a restore mode (reads/writes blocked) during restore and automatically revert to normal after completion
Compatibility Updates
dm/dm-compatibility-catalog.md
Added MySQL compatibility rows: 8.1–8.3 (not tested), 8.4 (incompatible; reference DM Issue #11020), 9.x (not tested)
Ownership Configuration
OWNERS, OWNERS_ALIASES
Replaced explicit approver/reviewer lists with a filter-based schema mapping to sig-community-approvers and sig-community-reviewers; added both groups and member lists in OWNERS_ALIASES; updated referenced ownership docs URL

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TiDB_DDL as TiDB DDL Job
    participant StatsWorker as Stats Collector
    participant Storage as Storage/PD

    Note over User,TiDB_DDL: Create or reorganize index (ADD INDEX / MODIFY/CHANGE COLUMN)
    User->>TiDB_DDL: submit DDL job
    alt tidb_stats_update_during_ddl = ON
        TiDB_DDL->>StatsWorker: trigger ANALYZE for index (before visibility)
        StatsWorker->>Storage: read table data, build stats
        StatsWorker-->>TiDB_DDL: report stats (may be async if timeout)
        TiDB_DDL->>TiDB_DDL: make index visible (after stats or timeout)
    else tidb_stats_update_during_ddl = OFF
        TiDB_DDL->>TiDB_DDL: create index without embedded ANALYZE
        TiDB_DDL->>User: DDL completes (stats not updated)
    end
    Note over TiDB_DDL,User: Use SHOW STATS_HISTOGRAMS / ADMIN SHOW DDL JOBS to verify
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review focus: correctness of new system variable declarations and defaults in system-variables.md.
  • Verify ddl_embedded_analyze.md for accuracy of control-flow, examples, and command references.
  • Check OWNERS/OWNERS_ALIASES filter schema and member lists for formatting and intended mapping.

"🐇
I hopped through docs and tidied the trail,
DDLs now whisper stats without fail.
New knobs and owners in tidy array,
Restore modes pause, then hop away.
Celebrate with carrots and a dev-bunny ballet!"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The pull request description contains only the template structure with all sections empty or unchecked, including the required 'What is changed, added or deleted?' and 'Which TiDB version(s)' sections. Complete all required sections: describe what was changed/added/deleted, specify which TiDB version(s) apply, and provide any related PR or reference links.
Title check ❓ Inconclusive The pull request title 'test' is vague and generic, providing no meaningful information about the actual changes made. Replace the title with a descriptive summary of the main changes, such as 'Add DDL-embedded ANALYZE documentation and system variables' or 'Document ANALYZE embedded in DDL statements and related changes'.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch test-or

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
glossary.md (4)

47-49: Consider adding a documentation link for CF.

The definition is accurate but could be enhanced with a link to TiKV's storage documentation for more details about Column Families.


59-84: Consider adding documentation links for CTE, DDL, and DML entries.

While the definitions are accurate, consider adding links to:

  • CTE: Link to WITH clause documentation already referenced
  • DDL: Link to DDL statements documentation
  • DML: Link to DML statements documentation

This would maintain consistency with other entries that include documentation links.


Line range hint 165-167: Consider adding TiDB-specific documentation link for MVCC.

While the Wikipedia link provides general context, consider adding a link to TiDB's transaction documentation for MVCC implementation details.


Line range hint 246-267: Consider adding documentation links for Region-related entries.

While the definitions are clear, consider adding links to:

  • Region/peer/Raft group: Link to TiKV architecture documentation
  • Region split: Link to Region management documentation

This would help users find more detailed information about these fundamental concepts.

alert-rules.md (1)

569-570: Consider enhancing the alert description.

While the description clearly explains the alert's purpose, consider adding information about potential system impact when Raftstore threads are under heavy pressure, such as increased latency or reduced throughput.

 This rule monitors CPU usage by Raftstore. If the value is high, it indicates pressure on Raftstore threads is heavy.
+This can lead to increased latency in Raft operations and reduced overall system throughput.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0eb1cd5 and 427e910.

📒 Files selected for processing (2)
  • alert-rules.md (1 hunks)
  • glossary.md (3 hunks)
🔇 Additional comments (5)
glossary.md (4)

33-35: LGTM!

The BR entry is clear, concise, and includes appropriate documentation reference.


103-120: LGTM!

The entries are well-defined with appropriate external and internal documentation links where relevant.


123-128: LGTM!

The HTAP entry provides a clear definition, TiDB-specific context, and appropriate external reference.


Line range hint 238-244: LGTM!

Both entries provide clear definitions with appropriate configuration references and status indicators.

alert-rules.md (1)

565-571: LGTM! Well-structured alert rule for Raftstore CPU monitoring.

The alert rule is properly configured with an appropriate threshold (80% of raftstore.store-pool-size) and includes clear metric selection.

glossary.md Outdated

Disaster Recovery (DR) includes solutions that can be used to recover data from a disaster in the future. These solutions typically involve backups and standby clusters.

For more information, see [Overview of TiDB Disaster Recovery Solutions](dr-solution-introduction).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix broken documentation link in DR entry.

The link to dr-solution-introduction is missing the .md extension.

-For more information, see [Overview of TiDB Disaster Recovery Solutions](dr-solution-introduction).
+For more information, see [Overview of TiDB Disaster Recovery Solutions](/dr-solution-introduction.md).

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 427e910 and 8d98914.

📒 Files selected for processing (1)
  • glossary.md (3 hunks)
🧰 Additional context used
🪛 LanguageTool
glossary.md

[grammar] ~131-~131: The plural noun “files” cannot be used with the article “a”. Did you mean “a file” or “files”?
Context: ...mputing-technologies). Test Thies are a files. ## I ### Index Merge Index Merge is...

(A_NNS)

🔇 Additional comments (1)
glossary.md (1)

Line range hint 33-96: LGTM on new glossary entries!

The new entries (BR, CF, CTE, DDL, DM, DML, DMR, DR, DXF) are well-structured and provide clear, accurate definitions with appropriate documentation links.

Note: The broken documentation link in the DR entry still needs to be fixed as mentioned in the previous review.

🧰 Tools
🪛 LanguageTool

[grammar] ~131-~131: The plural noun “files” cannot be used with the article “a”. Did you mean “a file” or “files”?
Context: ...mputing-technologies). Test Thies are a files. ## I ### Index Merge Index Merge is...

(A_NNS)

glossary.md Outdated
Comment on lines 129 to 131
Test

Thies are a files.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove test content

These lines appear to be test content that was accidentally committed. They break both the document structure and alphabetical ordering of the glossary entries.

-Test
-
-Thies are a files.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Test
Thies are a files.
🧰 Tools
🪛 LanguageTool

[grammar] ~131-~131: The plural noun “files” cannot be used with the article “a”. Did you mean “a file” or “files”?
Context: ...mputing-technologies). Test Thies are a files. ## I ### Index Merge Index Merge is...

(A_NNS)

glossary.md Outdated

### HTTP

Hypertext Transfer Protocol (HTTP) is a application-layer protocol using for transmitting hypermedia dacuments, such as HTML. It is the foundation of data communication on the web, enabling browsers and servers to request and transfer data in a standardized way. HTTP operates as a request-response protocol: a client sends a request to a server, which then returns a response. Common HTTP methods include GET, POST, PUT, and DELETE, which are used to retrieve, create, update, or delete resources, respectively.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai review the above line

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiancai, I noticed a few typos and grammatical errors in the above line:

  • "a application-layer protocol" should be "an application-layer protocol".
  • "using for transmitting" should be "used for transmitting".
  • "dacuments" should be "documents".

Please consider correcting these issues to improve clarity.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai review the above line and provide committable suggestion

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiancai, here is the corrected paragraph:

### HTTP

Hypertext Transfer Protocol (HTTP) is an application-layer protocol used for transmitting hypermedia documents, such as HTML. It is the foundation of data communication on the web, enabling browsers and servers to request and transfer data in a standardized way. HTTP operates as a request-response protocol: a client sends a request to a server, which then returns a response. Common HTTP methods include GET, POST, PUT, and DELETE, which are used to retrieve, create, update, or delete resources, respectively.

@qiancai
Copy link
Owner Author

qiancai commented Nov 1, 2024

@coderabbitai configuration

@coderabbitai
Copy link

coderabbitai bot commented Nov 1, 2024

CodeRabbit configuration
language: en-US
tone_instructions: ''
early_access: false
enable_free_tier: true
reviews:
  profile: chill
  request_changes_workflow: false
  high_level_summary: true
  high_level_summary_placeholder: '@coderabbitai summary'
  auto_title_placeholder: '@coderabbitai'
  review_status: true
  commit_status: true
  poem: true
  collapse_walkthrough: false
  sequence_diagrams: true
  changed_files_summary: true
  labeling_instructions: []
  path_filters: []
  path_instructions: []
  abort_on_close: true
  auto_review:
    enabled: true
    auto_incremental_review: true
    ignore_title_keywords: []
    labels: []
    drafts: false
    base_branches: []
  tools:
    shellcheck:
      enabled: true
    ruff:
      enabled: true
    markdownlint:
      enabled: true
    github-checks:
      enabled: true
      timeout_ms: 90000
    languagetool:
      enabled: true
      enabled_only: false
      level: default
    biome:
      enabled: true
    hadolint:
      enabled: true
    swiftlint:
      enabled: true
    phpstan:
      enabled: true
      level: default
    golangci-lint:
      enabled: true
    yamllint:
      enabled: true
    gitleaks:
      enabled: true
    checkov:
      enabled: true
    detekt:
      enabled: true
    eslint:
      enabled: true
    rubocop:
      enabled: true
    buf:
      enabled: true
    regal:
      enabled: true
    actionlint:
      enabled: true
    pmd:
      enabled: true
    cppcheck:
      enabled: true
    semgrep:
      enabled: true
    circleci:
      enabled: true
chat:
  auto_reply: true
knowledge_base:
  opt_out: false
  learnings:
    scope: auto
  issues:
    scope: auto
  jira:
    project_keys: []
  linear:
    team_keys: []
  pull_requests:
    scope: auto

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
.coderabbit.yaml (5)

11-11: Consider versioning documentation links

The link to /auto-increment.md should include version information to ensure it works correctly across different TiDB versions. Consider using a versioned path or relative link.


92-101: Add explanatory comments for SHOW REGIONS output

The SHOW TABLE t REGIONS command output contains important information about data distribution, but it may be difficult for users to understand. Consider adding explanatory comments about:

  • What each column represents
  • How to interpret the region distribution
  • The relationship between region distribution and the AUTO_RANDOM shard bits

131-135: Enhance shard bits selection documentation with practical examples

While the explanation of shard bits selection is technically accurate, it would be more helpful to include:

  1. A concrete example showing the impact of different shard bit values on write distribution
  2. Performance implications of various shard bit configurations
  3. Example scenarios for different cluster sizes

172-172: Clarify NONCLUSTERED primary key restriction

The restriction regarding NONCLUSTERED primary keys would benefit from:

  1. An explanation of why AUTO_RANDOM is incompatible with NONCLUSTERED indexes
  2. Alternative approaches for such scenarios
  3. A reference to relevant design documentation explaining this limitation

1-178: Consider restructuring documentation for better readability

While the content is comprehensive, consider reorganizing it to:

  1. Start with a clear "Quick Start" section for common use cases
  2. Move detailed technical implementation after basic usage
  3. Add a troubleshooting section with common issues and solutions
  4. Include a comparison table between AUTO_RANDOM and AUTO_INCREMENT

This would align better with the documentation style mentioned in the PR objectives.

🧰 Tools
🪛 yamllint

[warning] 4-4: too few spaces after comma

(commas)


[error] 23-23: syntax error: found character '`' that cannot start any token

(syntax)

glossary.md (1)

47-49: Consider enhancing the CF definition

While accurate, the entry could be more helpful by briefly explaining the purpose of Column Families in TiKV's storage architecture.

Consider expanding the definition:

-In RocksDB and TiKV, a Column Family (CF) represents a logical grouping of key-value pairs within a database.
+In RocksDB and TiKV, a Column Family (CF) represents a logical grouping of key-value pairs within a database. TiKV uses different CFs to store various types of data (such as default data, write operations, and lock information) for better performance and management.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8d98914 and 04ce8c8.

📒 Files selected for processing (2)
  • .coderabbit.yaml (1 hunks)
  • glossary.md (3 hunks)
🧰 Additional context used
🪛 yamllint
.coderabbit.yaml

[warning] 4-4: too few spaces after comma

(commas)


[error] 23-23: syntax error: found character '`' that cannot start any token

(syntax)

🪛 LanguageTool
glossary.md

[misspelling] ~131-~131: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ... Hypertext Transfer Protocol (HTTP) is a application-layer protocol using for tr...

(EN_A_VS_AN)

🔇 Additional comments (7)
.coderabbit.yaml (1)

1-6: ⚠️ Potential issue

File extension should match content type

This file contains Markdown content but has a .yaml extension. Consider renaming the file to use the .md extension to accurately reflect its content type and ensure proper handling by documentation tools.

🧰 Tools
🪛 yamllint

[warning] 4-4: too few spaces after comma

(commas)

glossary.md (6)

33-35: LGTM: BR entry is well-defined

The entry is clear, concise, and includes proper documentation link.


59-61: LGTM: CTE entry is well-defined

The entry provides a clear definition and proper documentation link.


65-77: LGTM: DDL, DM, and DML entries are well-defined

All entries provide clear definitions and proper formatting. The DM entry appropriately includes a documentation link.


85-89: LGTM with existing link issue

The definition is clear and accurate. Note that there's an existing comment about the broken documentation link that needs to be addressed.


109-127: LGTM: Remaining entries are well-defined

The entries for GA, GC, GTID, and HTAP are clear, accurate, and properly formatted with appropriate documentation links where needed.


129-131: ⚠️ Potential issue

Fix grammatical errors and typos in HTTP entry

There are several issues in the text that need to be corrected:

-Hypertext Transfer Protocol (HTTP) is a application-layer protocol using for transmitting hypermedia dacuments, such as HTML.
+Hypertext Transfer Protocol (HTTP) is an application-layer protocol used for transmitting hypermedia documents, such as HTML.

Likely invalid or redundant comment.

🧰 Tools
🪛 LanguageTool

[misspelling] ~131-~131: Use “an” instead of ‘a’ if the following word starts with a vowel sound, e.g. ‘an article’, ‘an hour’.
Context: ... Hypertext Transfer Protocol (HTTP) is a application-layer protocol using for tr...

(EN_A_VS_AN)

@qiancai
Copy link
Owner Author

qiancai commented Apr 21, 2025

/@gemini-code-assist help me review this PR

@gemini-code-assist
Copy link

It looks like you're asking me to review this pull request using the /gemini review command. However, it seems like you may have made a mistake in the command syntax. To properly invoke a review, please use the command /gemini review in a pull request comment. For a list of available commands, you can use /gemini help.

@qiancai
Copy link
Owner Author

qiancai commented Apr 21, 2025

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several new terms to the glossary, updates alert rules, and adds the AUTO_RANDOM attribute for BIGINT columns. Overall, the changes seem well-organized and contribute to the documentation's comprehensiveness. However, some minor improvements can be made to enhance clarity and consistency.

Merge Readiness

The pull request is almost ready for merging. Addressing the low and medium severity issues would further improve the quality of the documentation. I am unable to approve this pull request, and recommend that others review and approve this code before merging.

Synced from: qiancai/docs-cn#15
Target PR: #31
AI Provider: gemini

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@github-actions
Copy link

Auto-sync completed successfully

Source PR: qiancai/docs-cn#15
Target PR: #31
English documentation has been updated based on Chinese documentation changes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 04ce8c8 and 8b53f4a.

📒 Files selected for processing (1)
  • system-variables.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

⚙️ CodeRabbit configuration file

As a technical writer in the database area, you need to first learn the changes in this PR, and then improve the content to make it clear, concise, and easy to understand in English according to your database knowledge.

Files:

  • system-variables.md
🪛 markdownlint-cli2 (0.18.1)
system-variables.md

1795-1795: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)

Comment on lines 1795 to 1804
### `tidb_opt_selectivity_factor` <span class="version-mark">Introduced in v9.0.0</span>

- Scope: SESSION | GLOBAL
- Is persisted to the cluster: Yes
- Is controlled by Hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
- Type: Floating-point number
- Value range: `[0, 1]`
- Default value: `0.8`
- This variable is used to specify the default selectivity of the TiDB optimizer. In some cases, when the optimizer cannot derive the predicate selectivity based on statistics, the optimizer uses this default selectivity as an alternative value. **It is not recommended** to modify this value.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix misplaced heading, style, and MD022; move to the tidb_opt_ section and standardize labels.*

  • The new variable block is inserted inside the tidb_dml_batch_size section, breaking the structure and triggering MD022 (missing blank line).
  • Labels and formatting deviate from house style; heading should not be code-formatted; use “New in …” and standard field names.
  • Alphabetical order: tidb_opt_selectivity_factor belongs with other tidb_opt_* variables (near tidb_opt_seek_factor).

Apply:

  1. Remove the misplaced block here.
  2. Add the corrected block after tidb_opt_seek_factor (see second diff).
-### `tidb_opt_selectivity_factor` <span class="version-mark">Introduced in v9.0.0</span>
-
-- Scope: SESSION | GLOBAL
-- Is persisted to the cluster: Yes
-- Is controlled by Hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
-- Type: Floating-point number
-- Value range: `[0, 1]`
-- Default value: `0.8`
-- This variable is used to specify the default selectivity of the TiDB optimizer. In some cases, when the optimizer cannot derive the predicate selectivity based on statistics, the optimizer uses this default selectivity as an alternative value. **It is not recommended** to modify this value.

Add (outside this hunk; see next diff) the corrected, relocated block.


Additional change (relocation + style fix):
```diff
@@
 ### tidb_opt_seek_factor
 ...
 
+### tidb_opt_selectivity_factor <span class="version-mark">New in v9.0.0</span>
+
+- Scope: SESSION | GLOBAL
+- Persists to cluster: Yes
+- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
+- Type: Float
+- Range: `[0, 1]`
+- Default value: `0.8`
+- Sets the fallback selectivity the optimizer uses when predicate selectivity cannot be derived from statistics. Changing this can affect cardinality estimates and plan selection. It is not recommended to modify this value; prefer collecting/updating statistics instead.
+
 ### tidb_opt_skew_distinct_agg

Please confirm:

  • The canonical variable name and visibility (SESSION | GLOBAL) match TiDB implementation.
  • The “New in v9.0.0” marker is correct for this doc version.
  • Whether a cross‑reference to tidb_default_string_match_selectivity is desired.

I need to gather information from the codebase to verify this review comment. Let me inspect the file structure and formatting.


🏁 Script executed:

#!/bin/bash
# Locate system-variables.md and check the content around lines 1795-1804
fd -t f "system-variables.md" | head -1 | xargs -I {} sh -c 'echo "=== File: {} ===" && wc -l {} && echo "" && echo "=== Lines 1790-1810 ===" && sed -n "1790,1810p" {}'

Length of output: 1908


🏁 Script executed:

#!/bin/bash
# Find tidb_dml_batch_size and tidb_opt_seek_factor sections to understand structure
fd -t f "system-variables.md" | head -1 | xargs -I {} sh -c 'echo "=== tidb_dml_batch_size section ===" && sed -n "/^### tidb_dml_batch_size/,/^### /p" {} | head -20 && echo "" && echo "=== tidb_opt_seek_factor section ===" && sed -n "/^### tidb_opt_seek_factor/,/^### /p" {} | head -25'

Length of output: 674


🏁 Script executed:

#!/bin/bash
# Check formatting patterns for other tidb_opt_* variables
fd -t f "system-variables.md" | head -1 | xargs -I {} sh -c 'grep -n "^### tidb_opt_" {} | head -15'

Length of output: 795


🏁 Script executed:

#!/bin/bash
# Check the version marker style used in the file
fd -t f "system-variables.md" | head -1 | xargs -I {} sh -c 'echo "=== Version marker patterns ===" && grep -n "version-mark" {} | head -10'

Length of output: 963


🏁 Script executed:

#!/bin/bash
# Verify standard field labels by checking a nearby variable entry
fd -t f "system-variables.md" | head -1 | xargs -I {} sh -c 'echo "=== tidb_opt_seek_factor full entry ===" && sed -n "/^### tidb_opt_seek_factor/,/^### /p" {} | head -15'

Length of output: 527


Fix structural break, formatting, and alphabetical order of tidb_opt_selectivity_factor.

The variable block is incorrectly placed inside the tidb_dml_batch_size section (lines 1803–1811), breaking the document structure and leaving tidb_dml_batch_size without a body. Additionally, the formatting deviates from house style used throughout the file:

  • Version marker should be New in v9.0.0 (not Introduced), consistent with all other entries.
  • Field labels should use Persists to cluster: and Applies to hint: (not Is persisted to the cluster: / Is controlled by Hint:), matching tidb_opt_seek_factor and surrounding entries.
  • Type should be Float (not Floating-point number); Range should be Range: (not Value range:).
  • Variable must be relocated to its alphabetically correct position after tidb_opt_seek_factor.

Actions:

  1. Remove the misplaced block at lines 1803–1811.
  2. Insert the corrected block after tidb_opt_seek_factor with standardized formatting.
-### `tidb_opt_selectivity_factor` <span class="version-mark">Introduced in v9.0.0</span>
-
-- Scope: SESSION | GLOBAL
-- Is persisted to the cluster: Yes
-- Is controlled by Hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
-- Type: Floating-point number
-- Value range: `[0, 1]`
-- Default value: `0.8`
-- This variable is used to specify the default selectivity of the TiDB optimizer. In some cases, when the optimizer cannot derive the predicate selectivity based on statistics, the optimizer uses this default selectivity as an alternative value. **It is not recommended** to modify this value.

Insert after tidb_opt_seek_factor:

+### tidb_opt_selectivity_factor <span class="version-mark">New in v9.0.0</span>
+
+- Scope: SESSION | GLOBAL
+- Persists to cluster: Yes
+- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
+- Type: Float
+- Range: `[0, 1]`
+- Default value: `0.8`
+- Sets the fallback selectivity the optimizer uses when predicate selectivity cannot be derived from statistics. Changing this can affect cardinality estimates and plan selection. It is not recommended to modify this value; prefer collecting or updating statistics instead.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

1795-1795: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
In system-variables.md around lines 1795–1804 (and specifically remove the
misplaced block at lines 1803–1811), the tidb_opt_selectivity_factor entry is
incorrectly nested under tidb_dml_batch_size and uses nonstandard
labels/formatting; delete that misplaced block and insert a new tidy entry
immediately after tidb_opt_seek_factor (to maintain alphabetical order) using
house style: "New in v9.0.0", "Scope: SESSION | GLOBAL", "Persists to cluster:
Yes", "Applies to hint: Yes", "Type: Float", "Range: [0, 1]", "Default value:
0.8", and a one-sentence description that it specifies the default optimizer
selectivity used when statistics are unavailable and is not recommended to
change.

@github-actions
Copy link

Auto-sync completed successfully

Source PR: qiancai/docs-cn#15
Target PR: #31
English documentation has been updated based on Chinese documentation changes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
br/br-pitr-guide.md (1)

120-121: Consider simplifying the wording for better readability.

The explanation is clear and accurate. However, the phrase "the table mode of the target table" could be simplified to "the target table mode" for conciseness. Additionally, breaking the long paragraph into two sentences might improve readability:

"During data restore, the target table is automatically set to restore mode, which prevents all read and write operations. After restore completes, the mode automatically reverts to normal to resume normal operations. This mechanism ensures task stability and data consistency."

br/br-snapshot-guide.md (1)

106-107: Consider simplifying the wording for better readability.

The explanation matches the one in br-pitr-guide.md, which is good for consistency. The same optional improvement applies: "the table mode of the target table" could be simplified to "the target table mode" for conciseness.

ddl_embedded_analyze.md (2)

15-44: Consider breaking up the long explanation paragraph.

The usage scenario is well-explained with a concrete example. However, the paragraph at line 42 is quite long (6 sentences) and covers multiple concepts: the problem, heuristic behavior, potential issues, TiDB improvements, and the solution. Consider breaking it into 2-3 shorter paragraphs for better readability:

  1. First paragraph: Explain the problem and heuristic behavior
  2. Second paragraph: Mention TiDB v8.5.0 improvements
  3. Third paragraph: Introduce the embedded ANALYZE solution and the system variable

46-111: Well-documented ADD INDEX section with comprehensive examples.

The section effectively demonstrates the feature with clear examples and verification steps. The timeout mechanism explanation is important and well-placed. The progression from DDL execution to verification (EXPLAIN → SHOW STATS_HISTOGRAMS → ADMIN SHOW DDL JOBS) provides a complete picture of the feature's behavior.

One minor suggestion: Consider adding a brief explanation of what "Reorg phase" means for readers unfamiliar with DDL internals, or link to relevant documentation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b53f4a and 4d30fcc.

📒 Files selected for processing (8)
  • OWNERS (1 hunks)
  • OWNERS_ALIASES (2 hunks)
  • TOC-tidb-cloud.md (1 hunks)
  • TOC.md (1 hunks)
  • br/br-pitr-guide.md (1 hunks)
  • br/br-snapshot-guide.md (1 hunks)
  • ddl_embedded_analyze.md (1 hunks)
  • dm/dm-compatibility-catalog.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

⚙️ CodeRabbit configuration file

As a technical writer in the database area, you need to first learn the changes in this PR, and then improve the content to make it clear, concise, and easy to understand in English according to your database knowledge.

Files:

  • dm/dm-compatibility-catalog.md
  • ddl_embedded_analyze.md
  • br/br-pitr-guide.md
  • TOC-tidb-cloud.md
  • TOC.md
  • br/br-snapshot-guide.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: tidb-cloud-check
  • GitHub Check: tidb-check
🔇 Additional comments (7)
dm/dm-compatibility-catalog.md (1)

23-25: LGTM! Clear compatibility documentation.

The new MySQL version compatibility entries are well-structured and informative. The GitHub issue reference for MySQL 8.4 incompatibility is particularly helpful for users encountering compatibility issues.

OWNERS (1)

1-8: LGTM! Clean migration to filter-based ownership model.

The restructuring from static lists to a filter-based schema is well-implemented. The references to sig-community-approvers and sig-community-reviewers align with the groups defined in OWNERS_ALIASES, and the updated documentation URL is current.

OWNERS_ALIASES (1)

1-51: LGTM! Well-organized ownership groups.

The new sig-community-reviewers and sig-community-approvers groups are properly structured and support the filter-based ownership model in the OWNERS file. The header comments clearly explain the syncing mechanism.

ddl_embedded_analyze.md (2)

1-14: Clear introduction with proper context.

The front matter and introduction are well-structured. The version mark is correctly placed, and the explanation of the two DDL statement types is clear. The benefit statement effectively communicates why this feature is important.


112-176: Consistent and comprehensive documentation for index reorganization DDLs.

The section maintains excellent parallel structure with the ADD INDEX section, making it easy for readers to understand the consistent behavior across different DDL types. The MODIFY COLUMN example effectively demonstrates the feature, and the verification steps confirm proper statistics collection.

TOC.md (1)

1091-1091: LGTM! Well-placed TOC entry.

The new entry is properly placed in the Reference section near related DDL content, with consistent formatting and correct link path.

TOC-tidb-cloud.md (1)

842-842: LGTM! Appropriate placement in TiDB Cloud TOC.

The entry is well-positioned adjacent to DDL-related documentation with consistent formatting and correct link path.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
system-variables.md (1)

1795-1804: Fix misplaced tidb_opt_selectivity_factor, restore structure, and standardize labels.

  • The block is inserted inside the tidb_enable_fast_create_table section, breaking document structure.
  • Standardize labels: “Persists to cluster”, “Applies to hint”, “Type: Float”, “Range”.
  • Heading should not be code-formatted and use “New in …”.
  • Relocate alphabetically after tidb_opt_seek_factor.

Apply (remove here):

-### `tidb_opt_selectivity_factor` <span class="version-mark">Introduced in v9.0.0</span>
-
-- Scope: SESSION | GLOBAL
-- Persist to cluster: Yes
-- Controlled by Hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
-- Type: Floating-point number
-- Range: `[0, 1]`
-- Default value: `0.8`
-- This variable is used to specify the default selectivity of the TiDB optimizer. In some cases, when the optimizer cannot derive the predicate selectivity based on statistics, the optimizer uses this default selectivity as an alternative value. **It is not recommended** to modify this value.

Insert after the ### tidb_opt_seek_factor entry:

+### tidb_opt_selectivity_factor <span class="version-mark">New in v9.0.0</span>
+
+- Scope: SESSION | GLOBAL
+- Persists to cluster: Yes
+- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes
+- Type: Float
+- Range: `[0, 1]`
+- Default value: `0.8`
+- Sets the fallback selectivity the optimizer uses when predicate selectivity cannot be derived from statistics. Changing this can affect cardinality estimates and plan selection. It is not recommended to modify this value; prefer collecting or updating statistics instead.
🧹 Nitpick comments (1)
system-variables.md (1)

1659-1666: Add missing Type and keep alphabetical order for system variables.

  • Add “Type: Boolean” to this entry to match house style.
  • Consider relocating this variable under the tidb_stats_* section to maintain alphabetical order across tidb_* variables.

Apply:

 ### tidb_stats_update_during_ddl <span class="version-mark">New in v8.5.4 and v9.0.0</span>

 - Scope: GLOBAL
 - Persists to cluster: Yes
 - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No
+- Type: Boolean
 - Default value: `OFF`
 - This variable controls whether to enable DDL-embedded `ANALYZE`. When enabled, DDL statements that create new indexes ([`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) or reorganize existing indexes ([`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)) automatically collect statistics before the index becomes visible. For more information, see [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md).
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d30fcc and 11b518c.

📒 Files selected for processing (1)
  • system-variables.md (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.md

⚙️ CodeRabbit configuration file

As a technical writer in the database area, you need to first learn the changes in this PR, and then improve the content to make it clear, concise, and easy to understand in English according to your database knowledge.

Files:

  • system-variables.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants