Add documentation for data type storage requirements #4390

Open
kolbe opened this issue Dec 10, 2020 · 1 comment

Comments

@kolbe
Contributor

kolbe commented Dec 10, 2020

Change Request

This repository is ONLY used to solve issues related to DOCS.
For other issues (related to TiDB, PD, etc), please move to other repositories.

Please answer the following questions before submitting your issue. Thanks!

  1. Describe what you find is inappropriate or missing in the existing docs.

The MySQL documentation covers "data type storage requirements" in detail (https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html). It discusses per-row and per-column overhead for each data type, the effects of data type on storage in on-disk data pages, and so on. MySQL also has a number of important behaviors related to multi-byte (and variable-byte) character sets. For example, a utf8mb4 column with a length of 255 characters can require up to 1020 bytes in some situations, because each character can take up to 4 bytes.
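The 1020-byte figure above comes from multiplying the declared character length by the widest possible character. Here is a minimal Python sketch of that arithmetic. The helper `max_column_bytes` and the `MAX_BYTES_PER_CHAR` table are hypothetical names used only for illustration, and the result excludes any length prefix or engine-specific overhead, which is exactly the kind of detail this issue asks to have documented for TiDB.

```python
# Worst-case bytes per character for two MySQL character sets.
# utf8mb4 encodes each character in 1 to 4 bytes; latin1 always uses 1.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb4": 4}


def max_column_bytes(char_length: int, charset: str) -> int:
    """Upper bound on the data bytes for a column of char_length characters.

    Illustrative only: ignores length prefixes and per-row or per-column
    overhead, which differ between storage engines.
    """
    return char_length * MAX_BYTES_PER_CHAR[charset]


print(max_column_bytes(255, "utf8mb4"))  # 255 * 4 = 1020
print(max_column_bytes(255, "latin1"))   # 255 * 1 = 255

# Python's "utf-8" codec produces the same byte sequences as utf8mb4,
# so it can show the per-character variation directly.
print(len("a".encode("utf-8")), len("😀".encode("utf-8")))  # 1 4
```

The actual space a value uses depends on the characters stored, so the bound is only reached when every character needs 4 bytes.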

  2. Describe your suggestion or addition.

We should provide detailed information about how data types in TiDB relate to storage requirements.

  • How much overhead is there per-column for each data type?
  • How do multi- and variable-byte character sets affect data storage and various limits?
  • Are there situations where certain data types (text, blob, very large varchar) are stored specially in ways that could affect performance?
  • How are secondary indexes structured and how do they use space?
  • When sorting or performing other tasks that require a "temporary table", how do variable-length and variable-byte data types affect storage utilization?
  • How do variable-length and variable-byte columns affect the allocation of in-memory and on-disk buffers?

  3. Provide some reference materials (documents, websites, etc) if you could.

We should provide information that at minimum parallels the information provided in the MySQL documentation. This will enable users migrating from MySQL to understand how TiDB behaves in comparison. Relevant references from the MySQL documentation include:

https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html
https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html
https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html

@TomShawn TomShawn added the lifecycle/frozen Issues with this label will not be labeled as "stale". label Dec 11, 2020
@TomShawn
Contributor

Looks like this document (https://docs.pingcap.com/tidb/dev/data-type-string) needs technical correction and more information for users.
