*: only enable new string serdes format when MppVersion >= MppVersionV3 #9759

Open
wants to merge 33 commits into
base: master

JinheLin
Contributor

@JinheLin JinheLin commented Jan 2, 2025

What problem does this PR solve?

Issue Number: close #9673

Problem Summary:

  • Before this PR, string data exchanged between nodes may be incompatible during a rolling upgrade.

What is changed and how it works?

  • tidb sends an MppVersion field to tiflash.

  • During a rolling upgrade, tiflash is upgraded before tidb.

  • While tiflash is being upgraded, tidb has not been upgraded yet, so every request it sends carries MppVersionV2 and tiflash keeps using the old format to exchange data.

  • Once tidb is upgraded, the new tidb sends requests with MppVersionV3, and tiflash starts using the new format to exchange data (at this point all tiflash nodes have been upgraded, so there are no compatibility issues).

  • This PR passes mpp_version to CHBlockChunkCodec and CHBlockChunkCodecV1 for encoding: if mpp_version <= MppVersionV2, the legacy string format is used.

  • When decoding, CHBlockChunkCodec and CHBlockChunkCodecV1 follow the type name written by the encoder.

  • The corresponding tidb PR is copr: add MppVersionV3 tidb#58652.

    • Merge this PR before merging the tidb PR.
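
The encode/decode rule in the list above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the type names "String" and "StringV2", the numeric enum values, the fixed-width size fields, and the helpers `typeNameFor`, `encode`, and `decode` are all assumptions made for the example. The point it shows is that the encoder picks the format from the request's MPP version and records the chosen type name in the stream, while the decoder looks only at that recorded name.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed values; only the ordering V2 < V3 matters for this sketch.
enum MppVersion : int64_t { MppVersionV2 = 2, MppVersionV3 = 3 };

// Hypothetical type names written into the stream header.
const std::string LEGACY_NAME = "String";
const std::string NEW_NAME = "StringV2";

// Encoder side: the MPP version of the request selects the wire format.
std::string typeNameFor(MppVersion v)
{
    return v >= MppVersionV3 ? NEW_NAME : LEGACY_NAME;
}

static void putU32(std::string & out, uint32_t x)
{
    out.append(reinterpret_cast<const char *>(&x), sizeof(x));
}

static uint32_t getU32(const std::string & in, size_t & pos)
{
    if (pos + sizeof(uint32_t) > in.size())
        throw std::runtime_error("truncated input");
    uint32_t x;
    std::memcpy(&x, in.data() + pos, sizeof(x));
    pos += sizeof(x);
    return x;
}

std::string encode(const std::vector<std::string> & rows, MppVersion v)
{
    std::string out;
    const std::string name = typeNameFor(v);
    putU32(out, static_cast<uint32_t>(name.size()));
    out += name; // the decoder relies on this name, not on a version
    putU32(out, static_cast<uint32_t>(rows.size()));
    if (name == LEGACY_NAME)
    {
        // Illustrative legacy layout: size and bytes interleaved per row.
        for (const auto & s : rows)
        {
            putU32(out, static_cast<uint32_t>(s.size()));
            out += s;
        }
    }
    else
    {
        // Illustrative new layout: all sizes first, then all bytes.
        for (const auto & s : rows)
            putU32(out, static_cast<uint32_t>(s.size()));
        for (const auto & s : rows)
            out += s;
    }
    return out;
}

std::vector<std::string> decode(const std::string & in)
{
    size_t pos = 0;
    const uint32_t name_len = getU32(in, pos);
    const std::string name = in.substr(pos, name_len);
    pos += name_len;
    const uint32_t n = getU32(in, pos);
    std::vector<std::string> rows;
    if (name == LEGACY_NAME)
    {
        for (uint32_t i = 0; i < n; ++i)
        {
            const uint32_t len = getU32(in, pos);
            rows.push_back(in.substr(pos, len));
            pos += len;
        }
    }
    else if (name == NEW_NAME)
    {
        std::vector<uint32_t> sizes(n);
        for (auto & len : sizes)
            len = getU32(in, pos);
        for (const auto len : sizes)
        {
            rows.push_back(in.substr(pos, len));
            pos += len;
        }
    }
    else
    {
        throw std::runtime_error("unknown type name: " + name);
    }
    return rows;
}
```

Because `decode` takes no version argument, an upgraded node can read data from either kind of sender, which is what makes the tiflash-first upgrade order safe in this sketch.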

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    • In a cluster with two tiflash servers, upgrade one tiflash and execute queries that exchange string data.
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Jan 2, 2025
Contributor

ti-chi-bot bot commented Jan 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jinhelin, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 2, 2025
@JinheLin
Contributor Author

JinheLin commented Jan 2, 2025

PR of tidb is pingcap/tidb#58652.

Merge this PR before merging the PR of tidb.

@JinheLin JinheLin changed the title *: only enable new string serdes format when MppVersion >= MppVersionV3 WIP: *: only enable new string serdes format when MppVersion >= MppVersionV3 Jan 2, 2025
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 2, 2025
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 3, 2025
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 5, 2025
@JinheLin
Contributor Author

JinheLin commented Jan 6, 2025

/retest

- std::unique_ptr<ChunkCodecStream> ArrowChunkCodec::newCodecStream(const std::vector<tipb::FieldType> & field_types)
+ std::unique_ptr<ChunkCodecStream> ArrowChunkCodec::newCodecStream(
+     const std::vector<tipb::FieldType> & field_types,
+     MppVersion)
Contributor Author

The mpp_version parameter is added for CHBlockChunkCodecStream; the other codecs do not use it.
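
As a rough illustration of that point, the sketch below uses simplified stand-in classes (`ArrowCodec`, `CHBlockCodec`, `formatName`, and the enum values are hypothetical, not the real tiflash types). The shared factory signature takes the version so that every codec can be called the same way, but only the CHBlock variant stores and acts on it.

```cpp
#include <memory>
#include <string>

// Assumed values; only the ordering V2 < V3 matters for this sketch.
enum MppVersion { MppVersionV2 = 2, MppVersionV3 = 3 };

struct ChunkCodecStream
{
    virtual ~ChunkCodecStream() = default;
    virtual std::string formatName() const = 0;
};

struct ArrowStream : ChunkCodecStream
{
    std::string formatName() const override { return "arrow"; }
};

struct CHBlockStream : ChunkCodecStream
{
    explicit CHBlockStream(MppVersion v) : mpp_version(v) {}
    std::string formatName() const override
    {
        return mpp_version >= MppVersionV3 ? "chblock/new-string" : "chblock/legacy-string";
    }
    MppVersion mpp_version;
};

struct ChunkCodec
{
    virtual ~ChunkCodec() = default;
    virtual std::unique_ptr<ChunkCodecStream> newCodecStream(MppVersion mpp_version) = 0;
};

struct ArrowCodec : ChunkCodec
{
    // Unnamed parameter: accepted to satisfy the interface, never read.
    std::unique_ptr<ChunkCodecStream> newCodecStream(MppVersion) override
    {
        return std::make_unique<ArrowStream>();
    }
};

struct CHBlockCodec : ChunkCodec
{
    std::unique_ptr<ChunkCodecStream> newCodecStream(MppVersion mpp_version) override
    {
        return std::make_unique<CHBlockStream>(mpp_version);
    }
};
```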

@JinheLin JinheLin changed the title WIP: *: only enable new string serdes format when MppVersion >= MppVersionV3 *: only enable new string serdes format when MppVersion >= MppVersionV3 Jan 6, 2025
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 6, 2025
Comment on lines 163 to 166
writeStringBinary(CodecUtils::convertDataTypeNameByMppVersion(column.type->getName(), mpp_version), *output);

if (rows)
-     WriteColumnData(*column.type, column.column, *output, 0, 0);
+     CHBlockChunkCodec::WriteColumnData(*column.type, column.column, *output, 0, 0, mpp_version);
Contributor

What about computing a ser_type once and using it both for writing the data type name and for WriteColumnData?

auto ser_type = CodecUtils::convertDataTypeNameByMppVersion(column.type->getName(), mpp_version);

writeStringBinary(ser_type, *output);
if (rows)
    CHBlockChunkCodec::WriteColumnData(*ser_type, column.column, *output, 0, 0);

Contributor Author

Refined.

Kept convertDataTypeByMppVersion and removed convertDataTypeNameByMppVersion.
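
For readers following the thread, here is a minimal sketch of the shape this refinement suggests: convert the data type once, then use the converted type for both the name written to the stream and the column data. The signature of `convertDataTypeByMppVersion`, the `DataType` interface, the text-based serialization, and the names "String" and "StringV2" are all assumptions for illustration; the actual tiflash code may differ.

```cpp
#include <memory>
#include <string>
#include <vector>

// Assumed values; only the ordering V2 < V3 matters for this sketch.
enum MppVersion { MppVersionV2 = 2, MppVersionV3 = 3 };

struct DataType
{
    virtual ~DataType() = default;
    virtual std::string getName() const = 0;
    virtual void serialize(const std::vector<std::string> & col, std::string & out) const = 0;
};
using DataTypePtr = std::shared_ptr<const DataType>;

// Illustrative legacy layout: "<size>:<bytes>" per row.
struct LegacyString : DataType
{
    std::string getName() const override { return "String"; }
    void serialize(const std::vector<std::string> & col, std::string & out) const override
    {
        for (const auto & s : col)
            out += std::to_string(s.size()) + ":" + s;
    }
};

// Illustrative new layout: all sizes first, then all bytes.
struct NewString : DataType
{
    std::string getName() const override { return "StringV2"; }
    void serialize(const std::vector<std::string> & col, std::string & out) const override
    {
        for (const auto & s : col)
            out += std::to_string(s.size()) + ",";
        for (const auto & s : col)
            out += s;
    }
};

// Hypothetical signature: downgrade the new string type for older peers.
DataTypePtr convertDataTypeByMppVersion(const DataTypePtr & type, MppVersion v)
{
    if (v < MppVersionV3 && type->getName() == "StringV2")
        return std::make_shared<LegacyString>();
    return type;
}

std::string writeColumn(const DataTypePtr & type, const std::vector<std::string> & col, MppVersion v)
{
    // One conversion drives both the header name and the payload format,
    // so the two cannot drift apart.
    const DataTypePtr ser_type = convertDataTypeByMppVersion(type, v);
    std::string out = ser_type->getName() + "|";
    if (!col.empty())
        ser_type->serialize(col, out);
    return out;
}
```

Keeping a single conversion helper avoids the situation where the name and the data are derived through two separate code paths that have to be kept in sync by hand.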

Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance deserialization performance of short string.
2 participants