
Stats inner/inter zone network traffic for mpp tasks #9747

Merged: 20 commits, Jan 7, 2025


yibin87
Contributor

@yibin87 yibin87 commented Dec 27, 2024

What problem does this PR solve?

Issue Number: close #9748

Problem Summary:

What is changed and how it works?

Inner-zone and inter-zone network traffic are priced differently in the cloud, so tracing this information for each MPP query helps users locate the most network-expensive queries.
This PR collects statistics on three kinds of network traffic:

  1. Exchange sender
  2. Exchange receiver
  3. Remote table reader using coprocessor request
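A minimal sketch of how per-task counters for these three sources could be aggregated. All names here (`TrafficSource`, `TrafficSample`, `TaskNetworkStats`, `aggregate`) are hypothetical, not TiFlash's actual API. The sketch assumes that the exchange sender counts as outgoing traffic, that the exchange receiver and remote coprocessor reads count as incoming traffic, and that same-node (local) traffic has already been filtered out.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: where the bytes were observed (the three kinds listed above).
enum class TrafficSource
{
    ExchangeSender,
    ExchangeReceiver,
    CoprocessorRemoteRead,
};

// Hypothetical per-operator sample; local traffic is assumed to be excluded already.
struct TrafficSample
{
    TrafficSource source;
    bool inter_zone; // false means the peer is in the same zone
    uint64_t bytes;
};

// Hypothetical per-MPP-task totals.
struct TaskNetworkStats
{
    uint64_t inner_zone_send_bytes = 0;
    uint64_t inter_zone_send_bytes = 0;
    uint64_t inner_zone_receive_bytes = 0;
    uint64_t inter_zone_receive_bytes = 0;

    void add(const TrafficSample & s)
    {
        // Assumption: only the exchange sender pushes data out; the other two pull data in.
        const bool is_send = s.source == TrafficSource::ExchangeSender;
        uint64_t & counter = is_send
            ? (s.inter_zone ? inter_zone_send_bytes : inner_zone_send_bytes)
            : (s.inter_zone ? inter_zone_receive_bytes : inner_zone_receive_bytes);
        counter += s.bytes;
    }

    // Assuming inter-zone traffic is the more expensive kind, this is a natural ranking key.
    uint64_t interZoneBytes() const { return inter_zone_send_bytes + inter_zone_receive_bytes; }
};

// Folds all samples of one task into a single set of totals.
TaskNetworkStats aggregate(const std::vector<TrafficSample> & samples)
{
    TaskNetworkStats stats;
    for (const auto & s : samples)
        stats.add(s);
    return stats;
}
```

Keeping separate send and receive counters per zone lets a query summary report either direction, while `interZoneBytes()` gives one number to sort queries by.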

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Set the same and different zone labels in the TiFlash proxy config, and hack DAGStorageInterpreter to always use remote coprocessor reads. Then check the output of MPPTaskStatistics.cpp to verify that the local/inner-zone/inter-zone label is set correctly.
    Once the TiDB-side change is merged, the slow log and statements summary will also be checked.
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added the do-not-merge/needs-linked-issue, release-note-none, and size/L labels and removed the do-not-merge/needs-linked-issue label Dec 27, 2024
@yibin87
Contributor Author

yibin87 commented Dec 27, 2024

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold and size/XL labels and removed the size/L label Dec 27, 2024
@yibin87
Contributor Author

yibin87 commented Jan 2, 2025

/cc @SeaRise @JinheLin @windtalker

@ti-chi-bot ti-chi-bot bot requested review from JinheLin and windtalker January 2, 2025 02:24
Contributor

ti-chi-bot bot commented Jan 2, 2025

@yibin87: GitHub didn't allow me to request PR reviews from the following users: SeaRise.

Note that only pingcap members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @SeaRise @JinheLin @windtalker

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@yibin87 yibin87 requested a review from JinheLin January 2, 2025 09:53
@yibin87
Contributor Author

yibin87 commented Jan 3, 2025

/unhold

@ti-chi-bot ti-chi-bot bot added the size/XXL label and removed the do-not-merge/hold and size/XL labels Jan 3, 2025
@ti-chi-bot ti-chi-bot bot added the size/XL label and removed the size/XXL label Jan 3, 2025
@yibin87 yibin87 force-pushed the network_stats branch 2 times, most recently from 5b96f79 to e1de416, January 3, 2025 03:59
@yibin87
Contributor Author

yibin87 commented Jan 3, 2025

/cc @xzhangxian1008

@ti-chi-bot ti-chi-bot bot requested a review from xzhangxian1008 January 3, 2025 04:59
Contributor

@xzhangxian1008 xzhangxian1008 left a comment


Other LGTM

```cpp
        inter_zone_receive_bytes += bytes;
        break;
    default:
        break;
```
Contributor


Maybe we can throw an exception if we encounter an unknown type?

Contributor Author


Ok, updated.

```cpp
        inter_zone_send_bytes += bytes;
        break;
    default:
        break;
```
Contributor


ditto

Contributor Author


Ok, updated
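The change requested in these two threads amounts to replacing the silent `default: break;` with a throw. Below is a hedged sketch that assumes a hypothetical three-valued `ZoneType` enum and a hypothetical `ZoneTrafficCounters` holder. It uses `std::logic_error` as a stand-in for whatever exception type the codebase actually uses. Leaving `Local` traffic uncounted is also an assumption of the sketch.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical zone classification; the fixed underlying type makes
// out-of-range values representable (and therefore testable).
enum class ZoneType : int
{
    Local = 0,
    InnerZone = 1,
    InterZone = 2,
};

// Hypothetical counters mirroring the field names seen in the review snippets.
struct ZoneTrafficCounters
{
    uint64_t inner_zone_send_bytes = 0;
    uint64_t inter_zone_send_bytes = 0;
    uint64_t inner_zone_receive_bytes = 0;
    uint64_t inter_zone_receive_bytes = 0;

    void addSendBytes(ZoneType type, uint64_t bytes)
    {
        add(type, bytes, inner_zone_send_bytes, inter_zone_send_bytes);
    }

    void addReceiveBytes(ZoneType type, uint64_t bytes)
    {
        add(type, bytes, inner_zone_receive_bytes, inter_zone_receive_bytes);
    }

private:
    // Shared dispatch so the send and receive paths cannot diverge.
    static void add(ZoneType type, uint64_t bytes, uint64_t & inner, uint64_t & inter)
    {
        switch (type)
        {
        case ZoneType::Local:
            break; // assumption: same-node traffic is not counted
        case ZoneType::InnerZone:
            inner += bytes;
            break;
        case ZoneType::InterZone:
            inter += bytes;
            break;
        default:
            // Fail loudly instead of silently dropping bytes for an unknown type.
            throw std::logic_error("unknown zone type: " + std::to_string(static_cast<int>(type)));
        }
    }
};
```

Routing both directions through one `add` helper means the "ditto" fix only has to be made once.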

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm label Jan 6, 2025
```diff
@@ -311,7 +322,7 @@ class CoprocessorReader
         return toResult(result_pair, block_queue, header);
     }
 
-    static size_t getSourceNum() { return 1; }
+    static size_t getSourceNum() { return 2; }
```
Contributor


I'm not sure whether this change will have side effects.

Contributor Author


I checked: it's only used in CoprocessorReaderSourceOp, as the number of connection infos in the IOProfile.
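To illustrate why such a change stays contained, here is a hypothetical sketch in which the return value of `getSourceNum()` only sizes a vector of per-connection profile slots. `ConnectionProfileInfo`, `CoprocessorReaderSketch`, and `IOProfileSketch` are illustrative names. The split into one inner-zone slot and one inter-zone slot is an assumption, not something stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-connection profile entry.
struct ConnectionProfileInfo
{
    uint64_t bytes = 0;
};

// Hypothetical reader. Assumption: slot 0 holds inner-zone traffic and
// slot 1 holds inter-zone traffic, hence two "sources".
struct CoprocessorReaderSketch
{
    static std::size_t getSourceNum() { return 2; }
};

// Hypothetical profile: in this sketch getSourceNum() only decides how many
// slots exist; it plays no part in how data is read.
template <typename Reader>
struct IOProfileSketch
{
    std::vector<ConnectionProfileInfo> connection_profile_infos
        = std::vector<ConnectionProfileInfo>(Reader::getSourceNum());
};
```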


```cpp
String store_zone_label;
auto kv_store = tmt.getKVStore();
if likely (kv_store)
```
Contributor


Does this work for the compute node in disagg-arch?

Contributor Author


No. For the compute node in disagg-arch, kv_store is nullptr.
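A sketch of the null-guarded zone-label lookup discussed here, followed by a classification of traffic as local, inner-zone, or inter-zone. `KVStoreSketch`, `getStoreZoneLabel`, and `classifyZone` are hypothetical. The sketch counts traffic as inter-zone whenever either label is unknown (as on a disagg-arch compute node with no KV store). That conservative choice is an assumption and not necessarily what the PR does.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the KV store; only its zone label matters here.
struct KVStoreSketch
{
    std::string zone_label;
};
using KVStoreSketchPtr = std::shared_ptr<KVStoreSketch>;

enum class ZoneType
{
    Local,
    InnerZone,
    InterZone,
};

// Mirrors the null guard in the snippet: without a KV store
// (e.g. a disagg-arch compute node) the label stays empty.
std::string getStoreZoneLabel(const KVStoreSketchPtr & kv_store)
{
    if (kv_store)
        return kv_store->zone_label;
    return {};
}

// Hypothetical classification of traffic between this node and a peer.
ZoneType classifyZone(bool same_node, const std::string & self_zone, const std::string & peer_zone)
{
    if (same_node)
        return ZoneType::Local;
    // Assumption: an unknown label on either side counts as inter-zone,
    // so potentially expensive traffic is over-reported rather than hidden.
    if (self_zone.empty() || peer_zone.empty() || self_zone != peer_zone)
        return ZoneType::InterZone;
    return ZoneType::InnerZone;
}
```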

```cpp
}
String LocalTableScanDetail::toJson() const
{
    return fmt::format(R"({{"is_local":false,"bytes":{},{}}})", bytes, time_detail.toJson());
```
Contributor


Why is is_local false for LocalTableScanDetail?

Contributor Author

@yibin87 yibin87 Jan 7, 2025


Good catch, updated.
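One way to make the `is_local` slip caught here impossible is to derive the flag from the type instead of hard-coding it in each `toJson()`. This hedged sketch uses hypothetical names. It substitutes `std::ostringstream` for `fmt::format` and omits the `time_detail` part so that it stays self-contained.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical: locality is a template parameter, so a "local" detail
// cannot accidentally serialize is_local=false.
template <bool IsLocal>
struct TableScanDetailSketch
{
    uint64_t bytes = 0;

    std::string toJson() const
    {
        std::ostringstream os;
        // The real code also appends time_detail.toJson(); omitted in this sketch.
        os << R"({"is_local":)" << (IsLocal ? "true" : "false") << R"(,"bytes":)" << bytes << '}';
        return os.str();
    }
};

using LocalTableScanDetailSketch = TableScanDetailSketch<true>;
using RemoteTableScanDetailSketch = TableScanDetailSketch<false>;
```

With this design, `LocalTableScanDetailSketch{42}.toJson()` yields `{"is_local":true,"bytes":42}`, and the local and remote variants share a single serialization path.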

@yibin87 yibin87 requested a review from windtalker January 7, 2025 01:43
Signed-off-by: yibin <[email protected]>
Contributor

@windtalker windtalker left a comment


lgtm

@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 7, 2025
Contributor

ti-chi-bot bot commented Jan 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: windtalker, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label and removed the needs-1-more-lgtm label Jan 7, 2025
Contributor

ti-chi-bot bot commented Jan 7, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-06 02:56:50.626658828 +0000 UTC m=+149553.915490532: ☑️ agreed by xzhangxian1008.
  • 2025-01-07 01:59:47.836122279 +0000 UTC m=+232531.124953983: ☑️ agreed by windtalker.

@ti-chi-bot ti-chi-bot bot merged commit 330a709 into pingcap:master Jan 7, 2025
5 checks passed
Labels
approved lgtm release-note-none size/XL
Development

Successfully merging this pull request may close these issues.

Stats inner/inter zone network traffic for mpp tasks
4 participants