[MINOR] Server heartbeat to coordinator while unregister shuffle #2132
base: master
Test Results: 2 950 files (-6), 2 950 suites (-6), 6h 14m 38s (-12m 37s). For more details on these failures, see this check. Results for commit 770177a, compared against base commit c1bfa04. This comment has been updated with latest results.
8d6a4b0
to
3e4ce0c
Compare
I would prefer that the heartbeat not be invoked from other threads, such as the unrelated unregister operation. If you want the latest info, you can decrease the server -> coordinator heartbeat interval. WDYT? @maobaolong
@zuston Decreasing the interval helps, but it is not a fundamental solution: the latest info can still be lost, and frequent heartbeats could keep the Coordinator busy. Could you elaborate a little more on your concern about the heartbeat being invoked by another thread?
An external client operation would then control the frequency of the internal heartbeat, which is unreasonable and dangerous.
@zuston Hmm, well, is there another way to send the latest info? Without it, the app information is incomplete. Do you think it would be better if I created a new RPC method to send the latest info to the coordinator?
3e4ce0c
to
886bd76
Compare
@zuston Thanks for the offline discussion. I think I need to explain the motivation for this PR. It aims to fix a scenario in our regression test: a simple test with tiny shuffle data, as given in the description. Without this PR, the coordinator cannot collect any application information in that test. In production the situation is a little better, since many apps have a huge amount of shuffle data, but the risk of losing the last heartbeat still exists. As a tradeoff, I added a config option that defaults to false to keep the original behavior; it can be set to true when we need the exact application information.
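To make the scenario concrete, here is a toy model of a job that starts and finishes between two periodic heartbeats. Every class and method name in it is hypothetical and none of it is Uniffle code. In particular, it assumes the extra heartbeat is sent before the server drops its local state for the app, which may not match the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-in for the coordinator.
class ToyCoordinator {
    // Last reported shuffle bytes per application.
    final Map<String, Long> lastSeenBytes = new HashMap<>();

    void onHeartbeat(Map<String, Long> snapshot) {
        lastSeenBytes.putAll(snapshot);
    }
}

// Toy model only: hypothetical stand-in for the shuffle server.
class ToyServer {
    private final Map<String, Long> liveApps = new HashMap<>();
    private final ToyCoordinator coordinator;
    private final boolean reportOnUnregister;

    ToyServer(ToyCoordinator coordinator, boolean reportOnUnregister) {
        this.coordinator = coordinator;
        this.reportOnUnregister = reportOnUnregister;
    }

    void write(String appId, long bytes) {
        liveApps.merge(appId, bytes, Long::sum);
    }

    // Periodic heartbeat: reports whatever apps are live at this instant.
    void heartbeat() {
        coordinator.onHeartbeat(new HashMap<>(liveApps));
    }

    void unregister(String appId) {
        if (reportOnUnregister) {
            // Assumption: report before the local state is dropped.
            heartbeat();
        }
        liveApps.remove(appId);
    }
}

class LostHeartbeatDemo {
    // Runs a tiny job that lives entirely between two periodic ticks
    // and returns what the coordinator has seen afterwards.
    static Map<String, Long> run(boolean reportOnUnregister) {
        ToyCoordinator coordinator = new ToyCoordinator();
        ToyServer server = new ToyServer(coordinator, reportOnUnregister);
        server.heartbeat();           // periodic tick before the job starts
        server.write("app-1", 100L);  // tiny shuffle write
        server.unregister("app-1");   // job ends before the next tick
        server.heartbeat();           // next periodic tick: app already gone
        return coordinator.lastSeenBytes;
    }

    public static void main(String[] args) {
        System.out.println("flag off: " + run(false));
        System.out.println("flag on:  " + run(true));
    }
}
```

With the flag off, the coordinator in this model never learns about `app-1`; with the flag on, it receives the final byte count once, at unregister time.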
ping @zuston
@@ -719,6 +719,11 @@ public class ShuffleServerConf extends RssBaseConf {
          .booleanType()
          .defaultValue(false)
          .withDescription("Whether to enable app detail log");
  public static final ConfigOption<Boolean> SERVER_TRIGGER_REPORT_WHILE_UNREGISTER_ENABLED =
How about renaming this option to rss.server.heartbeatReportOnUnregisterEnabled?
done
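For reference, a minimal sketch of how such a boolean option behaves when it is left unset. It uses plain `java.util.Properties` instead of Uniffle's `ConfigOption` machinery, the helper class is hypothetical, and the key is assumed to be the name suggested in the review above.

```java
import java.util.Properties;

// Hypothetical helper, not part of Uniffle: reads the flag from plain Properties.
class ReportOnUnregisterFlag {
    // Key taken from the review suggestion; assumed to be the final name.
    static final String KEY = "rss.server.heartbeatReportOnUnregisterEnabled";

    // Returns false when the key is absent, matching the default discussed in this PR.
    static boolean isEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isEnabled(conf)); // unset: original behavior is kept
        conf.setProperty(KEY, "true");
        System.out.println(isEnabled(conf)); // explicitly enabled
    }
}
```

Keeping the default at false means existing deployments see no change unless they opt in.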
0b12740
to
7c3289c
Compare
@zuston Thanks for your suggestion; renamed in the last commit. PTAL
9e8042a
to
770177a
Compare
What changes were proposed in this pull request?
The server sends a heartbeat to the coordinator when it receives an unregister shuffle request.
Why are the changes needed?
Without this PR, the server cannot send a heartbeat with the updated app info after the app is unregistered, so the coordinator and dashboard may display outdated information.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested with a tiny Spark job executed via spark-shell.