-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-54655][SS] Make StateStore.MaintenanceThreadPool.stop to wait minimum of 60s and maintenanceShutdownTimeout
#53411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| // Wait a while for tasks to respond to being cancelled | ||
| if (!threadPool.awaitTermination(60, TimeUnit.SECONDS)) { | ||
| // To avoid long test times, use minimum of timeout value or 60 seconds | ||
| if (!threadPool.awaitTermination(Math.min(60, shutdownTimeout), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the timeout is set to be very low - lets say 1s, we would now wait only for a total of 2s before we can confirm the tpool shutdown. Should we really repurpose the timeout here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal config so should only be set lower than 300s in tests, so shouldn't affect production queries. From running the test locally it looks like sometimes the thread pool isn't shut down even after 60s, so not sure if waiting less time really makes a huge difference.
| // Verify only the partitions in badPartitions doesn't have a snapshot | ||
| verifySnapshotUploadEvents(coordRef, query, badPartitions) | ||
| verifySnapshotUploadEvents(coordRef, query2, badPartitions) | ||
| eventually(timeout(5.seconds)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we know this will be enough ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran the test 20 times and did not see any flakiness. Also the previous Thread.sleep(500) is 500 milliseconds which is far shorter than 5 seconds.
| withCoordinatorAndSQLConf( | ||
| sc, | ||
| SQLConf.SHUFFLE_PARTITIONS.key -> "5", | ||
| SQLConf.SHUFFLE_PARTITIONS.key -> "3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much difference does this make ? Should we do it in other tests also ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the test with the largest runtime so only decided to reduce the shuffle partitions here. It reduces the test time around 16% (from 6s to 5s).
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the misleading PR title. Although the initial intention was about testing time, this PR introduce a major behavior change in src/main, @liviazhu .
@dongjoon-hyun The only way that functionality changes is by setting the internal conf ( If it still counts as a behavior change, if I instead added another internal conf to configure this second timeout, would that be okay? Thanks! |
|
When you touch
As you already described the behavior change like the main body comment, you knew that, right? BTW, it seems that you are confused the review comment, I'm not against your code change. I'm simply asking to revise the PR title clearly. No need to make another code change by mentioning
|
StateStore.MaintenanceThreadPool.stop to wait minimum of 60s and maintenanceShutdownTimeout
StateStore.MaintenanceThreadPool.stop to wait minimum of 60s and maintenanceShutdownTimeoutStateStore.MaintenanceThreadPool.stop to wait minimum of 60s and maintenanceShutdownTimeout
|
Resolved by #53432 |
What changes were proposed in this pull request?
Maintenance thread pool timeout set to minimum of the timeout from SQLConf and 60 seconds. Additionally reduce test time by doing the following:
Why are the changes needed?
Test is very slow currently (>12 min)
Does this PR introduce any user-facing change?
No. Maintenance thread pool timeout change is set by internal conf so does not affect users.
How was this patch tested?
Testing only. Ran 20 times to ensure that test is not flaky. New test time is ~5min.
Was this patch authored or co-authored using generative AI tooling?
No