Microbatch first last batch serial #11072
base: main
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

```
@@ Coverage Diff @@
##             main   #11072      +/-   ##
==========================================
- Coverage   89.13%   89.11%   -0.02%
==========================================
  Files         183      183
  Lines       23783    23797     +14
==========================================
+ Hits        21198    21206      +8
- Misses       2585     2591      +6
```

Flags with carried forward coverage won't be shown.
Woooo! Thank you for doing this work ❤️ I think this will make hooks work the way people expect, which is incredibly important. That said, I think there are some changes we can make to improve the mental model of what happens where, for the sake of maintainability.
```python
# Run first batch in serial
relation_exists = self._submit_batch(
    node, relation_exists, batches, batch_idx, batch_results, pool, parallel=False
)
batch_idx += 1
```
I think it's actually preferable to have the first batch separated out. It makes it mirror how the last batch is separated out. We could also have an optional arg `force_sequential` (defaulted to `False`) which would skip the `should_run_in_parallel` check in `_submit_batch`.
```python
elif self.batch_idx == 0 or self.batch_idx == len(self.batches) - 1:
    # First and last batch don't run in parallel
    run_in_parallel = False
```
This check could also be skipped if we're instead handling `force_sequential` to determine whether we should even check `should_run_in_parallel` in `_submit_batch`. It'd be nice for this function to be less dependent on "where" it is, and I think this check breaks that.
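A sketch of the position-independent version of the check, under the assumption that the caller handles first/last batches via `force_sequential`. The class and attribute names here are hypothetical stand-ins, not the actual dbt-core runner API:

```python
class MicrobatchModelRunner:
    # Hypothetical simplified runner for illustration.
    def __init__(self, relation_exists, adapter_supports_parallel):
        self.relation_exists = relation_exists
        self.adapter_supports_parallel = adapter_supports_parallel

    def should_run_in_parallel(self) -> bool:
        # Note: no elif on batch_idx == 0 or batch_idx == len(batches) - 1.
        # The caller forces those batches to be serial via force_sequential,
        # so this method no longer needs to know "where" the batch sits.
        if not self.adapter_supports_parallel:
            return False
        if not self.relation_exists:
            return False
        return True
```

The method now answers a purely capability-based question, which keeps it reusable regardless of batch position.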
```python
while batch_idx < len(runner.batches) - 1:
    relation_exists = self._submit_batch(
        node, relation_exists, batches, batch_idx, batch_results, pool
    )
```
Another reason for splitting out the first batch: we don't want to run any of the other batches if the first batch fails. By lumping it into the while loop with all the other batches (except for the last batch), we lose that. We could add that logic to the loop, but I think that would crowd the loop logic, since it will only ever apply to the first batch's result.
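The short-circuit behavior described above can be sketched as follows. This is a hypothetical orchestration outline (the `submit` callable and `run_batches` name are invented for the example), showing the first batch running alone and aborting the rest on failure:

```python
def run_batches(batches, submit):
    # First batch runs serially; its result gates everything else.
    results = [submit(0, parallel=False)]
    if not results[0]:
        # If the first batch fails, skip every remaining batch.
        return results
    # Middle batches are eligible for parallel execution.
    for idx in range(1, len(batches) - 1):
        results.append(submit(idx, parallel=True))
    # Last batch also runs serially.
    if len(batches) > 1:
        results.append(submit(len(batches) - 1, parallel=False))
    return results
```

Keeping the first-batch gate outside the loop means the loop body never needs a special case for batch index 0.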
Resolves #
Problem
Solution
Checklist