
Microbatch first last batch serial #11072

Draft · wants to merge 4 commits into base: main

Conversation

MichelleArk
Contributor

Resolves #

Problem

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

@cla-bot cla-bot bot added the cla:yes label Nov 28, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.


codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.11%. Comparing base (1b7d9b5) to head (37dbd11).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11072      +/-   ##
==========================================
- Coverage   89.13%   89.11%   -0.02%     
==========================================
  Files         183      183              
  Lines       23783    23797      +14     
==========================================
+ Hits        21198    21206       +8     
- Misses       2585     2591       +6     
| Flag | Coverage Δ |
|---|---|
| integration | 86.42% <86.20%> (-0.03%) ⬇️ |
| unit | 62.14% <13.79%> (-0.03%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Component | Coverage Δ |
|---|---|
| Unit Tests | 62.14% <13.79%> (-0.03%) ⬇️ |
| Integration Tests | 86.42% <86.20%> (-0.03%) ⬇️ |


@QMalcolm QMalcolm left a comment


Woooo! Thank you for doing this work ❤️ I think this will make hooks work the way people expect, which is incredibly important. That said, I do think there are some changes we can make to improve the mental model of what happens where, for maintainability.

Comment on lines 721 to 725
# Run the first batch in serial
relation_exists = self._submit_batch(
node, relation_exists, batches, batch_idx, batch_results, pool, parallel=False
)
batch_idx += 1

I think it's actually preferable to have the first batch separated out. It makes it mirror how the last batch is separated out. We could also add an optional arg force_sequential (defaulting to False) which would skip the should_run_in_parallel check in _submit_batch.

Comment on lines +611 to +613
elif self.batch_idx == 0 or self.batch_idx == len(self.batches) - 1:
# First and last batch don't run in parallel
run_in_parallel = False

This check could also be skipped if we instead handled force_sequential to determine whether we should even call should_run_in_parallel in _submit_batch. It'd be nice for this function to be less dependent on "where" it is, and I think this check breaks that.
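To make the "position-agnostic" point concrete, here is an illustrative contrast (assumed names, not dbt-core code): one version with the first/last check baked in, as in the quoted diff, and one that leaves first/last handling to the caller.

```python
def should_run_in_parallel_positional(
    batch_idx: int, num_batches: int, relation_exists: bool
) -> bool:
    # Version from the quoted diff: the predicate knows its position.
    if not relation_exists:
        return False
    elif batch_idx == 0 or batch_idx == num_batches - 1:
        # First and last batch don't run in parallel
        return False
    return True


def should_run_in_parallel_agnostic(relation_exists: bool) -> bool:
    # Reviewer's suggestion: no knowledge of "where" the batch sits;
    # the caller forces the first and last batch to be sequential.
    return relation_exists
```

The agnostic version is easier to reason about and test in isolation, since its answer depends only on state, not on call-site position.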

Comment on lines +719 to +721
while batch_idx < len(runner.batches) - 1:
relation_exists = self._submit_batch(
node, relation_exists, batches, batch_idx, batch_results, pool

Another reason for splitting out the first batch:
We don't want to run any of the other batches if the first batch fails. By lumping it into the while loop with all the other batches (except the last), we lose that. We could add that logic to the loop, but I think that would crowd the loop logic, since it will only ever be needed for the first batch's result.
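The control flow being argued for can be sketched as follows. This is a hypothetical simplification (run_microbatches and the submit callback are invented names): the first batch runs serially and gates everything else, the middle batches run in parallel, and the last batch runs serially again.

```python
def run_microbatches(batches, submit):
    """Run microbatches: first and last serially, the rest in parallel.

    submit(batch, parallel) -> bool, True on success. If the first batch
    fails, no further batches are submitted at all.
    """
    if not batches:
        return
    # First batch runs serially; a failure here aborts the whole run,
    # without needing any failure-handling logic inside the loop below.
    if not submit(batches[0], parallel=False):
        return
    # Middle batches can run in parallel.
    for batch in batches[1:-1]:
        submit(batch, parallel=True)
    # Last batch also runs serially.
    if len(batches) > 1:
        submit(batches[-1], parallel=False)
```

Keeping the first-batch failure check outside the loop means the loop body stays a plain "submit in parallel" with no special cases.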
