Microbatch first last batch serial #11072
base: main
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

```
@@ Coverage Diff @@
##             main   #11072      +/-   ##
==========================================
- Coverage   89.13%   89.11%   -0.02%
==========================================
  Files         183      183
  Lines       23783    23797     +14
==========================================
+ Hits        21198    21206      +8
- Misses       2585     2591      +6
```

Flags with carried forward coverage won't be shown.
Woooo! Thank you for doing this work ❤️ I think this will make hooks work the way people expect, which is incredibly important. That said, I think there are some changes we can make to improve the mental model of what happens where, for the sake of maintainability.
```python
# Run first batch in serial
relation_exists = self._submit_batch(
    node, relation_exists, batches, batch_idx, batch_results, pool, parallel=False
)
batch_idx += 1
```
I think it's actually preferable to have the first batch separated out. It makes it mirror how the last batch is separated out. We could also have an optional arg `force_sequential` (defaulted to `False`) which would skip the `should_run_in_parallel` check in `_submit_batch`.
```python
elif self.batch_idx == 0 or self.batch_idx == len(self.batches) - 1:
    # First and last batch don't run in parallel
    run_in_parallel = False
```
This check could also be skipped if we're instead handling `force_sequential` to determine whether we should even check `should_run_in_parallel` in `_submit_batch`. It'd be nice for this function to be less dependent on "where" it is, and I think this check breaks that.
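A sketch of the position-independent version of the check, under the assumption that the caller handles first/last batches via `force_sequential`. The class and attribute names here are hypothetical stand-ins, not the actual dbt-core runner API:

```python
class MicrobatchModelRunner:
    # Hypothetical simplified runner for illustration.
    def __init__(self, relation_exists, adapter_supports_parallel):
        self.relation_exists = relation_exists
        self.adapter_supports_parallel = adapter_supports_parallel

    def should_run_in_parallel(self) -> bool:
        # Note: no elif on batch_idx == 0 or batch_idx == len(batches) - 1.
        # The caller forces those batches to be serial via force_sequential,
        # so this method no longer needs to know "where" the batch sits.
        if not self.adapter_supports_parallel:
            return False
        if not self.relation_exists:
            return False
        return True
```

The method now answers a purely capability-based question, which keeps it reusable regardless of batch position.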
```python
while batch_idx < len(runner.batches) - 1:
    relation_exists = self._submit_batch(
        node, relation_exists, batches, batch_idx, batch_results, pool
    )
```
Another reason for splitting out the first batch: we don't want to run any of the other batches if the first batch fails. By lumping it into the while loop with all the other batches (except for the last batch), we lose that. We could add that logic to the loop, but I think that would crowd the loop logic, since it will only ever apply to the first batch's result.
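The short-circuit behavior described above can be sketched as follows. This is a hypothetical orchestration outline (the `submit` callable and `run_batches` name are invented for the example), showing the first batch running alone and aborting the rest on failure:

```python
def run_batches(batches, submit):
    # First batch runs serially; its result gates everything else.
    results = [submit(0, parallel=False)]
    if not results[0]:
        # If the first batch fails, skip every remaining batch.
        return results
    # Middle batches are eligible for parallel execution.
    for idx in range(1, len(batches) - 1):
        results.append(submit(idx, parallel=True))
    # Last batch also runs serially.
    if len(batches) > 1:
        results.append(submit(len(batches) - 1, parallel=False))
    return results
```

Keeping the first-batch gate outside the loop means the loop body never needs a special case for batch index 0.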
Resolves #
Problem
Solution
Checklist