support MicrobatchConcurrency capability #1259
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-snowflake contributing guide.
📉 🎩 I've done a bunch of benchmarking around this enablement!
Setup
Results (benchmarks at batch sizes of 1 row, 1k rows, 10k rows, and 50k rows; the 1k-row result showed no significant difference from the 1-row result)
Validation
Conclusions
It's clear that enabling concurrency carries some overhead (individual batches take longer on average), with the benefit of significantly reduced overall runtime. The overhead appears to be non-linear, however, and should shrink relative to overall runtime as the batch being merged grows. I believe we should proceed with enabling concurrency for dbt-snowflake, and document that running models concurrently speeds up backfills overall, with the side effect that individual batches run longer, because the platform incurs some overhead to merge into the main dataset safely and concurrently.
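The trade-off in the conclusions (each batch gets slower, yet the backfill as a whole finishes sooner) can be sketched with a toy cost model. This is a hypothetical illustration, not a model of how dbt or Snowflake actually schedules work: it assumes every batch takes the same time, batches run in waves of `threads`, and concurrency adds a fixed per-batch overhead (a simplification of the non-linear overhead the benchmarks suggest). All function names and input numbers are invented for illustration and are not measurements.

```python
import math


def sequential_runtime(n_batches: int, batch_seconds: float) -> float:
    """Wall-clock time when batches run one after another (no concurrency)."""
    return n_batches * batch_seconds


def concurrent_runtime(n_batches: int, batch_seconds: float,
                       overhead_seconds: float, threads: int) -> float:
    """Wall-clock time when batches run in waves of `threads`.

    Hypothetical model: every batch pays the same fixed concurrency
    overhead, and each wave takes as long as one (slowed-down) batch.
    """
    waves = math.ceil(n_batches / threads)
    return waves * (batch_seconds + overhead_seconds)


def relative_overhead(batch_seconds: float, overhead_seconds: float) -> float:
    """Fraction by which a single batch gets slower under concurrency."""
    return overhead_seconds / batch_seconds


if __name__ == "__main__":
    # Hypothetical inputs: 30 batches, 4 threads, 2 s of overhead per batch.
    n, k, o = 30, 4, 2.0
    for t in (1.0, 10.0, 60.0):
        print(f"batch={t:>5}s  sequential={sequential_runtime(n, t):7.1f}s  "
              f"concurrent={concurrent_runtime(n, t, o, k):7.1f}s  "
              f"per-batch slowdown={relative_overhead(t, o):.0%}")
```

Under these assumptions, total runtime drops whenever there are enough batches to fill several threads, while the per-batch slowdown (`overhead / batch time`) falls as batches get larger, which matches the direction of the benchmark conclusions. A single batch, by contrast, only pays the overhead.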
(cherry picked from commit 86cf6e6)
resolves #1260
docs dbt-labs/docs.getdbt.com/#
Problem
dbt-snowflake does not yet support running the microbatch incremental strategy in concurrent threads.
Solution
batch.id is available (in microbatch model context)
Checklist