Commit

Merge pull request #4075 from Tmonster/fix_typos_in_optimizers_blog_post
fix typos in optimizers blog post
szarnyasg authored Nov 16, 2024
2 parents f474933 + 7c6a3e6 commit a60b3bf
Showing 1 changed file with 3 additions and 3 deletions.
_posts/2024-11-14-optimizers.md: 6 changes (3 additions & 3 deletions)
@@ -254,7 +254,7 @@ Here is a breakdown of running the queries with and without the optimizer as the
| \|orders\| = 10M | 0.240 s | 0.044 s |
| \|orders\| = 100M | 2.266 s | 0.259 s |

-At first the different in execution time is not really noticeable, so no one would think a query rewrite would be the solution. But once enough orders are reached, waiting 2 seconds every time the dashboard loads becomes tedious. If the optimizer is enabled, the query performance improves by a factor of 10×. So if you every think you have identified a scenario where you are smarter than the optimizer, make sure you have also thought about all possible updates to the data and have hand-optimized for those as well.
+At first the difference in execution time is not really noticeable, so no one would think a query rewrite would be the solution. But once enough orders are reached, waiting 2 seconds every time the dashboard loads becomes tedious. If the optimizer is enabled, the query performance improves by a factor of 10×. So if you ever think you have identified a scenario where you are smarter than the optimizer, make sure you have also thought about all possible updates to the data and have hand-optimized for those as well.

## Optimizations That Are Impossible by Hand

@@ -266,7 +266,7 @@ With a small change, we can use the query from above to demonstrate this. Suppos

Imagine trying to express this logic in your favorite data frame API; it would be extremely difficult and error-prone. The library would need to implement this optimization automatically for all hash joins. The Join Filter Pushdown optimization can improve query performance by 10x, so it should be a key factor when deciding what analytical system to use.

-If you use a data frame library like [collapse](https://github.com/SebKrantz/collapse), [pandas](https://github.com/pandas-dev/pandas), [data.table](https://github.com/Rdatatable/data.table), [modin](https://github.com/modin-project/modin), then you are most likely not enjoying the benefits of query optimization techniques. This means your optimizations need to be applied by hand, which is sustainable if your data starts changing. Moreover, you are most likely writing imperatively, using a syntax specific to the dataframe library. This means the scripts responsible for analyzing data are not very portable. SQL, on the other hand, can be much more intuitive to write since it is a declarative language, and can be ported to practically any other database system.
+If you use a data frame library like [collapse](https://github.com/SebKrantz/collapse), [pandas](https://github.com/pandas-dev/pandas), [data.table](https://github.com/Rdatatable/data.table), [modin](https://github.com/modin-project/modin), then you are most likely not enjoying the benefits of query optimization techniques. This means your optimizations need to be applied by hand, which is not sustainable if your data starts changing. Moreover, you are most likely writing imperatively, using a syntax specific to the dataframe library. This means the scripts responsible for analyzing data are not very portable. SQL, on the other hand, can be much more intuitive to write since it is a declarative language, and can be ported to practically any other database system.

## Summary of All Optimizers

@@ -335,4 +335,4 @@ If there are multiple filters on a column, the order in which these filters are

## Conclusion

-A well-written optimizer can provide significant performance improvements when allowed to optimize freely. Not only can the optimizer apply the many optimization rules a human might naturally miss, an optimizer can respond to changes in the data. Some optimizations can result in a performance improvement of 100×, which might be the difference when deciding to use analytical system _A_ vs. analytical system _B_. With DuckDB, all optimization rules are applied automatically to every query, so you can continually enjoy the benefits. Hopefully this blog post has convinced you to consider the optimizer next time you hear about the next database that has everyones ears burning.
+A well-written optimizer can provide significant performance improvements when allowed to optimize freely. Not only can the optimizer apply the many optimization rules a human might naturally miss, an optimizer can respond to changes in the data. Some optimizations can result in a performance improvement of 100×, which might be the difference when deciding to use analytical system _A_ vs. analytical system _B_. With DuckDB, all optimization rules are applied automatically to every query, so you can continually enjoy the benefits. Hopefully this blog post has convinced you to consider the optimizer next time you hear about the next database that has everyone's ears burning.
