Releases: bodo-ai/Bodo
2025.12.1
What's Changed
- Handle tuple column selection in df.loc by @ehsantn in #968
- Fix Conda Build Pipeline by @scott-routledge2 in #966
- Skip tests to unblock PR CI by @scott-routledge2 in #967
- Avoid double cast for integer division by @ehsantn in #970
- Support negative n in df.head() by @ehsantn in #972
- 2025.12 Release Notes by @scott-routledge2 in #971
- Fix slicing with negative step by @ehsantn in #973
- Keep input timestamp unit in Series.dt.tz_localize by @ehsantn in #974
- Fix testing issues on Nightly by @scott-routledge2 in #969
- Cast binary op result to output type by @ehsantn in #976
- Fix output Series.name for pd.to_datetime() by @ehsantn in #977
- Support concurrent async messages to the same rank in shuffle by @scott-routledge2 in #975
- Skip Iceberg DDL tests on PR CI by @scott-routledge2 in #980
- TPCH benchmarking improvements by @scott-routledge2 in #964
- Avoid using arrow types for to_datetime by @scott-routledge2 in #983
- Upgrade to Numba 0.63.1 by @ehsantn in #978
- AWS titan-text eol by @IsaacWarren in #984
- Fixes for TPCH scripts by @scott-routledge2 in #986
- Fix overflow in parquet read cardinality est by @scott-routledge2 in #988
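
The `df.head()` and negative-step slicing entries above refer to standard Pandas semantics. A minimal sketch using plain pandas shows the expected behavior; it assumes that swapping the import for `bodo.pandas` yields the same results, which the notes above do not state explicitly.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

# Negative n: return every row except the last |n| rows.
all_but_last_two = df.head(-2)

# Negative step: walk the rows in reverse order.
reversed_rows = df.iloc[::-1]

# Negative step with bounds: start at position 4, stop before position 0.
every_other_reversed = df.iloc[4:0:-2]

print(all_but_last_two["A"].tolist())
print(reversed_rows["A"].tolist())
print(every_other_reversed["A"].tolist())
```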
Full Changelog: 2025.12...2025.12.1
2025.12
Bodo 2025.12 Release (Date: 12/11/2025)
🎉 Highlights
This release, we are excited to add Join Filters in the plan optimizer, significantly improving performance on real workloads. We also improve Bodo's timezone support and fix several minor bugs.
✨ New Features
- Support datetime.datetime in query plans.
- Validate the repl argument of Series.str.replace the same way Pandas does.
- Improve concat output order.
- Support Series.take().
- Support timezones in to_datetime.
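
As a rough illustration of the Pandas behavior these features mirror, the sketch below uses plain pandas. It is only a reference for the semantics, under the assumption that Bodo DataFrames match them; it does not show Bodo internals.

```python
import pandas as pd

# Series.take selects by position and keeps the original index labels.
s = pd.Series([10, 20, 30, 40])
picked = s.take([3, 0])

# Strings with a UTC offset parsed into a timezone-aware (UTC) Series.
ts = pd.to_datetime(pd.Series(["2025-12-11 09:00:00+01:00"]), utc=True)

# pandas rejects a repl that is neither a string nor a callable.
try:
    pd.Series(["a-b"]).str.replace("-", 5)
    repl_error = None
except TypeError as exc:
    repl_error = str(exc)

print(picked.tolist(), picked.index.tolist())
print(ts.iloc[0])
print(repl_error)
```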
🏎️ Performance Improvements
- Add Join Filters to remove rows with keys that won’t match a join key as early as possible.
- Box/unbox date arrays using Arrow.
- Box/unbox time arrays using Arrow.
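
The idea behind a join filter can be sketched in a few lines of plain Python. This is a conceptual illustration only: the function name `join_with_filter` and the row-dictionary layout are hypothetical, and Bodo's optimizer applies the filter inside its query plan rather than on Python lists.

```python
def join_with_filter(build_rows, probe_rows, key):
    """Inner join that discards non-matching probe rows before joining."""
    # Collect the join keys that exist on the build side.
    build_keys = {row[key] for row in build_rows}

    # Join filter: drop probe rows whose key cannot match, as early as possible.
    filtered_probe = [row for row in probe_rows if row[key] in build_keys]

    # Ordinary hash join over the (now smaller) probe side.
    index = {}
    for row in build_rows:
        index.setdefault(row[key], []).append(row)
    joined = [{**b, **p} for p in filtered_probe for b in index[p[key]]]
    return joined, len(filtered_probe)


customers = [{"cid": 1, "name": "a"}, {"cid": 2, "name": "b"}]
orders = [
    {"cid": 1, "amt": 10},
    {"cid": 3, "amt": 5},
    {"cid": 1, "amt": 7},
    {"cid": 4, "amt": 1},
]
joined, surviving_probe_rows = join_with_filter(customers, orders, "cid")
print(surviving_probe_rows, joined)
```

Here only two of the four order rows survive the filter, so any work between the scan and the join touches half as many rows.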
🐛 Bug Fixes
- Avoid hang in scattering BodoSeries.
- Fix explode bug for Arrow large list type.
- Allow int64/uint64 mismatch in some internal data structures.
- Fix for ArrowExtensionArray iloc indexing.
- Fix timezone in convert_dtypes.
- Fix for nested join.
- Fix the remove-unused-columns pass so that column references stay alive, in conjunction with the DuckDB version upgrade.
- Better support for operations that result in empty dataframes.
Full Changelog: 2025.11.2...2025.12
2025.11.2
What's Changed
- Pipeline reordering. by @DrTodd13 in #933
- NaN/NA handling. by @DrTodd13 in #926
- 2025.11 Release notes by @IsaacWarren in #936
- Numba 0.62.1 upgrade by @scott-routledge2 in #934
- Add metrics for Iceberg Read/Parquet Read by @scott-routledge2 in #935
- More fixes for narwhals tests. by @DrTodd13 in #937
- Filter manifest files based on partition summaries by @scott-routledge2 in #938
- Pin Numba to 0.62 by @scott-routledge2 in #940
- Increase timeout for BodoSQL smoke test by @IsaacWarren in #939
- Support np.ufunc calls on BodoSeries by @ehsantn in #941
- Todd/prune fix by @DrTodd13 in #942
Full Changelog: 2025.11.1...2025.11.2
2025.11.1
What's Changed
- Support groupby.size() (with no value cols) by @scott-routledge2 in #904
- Don't redistribute data twice in example by @IsaacWarren in #912
- Fix test_basic_iceberg_read path issue on nightly by @ehsantn in #916
- CTE column pruning by @DrTodd13 in #910
- First attempt at adding Narwhals to test suite. by @DrTodd13 in #918
- Avoid JIT imports in BodoSQL C++ backend by @ehsantn in #907
- Support passing BodoDataFrames to BodoSQL C++ backend without extra plan executions by @ehsantn in #919
- Timezone Support by @scott-routledge2 in #915
- Support creating empty DataFrames by @ehsantn in #921
- Fixes for test_unique narwhals test by @scott-routledge2 in #920
- Change param name to not conflict on windows. by @DrTodd13 in #924
- Expose pandas.Timestamp in bodo.pandas by @scott-routledge2 in #922
- Fix BodoSQLContext use inside JIT by @ehsantn in #928
- Fix drop_duplicates() for non-trivial Indexes by @ehsantn in #925
- Run Narwhals tests on a single worker and fix issues by @scott-routledge2 in #923
- BSE-5174: Duckdb Planner Upgrade by @IsaacWarren in #917
- Support specifying Glue Catalog in pd.read_sql_table by @scott-routledge2 in #929
- BSE-5206: Fix windows duckdb by @IsaacWarren in #930
- Support passing BodoSQL as an arg to JIT by @scott-routledge2 in #932
Full Changelog: 2025.11.0...2025.11.1
2025.11.0
What's Changed
- Add release notes for 2025.10.1 by @scott-routledge2 in #890
- Fix docker release files symlink target by @IsaacWarren in #894
- TPCH improvements. by @DrTodd13 in #893
- Support TPC-H Q5 in BodoSQL C++ backend by @ehsantn in #886
- Support filesystem Iceberg catalog in BodoSQL C++ backend by @ehsantn in #895
- Support Iceberg filter/project/limit in BodoSQL C++ backend by @ehsantn in #898
- Skip pandas ddp example by @IsaacWarren in #892
- Improvements based on Narwhals tests. by @DrTodd13 in #897
- Capture API usage. by @DrTodd13 in #896
- Add all duckdb timestamp types by @IsaacWarren in #900
- Call tokenize with just a tokenizer by @IsaacWarren in #901
- Support join filters in BodoSQL C++ backend by @ehsantn in #902
- Finetune on Iceberg Data by @IsaacWarren in #903
- Update demo notebook by @scott-routledge2 in #908
- Fix two narwhals issues. by @DrTodd13 in #906
- Upgrade to arrow 22 by @scott-routledge2 in #905
- Remove Python 3.9 in a few places by @ehsantn in #911
- Add iceberg marker to test by @ehsantn in #913
Full Changelog: 2025.10.2...2025.11.0
2025.10.2
What's Changed
- Remove reindex check in publish_binary scripts by @scott-routledge2 in #889
- Refactor Guides Docs by @scott-routledge2 in #882
- BSE-5132: prepare_dataset by @IsaacWarren in #873
Full Changelog: 2025.10.1...2025.10.2
2025.10.1
What's Changed
- Speed up TPCH. by @DrTodd13 in #856
- Convert to BodoDataFrame/BodoSeries on fallback by @scott-routledge2 in #855
- Add support for running str.match in arrow compute. by @DrTodd13 in #865
- Add C++ backend for BodoSQL by @ehsantn in #861
- Support Parquet read in BodoSQL C++ backend by @ehsantn in #866
- Avoid converting output to DF lib to fix dev docs test by @ehsantn in #867
- BSE-5119: torch trainer by @IsaacWarren in #846
- Overload array dunder method to convert BodoDataFrames of floats/ints to ndarrays with the correct dtype by @scott-routledge2 in #868
- Support join in BodoSQL C++ backend by @ehsantn in #869
- Initial filter support in BodoSQL C++ backend by @ehsantn in #871
- Skip slice test by @IsaacWarren in #872
- Initial groupby support for BodoSQL C++ backend by @ehsantn in #874
- Combine chunks before passing table to arrow_table_to_bodo by @scott-routledge2 in #877
- Support large string types in AI functions by @scott-routledge2 in #878
- First batch of narwhals support. by @DrTodd13 in #875
- Fix copy elision compile error on Mac by @ehsantn in #883
- Initial sort support for BodoSQL C++ backend by @ehsantn in #884
- Todd/rmod fix by @DrTodd13 in #876
- Distributed training example by @IsaacWarren in #879
- Skip artifactory upload except for platform package by @scott-routledge2 in #885
Full Changelog: 2025.10...2025.10.1
2025.10
Bodo 2025.10 Release (Date: 10/03/2025)
🎉 Highlights
This release, we are excited to significantly improve the responsiveness of Bodo DataFrames with lazy JIT imports, optimize performance with Common Table Expressions (CTEs), as well as upgrade to Arrow 21.
✨ New Features
- Getting the length of a BodoDataFrame or BodoSeries now returns a lazily evaluated BodoScalar.
- Add support for subset argument to drop_duplicates.
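
To make "lazily evaluated" concrete, here is a minimal sketch of a deferred scalar in plain Python. The `LazyScalar` class is hypothetical and is not Bodo's `BodoScalar` implementation; it only shows how a length or reduction can be combined with other values before anything is computed.

```python
import operator


class LazyScalar:
    """Hypothetical deferred scalar: computes its value only on demand."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._done = False

    def _binary(self, op, other):
        def run():
            rhs = other.value() if isinstance(other, LazyScalar) else other
            return op(self.value(), rhs)

        return LazyScalar(run)

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __mul__(self, other):
        return self._binary(operator.mul, other)

    def value(self):
        # Run the deferred computation once and remember the result.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


calls = []


def count_rows():
    calls.append("count")  # stands in for an expensive distributed row count
    return 5


n = LazyScalar(count_rows)
expr = n * 2 + 1                # builds a deferred expression; nothing runs yet
calls_before = list(calls)
result = expr.value()           # triggers the row count exactly once
print(calls_before, result, calls)
```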
🏎️ Performance Improvements
- Support lazy BodoScalar binary operations for better optimizations.
- Recognize duplicate computations in execution trees and execute them only once using Common Table Expressions (CTEs).
- Support internal gather/scatter calls without JIT for faster response times.
- Support Iceberg read/write without JIT import for faster response times.
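
The duplicate-computation item can be illustrated with a small expression-tree evaluator that caches identical subtrees. This is a sketch of the general technique under simplifying assumptions (tuples as plan nodes, a hypothetical `evaluate` helper), not a description of how Bodo represents plans or CTEs.

```python
def evaluate(node, cache, counter):
    """Evaluate a nested-tuple expression, computing identical subtrees once."""
    if not isinstance(node, tuple):
        return node  # leaf value
    if node in cache:
        return cache[node]  # shared subtree already computed
    op, *args = node
    values = [evaluate(arg, cache, counter) for arg in args]
    counter[op] = counter.get(op, 0) + 1  # count actual executions per operator
    if op == "add":
        result = values[0] + values[1]
    elif op == "mul":
        result = values[0] * values[1]
    else:
        raise ValueError(f"unknown operator: {op}")
    cache[node] = result
    return result


shared = ("mul", 3, 4)            # appears twice in the plan below
plan = ("add", shared, shared)
counter = {}
total = evaluate(plan, {}, counter)
print(total, counter)
```

Although the `mul` subtree appears twice, it executes once; the second occurrence is served from the cache.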
⚙️ Dependency Changes
- Upgraded Arrow dependency to 21.0.
2025.9
Bodo 2025.9 Release (Date: 09/18/2025)
🎉 Highlights
This release, we are excited to significantly improve the import time of the Bodo package, as well as introduce new features like Series.where support and lazy BodoScalars.
✨ New Features
- Bodo DataFrames now imports the JIT compiler lazily only when necessary, which reduces import time substantially.
- Support for Series.where().
- Series reductions such as “sum” or “max” now produce a BodoScalar that is evaluated lazily and can be used in some operations such as Series.where() and filter expressions without execution.
- Optimized support for “not in series” cases like df[~df.A.isin(df.B)] using anti-join.
- Support for bodo.pandas uses inside JIT functions.
- Anthropic models used through AWS Bedrock now use Anthropic’s messages API to support newer versions of Claude.
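
The "not in series" pattern and an anti-join select the same rows. The sketch below checks that with plain pandas on a small frame without missing values; how Bodo plans the anti-join internally is not shown here, and the helper column handling is only one possible way to write it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 4, 6, 8]})

# "Not in series" filter, as written in the release note.
mask_result = df[~df.A.isin(df.B)]

# The same rows expressed as an anti-join: keep left rows with no match on the right.
right = df[["B"]].drop_duplicates().rename(columns={"B": "A"})
merged = df.merge(right, on="A", how="left", indicator=True)
anti_join_result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(mask_result["A"].tolist())
print(anti_join_result["A"].tolist())
```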
🐛 Bug Fixes
- Fix for join non-equi condition keys that are not part of the output.
- Fix for Series expression with non-range Indexes.
🏎️ Performance Improvements
- Improved the initialization time for cfuncs used in the acceleration of user defined functions in Series.map and DataFrame.apply calls.
⚙️ Dependency Changes
- Added upper bound to Numba dependency to avoid issues with version 0.62.
2025.8.2
New Features
- Support for AWS Bedrock backend for llm_generate and embed.
- Support passing user defined functions that return scalars to groupby.agg and groupby.apply.
- Support renaming DataFrame column using df.columns = [...] syntax.
- Add API map_partition_with_state to DataFrame that allows you to do a one-time initialization of state on each worker which can then be used to map batches of rows from a DataFrame to produce a new DataFrame.
- Added JIT fallback to Bodo DataFrames such that operations not supported natively in DataFrames can use the equivalent operation from Bodo engine.
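
The one-time state initialization pattern behind map_partition_with_state can be sketched in plain Python. The helper below is hypothetical: its name and signature are illustrative assumptions, not Bodo's actual API, and it models a single worker processing a list of batches.

```python
def map_batches_with_state(init_state, func, batches):
    """Hypothetical single-worker model: set up state once, reuse it per batch."""
    state = init_state()  # expensive one-time setup (e.g. loading a model)
    return [func(state, batch) for batch in batches]


init_calls = []


def load_scaler():
    init_calls.append("init")  # track how often the setup runs
    return {"factor": 10}


def scale_batch(state, batch):
    return [value * state["factor"] for value in batch]


batches = [[1, 2], [3], [4, 5, 6]]
outputs = map_batches_with_state(load_scaler, scale_batch, batches)
print(init_calls, outputs)
```

The setup runs once even though three batches are processed, which is the point of keeping state on the worker instead of rebuilding it in every mapped call.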
Performance Improvements
- Improve Series.quantile/describe performance.
- Improve the performance of fetching row counts for Parquet datasets.
- Improve package import time and worker spinup time substantially.
Bug Fixes
- Fix a crash with llm_generate and embed in Jupyter Notebooks/when an asyncio executor is already running.
- Fix OpenAI environment variables not being sent to workers.
- Fix bug in loss computation when fitting LogisticRegression in parallel.
- Fix crash when running map/apply on large numbers of workers.