Releases · pola-rs/polars

16 Sep 08:51

github-actions

rs-0.51.0

400ca33

Rust Polars 0.51.0 Latest

💥 Breaking changes

Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
Allocate only for read items when reading Parquet with predicate (#24401)
Don't aggregate groups for strict cast if original len (#24381)
Allocate only for read items when reading Parquet with predicate (#24324)
Native streaming int_range with len or count (#24280)
Lower arg_unique natively to the streaming engine (#24279)
Move unordering optimization to end (#24286)
Do ordering simplification step after common sub-plan elimination (#24269)
Always simplify order requirements in IR (#24192)
Basic de-duplication of filter expressions (#24220)
Cache the IR in pipe_with_schema (#24213)
Lower arg_where natively to streaming engine (#24088)
Lower Expr.shift to streaming engine (#24106)
Lower order-preserving groupby to streaming engine (#24053)
Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
Lower top-k to streaming engine (#23979)
Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)
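
The two top-k entries above rest on a general observation: a stable sort followed by `head(k)` only ever needs the k smallest rows, so the result can be maintained incrementally as batches arrive. A minimal sketch of that idea in plain Python follows. It is not the Polars implementation; `streaming_top_k`, the `Row` shape, and the batch layout are hypothetical names chosen for illustration.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (sort key, payload); hypothetical row shape


def streaming_top_k(batches: Iterable[List[Row]], k: int) -> List[Row]:
    """Keep only the k smallest rows seen so far, one batch at a time."""
    best: List[Tuple[int, int, str]] = []  # (key, arrival index, payload)
    seen = 0
    for batch in batches:
        # Tag each row with its arrival index so ties keep input order.
        tagged = [(key, seen + i, payload) for i, (key, payload) in enumerate(batch)]
        seen += len(batch)
        # Merge the running result with the new batch and truncate to k,
        # so at most k + len(batch) rows are held at once.
        best = heapq.nsmallest(k, best + tagged)
    return [(key, payload) for key, _, payload in best]


batches = [[(3, "c"), (1, "a")], [(2, "b"), (1, "a2")], [(0, "z")]]
all_rows = [row for batch in batches for row in batch]

# Reference result: stable full sort, then take the first k rows.
expected = sorted(all_rows, key=lambda r: r[0])[:3]
result = streaming_top_k(batches, 3)
```

Both approaches return the same rows in the same order here; the incremental version simply never materializes the fully sorted input.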

✨ Enhancements

Roundtrip BinaryOffset type through Parquet (#24344)
Add opt-in unstable functionality to load interval types as Struct (#24320)
Add user guide section on AWS role assumption (#24421)
Support unique / n_unique / arg_unique for array columns (#24406)
Support S3 virtual-hosted–style URI (#24405)
Remove explicit file create for local async writes (#24358)
Support Partitioning sinks in cloud (#24399)
User-friendly error message on empty path expansion (#24337)
Add Polars security policy (#24314)
Allow pl.Expr.log to take in an expression (#24226)
Implement diff() in streaming engine (#24189)
Enable Expr.diff(n) for negative n (#24200)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
Add a deprecation warning for pl.Series.shift(Null) (#24114)
Improve Debug formatting of DataType (#24056)
Add cum_* as native streaming nodes (#23977)
Add peak_{min,max} support for booleans (#24068)
Add DataFrame.map_columns for eager evaluation (#23821)
Add native streaming for peak_{min,max} (#24039)
IR graph arrows, monospace font, box nodes (#24021)
Add DataTypeExpr.default_value (#23973)
Lower rle to a native streaming engine node (#23929)
Add support for Int128 to pyo3-polars (#23959)
Lower rle_id to a native streaming node (#23894)
Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
Dispatch scan_iceberg to native by default (#23912)
Lower unique_counts and value_counts to streaming engine (#23890)
Implement dt.days_in_month function (#23119)
Fix errors on native scan_iceberg (#23811)
Reinterpret binary data to fixed size numerical array (#22840)
Make rolling_map serializable (#23848)

🐞 Bug fixes

Fix AggState on all_literal in BinaryExpr (#24461)
Replace unsafe with collect (#24494)
Show IR sort options in explain (#24465)
Benchmark CI import (#24463)
Fix schema on ApplyExpr with single row literal in agg context (#24422)
Fix planner schema for dividing pl.Float32 by int (#24432)
Fix panic scanning from AWS legacy global endpoint URL (#24450)
Emit proper tuple for Log in expression nodes (#24426)
Do not propagate struct of nulls with null (#24420)
Be stricter with invalid NDJSON input when ignore_errors=False (#24404)
Implement approx_n_unique for temporal dtypes and Null (#24417)
Correct sink_ipc overload for compression (#24398)
Enable all integer dtypes for by parameter in join_asof (#24384)
Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
Fix incorrect output ordering for row-separable exprs (#24354)
Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
Match output type to engine for Struct arithmetic (#23805)
Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
Incorrect logic in negative streaming slice (#24326)
Do not error on non-list Sequence for columns parameter in read_excel (#23967)
Invalid conversion from non-bit numpy bools (#24312)
Make dt.epoch('s') serializable (#24302)
Make Expr.rechunk serializable (#24303)
Schema mismatch for 'log' operation (#24300)
Incorrect first/last aggregate in streaming engine (#24289)
Fix group offsets in sliced groups (#24274)
Panic in inexact date(time) conversion (#24268)
The index_of feature should not depend on the object feature (#24256)
Keep DSL cache after serialization and deserialization (#24265)
Sanitize and warn about eval usage (#24262)
Unique with keep="none" in new optimization pass (#24261)
Correct size limits for Decimal cast (#24252)
Unordered unions in check order observing pass (#24253)
Fix dtype for slice on Literal in agg context (#24137)
Fix incorrect filter(lit(True)) when scanning hive (#24237)
In-memory group_by on 128-bit integers (#24242)
Fix panic in gather inside groupby with invalid indices (#24182)
Release the GIL in map_groups (#24225)
Remove extra explode in LazyGroupBy.{head,tail} (#24221)
Fix panic in polars cloud CSV scan (#24197)
Fix panic when loading categorical columns from IO plugin (#24205)
Fix engine type for concat_list on AggScalar implode (#24160)
Handle centered weights in rolling_mean when len(values) < window_size (#24158)
Reading is_in predicate for Parquet plain strings (#24184)
Make PyCategories pickleable (#24170)
Remove unused unsound function to_mutable_slice (#24173)
PyO3 extension types giving compat_level errors (#24166)
Allow non-elementwise by in top_k (#24164)
Fix sort_by for group_by_dynamic context (#24152)
Input-independent length aggregations in streaming (#24153)
Release GIL when iterating df in to_arrow (#24151)
Respect non-elementwise join_where conditions (#24135)
Resolve schema mismatch for div on Boolean (#24111)
Keep name when doing empty group-aware aggregation (#24098)
Implode instead of reshape_list (#24078)
Rolling mean with weights incorrect when min_samples < window_size (#23485)
Allow merge_sorted for all types (#24077)
Include datatypes in row_encode expression (#24074)
Include UDF materialized type in serialization (#24073)
Correct .rolling() output type for non-aggregations (#24072)
Correct planner output schema for join_asof (#24071)
Allow %B to work without specifying day (#24009)
Correct output for fold and reduce (#24069)
Expr.meta.output_name for struct fields (#24064)
Ensure upcast operations on pl.Date default to microsecond precision (#23981)
Add peak_{min,max} support for booleans (#24068)
Planner output type for mean with strange input type (#24052)
Remove, deprecate or change eager Exprs to be lazy compatible (#24027)
Scan of multiple sources with null datatype (#24065)
Categorical in nested data in row encoding (#24051)
Missing length update in builder for pl.Array repetition (#24055)
Race condition in global categories init (#24045)
Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
Error when using named functions (#24041)
Don't encode entire CategoricalMapping when going to Arrow (#24036)
Fix cast on arithmetic with lit (#23941)
Incorrect slice-slice pushdown (#24032)
Dedup common cache subplan in IR graph (#24028)
Allow join on Decimal in in-memory engine (#24026)
Fix datatypes for eval.list in aggregation context (#23911)
Allocator capsule fallback panic (#24022)
Accept another zlib "magic header" file signature (#24013)
Fix truediv dtypes so cast in list.eval is not dropped (#23936)
Don't reuse cached return_dtype for expanded map expressions (#24010)
Cache id is not a valid dot node id (#24005)
Align map_elements with and without return_dtype (#24007)
Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
Allow serializing LazyGroupBy.map_groups (#23964)
Correct allocator name in PyCapsule (#23968)
Mismatched types for write function for windows (#23915)
Fix unpivot panic when index= column not found (#23958)
Fix assert_frame_equal with check_dtypes=False for all-null series with different types (#23943)
Return correct python package version (#23951)
Categorical namespace functions fail on Enum columns (#23925)
Properly set sumwise complete on filter for missing columns (#23877)
Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
Group By with filters (#23917)
Fix read_csv ignoring Decimal schema for header-only data (#23886)
Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
Writing List(Array) columns to JSON without panic (#23875)
Fill Iceberg missing fields with partition values if present in metadata (#23900)
Create file for streaming sink even if unspawned (#23672)
Update cloud testing environment (#23908)
Parquet filtering on multiple RGs with literal predicate (#23903)
Incorrect datatype passed to libc::write (#23904)
Properly feature gate TZ_AWARE_RE usage (#23888)
Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
Spawning tokio task outside reactor (#23884)
Correctly raise DuplicateError on asof_join with suffix="" (#23864)
Fix errors on native scan_iceberg (#23811)
Fix index ...

Contributors

mrkn, pka, and 46 other contributors

09 Sep 08:38

github-actions

py-1.33.1

1dc7792

Python Polars 1.33.1

🚀 Performance improvements

Use specialized decoding for all predicates for Parquet dictionary encoding (#24403)
Allocate only for read items when reading Parquet with predicate (#24401)
Don't aggregate groups for strict cast if original len (#24381)
Allocate only for read items when reading Parquet with predicate (#24324)

✨ Enhancements

Support S3 virtual-hosted–style URI (#24405)
Remove explicit file create for local async writes (#24358)
Add PyCapsule __arrow_c_schema__ interface to pl.Schema (#24365)
Support Partitioning sinks in cloud (#24399)
User-friendly error message on empty path expansion (#24337)
Add unstable pre_execution_query parameter to read_database_uri (#23634)
Add Polars security policy (#24314)
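
For readers unfamiliar with the term in the S3 entry above: AWS addresses the same object in two URL layouts. Virtual-hosted-style puts the bucket in the host name, path-style puts it in the first path segment. The sketch below splits both layouts using only the standard library. It says nothing about which exact forms Polars accepts; `split_s3_https_url` is a hypothetical helper, and it ignores edge cases such as custom endpoints.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_https_url(url: str) -> Tuple[str, str]:
    """Return (bucket, key) for a virtual-hosted-style or path-style S3 URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    path = parsed.path.lstrip("/")
    if host.startswith("s3."):
        # Path-style: https://s3.<region>.amazonaws.com/<bucket>/<key>
        bucket, _, key = path.partition("/")
    else:
        # Virtual-hosted-style: https://<bucket>.s3.<region>.amazonaws.com/<key>
        bucket, _, _ = host.partition(".s3.")
        key = path
    return bucket, key


virtual = split_s3_https_url(
    "https://my-bucket.s3.eu-west-1.amazonaws.com/data/file.parquet"
)
path_style = split_s3_https_url(
    "https://s3.eu-west-1.amazonaws.com/my-bucket/data/file.parquet"
)
```

Both calls resolve to the same bucket and key, which is the equivalence a reader has to handle when accepting either layout.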

🐞 Bug fixes

Correct sink_ipc overload for compression (#24398)
Enable all integer dtypes for by parameter in join_asof (#24384)
Fix Group-By + filter aggregation performing subsequent operations on all data instead of only filtered data (#24373)
Wrap deprecated top-level imports in TYPE_CHECKING (#24340)
Fix incorrect output ordering for row-separable exprs (#24354)
Fix Series.__arrow_c_stream__ for Decimal and other logical types (#24120)
Match output type to engine for Struct arithmetic (#23805)
Make mmap use MAP_PRIVATE rather than MAP_SHARED (#24343)
Fix cloud iceberg scan DATASET_PROVIDER_VTABLE error (#24338)
Don't throw away type information for NumPy numeric values when using lit() (#24229)
Incorrect logic in negative streaming slice (#24326)
Ensure read_database_uri with ADBC works as expected with DuckDB URIs (#24097)
Do not error on non-list Sequence for columns parameter in read_excel (#23967)
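
The `MAP_PRIVATE` entry above concerns the two POSIX mapping modes. With a private mapping, writes through the mapping are copy-on-write and never reach the underlying file; with a shared mapping they do. The sketch below shows the private behavior using Python's standard `mmap` module, whose `ACCESS_COPY` mode corresponds to `MAP_PRIVATE` on POSIX systems. It illustrates the general mechanism only and is not taken from the Polars code.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"polars")
    path = tmp.name

try:
    with open(path, "r+b") as fh:
        # ACCESS_COPY requests a private copy-on-write mapping.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_COPY) as view:
            view[0:1] = b"P"  # modifies the private copy only
            in_memory = bytes(view[:])
    with open(path, "rb") as fh:
        on_disk = fh.read()  # the file itself is unchanged
finally:
    os.remove(path)
```

After this runs, `in_memory` holds the modified bytes while `on_disk` still holds the original file contents.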

📖 Documentation

Document newly added is_pure parameter for register_io_source (#24311)
Create a module docstring for the public polars module (#24332)
Update to Polars Cloud user guide (#24187)
Update distributed page (#24323)
Add a note and example about exporting unformatted Excel sheet data (#24145)
Add detail about server-side cursor behaviour for SQLAlchemy in the "iter_batches" parameter of read_database (#24094)
Add Polars security policy (#24314)

🛠️ Other improvements

Bump c-api (#24412)
Add a regression test for #7631 (#24363)
Update cloud test InteractiveQuery to DirectQuery (#24287)
Mark some tests as slow (#24327)
Mark more tests as ready for cloud (#24315)
Add hint to update PYPOLARS_VERSION on version assert test (#24313)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @VictorAtIfInsurance, @alexander-beedie, @coastalwhite, @dsprenkels, @itamarst, @kdn36, @kuril, @mcrumiller, @nameexhaustion, @nesb1, @orlp, @r-brink and @ritchie46

Contributors

orlp, dsprenkels, and 12 other contributors

01 Sep 16:33

github-actions

py-1.33.0

665722a

Python Polars 1.33.0

💥 Breaking changes

Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

Native streaming int_range with len or count (#24280)
Lower arg_unique natively to the streaming engine (#24279)
Move unordering optimization to end (#24286)
Do ordering simplification step after common sub-plan elimination (#24269)
Always simplify order requirements in IR (#24192)
Basic de-duplication of filter expressions (#24220)
Cache the IR in pipe_with_schema (#24213)
Lower arg_where natively to streaming engine (#24088)
Lower Expr.shift to streaming engine (#24106)
Lower order-preserving groupby to streaming engine (#24053)

✨ Enhancements

Add CSE for custom io sources using pointer for hashing (#24297)
Allow pl.Expr.log to take in an expression (#24226)
Add caching to user credential providers (#23789)
Expose mkdir parameter on write_parquet (#24239)
Implement diff() in streaming engine (#24189)
Enable Expr.diff(n) for negative n (#24200)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
Drop PyArrow requirement for write_database with the ADBC engine (#24136)
Add a deprecation warning for pl.Series.shift(Null) (#24114)
Improve Debug formatting of DataType (#24056)
Add LazyFrame.pipe_with_schema (#24075)
Catch additional temporal attributes in BytecodeParser function analysis (#24076)
Add cum_* as native streaming nodes (#23977)
Add peak_{min,max} support for booleans (#24068)
Add DataFrame.map_columns for eager evaluation (#23821)
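
To make the `Expr.diff(n)` entry above concrete, here is a plain-Python model of a difference against a shifted copy, including negative `n`. It assumes that `diff(n)` means "each value minus the value `n` slots earlier" and that a negative `n` therefore looks ahead; that reading is an assumption, not something the release notes spell out. `shift` and `diff` below are hypothetical helpers, not Polars calls.

```python
from typing import List, Optional

Values = List[Optional[int]]


def shift(values: Values, n: int) -> Values:
    """Shift values by n slots, filling vacated slots with None."""
    size = len(values)
    if n == 0:
        return list(values)
    if abs(n) >= size:
        return [None] * size
    if n > 0:
        return [None] * n + values[: size - n]
    return values[-n:] + [None] * (-n)


def diff(values: Values, n: int = 1) -> Values:
    """Element-wise values - shift(values, n); None wherever either side is missing."""
    shifted = shift(values, n)
    return [
        None if a is None or b is None else a - b
        for a, b in zip(values, shifted)
    ]


data: Values = [20, 10, 30, 25]
backward = diff(data, 1)   # compares each value with the previous one
forward = diff(data, -1)   # compares each value with the next one
```

Under this model the missing values move from the start of the result (positive `n`) to the end (negative `n`).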

🐞 Bug fixes

Invalid conversion from non-bit numpy bools (#24312)
Make dt.epoch('s') serializable (#24302)
Make Expr.rechunk serializable (#24303)
Schema mismatch for 'log' operation (#24300)
Incorrect first/last aggregate in streaming engine (#24289)
Fix group offsets in sliced groups (#24274)
Panic in inexact date(time) conversion (#24268)
Keep DSL cache after serialization and deserialization (#24265)
Sanitize and warn about eval usage (#24262)
Correct incorrect default in from_pandas overload for include_index (#24258)
Unique with keep="none" in new optimization pass (#24261)
Correct size limits for Decimal cast (#24252)
Unordered unions in check order observing pass (#24253)
Fix dtype for slice on Literal in agg context (#24137)
Fix incorrect filter(lit(True)) when scanning hive (#24237)
In-memory group_by on 128-bit integers (#24242)
Fix panic in gather inside groupby with invalid indices (#24182)
Release the GIL in map_groups (#24225)
Remove extra explode in LazyGroupBy.{head,tail} (#24221)
Fix panic in polars cloud CSV scan (#24197)
Fix panic when loading categorical columns from IO plugin (#24205)
Fix credential provider not auto-initializing on partition sinks (#24188)
Fix engine type for concat_list on AggScalar implode (#24160)
Handle centered weights in rolling_mean when len(values) < window_size (#24158)
Reading is_in predicate for Parquet plain strings (#24184)
Support native DuckDB connection in read_database (#24177)
Make PyCategories pickleable (#24170)
Remove unused unsound function to_mutable_slice (#24173)
PyO3 extension types giving compat_level errors (#24166)
Allow non-elementwise by in top_k (#24164)
Fix sort_by for group_by_dynamic context (#24152)
Input-independent length aggregations in streaming (#24153)
Release GIL when iterating df in to_arrow (#24151)
Respect non-elementwise join_where conditions (#24135)
Fix mismatched pytest test collection error (#24133)
Resolve schema mismatch for div on Boolean (#24111)
Fix from_repr parsing of negative durations (#24115)
Make group_by/partition_by iterator keys tuple[Any, ...] to enable tuple-unpacking (#24113)
Keep name when doing empty group-aware aggregation (#24098)
Implode instead of reshape_list (#24078)
Rolling mean with weights incorrect when min_samples < window_size (#23485)
Allow merge_sorted for all types (#24077)
Include datatypes in row_encode expression (#24074)
Include UDF materialized type in serialization (#24073)
Correct .rolling() output type for non-aggregations (#24072)
Correct planner output schema for join_asof (#24071)
Correct output for fold and reduce (#24069)
Expr.meta.output_name for struct fields (#24064)
Ensure upcast operations on pl.Date default to microsecond precision (#23981)
Add peak_{min,max} support for booleans (#24068)
Planner output type for mean with strange input type (#24052)
Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

📖 Documentation

Fix a few typos (#24305)
Add missing reference to LazyFrame.pipe_with_schema() on the website (#24285)
Automatically register doctest.ELLIPSIS so we don't have to add the inline directive each time (#24146)
Update categorical comparison documentation in user guide (#24249)
Add missing references for Series.rolling_*_by methods (#24254)
Fix formatting of Series.value_counts examples (#24245)
Add hint to use DataFrame/Series constructors in from_arrow docstring (#22942)
Update GPU un/supported features (#24195)
Add DataFrame.map_columns to API (#24128)
Update multiple pages in the Polars Cloud user guide (#23661)
Fix str.find_many() docstring example (#24092)

📦 Build system

Re-enable macos-x86-64 (#24266)
Drop binary support for macos_x86-64 (#24257)

🛠️ Other improvements

Remove PDS-H code (#24301)
Get ready for even more cloud tests (#24292)
Add tests for slices with caches (#24288)
Readd ordering tests (#24284)
Fix Makefile venv path (#24251)
Remove unnecessary parentheses (#24244)
Make non-nested shift{,_and_fill} ops generic (#24224)
Remove unused Wrap (#24214)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Automatically label a few more types of PR (#24147)
Update toolchain (#24156)
Add order_sensitive property for AExpr (#24116)
Mark more tests as not possible on cloud (#24103)
Turn AggExpr::Count from tuple to struct (#24096)
Mark tests that may fail in cloud (#24067)
Extend read database tests to capture more ADBC functionality (#24002)
Make CI perf failures more lenient (#24066)
Fix hive partition string encoding in CI by upgrading deltalake (#24018)
Make tests with sinks run on cloud again (#24048)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @MarcoGorelli, @NeejWeej, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @eitsupi, @etiennebacher, @gab23r, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-

Contributors

orlp, dsprenkels, and 24 other contributors

28 Aug 20:03

github-actions

py-1.33.0-beta.1

cbf621a

Python Polars 1.33.0-beta.1 Pre-release

💥 Breaking changes

Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

🚀 Performance improvements

Always simplify order requirements in IR (#24192)
Basic de-duplication of filter expressions (#24220)
Cache the IR in pipe_with_schema (#24213)
Lower arg_where natively to streaming engine (#24088)
Lower Expr.shift to streaming engine (#24106)
Lower order-preserving groupby to streaming engine (#24053)

✨ Enhancements

Allow pl.Expr.log to take in an expression (#24226)
Add caching to user credential providers (#23789)
Expose mkdir parameter on write_parquet (#24239)
Implement diff() in streaming engine (#24189)
Enable Expr.diff(n) for negative n (#24200)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Log pyarrow predicate conversion result in sensitive verbose logs (#24186)
Drop PyArrow requirement for write_database with the ADBC engine (#24136)
Add a deprecation warning for pl.Series.shift(Null) (#24114)
Improve Debug formatting of DataType (#24056)
Add LazyFrame.pipe_with_schema (#24075)
Catch additional temporal attributes in BytecodeParser function analysis (#24076)
Add cum_* as native streaming nodes (#23977)
Add peak_{min,max} support for booleans (#24068)
Add DataFrame.map_columns for eager evaluation (#23821)

🐞 Bug fixes

Correct size limits for Decimal cast (#24252)
Unordered unions in check order observing pass (#24253)
Fix dtype for slice on Literal in agg context (#24137)
Fix incorrect filter(lit(True)) when scanning hive (#24237)
In-memory group_by on 128-bit integers (#24242)
Fix panic in gather inside groupby with invalid indices (#24182)
Release the GIL in map_groups (#24225)
Remove extra explode in LazyGroupBy.{head,tail} (#24221)
Fix panic in polars cloud CSV scan (#24197)
Fix panic when loading categorical columns from IO plugin (#24205)
Fix credential provider not auto-initializing on partition sinks (#24188)
Fix engine type for concat_list on AggScalar implode (#24160)
Handle centered weights in rolling_mean when len(values) < window_size (#24158)
Reading is_in predicate for Parquet plain strings (#24184)
Support native DuckDB connection in read_database (#24177)
Make PyCategories pickleable (#24170)
Remove unused unsound function to_mutable_slice (#24173)
PyO3 extension types giving compat_level errors (#24166)
Allow non-elementwise by in top_k (#24164)
Fix sort_by for group_by_dynamic context (#24152)
Input-independent length aggregations in streaming (#24153)
Release GIL when iterating df in to_arrow (#24151)
Respect non-elementwise join_where conditions (#24135)
Fix mismatched pytest test collection error (#24133)
Resolve schema mismatch for div on Boolean (#24111)
Fix from_repr parsing of negative durations (#24115)
Make group_by/partition_by iterator keys tuple[Any, ...] to enable tuple-unpacking (#24113)
Keep name when doing empty group-aware aggregation (#24098)
Implode instead of reshape_list (#24078)
Rolling mean with weights incorrect when min_samples < window_size (#23485)
Allow merge_sorted for all types (#24077)
Include datatypes in row_encode expression (#24074)
Include UDF materialized type in serialization (#24073)
Correct .rolling() output type for non-aggregations (#24072)
Correct planner output schema for join_asof (#24071)
Correct output for fold and reduce (#24069)
Expr.meta.output_name for struct fields (#24064)
Ensure upcast operations on pl.Date default to microsecond precision (#23981)
Add peak_{min,max} support for booleans (#24068)
Planner output type for mean with strange input type (#24052)
Remove, deprecate or change eager Exprs to be lazy compatible (#24027)

📖 Documentation

Fix formatting of Series.value_counts examples (#24245)
Add hint to use DataFrame/Series constructors in from_arrow docstring (#22942)
Update GPU un/supported features (#24195)
Add DataFrame.map_columns to API (#24128)
Update multiple pages in the Polars Cloud user guide (#23661)
Fix str.find_many() docstring example (#24092)

📦 Build system

Drop binary support for macos_x86-64 (#24257)

🛠️ Other improvements

Remove unnecessary parentheses (#24244)
Make non-nested shift{,_and_fill} ops generic (#24224)
Remove unused Wrap (#24214)
Allow upcasting null-typed columns to nested column types in scans (#24185)
Automatically label a few more types of PR (#24147)
Update toolchain (#24156)
Add order_sensitive property for AExpr (#24116)
Mark more tests as not possible on cloud (#24103)
Turn AggExpr::Count from tuple to struct (#24096)
Mark tests that may fail in cloud (#24067)
Extend read database tests to capture more ADBC functionality (#24002)
Make CI perf failures more lenient (#24066)
Fix hive partition string encoding in CI by upgrading deltalake (#24018)
Make tests with sinks run on cloud again (#24048)

Thank you to all our contributors for making this release possible!
@Kevin-Patyk, @agossard, @alexander-beedie, @aparna2198, @borchero, @coastalwhite, @deanm0000, @dsprenkels, @henryharbeck, @jjurm, @kdn36, @math-hiyoko, @mcrumiller, @mroeschke, @nameexhaustion, @orlp, @r-brink, @ritchie46, @stijnherfst, @vdrn and @wence-

Contributors

orlp, dsprenkels, and 19 other contributors

14 Aug 17:28

github-actions

py-1.32.3

2468e6f

Python Polars 1.32.3

🚀 Performance improvements

Lower .sort(maintain_order=True).head() to streaming top_k (#24014)
Lower top-k to streaming engine (#23979)
Allow order pass through Filters and relax to row-separable instead of elementwise (#23969)

✨ Enhancements

Add native streaming for peak_{min,max} (#24039)
IR graph arrows, monospace font, box nodes (#24021)
Add DataTypeExpr.default_value (#23973)
Lower rle to a native streaming engine node (#23929)
Add support for Int128 to pyo3-polars (#23959)

🐞 Bug fixes

Scan of multiple sources with null datatype (#24065)
Categorical in nested data in row encoding (#24051)
Missing length update in builder for pl.Array repetition (#24055)
Race condition in global categories init (#24045)
Revert "fix: Don't encode entire CategoricalMapping when going to Arrow (#24036)" (#24044)
Error when using named functions (#24041)
Don't encode entire CategoricalMapping when going to Arrow (#24036)
Fix cast on arithmetic with lit (#23941)
Incorrect slice-slice pushdown (#24032)
Dedup common cache subplan in IR graph (#24028)
Allow join on Decimal in in-memory engine (#24026)
Fix datatypes for eval.list in aggregation context (#23911)
Allocator capsule fallback panic (#24022)
Accept another zlib "magic header" file signature (#24013)
Fix truediv dtypes so cast in list.eval is not dropped (#23936)
Don't reuse cached return_dtype for expanded map expressions (#24010)
Cache id is not a valid dot node id (#24005)
Align map_elements with and without return_dtype (#24007)
Fix column dtype lifetime for csv_write segfault on Categorical (#23986)
Allow serializing LazyGroupBy.map_groups (#23964)
Correct allocator name in PyCapsule (#23968)
Mismatched types for write function for windows (#23915)
Fix unpivot panic when index= column not found (#23958)

📖 Documentation

Fix a typo in "lazy/execution" user-guide page (#23983)

🛠️ Other improvements

Update pyo3-polars versions (#24031)
Remove insert_error_function (#24023)
Remove cache hits, clean up in-mem prefill (#24019)
Use .venv instead of venv in pyo3-polars examples (#24024)
Fix test failing mypy (#24017)
Remove outdated comment (#23998)
Add a _plr.pyi to remove mypy issues (#23970)
Don't define CountStar as dyn OptimizationRule (#23976)
Rename atol and rtol to abs_tol and rel_tol (#23961)
Introduce Row{Encode,Decode} as FunctionExpr (#23933)
Dispatch through pl.map_batches and AnonymousColumnsUdf (#23867)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @borchero, @cmdlineluser, @coastalwhite, @iishutov, @jarondl, @kdn36, @orlp, @rawhuul, @ritchie46 and @stijnherfst

Contributors

orlp, jarondl, and 10 other contributors

07 Aug 10:51

github-actions

py-1.32.2

34595af

Python Polars 1.32.2

🐞 Bug fixes

Return correct python package version (#23951)

📖 Documentation

Add arr.len() on the website (#23944)

Thank you to all our contributors for making this release possible!
@coastalwhite and @etiennebacher

Contributors

coastalwhite and etiennebacher

06 Aug 16:50

github-actions

py-1.32.1

e5e3450

Python Polars 1.32.1

🚀 Performance improvements

Optimise BytecodeParser usage from warn_on_inefficient_map (#23809)
Lower extend_constant to the streaming engine (#23824)
Lower pl.repeat to streaming engine (#23804)
Remove redundant clone (#23771)

✨ Enhancements

Lower rle_id to a native streaming node (#23894)
Pass endpoint_url loaded from CredentialProviderAWS to scan/write_delta (#23812)
Dispatch scan_iceberg to native by default (#23912)
Lower unique_counts and value_counts to streaming engine (#23890)
Support initializing from __arrow_c_schema__ protocol in pl.Schema (#23879)
Better handle broken local package environment in show_versions (#23885)
Implement dt.days_in_month function (#23119)
Make Expr.rolling_*_by methods available to pl.Series (#23742)
Fix errors on native scan_iceberg (#23811)
Reinterpret binary data to fixed size numerical array (#22840)
Make rolling_map serializable (#23848)
Ensure CachingCredentialProvider returns copied credentials dict (#23817)
Change typing for .remote() from LazyFrameExt to LazyFrameRemote (#23825)
Implement repeat_by for Array and Null (#23794)
Add DeprecationWarning on passing physical ordering to Categorical (#23779)
Pre-filtered decode and row group skipping with Iceberg / Delta / scans with cast options (#23792)
Update BytecodeParser opcode awareness for upcoming Python 3.14 (#23782)
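
The `dt.days_in_month` entry above names a function that, going by its name, returns the number of days in the month of each date. As a plain-Python reference for that calendar rule, the sketch below uses the standard `calendar` module. It does not call Polars, and `days_in_month` here is a hypothetical helper for checking expected values by hand.

```python
import calendar
from datetime import date


def days_in_month(value: date) -> int:
    """Number of days in the month that contains `value`."""
    # monthrange returns (weekday of the first day, number of days in month).
    return calendar.monthrange(value.year, value.month)[1]


leap_february = days_in_month(date(2024, 2, 10))
common_february = days_in_month(date(2023, 2, 10))
century_non_leap = days_in_month(date(1900, 2, 1))
december = days_in_month(date(2025, 12, 31))
```

The February cases cover the Gregorian leap-year rule, including the century exception.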

🐞 Bug fixes

Categorical namespace functions fail on Enum columns (#23925)
Properly set sumwise complete on filter for missing columns (#23877)
Restore Arrow-FFI-based Python<->Rust conversion in pyo3-polars (#23881)
Group By with filters (#23917)
Fix read_csv ignoring Decimal schema for header-only data (#23886)
Ensure collect() native Iceberg always scans latest when no snapshot_id is given (#23907)
Writing List(Array) columns to JSON without panic (#23875)
Fill Iceberg missing fields with partition values if present in metadata (#23900)
Create file for streaming sink even if unspawned (#23672)
Update cloud testing environment (#23908)
Parquet filtering on multiple RGs with literal predicate (#23903)
Incorrect datatype passed to libc::write (#23904)
Properly feature gate TZ_AWARE_RE usage (#23888)
Improve identification of "non group-key" aggregates in SQL GROUP BY queries (#23191)
Spawning tokio task outside reactor (#23884)
Correctly raise DuplicateError on asof_join with suffix="" (#23864)
Fix errors on native scan_iceberg (#23811)
Fix index out of bounds panic filtering parquet (#23850)
Fix error on empty range requests (#23844)
Fix handling of hive partitioning hive_start_idx parameter (#23843)
Allow encoding of pl.Enum with smaller physicals (#23829)
Filter sorted flag from physical in CategoricalChunked (#23827)
Remove accidental todo! in repeat node (#23822)
Make meta.pop operate on Expr only (#23808)
Stack overflow in DslPlan serde (#23801)
Clear credentials cached in Python when rebuilding object store (#23756)
Datetime selectors with mixed timezone info (#23774)
Support i128 in asof join (#23770)
Remove sleep for credential refresh (#23768)

📖 Documentation

Improve StackOverflow links in contributing guide (#23895)
Fix pyo3 documentation page link (#23839)
Document the pureness requirements of udfs (#23787)
Correct the name.* methods on their removal of aliases (#23773)

📦 Build system

Workaround for pyiceberg make requirements on Python 3.13 (#23810)
Add pyiceberg to dev dependencies (#23791)

🛠️ Other improvements

Ensure clippy and rustfmt run in CI when changing pyo3-polars (#23930)
Fix pyo3-polars proc-macro re-exports (#23918)
Rewrite evaluate_on_groups for .gather / .get (#23700)
Move Python C API to python-polars (#23876)
Improve/fix internal LRUCache implementation and move into "_utils" module (#23813)
Relax constraint on maximum Python version for numba (#23838)
Automatically tag PRs mentioning "SQL" with the appropriate label (#23816)
Update typos package (#23818)
Fix typos path (#23803)
Remove deserialize_with_unknown_fields (#23802)
Add pyiceberg to dev dependencies (#23791)
Remove old schema file (#23798)
Mark more tests as ready for cloud (#23743)
Reduce required deps for pyo3-polars (#23761)

Thank you to all our contributors for making this release possible!
@JakubValtar, @Kevin-Patyk, @Liyixin95, @alexander-beedie, @cgevans, @cmdlineluser, @coastalwhite, @eitsupi, @gfvioli, @itamarst, @jimmmmmmmmmmmy, @kdn36, @math-hiyoko, @mcrumiller, @mpasa, @mrkn, @nameexhaustion, @orlp, @pka, @pomo-mondreganto, @ritchie46 and @stijnherfst

Contributors

mrkn, pka, and 20 other contributors

01 Aug 12:19

github-actions

rs-0.50.0

0478b35

Rust Polars 0.50.0

🏆 Highlights

Make Selector a concrete part of the DSL (#23351)
Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

Lower Expr.slice to streaming engine (#23683)
Elide bound check (#23653)
Preserve Column repr in ColumnTransform operations (#23648)
Lower any() and all() to streaming engine (#23640)
Lower row-separable functions in streaming engine (#23633)
Lower int_range(len()) to with_row_index (#23576)
Avoid double field resolution in with_columns (#23530)
Rolling quantile lower time complexity (#23443)
Use single-key optimization with Categorical (#23436)
Improve null-preserving identification for boolean functions (#23317)
Improve boolean bitwise aggregate performance (#23325)
Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
Re-write join types during filter pushdown (#23275)
Generate PQ ZSTD decompression context once (#23200)
Trigger cache/cse optimizations when multiplexing (#23274)
Cache FileInfo upon DSL -> IR conversion (#23263)
Push more filters past joins (#23240)

✨ Enhancements

Expand on DataTypeExpr (#23249)
Lower row-separable functions in streaming engine (#23633)
Add scalar checks to range expressions (#23632)
Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
Implement mean function in arr namespace (#23486)
Implement vec_hash for List and Array (#23578)
Add unstable pl.row_index() expression (#23556)
Add Categories on the Python side (#23543)
Implement partitioned sinks for the in-memory engine (#23522)
Expose IRFunctionExpr::Rank in the python visitor (#23512)
Raise and warn on UDFs without return_dtype set (#23353)
IR pruning (#23499)
Expose IRFunctionExpr::FillNullWithStrategy in the python visitor (#23479)
Support min/max reducer for null dtype in streaming engine (#23465)
Implement streaming Categorical/Enum min/max (#23440)
Allow cast to Categorical inside list.eval (#23432)
Support pathlib.Path as source for read/scan_delta() (#23411)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Pass payload in ExprRegistry (#23412)
Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
Support row group skipping with filters when cast_options is given (#23356)
Execute bitwise reductions in streaming engine (#23321)
Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
Add dtype to str.to_integer() (#22239)
Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
Add is_close method (#23273)
Drop superfluous casts from optimized plan (#23269)
Added drop_nulls option to to_dummies (#23215)
Support comma as decimal separator for CSV write (#23238)
Don't format keys if they're empty in dot (#23247)
Improve arity simplification (#23242)

🐞 Bug fixes

Fix credential refresh logic (#23730)
Fix to_datetime() fallible identification (#23735)
Correct output datatype for dt.with_time_unit (#23734)
Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
Allow DataType expressions with selectors (#23720)
Match output type to engine for interpolate on Decimal (#23706)
Remaining bugs in with_exprs_and_input and pruning (#23710)
Match output dtype to engine for cum_sum_horizontal (#23686)
Field names for pl.struct in group-by (#23703)
Fix output for str.extract_groups with empty string pattern (#23698)
Match output type to engine for rolling_map (#23702)
Fix incorrect join on single Int128 column for in-memory engine (#23694)
Match output field name to lhs for BusinessDaycount (#23679)
Correct the planner output datatype for strptime (#23676)
Sort and Scan with_exprs_and_input (#23675)
Revert to old behavior with name.keep (#23670)
Fix panic loading from arrow Map containing timestamps (#23662)
Selectors in self part of list.eval (#23668)
Fix output field dtype for ToInteger (#23664)
Allow decimal_comma with , separator in read_csv (#23657)
Fix handling of UTF-8 in write_csv to IO[str] (#23647)
Selectors in {Lazy,Data}Frame.filter (#23631)
Stop splitfields iterator at eol in simd branch (#23652)
Correct output datatype of dt.year and dt.mil (#23646)
Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
Order-preserving equi-join didn't always flush final matches (#23639)
Fix ColumnNotFound error when joining on col().cast() (#23622)
Fix agg groups on when/then in group_by context (#23628)
Output type for sign (#23572)
Apply agg_fn on null values in pivot (#23586)
Remove nonsensical duration variance (#23621)
Don't panic when sinking nested categorical to Parquet (#23610)
Correctly set value count output field name (#23611)
Casting unused columns in to_torch (#23606)
Allow inferring of hours-only timezone offset (#23605)
Bug in Categorical <-> str compare with nulls (#23609)
Honor n=0 in all cases of str.replace (#23598)
Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
Relabel duplicate sequence IDs in distributor (#23593)
Round-trip Enum and Categorical metadata in plugins (#23588)
Fix incorrect join_asof with by followed by head/slice (#23585)
Allow writing nested Int128 data to Parquet (#23580)
Enum serialization assert (#23574)
Output type for peak_min / peak_max (#23573)
Make Scalar Categorical, Enum and Struct values serializable (#23565)
Preserve row order within partition when sinking parquet (#23462)
Panic in create_multiple_physical_plans when branching from a single cache node (#23561)
Prevent in-mem partition sink deadlock (#23562)
Update AWS cloud documentation (#23563)
Correctly handle null values when comparing structs (#23560)
Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
Make Expr.append serializable (#23515)
Float by float division dtype (#23529)
Division on empty DataFrame generating null row (#23516)
Partition sink copy_exprs and with_exprs_and_input (#23511)
Unreachable with pl.self_dtype (#23507)
Rolling median incorrect min_samples with nulls (#23481)
Make Int128 roundtrippable via Parquet (#23494)
Fix panic when common subplans contain IEJoins (#23487)
Properly handle non-finite floats in rolling_sum/mean (#23482)
Make read_csv_batched respect skip_rows and skip_lines (#23484)
Always use cloudpickle for the python objects in cloud plans (#23474)
Support string literals in index_of() on categoricals (#23458)
Don't panic for finish_callback with nested datatypes (#23464)
Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
Fix var/moment dtypes (#23453)
Fix agg_groups dtype (#23450)
Clear cached_schema when apply changes dtype (#23439)
Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
Null handling in full-null group_by_dynamic mean/sum (#23435)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Fix index calculation for nearest interpolation (#23418)
Fix compilation failure with --no-default-features and --features lazy,strings (#23384)
Parse parquet footer length into unsigned integer (#23357)
Fix incorrect results with group_by aggregation on empty groups (#23358)
Fix boolean min() in group_by aggregation (streaming) (#23344)
Respect data-model in map_elements (#23340)
Properly join URI paths in PlPath (#23350)
Ignore null values in bitwise aggregation on bools (#23324)
Fix panic filtering after left join (#23310)
Out-of-bounds index in hot hash table (#23311)
Fix scanning '?' from cloud with glob=False (#23304)
Fix filters on inserted columns not removing rows (#23303)
Don't ignore return_dtype (#23309)
Use safe parsing for get_normal_components (#23284)
Fix output column names/order of streaming coalesced right-join (#23278)
Restore concat_arr inputs expansion (#23271)

📖 Documentation

Point to the R Polars version on R-multiverse (#23660)
Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
Add page about billing to Polars Cloud user guide (#23564)
Small user-guide improvement and fixes (#23549)
Correct note in from_pandas about data being cloned (#23552)
Fix a few typos in the "Streaming" section (#23536)
Update streaming page (#23535)
Update structure of Polars Cloud documentation (#23496)
Update when_then in user guide (#23245)

📦 Build system

Update all rand code (#23387)
Bump up rand & rand_distr (#22619)

🛠️ Other improvements

Remove incorrect DeletionFilesList::slice (#23796)
Remove old schema file (#23798)
Remove Default for StreamingExecutionState (#23729)
Explicit match to smaller dtypes before cast to Int32 in asof join (#23776)
Expose PlPathRef via polars::prelude (#23754)
Add hashes json (#23758)
Add AExpr::is_expr_equal_to (#23740)
Fix rank test to respect maintain order (#23723)
IR inputs and exprs iterators (#23722)
Store more granular schema hashes to reduce merge conflicts (#23709)
Add assertions for unique ID (#23711)
Use RelaxedCell in multiscan (#23712)
Debug assert ColumnTransform cast is non-strict (#23717)
Use UUID for UniqueID (#23704)
Remove scan id (#23697)
Propagate Iceberg physical ID schema to IR (#23671)
Remove unused and confusing match arm (#23691)
Remove unused ALLOW_GROUP_AWARE flag (#23690)
Remove unused evaluate_inline (#23687)
Remove unused field from AggregationContext (#23685)
Remove `nod...

Contributors

orlp, alexander-beedie, and 28 other contributors

01 Aug 01:43

github-actions

py-1.32.0

c57de4b

Python Polars 1.32.0

🏆 Highlights

Make Selector a concrete part of the DSL (#23351)
Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

Lower Expr.slice to streaming engine (#23683)
Elide bound check (#23653)
Preserve Column repr in ColumnTransform operations (#23648)
Lower any() and all() to streaming engine (#23640)
Lower row-separable functions in streaming engine (#23633)
Lower int_range(len()) to with_row_index (#23576)
Avoid double field resolution in with_columns (#23530)
Rolling quantile lower time complexity (#23443)
Use single-key optimization with Categorical (#23436)
Improve null-preserving identification for boolean functions (#23317)
Improve boolean bitwise aggregate performance (#23325)
Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
Re-write join types during filter pushdown (#23275)
Generate PQ ZSTD decompression context once (#23200)
Trigger cache/cse optimizations when multiplexing (#23274)
Cache FileInfo upon DSL -> IR conversion (#23263)
Push more filters past joins (#23240)
Optimize Bitmap::make_mut (#23138)

✨ Enhancements

Add Python-side caching for credentials and provider auto-initialization (#23736)
Expand on DataTypeExpr (#23249)
Lower row-separable functions in streaming engine (#23633)
Add scalar checks to range expressions (#23632)
Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
Implement mean function in arr namespace (#23486)
Implement vec_hash for List and Array (#23578)
Add unstable pl.row_index() expression (#23556)
Add Categories on the Python side (#23543)
Implement partitioned sinks for the in-memory engine (#23522)
Raise and warn on UDFs without return_dtype set (#23353)
IR pruning (#23499)
Support min/max reducer for null dtype in streaming engine (#23465)
Implement streaming Categorical/Enum min/max (#23440)
Allow cast to Categorical inside list.eval (#23432)
Support pathlib.Path as source for read/scan_delta() (#23411)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Pass payload in ExprRegistry (#23412)
Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
Support row group skipping with filters when cast_options is given (#23356)
Execute bitwise reductions in streaming engine (#23321)
Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
Add dtype to str.to_integer() (#22239)
Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
Add is_close method (#23273)
Drop superfluous casts from optimized plan (#23269)
Added drop_nulls option to to_dummies (#23215)
Support comma as decimal separator for CSV write (#23238)
Don't format keys if they're empty in dot (#23247)
Improve arity simplification (#23242)
Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
Fix credential refresh logic (#23730)
Fix to_datetime() fallible identification (#23735)
Correct output datatype for dt.with_time_unit (#23734)
Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
Allow DataType expressions with selectors (#23720)
Match output type to engine for interpolate on Decimal (#23706)
Remaining bugs in with_exprs_and_input and pruning (#23710)
Match output dtype to engine for cum_sum_horizontal (#23686)
Field names for pl.struct in group-by (#23703)
Fix output for str.extract_groups with empty string pattern (#23698)
Match output type to engine for rolling_map (#23702)
Moved passing DeltaTable._storage_options (#23673)
Fix incorrect join on single Int128 column for in-memory engine (#23694)
Match output field name to lhs for BusinessDaycount (#23679)
Correct the planner output datatype for strptime (#23676)
Sort and Scan with_exprs_and_input (#23675)
Revert to old behavior with name.keep (#23670)
Fix panic loading from arrow Map containing timestamps (#23662)
Selectors in self part of list.eval (#23668)
Fix output field dtype for ToInteger (#23664)
Allow decimal_comma with , separator in read_csv (#23657)
Fix handling of UTF-8 in write_csv to IO[str] (#23647)
Selectors in {Lazy,Data}Frame.filter (#23631)
Stop splitfields iterator at eol in simd branch (#23652)
Correct output datatype of dt.year and dt.mil (#23646)
Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
Order-preserving equi-join didn't always flush final matches (#23639)
Fix ColumnNotFound error when joining on col().cast() (#23622)
Fix agg groups on when/then in group_by context (#23628)
Output type for sign (#23572)
Apply agg_fn on null values in pivot (#23586)
Remove nonsensical duration variance (#23621)
Don't panic when sinking nested categorical to Parquet (#23610)
Correctly set value count output field name (#23611)
Casting unused columns in to_torch (#23606)
Allow inferring of hours-only timezone offset (#23605)
Bug in Categorical <-> str compare with nulls (#23609)
Honor n=0 in all cases of str.replace (#23598)
Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
Relabel duplicate sequence IDs in distributor (#23593)
Round-trip Enum and Categorical metadata in plugins (#23588)
Fix incorrect join_asof with by followed by head/slice (#23585)
Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
Allow writing nested Int128 data to Parquet (#23580)
Enum serialization assert (#23574)
Output type for peak_min / peak_max (#23573)
Make Scalar Categorical, Enum and Struct values serializable (#23565)
Preserve row order within partition when sinking parquet (#23462)
Prevent in-mem partition sink deadlock (#23562)
Update AWS cloud documentation (#23563)
Correctly handle null values when comparing structs (#23560)
Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
Make Expr.append serializable (#23515)
Float by float division dtype (#23529)
Division on empty DataFrame generating null row (#23516)
Partition sink copy_exprs and with_exprs_and_input (#23511)
Unreachable with pl.self_dtype (#23507)
Rolling median incorrect min_samples with nulls (#23481)
Make Int128 roundtrippable via Parquet (#23494)
Fix panic when common subplans contain IEJoins (#23487)
Properly handle non-finite floats in rolling_sum/mean (#23482)
Make read_csv_batched respect skip_rows and skip_lines (#23484)
Always use cloudpickle for the python objects in cloud plans (#23474)
Support string literals in index_of() on categoricals (#23458)
Don't panic for finish_callback with nested datatypes (#23464)
Pass DeltaTable._storage_options if no storage_options are provided (#23456)
Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
Fix var/moment dtypes (#23453)
Fix agg_groups dtype (#23450)
Fix incorrect _get_path_scheme (#23444)
Fix missing overload defaults in read_ods and tree_format (#23442)
Clear cached_schema when apply changes dtype (#23439)
Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
Null handling in full-null group_by_dynamic mean/sum (#23435)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Fix index calculation for nearest interpolation (#23418)
Overload for eager default in Schema.to_frame was False instead of True (#23413)
Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
Removed special handling for bytes-like objects in read_ndjson (#23361)
Parse parquet footer length into unsigned integer (#23357)
Fix incorrect results with group_by aggregation on empty groups (#23358)
Fix boolean min() in group_by aggregation (streaming) (#23344)
Respect data-model in map_elements (#23340)
Properly join URI paths in PlPath (#23350)
Ignore null values in bitwise aggregation on bools (#23324)
Fix panic filtering after left join (#23310)
Out-of-bounds index in hot hash table (#23311)
Fix scanning '?' from cloud with glob=False (#23304)
Fix filters on inserted columns not removing rows (#23303)
Don't ignore return_dtype (#23309)
Raise error instead of return in Series class (#23301)
Use safe parsing for get_normal_components (#23284)
Fix output column names/order of streaming coalesced right-join (#23278)
Restore concat_arr inputs expansion (#23271)
Expose FieldsMapper (#23232)
Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

Fix str.replace_many examples triggering deprecation warning (#23695)
Point to the R Polars version on R-multiverse (#23660)
Update example for writing to cloud storage (#20265)
Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
Add docs of Expr.list.filter and Series.list.filter (#23589)
Add page about billing to Polars Cloud user guide (#23564)
Small user-guide improvement and fixes (#23549)
Correct note in from_pandas about data being cloned (#23552)
Fix a few typos in the "Streaming" section (#23536)
Update streaming page (#23535)
Update structure of Polars Cloud documentation (#23496)
Update example code in pandas migration guide (#23403)
Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
Add example of using OR in join_where (#23375)
Update when_then in user guide (#23245)

📦 Build system

Update all rand code (#23387)

🛠️ Other improvements

Remove unused functions from the rust side (#2...

Contributors

mrkn, orlp, and 29 other contributors

26 Jul 19:44

github-actions

py-1.32.0-beta.1

a7081b6

Python Polars 1.32.0-beta.1 Pre-release

🏆 Highlights

Make Selector a concrete part of the DSL (#23351)
Rework Categorical/Enum to use (Frozen)Categories (#23016)

🚀 Performance improvements

Lower Expr.slice to streaming engine (#23683)
Elide bound check (#23653)
Preserve Column repr in ColumnTransform operations (#23648)
Lower any() and all() to streaming engine (#23640)
Lower row-separable functions in streaming engine (#23633)
Lower int_range(len()) to with_row_index (#23576)
Avoid double field resolution in with_columns (#23530)
Rolling quantile lower time complexity (#23443)
Use single-key optimization with Categorical (#23436)
Improve null-preserving identification for boolean functions (#23317)
Improve boolean bitwise aggregate performance (#23325)
Enable Parquet expressions and dedup is_in values in Parquet predicates (#23293)
Re-write join types during filter pushdown (#23275)
Generate PQ ZSTD decompression context once (#23200)
Trigger cache/cse optimizations when multiplexing (#23274)
Cache FileInfo upon DSL -> IR conversion (#23263)
Push more filters past joins (#23240)
Optimize Bitmap::make_mut (#23138)

✨ Enhancements

Add Python-side caching for credentials and provider auto-initialization (#23736)
Expand on DataTypeExpr (#23249)
Lower row-separable functions in streaming engine (#23633)
Add scalar checks to range expressions (#23632)
Expose POLARS_DOT_SVG_VIEWER to automatically dispatch to SVG viewer (#23592)
Implement mean function in arr namespace (#23486)
Implement vec_hash for List and Array (#23578)
Add unstable pl.row_index() expression (#23556)
Add Categories on the Python side (#23543)
Implement partitioned sinks for the in-memory engine (#23522)
Raise and warn on UDFs without return_dtype set (#23353)
IR pruning (#23499)
Support min/max reducer for null dtype in streaming engine (#23465)
Implement streaming Categorical/Enum min/max (#23440)
Allow cast to Categorical inside list.eval (#23432)
Support pathlib.Path as source for read/scan_delta() (#23411)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Pass payload in ExprRegistry (#23412)
Support reading nanosecond/Int96 timestamps and schema evolved datasets in scan_delta() (#23398)
Support row group skipping with filters when cast_options is given (#23356)
Execute bitwise reductions in streaming engine (#23321)
Use scan_parquet().collect_schema() for read_parquet_schema (#23359)
Add dtype to str.to_integer() (#22239)
Add arr.slice, arr.head and arr.tail methods to arr namespace (#23150)
Add is_close method (#23273)
Drop superfluous casts from optimized plan (#23269)
Added drop_nulls option to to_dummies (#23215)
Support comma as decimal separator for CSV write (#23238)
Don't format keys if they're empty in dot (#23247)
Improve arity simplification (#23242)
Allow expression input for length parameter in pad_start, pad_end, and zfill (#23182)

🐞 Bug fixes

Load _expiry_time from botocore Credentials in CredentialProviderAWS (#23753)
Fix credential refresh logic (#23730)
Fix to_datetime() fallible identification (#23735)
Correct output datatype for dt.with_time_unit (#23734)
Fix incorrect native Iceberg scan from tables with renamed/dropped columns/fields (#23713)
Allow DataType expressions with selectors (#23720)
Match output type to engine for interpolate on Decimal (#23706)
Remaining bugs in with_exprs_and_input and pruning (#23710)
Match output dtype to engine for cum_sum_horizontal (#23686)
Field names for pl.struct in group-by (#23703)
Fix output for str.extract_groups with empty string pattern (#23698)
Match output type to engine for rolling_map (#23702)
Moved passing DeltaTable._storage_options (#23673)
Fix incorrect join on single Int128 column for in-memory engine (#23694)
Match output field name to lhs for BusinessDaycount (#23679)
Correct the planner output datatype for strptime (#23676)
Sort and Scan with_exprs_and_input (#23675)
Revert to old behavior with name.keep (#23670)
Fix panic loading from arrow Map containing timestamps (#23662)
Selectors in self part of list.eval (#23668)
Fix output field dtype for ToInteger (#23664)
Allow decimal_comma with , separator in read_csv (#23657)
Fix handling of UTF-8 in write_csv to IO[str] (#23647)
Selectors in {Lazy,Data}Frame.filter (#23631)
Stop splitfields iterator at eol in simd branch (#23652)
Correct output datatype of dt.year and dt.mil (#23646)
Logic of broadcast_rhs in binary functions to correct list.set_intersection for list[str] columns (#23584)
Order-preserving equi-join didn't always flush final matches (#23639)
Fix ColumnNotFound error when joining on col().cast() (#23622)
Fix agg groups on when/then in group_by context (#23628)
Output type for sign (#23572)
Apply agg_fn on null values in pivot (#23586)
Remove nonsensical duration variance (#23621)
Don't panic when sinking nested categorical to Parquet (#23610)
Correctly set value count output field name (#23611)
Casting unused columns in to_torch (#23606)
Allow inferring of hours-only timezone offset (#23605)
Bug in Categorical <-> str compare with nulls (#23609)
Honor n=0 in all cases of str.replace (#23598)
Remove arbitrary 25 item limit from implicit Python list -> Series infer (#23603)
Relabel duplicate sequence IDs in distributor (#23593)
Round-trip Enum and Categorical metadata in plugins (#23588)
Fix incorrect join_asof with by followed by head/slice (#23585)
Change return typing of get_index_type() from DataType to PolarsIntegerType (#23558)
Allow writing nested Int128 data to Parquet (#23580)
Enum serialization assert (#23574)
Output type for peak_min / peak_max (#23573)
Make Scalar Categorical, Enum and Struct values serializable (#23565)
Preserve row order within partition when sinking parquet (#23462)
Prevent in-mem partition sink deadlock (#23562)
Update AWS cloud documentation (#23563)
Correctly handle null values when comparing structs (#23560)
Make fold/reduce/cum_reduce/cum_fold serializable (#23524)
Make Expr.append serializable (#23515)
Float by float division dtype (#23529)
Division on empty DataFrame generating null row (#23516)
Partition sink copy_exprs and with_exprs_and_input (#23511)
Unreachable with pl.self_dtype (#23507)
Rolling median incorrect min_samples with nulls (#23481)
Make Int128 roundtrippable via Parquet (#23494)
Fix panic when common subplans contain IEJoins (#23487)
Properly handle non-finite floats in rolling_sum/mean (#23482)
Make read_csv_batched respect skip_rows and skip_lines (#23484)
Always use cloudpickle for the python objects in cloud plans (#23474)
Support string literals in index_of() on categoricals (#23458)
Don't panic for finish_callback with nested datatypes (#23464)
Pass DeltaTable._storage_options if no storage_options are provided (#23456)
Support min/max aggregation for DataFrame/LazyFrame Categoricals (#23455)
Fix var/moment dtypes (#23453)
Fix agg_groups dtype (#23450)
Fix incorrect _get_path_scheme (#23444)
Fix missing overload defaults in read_ods and tree_format (#23442)
Clear cached_schema when apply changes dtype (#23439)
Allow structured conversion to/from numpy with Array types, preserving shape (#23438)
Null handling in full-null group_by_dynamic mean/sum (#23435)
Enable default set of ScanCastOptions for native scan_iceberg() (#23416)
Fix index calculation for nearest interpolation (#23418)
Overload for eager default in Schema.to_frame was False instead of True (#23413)
Fix read_excel overloads so that passing list[str] to sheet_name does not raise (#23388)
Removed special handling for bytes-like objects in read_ndjson (#23361)
Parse parquet footer length into unsigned integer (#23357)
Fix incorrect results with group_by aggregation on empty groups (#23358)
Fix boolean min() in group_by aggregation (streaming) (#23344)
Respect data-model in map_elements (#23340)
Properly join URI paths in PlPath (#23350)
Ignore null values in bitwise aggregation on bools (#23324)
Fix panic filtering after left join (#23310)
Out-of-bounds index in hot hash table (#23311)
Fix scanning '?' from cloud with glob=False (#23304)
Fix filters on inserted columns not removing rows (#23303)
Don't ignore return_dtype (#23309)
Raise error instead of return in Series class (#23301)
Use safe parsing for get_normal_components (#23284)
Fix output column names/order of streaming coalesced right-join (#23278)
Restore concat_arr inputs expansion (#23271)
Expose FieldsMapper (#23232)
Fix time zone handling in dt.iso_year and dt.is_leap_year (#23125)

📖 Documentation

Fix str.replace_many examples triggering deprecation warning (#23695)
Point to the R Polars version on R-multiverse (#23660)
Update example for writing to cloud storage (#20265)
Update GPU docs for RAPIDS CUDA 11 deprecation (#23620)
Add docs of Expr.list.filter and Series.list.filter (#23589)
Add page about billing to Polars Cloud user guide (#23564)
Small user-guide improvement and fixes (#23549)
Correct note in from_pandas about data being cloned (#23552)
Fix a few typos in the "Streaming" section (#23536)
Update streaming page (#23535)
Update structure of Polars Cloud documentation (#23496)
Update example code in pandas migration guide (#23403)
Correct plugins user guide to reflect that teaching Expr.language is in a different section (#23377)
Add example of using OR in join_where (#23375)
Update when_then in user guide (#23245)

📦 Build system

Update all rand code (#23387)

🛠️ Other improvements

Add hashes json (#23758)
Add `AExpr::is_expr...

Contributors

mrkn, orlp, and 29 other contributors
