Releases: modin-project/modin
Modin 0.32.0
This release introduces support for the Polars API, a new query compiler for small data,
more functions that can use dynamic partitioning, and several bug fixes.
Key Features and Updates Since 0.31.0
- Stability and Bugfixes
- FIX-#0000: Fix type hint (#7343)
- FIX-#7113: Fix docstring overrides for subclasses (#7354)
- FIX-#7134: Use a separate docstring class for `BasePandasDataset` (#7353)
- FIX-#7329: Do not sort columns on `df.update` (#7330)
- FIX-#7351: Add ipython method calls to non-lookup list (#7352)
- FIX-#7355: Cpu count would be set incorrectly on a cluster (#7356)
- FIX-#7357: Fix `NoAttributeError` on `DataFrame.copy` (#7358)
- FIX-#7371: Fix inserting datelike values into a DataFrame (#7372)
- FIX-#7373: Try a previous version of `motoserver/moto` service, pin to 5.0.13 (#7374)
- FIX-#7379: Fix `__imul__` performing addition instead of multiplication (#7380)
- FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
- FIX-#7389: Fix uploading artifacts (#7390)
- Refactor Codebase
- REFACTOR-#0000: Update copyright date (#7333)
- Documentation improvements
- New Features
- FEAT-#4605: Add native query compiler (#7259)
- FEAT-#7308: Interoperability between query compilers (#7376)
- FEAT-#7331: Initial Polars API (#7332)
- FEAT-#7337: Using dynamic partitioning in `broadcast_apply` (#7338)
- FEAT-#7340: Add more granular lazy flags to query compiler (#7348)
- FEAT-#7368: Add a new environment variable for using dynamic partitioning (#7369)
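FIX-#7379 in the list above concerns `__imul__`, the method Python calls for the `*=` operator. As a reminder of how that dispatch works, here is a minimal sketch built on a hypothetical `Scaled` class (a toy, not Modin code). A bug of the kind described would correspond to `__imul__` delegating to addition rather than multiplication.

```python
class Scaled:
    """Hypothetical toy wrapper, used only to illustrate in-place operator dispatch."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, factor):
        # Out-of-place multiplication returns a new object.
        return Scaled(v * factor for v in self.values)

    def __imul__(self, factor):
        # `obj *= factor` calls this method; it must multiply, not add.
        self.values = [v * factor for v in self.values]
        return self  # in-place operators return the object that gets rebound


s = Scaled([1, 2, 3])
alias = s
s *= 3  # dispatches to Scaled.__imul__, so `alias` sees the change too
```

Because `__imul__` returns `self`, `s` and `alias` still refer to the same object after the update.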
Contributors
@MortalHappiness
@Retribution98
@YarShev
@ZhipengXue97
@anmyachev
@arunjose696
@devin-petersohn
@likawind
@sfc-gh-joshi
@sfc-gh-mvashishtha
Modin 0.31.0
First release compatible with NumPy 2.0.
Key Features and Updates Since 0.30.0
- Stability and Bugfixes
- FIX-#7138: Stop reloading modules for custom docstrings (#7307)
- FIX-#7263: Empty docstrings should not be inherited (#7264)
- FIX-#7272: Remove HDK engine (#7275)
- FIX-#7277: Remove Cudf storage format as unmaintained (#7290)
- FIX-#7278: Make sure `enable_logging` decorator preserve type hints (#7279)
- FIX-#7292: Prepare Modin code to NumPy 2.0 (#7293)
- FIX-#7295: Unpin numexpr to allow versions >= 2.8.4 to match pandas (#7296)
- FIX-#7309: Update versioneer with `versioneer install --vendor` (#7311)
- FIX-#7320: Bump the github-actions group with 3 updates (#7319)
- FIX-#7321: Using `C` engine instead of `pyarrow` for getting metadata in `read_csv` (#7322)
- Performance enhancements
- Refactor Codebase
- REFACTOR-#7271: Remove `instance_type` attribute of axis partitions (#7268)
- REFACTOR-#7273: Remove deprecated functions from utils.py, accessor.py and io.py (#7274)
- REFACTOR-#7285: Remove deprecated configs (#7286)
- REFACTOR-#7294: Reduce access of methods `_modin_frame` methods from `_query_compiler` (#7297)
- REFACTOR-#7313: Add similar methods as in #7294 for operating on columns (#7314)
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#6574: UserWarning no longer displayed when Series/DataFrames are small (#7323)
- FEAT-#7249: Add `reload_modin` feature (#7280)
- FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
- FEAT-#7283: Introduce `MinRowPartitionSize` and `MinColumnPartitionSize` (#7284)
- FEAT-#7310: NumPy 2.0 support (#7312)
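FIX-#7278 in the list above is about a decorator that should not hide the type hints of the function it wraps. The sketch below shows one common way to achieve that in plain Python. It is not Modin's `enable_logging` implementation: `log_calls` and its `calls` attribute are hypothetical names used only for illustration.

```python
import functools
import inspect
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def log_calls(func: F) -> F:
    """Hypothetical logging decorator that keeps the wrapped function's type hints."""

    @functools.wraps(func)  # copies __name__, __doc__, __annotations__; sets __wrapped__
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        wrapper.calls.append(func.__name__)  # record the call instead of real logging
        return func(*args, **kwargs)

    wrapper.calls = []  # type: ignore[attr-defined]
    # Returning the same type variable F tells static checkers the signature is unchanged.
    return cast(F, wrapper)


@log_calls
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


# inspect follows __wrapped__, so the original signature is still visible at runtime.
signature_text = str(inspect.signature(add))
```

`functools.wraps` takes care of runtime introspection, while the `F -> F` annotation is what lets a static type checker see the decorated function with its original signature.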
Contributors
@Jayson729
@Retribution98
@YarShev
@anmyachev
@arunjose696
@kurtmckee
@sfc-gh-dpetersohn
@vsreekanti
Modin 0.30.1
This release pins numpy<2.
Key Features and Updates Since 0.30.0
Contributors
Modin 0.29.1
This release pins numpy<2.
Key Features and Updates Since 0.29.0
- Stability and Bugfixes
- New Features
Contributors
Modin 0.28.3
This release pins numpy<2.
Key Features and Updates Since 0.28.2
- Stability and Bugfixes
- New Features
Contributors
Modin 0.27.1
This release pins numpy<2.
Key Features and Updates Since 0.27.0
- Stability and Bugfixes
- New Features
Contributors
Modin 0.30.0
This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
a more efficient implementation of internal operators (which gives a performance boost to almost all distributed Modin functions),
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.
Key Features and Updates Since 0.29.0
- Stability and Bugfixes
- FIX-#0000: Fix badge in README.md (#7213)
- FIX-#0000: Make merge tests more stable by sorting results (#7266)
- FIX-#6967: Remove `read_pickle_distributed`/`to_pickle_distributed` functions as deprecated (#7258)
- FIX-#7093: Make sure `idxmax` and `idxmin` can work with string columns (#7193)
- FIX-#7102: Remove `enable_api_only` mode in modin logging (#7194)
- FIX-#7103: Move lower-level functionality logging to debug (#7184)
- FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
- FIX-#7185: Add extra check for some config classes (#7189)
- FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
- FIX-#7206: Make sure `df.melt` handle duplicate `value_vars` correctly (#7208)
- FIX-#7219: Pin `dataframe-api-compat>=0.2.7` (#7220)
- FIX-#7221: Don't use `use_legacy_dataset=False` for `ParquetDataset` (#7222)
- FIX-#7224: Importing `modin.pandas.api.extensions` overwrites re-export of `pandas.api` submodules (#7225)
- FIX-#7233: Display property name in `default_to_pandas` error messages (#7269)
- FIX-#7234: Deprecate HDK engine (#7235)
- FIX-#7238: Fix docstring inheritance for `cached_property` and use it (#7239)
- FIX-#7240: Allow `doc_checker.py` works with `functools.cached_property` (#7241)
- FIX-#7246: Pin `pyarrow>=10.0.1` as `pandas 2.2.*` does (#7247)
- FIX-#7248: Make sure `_validate_dtypes_sum_prod_mean` works correctly with datetime types (#7237)
- FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
- Performance enhancements
- Refactor Codebase
- Update testing suite
- Documentation improvements
- New Features
- FEAT-#5394: Reduce amount of remote calls for `Map` operator (#7136)
- FEAT-#5394: Reduce amount of remote calls for `TreeReduce` and `GroupByReduce` operators (#7245)
- FEAT-#6492: Add `from_map` feature to create dataframe (#7215)
- FEAT-#6498: Make `Fold` operator more flexible (#7257)
- FEAT-#6808: Implement `__arrow_array__` for Series (#7200)
- FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
- FEAT-#7139: Use `ray-core` instead of `ray-default` (#6955)
- FEAT-#7187: Change `master` branch to `main` (#7188)
- FEAT-#7202: Use custom resources for Ray (#7205)
- FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
- FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
- FEAT-#7252: Add type hints for `base.py` (#7253)
- FEAT-#7254: Support right `merge`/`join` (#7226)
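FIX-#7238 and FIX-#7240 in the bug-fix list above deal with docstrings on `functools.cached_property` attributes. The following sketch shows the general idea of inheriting a docstring for such an attribute from a parent class. It is an illustration only, not Modin's docstring machinery: `inherit_docstrings`, `Base`, and `Child` are hypothetical names.

```python
from functools import cached_property


def inherit_docstrings(parent):
    """Hypothetical class decorator: copy missing docstrings from `parent` attributes."""

    def decorator(cls):
        for name, attr in vars(cls).items():
            # Class-level access returns the descriptor itself, not a computed value.
            parent_attr = getattr(parent, name, None)
            # property and cached_property objects both expose a writable __doc__.
            if isinstance(attr, (property, cached_property)) and not attr.__doc__:
                attr.__doc__ = getattr(parent_attr, "__doc__", None)
        return cls

    return decorator


class Base:
    @cached_property
    def shape(self):
        """Return the (rows, columns) shape."""
        return (0, 0)


@inherit_docstrings(Base)
class Child(Base):
    @cached_property
    def shape(self):  # no docstring here; it is filled in from Base.shape
        return (2, 3)
```

This toy version only handles `property` and `cached_property`; plain methods would need a similar branch.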
Contributors
@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew
Modin 0.29.0
This release introduces the `modin.pandas.testing` and `modin.pandas.arrays` modules, a faster (range-partitioning) implementation for the
`pivot_table`, `unique`, `drop_duplicates`, `nunique`, and `df.resample` functions, new functions to interact with Dask (`to/from_dask`),
a distributed implementation for `Series.case_when`, and an optimization for the `astype` function with a scalar dtype.
Key Features and Updates Since 0.28.0
- Stability and Bugfixes
- FIX-#6227: Make sure `Series.unique()` with pyarrow dtype returns `ArrowExtensionArray` (#7042)
- FIX-#6793: Use `pandas_dtype` instead of `np.dtype` for some more places in Modin code (#6794)
- FIX-#7039: Pass scalar dtype as is to `astype` query compiler (#7152)
- FIX-#7051: Update exception message for `astype` function (#7052)
- FIX-#7054: Update exception message for `shift` function (#7055)
- FIX-#7056: Update exception message for `iloc/loc` functions (#7057)
- FIX-#7058: Update exception message for `insert` function (#7059)
- FIX-#7060: Fix `pivot` when index or columns are of Index type (#7061)
- FIX-#7062: Update exception message for `aggregate` function (#7063)
- FIX-#7072: Replace `MaterializationHook` with the materialized object on serialization (#7075)
- FIX-#7088: Make sure `rank` raises `No axis named None...` exception (#7089)
- FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
- FIX-#7135: Fix appending a new row (#7172)
- FIX-#7153: Fix `Series.corr` with `method != pearson` (#7158)
- FIX-#7157: Make sure `quantile` function works with `numeric_only=True` (#7160)
- FIX-#7170: Don't use `MinPartitionSize` configuration variable in remote context (#7177)
- Performance enhancements
- PERF-#5296: Partition parquet file if it has too few row groups (#7016)
- PERF-#7068: Provide `shape_hint="column"` for some more operations with Series (#7069)
- PERF-#7123: Preserve `shape_hint` for dropna (#7124)
- PERF-#7130: Preserve partition lengths in `apply_full_axis` with `keep_partitioning=True` (#7131)
- PERF-#7132: Preserve partition lengths in `apply_full_axis` with `keep_partitioning=False` (#7133)
- PERF-#7150: Reduce peak memory consumption (#7149)
- Refactor Codebase
- Update testing suite
- TEST-#3622: Centralize tests in Modin (#7137)
- TEST-#6016: Make sure `eval_general` doesn't expect exceptions by default (#6954)
- TEST-#7064: Explicitly check for exceptions in `test_groupby.py` (#7065)
- TEST-#7066: Explicitly check for exceptions in `test_io.py` (#7067)
- TEST-#7073: Explicitly check for exceptions in `test_default.py` (#7074)
- TEST-#7076: Explicitly check for exceptions in `test_map_metadata.py` (#7077)
- TEST-#7082: Explicitly check for exceptions in `test_series.py` (#7083)
- TEST-#7084: Explicitly check for exceptions in `test_indexing.py` (#7085)
- TEST-#7086: Explicitly check for exceptions in `test_reduce.py` (#7087)
- TEST-#7094: Rename `raising_exceptions` argument of `eval_general` testing function (#7095)
- TEST-#7125: Explicitly install modin in CI tests (#7126)
- TEST-#7165: Add codecov token to fix CI on master (#7175)
- TEST-#7166: Fix HDF tests in CI (#7167)
- TEST-#7173: Update github actions (#7168)
- Documentation improvements
- New Features
- FEAT-#4527: Add Modin logging to `AxisPartition` and `BlockPartition` classes (#7079)
- FEAT-#6783: Implement `modin.pandas.testing` module (#7045)
- FEAT-#6929: Implement `Series.case_when` in a distributed way (#6972)
- FEAT-#7004: Use generators when returning from `_deploy_ray_func` remote function. (#7005)
- FEAT-#7021: Implement `to/from_dask` functions (#7022)
- FEAT-#7047: Add range-partitioning implementation for `.pivot_table()` (#7048)
- FEAT-#7070: Add `modin.pandas.arrays` module (#7071)
- FEAT-#7078: Add `modin_layer` names to classes that inherit `ClassLogger` (#7099)
- FEAT-#7090: Add range-partitioning implementation for `.unique()` and `.drop_duplicates()` (#7091)
- FEAT-#7100: Add range-partitioning impl for `nunique()` (#7101)
- FEAT-#7102: Deprecate `enable_api_only` mode in modin logging (#7114)
- FEAT-#7111: Implemented `@remote_function` decorator with cache (#7112)
- FEAT-#7117: Support building range-partitioning from an index level (#7120)
- FEAT-#7118: Add range-partitioning impl for `df.resample()` (#7140)
- FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
- FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
- FEAT-#7146: Use `BaseQueryCompiler`, `BasePandasDataset`, `DataFrame` or `Series` type hints at a high level (#7147)
- FEAT-#7156: Add type hints for `Series` (#7154)
- FEAT-#7178: Add type hints for `DataFrame` (#7179)
- FEAT-#7180: Add type hints for `modin.pandas.[functions]` (#7181)
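Several entries above add range-partitioning implementations (`.unique()`, `.drop_duplicates()`, `nunique()`, `.pivot_table()`, `df.resample()`). The general idea behind range partitioning can be sketched in pure Python as follows. This is a toy illustration under simplifying assumptions, not Modin's implementation: the function name, its parameters, and the pivot-selection rule are all hypothetical, and unlike pandas `unique` the result comes back sorted rather than in order of first appearance.

```python
import bisect
import random


def range_partition_unique(values, num_partitions=4, sample_size=16, seed=0):
    """Toy range-partitioning `unique`: bucket by value range, dedupe per bucket."""
    if not values:
        return []
    # 1. Sample the data and pick pivots that split the sample into similar-sized ranges.
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    step = max(1, len(sample) // num_partitions)
    pivots = sorted(set(sample[step::step][: num_partitions - 1]))
    # 2. Route every value to the bucket whose range contains it.
    buckets = [[] for _ in range(len(pivots) + 1)]
    for v in values:
        buckets[bisect.bisect_right(pivots, v)].append(v)
    # 3. Equal values always share a bucket, so each bucket can be deduplicated
    #    independently (in a distributed setting, in parallel).
    result = []
    for bucket in buckets:
        result.extend(sorted(set(bucket)))
    return result


data = [5, 3, 5, 1, 9, 3, 7, 1, 9, 2]
unique_values = range_partition_unique(data)
```

The key property is step 3: because duplicates can never land in different buckets, no cross-partition communication is needed after the values have been routed.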
Contributors
@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha
Modin 0.28.2
This release reverts the pandas requirement from `2.2.1` to `>=2.2,<2.3`.
Key Features and Updates Since 0.28.1
Contributors
Modin 0.28.1
This release pins pandas to 2.2.1. This pin will be removed in a subsequent release.
Key Features and Updates Since 0.28.0
- New Features
- FEAT-#7162: Pin pandas to 2.2.1 (87d147f)
Contributors
@sfc-gh-dpetersohn