Releases: modin-project/modin

Modin 0.32.0

11 Sep 13:43
0.32.0
3e951a6

This release introduces initial support for the Polars API, a new query compiler for small data,
more functions that can use dynamic partitioning, and several bug fixes.

Key Features and Updates Since 0.31.0

  • Stability and Bugfixes
    • FIX-#0000: Fix type hint (#7343)
    • FIX-#7113: Fix docstring overrides for subclasses (#7354)
    • FIX-#7134: Use a separate docstring class for BasePandasDataset (#7353)
    • FIX-#7329: Do not sort columns on df.update (#7330)
    • FIX-#7351: Add ipython method calls to non-lookup list (#7352)
    • FIX-#7355: Cpu count would be set incorrectly on a cluster (#7356)
    • FIX-#7357: Fix NoAttributeError on DataFrame.copy (#7358)
    • FIX-#7371: Fix inserting datelike values into a DataFrame (#7372)
    • FIX-#7373: Try a previous version of motoserver/moto service, pin to 5.0.13 (#7374)
    • FIX-#7379: Fix __imul__ performing addition instead of multiplication (#7380)
    • FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
    • FIX-#7389: Fix uploading artifacts (#7390)
  • Refactor Codebase
    • REFACTOR-#0000: Update copyright date (#7333)
  • Documentation improvements
    • DOCS-#0000: Update RunLLM Ask AI widget script path (#7345)
    • DOCS-#7335: Fix broken links in Modin Usage Examples page (#7336)
    • DOCS-#7382: Add documentation on how to use Modin Native query compiler (#7386)
  • New Features
    • FEAT-#4605: Add native query compiler (#7259)
    • FEAT-#7308: Interoperability between query compilers (#7376)
    • FEAT-#7331: Initial Polars API (#7332)
    • FEAT-#7337: Using dynamic partitioning in broadcast_apply (#7338)
    • FEAT-#7340: Add more granular lazy flags to query compiler (#7348)
    • FEAT-#7368: Add a new environment variable for using dynamic partitioning (#7369)
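FIX-#7379 in the list above concerns an in-place operator that dispatched to the wrong operation. The sketch below is a stand-in, not Modin's code: `Column` and `check_inplace_mul_matches_mul` are hypothetical names, used only to show the kind of regression check that catches an `__imul__` that adds instead of multiplies.

```python
class Column:
    """Hypothetical stand-in for a column-like container (not a Modin class)."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        return Column(v * other for v in self.values)

    def __add__(self, other):
        return Column(v + other for v in self.values)

    def __imul__(self, other):
        # The bug class described by FIX-#7379 would delegate to __add__ here.
        # The in-place form must agree with __mul__.
        self.values = (self * other).values
        return self


def check_inplace_mul_matches_mul(values, factor):
    """Return True when `x *= factor` gives the same result as `x * factor`."""
    expected = (Column(values) * factor).values
    col = Column(values)
    col *= factor
    return col.values == expected
```

Comparing the in-place result against the out-of-place one, rather than against a hard-coded value, keeps the check valid for any input data.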

Contributors

@MortalHappiness
@Retribution98
@YarShev
@ZhipengXue97
@anmyachev
@arunjose696
@devin-petersohn
@likawind
@sfc-gh-joshi
@sfc-gh-mvashishtha

Modin 0.31.0

26 Jun 16:04
0.31.0
c8bbca8

First release compatible with NumPy 2.0.

Key Features and Updates Since 0.30.0

  • Stability and Bugfixes
    • FIX-#7138: Stop reloading modules for custom docstrings (#7307)
    • FIX-#7263: Empty docstrings should not be inherited (#7264)
    • FIX-#7272: Remove HDK engine (#7275)
    • FIX-#7277: Remove Cudf storage format as unmaintained (#7290)
    • FIX-#7278: Make sure enable_logging decorator preserves type hints (#7279)
    • FIX-#7292: Prepare Modin code to NumPy 2.0 (#7293)
    • FIX-#7295: Unpin numexpr to allow versions >= 2.8.4 to match pandas (#7296)
    • FIX-#7309: Update versioneer with versioneer install --vendor (#7311)
    • FIX-#7320: Bump the github-actions group with 3 updates (#7319)
    • FIX-#7321: Using C engine instead of pyarrow for getting metadata in read_csv (#7322)
  • Performance enhancements
    • PERF-#7299: Avoid using synchronize_labels for combine function (#7300)
  • Refactor Codebase
    • REFACTOR-#7271: Remove instance_type attribute of axis partitions (#7268)
    • REFACTOR-#7273: Remove deprecated functions from utils.py, accessor.py and io.py (#7274)
    • REFACTOR-#7285: Remove deprecated configs (#7286)
    • REFACTOR-#7294: Reduce access of _modin_frame methods from _query_compiler (#7297)
    • REFACTOR-#7313: Add similar methods as in #7294 for operating on columns (#7314)
  • Update testing suite
    • TEST-#0000: Add a Dependabot config to auto-update GitHub action versions (#7318)
    • TEST-#7316: Run a subset of CI tests with python 3.10 and 3.11 on a scheduled basis (#7289)
  • Documentation improvements
    • DOCS-#0000: Adds RunLLM widget to docs (#7326)
    • DOCS-#7287: Update Modin on Dask documentation (#7288)
  • New Features
    • FEAT-#6574: UserWarning no longer displayed when Series/DataFrames are small (#7323)
    • FEAT-#7249: Add reload_modin feature (#7280)
    • FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
    • FEAT-#7283: Introduce MinRowPartitionSize and MinColumnPartitionSize (#7284)
    • FEAT-#7310: NumPy 2.0 support (#7312)
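FIX-#7278 in the list above is about a decorator that keeps the type hints of the function it wraps. A minimal sketch of that general technique follows. It assumes nothing about Modin's `enable_logging` internals: `log_calls`, `calls`, and `scale` are hypothetical names, and the list stands in for a real logger.

```python
import functools
from typing import Any, Callable, TypeVar, cast, get_type_hints

F = TypeVar("F", bound=Callable[..., Any])

calls = []  # records decorated calls; a stand-in for a real logger


def log_calls(func: F) -> F:
    """Hypothetical logging decorator (not Modin's enable_logging)."""

    @functools.wraps(func)  # copies name, docstring, and annotations
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        calls.append(func.__name__)
        return func(*args, **kwargs)

    # Declaring the return type as F lets static checkers keep the
    # original signature instead of seeing an untyped wrapper.
    return cast(F, wrapper)


@log_calls
def scale(x: int, factor: float = 2.0) -> float:
    return x * factor


# The runtime annotations of the decorated function match the original.
hints = get_type_hints(scale)
```

`functools.wraps` handles the runtime metadata, while the `F -> F` annotation handles static analysis. Both are needed for the hints to survive in editors and at runtime.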

Contributors

@Jayson729
@Retribution98
@YarShev
@anmyachev
@arunjose696
@kurtmckee
@sfc-gh-dpetersohn
@vsreekanti

Modin 0.30.1

10 Jun 12:51
0.30.1
52fca1c

This release pins numpy<2.
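To see whether an environment falls under this pin, a check along the following lines may help. It is a simplified sketch using only the standard library: the helper names are hypothetical, and the comparison looks only at the major version number, so real tooling should prefer a full version parser.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def satisfies_numpy_lt_2(version: str) -> bool:
    """True when `version` falls under the `numpy<2` pin (major-only check)."""
    return major_version(version) < 2


def installed_numpy_satisfies_pin() -> Optional[bool]:
    """Check the installed NumPy, or return None when NumPy is not installed."""
    try:
        return satisfies_numpy_lt_2(metadata.version("numpy"))
    except metadata.PackageNotFoundError:
        return None
```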

Key Features and Updates Since 0.30.0

Contributors

@anmyachev

Modin 0.29.1

10 Jun 16:01
0.29.1
ff6b30c

This release pins numpy<2.

Key Features and Updates Since 0.29.0

  • Stability and Bugfixes
  • New Features
    • FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors

@anmyachev
@sfc-gh-dpetersohn

Modin 0.28.3

10 Jun 18:38
0.28.3
1809a0a

This release pins numpy<2.

Key Features and Updates Since 0.28.2

  • Stability and Bugfixes
  • New Features
    • FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors

@anmyachev
@sfc-gh-dpetersohn

Modin 0.27.1

10 Jun 21:16
0.27.1
427f515

This release pins numpy<2.

Key Features and Updates Since 0.27.0

  • Stability and Bugfixes
  • New Features
    • FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors

@anmyachev
@dchigarev
@sfc-gh-dpetersohn

Modin 0.30.0

15 May 10:28
0.30.0
51b0a78

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
a more efficient implementation of internal operators (which gives a performance boost to almost all distributed Modin functions),
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.

Key Features and Updates Since 0.29.0

  • Stability and Bugfixes
    • FIX-#0000: Fix badge in README.md (#7213)
    • FIX-#0000: Make merge tests more stable by sorting results (#7266)
    • FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
    • FIX-#7093: Make sure idxmax and idxmin can work with string columns (#7193)
    • FIX-#7102: Remove enable_api_only mode in modin logging (#7194)
    • FIX-#7103: Move lower-level functionality logging to debug (#7184)
    • FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
    • FIX-#7185: Add extra check for some config classes (#7189)
    • FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
    • FIX-#7206: Make sure df.melt handles duplicate value_vars correctly (#7208)
    • FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
    • FIX-#7221: Don't use use_legacy_dataset=False for ParquetDataset (#7222)
    • FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
    • FIX-#7233: Display property name in default_to_pandas error messages (#7269)
    • FIX-#7234: Deprecate HDK engine (#7235)
    • FIX-#7238: Fix docstring inheritance for cached_property and use it (#7239)
    • FIX-#7240: Allow doc_checker.py to work with functools.cached_property (#7241)
    • FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
    • FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types (#7237)
    • FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
  • Performance enhancements
    • PERF-#7227: Call modin_frame.combine() for merge and join only when necessary (#7228)
    • PERF-#7230: Don't preserve bad partition for merge (#7229)
  • Refactor Codebase
    • REFACTOR-#7242: Add type hints for modin/core/dataframe/algebra/ (#7243)
    • REFACTOR-#7260: Use extract_dtype internal function in more places (#7261)
  • Update testing suite
    • TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
    • TEST-#7191: Fix ASV after changing default branch (#7190)
  • Documentation improvements
    • DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
    • DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
    • DOCS-#7217: Update docs as to when Modin operators work best (#7218)
    • DOCS-#7255: Update docs as to from_* functions (#7256)
  • New Features
    • FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
    • FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
    • FEAT-#6492: Add from_map feature to create dataframe (#7215)
    • FEAT-#6498: Make Fold operator more flexible (#7257)
    • FEAT-#6808: Implement __arrow_array__ for Series (#7200)
    • FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
    • FEAT-#7139: Use ray-core instead of ray-default (#6955)
    • FEAT-#7187: Change master branch to main (#7188)
    • FEAT-#7202: Use custom resources for Ray (#7205)
    • FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
    • FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
    • FEAT-#7252: Add type hints for base.py (#7253)
    • FEAT-#7254: Support right merge/join (#7226)
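For readers unfamiliar with the semantics behind FEAT-#7254, a right join keeps every row of the right-hand table and attaches matching left-hand columns where a key matches. The pure-Python sketch below illustrates only those semantics on lists of dicts. It says nothing about how Modin distributes the operation; `right_join` is a hypothetical helper, and it assumes the two inputs share no column names other than the key.

```python
def right_join(left, right, key):
    """Keep every row of `right`; attach matching `left` columns or None."""
    left_columns = sorted({c for row in left for c in row if c != key})

    # Index the left rows by key so each right row is matched in one lookup.
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)

    joined = []
    for r in right:
        # An unmatched right row still appears, with left columns set to None.
        matches = index.get(r[key]) or [dict.fromkeys(left_columns)]
        for match in matches:
            merged = {c: match.get(c) for c in left_columns}
            merged.update(r)
            joined.append(merged)
    return joined


left_rows = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right_rows = [{"id": 2, "b": 10}, {"id": 3, "b": 20}]
result = right_join(left_rows, right_rows, "id")
```

Here the left row with `id` 1 is dropped because no right row carries that key, while the right row with `id` 3 is kept with `a` set to `None`.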

Contributors

@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew

Modin 0.29.0

15 Apr 18:05
0.29.0
6d64e08

This release introduces the modin.pandas.testing and modin.pandas.arrays modules, a faster (range-partitioning) implementation of the
pivot_table, unique, drop_duplicates, nunique, and df.resample functions, new functions for interacting with Dask (to/from_dask),
a distributed implementation of Series.case_when, and an optimization of the astype function for a scalar dtype.
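The general idea behind range-partitioning is to bucket rows by key range so that equal keys always land in the same partition, which lets each partition be processed independently. The sketch below illustrates that idea for a `unique`-style operation in plain Python. It is not Modin's implementation: the function names and the pivot selection are assumptions, and unlike pandas `unique` it returns values in sorted order rather than order of appearance.

```python
import bisect


def choose_pivots(values, num_partitions):
    """Pick num_partitions - 1 split points from a sorted copy of the data."""
    ordered = sorted(values)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def range_partition(values, pivots):
    """Send each value to the bucket whose key range contains it."""
    buckets = [[] for _ in range(len(pivots) + 1)]
    for v in values:
        buckets[bisect.bisect_right(pivots, v)].append(v)
    return buckets


def unique_via_range_partitioning(values, num_partitions=3):
    """Deduplicate per bucket; equal values always share a bucket."""
    if not values:
        return []
    pivots = choose_pivots(values, num_partitions)
    result = []
    for bucket in range_partition(values, pivots):
        # Each bucket could be handled by a separate worker.
        result.extend(sorted(set(bucket)))
    return result


data = [5, 1, 3, 5, 2, 1, 9, 3]
uniques = unique_via_range_partitioning(data)
```

Because no value can appear in two buckets, the per-bucket results can simply be concatenated with no final cross-partition deduplication step.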

Key Features and Updates Since 0.28.0

  • Stability and Bugfixes
    • FIX-#6227: Make sure Series.unique() with pyarrow dtype returns ArrowExtensionArray (#7042)
    • FIX-#6793: Use pandas_dtype instead of np.dtype for some more places in Modin code (#6794)
    • FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
    • FIX-#7051: Update exception message for astype function (#7052)
    • FIX-#7054: Update exception message for shift function (#7055)
    • FIX-#7056: Update exception message for iloc/loc functions (#7057)
    • FIX-#7058: Update exception message for insert function (#7059)
    • FIX-#7060: Fix pivot when index or columns are of Index type (#7061)
    • FIX-#7062: Update exception message for aggregate function (#7063)
    • FIX-#7072: Replace MaterializationHook with the materialized object on serialization (#7075)
    • FIX-#7088: Make sure rank raises No axis named None... exception (#7089)
    • FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
    • FIX-#7135: Fix appending a new row (#7172)
    • FIX-#7153: Fix Series.corr with method != pearson (#7158)
    • FIX-#7157: Make sure quantile function works with numeric_only=True (#7160)
    • FIX-#7170: Don't use MinPartitionSize configuration variable in remote context (#7177)
  • Performance enhancements
    • PERF-#5296: Partition parquet file if it has too few row groups (#7016)
    • PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
    • PERF-#7123: Preserve shape_hint for dropna (#7124)
    • PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
    • PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
    • PERF-#7150: Reduce peak memory consumption (#7149)
  • Refactor Codebase
    • REFACTOR-#3257: Move logging and caching to the gen_data internal function (#7046)
    • REFACTOR-#7105: Deprecate cfg.RangePartitioningGroupby (#7161)
    • REFACTOR-#7106: Rename from/to_ray_dataset to from/to_ray (#7107)
    • REFACTOR-#7109: Remove the outdated aws_example.yaml file (#7110)
  • Update testing suite
    • TEST-#3622: Centralize tests in Modin (#7137)
    • TEST-#6016: Make sure eval_general doesn't expect exceptions by default (#6954)
    • TEST-#7064: Explicitly check for exceptions in test_groupby.py (#7065)
    • TEST-#7066: Explicitly check for exceptions in test_io.py (#7067)
    • TEST-#7073: Explicitly check for exceptions in test_default.py (#7074)
    • TEST-#7076: Explicitly check for exceptions in test_map_metadata.py (#7077)
    • TEST-#7082: Explicitly check for exceptions in test_series.py (#7083)
    • TEST-#7084: Explicitly check for exceptions in test_indexing.py (#7085)
    • TEST-#7086: Explicitly check for exceptions in test_reduce.py (#7087)
    • TEST-#7094: Rename raising_exceptions argument of eval_general testing function (#7095)
    • TEST-#7125: Explicitly install modin in CI tests (#7126)
    • TEST-#7165: Add codecov token to fix CI on master (#7175)
    • TEST-#7166: Fix HDF tests in CI (#7167)
    • TEST-#7173: Update github actions (#7168)
  • Documentation improvements
    • DOCS-#2434: Clarify the use of --signoff option (#7145)
    • DOCS-#6987: Rework range-partitioning docs (#7169)
    • DOCS-#7144: Add information about logging from user defined function (#7155)
  • New Features
    • FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
    • FEAT-#6783: Implement modin.pandas.testing module (#7045)
    • FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
    • FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
    • FEAT-#7021: Implement to/from_dask functions (#7022)
    • FEAT-#7047: Add range-partitioning implementation for .pivot_table() (#7048)
    • FEAT-#7070: Add modin.pandas.arrays module (#7071)
    • FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
    • FEAT-#7090: Add range-partitioning implementation for .unique() and .drop_duplicates() (#7091)
    • FEAT-#7100: Add range-partitioning impl for nunique() (#7101)
    • FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
    • FEAT-#7111: Implemented @remote_function decorator with cache (#7112)
    • FEAT-#7117: Support building range-partitioning from an index level (#7120)
    • FEAT-#7118: Add range-partitioning impl for df.resample() (#7140)
    • FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
    • FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
    • FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
    • FEAT-#7156: Add type hints for Series (#7154)
    • FEAT-#7178: Add type hints for DataFrame (#7179)
    • FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)

Contributors

@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha

Modin 0.28.2

12 Apr 08:46
0.28.2
caed912

This release reverts the pandas requirement from 2.2.1 to >=2.2,<2.3.

Key Features and Updates Since 0.28.1

  • New Features

Contributors

@sfc-gh-mvashishtha

Modin 0.28.1

09 Apr 23:04
0.28.1
eac21a8

This release pins pandas to 2.2.1. This pin will be removed in a subsequent release.

Key Features and Updates Since 0.28.0

  • New Features
    • FEAT-#7162: Pin pandas to 2.2.1 (87d147f)

Contributors

@sfc-gh-dpetersohn