v2.3.1

@aluu317 released this 23 Dec 16:57
3ec30a0

Summary of changes in this release

New feature updates around data handling and preprocessing:

  • Enable loading of Parquet and Arrow Dataset files.
  • Dataset mixing via sampling probabilities in data config.
  • New additional_data_handlers argument in the train function for registering custom data handlers with the data preprocessor.
  • Support multiple files, directories, pattern-based paths, HF Dataset IDs, and their combinations via data_config.
  • New support for both multi-turn and single-turn chat interactions.
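
To illustrate what mixing datasets via sampling probabilities means, here is a minimal standalone sketch. It is not the implementation from this release: `mix_datasets` and its parameters are hypothetical names, the datasets are plain Python lists, and records are drawn with replacement. It only shows the idea that each draw first picks a source dataset according to its configured probability.

```python
import random
from typing import List, Sequence


def mix_datasets(
    datasets: Sequence[Sequence[dict]],
    probabilities: Sequence[float],
    num_samples: int,
    seed: int = 42,
) -> List[dict]:
    """Hypothetical helper: draw num_samples records, choosing the source
    dataset for each draw according to probabilities (with replacement)."""
    if len(datasets) != len(probabilities):
        raise ValueError("need exactly one probability per dataset")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("sampling probabilities must sum to 1.0")

    rng = random.Random(seed)
    mixed = []
    for _ in range(num_samples):
        # Pick the source dataset first, then a record inside it.
        source = rng.choices(range(len(datasets)), weights=probabilities, k=1)[0]
        mixed.append(rng.choice(datasets[source]))
    return mixed


# Toy data: two sources mixed at roughly 80% / 20%.
chat = [{"source": "chat", "id": i} for i in range(100)]
code = [{"source": "code", "id": i} for i in range(100)]
mixed = mix_datasets([chat, code], probabilities=[0.8, 0.2], num_samples=1000)
chat_share = sum(r["source"] == "chat" for r in mixed) / len(mixed)
print(f"share of chat records: {chat_share:.2f}")
```

With a fixed seed the mix is reproducible, and the observed share of each source approaches its configured probability as the number of samples grows.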

New tracker:

  • New MLflow tracker

Additional Changes

  • Refactor test artifacts into tests/artifacts, adding new data types, datasets, and predefined data configs for new unit tests.
  • Resolve issues with deprecated training arguments.

Full list of Changes

  • feat: Add support to handle Parquet Dataset files via data config by @Abhishek-TAMU in #401
  • test: add arrow datasets and arrow unit tests by @willmj in #403
  • feat: Perform dataset mixing via sampling probabilities in data config by @dushyantbehl in #408
  • feat: Expose additional data handlers as an argument in train by @dushyantbehl in #409
  • fix: Move deprecated positional arguments from SFTTrainer to SFTConfig by @Luka-D in #399
  • fix: update dataclass objects directly instead of creating new variables by @kmehant in #418
  • test: Add unit tests to test multiple files in single dataset by @Abhishek-TAMU in #412
  • feat: Add multi and single turn chat support by @dushyantbehl in #415
  • feat: Integrate MLflow tracker by @dushyantbehl in #425
  • feat: Handle passing of multiple files, multiple folders, path with patterns, HF Dataset and combination by @Abhishek-TAMU in #424
  • docs: Add documentation for data preprocessor release by @dushyantbehl in #423
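
As a rough picture of what handling multiple files, folders, and pattern-based paths (#424) involves, the sketch below expands a mixed list of path entries into concrete files using only the standard library. It is an assumption-laden illustration, not code from this project: `resolve_data_paths` and `SUPPORTED_EXTENSIONS` are hypothetical, the extension list is a guess, and HF Dataset IDs are out of scope here (any entry that does not resolve to a file on disk is dropped).

```python
import glob
import os
import tempfile
from typing import List

# Assumed set of file types for this sketch only.
SUPPORTED_EXTENSIONS = (".json", ".jsonl", ".parquet", ".arrow")


def resolve_data_paths(paths: List[str]) -> List[str]:
    """Hypothetical helper: expand files, directories, and glob patterns
    into a flat list of supported data files, preserving entry order."""
    resolved = []
    for path in paths:
        if os.path.isdir(path):
            # A directory contributes its direct children, sorted by name.
            candidates = [os.path.join(path, n) for n in sorted(os.listdir(path))]
        elif any(ch in path for ch in "*?["):
            # Treat entries with wildcard characters as glob patterns.
            candidates = sorted(glob.glob(path))
        else:
            candidates = [path]
        resolved.extend(
            c for c in candidates
            if os.path.isfile(c) and c.endswith(SUPPORTED_EXTENSIONS)
        )
    return resolved


# Usage on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.jsonl", "b.parquet", "notes.txt"):
        open(os.path.join(root, name), "w").close()
    print([os.path.basename(p) for p in resolve_data_paths([root])])
    print([os.path.basename(p) for p in resolve_data_paths([os.path.join(root, "*.jsonl")])])
```

The sketch does not deduplicate, so a file matched by two entries appears twice; whether that is desirable depends on how the datasets are meant to be weighted.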

New Contributors

Full Changelog: v2.2.0...v2.3.1