Releases: foundation-model-stack/fms-hf-tuning

v2.5.0

27 Jan 23:27
6f9bab2

Image: quay.io/modh/fms-hf-tuning:v2.5.0
In v2.5.0, the fms-hf-tuning library is built with Python 3.12. See the support update below for details.
Other noteworthy updates in this release:

New tracker:

  • New tracker using HFResourceScanner to enable lightweight tracking of memory usage and train time during training.
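As a rough illustration of what lightweight tracking of memory usage and train time can look like, here is a minimal standard-library sketch. It does not use HFResourceScanner and says nothing about that library's API: the names `track_resources` and `fake_train_step` are hypothetical, and `tracemalloc` only sees Python heap allocations, not GPU memory.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_resources(metrics: dict):
    """Record wall-clock time and peak Python heap usage for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield metrics
    finally:
        metrics["train_time_s"] = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
        metrics["peak_mem_bytes"] = peak
        tracemalloc.stop()


def fake_train_step(n: int) -> int:
    # Hypothetical stand-in for a training loop: allocate a list and reduce it.
    data = list(range(n))
    return sum(data)


metrics = {}
with track_resources(metrics):
    result = fake_train_step(100_000)

print(metrics)
```

A real tracker would hook into the trainer's callbacks rather than wrap the whole run, but the measure-around-the-work pattern is the same.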

Support update:

  • Support has been tested and extended to Python 3.12. fms-hf-tuning now runs on Python 3.9, 3.10, 3.11, and 3.12.
  • The Dockerfile now uses Python 3.12 by default.
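To check whether a local interpreter falls in the range listed above, a small guard like the following could be used. This is a sketch only: the helper `is_supported` is hypothetical and not part of fms-hf-tuning, and the bounds are copied from the bullets above.

```python
import sys

# Supported range taken from the release note: Python 3.9 through 3.12.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported(version: tuple) -> bool:
    """Return True if the (major, minor) part of version is in the supported range."""
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


current = tuple(sys.version_info[:2])
status = "supported" if is_supported(current) else "outside the tested range"
print(f"Python {current[0]}.{current[1]} is {status}")
```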

What's Changed

New Contributors

Full Changelog: v2.4.0...v2.5.0

v2.4.0

16 Jan 19:43
76bd76d

Image released: quay.io/modh/fms-hf-tuning:v2.4.0

Summary of Changes

Acceleration Updates:

  • Added dataclass args for accelerated MoE tuning, activated with the new integer flag fast_moe, which sets the number of expert-parallel shards.
  • Renamed the function requires_agumentation to requires_augmentation.
  • Note: the minimum required version of the fms-acceleration library is now 0.6.0.
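To make the idea of an integer flag that sets the expert-parallel shard count concrete, here is a hedged sketch. The class `FastMoeArgs`, the helper `experts_per_shard`, the `>= 1` validation, and the even-split requirement are all assumptions made for illustration; the actual dataclass in fms-hf-tuning may be named and structured differently.

```python
from dataclasses import dataclass


@dataclass
class FastMoeArgs:
    # Hypothetical dataclass mirroring the integer fast_moe flag.
    # Interpreted here as the number of expert-parallel shards.
    fast_moe: int = 1

    def __post_init__(self):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.fast_moe, int) or isinstance(self.fast_moe, bool):
            raise TypeError("fast_moe must be an int")
        if self.fast_moe < 1:
            raise ValueError("fast_moe must be >= 1")


def experts_per_shard(num_experts: int, args: FastMoeArgs) -> int:
    """Split experts evenly across shards; reject uneven splits (an assumption)."""
    if num_experts % args.fast_moe != 0:
        raise ValueError("num_experts must be divisible by fast_moe")
    return num_experts // args.fast_moe


# Example: 8 experts sharded 2 ways gives 4 experts per shard.
print(experts_per_shard(8, FastMoeArgs(fast_moe=2)))
```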

Data Preprocessor Updates:

  • Allows the padding-free plugin to be used without a response template.
  • Allows HF dataset IDs to be passed via the training_data_path flag.

Additional Changes:

  • Adds pad_token to special_tokens_dict when pad_token == eos_token, which improves Granite 3.0 and 3.1 quality on the tuning stack.

For full details of changes, see the release notes.
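The pad_token change can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `ToyTokenizer` is a hypothetical stand-in for a real tokenizer, and `"<PAD>"` is a placeholder value that may differ from the default the library actually uses.

```python
# Hypothetical placeholder; the library's actual default pad token may differ.
DEFAULT_PAD_TOKEN = "<PAD>"


class ToyTokenizer:
    """Minimal stand-in exposing only the two attributes this sketch reads."""

    def __init__(self, pad_token, eos_token):
        self.pad_token = pad_token
        self.eos_token = eos_token


def build_special_tokens_dict(tokenizer) -> dict:
    """Request a dedicated pad token when pad and EOS are the same token."""
    special_tokens_dict = {}
    if tokenizer.pad_token == tokenizer.eos_token:
        special_tokens_dict["pad_token"] = DEFAULT_PAD_TOKEN
    return special_tokens_dict


# Pad and EOS collide, so a separate pad token is requested.
print(build_special_tokens_dict(ToyTokenizer("</s>", "</s>")))
# Pad and EOS already differ, so nothing is added.
print(build_special_tokens_dict(ToyTokenizer("<pad>", "</s>")))
```

With a Hugging Face tokenizer, a dict like this would typically be passed to `tokenizer.add_special_tokens(...)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix covers the new token.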

Full List of Changes

  • fix: broken README.md link by @dushyantbehl in #429
  • feat: Allow hf dataset id to be passed via training_data_path by @dushyantbehl in #431
  • feat: dataclass args for accelerated MoE tuning by @willmj in #390
  • feat: allow for padding free plugin to be used without response template by @dushyantbehl in #430
  • fix: function name from requires_agumentation to requires_augmentation by @willmj in #434
  • fix: Add pad_token to special_tokens_dict when pad_token == eos_token by @Abhishek-TAMU in #436
  • chore(deps): upgrade fms-acceleration to >= 0.6 by @willmj in #440
  • docs: update granite3 model support by @anhuong in #441

Full Changelog: v2.3.1...v2.4.0

v2.4.0-rc.2

16 Jan 18:37
d03072b
Pre-release

What's Changed

  • fix: broken README.md link by @dushyantbehl in #429
  • feat: Allow hf dataset id to be passed via training_data_path by @dushyantbehl in #431
  • feat: dataclass args for accelerated MoE tuning by @willmj in #390
  • feat: allow for padding free plugin to be used without response template by @dushyantbehl in #430
  • fix: function name from requires_agumentation to requires_augmentation by @willmj in #434
  • fix: Add pad_token to special_tokens_dict when pad_token == eos_token by @Abhishek-TAMU in #436
  • chore(deps): upgrade fms-acceleration to >= 0.6 by @willmj in #440
  • docs: update granite3 model support by @anhuong in #441

Full Changelog: v2.3.0...v2.4.0-rc.2

v2.4.0-rc.1

16 Jan 14:07
24f7e42
Pre-release
add tokens to special_tokens_dict (#436)

Signed-off-by: Abhishek

v2.3.1

23 Dec 16:57
3ec30a0

Summary of changes in this release

Image released: quay.io/modh/fms-hf-tuning:v2.3.1

New feature updates around data handling and preprocessing:

  • Enable loading of Parquet and Arrow Dataset files.
  • Dataset mixing via sampling probabilities in data config.
  • New additional_data_handlers argument in the train function; the handlers passed in are registered with the data preprocessor.
  • Support multiple files, directories, pattern-based paths, HF Dataset IDs, and their combinations via data_config.
  • New support for both multi-turn and single-turn chat interactions.
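Dataset mixing via sampling probabilities can be pictured with the small standard-library sketch below. The function `mix_datasets` and the toy `chat` and `code` datasets are hypothetical and only show the sampling idea; the library's data config format and implementation may differ. Hugging Face `datasets` offers `interleave_datasets` with a `probabilities` argument for the same purpose on real datasets.

```python
import random


def mix_datasets(datasets, probabilities, num_samples, seed=0):
    """Draw num_samples examples, picking the source dataset by probability."""
    if len(datasets) != len(probabilities):
        raise ValueError("need one probability per dataset")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)
    mixed = []
    for _ in range(num_samples):
        # Choose which dataset to sample from, then pick one example from it.
        source = rng.choices(range(len(datasets)), weights=probabilities, k=1)[0]
        mixed.append(rng.choice(datasets[source]))
    return mixed


# Two toy datasets mixed 80/20.
chat = [{"src": "chat", "id": i} for i in range(5)]
code = [{"src": "code", "id": i} for i in range(5)]
mixed = mix_datasets([chat, code], [0.8, 0.2], num_samples=1000)
chat_share = sum(ex["src"] == "chat" for ex in mixed) / len(mixed)
print(f"share drawn from chat: {chat_share:.2f}")
```

This sketch samples with replacement, so small datasets are repeated; a production mixer would also need a stopping rule for when a source is exhausted.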

New tracker:

  • New MLflow tracker.

Additional Changes

  • Refactor test artifacts into tests/artifacts, adding new data types, datasets, and predefined data configs for new unit tests.
  • Resolve issues with deprecated training arguments.

Full list of Changes

  • feat: Add support to handle Parquet Dataset files via data config by @Abhishek-TAMU in #401
  • test: add arrow datasets and arrow unit tests by @willmj in #403
  • feat: Perform dataset mixing via sampling probabilities in data config by @dushyantbehl in #408
  • feat: Expose additional data handlers as an argument in train by @dushyantbehl in #409
  • fix: Move deprecated positional arguments from SFTTrainer to SFTConfig by @Luka-D in #399
  • fix: update dataclass objects directly instead of creating new variables by @kmehant in #418
  • test: Add unit tests to test multiple files in single dataset by @Abhishek-TAMU in #412
  • feat: Add multi and single turn chat support by @dushyantbehl in #415
  • feat: Integrate MLflow tracker by @dushyantbehl in #425
  • feat: Handle passing of multiple files, multiple folders, path with patterns, HF Dataset and combination by @Abhishek-TAMU in #424
  • docs: Add documentation for data preprocessor release by @dushyantbehl in #423

New Contributors

Full Changelog: v2.2.0...v2.3.1

v2.3.0

23 Dec 16:51
594dd37

This release is missing the correct README docs; please see v2.3.1 for the complete release changelog.

v2.3.0-rc.1

19 Dec 21:06
d7f06f5
Pre-release

What's Changed

  • feat: Add support to handle Parquet Dataset files via data config by @Abhishek-TAMU in #401
  • test: add arrow datasets and arrow unit tests by @willmj in #403
  • feat: Perform dataset mixing via sampling probabilities in data config by @dushyantbehl in #408
  • feat: Expose additional data handlers as an argument in train by @dushyantbehl in #409
  • fix: Move deprecated positional arguments from SFTTrainer to SFTConfig by @Luka-D in #399
  • fix: update dataclass objects directly instead of creating new variables by @kmehant in #418
  • test: Add unit tests to test multiple files in single dataset by @Abhishek-TAMU in #412
  • feat: Add multi and single turn chat support by @dushyantbehl in #415
  • feat: Integrate MLflow tracker by @dushyantbehl in #425
  • feat: Handle passing of multiple files, multiple folders, path with patterns, HF Dataset and combination by @Abhishek-TAMU in #424

New Contributors

Full Changelog: v2.2.1...v2.3.0

v2.2.1

06 Dec 19:26
054a985

Image released: quay.io/modh/fms-hf-tuning:v2.2.1

Foundational Updates

  • Adds a new data preprocessor framework as the code base for future enhancements, while maintaining full compatibility with existing features.

Additional Changes

  • Added a Data Preprocessor ADR.
  • Moved test datasets from tests/data to tests/artifacts/testdata.

Full list of Changes

Full Changelog: v2.1.2...v2.2.1

v2.2.0

06 Dec 19:01
054a985

This version has a dependency that is incompatible with the repo; users should move to v2.2.1 instead.

v2.2.0-rc.1

05 Dec 01:06
7df3416
Pre-release

What's Changed

Full Changelog: v2.1.2-rc.1...v2.2.0-rc.1