Releases: huggingface/datasets

2.14.7

15 Nov 08:19
bf02cff

Bug Fixes

New Contributors

Full Changelog: 2.14.6...2.14.7

2.14.6

24 Oct 08:15
06c3ffb

What's Changed

New Contributors

Full Changelog: 2.14.5...2.14.6

2.14.5

24 Oct 08:15
1a598a0

Bug fixes

Other improvements

New Contributors

Full Changelog: 2.14.4...2.14.5

2.13.2

06 Sep 08:29
98b1bdd

Bug fixes

Full Changelog: 2.13.1...2.13.2

2.14.4

08 Aug 15:52
53d55f3

Bug fixes

Full Changelog: 2.14.3...2.14.4

2.14.3

03 Aug 10:31
33f736e

Bug fixes

Full Changelog: 2.14.2...2.14.3

2.14.2

31 Jul 06:39

Bug fixes

Full Changelog: 2.14.1...2.14.2

2.14.1

27 Jul 17:09
029956a

Bug fixes

Other improvements

Full Changelog: 2.14.0...2.14.1

2.14.0

24 Jul 15:54
88896a7

Important: caching

  • Datasets downloaded and cached using datasets>=2.14.0 may not be reloaded from cache by older versions of datasets (and are therefore re-downloaded).
  • Datasets that were already cached are still supported.
  • This affects datasets on the Hugging Face Hub that have no dataset script, e.g. those made only of Parquet, CSV, JSONL, etc. files.
  • This is because the default configuration name for those datasets was changed from "username--dataset_name" to "default" in #5331.
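The rule above can be sketched in plain Python. This is only an illustration: `default_config_name` and `may_redownload` are hypothetical helpers, not part of the datasets API, and deriving the old name by replacing "/" with "--" is an assumption based on the "username--dataset_name" pattern quoted above.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.14.0' into (2, 14, 0); assumes plain numeric release versions."""
    return tuple(int(part) for part in version.split("."))


def default_config_name(repo_id: str, datasets_version: str) -> str:
    """Hypothetical helper: default config name for a script-less Hub dataset."""
    if parse_version(datasets_version) >= (2, 14, 0):
        return "default"
    # Assumption: older releases derived the name from the repository id.
    return repo_id.replace("/", "--")


def may_redownload(cached_with: str, loaded_with: str) -> bool:
    """True when a cache written by `cached_with` may be missed by `loaded_with`."""
    return (
        parse_version(cached_with) >= (2, 14, 0)
        and parse_version(loaded_with) < (2, 14, 0)
    )


old_name = default_config_name("username/dataset_name", "2.13.1")
new_name = default_config_name("username/dataset_name", "2.14.0")
```

Because the two names differ, a cache written under "default" is not found by a release that looks for the repository-derived name, which is why downgrading can trigger a fresh download.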

Dataset Configuration

  • Support for multiple configs via metadata yaml info by @polinaeterna in #5331

    • Configure your dataset using YAML at the top of your dataset card (docs here)
    • Choose which file goes into which split
      ---
      configs:
      - config_name: default
        data_files:
        - split: train
          path: data.csv
        - split: test
          path: holdout.csv
      ---
    • Define multiple dataset configurations
      ---
      configs:
      - config_name: main_data
        data_files: main_data.csv
      - config_name: additional_data
        data_files: additional_data.csv
      ---
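To try the multi-configuration layout locally, a minimal sketch could write the card and data files to a temporary folder. The CSV contents, the folder name, and the helpers `write_dataset_folder` and `load_config` are made up for illustration; only the YAML header mirrors the example above.

```python
import tempfile
from pathlib import Path

# Same YAML header as in the release note, placed at the top of README.md.
CARD_HEADER = """\
---
configs:
- config_name: main_data
  data_files: main_data.csv
- config_name: additional_data
  data_files: additional_data.csv
---
"""


def write_dataset_folder(root: Path) -> Path:
    """Create two tiny CSV files and a dataset card declaring one config per file."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "main_data.csv").write_text("text,label\nhello,0\nworld,1\n")
    (root / "additional_data.csv").write_text("text,label\nextra,1\n")
    (root / "README.md").write_text(CARD_HEADER)
    return root


def load_config(root: Path, config_name: str):
    """Load one configuration; needs the datasets library (>=2.14.0 assumed)."""
    # Imported lazily so the folder-writing helper stays stdlib-only.
    from datasets import load_dataset

    return load_dataset(str(root), config_name)


folder = write_dataset_folder(Path(tempfile.mkdtemp()) / "my_dataset")
```

With datasets installed, `load_config(folder, "additional_data")` would be expected to return only the rows of additional_data.csv. This assumes local folders honor the card's YAML the same way Hub repositories do, which is worth checking against the installed version.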

Dataset Features

  • Support for multiple configs via metadata yaml info by @polinaeterna in #5331

    • Push additional dataset configurations with push_to_hub()
      ds.push_to_hub("username/dataset_name", config_name="additional_data")
      # reload later
      ds = load_dataset("username/dataset_name", "additional_data")
  • Support returning dataframe in map transform by @mariosasko in #5995

What's Changed

New Contributors

Full Changelog: 2.13.1...2.14.0

2.13.1

22 Jun 18:31
682d21e

General improvements and bug fixes

Full Changelog: 2.13.0...2.13.1