Releases: huggingface/datasets

1.16.1

26 Nov 16:58

Bug fixes

1.16.0

26 Nov 14:22

Datasets Changes

Datasets Features

  • Push to hub capabilities for Dataset and DatasetDict by @LysandreJik in #3098:
    • Upload your dataset to the Hugging Face Hub with the push_to_hub() method
    • See documentation here
  • 200+ datasets now support streaming:
  • Resolve data_files by split name automatically by @lhoestq in #3221
    • File names are used to determine which file goes into which split
    • See documentation here
  • Support batched=True in the filter method by @thomasw21 in #3244
  • Add with_rank argument to map to pass the process rank to the mapped function by @TevenLeScao in #3314

Dataset Cards

Metrics Changes

  • New: OpenAI's pass@k code evaluation metric by @lvwerra in #2916
  • Update: BLEURT - options to use updated bleurt checkpoints by @jaehlee in #3235
  • Update: CER - update to support latest release by @mariosasko in #3252
  • Update: WER - update to the documentation by @wooters in #3278
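
For readers unfamiliar with pass@k, here is a small standalone sketch of the unbiased estimator from the paper that introduced the metric: for a problem with n generated samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). This is not the code shipped in the library; the function name pass_at_k and its argument checks are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples are incorrect, every draw of k samples
    # contains at least one correct sample.
    if n - c < k:
        return 1.0
    # Probability that a random subset of k samples contains no correct one,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(n=10, c=3, k=1))
```

The per-problem values are then averaged over all problems in the benchmark.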

Documentation

Additional improvements and bug fixes

Citation

Deprecations

Full Changelog: 1.15.1...1.16.0

1.15.1

02 Nov 21:47

Dependencies

1.15.0

02 Nov 21:22

Dataset Changes

Dataset Features

Dataset Cards

  • Fill in dataset card for NCBI disease dataset by @edugp in #3115

Metrics Changes

General improvements and bug fixes

1.14.0

19 Oct 16:46

Dataset changes

Dataset features

General improvements and bug fixes

1.13.3

15 Oct 15:50

Dataset changes

Bug fixes

1.13.2

14 Oct 16:02

Bug fixes

1.13.1

14 Oct 12:50

Bug fixes

1.13.0

13 Oct 15:15

Dataset changes

Metric changes

Dataset features

  • Use with TensorFlow:
  • Better support for ZIP files:
    • Support loading dataset from multiple zipped CSV data files #3021 (@albertvillanova)
    • Load private data files + use glob on ZIP archives for json/csv/etc. module inference #3041 (@lhoestq)
  • Streaming improvements:
    • Extend support for streaming datasets that use glob.glob #3015 (@albertvillanova)
    • Add remove_columns to IterableDataset #3030 (@cccntu)
    • All the above ZIP features also work in streaming mode
  • New utilities:
    • Add get_dataset_split_names() to get a dataset config's split names #2906 (@severo)
  • Replace script_version with revision #2933 (@albertvillanova)
    • The script_version parameter in load_dataset is now deprecated, in favor of revision
  • Experimental - Create Audio feature type #2324 (@albertvillanova):
    • Audio data (mp3, wav, flac, etc.) is decoded automatically when examples are accessed

Dataset cards

Documentation

General improvements and bug fixes

Breaking changes:

  • Due to the large refactoring in #2986, the prepare_module function no longer supports the return_resolved_file_path and return_associated_base_path parameters. Use dataset_module_factory instead.

1.12.1

15 Sep 17:45

Bug fixes

  • Fix fsspec AbstractFileSystem access #2915 (@pierre-godard)
  • Fix unwanted tqdm bar when accessing examples #2920 (@lhoestq)
  • Fix conversion of multidim arrays in list to arrow #2922 (@lhoestq):
    • This fixes the "ArrowInvalid: Can only convert 1-dimensional array values" errors