Releases
1.3.0
Dataset Features
On-the-fly data transforms (#1795 )
Add S3 support for downloading and uploading processed datasets (#1723 )
Allow loading dataset in-memory (#1792 )
Support future datasets (#1813 )
Enable/disable caching (#1703 )
Offline dataset loading (#1726 )
Datasets Hub Features
Dataset Changes
New: LJ Speech (#1878 )
New: Hindi Discourse Analysis Natural Language Inference Dataset (#1822 )
New: cord 19 (#1850 )
New: Tweet Eval Dataset (#1829 )
New: CIFAR-100 Dataset (#1812 )
New: SICK (#1804 )
New: BBC Hindi NLI Dataset (#1158 )
New: Freebase QA Dataset (#1814 )
New: Arabic sarcasm (#1798 )
New: Semantic Scholar Open Research Corpus (#1606 )
New: DuoRC Dataset (#1800 )
New: Aggregated dataset for the GEM benchmark (#1807 )
New: CC-News dataset of English language articles (#1323 )
New: irc disentangle (#1586 )
New: Narrative QA Manual (#1778 )
New: Universal Morphologies (#1174 )
New: SILICONE (#1761 )
New: Librispeech ASR (#1767 )
New: OSCAR (#1694 , #1868 , #1833 )
New: CANER Corpus (#1684 )
New: Arabic Speech Corpus (#1852 )
New: id_liputan6 (#1740 )
New: Structured Argument Extraction for Korean dataset (#1748 )
New: TurkCorpus (#1732 )
New: Hatexplain Dataset (#1716 )
New: adversarialQA (#1714 )
Update: Doc2dial - reading comprehension update to latest version (#1816 )
Update: OPUS Open Subtitles - add metadata information (#1865 )
Update: SWDA - use all metadata features (#1799 )
Update: SWDA - add metadata and correct splits (#1749 )
Update: CommonGen - update citation information (#1787 )
Update: SciFact - update URL (#1780 )
Update: BrWaC - update features name (#1736 )
Update: TLC - update URLs to be GitHub links (#1737 )
Update: Ted Talks IWSLT - add new version: WIT3 (#1676 )
Fix: multi_woz_v22 - fix checksums (#1880 )
Fix: limit - fix URL (#1861 )
Fix: WebNLG - fix test set + more fields (#1739 )
Fix: PAWS-X - fix csv DictReader splitting data on quotes (#1763 )
Fix: reuters - add missing "brief" entries (#1744 )
Fix: thainer - fix empty token bug (#1734 )
Fix: lst20 - fix empty token bug (#1734 )
Metrics Changes
New: Word Error Rate (WER) metric (#1847 )
New: COMET (#1577 , #1753 )
Fix: bert_score - set version dependency (#1851 )
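For readers unfamiliar with word error rate, the following is a minimal pure-Python sketch of how the score is usually defined: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference, divided by the number of reference words. It is not the implementation shipped with the metric, and the `word_error_rate` function is a hypothetical helper written for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A score of 0 means the hypothesis matches the reference exactly. Values above 1 are possible when the hypothesis contains many inserted words.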
Metric Docs
Add metrics usage examples and tests (#1820 )
CLI Changes
[BREAKING] Remove outdated commands (#1869 ):
"datasets-cli upload_dataset" and "datasets-cli upload_metric" are removed
Use the huggingface-hub CLI instead
Bug fixes
Fix writing GPU Faiss index (#1862 )
Update pyarrow import warning (#1782 )
Ignore definition line number of functions for caching (#1779 )
Update saving and loading methods for Faiss index to accept path-like objects (#1663 )
Print error message with filename when malformed CSV (#1826 )
Fix default tensors precision when format is set to PyTorch and TensorFlow (#1795 )
Refactoring
Create config module (#1848 )
Use a config id in the cache directory names for custom configs (#1754 )
Logging
Enable logging propagation and remove logging handler (#1845 )