Commit

Release: 1.3.0
lhoestq committed Feb 15, 2021
1 parent 2bcfc91 commit ef633da
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.2.1'
+release = '1.3.0'
 
 
 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -189,7 +189,7 @@
 
 setup(
     name="datasets",
-    version="1.2.1",
+    version="1.3.0",
     description=DOCLINES[0],
     long_description="\n".join(DOCLINES[2:]),
     author="HuggingFace Inc.",
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position
 
-__version__ = "1.2.1"
+__version__ = "1.3.0"
 
 import pyarrow
 from pyarrow import total_allocated_bytes
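
The commit bumps the same version string in three files, so a quick consistency check can catch a missed file. The following is a minimal sketch, not part of the repository: the helper names `extract_version` and `check_versions` and the regular expression are assumptions made for illustration, and the snippets are the post-commit lines copied from the diff above.

```python
import re

# Post-commit lines copied from the three files touched by this commit.
SNIPPETS = {
    "docs/source/conf.py": "release = '1.3.0'",
    "setup.py": 'version="1.3.0",',
    "src/datasets/__init__.py": '__version__ = "1.3.0"',
}

# Hypothetical pattern: matches `release = '...'`, `version="..."`,
# or `__version__ = "..."` and captures the quoted value.
VERSION_RE = re.compile(
    r"""(?:release|version|__version__)\s*=\s*['"]([^'"]+)['"]"""
)


def extract_version(text):
    """Return the first quoted version string found in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def check_versions(snippets):
    """Return (versions per file, True if every file carries the same version)."""
    found = {path: extract_version(text) for path, text in snippets.items()}
    values = set(found.values())
    return found, len(values) == 1 and None not in values


if __name__ == "__main__":
    found, consistent = check_versions(SNIPPETS)
    for path, version in found.items():
        print(f"{path}: {version}")
    print("consistent:", consistent)
```

In practice the snippets would be read from the files themselves rather than hard-coded; that step is left out here to keep the sketch self-contained.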

2 comments on commit ef633da

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.019949 / 0.011353 (0.008596) | 0.017345 / 0.011008 (0.006337) | 0.050023 / 0.038508 (0.011515) | 0.034883 / 0.023109 (0.011774) | 0.229830 / 0.275898 (-0.046068) | 0.255136 / 0.323480 (-0.068344) | 0.010065 / 0.007986 (0.002079) | 0.005199 / 0.004328 (0.000871) | 0.008177 / 0.004250 (0.003927) | 0.049014 / 0.037052 (0.011961) | 0.236284 / 0.258489 (-0.022205) | 0.261142 / 0.293841 (-0.032699) | 0.165642 / 0.128546 (0.037096) | 0.141306 / 0.075646 (0.065660) | 0.477240 / 0.419271 (0.057968) | 0.461068 / 0.043533 (0.417535) | 0.223748 / 0.255139 (-0.031391) | 0.233809 / 0.283200 (-0.049390) | 1.804336 / 0.141683 (1.662653) | 1.848932 / 1.452155 (0.396777) | 2.006523 / 1.492716 (0.513807) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.046119 / 0.037411 (0.008708) | 0.020566 / 0.014526 (0.006040) | 0.027865 / 0.176557 (-0.148691) | 0.050434 / 0.737135 (-0.686701) | 0.034514 / 0.296338 (-0.261825) |
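
Each cell in these benchmark rows has the form `new / old (diff)`, where the value in parentheses appears to be `new - old` computed before rounding to six decimals. The sketch below parses one such row and checks that relationship. The names `parse_row` and `inconsistent` and the tolerance are assumptions chosen for illustration, and the row is the `benchmark_indices_mapping.json` row for PyArrow==0.17.1 from this comment.

```python
import re

# One cell looks like "0.046119 / 0.037411 (0.008708)".
ENTRY_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")

# Row copied from the benchmark_indices_mapping.json table above.
ROW = (
    "new / old (diff) 0.046119 / 0.037411 (0.008708) "
    "0.020566 / 0.014526 (0.006040) 0.027865 / 0.176557 (-0.148691) "
    "0.050434 / 0.737135 (-0.686701) 0.034514 / 0.296338 (-0.261825)"
)


def parse_row(row):
    """Return a list of (new, old, diff) float tuples found in a benchmark row."""
    return [tuple(float(x) for x in m.groups()) for m in ENTRY_RE.finditer(row)]


def inconsistent(entries, tol=2e-6):
    """Return entries whose reported diff is not new - old within tol.

    The tolerance is an assumption: all three numbers are rounded to six
    decimals, so the recomputed difference can be off by up to about 1.5e-6.
    """
    return [
        (new, old, diff)
        for new, old, diff in entries
        if abs((new - old) - diff) > tol
    ]


if __name__ == "__main__":
    entries = parse_row(ROW)
    metrics = ["select", "shard", "shuffle", "sort", "train_test_split"]
    for name, (new, old, diff) in zip(metrics, entries):
        print(f"{name}: new={new} old={old} diff={diff}")
    print("inconsistent entries:", inconsistent(entries))
```

A positive diff means the new run took longer than the stored baseline, and a negative diff means it was faster.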

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.254163 / 0.215209 (0.038954) | 2.790341 / 2.077655 (0.712686) | 1.419527 / 1.504120 (-0.084593) | 1.199908 / 1.541195 (-0.341287) | 1.243872 / 1.468490 (-0.224618) | 7.508085 / 4.584777 (2.923308) | 6.672756 / 3.745712 (2.927044) | 9.173076 / 5.269862 (3.903215) | 8.128024 / 4.565676 (3.562347) | 0.760609 / 0.424275 (0.336334) | 0.012099 / 0.007607 (0.004492) | 0.327569 / 0.226044 (0.101525) | 3.338411 / 2.268929 (1.069482) | 1.991717 / 55.444624 (-53.452908) | 1.602950 / 6.876477 (-5.273526) | 1.686213 / 2.142072 (-0.455860) | 7.402516 / 4.805227 (2.597289) | 6.532465 / 6.500664 (0.031801) | 8.592482 / 0.075469 (8.517013) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 11.216032 / 1.841788 (9.374244) | 14.462852 / 8.074308 (6.388544) | 24.929433 / 10.191392 (14.738041) | 0.498865 / 0.680424 (-0.181558) | 0.312368 / 0.534201 (-0.221833) | 0.993269 / 0.579283 (0.413986) | 0.681252 / 0.434364 (0.246889) | 0.770792 / 0.540337 (0.230454) | 1.480725 / 1.386936 (0.093789) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.019496 / 0.011353 (0.008143) | 0.015292 / 0.011008 (0.004284) | 0.045385 / 0.038508 (0.006877) | 0.034763 / 0.023109 (0.011654) | 0.301736 / 0.275898 (0.025838) | 0.398615 / 0.323480 (0.075135) | 0.006170 / 0.007986 (-0.001815) | 0.004924 / 0.004328 (0.000596) | 0.006154 / 0.004250 (0.001903) | 0.048566 / 0.037052 (0.011513) | 0.338457 / 0.258489 (0.079968) | 0.389982 / 0.293841 (0.096141) | 0.171216 / 0.128546 (0.042670) | 0.123009 / 0.075646 (0.047362) | 0.429963 / 0.419271 (0.010691) | 0.447575 / 0.043533 (0.404042) | 0.289054 / 0.255139 (0.033915) | 0.347451 / 0.283200 (0.064251) | 1.822746 / 0.141683 (1.681063) | 1.756375 / 1.452155 (0.304220) | 1.927460 / 1.492716 (0.434743) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.043009 / 0.037411 (0.005597) | 0.022868 / 0.014526 (0.008343) | 0.028771 / 0.176557 (-0.147785) | 0.050191 / 0.737135 (-0.686944) | 0.038342 / 0.296338 (-0.257997) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.357781 / 0.215209 (0.142572) | 3.518525 / 2.077655 (1.440870) | 1.995800 / 1.504120 (0.491680) | 1.723136 / 1.541195 (0.181941) | 1.839343 / 1.468490 (0.370853) | 7.202282 / 4.584777 (2.617505) | 6.536282 / 3.745712 (2.790570) | 9.221120 / 5.269862 (3.951259) | 7.767144 / 4.565676 (3.201467) | 0.744456 / 0.424275 (0.320181) | 0.011673 / 0.007607 (0.004066) | 0.416478 / 0.226044 (0.190433) | 4.223819 / 2.268929 (1.954890) | 2.634325 / 55.444624 (-52.810299) | 2.159565 / 6.876477 (-4.716911) | 2.113190 / 2.142072 (-0.028882) | 7.264958 / 4.805227 (2.459731) | 5.003957 / 6.500664 (-1.496707) | 5.342462 / 0.075469 (5.266993) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 11.820260 / 1.841788 (9.978473) | 15.613286 / 8.074308 (7.538978) | 23.156926 / 10.191392 (12.965534) | 0.935406 / 0.680424 (0.254982) | 0.622612 / 0.534201 (0.088411) | 0.787531 / 0.579283 (0.208247) | 0.623372 / 0.434364 (0.189009) | 0.725173 / 0.540337 (0.184835) | 1.549181 / 1.386936 (0.162245) |

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.017898 / 0.011353 (0.006545) | 0.016397 / 0.011008 (0.005389) | 0.045385 / 0.038508 (0.006877) | 0.033546 / 0.023109 (0.010437) | 0.215877 / 0.275898 (-0.060021) | 0.242084 / 0.323480 (-0.081396) | 0.008739 / 0.007986 (0.000753) | 0.004764 / 0.004328 (0.000435) | 0.006742 / 0.004250 (0.002492) | 0.048200 / 0.037052 (0.011148) | 0.231986 / 0.258489 (-0.026503) | 0.256165 / 0.293841 (-0.037676) | 0.158609 / 0.128546 (0.030063) | 0.134162 / 0.075646 (0.058516) | 0.437558 / 0.419271 (0.018286) | 0.449347 / 0.043533 (0.405814) | 0.221238 / 0.255139 (-0.033901) | 0.249512 / 0.283200 (-0.033688) | 1.811942 / 0.141683 (1.670259) | 1.847658 / 1.452155 (0.395503) | 1.931587 / 1.492716 (0.438871) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.042345 / 0.037411 (0.004934) | 0.020775 / 0.014526 (0.006249) | 0.088459 / 0.176557 (-0.088097) | 0.048125 / 0.737135 (-0.689010) | 0.030761 / 0.296338 (-0.265578) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.301328 / 0.215209 (0.086119) | 2.999297 / 2.077655 (0.921642) | 1.398605 / 1.504120 (-0.105515) | 1.210151 / 1.541195 (-0.331044) | 1.235344 / 1.468490 (-0.233146) | 7.315245 / 4.584777 (2.730468) | 6.578604 / 3.745712 (2.832892) | 9.064396 / 5.269862 (3.794535) | 7.937175 / 4.565676 (3.371498) | 0.723150 / 0.424275 (0.298875) | 0.011611 / 0.007607 (0.004004) | 0.339609 / 0.226044 (0.113565) | 3.581674 / 2.268929 (1.312745) | 1.942498 / 55.444624 (-53.502126) | 1.634655 / 6.876477 (-5.241822) | 1.648645 / 2.142072 (-0.493427) | 7.491026 / 4.805227 (2.685799) | 6.845047 / 6.500664 (0.344383) | 9.624303 / 0.075469 (9.548834) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 11.897480 / 1.841788 (10.055693) | 15.170692 / 8.074308 (7.096384) | 24.609455 / 10.191392 (14.418063) | 0.815791 / 0.680424 (0.135367) | 0.323032 / 0.534201 (-0.211169) | 0.778658 / 0.579283 (0.199375) | 0.623137 / 0.434364 (0.188773) | 0.713765 / 0.540337 (0.173428) | 1.621621 / 1.386936 (0.234685) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.019426 / 0.011353 (0.008073) | 0.015288 / 0.011008 (0.004280) | 0.045411 / 0.038508 (0.006902) | 0.037313 / 0.023109 (0.014204) | 0.370662 / 0.275898 (0.094764) | 0.412459 / 0.323480 (0.088979) | 0.005875 / 0.007986 (-0.002111) | 0.004826 / 0.004328 (0.000497) | 0.007148 / 0.004250 (0.002898) | 0.046764 / 0.037052 (0.009712) | 0.378878 / 0.258489 (0.120389) | 0.421323 / 0.293841 (0.127482) | 0.156279 / 0.128546 (0.027733) | 0.123898 / 0.075646 (0.048251) | 0.434413 / 0.419271 (0.015141) | 0.430196 / 0.043533 (0.386663) | 0.386343 / 0.255139 (0.131204) | 0.402835 / 0.283200 (0.119636) | 1.718213 / 0.141683 (1.576530) | 1.933982 / 1.452155 (0.481827) | 1.961822 / 1.492716 (0.469106) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.045524 / 0.037411 (0.008112) | 0.021506 / 0.014526 (0.006980) | 0.027273 / 0.176557 (-0.149284) | 0.048325 / 0.737135 (-0.688810) | 0.032765 / 0.296338 (-0.263574) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.343856 / 0.215209 (0.128647) | 3.487789 / 2.077655 (1.410134) | 2.176494 / 1.504120 (0.672374) | 2.008339 / 1.541195 (0.467144) | 2.036302 / 1.468490 (0.567812) | 6.844747 / 4.584777 (2.259970) | 5.861552 / 3.745712 (2.115840) | 8.544176 / 5.269862 (3.274314) | 7.470466 / 4.565676 (2.904789) | 0.684659 / 0.424275 (0.260384) | 0.010645 / 0.007607 (0.003038) | 0.393606 / 0.226044 (0.167562) | 3.938515 / 2.268929 (1.669587) | 2.661777 / 55.444624 (-52.782847) | 2.392768 / 6.876477 (-4.483709) | 2.380001 / 2.142072 (0.237929) | 6.875844 / 4.805227 (2.070617) | 5.285155 / 6.500664 (-1.215509) | 10.637364 / 0.075469 (10.561895) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 12.315626 / 1.841788 (10.473839) | 16.362045 / 8.074308 (8.287736) | 24.485492 / 10.191392 (14.294100) | 0.821775 / 0.680424 (0.141351) | 0.623871 / 0.534201 (0.089670) | 0.771316 / 0.579283 (0.192033) | 0.622454 / 0.434364 (0.188090) | 0.709865 / 0.540337 (0.169527) | 1.638915 / 1.386936 (0.251979) |
