Release: 1.4.0
lhoestq committed Mar 3, 2021
1 parent a100f35 commit f42658e
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.3.0'
+release = '1.4.0'


 # -- General configuration ---------------------------------------------------

2 changes: 1 addition & 1 deletion setup.py
@@ -189,7 +189,7 @@

 setup(
     name="datasets",
-    version="1.3.0",
+    version="1.4.0",
     description=DOCLINES[0],
     long_description="\n".join(DOCLINES[2:]),
     author="HuggingFace Inc.",

2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.3.0"
+__version__ = "1.4.0"

 import pyarrow
 from pyarrow import total_allocated_bytes
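
The diff bumps the same version string in three files. A minimal sketch of how one might check that the three strings agree, using only the changed lines quoted in the diff. This is not part of the commit; `find_version`, `collect_versions`, and `SNIPPETS` are hypothetical names introduced for illustration.

```python
import re
from typing import Dict, Optional, Tuple


def find_version(text: str, key: str) -> Optional[str]:
    """Return the X.Y.Z string assigned to `key` in `text`, or None if absent."""
    # (?<!\w) keeps "version" from matching inside "__version__".
    pattern = rf"(?<!\w){re.escape(key)}\s*=\s*u?['\"](\d+\.\d+\.\d+)['\"]"
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Assumption: each entry is (assignment name, the changed line from the diff).
SNIPPETS: Dict[str, Tuple[str, str]] = {
    "docs/source/conf.py": ("release", "release = '1.4.0'"),
    "setup.py": ("version", 'version="1.4.0",'),
    "src/datasets/__init__.py": ("__version__", '__version__ = "1.4.0"'),
}


def collect_versions(snippets: Dict[str, Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Map each file path to the version string found in its snippet."""
    return {path: find_version(text, key) for path, (key, text) in snippets.items()}


versions = collect_versions(SNIPPETS)
# All three files should carry the same release number.
assert len(set(versions.values())) == 1
print(versions)
```

In a real checkout the snippet text would be read from the files themselves; the inline strings here only keep the example self-contained.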

2 comments on commit f42658e

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.014368 | 0.011353 | 0.003015 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.012464 | 0.011008 | 0.001455 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.037901 | 0.038508 | -0.000607 |
| read_batch_unformated after write_array2d | 0.032718 | 0.023109 | 0.009609 |
| read_batch_unformated after write_flattened_sequence | 0.177143 | 0.275898 | -0.098755 |
| read_batch_unformated after write_nested_sequence | 0.196736 | 0.323480 | -0.126744 |
| read_col_formatted_as_numpy after write_array2d | 0.005535 | 0.007986 | -0.002450 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003921 | 0.004328 | -0.000408 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006029 | 0.004250 | 0.001779 |
| read_col_unformated after write_array2d | 0.043480 | 0.037052 | 0.006428 |
| read_col_unformated after write_flattened_sequence | 0.179063 | 0.258489 | -0.079427 |
| read_col_unformated after write_nested_sequence | 0.204782 | 0.293841 | -0.089059 |
| read_formatted_as_numpy after write_array2d | 0.128206 | 0.128546 | -0.000340 |
| read_formatted_as_numpy after write_flattened_sequence | 0.097584 | 0.075646 | 0.021938 |
| read_formatted_as_numpy after write_nested_sequence | 0.359682 | 0.419271 | -0.059589 |
| read_unformated after write_array2d | 0.363764 | 0.043533 | 0.320231 |
| read_unformated after write_flattened_sequence | 0.177467 | 0.255139 | -0.077672 |
| read_unformated after write_nested_sequence | 0.193346 | 0.283200 | -0.089854 |
| write_array2d | 1.419385 | 0.141683 | 1.277702 |
| write_flattened_sequence | 1.571944 | 1.452155 | 0.119789 |
| write_nested_sequence | 1.592391 | 1.492716 | 0.099675 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.035495 | 0.037411 | -0.001916 |
| shard | 0.016944 | 0.014526 | 0.002419 |
| shuffle | 0.100698 | 0.176557 | -0.075858 |
| sort | 0.040638 | 0.737135 | -0.696497 |
| train_test_split | 0.024818 | 0.296338 | -0.271520 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.197130 | 0.215209 | -0.018079 |
| read 50000 | 1.977555 | 2.077655 | -0.100099 |
| read_batch 50000 10 | 1.072337 | 1.504120 | -0.431783 |
| read_batch 50000 100 | 0.978335 | 1.541195 | -0.562860 |
| read_batch 50000 1000 | 1.004409 | 1.468490 | -0.464081 |
| read_formatted numpy 5000 | 5.210524 | 4.584777 | 0.625747 |
| read_formatted pandas 5000 | 4.628255 | 3.745712 | 0.882543 |
| read_formatted tensorflow 5000 | 7.004879 | 5.269862 | 1.735017 |
| read_formatted torch 5000 | 5.764835 | 4.565676 | 1.199158 |
| read_formatted_batch numpy 5000 10 | 0.512217 | 0.424275 | 0.087942 |
| read_formatted_batch numpy 5000 1000 | 0.009199 | 0.007607 | 0.001592 |
| shuffled read 5000 | 0.223727 | 0.226044 | -0.002317 |
| shuffled read 50000 | 2.345492 | 2.268929 | 0.076563 |
| shuffled read_batch 50000 10 | 1.448922 | 55.444624 | -53.995703 |
| shuffled read_batch 50000 100 | 1.288103 | 6.876477 | -5.588374 |
| shuffled read_batch 50000 1000 | 1.304998 | 2.142072 | -0.837075 |
| shuffled read_formatted numpy 5000 | 5.199291 | 4.805227 | 0.394064 |
| shuffled read_formatted_batch numpy 5000 10 | 4.067395 | 6.500664 | -2.433269 |
| shuffled read_formatted_batch numpy 5000 1000 | 6.199319 | 0.075469 | 6.123850 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 9.079350 | 1.841788 | 7.237563 |
| map fast-tokenizer batched | 13.723880 | 8.074308 | 5.649572 |
| map identity | 15.075097 | 10.191392 | 4.883705 |
| map identity batched | 0.390307 | 0.680424 | -0.290117 |
| map no-op batched | 0.243225 | 0.534201 | -0.290976 |
| map no-op batched numpy | 0.665956 | 0.579283 | 0.086673 |
| map no-op batched pandas | 0.483234 | 0.434364 | 0.048870 |
| map no-op batched pytorch | 0.555095 | 0.540337 | 0.014757 |
| map no-op batched tensorflow | 1.312888 | 1.386936 | -0.074048 |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.014262 | 0.011353 | 0.002909 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.011558 | 0.011008 | 0.000550 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.037815 | 0.038508 | -0.000693 |
| read_batch_unformated after write_array2d | 0.030955 | 0.023109 | 0.007845 |
| read_batch_unformated after write_flattened_sequence | 0.284033 | 0.275898 | 0.008135 |
| read_batch_unformated after write_nested_sequence | 0.318059 | 0.323480 | -0.005420 |
| read_col_formatted_as_numpy after write_array2d | 0.005605 | 0.007986 | -0.002381 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.003734 | 0.004328 | -0.000595 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006063 | 0.004250 | 0.001813 |
| read_col_unformated after write_array2d | 0.049111 | 0.037052 | 0.012058 |
| read_col_unformated after write_flattened_sequence | 0.288343 | 0.258489 | 0.029854 |
| read_col_unformated after write_nested_sequence | 0.330689 | 0.293841 | 0.036848 |
| read_formatted_as_numpy after write_array2d | 0.125809 | 0.128546 | -0.002737 |
| read_formatted_as_numpy after write_flattened_sequence | 0.095079 | 0.075646 | 0.019433 |
| read_formatted_as_numpy after write_nested_sequence | 0.358313 | 0.419271 | -0.060958 |
| read_unformated after write_array2d | 0.363124 | 0.043533 | 0.319591 |
| read_unformated after write_flattened_sequence | 0.296999 | 0.255139 | 0.041860 |
| read_unformated after write_nested_sequence | 0.309951 | 0.283200 | 0.026752 |
| write_array2d | 1.427997 | 0.141683 | 1.286314 |
| write_flattened_sequence | 1.597425 | 1.452155 | 0.145271 |
| write_nested_sequence | 1.604926 | 1.492716 | 0.112209 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.035880 | 0.037411 | -0.001531 |
| shard | 0.018451 | 0.014526 | 0.003925 |
| shuffle | 0.024104 | 0.176557 | -0.152453 |
| sort | 0.042584 | 0.737135 | -0.694551 |
| train_test_split | 0.025123 | 0.296338 | -0.271215 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.263418 | 0.215209 | 0.048209 |
| read 50000 | 2.632042 | 2.077655 | 0.554388 |
| read_batch 50000 10 | 1.708522 | 1.504120 | 0.204402 |
| read_batch 50000 100 | 1.591532 | 1.541195 | 0.050337 |
| read_batch 50000 1000 | 1.620056 | 1.468490 | 0.151566 |
| read_formatted numpy 5000 | 5.047581 | 4.584777 | 0.462804 |
| read_formatted pandas 5000 | 4.505921 | 3.745712 | 0.760209 |
| read_formatted tensorflow 5000 | 6.967933 | 5.269862 | 1.698072 |
| read_formatted torch 5000 | 5.570123 | 4.565676 | 1.004446 |
| read_formatted_batch numpy 5000 10 | 0.506403 | 0.424275 | 0.082128 |
| read_formatted_batch numpy 5000 1000 | 0.009211 | 0.007607 | 0.001604 |
| shuffled read 5000 | 0.286130 | 0.226044 | 0.060086 |
| shuffled read 50000 | 2.892516 | 2.268929 | 0.623588 |
| shuffled read_batch 50000 10 | 1.989697 | 55.444624 | -53.454927 |
| shuffled read_batch 50000 100 | 1.837293 | 6.876477 | -5.039184 |
| shuffled read_batch 50000 1000 | 1.844230 | 2.142072 | -0.297843 |
| shuffled read_formatted numpy 5000 | 5.087698 | 4.805227 | 0.282471 |
| shuffled read_formatted_batch numpy 5000 10 | 3.941674 | 6.500664 | -2.558990 |
| shuffled read_formatted_batch numpy 5000 1000 | 5.749353 | 0.075469 | 5.673884 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 9.195149 | 1.841788 | 7.353362 |
| map fast-tokenizer batched | 13.977753 | 8.074308 | 5.903445 |
| map identity | 15.335800 | 10.191392 | 5.144408 |
| map identity batched | 0.906784 | 0.680424 | 0.226360 |
| map no-op batched | 0.484920 | 0.534201 | -0.049281 |
| map no-op batched numpy | 0.589589 | 0.579283 | 0.010306 |
| map no-op batched pandas | 0.453816 | 0.434364 | 0.019452 |
| map no-op batched pytorch | 0.522542 | 0.540337 | -0.017796 |
| map no-op batched tensorflow | 1.273061 | 1.386936 | -0.113875 |
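
Each benchmark cell is reported as `new / old (diff)`. As a rough sketch, assuming `diff` is simply `new - old` rounded to six decimals (the figures in this comment are consistent with that reading), the raw row format can be parsed and sanity-checked as shown here. `parse_row` and `diff_is_consistent` are hypothetical helpers, not part of the benchmark tooling; the sample row is the PyArrow==0.17.1 `benchmark_indices_mapping.json` row from this comment.

```python
import re
from typing import List, Tuple

# One cell looks like "0.035495 / 0.037411 (-0.001916)".
TRIPLE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_row(row: str) -> List[Tuple[float, float, float]]:
    """Split a 'new / old (diff)' row into (new, old, diff) float triples."""
    return [
        (float(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in TRIPLE.finditer(row)
    ]


def diff_is_consistent(new: float, old: float, diff: float, tol: float = 2e-6) -> bool:
    """Check that diff matches new - old, allowing for six-decimal rounding."""
    return abs((new - old) - diff) <= tol


ROW = (
    "0.035495 / 0.037411 (-0.001916) 0.016944 / 0.014526 (0.002419) "
    "0.100698 / 0.176557 (-0.075858) 0.040638 / 0.737135 (-0.696497) "
    "0.024818 / 0.296338 (-0.271520)"
)
METRICS = ["select", "shard", "shuffle", "sort", "train_test_split"]

for name, (new, old, diff) in zip(METRICS, parse_row(ROW)):
    # A negative diff means the new measurement is smaller than the old one.
    print(f"{name}: new={new} old={old} diff={diff} ok={diff_is_consistent(new, old, diff)}")
```

The tolerance is slightly above 1e-6 because the published values are each rounded independently, so the printed `diff` can differ from `new - old` in the last digit.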

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017408 | 0.011353 | 0.006055 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016098 | 0.011008 | 0.005090 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.046783 | 0.038508 | 0.008275 |
| read_batch_unformated after write_array2d | 0.031555 | 0.023109 | 0.008446 |
| read_batch_unformated after write_flattened_sequence | 0.213600 | 0.275898 | -0.062298 |
| read_batch_unformated after write_nested_sequence | 0.229314 | 0.323480 | -0.094166 |
| read_col_formatted_as_numpy after write_array2d | 0.006642 | 0.007986 | -0.001344 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.006642 | 0.004328 | 0.002313 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006495 | 0.004250 | 0.002244 |
| read_col_unformated after write_array2d | 0.046606 | 0.037052 | 0.009553 |
| read_col_unformated after write_flattened_sequence | 0.209862 | 0.258489 | -0.048628 |
| read_col_unformated after write_nested_sequence | 0.235732 | 0.293841 | -0.058108 |
| read_formatted_as_numpy after write_array2d | 0.157681 | 0.128546 | 0.029134 |
| read_formatted_as_numpy after write_flattened_sequence | 0.132304 | 0.075646 | 0.056657 |
| read_formatted_as_numpy after write_nested_sequence | 0.436558 | 0.419271 | 0.017286 |
| read_unformated after write_array2d | 0.628743 | 0.043533 | 0.585210 |
| read_unformated after write_flattened_sequence | 0.209441 | 0.255139 | -0.045698 |
| read_unformated after write_nested_sequence | 0.226247 | 0.283200 | -0.056953 |
| write_array2d | 7.973322 | 0.141683 | 7.831639 |
| write_flattened_sequence | 1.873112 | 1.452155 | 0.420958 |
| write_nested_sequence | 1.915457 | 1.492716 | 0.422741 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.041522 | 0.037411 | 0.004111 |
| shard | 0.019637 | 0.014526 | 0.005111 |
| shuffle | 0.032355 | 0.176557 | -0.144202 |
| sort | 0.044291 | 0.737135 | -0.692845 |
| train_test_split | 0.027365 | 0.296338 | -0.268974 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.279049 | 0.215209 | 0.063840 |
| read 50000 | 2.725308 | 2.077655 | 0.647653 |
| read_batch 50000 10 | 1.387057 | 1.504120 | -0.117063 |
| read_batch 50000 100 | 1.182752 | 1.541195 | -0.358443 |
| read_batch 50000 1000 | 1.220041 | 1.468490 | -0.248449 |
| read_formatted numpy 5000 | 7.486395 | 4.584777 | 2.901618 |
| read_formatted pandas 5000 | 6.697237 | 3.745712 | 2.951525 |
| read_formatted tensorflow 5000 | 9.315022 | 5.269862 | 4.045161 |
| read_formatted torch 5000 | 8.152964 | 4.565676 | 3.587288 |
| read_formatted_batch numpy 5000 10 | 0.759421 | 0.424275 | 0.335146 |
| read_formatted_batch numpy 5000 1000 | 0.011631 | 0.007607 | 0.004023 |
| shuffled read 5000 | 0.355732 | 0.226044 | 0.129688 |
| shuffled read 50000 | 3.413985 | 2.268929 | 1.145057 |
| shuffled read_batch 50000 10 | 1.999796 | 55.444624 | -53.444829 |
| shuffled read_batch 50000 100 | 1.667004 | 6.876477 | -5.209473 |
| shuffled read_batch 50000 1000 | 1.759437 | 2.142072 | -0.382635 |
| shuffled read_formatted numpy 5000 | 7.596611 | 4.805227 | 2.791384 |
| shuffled read_formatted_batch numpy 5000 10 | 8.026950 | 6.500664 | 1.526286 |
| shuffled read_formatted_batch numpy 5000 1000 | 10.400243 | 0.075469 | 10.324774 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 11.632116 | 1.841788 | 9.790329 |
| map fast-tokenizer batched | 15.344344 | 8.074308 | 7.270036 |
| map identity | 22.951039 | 10.191392 | 12.759647 |
| map identity batched | 0.488525 | 0.680424 | -0.191899 |
| map no-op batched | 0.320467 | 0.534201 | -0.213734 |
| map no-op batched numpy | 0.823267 | 0.579283 | 0.243984 |
| map no-op batched pandas | 0.663015 | 0.434364 | 0.228651 |
| map no-op batched pytorch | 0.740048 | 0.540337 | 0.199711 |
| map no-op batched tensorflow | 1.620535 | 1.386936 | 0.233599 |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.019076 | 0.011353 | 0.007723 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016624 | 0.011008 | 0.005616 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.067131 | 0.038508 | 0.028623 |
| read_batch_unformated after write_array2d | 0.034267 | 0.023109 | 0.011158 |
| read_batch_unformated after write_flattened_sequence | 0.343957 | 0.275898 | 0.068059 |
| read_batch_unformated after write_nested_sequence | 0.387407 | 0.323480 | 0.063928 |
| read_col_formatted_as_numpy after write_array2d | 0.006375 | 0.007986 | -0.001610 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004988 | 0.004328 | 0.000660 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007214 | 0.004250 | 0.002963 |
| read_col_unformated after write_array2d | 0.055875 | 0.037052 | 0.018822 |
| read_col_unformated after write_flattened_sequence | 0.347444 | 0.258489 | 0.088955 |
| read_col_unformated after write_nested_sequence | 0.428372 | 0.293841 | 0.134531 |
| read_formatted_as_numpy after write_array2d | 0.166460 | 0.128546 | 0.037914 |
| read_formatted_as_numpy after write_flattened_sequence | 0.135190 | 0.075646 | 0.059543 |
| read_formatted_as_numpy after write_nested_sequence | 0.443391 | 0.419271 | 0.024119 |
| read_unformated after write_array2d | 0.434624 | 0.043533 | 0.391091 |
| read_unformated after write_flattened_sequence | 0.365115 | 0.255139 | 0.109976 |
| read_unformated after write_nested_sequence | 0.387398 | 0.283200 | 0.104198 |
| write_array2d | 1.729037 | 0.141683 | 1.587354 |
| write_flattened_sequence | 1.998806 | 1.452155 | 0.546652 |
| write_nested_sequence | 1.922735 | 1.492716 | 0.430019 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.048673 | 0.037411 | 0.011262 |
| shard | 0.021514 | 0.014526 | 0.006988 |
| shuffle | 0.028287 | 0.176557 | -0.148270 |
| sort | 0.050696 | 0.737135 | -0.686439 |
| train_test_split | 0.044734 | 0.296338 | -0.251605 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.368532 | 0.215209 | 0.153323 |
| read 50000 | 3.822484 | 2.077655 | 1.744829 |
| read_batch 50000 10 | 2.344458 | 1.504120 | 0.840338 |
| read_batch 50000 100 | 2.299936 | 1.541195 | 0.758741 |
| read_batch 50000 1000 | 2.350590 | 1.468490 | 0.882100 |
| read_formatted numpy 5000 | 7.184412 | 4.584777 | 2.599635 |
| read_formatted pandas 5000 | 6.178546 | 3.745712 | 2.432834 |
| read_formatted tensorflow 5000 | 8.546458 | 5.269862 | 3.276597 |
| read_formatted torch 5000 | 7.489643 | 4.565676 | 2.923966 |
| read_formatted_batch numpy 5000 10 | 0.702855 | 0.424275 | 0.278580 |
| read_formatted_batch numpy 5000 1000 | 0.011441 | 0.007607 | 0.003834 |
| shuffled read 5000 | 0.387294 | 0.226044 | 0.161250 |
| shuffled read 50000 | 3.852226 | 2.268929 | 1.583298 |
| shuffled read_batch 50000 10 | 2.646707 | 55.444624 | -52.797918 |
| shuffled read_batch 50000 100 | 2.313567 | 6.876477 | -4.562910 |
| shuffled read_batch 50000 1000 | 2.314679 | 2.142072 | 0.172606 |
| shuffled read_formatted numpy 5000 | 6.902167 | 4.805227 | 2.096940 |
| shuffled read_formatted_batch numpy 5000 10 | 5.320841 | 6.500664 | -1.179823 |
| shuffled read_formatted_batch numpy 5000 1000 | 8.891267 | 0.075469 | 8.815798 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 12.800731 | 1.841788 | 10.958943 |
| map fast-tokenizer batched | 16.729194 | 8.074308 | 8.654886 |
| map identity | 25.095224 | 10.191392 | 14.903832 |
| map identity batched | 1.166399 | 0.680424 | 0.485975 |
| map no-op batched | 0.643714 | 0.534201 | 0.109513 |
| map no-op batched numpy | 0.922139 | 0.579283 | 0.342856 |
| map no-op batched pandas | 0.705166 | 0.434364 | 0.270802 |
| map no-op batched pytorch | 0.757762 | 0.540337 | 0.217424 |
| map no-op batched tensorflow | 1.828352 | 1.386936 | 0.441416 |
