Commit

Release: 1.5.0
lhoestq committed Mar 18, 2021
1 parent a185d47 commit f256b77
Showing 3 changed files with 3 additions and 3 deletions.
docs/source/conf.py (2 changes: 1 addition & 1 deletion)

@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.4.1'
+release = '1.5.0'
 
 
 # -- General configuration ---------------------------------------------------

setup.py (2 changes: 1 addition & 1 deletion)

@@ -189,7 +189,7 @@
 
 setup(
     name="datasets",
-    version="1.4.1.dev0",
+    version="1.5.0",
     description=DOCLINES[0],
     long_description="\n".join(DOCLINES[2:]),
     author="HuggingFace Inc.",

src/datasets/__init__.py (2 changes: 1 addition & 1 deletion)

@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position
 
-__version__ = "1.4.1"
+__version__ = "1.5.0"
 
 import pyarrow
 from pyarrow import total_allocated_bytes
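
This release commit edits the same version string by hand in three files, which makes them easy to let drift (before the bump, setup.py read 1.4.1.dev0 while the other two read 1.4.1). A minimal sketch of scripting the bump, assuming each version line keeps the exact shape shown in the diff above; `bump_version` and `VERSION_LOCATIONS` are hypothetical names, not part of the datasets repository or its release tooling.

```python
import re
from pathlib import Path
from typing import Dict

# Hypothetical map of file -> regex for the version line changed in this commit.
# Group 1 is the text before the version string, group 2 the text right after it.
VERSION_LOCATIONS: Dict[str, str] = {
    "docs/source/conf.py": r"^(release = ')[^']*(')",
    "setup.py": r'^([ \t]*version=")[^"]*(")',
    "src/datasets/__init__.py": r'^(__version__ = ")[^"]*(")',
}


def bump_version(repo_root: Path, new_version: str) -> None:
    """Set the version in every tracked file; raise ValueError if any line is missing."""
    updated: Dict[Path, str] = {}
    for rel_path, pattern in VERSION_LOCATIONS.items():
        path = repo_root / rel_path
        # A function replacement avoids "\1" followed by "1.5.0" being parsed as group 11.
        new_text, count = re.subn(
            pattern,
            lambda m: m.group(1) + new_version + m.group(2),
            path.read_text(),
            count=1,
            flags=re.MULTILINE,
        )
        if count != 1:
            raise ValueError(f"no version line found in {rel_path}")
        updated[path] = new_text
    # Write only after every file matched, so a failure leaves all files untouched.
    for path, new_text in updated.items():
        path.write_text(new_text)
```

Checking every file before writing any of them is a deliberate choice: a half-applied bump would reproduce exactly the kind of mismatch the script is meant to prevent.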

2 comments on commit f256b77

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.017441 / 0.011353 (0.006088)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.016244 / 0.011008 (0.005236)
read_batch_formatted_as_numpy after write_nested_sequence: 0.046635 / 0.038508 (0.008127)
read_batch_unformated after write_array2d: 0.036863 / 0.023109 (0.013754)
read_batch_unformated after write_flattened_sequence: 0.226879 / 0.275898 (-0.049019)
read_batch_unformated after write_nested_sequence: 0.252251 / 0.323480 (-0.071229)
read_col_formatted_as_numpy after write_array2d: 0.006444 / 0.007986 (-0.001542)
read_col_formatted_as_numpy after write_flattened_sequence: 0.005036 / 0.004328 (0.000707)
read_col_formatted_as_numpy after write_nested_sequence: 0.008392 / 0.004250 (0.004142)
read_col_unformated after write_array2d: 0.049461 / 0.037052 (0.012409)
read_col_unformated after write_flattened_sequence: 0.229585 / 0.258489 (-0.028905)
read_col_unformated after write_nested_sequence: 0.253922 / 0.293841 (-0.039919)
read_formatted_as_numpy after write_array2d: 0.144401 / 0.128546 (0.015855)
read_formatted_as_numpy after write_flattened_sequence: 0.131000 / 0.075646 (0.055353)
read_formatted_as_numpy after write_nested_sequence: 0.465303 / 0.419271 (0.046031)
read_unformated after write_array2d: 0.424683 / 0.043533 (0.381150)
read_unformated after write_flattened_sequence: 0.221640 / 0.255139 (-0.033499)
read_unformated after write_nested_sequence: 0.235043 / 0.283200 (-0.048156)
write_array2d: 2.322296 / 0.141683 (2.180613)
write_flattened_sequence: 1.890627 / 1.452155 (0.438472)
write_nested_sequence: 1.935057 / 1.492716 (0.442341)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.042515 / 0.037411 (0.005103)
shard: 0.019948 / 0.014526 (0.005422)
shuffle: 0.027413 / 0.176557 (-0.149143)
sort: 0.048725 / 0.737135 (-0.688410)
train_test_split: 0.029364 / 0.296338 (-0.266975)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.242976 / 0.215209 (0.027767)
read 50000: 2.302850 / 2.077655 (0.225195)
read_batch 50000 10: 1.274719 / 1.504120 (-0.229401)
read_batch 50000 100: 1.159335 / 1.541195 (-0.381860)
read_batch 50000 1000: 1.203397 / 1.468490 (-0.265093)
read_formatted numpy 5000: 6.695725 / 4.584777 (2.110948)
read_formatted pandas 5000: 5.821366 / 3.745712 (2.075654)
read_formatted tensorflow 5000: 8.383977 / 5.269862 (3.114115)
read_formatted torch 5000: 7.275619 / 4.565676 (2.709943)
read_formatted_batch numpy 5000 10: 0.689806 / 0.424275 (0.265530)
read_formatted_batch numpy 5000 1000: 0.011300 / 0.007607 (0.003693)
shuffled read 5000: 0.264833 / 0.226044 (0.038789)
shuffled read 50000: 2.867347 / 2.268929 (0.598418)
shuffled read_batch 50000 10: 1.841491 / 55.444624 (-53.603134)
shuffled read_batch 50000 100: 1.628527 / 6.876477 (-5.247949)
shuffled read_batch 50000 1000: 1.710560 / 2.142072 (-0.431513)
shuffled read_formatted numpy 5000: 6.883803 / 4.805227 (2.078575)
shuffled read_formatted_batch numpy 5000 10: 5.491096 / 6.500664 (-1.009568)
shuffled read_formatted_batch numpy 5000 1000: 8.902285 / 0.075469 (8.826816)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 10.362385 / 1.841788 (8.520598)
map fast-tokenizer batched: 13.757033 / 8.074308 (5.682725)
map identity: 17.348106 / 10.191392 (7.156714)
map identity batched: 0.474508 / 0.680424 (-0.205916)
map no-op batched: 0.285801 / 0.534201 (-0.248400)
map no-op batched numpy: 0.790712 / 0.579283 (0.211429)
map no-op batched pandas: 0.566431 / 0.434364 (0.132067)
map no-op batched pytorch: 0.673501 / 0.540337 (0.133163)
map no-op batched tensorflow: 1.552444 / 1.386936 (0.165508)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.017147 / 0.011353 (0.005794)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.015047 / 0.011008 (0.004038)
read_batch_formatted_as_numpy after write_nested_sequence: 0.048400 / 0.038508 (0.009892)
read_batch_unformated after write_array2d: 0.037439 / 0.023109 (0.014330)
read_batch_unformated after write_flattened_sequence: 0.372658 / 0.275898 (0.096760)
read_batch_unformated after write_nested_sequence: 0.431466 / 0.323480 (0.107986)
read_col_formatted_as_numpy after write_array2d: 0.006397 / 0.007986 (-0.001589)
read_col_formatted_as_numpy after write_flattened_sequence: 0.004600 / 0.004328 (0.000272)
read_col_formatted_as_numpy after write_nested_sequence: 0.007062 / 0.004250 (0.002812)
read_col_unformated after write_array2d: 0.057604 / 0.037052 (0.020552)
read_col_unformated after write_flattened_sequence: 0.385806 / 0.258489 (0.127317)
read_col_unformated after write_nested_sequence: 0.465633 / 0.293841 (0.171792)
read_formatted_as_numpy after write_array2d: 0.139035 / 0.128546 (0.010488)
read_formatted_as_numpy after write_flattened_sequence: 0.125161 / 0.075646 (0.049515)
read_formatted_as_numpy after write_nested_sequence: 0.461388 / 0.419271 (0.042116)
read_unformated after write_array2d: 0.412086 / 0.043533 (0.368553)
read_unformated after write_flattened_sequence: 0.380535 / 0.255139 (0.125396)
read_unformated after write_nested_sequence: 0.383317 / 0.283200 (0.100117)
write_array2d: 1.696655 / 0.141683 (1.554972)
write_flattened_sequence: 2.059374 / 1.452155 (0.607220)
write_nested_sequence: 2.005560 / 1.492716 (0.512844)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.044460 / 0.037411 (0.007048)
shard: 0.023995 / 0.014526 (0.009469)
shuffle: 0.033987 / 0.176557 (-0.142569)
sort: 0.054570 / 0.737135 (-0.682565)
train_test_split: 0.039077 / 0.296338 (-0.257262)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.288992 / 0.215209 (0.073783)
read 50000: 2.891380 / 2.077655 (0.813725)
read_batch 50000 10: 1.944298 / 1.504120 (0.440178)
read_batch 50000 100: 1.832229 / 1.541195 (0.291035)
read_batch 50000 1000: 1.891773 / 1.468490 (0.423283)
read_formatted numpy 5000: 6.381334 / 4.584777 (1.796557)
read_formatted pandas 5000: 5.518124 / 3.745712 (1.772412)
read_formatted tensorflow 5000: 8.480123 / 5.269862 (3.210261)
read_formatted torch 5000: 7.128432 / 4.565676 (2.562755)
read_formatted_batch numpy 5000 10: 0.627733 / 0.424275 (0.203458)
read_formatted_batch numpy 5000 1000: 0.010854 / 0.007607 (0.003247)
shuffled read 5000: 0.317149 / 0.226044 (0.091104)
shuffled read 50000: 3.216170 / 2.268929 (0.947242)
shuffled read_batch 50000 10: 2.310757 / 55.444624 (-53.133868)
shuffled read_batch 50000 100: 2.149490 / 6.876477 (-4.726986)
shuffled read_batch 50000 1000: 2.222967 / 2.142072 (0.080895)
shuffled read_formatted numpy 5000: 6.377808 / 4.805227 (1.572581)
shuffled read_formatted_batch numpy 5000 10: 4.207376 / 6.500664 (-2.293288)
shuffled read_formatted_batch numpy 5000 1000: 7.824507 / 0.075469 (7.749038)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 10.747819 / 1.841788 (8.906031)
map fast-tokenizer batched: 14.711504 / 8.074308 (6.637196)
map identity: 18.205554 / 10.191392 (8.014162)
map identity batched: 1.153751 / 0.680424 (0.473328)
map no-op batched: 0.585973 / 0.534201 (0.051772)
map no-op batched numpy: 0.782580 / 0.579283 (0.203297)
map no-op batched pandas: 0.548813 / 0.434364 (0.114449)
map no-op batched pytorch: 0.643216 / 0.540337 (0.102878)
map no-op batched tensorflow: 1.578554 / 1.386936 (0.191618)
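
Each benchmark cell reads new / old (diff), where diff is new minus old (for example 0.017441 - 0.011353 = 0.006088). The diff appears to be computed before rounding, so recomputing it from the printed values can be off in the last digit (0.005036 - 0.004328 gives 0.000708 against a printed 0.000707). A minimal sketch of parsing these cells and flagging large slowdowns, assuming the values are timings where lower is better; `parse_cell`, `regressions` and the 1.5 ratio threshold are hypothetical choices, not part of the benchmark tooling that produced these comments.

```python
import re
from typing import Dict, List, NamedTuple

# Matches one "new / old (diff)" cell as printed in the benchmark comments.
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


class Measurement(NamedTuple):
    new: float
    old: float
    diff: float  # printed new - old, rounded after subtraction

    @property
    def ratio(self) -> float:
        """new / old: above 1 means the new run took longer (assuming timings)."""
        return self.new / self.old


def parse_cell(cell: str) -> Measurement:
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    return Measurement(*(float(group) for group in match.groups()))


def regressions(rows: Dict[str, str], threshold: float = 1.5) -> List[str]:
    """Return the metric names whose new/old ratio exceeds the threshold."""
    return [name for name, cell in rows.items() if parse_cell(cell).ratio > threshold]


if __name__ == "__main__":
    rows = {
        "sort": "0.048725 / 0.737135 (-0.688410)",
        "map identity": "17.348106 / 10.191392 (7.156714)",
    }
    print(regressions(rows))
```

A ratio is used rather than the raw diff because the metrics span several orders of magnitude (from about 0.004 to 55), so a fixed absolute threshold would either ignore small metrics or flag every large one.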

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.023137 / 0.011353 (0.011784)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.017959 / 0.011008 (0.006951)
read_batch_formatted_as_numpy after write_nested_sequence: 0.051609 / 0.038508 (0.013101)
read_batch_unformated after write_array2d: 0.036127 / 0.023109 (0.013018)
read_batch_unformated after write_flattened_sequence: 0.240322 / 0.275898 (-0.035577)
read_batch_unformated after write_nested_sequence: 0.258136 / 0.323480 (-0.065344)
read_col_formatted_as_numpy after write_array2d: 0.008197 / 0.007986 (0.000211)
read_col_formatted_as_numpy after write_flattened_sequence: 0.005619 / 0.004328 (0.001290)
read_col_formatted_as_numpy after write_nested_sequence: 0.007170 / 0.004250 (0.002920)
read_col_unformated after write_array2d: 0.050588 / 0.037052 (0.013535)
read_col_unformated after write_flattened_sequence: 0.240452 / 0.258489 (-0.018037)
read_col_unformated after write_nested_sequence: 0.294896 / 0.293841 (0.001055)
read_formatted_as_numpy after write_array2d: 0.181345 / 0.128546 (0.052799)
read_formatted_as_numpy after write_flattened_sequence: 0.143021 / 0.075646 (0.067375)
read_formatted_as_numpy after write_nested_sequence: 0.473961 / 0.419271 (0.054689)
read_unformated after write_array2d: 0.684890 / 0.043533 (0.641357)
read_unformated after write_flattened_sequence: 0.238067 / 0.255139 (-0.017072)
read_unformated after write_nested_sequence: 0.250504 / 0.283200 (-0.032696)
write_array2d: 2.540961 / 0.141683 (2.399278)
write_flattened_sequence: 2.045740 / 1.452155 (0.593586)
write_nested_sequence: 2.073650 / 1.492716 (0.580933)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.041551 / 0.037411 (0.004140)
shard: 0.024262 / 0.014526 (0.009736)
shuffle: 0.030640 / 0.176557 (-0.145916)
sort: 0.050868 / 0.737135 (-0.686267)
train_test_split: 0.031842 / 0.296338 (-0.264497)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.280366 / 0.215209 (0.065157)
read 50000: 2.783830 / 2.077655 (0.706175)
read_batch 50000 10: 1.402595 / 1.504120 (-0.101525)
read_batch 50000 100: 1.228371 / 1.541195 (-0.312824)
read_batch 50000 1000: 1.339319 / 1.468490 (-0.129171)
read_formatted numpy 5000: 7.997078 / 4.584777 (3.412301)
read_formatted pandas 5000: 6.942062 / 3.745712 (3.196350)
read_formatted tensorflow 5000: 10.007577 / 5.269862 (4.737716)
read_formatted torch 5000: 8.580084 / 4.565676 (4.014408)
read_formatted_batch numpy 5000 10: 0.803487 / 0.424275 (0.379212)
read_formatted_batch numpy 5000 1000: 0.011511 / 0.007607 (0.003904)
shuffled read 5000: 0.311900 / 0.226044 (0.085856)
shuffled read 50000: 3.293310 / 2.268929 (1.024381)
shuffled read_batch 50000 10: 2.104213 / 55.444624 (-53.340411)
shuffled read_batch 50000 100: 1.751000 / 6.876477 (-5.125476)
shuffled read_batch 50000 1000: 1.776362 / 2.142072 (-0.365711)
shuffled read_formatted numpy 5000: 7.773490 / 4.805227 (2.968263)
shuffled read_formatted_batch numpy 5000 10: 6.710117 / 6.500664 (0.209453)
shuffled read_formatted_batch numpy 5000 1000: 9.118060 / 0.075469 (9.042591)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 12.241462 / 1.841788 (10.399674)
map fast-tokenizer batched: 13.791567 / 8.074308 (5.717259)
map identity: 24.724873 / 10.191392 (14.533481)
map identity batched: 0.525623 / 0.680424 (-0.154801)
map no-op batched: 0.337955 / 0.534201 (-0.196246)
map no-op batched numpy: 0.902459 / 0.579283 (0.323176)
map no-op batched pandas: 0.718809 / 0.434364 (0.284445)
map no-op batched pytorch: 0.811329 / 0.540337 (0.270992)
map no-op batched tensorflow: 1.825797 / 1.386936 (0.438861)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.019092 / 0.011353 (0.007740)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.016327 / 0.011008 (0.005318)
read_batch_formatted_as_numpy after write_nested_sequence: 0.050094 / 0.038508 (0.011586)
read_batch_unformated after write_array2d: 0.037093 / 0.023109 (0.013984)
read_batch_unformated after write_flattened_sequence: 0.354455 / 0.275898 (0.078557)
read_batch_unformated after write_nested_sequence: 0.380765 / 0.323480 (0.057285)
read_col_formatted_as_numpy after write_array2d: 0.006964 / 0.007986 (-0.001022)
read_col_formatted_as_numpy after write_flattened_sequence: 0.005542 / 0.004328 (0.001213)
read_col_formatted_as_numpy after write_nested_sequence: 0.006639 / 0.004250 (0.002389)
read_col_unformated after write_array2d: 0.056019 / 0.037052 (0.018967)
read_col_unformated after write_flattened_sequence: 0.349790 / 0.258489 (0.091301)
read_col_unformated after write_nested_sequence: 0.373281 / 0.293841 (0.079440)
read_formatted_as_numpy after write_array2d: 0.169524 / 0.128546 (0.040978)
read_formatted_as_numpy after write_flattened_sequence: 0.146647 / 0.075646 (0.071001)
read_formatted_as_numpy after write_nested_sequence: 0.469087 / 0.419271 (0.049816)
read_unformated after write_array2d: 0.460551 / 0.043533 (0.417018)
read_unformated after write_flattened_sequence: 0.345132 / 0.255139 (0.089993)
read_unformated after write_nested_sequence: 0.373591 / 0.283200 (0.090392)
write_array2d: 1.859800 / 0.141683 (1.718117)
write_flattened_sequence: 1.858526 / 1.452155 (0.406371)
write_nested_sequence: 1.987729 / 1.492716 (0.495013)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.042090 / 0.037411 (0.004679)
shard: 0.022966 / 0.014526 (0.008441)
shuffle: 0.043529 / 0.176557 (-0.133027)
sort: 0.050168 / 0.737135 (-0.686968)
train_test_split: 0.048879 / 0.296338 (-0.247459)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.379322 / 0.215209 (0.164113)
read 50000: 3.737409 / 2.077655 (1.659754)
read_batch 50000 10: 2.369986 / 1.504120 (0.865866)
read_batch 50000 100: 2.142526 / 1.541195 (0.601331)
read_batch 50000 1000: 2.222832 / 1.468490 (0.754342)
read_formatted numpy 5000: 7.563294 / 4.584777 (2.978517)
read_formatted pandas 5000: 6.689103 / 3.745712 (2.943391)
read_formatted tensorflow 5000: 9.715800 / 5.269862 (4.445939)
read_formatted torch 5000: 8.313974 / 4.565676 (3.748297)
read_formatted_batch numpy 5000 10: 0.780373 / 0.424275 (0.356098)
read_formatted_batch numpy 5000 1000: 0.012993 / 0.007607 (0.005386)
shuffled read 5000: 0.432965 / 0.226044 (0.206921)
shuffled read 50000: 4.210885 / 2.268929 (1.941956)
shuffled read_batch 50000 10: 2.779345 / 55.444624 (-52.665279)
shuffled read_batch 50000 100: 2.579091 / 6.876477 (-4.297386)
shuffled read_batch 50000 1000: 2.578747 / 2.142072 (0.436674)
shuffled read_formatted numpy 5000: 7.928057 / 4.805227 (3.122830)
shuffled read_formatted_batch numpy 5000 10: 6.976450 / 6.500664 (0.475786)
shuffled read_formatted_batch numpy 5000 1000: 7.854473 / 0.075469 (7.779004)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 12.539124 / 1.841788 (10.697336)
map fast-tokenizer batched: 14.048210 / 8.074308 (5.973902)
map identity: 22.937573 / 10.191392 (12.746181)
map identity batched: 0.782416 / 0.680424 (0.101992)
map no-op batched: 0.586392 / 0.534201 (0.052192)
map no-op batched numpy: 0.833018 / 0.579283 (0.253735)
map no-op batched pandas: 0.683232 / 0.434364 (0.248868)
map no-op batched pytorch: 0.772370 / 0.540337 (0.232033)
map no-op batched tensorflow: 1.700876 / 1.386936 (0.313940)
