Commit

Release: 1.15.1
lhoestq committed Nov 2, 2021
1 parent 433d907 commit 0181006
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -25,7 +25,7 @@
 # The short X.Y version
 version = ""
 # The full version, including alpha/beta/rc tags
-release = "1.15.0"
+release = "1.15.1"


 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -218,7 +218,7 @@

 setup(
     name="datasets",
-    version="1.15.1.dev0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
+    version="1.15.1", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
     description="HuggingFace community-driven open-source library of datasets",
     long_description=open("README.md", "r", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.15.0.dev0"
+__version__ = "1.15.1"

 import pyarrow
 from packaging import version as _version
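The comment in setup.py names three accepted version shapes: x.y.z.dev0, x.y.z.rc1, or x.y.z, with dots and no dashes. A minimal sketch of such a check follows. The names `VERSION_RE`, `is_expected_format` and `is_final_release` are hypothetical and not part of the repository, and accepting any number after `dev` or `rc` is an assumption that goes slightly beyond the literal examples in the comment.

```python
import re

# Hypothetical pattern for the shapes named in the setup.py comment:
# x.y.z, x.y.z.devN, or x.y.z.rcN (dots only, no dashes).
# Allowing any N after "dev"/"rc" is an assumption.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(\.dev\d+|\.rc\d+)?")


def is_expected_format(version: str) -> bool:
    """Return True if the string matches one of the three shapes."""
    return VERSION_RE.fullmatch(version) is not None


def is_final_release(version: str) -> bool:
    """Return True for a plain x.y.z version with no dev/rc suffix."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is not None


# The three version strings this commit writes.
new_versions = {
    "docs/source/conf.py": "1.15.1",
    "setup.py": "1.15.1",
    "src/datasets/__init__.py": "1.15.1",
}

# All three should be well-formed, final, and identical.
all_final = all(is_final_release(v) for v in new_versions.values())
all_equal = len(set(new_versions.values())) == 1
```

Run against the values in this commit, both checks pass, while a dashed string such as `1.15.1-dev0` is rejected.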

2 comments on commit 0181006

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.062317 / 0.011353 (0.050964) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004463 / 0.011008 (-0.006546) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.031828 / 0.038508 (-0.006680) |
| read_batch_unformated after write_array2d | 0.031385 / 0.023109 (0.008276) |
| read_batch_unformated after write_flattened_sequence | 0.347344 / 0.275898 (0.071445) |
| read_batch_unformated after write_nested_sequence | 0.340984 / 0.323480 (0.017504) |
| read_col_formatted_as_numpy after write_array2d | 0.079736 / 0.007986 (0.071750) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005760 / 0.004328 (0.001432) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010782 / 0.004250 (0.006532) |
| read_col_unformated after write_array2d | 0.037579 / 0.037052 (0.000527) |
| read_col_unformated after write_flattened_sequence | 0.302889 / 0.258489 (0.044400) |
| read_col_unformated after write_nested_sequence | 0.378729 / 0.293841 (0.084888) |
| read_formatted_as_numpy after write_array2d | 0.085415 / 0.128546 (-0.043131) |
| read_formatted_as_numpy after write_flattened_sequence | 0.014006 / 0.075646 (-0.061641) |
| read_formatted_as_numpy after write_nested_sequence | 0.284575 / 0.419271 (-0.134697) |
| read_unformated after write_array2d | 0.048835 / 0.043533 (0.005302) |
| read_unformated after write_flattened_sequence | 0.356553 / 0.255139 (0.101414) |
| read_unformated after write_nested_sequence | 0.390720 / 0.283200 (0.107520) |
| write_array2d | 0.076093 / 0.141683 (-0.065590) |
| write_flattened_sequence | 1.847463 / 1.452155 (0.395308) |
| write_nested_sequence | 2.051831 / 1.492716 (0.559114) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.250842 / 0.018006 (0.232836) |
| get_batch_of_1024_rows | 0.536080 / 0.000490 (0.535590) |
| get_first_row | 0.006621 / 0.000200 (0.006421) |
| get_last_row | 0.000482 / 0.000054 (0.000427) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.037441 / 0.037411 (0.000030) |
| shard | 0.023088 / 0.014526 (0.008562) |
| shuffle | 0.022638 / 0.176557 (-0.153919) |
| sort | 0.183044 / 0.737135 (-0.554092) |
| train_test_split | 0.025585 / 0.296338 (-0.270754) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.567465 / 0.215209 (0.352256) |
| read 50000 | 5.751780 / 2.077655 (3.674125) |
| read_batch 50000 10 | 2.239883 / 1.504120 (0.735763) |
| read_batch 50000 100 | 1.823094 / 1.541195 (0.281900) |
| read_batch 50000 1000 | 1.836839 / 1.468490 (0.368349) |
| read_formatted numpy 5000 | 0.675195 / 4.584777 (-3.909582) |
| read_formatted pandas 5000 | 6.462978 / 3.745712 (2.717266) |
| read_formatted tensorflow 5000 | 4.479985 / 5.269862 (-0.789876) |
| read_formatted torch 5000 | 1.371473 / 4.565676 (-3.194204) |
| read_formatted_batch numpy 5000 10 | 0.092928 / 0.424275 (-0.331347) |
| read_formatted_batch numpy 5000 1000 | 0.014296 / 0.007607 (0.006689) |
| shuffled read 5000 | 0.775567 / 0.226044 (0.549523) |
| shuffled read 50000 | 7.176211 / 2.268929 (4.907282) |
| shuffled read_batch 50000 10 | 2.902541 / 55.444624 (-52.542083) |
| shuffled read_batch 50000 100 | 2.149365 / 6.876477 (-4.727111) |
| shuffled read_batch 50000 1000 | 2.228122 / 2.142072 (0.086049) |
| shuffled read_formatted numpy 5000 | 0.874489 / 4.805227 (-3.930738) |
| shuffled read_formatted_batch numpy 5000 10 | 0.154348 / 6.500664 (-6.346316) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.055392 / 0.075469 (-0.020077) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.798096 / 1.841788 (-0.043691) |
| map fast-tokenizer batched | 14.092543 / 8.074308 (6.018235) |
| map identity | 40.300551 / 10.191392 (30.109159) |
| map identity batched | 0.850435 / 0.680424 (0.170011) |
| map no-op batched | 0.580987 / 0.534201 (0.046786) |
| map no-op batched numpy | 0.439446 / 0.579283 (-0.139837) |
| map no-op batched pandas | 0.668020 / 0.434364 (0.233656) |
| map no-op batched pytorch | 0.306495 / 0.540337 (-0.233842) |
| map no-op batched tensorflow | 0.326913 / 1.386936 (-1.060023) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.073920 / 0.011353 (0.062568) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.003826 / 0.011008 (-0.007183) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.037246 / 0.038508 (-0.001262) |
| read_batch_unformated after write_array2d | 0.038762 / 0.023109 (0.015652) |
| read_batch_unformated after write_flattened_sequence | 0.322433 / 0.275898 (0.046535) |
| read_batch_unformated after write_nested_sequence | 0.433940 / 0.323480 (0.110460) |
| read_col_formatted_as_numpy after write_array2d | 0.075723 / 0.007986 (0.067737) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005295 / 0.004328 (0.000966) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007125 / 0.004250 (0.002875) |
| read_col_unformated after write_array2d | 0.033554 / 0.037052 (-0.003498) |
| read_col_unformated after write_flattened_sequence | 0.315612 / 0.258489 (0.057123) |
| read_col_unformated after write_nested_sequence | 0.380161 / 0.293841 (0.086320) |
| read_formatted_as_numpy after write_array2d | 0.096207 / 0.128546 (-0.032339) |
| read_formatted_as_numpy after write_flattened_sequence | 0.011311 / 0.075646 (-0.064335) |
| read_formatted_as_numpy after write_nested_sequence | 0.316730 / 0.419271 (-0.102542) |
| read_unformated after write_array2d | 0.053513 / 0.043533 (0.009980) |
| read_unformated after write_flattened_sequence | 0.309842 / 0.255139 (0.054703) |
| read_unformated after write_nested_sequence | 0.335329 / 0.283200 (0.052130) |
| write_array2d | 0.080374 / 0.141683 (-0.061308) |
| write_flattened_sequence | 1.849910 / 1.452155 (0.397756) |
| write_nested_sequence | 1.949360 / 1.492716 (0.456643) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.365491 / 0.018006 (0.347485) |
| get_batch_of_1024_rows | 0.532465 / 0.000490 (0.531975) |
| get_first_row | 0.053477 / 0.000200 (0.053277) |
| get_last_row | 0.001523 / 0.000054 (0.001468) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.033224 / 0.037411 (-0.004188) |
| shard | 0.023958 / 0.014526 (0.009433) |
| shuffle | 0.023819 / 0.176557 (-0.152738) |
| sort | 0.186443 / 0.737135 (-0.550692) |
| train_test_split | 0.025241 / 0.296338 (-0.271098) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.560800 / 0.215209 (0.345591) |
| read 50000 | 5.784196 / 2.077655 (3.706542) |
| read_batch 50000 10 | 2.180583 / 1.504120 (0.676463) |
| read_batch 50000 100 | 1.775744 / 1.541195 (0.234550) |
| read_batch 50000 1000 | 1.699820 / 1.468490 (0.231329) |
| read_formatted numpy 5000 | 0.621947 / 4.584777 (-3.962830) |
| read_formatted pandas 5000 | 6.689711 / 3.745712 (2.943999) |
| read_formatted tensorflow 5000 | 5.042314 / 5.269862 (-0.227547) |
| read_formatted torch 5000 | 1.487598 / 4.565676 (-3.078078) |
| read_formatted_batch numpy 5000 10 | 0.085878 / 0.424275 (-0.338397) |
| read_formatted_batch numpy 5000 1000 | 0.014137 / 0.007607 (0.006530) |
| shuffled read 5000 | 0.838964 / 0.226044 (0.612919) |
| shuffled read 50000 | 7.748086 / 2.268929 (5.479157) |
| shuffled read_batch 50000 10 | 2.998174 / 55.444624 (-52.446450) |
| shuffled read_batch 50000 100 | 2.269611 / 6.876477 (-4.606866) |
| shuffled read_batch 50000 1000 | 2.153767 / 2.142072 (0.011695) |
| shuffled read_formatted numpy 5000 | 0.772826 / 4.805227 (-4.032402) |
| shuffled read_formatted_batch numpy 5000 10 | 0.190500 / 6.500664 (-6.310164) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.066076 / 0.075469 (-0.009393) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.804899 / 1.841788 (-0.036889) |
| map fast-tokenizer batched | 14.632016 / 8.074308 (6.557708) |
| map identity | 41.395170 / 10.191392 (31.203778) |
| map identity batched | 0.930690 / 0.680424 (0.250266) |
| map no-op batched | 0.741621 / 0.534201 (0.207420) |
| map no-op batched numpy | 0.422020 / 0.579283 (-0.157263) |
| map no-op batched pandas | 0.768165 / 0.434364 (0.333801) |
| map no-op batched pytorch | 0.340128 / 0.540337 (-0.200209) |
| map no-op batched tensorflow | 0.312670 / 1.386936 (-1.074266) |
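
Each benchmark cell above has the form `new / old (diff)`. The posted numbers are consistent with reading diff as new minus old, up to rounding in the last printed digit. The sketch below parses one cell and checks that relation. The helper names `parse_cell` and `diff_is_consistent` and the tolerance value are assumptions for illustration, not part of the benchmark tooling.

```python
import re

# Matches a cell such as "0.062317 / 0.011353 (0.050964)".
CELL_RE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cell(cell: str) -> tuple[float, float, float]:
    """Split a 'new / old (diff)' cell into three floats."""
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def diff_is_consistent(cell: str, tol: float = 2e-6) -> bool:
    """Check that diff equals new - old within a rounding tolerance.

    The tolerance is an assumption: the printed values are rounded to
    six decimals, so the last digit can be off by one.
    """
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol


# Two cells copied from the first benchmark_array_xd.json table.
first = diff_is_consistent("0.062317 / 0.011353 (0.050964)")
second = diff_is_consistent("0.004463 / 0.011008 (-0.006546)")
```

Under this reading, a positive diff means the new measurement is larger than the old one.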

@github-actions

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.099527 / 0.011353 (0.088174) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004559 / 0.011008 (-0.006449) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.037949 / 0.038508 (-0.000559) |
| read_batch_unformated after write_array2d | 0.037595 / 0.023109 (0.014485) |
| read_batch_unformated after write_flattened_sequence | 0.346599 / 0.275898 (0.070700) |
| read_batch_unformated after write_nested_sequence | 0.385928 / 0.323480 (0.062448) |
| read_col_formatted_as_numpy after write_array2d | 0.083973 / 0.007986 (0.075988) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004720 / 0.004328 (0.000392) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.011197 / 0.004250 (0.006947) |
| read_col_unformated after write_array2d | 0.045852 / 0.037052 (0.008800) |
| read_col_unformated after write_flattened_sequence | 0.344786 / 0.258489 (0.086297) |
| read_col_unformated after write_nested_sequence | 0.398555 / 0.293841 (0.104714) |
| read_formatted_as_numpy after write_array2d | 0.105776 / 0.128546 (-0.022771) |
| read_formatted_as_numpy after write_flattened_sequence | 0.014359 / 0.075646 (-0.061287) |
| read_formatted_as_numpy after write_nested_sequence | 0.315045 / 0.419271 (-0.104226) |
| read_unformated after write_array2d | 0.058056 / 0.043533 (0.014523) |
| read_unformated after write_flattened_sequence | 0.354932 / 0.255139 (0.099793) |
| read_unformated after write_nested_sequence | 0.384169 / 0.283200 (0.100970) |
| write_array2d | 0.085839 / 0.141683 (-0.055844) |
| write_flattened_sequence | 2.091798 / 1.452155 (0.639643) |
| write_nested_sequence | 2.115069 / 1.492716 (0.622352) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.189358 / 0.018006 (0.171352) |
| get_batch_of_1024_rows | 0.472358 / 0.000490 (0.471868) |
| get_first_row | 0.007100 / 0.000200 (0.006900) |
| get_last_row | 0.000330 / 0.000054 (0.000275) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.042064 / 0.037411 (0.004652) |
| shard | 0.026361 / 0.014526 (0.011836) |
| shuffle | 0.028060 / 0.176557 (-0.148497) |
| sort | 0.234981 / 0.737135 (-0.502154) |
| train_test_split | 0.030953 / 0.296338 (-0.265386) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.642184 / 0.215209 (0.426974) |
| read 50000 | 6.404942 / 2.077655 (4.327288) |
| read_batch 50000 10 | 2.445214 / 1.504120 (0.941094) |
| read_batch 50000 100 | 2.036201 / 1.541195 (0.495006) |
| read_batch 50000 1000 | 2.058467 / 1.468490 (0.589977) |
| read_formatted numpy 5000 | 0.725801 / 4.584777 (-3.858976) |
| read_formatted pandas 5000 | 6.400096 / 3.745712 (2.654383) |
| read_formatted tensorflow 5000 | 3.112568 / 5.269862 (-2.157293) |
| read_formatted torch 5000 | 1.498699 / 4.565676 (-3.066978) |
| read_formatted_batch numpy 5000 10 | 0.080343 / 0.424275 (-0.343932) |
| read_formatted_batch numpy 5000 1000 | 0.013005 / 0.007607 (0.005398) |
| shuffled read 5000 | 0.788703 / 0.226044 (0.562658) |
| shuffled read 50000 | 7.920421 / 2.268929 (5.651492) |
| shuffled read_batch 50000 10 | 3.058470 / 55.444624 (-52.386154) |
| shuffled read_batch 50000 100 | 2.359551 / 6.876477 (-4.516926) |
| shuffled read_batch 50000 1000 | 2.441784 / 2.142072 (0.299711) |
| shuffled read_formatted numpy 5000 | 0.899637 / 4.805227 (-3.905591) |
| shuffled read_formatted_batch numpy 5000 10 | 0.177262 / 6.500664 (-6.323402) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.066317 / 0.075469 (-0.009152) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.915674 / 1.841788 (0.073886) |
| map fast-tokenizer batched | 15.006828 / 8.074308 (6.932520) |
| map identity | 43.363195 / 10.191392 (33.171803) |
| map identity batched | 0.927796 / 0.680424 (0.247372) |
| map no-op batched | 0.648468 / 0.534201 (0.114267) |
| map no-op batched numpy | 0.469970 / 0.579283 (-0.109313) |
| map no-op batched pandas | 0.683779 / 0.434364 (0.249415) |
| map no-op batched pytorch | 0.342115 / 0.540337 (-0.198222) |
| map no-op batched tensorflow | 0.369773 / 1.386936 (-1.017163) |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.074864 / 0.011353 (0.063511) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004621 / 0.011008 (-0.006388) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.035809 / 0.038508 (-0.002699) |
| read_batch_unformated after write_array2d | 0.035738 / 0.023109 (0.012629) |
| read_batch_unformated after write_flattened_sequence | 0.377885 / 0.275898 (0.101987) |
| read_batch_unformated after write_nested_sequence | 0.415058 / 0.323480 (0.091578) |
| read_col_formatted_as_numpy after write_array2d | 0.091727 / 0.007986 (0.083741) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005361 / 0.004328 (0.001032) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008590 / 0.004250 (0.004340) |
| read_col_unformated after write_array2d | 0.037194 / 0.037052 (0.000141) |
| read_col_unformated after write_flattened_sequence | 0.378537 / 0.258489 (0.120048) |
| read_col_unformated after write_nested_sequence | 0.437906 / 0.293841 (0.144065) |
| read_formatted_as_numpy after write_array2d | 0.102172 / 0.128546 (-0.026374) |
| read_formatted_as_numpy after write_flattened_sequence | 0.013550 / 0.075646 (-0.062096) |
| read_formatted_as_numpy after write_nested_sequence | 0.313089 / 0.419271 (-0.106182) |
| read_unformated after write_array2d | 0.059383 / 0.043533 (0.015850) |
| read_unformated after write_flattened_sequence | 0.372549 / 0.255139 (0.117410) |
| read_unformated after write_nested_sequence | 0.419248 / 0.283200 (0.136048) |
| write_array2d | 0.087722 / 0.141683 (-0.053961) |
| write_flattened_sequence | 2.030929 / 1.452155 (0.578774) |
| write_nested_sequence | 2.133712 / 1.492716 (0.640995) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
|---|---|
| get_batch_of_1024_random_rows | 0.201244 / 0.018006 (0.183238) |
| get_batch_of_1024_rows | 0.481592 / 0.000490 (0.481103) |
| get_first_row | 0.004415 / 0.000200 (0.004215) |
| get_last_row | 0.000336 / 0.000054 (0.000281) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.036966 / 0.037411 (-0.000445) |
| shard | 0.026331 / 0.014526 (0.011805) |
| shuffle | 0.035462 / 0.176557 (-0.141094) |
| sort | 0.229826 / 0.737135 (-0.507309) |
| train_test_split | 0.030798 / 0.296338 (-0.265540) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.642951 / 0.215209 (0.427742) |
| read 50000 | 6.601140 / 2.077655 (4.523485) |
| read_batch 50000 10 | 2.442419 / 1.504120 (0.938299) |
| read_batch 50000 100 | 2.072804 / 1.541195 (0.531609) |
| read_batch 50000 1000 | 2.048184 / 1.468490 (0.579693) |
| read_formatted numpy 5000 | 0.765318 / 4.584777 (-3.819459) |
| read_formatted pandas 5000 | 6.346554 / 3.745712 (2.600842) |
| read_formatted tensorflow 5000 | 5.031933 / 5.269862 (-0.237928) |
| read_formatted torch 5000 | 1.526206 / 4.565676 (-3.039471) |
| read_formatted_batch numpy 5000 10 | 0.082075 / 0.424275 (-0.342201) |
| read_formatted_batch numpy 5000 1000 | 0.013187 / 0.007607 (0.005580) |
| shuffled read 5000 | 0.811888 / 0.226044 (0.585844) |
| shuffled read 50000 | 8.135717 / 2.268929 (5.866789) |
| shuffled read_batch 50000 10 | 3.089015 / 55.444624 (-52.355610) |
| shuffled read_batch 50000 100 | 2.351077 / 6.876477 (-4.525400) |
| shuffled read_batch 50000 1000 | 2.407909 / 2.142072 (0.265837) |
| shuffled read_formatted numpy 5000 | 0.975852 / 4.805227 (-3.829375) |
| shuffled read_formatted_batch numpy 5000 10 | 0.177129 / 6.500664 (-6.323535) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.066428 / 0.075469 (-0.009041) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 1.948187 / 1.841788 (0.106400) |
| map fast-tokenizer batched | 14.838324 / 8.074308 (6.764016) |
| map identity | 44.943772 / 10.191392 (34.752380) |
| map identity batched | 0.959360 / 0.680424 (0.278936) |
| map no-op batched | 0.674118 / 0.534201 (0.139917) |
| map no-op batched numpy | 0.482563 / 0.579283 (-0.096720) |
| map no-op batched pandas | 0.648013 / 0.434364 (0.213649) |
| map no-op batched pytorch | 0.362013 / 0.540337 (-0.178325) |
| map no-op batched tensorflow | 0.351469 / 1.386936 (-1.035467) |
