Release: 1.13.3
albertvillanova committed Oct 15, 2021
1 parent baa2baa commit 10dc68c
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -25,7 +25,7 @@
 # The short X.Y version
 version = ""
 # The full version, including alpha/beta/rc tags
-release = "1.13.2"
+release = "1.13.3"
 
 
 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -223,7 +223,7 @@
 
 setup(
     name="datasets",
-    version="1.13.3.dev0",  # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
+    version="1.13.3",  # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
     description=DOCLINES[0],
     long_description="\n".join(DOCLINES[2:]),
     long_description_content_type="text/markdown",
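The comment in setup.py names three accepted version shapes: x.y.z.dev0, x.y.z.rc1, and x.y.z, with dots rather than dashes. A minimal sketch of a check for that format follows. The pattern and the helper name `is_expected_version_format` are hypothetical and not taken from the repository, and accepting any `rcN` suffix rather than only `rc1` is an assumption.

```python
import re

# Hypothetical pattern for illustration only; it is not part of the datasets repository.
# It accepts x.y.z, x.y.z.dev0, and x.y.z.rcN (assumption: any rc number, not only rc1).
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(\.dev0|\.rc\d+)?")


def is_expected_version_format(version: str) -> bool:
    """Return True if the whole string matches one of the documented shapes."""
    return VERSION_PATTERN.fullmatch(version) is not None


if __name__ == "__main__":
    # The two values from this commit, plus a dashed variant that the comment rules out.
    for candidate in ("1.13.3.dev0", "1.13.3", "1.13.3-dev0"):
        print(candidate, is_expected_version_format(candidate))
```

A check of this kind could run before tagging a release to catch a leftover `.dev0` suffix or a dashed version string.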
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position
 
-__version__ = "1.13.3.dev0"
+__version__ = "1.13.3"
 
 import pyarrow
 from packaging import version as _version

2 comments on commit 10dc68c

@github-actions

PyArrow==3.0.0


Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010390 | 0.011353 | -0.000963 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004050 | 0.011008 | -0.006959 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.036500 | 0.038508 | -0.002008 |
| read_batch_unformated after write_array2d | 0.040365 | 0.023109 | 0.017256 |
| read_batch_unformated after write_flattened_sequence | 0.350376 | 0.275898 | 0.074478 |
| read_batch_unformated after write_nested_sequence | 0.456802 | 0.323480 | 0.133323 |
| read_col_formatted_as_numpy after write_array2d | 0.008477 | 0.007986 | 0.000492 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005078 | 0.004328 | 0.000749 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010248 | 0.004250 | 0.005998 |
| read_col_unformated after write_array2d | 0.043604 | 0.037052 | 0.006552 |
| read_col_unformated after write_flattened_sequence | 0.354924 | 0.258489 | 0.096435 |
| read_col_unformated after write_nested_sequence | 0.383468 | 0.293841 | 0.089627 |
| read_formatted_as_numpy after write_array2d | 0.025899 | 0.128546 | -0.102647 |
| read_formatted_as_numpy after write_flattened_sequence | 0.009523 | 0.075646 | -0.066123 |
| read_formatted_as_numpy after write_nested_sequence | 0.303881 | 0.419271 | -0.115391 |
| read_unformated after write_array2d | 0.052783 | 0.043533 | 0.009250 |
| read_unformated after write_flattened_sequence | 0.356162 | 0.255139 | 0.101023 |
| read_unformated after write_nested_sequence | 0.399929 | 0.283200 | 0.116729 |
| write_array2d | 0.089476 | 0.141683 | -0.052207 |
| write_flattened_sequence | 1.954913 | 1.452155 | 0.502758 |
| write_nested_sequence | 1.980944 | 1.492716 | 0.488228 |

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.216996 | 0.018006 | 0.198990 |
| get_batch_of_1024_rows | 0.466077 | 0.000490 | 0.465587 |
| get_first_row | 0.007217 | 0.000200 | 0.007017 |
| get_last_row | 0.000112 | 0.000054 | 0.000057 |
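Each cell in these benchmark reports has the form `new / old (diff)`, and the numbers suggest that diff is new minus old, up to rounding in the last digit. The sketch below parses a row and checks that relationship. The helper names `parse_row` and `check_row` and the tolerance are assumptions for illustration and are not part of the benchmark tooling.

```python
import re

# One "new / old (diff)" cell, e.g. "0.216996 / 0.018006 (0.198990)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_row(row: str) -> list[tuple[float, float, float]]:
    """Return a (new, old, diff) triple of floats for every cell found in the row."""
    return [(float(m[1]), float(m[2]), float(m[3])) for m in CELL.finditer(row)]


def check_row(row: str, tol: float = 2e-6) -> bool:
    """Check that diff is new - old for every cell, within a rounding tolerance.

    The tolerance is an assumption: the printed values are rounded to six
    decimals, so the last digit of the difference can be off by one.
    """
    return all(abs((new - old) - diff) <= tol for new, old, diff in parse_row(row))


if __name__ == "__main__":
    # Row copied from the benchmark_getitem_100B.json report (PyArrow==3.0.0).
    row = (
        "new / old (diff) 0.216996 / 0.018006 (0.198990) "
        "0.466077 / 0.000490 (0.465587) 0.007217 / 0.000200 (0.007017) "
        "0.000112 / 0.000054 (0.000057)"
    )
    for new, old, diff in parse_row(row):
        print(f"new={new:.6f} old={old:.6f} diff={diff:+.6f}")
    print("consistent:", check_row(row))
```

A positive diff means the new timing is larger than the stored reference, so for these timing metrics a negative diff indicates the faster run.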

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.043841 | 0.037411 | 0.006429 |
| shard | 0.025190 | 0.014526 | 0.010664 |
| shuffle | 0.027761 | 0.176557 | -0.148795 |
| sort | 0.140715 | 0.737135 | -0.596421 |
| train_test_split | 0.028579 | 0.296338 | -0.267759 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.497181 | 0.215209 | 0.281972 |
| read 50000 | 4.970040 | 2.077655 | 2.892385 |
| read_batch 50000 10 | 2.288112 | 1.504120 | 0.783992 |
| read_batch 50000 100 | 2.147918 | 1.541195 | 0.606724 |
| read_batch 50000 1000 | 2.237475 | 1.468490 | 0.768985 |
| read_formatted numpy 5000 | 0.425090 | 4.584777 | -4.159687 |
| read_formatted pandas 5000 | 5.829697 | 3.745712 | 2.083985 |
| read_formatted tensorflow 5000 | 1.023501 | 5.269862 | -4.246361 |
| read_formatted torch 5000 | 0.952902 | 4.565676 | -3.612774 |
| read_formatted_batch numpy 5000 10 | 0.046829 | 0.424275 | -0.377446 |
| read_formatted_batch numpy 5000 1000 | 0.005700 | 0.007607 | -0.001908 |
| shuffled read 5000 | 0.621096 | 0.226044 | 0.395052 |
| shuffled read 50000 | 6.143683 | 2.268929 | 3.874755 |
| shuffled read_batch 50000 10 | 2.815148 | 55.444624 | -52.629476 |
| shuffled read_batch 50000 100 | 2.429959 | 6.876477 | -4.446517 |
| shuffled read_batch 50000 1000 | 2.461796 | 2.142072 | 0.319723 |
| shuffled read_formatted numpy 5000 | 0.565240 | 4.805227 | -4.239987 |
| shuffled read_formatted_batch numpy 5000 10 | 0.120802 | 6.500664 | -6.379862 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061005 | 0.075469 | -0.014465 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 1.813598 | 1.841788 | -0.028190 |
| map fast-tokenizer batched | 15.091010 | 8.074308 | 7.016702 |
| map identity | 31.217886 | 10.191392 | 21.026494 |
| map identity batched | 0.898863 | 0.680424 | 0.218439 |
| map no-op batched | 0.639785 | 0.534201 | 0.105584 |
| map no-op batched numpy | 0.270346 | 0.579283 | -0.308937 |
| map no-op batched pandas | 0.667852 | 0.434364 | 0.233489 |
| map no-op batched pytorch | 0.242380 | 0.540337 | -0.297957 |
| map no-op batched tensorflow | 0.236090 | 1.386936 | -1.150846 |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.010624 | 0.011353 | -0.000728 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004280 | 0.011008 | -0.006728 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.032642 | 0.038508 | -0.005866 |
| read_batch_unformated after write_array2d | 0.040701 | 0.023109 | 0.017592 |
| read_batch_unformated after write_flattened_sequence | 0.335583 | 0.275898 | 0.059685 |
| read_batch_unformated after write_nested_sequence | 0.363073 | 0.323480 | 0.039593 |
| read_col_formatted_as_numpy after write_array2d | 0.009121 | 0.007986 | 0.001135 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005196 | 0.004328 | 0.000868 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009395 | 0.004250 | 0.005144 |
| read_col_unformated after write_array2d | 0.051010 | 0.037052 | 0.013958 |
| read_col_unformated after write_flattened_sequence | 0.325191 | 0.258489 | 0.066702 |
| read_col_unformated after write_nested_sequence | 0.356658 | 0.293841 | 0.062817 |
| read_formatted_as_numpy after write_array2d | 0.028610 | 0.128546 | -0.099936 |
| read_formatted_as_numpy after write_flattened_sequence | 0.009515 | 0.075646 | -0.066132 |
| read_formatted_as_numpy after write_nested_sequence | 0.291882 | 0.419271 | -0.127390 |
| read_unformated after write_array2d | 0.054804 | 0.043533 | 0.011271 |
| read_unformated after write_flattened_sequence | 0.338184 | 0.255139 | 0.083045 |
| read_unformated after write_nested_sequence | 0.375148 | 0.283200 | 0.091948 |
| write_array2d | 0.093556 | 0.141683 | -0.048127 |
| write_flattened_sequence | 1.962527 | 1.452155 | 0.510372 |
| write_nested_sequence | 1.991632 | 1.492716 | 0.498916 |

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.363316 | 0.018006 | 0.345310 |
| get_batch_of_1024_rows | 0.495717 | 0.000490 | 0.495228 |
| get_first_row | 0.064974 | 0.000200 | 0.064774 |
| get_last_row | 0.000362 | 0.000054 | 0.000307 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.041956 | 0.037411 | 0.004545 |
| shard | 0.024573 | 0.014526 | 0.010047 |
| shuffle | 0.028478 | 0.176557 | -0.148078 |
| sort | 0.145817 | 0.737135 | -0.591318 |
| train_test_split | 0.031596 | 0.296338 | -0.264742 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.496214 | 0.215209 | 0.281005 |
| read 50000 | 4.886850 | 2.077655 | 2.809195 |
| read_batch 50000 10 | 2.179694 | 1.504120 | 0.675574 |
| read_batch 50000 100 | 1.944927 | 1.541195 | 0.403733 |
| read_batch 50000 1000 | 1.958745 | 1.468490 | 0.490255 |
| read_formatted numpy 5000 | 0.437958 | 4.584777 | -4.146819 |
| read_formatted pandas 5000 | 6.022588 | 3.745712 | 2.276876 |
| read_formatted tensorflow 5000 | 1.148976 | 5.269862 | -4.120886 |
| read_formatted torch 5000 | 1.090207 | 4.565676 | -3.475469 |
| read_formatted_batch numpy 5000 10 | 0.051286 | 0.424275 | -0.372990 |
| read_formatted_batch numpy 5000 1000 | 0.006831 | 0.007607 | -0.000776 |
| shuffled read 5000 | 0.674159 | 0.226044 | 0.448115 |
| shuffled read 50000 | 6.456227 | 2.268929 | 4.187299 |
| shuffled read_batch 50000 10 | 2.681269 | 55.444624 | -52.763356 |
| shuffled read_batch 50000 100 | 2.308303 | 6.876477 | -4.568173 |
| shuffled read_batch 50000 1000 | 2.331523 | 2.142072 | 0.189450 |
| shuffled read_formatted numpy 5000 | 0.569343 | 4.805227 | -4.235884 |
| shuffled read_formatted_batch numpy 5000 10 | 0.119724 | 6.500664 | -6.380940 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.061276 | 0.075469 | -0.014193 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 1.789855 | 1.841788 | -0.051932 |
| map fast-tokenizer batched | 14.814754 | 8.074308 | 6.740446 |
| map identity | 30.532174 | 10.191392 | 20.340782 |
| map identity batched | 0.804853 | 0.680424 | 0.124429 |
| map no-op batched | 0.636839 | 0.534201 | 0.102638 |
| map no-op batched numpy | 0.268230 | 0.579283 | -0.311053 |
| map no-op batched pandas | 0.616603 | 0.434364 | 0.182239 |
| map no-op batched pytorch | 0.218012 | 0.540337 | -0.322326 |
| map no-op batched tensorflow | 0.241192 | 1.386936 | -1.145744 |


@github-actions

PyArrow==3.0.0


Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.012783 | 0.011353 | 0.001430 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.004817 | 0.011008 | -0.006191 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.040881 | 0.038508 | 0.002373 |
| read_batch_unformated after write_array2d | 0.040299 | 0.023109 | 0.017190 |
| read_batch_unformated after write_flattened_sequence | 0.358846 | 0.275898 | 0.082948 |
| read_batch_unformated after write_nested_sequence | 0.506035 | 0.323480 | 0.182555 |
| read_col_formatted_as_numpy after write_array2d | 0.010108 | 0.007986 | 0.002123 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005737 | 0.004328 | 0.001409 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.012591 | 0.004250 | 0.008341 |
| read_col_unformated after write_array2d | 0.042560 | 0.037052 | 0.005508 |
| read_col_unformated after write_flattened_sequence | 0.347265 | 0.258489 | 0.088776 |
| read_col_unformated after write_nested_sequence | 0.420585 | 0.293841 | 0.126744 |
| read_formatted_as_numpy after write_array2d | 0.039441 | 0.128546 | -0.089105 |
| read_formatted_as_numpy after write_flattened_sequence | 0.013466 | 0.075646 | -0.062180 |
| read_formatted_as_numpy after write_nested_sequence | 0.328114 | 0.419271 | -0.091158 |
| read_unformated after write_array2d | 0.060836 | 0.043533 | 0.017303 |
| read_unformated after write_flattened_sequence | 0.368726 | 0.255139 | 0.113587 |
| read_unformated after write_nested_sequence | 0.399614 | 0.283200 | 0.116414 |
| write_array2d | 0.094619 | 0.141683 | -0.047064 |
| write_flattened_sequence | 2.049263 | 1.452155 | 0.597109 |
| write_nested_sequence | 2.098860 | 1.492716 | 0.606144 |

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.241601 | 0.018006 | 0.223595 |
| get_batch_of_1024_rows | 0.542928 | 0.000490 | 0.542438 |
| get_first_row | 0.005926 | 0.000200 | 0.005726 |
| get_last_row | 0.000115 | 0.000054 | 0.000061 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.039705 | 0.037411 | 0.002293 |
| shard | 0.027779 | 0.014526 | 0.013253 |
| shuffle | 0.031249 | 0.176557 | -0.145308 |
| sort | 0.136102 | 0.737135 | -0.601034 |
| train_test_split | 0.031904 | 0.296338 | -0.264435 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.629353 | 0.215209 | 0.414144 |
| read 50000 | 6.302135 | 2.077655 | 4.224480 |
| read_batch 50000 10 | 2.581959 | 1.504120 | 1.077839 |
| read_batch 50000 100 | 2.190889 | 1.541195 | 0.649694 |
| read_batch 50000 1000 | 2.191772 | 1.468490 | 0.723282 |
| read_formatted numpy 5000 | 0.671450 | 4.584777 | -3.913327 |
| read_formatted pandas 5000 | 6.861696 | 3.745712 | 3.115984 |
| read_formatted tensorflow 5000 | 1.546611 | 5.269862 | -3.723250 |
| read_formatted torch 5000 | 1.383051 | 4.565676 | -3.182626 |
| read_formatted_batch numpy 5000 10 | 0.072517 | 0.424275 | -0.351758 |
| read_formatted_batch numpy 5000 1000 | 0.006006 | 0.007607 | -0.001601 |
| shuffled read 5000 | 0.830768 | 0.226044 | 0.604723 |
| shuffled read 50000 | 7.743383 | 2.268929 | 5.474455 |
| shuffled read_batch 50000 10 | 3.035756 | 55.444624 | -52.408868 |
| shuffled read_batch 50000 100 | 2.370685 | 6.876477 | -4.505792 |
| shuffled read_batch 50000 1000 | 2.328922 | 2.142072 | 0.186850 |
| shuffled read_formatted numpy 5000 | 0.824184 | 4.805227 | -3.981043 |
| shuffled read_formatted_batch numpy 5000 10 | 0.161693 | 6.500664 | -6.338971 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.059160 | 0.075469 | -0.016309 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 1.968306 | 1.841788 | 0.126518 |
| map fast-tokenizer batched | 15.524870 | 8.074308 | 7.450562 |
| map identity | 42.741338 | 10.191392 | 32.549946 |
| map identity batched | 0.975005 | 0.680424 | 0.294581 |
| map no-op batched | 0.658082 | 0.534201 | 0.123881 |
| map no-op batched numpy | 0.285089 | 0.579283 | -0.294194 |
| map no-op batched pandas | 0.742449 | 0.434364 | 0.308085 |
| map no-op batched pytorch | 0.251246 | 0.540337 | -0.289092 |
| map no-op batched tensorflow | 0.266377 | 1.386936 | -1.120559 |

PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new | old | diff |
|---|---|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.011512 | 0.011353 | 0.000159 |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.005017 | 0.011008 | -0.005992 |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.037699 | 0.038508 | -0.000809 |
| read_batch_unformated after write_array2d | 0.039449 | 0.023109 | 0.016340 |
| read_batch_unformated after write_flattened_sequence | 0.383607 | 0.275898 | 0.107709 |
| read_batch_unformated after write_nested_sequence | 0.419172 | 0.323480 | 0.095692 |
| read_col_formatted_as_numpy after write_array2d | 0.009427 | 0.007986 | 0.001442 |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005639 | 0.004328 | 0.001310 |
| read_col_formatted_as_numpy after write_nested_sequence | 0.011819 | 0.004250 | 0.007568 |
| read_col_unformated after write_array2d | 0.046672 | 0.037052 | 0.009620 |
| read_col_unformated after write_flattened_sequence | 0.370553 | 0.258489 | 0.112064 |
| read_col_unformated after write_nested_sequence | 0.431485 | 0.293841 | 0.137644 |
| read_formatted_as_numpy after write_array2d | 0.038359 | 0.128546 | -0.090187 |
| read_formatted_as_numpy after write_flattened_sequence | 0.013430 | 0.075646 | -0.062216 |
| read_formatted_as_numpy after write_nested_sequence | 0.334208 | 0.419271 | -0.085063 |
| read_unformated after write_array2d | 0.063172 | 0.043533 | 0.019639 |
| read_unformated after write_flattened_sequence | 0.374969 | 0.255139 | 0.119830 |
| read_unformated after write_nested_sequence | 0.453087 | 0.283200 | 0.169887 |
| write_array2d | 0.102656 | 0.141683 | -0.039027 |
| write_flattened_sequence | 2.037650 | 1.452155 | 0.585495 |
| write_nested_sequence | 2.197879 | 1.492716 | 0.705162 |

Benchmark: benchmark_getitem_100B.json

| metric | new | old | diff |
|---|---|---|---|
| get_batch_of_1024_random_rows | 0.410040 | 0.018006 | 0.392034 |
| get_batch_of_1024_rows | 0.558310 | 0.000490 | 0.557821 |
| get_first_row | 0.084572 | 0.000200 | 0.084372 |
| get_last_row | 0.000672 | 0.000054 | 0.000618 |

Benchmark: benchmark_indices_mapping.json

| metric | new | old | diff |
|---|---|---|---|
| select | 0.042572 | 0.037411 | 0.005161 |
| shard | 0.027585 | 0.014526 | 0.013059 |
| shuffle | 0.036476 | 0.176557 | -0.140081 |
| sort | 0.155405 | 0.737135 | -0.581730 |
| train_test_split | 0.036536 | 0.296338 | -0.259802 |

Benchmark: benchmark_iterating.json

| metric | new | old | diff |
|---|---|---|---|
| read 5000 | 0.630462 | 0.215209 | 0.415252 |
| read 50000 | 6.330694 | 2.077655 | 4.253039 |
| read_batch 50000 10 | 2.410557 | 1.504120 | 0.906438 |
| read_batch 50000 100 | 2.036393 | 1.541195 | 0.495199 |
| read_batch 50000 1000 | 2.079913 | 1.468490 | 0.611423 |
| read_formatted numpy 5000 | 0.672004 | 4.584777 | -3.912773 |
| read_formatted pandas 5000 | 7.036012 | 3.745712 | 3.290300 |
| read_formatted tensorflow 5000 | 1.507371 | 5.269862 | -3.762490 |
| read_formatted torch 5000 | 1.390588 | 4.565676 | -3.175088 |
| read_formatted_batch numpy 5000 10 | 0.072869 | 0.424275 | -0.351406 |
| read_formatted_batch numpy 5000 1000 | 0.005828 | 0.007607 | -0.001779 |
| shuffled read 5000 | 0.800379 | 0.226044 | 0.574335 |
| shuffled read 50000 | 7.856834 | 2.268929 | 5.587905 |
| shuffled read_batch 50000 10 | 3.090082 | 55.444624 | -52.354542 |
| shuffled read_batch 50000 100 | 2.470455 | 6.876477 | -4.406022 |
| shuffled read_batch 50000 1000 | 2.541044 | 2.142072 | 0.398972 |
| shuffled read_formatted numpy 5000 | 0.842542 | 4.805227 | -3.962685 |
| shuffled read_formatted_batch numpy 5000 10 | 0.163117 | 6.500664 | -6.337547 |
| shuffled read_formatted_batch numpy 5000 1000 | 0.064138 | 0.075469 | -0.011331 |

Benchmark: benchmark_map_filter.json

| metric | new | old | diff |
|---|---|---|---|
| filter | 2.026738 | 1.841788 | 0.184950 |
| map fast-tokenizer batched | 14.809568 | 8.074308 | 6.735260 |
| map identity | 44.647731 | 10.191392 | 34.456339 |
| map identity batched | 0.948791 | 0.680424 | 0.268367 |
| map no-op batched | 0.655615 | 0.534201 | 0.121414 |
| map no-op batched numpy | 0.293499 | 0.579283 | -0.285784 |
| map no-op batched pandas | 0.747192 | 0.434364 | 0.312828 |
| map no-op batched pytorch | 0.243294 | 0.540337 | -0.297044 |
| map no-op batched tensorflow | 0.253444 | 1.386936 | -1.133492 |

