Release: 1.1.2
lhoestq committed Oct 6, 2020
1 parent d2f990e commit 2256521
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.1.1'
+release = '1.1.2'


 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -136,7 +136,7 @@

 setup(
     name='datasets',
-    version="1.1.1",
+    version="1.1.2",
     description=DOCLINES[0],
     long_description='\n'.join(DOCLINES[2:]),
     author='HuggingFace Inc.',
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.1.1"
+__version__ = "1.1.2"

 import pyarrow
 from pyarrow import total_allocated_bytes
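
The three hunks above bump the same version string in three separate files. As a rough illustration of how one might check that such strings stay in sync, here is a minimal sketch. `find_version` is a hypothetical helper written for this example and is not part of `datasets`; the snippets are copied from the diff rather than read from disk.

```python
import re

def find_version(text: str, key: str) -> str:
    """Return the X.Y.Z string assigned to `key` in `text` (hypothetical helper)."""
    pattern = re.escape(key) + r"""\s*=\s*u?['"](\d+\.\d+\.\d+)['"]"""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"no version assignment found for {key!r}")
    return match.group(1)

# Post-commit lines taken from the diff above: path -> (assignment name, line).
snippets = {
    "docs/source/conf.py": ("release", "release = '1.1.2'"),
    "setup.py": ("version", 'version="1.1.2",'),
    "src/datasets/__init__.py": ("__version__", '__version__ = "1.1.2"'),
}

versions = {path: find_version(text, key) for path, (key, text) in snippets.items()}

# All three files should report the same version after the bump.
assert len(set(versions.values())) == 1, versions
print(versions)
```

In a real checkout the inline strings would be replaced by the file contents; the regular expression here only covers the simple quoted `X.Y.Z` form seen in this commit.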

2 comments on commit 2256521

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.016970 / 0.011353 (0.005617) | 0.016362 / 0.011008 (0.005354) | 0.043082 / 0.038508 (0.004574) | 0.030125 / 0.023109 (0.007016) | 0.186595 / 0.275898 (-0.089303) | 0.206338 / 0.323480 (-0.117142) | 0.009776 / 0.007986 (0.001790) | 0.004160 / 0.004328 (-0.000168) | 0.006298 / 0.004250 (0.002048) | 0.047884 / 0.037052 (0.010831) | 0.184933 / 0.258489 (-0.073556) | 0.206772 / 0.293841 (-0.087069) | 0.152107 / 0.128546 (0.023561) | 0.118509 / 0.075646 (0.042862) | 0.429932 / 0.419271 (0.010661) | 0.533082 / 0.043533 (0.489550) | 0.187401 / 0.255139 (-0.067738) | 0.201207 / 0.283200 (-0.081993) | 0.072268 / 0.141683 (-0.069415) | 1.622283 / 1.452155 (0.170129) | 1.713378 / 1.492716 (0.220662) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.035209 / 0.037411 (-0.002203) | 0.019056 / 0.014526 (0.004530) | 0.029269 / 0.176557 (-0.147287) | 0.081146 / 0.737135 (-0.655989) | 0.026785 / 0.296338 (-0.269554) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.177410 / 0.215209 (-0.037799) | 1.836204 / 2.077655 (-0.241451) | 1.093852 / 1.504120 (-0.410268) | 1.039182 / 1.541195 (-0.502013) | 1.036352 / 1.468490 (-0.432138) | 6.493940 / 4.584777 (1.909163) | 5.401806 / 3.745712 (1.656094) | 7.583278 / 5.269862 (2.313416) | 6.622465 / 4.565676 (2.056788) | 0.666449 / 0.424275 (0.242174) | 0.010207 / 0.007607 (0.002600) | 0.209480 / 0.226044 (-0.016565) | 2.238126 / 2.268929 (-0.030803) | 1.519543 / 55.444624 (-53.925082) | 1.420028 / 6.876477 (-5.456449) | 1.410288 / 2.142072 (-0.731785) | 6.520261 / 4.805227 (1.715034) | 8.950174 / 6.500664 (2.449510) | 8.558088 / 0.075469 (8.482619) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 10.744044 / 1.841788 (8.902257) | 13.575772 / 8.074308 (5.501464) | 13.019491 / 10.191392 (2.828099) | 0.865214 / 0.680424 (0.184790) | 0.255194 / 0.534201 (-0.279007) | 0.777044 / 0.579283 (0.197761) | 0.610372 / 0.434364 (0.176009) | 0.736531 / 0.540337 (0.196193) | 1.501040 / 1.386936 (0.114104) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.016975 / 0.011353 (0.005622) | 0.014576 / 0.011008 (0.003567) | 0.044582 / 0.038508 (0.006074) | 0.028007 / 0.023109 (0.004897) | 0.303033 / 0.275898 (0.027135) | 0.333112 / 0.323480 (0.009632) | 0.009259 / 0.007986 (0.001273) | 0.005638 / 0.004328 (0.001309) | 0.006372 / 0.004250 (0.002121) | 0.042687 / 0.037052 (0.005635) | 0.312183 / 0.258489 (0.053694) | 0.334647 / 0.293841 (0.040806) | 0.150501 / 0.128546 (0.021955) | 0.115899 / 0.075646 (0.040252) | 0.395800 / 0.419271 (-0.023471) | 0.380297 / 0.043533 (0.336764) | 0.289452 / 0.255139 (0.034313) | 0.314603 / 0.283200 (0.031404) | 0.091077 / 0.141683 (-0.050606) | 1.599267 / 1.452155 (0.147112) | 1.738234 / 1.492716 (0.245518) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.040353 / 0.037411 (0.002942) | 0.020035 / 0.014526 (0.005509) | 0.034853 / 0.176557 (-0.141703) | 0.079128 / 0.737135 (-0.658007) | 0.069521 / 0.296338 (-0.226817) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.257065 / 0.215209 (0.041856) | 2.463399 / 2.077655 (0.385744) | 1.653457 / 1.504120 (0.149337) | 1.546592 / 1.541195 (0.005397) | 1.608297 / 1.468490 (0.139807) | 6.324044 / 4.584777 (1.739267) | 5.428059 / 3.745712 (1.682347) | 7.728331 / 5.269862 (2.458469) | 6.780277 / 4.565676 (2.214600) | 0.658296 / 0.424275 (0.234021) | 0.010547 / 0.007607 (0.002940) | 0.292655 / 0.226044 (0.066611) | 3.071133 / 2.268929 (0.802205) | 2.212864 / 55.444624 (-53.231760) | 2.053064 / 6.876477 (-4.823412) | 2.134445 / 2.142072 (-0.007627) | 6.630165 / 4.805227 (1.824938) | 4.654367 / 6.500664 (-1.846297) | 6.529703 / 0.075469 (6.454234) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 10.619705 / 1.841788 (8.777917) | 14.229469 / 8.074308 (6.155161) | 14.206458 / 10.191392 (4.015066) | 1.107769 / 0.680424 (0.427345) | 0.597388 / 0.534201 (0.063187) | 0.782763 / 0.579283 (0.203480) | 0.570428 / 0.434364 (0.136064) | 0.747219 / 0.540337 (0.206882) | 1.458792 / 1.386936 (0.071856) |

@github-actions

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.019888 / 0.011353 (0.008535) | 0.016777 / 0.011008 (0.005769) | 0.049603 / 0.038508 (0.011095) | 0.033335 / 0.023109 (0.010225) | 0.229989 / 0.275898 (-0.045909) | 0.253494 / 0.323480 (-0.069986) | 0.011120 / 0.007986 (0.003134) | 0.004954 / 0.004328 (0.000625) | 0.007408 / 0.004250 (0.003158) | 0.052622 / 0.037052 (0.015570) | 0.235715 / 0.258489 (-0.022775) | 0.253590 / 0.293841 (-0.040251) | 0.164820 / 0.128546 (0.036273) | 0.139314 / 0.075646 (0.063668) | 0.477584 / 0.419271 (0.058312) | 0.531209 / 0.043533 (0.487677) | 0.233041 / 0.255139 (-0.022098) | 0.239541 / 0.283200 (-0.043659) | 0.091968 / 0.141683 (-0.049715) | 1.908530 / 1.452155 (0.456375) | 1.955752 / 1.492716 (0.463035) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.044895 / 0.037411 (0.007484) | 0.022767 / 0.014526 (0.008242) | 0.093408 / 0.176557 (-0.083149) | 0.283247 / 0.737135 (-0.453889) | 0.155879 / 0.296338 (-0.140460) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.214832 / 0.215209 (-0.000377) | 2.216770 / 2.077655 (0.139115) | 1.371943 / 1.504120 (-0.132177) | 1.242344 / 1.541195 (-0.298850) | 1.255254 / 1.468490 (-0.213236) | 7.198173 / 4.584777 (2.613396) | 6.021747 / 3.745712 (2.276035) | 8.766147 / 5.269862 (3.496285) | 7.631689 / 4.565676 (3.066012) | 0.732902 / 0.424275 (0.308627) | 0.012470 / 0.007607 (0.004863) | 0.264078 / 0.226044 (0.038033) | 2.630759 / 2.268929 (0.361831) | 1.801935 / 55.444624 (-53.642690) | 1.674626 / 6.876477 (-5.201851) | 1.747900 / 2.142072 (-0.394172) | 7.132257 / 4.805227 (2.327030) | 4.767217 / 6.500664 (-1.733447) | 7.479554 / 0.075469 (7.404085) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 11.942970 / 1.841788 (10.101182) | 15.048570 / 8.074308 (6.974262) | 16.300820 / 10.191392 (6.109428) | 0.904765 / 0.680424 (0.224342) | 0.312133 / 0.534201 (-0.222068) | 0.864124 / 0.579283 (0.284841) | 0.663688 / 0.434364 (0.229324) | 0.827798 / 0.540337 (0.287461) | 1.721974 / 1.386936 (0.335037) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | read_batch_formatted_as_numpy after write_array2d | read_batch_formatted_as_numpy after write_flattened_sequence | read_batch_formatted_as_numpy after write_nested_sequence | read_batch_unformated after write_array2d | read_batch_unformated after write_flattened_sequence | read_batch_unformated after write_nested_sequence | read_col_formatted_as_numpy after write_array2d | read_col_formatted_as_numpy after write_flattened_sequence | read_col_formatted_as_numpy after write_nested_sequence | read_col_unformated after write_array2d | read_col_unformated after write_flattened_sequence | read_col_unformated after write_nested_sequence | read_formatted_as_numpy after write_array2d | read_formatted_as_numpy after write_flattened_sequence | read_formatted_as_numpy after write_nested_sequence | read_unformated after write_array2d | read_unformated after write_flattened_sequence | read_unformated after write_nested_sequence | write_array2d | write_flattened_sequence | write_nested_sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.019424 / 0.011353 (0.008071) | 0.016457 / 0.011008 (0.005448) | 0.050658 / 0.038508 (0.012150) | 0.032193 / 0.023109 (0.009083) | 0.363656 / 0.275898 (0.087758) | 0.420197 / 0.323480 (0.096717) | 0.005819 / 0.007986 (-0.002167) | 0.004559 / 0.004328 (0.000231) | 0.007485 / 0.004250 (0.003235) | 0.048698 / 0.037052 (0.011646) | 0.379563 / 0.258489 (0.121074) | 0.425152 / 0.293841 (0.131311) | 0.165357 / 0.128546 (0.036810) | 0.130950 / 0.075646 (0.055304) | 0.477165 / 0.419271 (0.057894) | 0.440638 / 0.043533 (0.397105) | 0.382321 / 0.255139 (0.127182) | 0.401316 / 0.283200 (0.118116) | 0.093000 / 0.141683 (-0.048683) | 1.900366 / 1.452155 (0.448211) | 1.965170 / 1.492716 (0.472454) |

Benchmark: benchmark_indices_mapping.json

| metric | select | shard | shuffle | sort | train_test_split |
|---|---|---|---|---|---|
| new / old (diff) | 0.045781 / 0.037411 (0.008370) | 0.024935 / 0.014526 (0.010409) | 0.105234 / 0.176557 (-0.071323) | 0.131818 / 0.737135 (-0.605318) | 0.204083 / 0.296338 (-0.092256) |

Benchmark: benchmark_iterating.json

| metric | read 5000 | read 50000 | read_batch 50000 10 | read_batch 50000 100 | read_batch 50000 1000 | read_formatted numpy 5000 | read_formatted pandas 5000 | read_formatted tensorflow 5000 | read_formatted torch 5000 | read_formatted_batch numpy 5000 10 | read_formatted_batch numpy 5000 1000 | shuffled read 5000 | shuffled read 50000 | shuffled read_batch 50000 10 | shuffled read_batch 50000 100 | shuffled read_batch 50000 1000 | shuffled read_formatted numpy 5000 | shuffled read_formatted_batch numpy 5000 10 | shuffled read_formatted_batch numpy 5000 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 0.289182 / 0.215209 (0.073973) | 2.828262 / 2.077655 (0.750608) | 1.959530 / 1.504120 (0.455411) | 1.842630 / 1.541195 (0.301435) | 1.878534 / 1.468490 (0.410043) | 7.123726 / 4.584777 (2.538950) | 5.838864 / 3.745712 (2.093152) | 8.284520 / 5.269862 (3.014659) | 7.310153 / 4.565676 (2.744476) | 0.702061 / 0.424275 (0.277786) | 0.011549 / 0.007607 (0.003942) | 0.311287 / 0.226044 (0.085243) | 3.227345 / 2.268929 (0.958417) | 2.371597 / 55.444624 (-53.073028) | 2.205511 / 6.876477 (-4.670966) | 2.235224 / 2.142072 (0.093152) | 7.142920 / 4.805227 (2.337692) | 4.647705 / 6.500664 (-1.852959) | 8.665284 / 0.075469 (8.589815) |

Benchmark: benchmark_map_filter.json

| metric | filter | map fast-tokenizer batched | map identity | map identity batched | map no-op batched | map no-op batched numpy | map no-op batched pandas | map no-op batched pytorch | map no-op batched tensorflow |
|---|---|---|---|---|---|---|---|---|---|
| new / old (diff) | 12.763352 / 1.841788 (10.921565) | 15.763285 / 8.074308 (7.688977) | 15.685113 / 10.191392 (5.493721) | 1.198720 / 0.680424 (0.518296) | 0.611431 / 0.534201 (0.077230) | 0.860392 / 0.579283 (0.281109) | 0.660675 / 0.434364 (0.226311) | 0.836646 / 0.540337 (0.296308) | 1.704590 / 1.386936 (0.317654) |
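
Every benchmark cell above has the form `new / old (diff)`, and in the numbers shown the value in parentheses matches `new - old` up to rounding. The page does not say what the `old` baseline is or which unit is used, so the following is only a sketch of how such a row might be parsed and turned into a relative change. `Cell`, `parse_cells` and `relative_change` are names invented for this example.

```python
import re
from typing import List, NamedTuple

class Cell(NamedTuple):
    new: float
    old: float
    diff: float

# Matches one "new / old (diff)" cell, e.g. "0.035209 / 0.037411 (-0.002203)".
_CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")

def parse_cells(row: str) -> List[Cell]:
    """Extract every cell from a benchmark row (hypothetical helper)."""
    return [Cell(*map(float, m.groups())) for m in _CELL.finditer(row)]

def relative_change(cell: Cell) -> float:
    """Change of `new` relative to `old`, as a fraction of `old`."""
    return (cell.new - cell.old) / cell.old

# First two cells (select, shard) of the first benchmark_indices_mapping.json table.
row = "0.035209 / 0.037411 (-0.002203) 0.019056 / 0.014526 (0.004530)"
cells = parse_cells(row)

for cell in cells:
    # The reported diff should agree with new - old up to rounding.
    assert abs((cell.new - cell.old) - cell.diff) < 1e-5
    print(f"{cell.new:.6f} vs {cell.old:.6f}: {relative_change(cell):+.1%}")
```

A relative change makes rows with very different magnitudes easier to compare than the raw difference does.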
