Commit

Release: 1.1.3
lhoestq committed Nov 19, 2020
1 parent 9edd2d4 commit 000b584
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.1.2'
+release = '1.1.3'


 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -134,7 +134,7 @@

 setup(
     name='datasets',
-    version="1.1.2",
+    version="1.1.3",
     description=DOCLINES[0],
     long_description='\n'.join(DOCLINES[2:]),
     author='HuggingFace Inc.',
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position

-__version__ = "1.1.2"
+__version__ = "1.1.3"

 import pyarrow
 from pyarrow import total_allocated_bytes

2 comments on commit 000b584

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.015708 / 0.011353 (0.004355) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.012970 / 0.011008 (0.001961) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.041702 / 0.038508 (0.003194) |
| read_batch_unformated after write_array2d | 0.029464 / 0.023109 (0.006355) |
| read_batch_unformated after write_flattened_sequence | 0.190785 / 0.275898 (-0.085113) |
| read_batch_unformated after write_nested_sequence | 0.209022 / 0.323480 (-0.114458) |
| read_col_formatted_as_numpy after write_array2d | 0.008873 / 0.007986 (0.000888) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004453 / 0.004328 (0.000125) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006703 / 0.004250 (0.002452) |
| read_col_unformated after write_array2d | 0.049553 / 0.037052 (0.012500) |
| read_col_unformated after write_flattened_sequence | 0.181121 / 0.258489 (-0.077368) |
| read_col_unformated after write_nested_sequence | 0.214142 / 0.293841 (-0.079699) |
| read_formatted_as_numpy after write_array2d | 0.127572 / 0.128546 (-0.000974) |
| read_formatted_as_numpy after write_flattened_sequence | 0.095651 / 0.075646 (0.020005) |
| read_formatted_as_numpy after write_nested_sequence | 0.379764 / 0.419271 (-0.039508) |
| read_unformated after write_array2d | 0.471130 / 0.043533 (0.427597) |
| read_unformated after write_flattened_sequence | 0.184927 / 0.255139 (-0.070212) |
| read_unformated after write_nested_sequence | 0.198504 / 0.283200 (-0.084696) |
| write_array2d | 0.082321 / 0.141683 (-0.059362) |
| write_flattened_sequence | 1.571531 / 1.452155 (0.119377) |
| write_nested_sequence | 1.659870 / 1.492716 (0.167154) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.031916 / 0.037411 (-0.005495) |
| shard | 0.016776 / 0.014526 (0.002250) |
| shuffle | 0.023821 / 0.176557 (-0.152736) |
| sort | 0.136865 / 0.737135 (-0.600270) |
| train_test_split | 0.022566 / 0.296338 (-0.273773) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.167516 / 0.215209 (-0.047693) |
| read 50000 | 1.685277 / 2.077655 (-0.392378) |
| read_batch 50000 10 | 1.035846 / 1.504120 (-0.468274) |
| read_batch 50000 100 | 0.978864 / 1.541195 (-0.562331) |
| read_batch 50000 1000 | 1.024330 / 1.468490 (-0.444160) |
| read_formatted numpy 5000 | 4.679141 / 4.584777 (0.094364) |
| read_formatted pandas 5000 | 4.049842 / 3.745712 (0.304130) |
| read_formatted tensorflow 5000 | 5.833779 / 5.269862 (0.563918) |
| read_formatted torch 5000 | 4.899713 / 4.565676 (0.334036) |
| read_formatted_batch numpy 5000 10 | 0.458490 / 0.424275 (0.034215) |
| read_formatted_batch numpy 5000 1000 | 0.009163 / 0.007607 (0.001555) |
| shuffled read 5000 | 0.169196 / 0.226044 (-0.056848) |
| shuffled read 50000 | 1.761851 / 2.268929 (-0.507077) |
| shuffled read_batch 50000 10 | 1.263550 / 55.444624 (-54.181075) |
| shuffled read_batch 50000 100 | 1.177852 / 6.876477 (-5.698625) |
| shuffled read_batch 50000 1000 | 1.216655 / 2.142072 (-0.925417) |
| shuffled read_formatted numpy 5000 | 4.869327 / 4.805227 (0.064100) |
| shuffled read_formatted_batch numpy 5000 10 | 5.086489 / 6.500664 (-1.414175) |
| shuffled read_formatted_batch numpy 5000 1000 | 7.925314 / 0.075469 (7.849845) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 9.238528 / 1.841788 (7.396740) |
| map fast-tokenizer batched | 13.211133 / 8.074308 (5.136825) |
| map identity | 11.478565 / 10.191392 (1.287173) |
| map identity batched | 0.843359 / 0.680424 (0.162935) |
| map no-op batched | 0.251424 / 0.534201 (-0.282777) |
| map no-op batched numpy | 0.683957 / 0.579283 (0.104674) |
| map no-op batched pandas | 0.477732 / 0.434364 (0.043368) |
| map no-op batched pytorch | 0.663151 / 0.540337 (0.122813) |
| map no-op batched tensorflow | 1.410924 / 1.386936 (0.023988) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.015341 / 0.011353 (0.003988) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.011989 / 0.011008 (0.000981) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.040529 / 0.038508 (0.002021) |
| read_batch_unformated after write_array2d | 0.030950 / 0.023109 (0.007840) |
| read_batch_unformated after write_flattened_sequence | 0.281175 / 0.275898 (0.005277) |
| read_batch_unformated after write_nested_sequence | 0.311416 / 0.323480 (-0.012064) |
| read_col_formatted_as_numpy after write_array2d | 0.006086 / 0.007986 (-0.001899) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004197 / 0.004328 (-0.000132) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006476 / 0.004250 (0.002226) |
| read_col_unformated after write_array2d | 0.049876 / 0.037052 (0.012823) |
| read_col_unformated after write_flattened_sequence | 0.279980 / 0.258489 (0.021491) |
| read_col_unformated after write_nested_sequence | 0.320387 / 0.293841 (0.026546) |
| read_formatted_as_numpy after write_array2d | 0.124736 / 0.128546 (-0.003810) |
| read_formatted_as_numpy after write_flattened_sequence | 0.085003 / 0.075646 (0.009357) |
| read_formatted_as_numpy after write_nested_sequence | 0.364968 / 0.419271 (-0.054304) |
| read_unformated after write_array2d | 0.411292 / 0.043533 (0.367759) |
| read_unformated after write_flattened_sequence | 0.280208 / 0.255139 (0.025069) |
| read_unformated after write_nested_sequence | 0.347658 / 0.283200 (0.064459) |
| write_array2d | 0.089440 / 0.141683 (-0.052243) |
| write_flattened_sequence | 1.371221 / 1.452155 (-0.080933) |
| write_nested_sequence | 1.596426 / 1.492716 (0.103710) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.037049 / 0.037411 (-0.000363) |
| shard | 0.020052 / 0.014526 (0.005526) |
| shuffle | 0.024936 / 0.176557 (-0.151620) |
| sort | 0.085703 / 0.737135 (-0.651433) |
| train_test_split | 0.025974 / 0.296338 (-0.270364) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.225253 / 0.215209 (0.010043) |
| read 50000 | 2.244517 / 2.077655 (0.166862) |
| read_batch 50000 10 | 1.636201 / 1.504120 (0.132081) |
| read_batch 50000 100 | 1.614722 / 1.541195 (0.073527) |
| read_batch 50000 1000 | 1.684060 / 1.468490 (0.215570) |
| read_formatted numpy 5000 | 4.524097 / 4.584777 (-0.060680) |
| read_formatted pandas 5000 | 4.150351 / 3.745712 (0.404639) |
| read_formatted tensorflow 5000 | 6.241953 / 5.269862 (0.972091) |
| read_formatted torch 5000 | 4.943401 / 4.565676 (0.377725) |
| read_formatted_batch numpy 5000 10 | 0.450018 / 0.424275 (0.025743) |
| read_formatted_batch numpy 5000 1000 | 0.009082 / 0.007607 (0.001475) |
| shuffled read 5000 | 0.220060 / 0.226044 (-0.005984) |
| shuffled read 50000 | 2.195402 / 2.268929 (-0.073526) |
| shuffled read_batch 50000 10 | 1.781300 / 55.444624 (-53.663324) |
| shuffled read_batch 50000 100 | 1.702390 / 6.876477 (-5.174087) |
| shuffled read_batch 50000 1000 | 1.770318 / 2.142072 (-0.371755) |
| shuffled read_formatted numpy 5000 | 5.028762 / 4.805227 (0.223535) |
| shuffled read_formatted_batch numpy 5000 10 | 3.932188 / 6.500664 (-2.568476) |
| shuffled read_formatted_batch numpy 5000 1000 | 4.570432 / 0.075469 (4.494963) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 8.824095 / 1.841788 (6.982307) |
| map fast-tokenizer batched | 13.401137 / 8.074308 (5.326829) |
| map identity | 10.454660 / 10.191392 (0.263268) |
| map identity batched | 0.907306 / 0.680424 (0.226882) |
| map no-op batched | 0.487864 / 0.534201 (-0.046337) |
| map no-op batched numpy | 0.663476 / 0.579283 (0.084193) |
| map no-op batched pandas | 0.423934 / 0.434364 (-0.010430) |
| map no-op batched pytorch | 0.538558 / 0.540337 (-0.001780) |
| map no-op batched tensorflow | 1.243124 / 1.386936 (-0.143812) |

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.015196 / 0.011353 (0.003843) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.012582 / 0.011008 (0.001574) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.043653 / 0.038508 (0.005144) |
| read_batch_unformated after write_array2d | 0.028322 / 0.023109 (0.005212) |
| read_batch_unformated after write_flattened_sequence | 0.180147 / 0.275898 (-0.095751) |
| read_batch_unformated after write_nested_sequence | 0.205440 / 0.323480 (-0.118040) |
| read_col_formatted_as_numpy after write_array2d | 0.008146 / 0.007986 (0.000161) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004187 / 0.004328 (-0.000142) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006557 / 0.004250 (0.002307) |
| read_col_unformated after write_array2d | 0.051029 / 0.037052 (0.013977) |
| read_col_unformated after write_flattened_sequence | 0.179865 / 0.258489 (-0.078624) |
| read_col_unformated after write_nested_sequence | 0.215968 / 0.293841 (-0.077873) |
| read_formatted_as_numpy after write_array2d | 0.117417 / 0.128546 (-0.011129) |
| read_formatted_as_numpy after write_flattened_sequence | 0.092547 / 0.075646 (0.016901) |
| read_formatted_as_numpy after write_nested_sequence | 0.393999 / 0.419271 (-0.025273) |
| read_unformated after write_array2d | 0.465267 / 0.043533 (0.421734) |
| read_unformated after write_flattened_sequence | 0.177221 / 0.255139 (-0.077918) |
| read_unformated after write_nested_sequence | 0.190561 / 0.283200 (-0.092639) |
| write_array2d | 0.081728 / 0.141683 (-0.059955) |
| write_flattened_sequence | 1.575638 / 1.452155 (0.123483) |
| write_nested_sequence | 1.623254 / 1.492716 (0.130538) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.035380 / 0.037411 (-0.002031) |
| shard | 0.018194 / 0.014526 (0.003668) |
| shuffle | 0.100198 / 0.176557 (-0.076358) |
| sort | 0.084123 / 0.737135 (-0.653012) |
| train_test_split | 0.027614 / 0.296338 (-0.268724) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.167390 / 0.215209 (-0.047819) |
| read 50000 | 1.687933 / 2.077655 (-0.389721) |
| read_batch 50000 10 | 1.034126 / 1.504120 (-0.469994) |
| read_batch 50000 100 | 0.973042 / 1.541195 (-0.568153) |
| read_batch 50000 1000 | 1.023053 / 1.468490 (-0.445438) |
| read_formatted numpy 5000 | 4.987554 / 4.584777 (0.402777) |
| read_formatted pandas 5000 | 4.176161 / 3.745712 (0.430449) |
| read_formatted tensorflow 5000 | 6.457601 / 5.269862 (1.187740) |
| read_formatted torch 5000 | 5.230794 / 4.565676 (0.665118) |
| read_formatted_batch numpy 5000 10 | 0.504131 / 0.424275 (0.079856) |
| read_formatted_batch numpy 5000 1000 | 0.009779 / 0.007607 (0.002172) |
| shuffled read 5000 | 0.191220 / 0.226044 (-0.034824) |
| shuffled read 50000 | 2.002088 / 2.268929 (-0.266840) |
| shuffled read_batch 50000 10 | 1.428052 / 55.444624 (-54.016572) |
| shuffled read_batch 50000 100 | 1.328268 / 6.876477 (-5.548208) |
| shuffled read_batch 50000 1000 | 1.421241 / 2.142072 (-0.720832) |
| shuffled read_formatted numpy 5000 | 5.033205 / 4.805227 (0.227977) |
| shuffled read_formatted_batch numpy 5000 10 | 7.879057 / 6.500664 (1.378393) |
| shuffled read_formatted_batch numpy 5000 1000 | 10.612959 / 0.075469 (10.537490) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 9.093049 / 1.841788 (7.251262) |
| map fast-tokenizer batched | 13.733224 / 8.074308 (5.658916) |
| map identity | 11.360882 / 10.191392 (1.169490) |
| map identity batched | 0.805372 / 0.680424 (0.124948) |
| map no-op batched | 0.237040 / 0.534201 (-0.297161) |
| map no-op batched numpy | 0.662772 / 0.579283 (0.083489) |
| map no-op batched pandas | 0.458861 / 0.434364 (0.024497) |
| map no-op batched pytorch | 0.607489 / 0.540337 (0.067151) |
| map no-op batched tensorflow | 1.355858 / 1.386936 (-0.031078) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.014885 / 0.011353 (0.003533) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.012292 / 0.011008 (0.001284) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.041691 / 0.038508 (0.003183) |
| read_batch_unformated after write_array2d | 0.029641 / 0.023109 (0.006532) |
| read_batch_unformated after write_flattened_sequence | 0.316934 / 0.275898 (0.041036) |
| read_batch_unformated after write_nested_sequence | 0.343584 / 0.323480 (0.020104) |
| read_col_formatted_as_numpy after write_array2d | 0.006063 / 0.007986 (-0.001922) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004176 / 0.004328 (-0.000152) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006459 / 0.004250 (0.002209) |
| read_col_unformated after write_array2d | 0.050182 / 0.037052 (0.013129) |
| read_col_unformated after write_flattened_sequence | 0.315960 / 0.258489 (0.057471) |
| read_col_unformated after write_nested_sequence | 0.352770 / 0.293841 (0.058929) |
| read_formatted_as_numpy after write_array2d | 0.112290 / 0.128546 (-0.016257) |
| read_formatted_as_numpy after write_flattened_sequence | 0.089003 / 0.075646 (0.013356) |
| read_formatted_as_numpy after write_nested_sequence | 0.388058 / 0.419271 (-0.031214) |
| read_unformated after write_array2d | 0.384952 / 0.043533 (0.341419) |
| read_unformated after write_flattened_sequence | 0.313488 / 0.255139 (0.058349) |
| read_unformated after write_nested_sequence | 0.330474 / 0.283200 (0.047274) |
| write_array2d | 0.087964 / 0.141683 (-0.053719) |
| write_flattened_sequence | 1.538944 / 1.452155 (0.086790) |
| write_nested_sequence | 1.586891 / 1.492716 (0.094175) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.036459 / 0.037411 (-0.000953) |
| shard | 0.019819 / 0.014526 (0.005293) |
| shuffle | 0.046763 / 0.176557 (-0.129793) |
| sort | 0.087252 / 0.737135 (-0.649883) |
| train_test_split | 0.049906 / 0.296338 (-0.246433) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.221943 / 0.215209 (0.006734) |
| read 50000 | 2.219345 / 2.077655 (0.141690) |
| read_batch 50000 10 | 1.593813 / 1.504120 (0.089693) |
| read_batch 50000 100 | 1.550234 / 1.541195 (0.009039) |
| read_batch 50000 1000 | 1.653535 / 1.468490 (0.185045) |
| read_formatted numpy 5000 | 4.834923 / 4.584777 (0.250146) |
| read_formatted pandas 5000 | 4.001321 / 3.745712 (0.255608) |
| read_formatted tensorflow 5000 | 6.393118 / 5.269862 (1.123256) |
| read_formatted torch 5000 | 5.080381 / 4.565676 (0.514704) |
| read_formatted_batch numpy 5000 10 | 0.487821 / 0.424275 (0.063546) |
| read_formatted_batch numpy 5000 1000 | 0.009598 / 0.007607 (0.001991) |
| shuffled read 5000 | 0.241437 / 0.226044 (0.015392) |
| shuffled read 50000 | 2.456874 / 2.268929 (0.187946) |
| shuffled read_batch 50000 10 | 1.938381 / 55.444624 (-53.506243) |
| shuffled read_batch 50000 100 | 1.823796 / 6.876477 (-5.052681) |
| shuffled read_batch 50000 1000 | 1.892816 / 2.142072 (-0.249257) |
| shuffled read_formatted numpy 5000 | 4.893936 / 4.805227 (0.088708) |
| shuffled read_formatted_batch numpy 5000 10 | 4.500528 / 6.500664 (-2.000136) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.947406 / 0.075469 (6.871937) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 9.533851 / 1.841788 (7.692063) |
| map fast-tokenizer batched | 30.316784 / 8.074308 (22.242476) |
| map identity | 12.383040 / 10.191392 (2.191648) |
| map identity batched | 0.830938 / 0.680424 (0.150514) |
| map no-op batched | 0.518982 / 0.534201 (-0.015218) |
| map no-op batched numpy | 0.663476 / 0.579283 (0.084193) |
| map no-op batched pandas | 0.440866 / 0.434364 (0.006502) |
| map no-op batched pytorch | 0.580128 / 0.540337 (0.039790) |
| map no-op batched tensorflow | 1.325629 / 1.386936 (-0.061307) |
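
Each benchmark cell above has the form `new / old (diff)`, and in the reported values the diff equals new minus old up to rounding in the last printed digit. As a reading aid, here is a minimal Python sketch that parses one cell and checks that relation. The names `parse_cell` and `is_consistent` are hypothetical helpers written for illustration, not part of the `datasets` repository or of the benchmark tooling, and the tolerance is an assumption chosen to absorb the six-decimal rounding.

```python
import re

# Hypothetical helper, not part of the datasets repo: matches "new / old (diff)".
CELL_RE = re.compile(r"(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)")


def parse_cell(cell):
    """Return (new, old, diff) as floats from one benchmark cell string."""
    match = CELL_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def is_consistent(cell, tol=2e-6):
    """Check that the printed diff equals new - old within a rounding tolerance.

    The tolerance is an assumption: values are printed to six decimals, so the
    printed diff can differ from new - old by about one unit in the last place.
    """
    new, old, diff = parse_cell(cell)
    return abs((new - old) - diff) <= tol


if __name__ == "__main__":
    # Cell taken from the first benchmark_indices_mapping.json table ("select").
    print(parse_cell("0.031916 / 0.037411 (-0.005495)"))
    print(is_consistent("0.031916 / 0.037411 (-0.005495)"))
```

A negative diff means the new measurement is smaller than the old one. The comment does not state the unit of the values, so none is assumed here.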
