
Release: 1.1.0
lhoestq committed Oct 2, 2020
1 parent d9da44a commit fe52b67
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -24,7 +24,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = '1.0.2'
+release = '1.1.0'
 
 
 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion setup.py
@@ -134,7 +134,7 @@
 
 setup(
     name='datasets',
-    version="1.0.2",
+    version="1.1.0",
     description=DOCLINES[0],
     long_description='\n'.join(DOCLINES[2:]),
     author='HuggingFace Inc.',
2 changes: 1 addition & 1 deletion src/datasets/__init__.py
@@ -18,7 +18,7 @@
 # pylint: enable=line-too-long
 # pylint: disable=g-import-not-at-top,g-bad-import-order,wrong-import-position
 
-__version__ = "1.0.2"
+__version__ = "1.1.0"
 
 import pyarrow
 from pyarrow import total_allocated_bytes
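This release bumps the same version string in three files: `docs/source/conf.py`, `setup.py` and `src/datasets/__init__.py`. A small consistency check can catch a file missed during a bump. What follows is a minimal sketch of such a check, not a helper from the repository: `VERSION_PATTERNS`, `find_versions` and `versions_agree` are hypothetical names, and it assumes each file's text has already been read into a dict and that the version lines look like the ones in this diff.

```python
import re

# Hypothetical patterns, one per file touched by this commit. Each captures
# the quoted version string on the line shown in the diff.
VERSION_PATTERNS = {
    "docs/source/conf.py": r"^release\s*=\s*['\"]([^'\"]+)['\"]",
    "setup.py": r"^\s*version\s*=\s*['\"]([^'\"]+)['\"]",
    "src/datasets/__init__.py": r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
}


def find_versions(sources):
    """Map each known file name to the version found in its text (or None)."""
    found = {}
    for name, pattern in VERSION_PATTERNS.items():
        # MULTILINE lets "^" anchor at the start of any line in the file text.
        match = re.search(pattern, sources.get(name, ""), flags=re.MULTILINE)
        found[name] = match.group(1) if match else None
    return found


def versions_agree(sources):
    """True when all three files declare the same, non-missing version."""
    versions = set(find_versions(sources).values())
    return len(versions) == 1 and None not in versions
```

From a repository checkout, the input could be built with `{name: pathlib.Path(name).read_text() for name in VERSION_PATTERNS}`. Matching on the file text rather than importing the package keeps the check free of the package's own dependencies, such as `pyarrow`.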

2 comments on commit fe52b67

@github-actions


PyArrow==0.17.1


Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.016009 / 0.011353 (0.004656)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.013743 / 0.011008 (0.002735)
read_batch_formatted_as_numpy after write_nested_sequence: 0.049650 / 0.038508 (0.011142)
read_batch_unformated after write_array2d: 0.032172 / 0.023109 (0.009062)
read_batch_unformated after write_flattened_sequence: 0.197824 / 0.275898 (-0.078074)
read_batch_unformated after write_nested_sequence: 0.216284 / 0.323480 (-0.107196)
read_col_formatted_as_numpy after write_array2d: 0.009544 / 0.007986 (0.001559)
read_col_formatted_as_numpy after write_flattened_sequence: 0.004342 / 0.004328 (0.000014)
read_col_formatted_as_numpy after write_nested_sequence: 0.008924 / 0.004250 (0.004674)
read_col_unformated after write_array2d: 0.051132 / 0.037052 (0.014079)
read_col_unformated after write_flattened_sequence: 0.200333 / 0.258489 (-0.058157)
read_col_unformated after write_nested_sequence: 0.219660 / 0.293841 (-0.074181)
read_formatted_as_numpy after write_array2d: 0.142468 / 0.128546 (0.013921)
read_formatted_as_numpy after write_flattened_sequence: 0.101045 / 0.075646 (0.025399)
read_formatted_as_numpy after write_nested_sequence: 0.453210 / 0.419271 (0.033938)
read_unformated after write_array2d: 0.496846 / 0.043533 (0.453313)
read_unformated after write_flattened_sequence: 0.181942 / 0.255139 (-0.073197)
read_unformated after write_nested_sequence: 0.207618 / 0.283200 (-0.075581)
write_array2d: 0.087725 / 0.141683 (-0.053958)
write_flattened_sequence: 1.600671 / 1.452155 (0.148517)
write_nested_sequence: 1.856866 / 1.492716 (0.364150)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.035876 / 0.037411 (-0.001535)
shard: 0.020627 / 0.014526 (0.006101)
shuffle: 0.050894 / 0.176557 (-0.125662)
sort: 0.161313 / 0.737135 (-0.575822)
train_test_split: 0.101477 / 0.296338 (-0.194862)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.167992 / 0.215209 (-0.047217)
read 50000: 1.694305 / 2.077655 (-0.383349)
read_batch 50000 10: 1.109703 / 1.504120 (-0.394417)
read_batch 50000 100: 0.994045 / 1.541195 (-0.547150)
read_batch 50000 1000: 1.050636 / 1.468490 (-0.417854)
read_formatted numpy 5000: 5.510731 / 4.584777 (0.925954)
read_formatted pandas 5000: 4.545480 / 3.745712 (0.799768)
read_formatted tensorflow 5000: 7.203179 / 5.269862 (1.933318)
read_formatted torch 5000: 5.969608 / 4.565676 (1.403932)
read_formatted_batch numpy 5000 10: 0.573967 / 0.424275 (0.149692)
read_formatted_batch numpy 5000 1000: 0.010396 / 0.007607 (0.002789)
shuffled read 5000: 0.193950 / 0.226044 (-0.032095)
shuffled read 50000: 2.005546 / 2.268929 (-0.263382)
shuffled read_batch 50000 10: 1.459716 / 55.444624 (-53.984909)
shuffled read_batch 50000 100: 1.398653 / 6.876477 (-5.477823)
shuffled read_batch 50000 1000: 1.473150 / 2.142072 (-0.668922)
shuffled read_formatted numpy 5000: 5.505179 / 4.805227 (0.699951)
shuffled read_formatted_batch numpy 5000 10: 3.683998 / 6.500664 (-2.816666)
shuffled read_formatted_batch numpy 5000 1000: 10.147168 / 0.075469 (10.071699)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 11.083903 / 1.841788 (9.242116)
map fast-tokenizer batched: 14.073687 / 8.074308 (5.999379)
map identity: 12.208841 / 10.191392 (2.017449)
map identity batched: 0.691830 / 0.680424 (0.011406)
map no-op batched: 0.256328 / 0.534201 (-0.277873)
map no-op batched numpy: 0.737552 / 0.579283 (0.158269)
map no-op batched pandas: 0.489055 / 0.434364 (0.054691)
map no-op batched pytorch: 0.698667 / 0.540337 (0.158330)
map no-op batched tensorflow: 1.422385 / 1.386936 (0.035449)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.016270 / 0.011353 (0.004917)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.013690 / 0.011008 (0.002681)
read_batch_formatted_as_numpy after write_nested_sequence: 0.047715 / 0.038508 (0.009207)
read_batch_unformated after write_array2d: 0.032670 / 0.023109 (0.009561)
read_batch_unformated after write_flattened_sequence: 0.297609 / 0.275898 (0.021711)
read_batch_unformated after write_nested_sequence: 0.345545 / 0.323480 (0.022065)
read_col_formatted_as_numpy after write_array2d: 0.008704 / 0.007986 (0.000719)
read_col_formatted_as_numpy after write_flattened_sequence: 0.004275 / 0.004328 (-0.000053)
read_col_formatted_as_numpy after write_nested_sequence: 0.008971 / 0.004250 (0.004721)
read_col_unformated after write_array2d: 0.051090 / 0.037052 (0.014038)
read_col_unformated after write_flattened_sequence: 0.302489 / 0.258489 (0.043999)
read_col_unformated after write_nested_sequence: 0.344355 / 0.293841 (0.050514)
read_formatted_as_numpy after write_array2d: 0.140681 / 0.128546 (0.012135)
read_formatted_as_numpy after write_flattened_sequence: 0.099026 / 0.075646 (0.023380)
read_formatted_as_numpy after write_nested_sequence: 0.465965 / 0.419271 (0.046694)
read_unformated after write_array2d: 0.408477 / 0.043533 (0.364944)
read_unformated after write_flattened_sequence: 0.303740 / 0.255139 (0.048601)
read_unformated after write_nested_sequence: 0.328613 / 0.283200 (0.045413)
write_array2d: 0.098373 / 0.141683 (-0.043310)
write_flattened_sequence: 1.550872 / 1.452155 (0.098717)
write_nested_sequence: 1.794370 / 1.492716 (0.301654)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.043653 / 0.037411 (0.006241)
shard: 0.021233 / 0.014526 (0.006707)
shuffle: 0.025475 / 0.176557 (-0.151081)
sort: 0.085415 / 0.737135 (-0.651720)
train_test_split: 0.076610 / 0.296338 (-0.219729)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.229670 / 0.215209 (0.014461)
read 50000: 2.133490 / 2.077655 (0.055835)
read_batch 50000 10: 1.677687 / 1.504120 (0.173567)
read_batch 50000 100: 1.682092 / 1.541195 (0.140897)
read_batch 50000 1000: 1.604456 / 1.468490 (0.135966)
read_formatted numpy 5000: 5.777835 / 4.584777 (1.193058)
read_formatted pandas 5000: 4.579592 / 3.745712 (0.833880)
read_formatted tensorflow 5000: 6.950089 / 5.269862 (1.680228)
read_formatted torch 5000: 5.619000 / 4.565676 (1.053324)
read_formatted_batch numpy 5000 10: 0.562941 / 0.424275 (0.138666)
read_formatted_batch numpy 5000 1000: 0.010056 / 0.007607 (0.002449)
shuffled read 5000: 0.253613 / 0.226044 (0.027568)
shuffled read 50000: 2.597418 / 2.268929 (0.328489)
shuffled read_batch 50000 10: 1.979675 / 55.444624 (-53.464949)
shuffled read_batch 50000 100: 1.797153 / 6.876477 (-5.079324)
shuffled read_batch 50000 1000: 1.866893 / 2.142072 (-0.275179)
shuffled read_formatted numpy 5000: 5.644289 / 4.805227 (0.839062)
shuffled read_formatted_batch numpy 5000 10: 3.672709 / 6.500664 (-2.827955)
shuffled read_formatted_batch numpy 5000 1000: 5.997091 / 0.075469 (5.921622)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 10.452162 / 1.841788 (8.610375)
map fast-tokenizer batched: 14.606382 / 8.074308 (6.532074)
map identity: 10.866265 / 10.191392 (0.674873)
map identity batched: 0.706702 / 0.680424 (0.026278)
map no-op batched: 0.540230 / 0.534201 (0.006029)
map no-op batched numpy: 0.715325 / 0.579283 (0.136042)
map no-op batched pandas: 0.471349 / 0.434364 (0.036985)
map no-op batched pytorch: 0.663958 / 0.540337 (0.123621)
map no-op batched tensorflow: 1.457097 / 1.386936 (0.070161)

@github-actions


PyArrow==0.17.1


Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.017714 / 0.011353 (0.006361)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.015508 / 0.011008 (0.004500)
read_batch_formatted_as_numpy after write_nested_sequence: 0.055182 / 0.038508 (0.016674)
read_batch_unformated after write_array2d: 0.033697 / 0.023109 (0.010588)
read_batch_unformated after write_flattened_sequence: 0.215577 / 0.275898 (-0.060321)
read_batch_unformated after write_nested_sequence: 0.260740 / 0.323480 (-0.062740)
read_col_formatted_as_numpy after write_array2d: 0.010525 / 0.007986 (0.002540)
read_col_formatted_as_numpy after write_flattened_sequence: 0.004605 / 0.004328 (0.000276)
read_col_formatted_as_numpy after write_nested_sequence: 0.008366 / 0.004250 (0.004116)
read_col_unformated after write_array2d: 0.052404 / 0.037052 (0.015352)
read_col_unformated after write_flattened_sequence: 0.213878 / 0.258489 (-0.044611)
read_col_unformated after write_nested_sequence: 0.265520 / 0.293841 (-0.028321)
read_formatted_as_numpy after write_array2d: 0.147651 / 0.128546 (0.019105)
read_formatted_as_numpy after write_flattened_sequence: 0.119120 / 0.075646 (0.043474)
read_formatted_as_numpy after write_nested_sequence: 0.540364 / 0.419271 (0.121093)
read_unformated after write_array2d: 0.544796 / 0.043533 (0.501263)
read_unformated after write_flattened_sequence: 0.213485 / 0.255139 (-0.041654)
read_unformated after write_nested_sequence: 0.230171 / 0.283200 (-0.053028)
write_array2d: 0.089087 / 0.141683 (-0.052596)
write_flattened_sequence: 2.033951 / 1.452155 (0.581796)
write_nested_sequence: 1.986259 / 1.492716 (0.493543)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.042384 / 0.037411 (0.004973)
shard: 0.025593 / 0.014526 (0.011068)
shuffle: 0.105510 / 0.176557 (-0.071047)
sort: 1.349426 / 0.737135 (0.612290)
train_test_split: 0.205052 / 0.296338 (-0.091287)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.195803 / 0.215209 (-0.019406)
read 50000: 1.966633 / 2.077655 (-0.111022)
read_batch 50000 10: 1.263513 / 1.504120 (-0.240607)
read_batch 50000 100: 1.188398 / 1.541195 (-0.352796)
read_batch 50000 1000: 1.262248 / 1.468490 (-0.206242)
read_formatted numpy 5000: 6.686841 / 4.584777 (2.102065)
read_formatted pandas 5000: 5.288184 / 3.745712 (1.542472)
read_formatted tensorflow 5000: 8.024383 / 5.269862 (2.754522)
read_formatted torch 5000: 6.928173 / 4.565676 (2.362496)
read_formatted_batch numpy 5000 10: 0.694580 / 0.424275 (0.270305)
read_formatted_batch numpy 5000 1000: 0.013493 / 0.007607 (0.005886)
shuffled read 5000: 0.237944 / 0.226044 (0.011900)
shuffled read 50000: 2.321600 / 2.268929 (0.052671)
shuffled read_batch 50000 10: 1.712485 / 55.444624 (-53.732139)
shuffled read_batch 50000 100: 1.603707 / 6.876477 (-5.272769)
shuffled read_batch 50000 1000: 1.685765 / 2.142072 (-0.456308)
shuffled read_formatted numpy 5000: 6.535083 / 4.805227 (1.729856)
shuffled read_formatted_batch numpy 5000 10: 4.521051 / 6.500664 (-1.979613)
shuffled read_formatted_batch numpy 5000 1000: 7.365693 / 0.075469 (7.290223)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 12.521091 / 1.841788 (10.679304)
map fast-tokenizer batched: 16.102262 / 8.074308 (8.027954)
map identity: 13.124993 / 10.191392 (2.933601)
map identity batched: 0.858578 / 0.680424 (0.178154)
map no-op batched: 0.283539 / 0.534201 (-0.250662)
map no-op batched numpy: 0.789187 / 0.579283 (0.209904)
map no-op batched pandas: 0.553179 / 0.434364 (0.118815)
map no-op batched pytorch: 0.742039 / 0.540337 (0.201702)
map no-op batched tensorflow: 1.588031 / 1.386936 (0.201095)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric: new / old (diff)
read_batch_formatted_as_numpy after write_array2d: 0.018053 / 0.011353 (0.006700)
read_batch_formatted_as_numpy after write_flattened_sequence: 0.015739 / 0.011008 (0.004731)
read_batch_formatted_as_numpy after write_nested_sequence: 0.053039 / 0.038508 (0.014531)
read_batch_unformated after write_array2d: 0.034175 / 0.023109 (0.011065)
read_batch_unformated after write_flattened_sequence: 0.362378 / 0.275898 (0.086480)
read_batch_unformated after write_nested_sequence: 0.385266 / 0.323480 (0.061786)
read_col_formatted_as_numpy after write_array2d: 0.006776 / 0.007986 (-0.001209)
read_col_formatted_as_numpy after write_flattened_sequence: 0.004738 / 0.004328 (0.000409)
read_col_formatted_as_numpy after write_nested_sequence: 0.009906 / 0.004250 (0.005655)
read_col_unformated after write_array2d: 0.053922 / 0.037052 (0.016869)
read_col_unformated after write_flattened_sequence: 0.358121 / 0.258489 (0.099632)
read_col_unformated after write_nested_sequence: 0.384491 / 0.293841 (0.090650)
read_formatted_as_numpy after write_array2d: 0.145939 / 0.128546 (0.017392)
read_formatted_as_numpy after write_flattened_sequence: 0.123628 / 0.075646 (0.047982)
read_formatted_as_numpy after write_nested_sequence: 0.518226 / 0.419271 (0.098955)
read_unformated after write_array2d: 0.423401 / 0.043533 (0.379869)
read_unformated after write_flattened_sequence: 0.349119 / 0.255139 (0.093980)
read_unformated after write_nested_sequence: 0.372764 / 0.283200 (0.089564)
write_array2d: 0.102105 / 0.141683 (-0.039578)
write_flattened_sequence: 1.873652 / 1.452155 (0.421497)
write_nested_sequence: 1.941002 / 1.492716 (0.448286)

Benchmark: benchmark_indices_mapping.json

metric: new / old (diff)
select: 0.049345 / 0.037411 (0.011934)
shard: 0.022901 / 0.014526 (0.008375)
shuffle: 0.036007 / 0.176557 (-0.140549)
sort: 0.096570 / 0.737135 (-0.640565)
train_test_split: 0.050483 / 0.296338 (-0.245855)

Benchmark: benchmark_iterating.json

metric: new / old (diff)
read 5000: 0.247730 / 0.215209 (0.032521)
read 50000: 2.501679 / 2.077655 (0.424024)
read_batch 50000 10: 1.889590 / 1.504120 (0.385470)
read_batch 50000 100: 1.828964 / 1.541195 (0.287770)
read_batch 50000 1000: 1.901695 / 1.468490 (0.433205)
read_formatted numpy 5000: 6.486697 / 4.584777 (1.901921)
read_formatted pandas 5000: 5.386261 / 3.745712 (1.640549)
read_formatted tensorflow 5000: 7.948549 / 5.269862 (2.678687)
read_formatted torch 5000: 6.794189 / 4.565676 (2.228512)
read_formatted_batch numpy 5000 10: 0.652754 / 0.424275 (0.228479)
read_formatted_batch numpy 5000 1000: 0.012034 / 0.007607 (0.004427)
shuffled read 5000: 0.274722 / 0.226044 (0.048678)
shuffled read 50000: 2.867641 / 2.268929 (0.598712)
shuffled read_batch 50000 10: 2.283521 / 55.444624 (-53.161104)
shuffled read_batch 50000 100: 2.203104 / 6.876477 (-4.673373)
shuffled read_batch 50000 1000: 2.306935 / 2.142072 (0.164863)
shuffled read_formatted numpy 5000: 6.554267 / 4.805227 (1.749039)
shuffled read_formatted_batch numpy 5000 10: 4.727084 / 6.500664 (-1.773580)
shuffled read_formatted_batch numpy 5000 1000: 8.964800 / 0.075469 (8.889331)

Benchmark: benchmark_map_filter.json

metric: new / old (diff)
filter: 13.031114 / 1.841788 (11.189326)
map fast-tokenizer batched: 17.365266 / 8.074308 (9.290958)
map identity: 13.229965 / 10.191392 (3.038573)
map identity batched: 1.203261 / 0.680424 (0.522837)
map no-op batched: 0.570111 / 0.534201 (0.035910)
map no-op batched numpy: 0.799607 / 0.579283 (0.220324)
map no-op batched pandas: 0.573509 / 0.434364 (0.139145)
map no-op batched pytorch: 0.750223 / 0.540337 (0.209885)
map no-op batched tensorflow: 1.721422 / 1.386936 (0.334486)
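Each cell in the bot's benchmark reports reads `new / old (diff)`, and the figures are consistent with diff = new - old (for example 0.016009 - 0.011353 = 0.004656 in the first `benchmark_array_xd.json` row). The following is a minimal Python sketch for scanning a report's cells and flagging large slowdowns. It assumes every value is a timing where lower is better. `parse_cells`, `flag_slowdowns` and the 1.2 ratio threshold are hypothetical illustrations, not part of the project's benchmark tooling.

```python
import re

# One benchmark cell as printed by the bot: "new / old (diff)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(text):
    """Return a (new, old, diff) float triple for every cell found in text."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(text)]


def flag_slowdowns(names, text, threshold=1.2):
    """Return the metric names whose new/old ratio exceeds threshold.

    Assumes each value is a timing where lower is better; the default
    threshold is an arbitrary illustration, not a project setting.
    """
    flagged = []
    # names must be listed in the same order as the cells appear in text.
    for name, (new, old, _diff) in zip(names, parse_cells(text)):
        if old > 0 and new / old > threshold:
            flagged.append(name)
    return flagged
```

Applied to the first PyArrow==0.17.1 `benchmark_indices_mapping.json` report, only `shard` crosses the ratio (0.020627 / 0.014526 is about 1.42). Using a ratio rather than the raw diff keeps metrics measured on very different scales, such as `read_formatted_batch numpy 5000 1000` and `filter`, comparable.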
