Commit

minor
lhoestq committed Feb 15, 2021
1 parent f2e09b6 commit 86b605d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/quicktour.rst
@@ -14,7 +14,7 @@ Let's have a quick look at the 🤗datasets library. This library has three main
>>> from datasets import list_datasets
>>> datasets_list = list_datasets()
>>> len(datasets_list)
- 136
+ 656
>>> print(', '.join(dataset for dataset in datasets_list))
aeslc, ag_news, ai2_arc, allocine, anli, arcd, art, billsum, blended_skill_talk, blimp, blog_authorship_corpus, bookcorpus, boolq, break_data,
c4, cfq, civil_comments, cmrc2018, cnn_dailymail, coarse_discourse, com_qa, commonsense_qa, compguesswhat, coqa, cornell_movie_dialog, cos_e,

1 comment on commit 86b605d

@github-actions

Show benchmarks

PyArrow==0.17.1

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.016703 / 0.011353 (0.005350) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014558 / 0.011008 (0.003550) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.047069 / 0.038508 (0.008561) |
| read_batch_unformated after write_array2d | 0.032090 / 0.023109 (0.008981) |
| read_batch_unformated after write_flattened_sequence | 0.198618 / 0.275898 (-0.077280) |
| read_batch_unformated after write_nested_sequence | 0.228278 / 0.323480 (-0.095201) |
| read_col_formatted_as_numpy after write_array2d | 0.005299 / 0.007986 (-0.002687) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004599 / 0.004328 (0.000270) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006740 / 0.004250 (0.002490) |
| read_col_unformated after write_array2d | 0.050005 / 0.037052 (0.012952) |
| read_col_unformated after write_flattened_sequence | 0.189403 / 0.258489 (-0.069087) |
| read_col_unformated after write_nested_sequence | 0.219217 / 0.293841 (-0.074624) |
| read_formatted_as_numpy after write_array2d | 0.142958 / 0.128546 (0.014412) |
| read_formatted_as_numpy after write_flattened_sequence | 0.110603 / 0.075646 (0.034956) |
| read_formatted_as_numpy after write_nested_sequence | 0.400599 / 0.419271 (-0.018673) |
| read_unformated after write_array2d | 0.616259 / 0.043533 (0.572726) |
| read_unformated after write_flattened_sequence | 0.211609 / 0.255139 (-0.043530) |
| read_unformated after write_nested_sequence | 0.219363 / 0.283200 (-0.063836) |
| write_array2d | 6.737638 / 0.141683 (6.595955) |
| write_flattened_sequence | 1.734789 / 1.452155 (0.282635) |
| write_nested_sequence | 1.759867 / 1.492716 (0.267151) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.039704 / 0.037411 (0.002292) |
| shard | 0.018382 / 0.014526 (0.003857) |
| shuffle | 0.023471 / 0.176557 (-0.153085) |
| sort | 0.045230 / 0.737135 (-0.691905) |
| train_test_split | 0.033440 / 0.296338 (-0.262899) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.231067 / 0.215209 (0.015858) |
| read 50000 | 2.226846 / 2.077655 (0.149192) |
| read_batch 50000 10 | 1.199628 / 1.504120 (-0.304492) |
| read_batch 50000 100 | 1.152771 / 1.541195 (-0.388424) |
| read_batch 50000 1000 | 1.174055 / 1.468490 (-0.294435) |
| read_formatted numpy 5000 | 6.238397 / 4.584777 (1.653620) |
| read_formatted pandas 5000 | 5.449271 / 3.745712 (1.703559) |
| read_formatted tensorflow 5000 | 7.889502 / 5.269862 (2.619640) |
| read_formatted torch 5000 | 6.959712 / 4.565676 (2.394035) |
| read_formatted_batch numpy 5000 10 | 0.611387 / 0.424275 (0.187112) |
| read_formatted_batch numpy 5000 1000 | 0.011646 / 0.007607 (0.004039) |
| shuffled read 5000 | 0.245013 / 0.226044 (0.018968) |
| shuffled read 50000 | 2.651289 / 2.268929 (0.382361) |
| shuffled read_batch 50000 10 | 1.703065 / 55.444624 (-53.741560) |
| shuffled read_batch 50000 100 | 1.441631 / 6.876477 (-5.434846) |
| shuffled read_batch 50000 1000 | 1.589899 / 2.142072 (-0.552173) |
| shuffled read_formatted numpy 5000 | 6.441606 / 4.805227 (1.636379) |
| shuffled read_formatted_batch numpy 5000 10 | 4.230148 / 6.500664 (-2.270516) |
| shuffled read_formatted_batch numpy 5000 1000 | 4.738348 / 0.075469 (4.662879) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 10.058181 / 1.841788 (8.216393) |
| map fast-tokenizer batched | 14.719327 / 8.074308 (6.645018) |
| map identity | 17.943495 / 10.191392 (7.752103) |
| map identity batched | 0.866023 / 0.680424 (0.185599) |
| map no-op batched | 0.275510 / 0.534201 (-0.258691) |
| map no-op batched numpy | 0.758213 / 0.579283 (0.178930) |
| map no-op batched pandas | 0.580142 / 0.434364 (0.145778) |
| map no-op batched pytorch | 0.678461 / 0.540337 (0.138123) |
| map no-op batched tensorflow | 1.580939 / 1.386936 (0.194003) |

PyArrow==1.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
|---|---|
| read_batch_formatted_as_numpy after write_array2d | 0.017205 / 0.011353 (0.005852) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.015194 / 0.011008 (0.004186) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.047829 / 0.038508 (0.009321) |
| read_batch_unformated after write_array2d | 0.040214 / 0.023109 (0.017105) |
| read_batch_unformated after write_flattened_sequence | 0.316351 / 0.275898 (0.040453) |
| read_batch_unformated after write_nested_sequence | 0.347063 / 0.323480 (0.023583) |
| read_col_formatted_as_numpy after write_array2d | 0.008849 / 0.007986 (0.000863) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004384 / 0.004328 (0.000056) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007857 / 0.004250 (0.003607) |
| read_col_unformated after write_array2d | 0.049072 / 0.037052 (0.012020) |
| read_col_unformated after write_flattened_sequence | 0.328449 / 0.258489 (0.069960) |
| read_col_unformated after write_nested_sequence | 0.362731 / 0.293841 (0.068890) |
| read_formatted_as_numpy after write_array2d | 0.168657 / 0.128546 (0.040111) |
| read_formatted_as_numpy after write_flattened_sequence | 0.116350 / 0.075646 (0.040704) |
| read_formatted_as_numpy after write_nested_sequence | 0.432477 / 0.419271 (0.013206) |
| read_unformated after write_array2d | 0.405153 / 0.043533 (0.361620) |
| read_unformated after write_flattened_sequence | 0.325503 / 0.255139 (0.070364) |
| read_unformated after write_nested_sequence | 0.341530 / 0.283200 (0.058330) |
| write_array2d | 1.608213 / 0.141683 (1.466530) |
| write_flattened_sequence | 1.730673 / 1.452155 (0.278518) |
| write_nested_sequence | 1.700430 / 1.492716 (0.207714) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
|---|---|
| select | 0.038625 / 0.037411 (0.001214) |
| shard | 0.018953 / 0.014526 (0.004427) |
| shuffle | 0.046745 / 0.176557 (-0.129812) |
| sort | 0.049896 / 0.737135 (-0.687239) |
| train_test_split | 0.047507 / 0.296338 (-0.248832) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
|---|---|
| read 5000 | 0.289349 / 0.215209 (0.074140) |
| read 50000 | 2.827137 / 2.077655 (0.749482) |
| read_batch 50000 10 | 1.895703 / 1.504120 (0.391583) |
| read_batch 50000 100 | 1.704136 / 1.541195 (0.162941) |
| read_batch 50000 1000 | 1.743892 / 1.468490 (0.275402) |
| read_formatted numpy 5000 | 6.153659 / 4.584777 (1.568882) |
| read_formatted pandas 5000 | 4.905967 / 3.745712 (1.160255) |
| read_formatted tensorflow 5000 | 7.622236 / 5.269862 (2.352375) |
| read_formatted torch 5000 | 6.641999 / 4.565676 (2.076323) |
| read_formatted_batch numpy 5000 10 | 0.576421 / 0.424275 (0.152146) |
| read_formatted_batch numpy 5000 1000 | 0.009183 / 0.007607 (0.001576) |
| shuffled read 5000 | 0.319935 / 0.226044 (0.093891) |
| shuffled read 50000 | 3.008502 / 2.268929 (0.739574) |
| shuffled read_batch 50000 10 | 2.263659 / 55.444624 (-53.180966) |
| shuffled read_batch 50000 100 | 2.111187 / 6.876477 (-4.765289) |
| shuffled read_batch 50000 1000 | 2.091311 / 2.142072 (-0.050762) |
| shuffled read_formatted numpy 5000 | 6.067572 / 4.805227 (1.262344) |
| shuffled read_formatted_batch numpy 5000 10 | 3.727634 / 6.500664 (-2.773030) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.016096 / 0.075469 (5.940627) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
|---|---|
| filter | 10.364154 / 1.841788 (8.522366) |
| map fast-tokenizer batched | 15.470470 / 8.074308 (7.396162) |
| map identity | 17.794506 / 10.191392 (7.603114) |
| map identity batched | 0.736304 / 0.680424 (0.055880) |
| map no-op batched | 0.572605 / 0.534201 (0.038404) |
| map no-op batched numpy | 0.700690 / 0.579283 (0.121407) |
| map no-op batched pandas | 0.516453 / 0.434364 (0.082089) |
| map no-op batched pytorch | 0.634213 / 0.540337 (0.093876) |
| map no-op batched tensorflow | 1.410141 / 1.386936 (0.023205) |
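
Each benchmark cell above has the form `new / old (diff)`. The following is a minimal sketch, not part of the benchmark tooling, of how such cells could be parsed and sanity-checked. It assumes the diff is `new - old`, computed before rounding to six decimals, so the check allows a small tolerance. The helper names `parse_cells` and `is_consistent` are hypothetical.

```python
import re

# Matches one cell of the form "new / old (diff)",
# e.g. "0.039704 / 0.037411 (0.002292)".
CELL = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_cells(row):
    """Return a list of (new, old, diff) float triples found in a benchmark row."""
    return [tuple(float(x) for x in m.groups()) for m in CELL.finditer(row)]


def is_consistent(new, old, diff, tol=2e-6):
    """Check that the reported diff equals new - old, up to rounding (assumed)."""
    return abs((new - old) - diff) <= tol


# First two cells of the PyArrow==0.17.1 indices_mapping row (select, shard).
row = "new / old (diff) 0.039704 / 0.037411 (0.002292) 0.018382 / 0.014526 (0.003857)"
cells = parse_cells(row)
print(all(is_consistent(*cell) for cell in cells))
```

The literal header text `new / old (diff)` is skipped because the pattern only matches numeric fields.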
