The md5sum process we currently use is single-threaded and takes ages to calculate the checksum of the 240GB nr database.
It would be nice to switch to a checksum algorithm or program that can be parallelized to speed this up.
I tried two different algorithms (cksum and md5sum), but their run times were very close and neither maxed out IO.
We should take a look at parallel implementations, as it seems the single core in use could be the bottleneck.
time cksum nr_2022-04-02_mmseqs_taxonomy.tar
2820021559 280000174080 nr_2022-04-02_mmseqs_taxonomy.tar
real 65m15.818s
user 21m25.004s
sys 2m6.584s
time md5sum nr_2022-04-02_mmseqs_taxonomy.tar
35b7bc1a96f0b337c12713d4d3d4b4d3 nr_2022-04-02_mmseqs_taxonomy.tar
real 60m19.030s
user 13m45.796s
sys 2m22.600s
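To make the idea of a parallelized checksum concrete, here is a minimal Python sketch, not a drop-in replacement for md5sum. It assumes a chunked scheme: the file is split into fixed-size chunks, each chunk is hashed on its own thread, and the ordered chunk digests are hashed again into one value. The names `parallel_checksum`, `hash_chunk` and `CHUNK_SIZE`, the 64 MiB chunk size and the choice of SHA-256 are all my own assumptions. The result will not match the output of md5sum or cksum, so producer and consumer would both need to use the same scheme and chunk size.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size (64 MiB), not taken from the issue
READ_SIZE = 1024 * 1024        # read 1 MiB at a time to keep memory use low


def hash_chunk(path, offset, length):
    """Return the SHA-256 digest of the byte range [offset, offset + length)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        handle.seek(offset)
        remaining = length
        while remaining > 0:
            block = handle.read(min(READ_SIZE, remaining))
            if not block:
                break  # file ended earlier than expected
            digest.update(block)
            remaining -= len(block)
    return digest.digest()


def parallel_checksum(path, chunk_size=CHUNK_SIZE, workers=None):
    """Hash fixed-size chunks concurrently, then hash the ordered chunk digests."""
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)

    def work(offset):
        return hash_chunk(path, offset, min(chunk_size, size - offset))

    combined = hashlib.sha256()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the combined digest
        # does not depend on which thread finishes first.
        for chunk_digest in pool.map(work, offsets):
            combined.update(chunk_digest)
    return combined.hexdigest()
```

Threads are enough here because hashlib releases the GIL while hashing larger buffers, so several chunks can be hashed at once. Calling `parallel_checksum("nr_2022-04-02_mmseqs_taxonomy.tar")` would return a hex string. Whether this is actually faster depends on whether the storage can deliver data to several readers at once, which the timings above do not settle.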