"Running with 100 threads, I found the gating factor to be how fast I could list items from the
source bucket (!?!) Which makes me wonder if there is any way to do this faster."
If one knows that the directory entry names do not change, only the contents, it would be possible to run aws s3 ls s3://bucket/ > bucket.list once and then use that file as a local cache, skipping the need to read the bucket listing at run time.

We have to deal with a bucket with some 20 million entries at the top level. Just listing the top-level catalog takes 1-2 hours, so being able to cache this between runs would be helpful.
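The caching idea above can be sketched as a small shell wrapper: list once, reuse the file on later runs. This is only an illustration of the pattern, not a supported feature of any tool; here list_bucket is a stub standing in for the real (slow) aws s3 ls s3://bucket/ call, and the cache filename is arbitrary.

```shell
#!/bin/sh
# Sketch: cache the bucket listing between runs, assuming entry
# names are stable (only object contents change).
# "list_bucket" is a placeholder for: aws s3 ls s3://bucket/
list_bucket() {
    printf 'a.txt\nb.txt\nc.txt\n'   # stub for the slow S3 listing
}

CACHE=bucket.list
if [ ! -f "$CACHE" ]; then
    # One slow listing pass; later runs reuse the cached file.
    list_bucket > "$CACHE"
fi

# Downstream tooling reads the cached listing instead of re-listing.
echo "$(wc -l < "$CACHE") entries cached"
```

To force a fresh listing (e.g. after objects are added or removed), delete the cache file before the next run; any staleness in bucket.list translates directly into missed or phantom entries downstream.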