Read directory entries from a local cache file #102

ingvarha · 2019-03-25T08:49:49Z

From the README:

"Running with 100 threads, I found the gating factor to be how fast I could list items from the
source bucket (!?!) Which makes me wonder if there is any way to do this faster."

If one knows that the directory entry names do not change, only the contents, it would be possible to run aws s3 ls s3://bucket/ > bucket.list and then use that file as a local cache, skipping the need to read the bucket listing at run time.
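To illustrate the idea, here is a minimal sketch in Python of reading keys back from such a cached listing. It is not part of the project. It assumes the usual aws s3 ls text output, where object lines carry date, time, size, and key, and prefix lines start with PRE. The function name iter_cached_keys is hypothetical.

```python
import io
from typing import Iterator, TextIO


def iter_cached_keys(listing: TextIO) -> Iterator[str]:
    """Yield entry names from text saved with `aws s3 ls s3://bucket/ > bucket.list`.

    Assumed line shapes (not verified against every CLI version):
      "2019-03-25 08:49:49       1234 some key"   -> object
      "                           PRE prefix/"     -> common prefix
    """
    for raw in listing:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip()
        if stripped.startswith("PRE "):
            # Prefix ("directory") entry: everything after the PRE marker.
            yield stripped[4:]
            continue
        # Object entry: date, time, size, then the key (which may contain spaces).
        parts = line.split(None, 3)
        if len(parts) == 4:
            yield parts[3]


# Usage with an in-memory stand-in for bucket.list:
sample = io.StringIO(
    "2019-03-25 08:49:49       1234 a.txt\n"
    "                           PRE logs/\n"
)
keys = list(iter_cached_keys(sample))
```

One limitation of this sketch: keys that begin with whitespace would lose it, because the line is split on runs of whitespace.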

We have to deal with a bucket with some 20 million entries at the top level. Just listing the top level takes 1-2 hours, so being able to cache this between runs would be helpful.

cobbzilla · 2021-01-20T12:36:42Z

@ingvarha Yes, I love the idea of saving an index (perhaps even including metadata) to a file, then using that instead of KeyLister.
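As a rough sketch of what a saved index with metadata could look like (in Python, for illustration only, not the project's code or the KeyLister API), each entry could be written as one JSON object per line and read back on the next run. The names IndexEntry, write_index, and read_index, and the choice of size and etag as metadata fields, are all assumptions.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, TextIO


@dataclass(frozen=True)
class IndexEntry:
    # Hypothetical metadata fields; a real index might store more or fewer.
    key: str
    size: int
    etag: str


def write_index(entries: Iterable[IndexEntry], out: TextIO) -> None:
    """Write one JSON object per line so the file can be streamed later."""
    for entry in entries:
        out.write(json.dumps(asdict(entry)) + "\n")


def read_index(src: TextIO) -> Iterator[IndexEntry]:
    """Stream entries back without loading the whole index into memory."""
    for line in src:
        if line.strip():
            yield IndexEntry(**json.loads(line))


# Usage: round-trip through an in-memory buffer standing in for the index file.
buf = io.StringIO()
write_index([IndexEntry("a.txt", 1234, "abc"), IndexEntry("logs/b.log", 0, "def")], buf)
buf.seek(0)
restored = list(read_index(buf))
```

A line-oriented format is chosen here because a listing with millions of entries can then be read incrementally rather than parsed as one large document.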

I would welcome a PR that added this functionality.

cobbzilla mentioned this issue Jan 20, 2021

Support resume after interruption #94

Closed

cobbzilla added the enhancement and good first issue labels Jan 20, 2021
