Read directory entries from a local cache file #102

Open
ingvarha opened this issue Mar 25, 2019 · 1 comment

Comments

@ingvarha

From the README:

"Running with 100 threads, I found the gating factor to be how fast I could list items from the
source bucket (!?!) Which makes me wonder if there is any way to do this faster."

If one knows that the directory entry names do not change, only the contents, it would be possible to run `aws s3 ls s3://bucket/ > bucket.list` once and then use that file as a local cache, skipping the need to read the bucket listing at run time.

We have to deal with a bucket with some 20 million entries at the top level. Just listing the top level takes 1-2 hours, so being able to cache the listing between runs would be helpful.
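To make the idea concrete, here is a rough sketch of reading such a cache file back. It is written in Python purely for illustration and is not s3s3mirror code; `CachedEntry` and `read_cached_listing` are hypothetical names. It assumes the usual `aws s3 ls` row layout (date, time, size, key for objects, and `PRE prefix/` for prefixes) and skips anything else. Keys that begin with whitespace would not survive this parsing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class CachedEntry:
    """One row of the cached listing (hypothetical type, for illustration)."""
    key: str
    size: Optional[int]                  # None for "PRE" (prefix) rows
    last_modified: Optional[datetime]    # None for "PRE" (prefix) rows


def read_cached_listing(lines: Iterable[str]) -> Iterator[CachedEntry]:
    """Yield entries from lines produced by `aws s3 ls s3://bucket/`.

    Assumed row formats:
        2019-03-25 10:15:32       1234 some/key name
                                   PRE some/prefix/
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines

        stripped = line.lstrip()
        if stripped.startswith("PRE "):
            yield CachedEntry(stripped[4:], None, None)
            continue

        # Split into at most 4 fields so that spaces inside the key survive.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # not a row we recognize
        date, time, size, key = parts
        try:
            modified = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
            yield CachedEntry(key, int(size), modified)
        except ValueError:
            continue  # malformed timestamp or size; skip the row
```

Because the function is a generator, a 20-million-line file can be streamed with `with open("bucket.list") as f: for entry in read_cached_listing(f): ...` without holding the whole listing in memory. How the entries would be fed to the copy threads is left open here.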

@cobbzilla
Owner

@ingvarha Yes, I love the idea of saving an index (perhaps even including metadata) to a file, then using that instead of KeyLister.

I would welcome a PR that added this functionality.
