Document hub vs cache #51

Open
Raynos opened this issue Feb 15, 2013 · 0 comments

Lazy data structures are hard. We should document strategies for handling multiple reduces, such as hubbing and caching.

Hubbing and caching are different ways to deal with lazy data structures.

They let you say what should happen when a source gets consumed multiple times.

Normally each reduce will just re-reduce the entire source. So if the source is a file, every reduce would open another file descriptor and read the entire thing.
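To make the default behaviour concrete, here is a rough sketch in Python. It is a toy model, not this library's API: `LazySource`, `read_lines` and the `opens` counter are made-up names, and a generator stands in for opening a file.

```python
from functools import reduce as fold


class LazySource:
    """Toy lazy source: nothing is read until reduce() is called."""

    def __init__(self, open_source):
        # Hypothetical factory; called once per reduce, like opening a file.
        self.open_source = open_source
        self.opens = 0

    def reduce(self, step, initial):
        # Every reduce re-opens the source and walks all of it again.
        self.opens += 1
        return fold(step, self.open_source(), initial)


def read_lines():
    # Stand-in for opening a file descriptor and yielding its lines.
    yield from ["a", "b", "c"]


source = LazySource(read_lines)
count = source.reduce(lambda acc, _line: acc + 1, 0)
joined = source.reduce(lambda acc, line: acc + line, "")
print(count, joined, source.opens)  # 3 abc 2
```

Two reduces mean two opens here, which is the cost that hub and cache are meant to avoid.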

If you use hub, then multiple consumers can reduce that source (file) and it would only use one file descriptor. This is like a splitter, or like how multi-pipe works in node streams.

The downside of hub is that a consumer who starts reducing later misses the first part of the file.
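A rough Python sketch of the hub idea and its downside follows. Again this is a toy model, not the library's implementation: `Hub`, `subscribe` and `pump` are hypothetical names, consumers are plain lists, and `pump` is stepped by hand so the timing of the late consumer is easy to see.

```python
from itertools import islice


class Hub:
    """Toy hub: one underlying read, fanned out to whoever is subscribed."""

    def __init__(self, open_source):
        self.open_source = open_source  # hypothetical factory, e.g. opens a file
        self.iterator = None
        self.opens = 0
        self.consumers = []

    def _ensure_open(self):
        # The source is opened at most once, no matter how many consumers join.
        if self.iterator is None:
            self.iterator = self.open_source()
            self.opens += 1

    def subscribe(self, consumer):
        self._ensure_open()
        self.consumers.append(consumer)

    def pump(self, n):
        """Forward up to n items to every consumer subscribed right now."""
        self._ensure_open()
        for item in islice(self.iterator, n):
            for consumer in self.consumers:
                consumer.append(item)


def read_lines():
    # Stand-in for reading a file line by line.
    yield from ["a", "b", "c"]


hub = Hub(read_lines)
early, late = [], []
hub.subscribe(early)
hub.pump(1)          # "a" is delivered before the second consumer exists
hub.subscribe(late)
hub.pump(2)
print(early, late, hub.opens)  # ['a', 'b', 'c'] ['b', 'c'] 1
```

The source is opened once, but `late` never sees `"a"`.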

An alternative is cache, which will cache the result so that if you reduce multiple times, later reduces just read from the cache.
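Caching could be sketched like this in Python. As before this is only an illustration under assumptions: `Cached`, `read_lines` and `opens` are invented names, and this version fills the cache eagerly on the first reduce, which may not be how a real implementation would do it.

```python
from functools import reduce as fold


class Cached:
    """Toy cache: read the source once, then replay the stored items."""

    def __init__(self, open_source):
        self.open_source = open_source  # hypothetical factory, e.g. opens a file
        self.items = None
        self.opens = 0

    def reduce(self, step, initial):
        if self.items is None:
            # First reduce: open the source once and remember everything.
            self.opens += 1
            self.items = list(self.open_source())
        # Every reduce, including later ones, reads from the cache.
        return fold(step, self.items, initial)


def read_lines():
    # Stand-in for reading a file line by line.
    yield from ["a", "b", "c"]


cached = Cached(read_lines)
first = cached.reduce(lambda acc, line: acc + line, "")
second = cached.reduce(lambda acc, line: acc + line, "")
print(first, second, cached.opens)  # abc abc 1
```

Unlike the hub, a late reduce sees the whole content. The price in this sketch is that everything read so far stays in memory.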