Document hub vs cache #51

Raynos · 2013-02-15T01:40:54Z

Lazy data structures are hard. We should document strategies for handling multiple reduces, such as hubbing and caching.

Hubbing and caching are two different ways to deal with lazy data structures.

They let you decide what should happen when something gets consumed multiple times.

Normally each reduce just runs over the entire source again. So if the source is a file, every reduce opens another file descriptor and reads the whole file.
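A minimal TypeScript sketch of that default behavior, assuming a simple push-based model where a source is a function that pushes all of its values into a consumer every time it is called. The names Source, fileSource, reduce and the opens counter are hypothetical illustrations, not the reducers API. Reducing the same source twice runs it twice, which the counter records as two "opens".

```typescript
// Hypothetical push-based model: calling a Source pushes every value to the consumer.
type Source<T> = (consumer: (value: T) => void) => void;

// Counts how many times the "file" is opened (one file descriptor per open).
let opens = 0;

// Hypothetical stand-in for a file: every call re-opens and re-reads it from the start.
const fileSource: Source<string> = (consumer) => {
  opens += 1;
  for (const chunk of ["a", "b", "c"]) consumer(chunk);
};

// Hypothetical reduce: folds every value the source pushes into an accumulator.
function reduce<T, A>(source: Source<T>, f: (acc: A, value: T) => A, init: A): A {
  let acc = init;
  source((value) => {
    acc = f(acc, value);
  });
  return acc;
}

// Two independent reduces over the same lazy source.
const joined = reduce(fileSource, (acc: string, chunk) => acc + chunk, "");
const count = reduce(fileSource, (acc: number, _chunk) => acc + 1, 0);
console.log(joined, count, opens); // abc 3 2
```

Each consumer gets the complete data, but the underlying source is opened once per reduce.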

If you use hub, multiple consumers can share that source (the file) and only one file descriptor is used. This is like a splitter, or like piping one stream into several destinations in node streams.

The downside of hub is that a consumer who starts reducing later misses the part of the file that has already been read.
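A rough TypeScript sketch of the hub idea, including its downside. The hypothetical openFile function stands in for opening a file (each call counts as one descriptor) and returns a reader that yields one chunk per call. The hypothetical Hub class opens it at most once and forwards each chunk only to the consumers subscribed at that moment; reading is driven by an explicit step() so the timing is visible. This illustrates the concept under those assumptions and is not the library's implementation.

```typescript
// Counts how many times the "file" is opened (one file descriptor per open).
let opens = 0;

// Hypothetical file: each call is one open and returns a reader that yields
// the next chunk per call, then undefined once the file is exhausted.
function openFile(chunks: string[]): () => string | undefined {
  opens += 1;
  let i = 0;
  return () => chunks[i++];
}

// Hypothetical hub: a single shared reader, broadcast to the current consumers.
class Hub<T> {
  private consumers: Array<(value: T) => void> = [];
  private read: (() => T | undefined) | null = null;
  private readonly open: () => () => T | undefined;

  constructor(open: () => () => T | undefined) {
    this.open = open;
  }

  subscribe(consumer: (value: T) => void): void {
    this.consumers.push(consumer);
    // The underlying source is opened lazily, and at most once.
    if (this.read === null) this.read = this.open();
  }

  // Reads one chunk and forwards it only to consumers subscribed right now.
  // Returns false once the source is exhausted (or was never opened).
  step(): boolean {
    if (this.read === null) return false;
    const value = this.read();
    if (value === undefined) return false;
    for (const consumer of this.consumers) consumer(value);
    return true;
  }
}

const hub = new Hub<string>(() => openFile(["a", "b", "c"]));
const early: string[] = [];
const late: string[] = [];

hub.subscribe((chunk) => early.push(chunk));
hub.step(); // "a" reaches only the early consumer
hub.subscribe((chunk) => late.push(chunk));
while (hub.step()) {
  // drain the rest: "b" and "c" reach both consumers
}
console.log(early.join(""), late.join(""), opens); // abc bc 1
```

Only one open happens, but the late consumer sees "bc" instead of "abc": whatever was read before it subscribed is gone.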

An alternative is cache, which caches the results so that if you reduce multiple times it just reads from the cache instead of the source.
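A minimal TypeScript sketch of caching, assuming a push-based model where a source is a function that pushes all of its values into a consumer each time it is called. The names Source, fileSource, cache and the opens counter are hypothetical, not the reducers API. The first consumer triggers a single read that is streamed to it and recorded; every later consumer replays the recorded values without touching the source.

```typescript
// Hypothetical push-based model: calling a Source pushes every value to the consumer.
type Source<T> = (consumer: (value: T) => void) => void;

// Counts how many times the "file" is opened (one file descriptor per open).
let opens = 0;

// Hypothetical stand-in for a file: every call re-opens and re-reads it.
const fileSource: Source<string> = (consumer) => {
  opens += 1;
  for (const chunk of ["a", "b", "c"]) consumer(chunk);
};

// Hypothetical cache: the first consumer triggers one read of the source,
// which is streamed to it and recorded; later consumers replay the record.
function cache<T>(source: Source<T>): Source<T> {
  let stored: T[] | null = null;
  return (consumer) => {
    if (stored !== null) {
      for (const value of stored) consumer(value); // source is not touched again
      return;
    }
    const buffer: T[] = [];
    source((value) => {
      buffer.push(value); // remember it for later consumers
      consumer(value); // and pass it on as it arrives
    });
    stored = buffer;
  };
}

const cached = cache(fileSource);
const first: string[] = [];
const second: string[] = [];
cached((chunk) => first.push(chunk));
cached((chunk) => second.push(chunk)); // a later consumer still sees everything
console.log(first.join(""), second.join(""), opens); // abc abc 1
```

The trade-off is memory: the cache keeps every value around so it can be replayed, whereas hub keeps nothing. This sketch also assumes consumers do not overlap; a consumer that arrived while the first read was still in progress would trigger a second read.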
