The map/reduce examples have clear boundaries between startup, reading data, processing data, and writing it out to disk. The process lifetime doesn't extend beyond those boundaries, so every stage pays the cost of disk I/O again on the next run.
Similar to Apache Spark, keeping data in memory avoids that disk access, which can significantly improve performance. It is important to realize that the processing boundaries don't have to change just because the data lives in memory instead of on disk. The only difference is that at the moment where the mapper (for example) would write a partition to disk and exit, it instead stays around to wait for queries to be executed against the data in its partitions.
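A minimal sketch of what a resident mapper could look like in Go (all names here are hypothetical, not from any existing framework): instead of writing its partition to disk and exiting, the process keeps the partition in memory and answers queries over HTTP.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// Partition holds the mapper's output in memory instead of on disk.
// In a real mapper this would be populated by the map phase.
type Partition struct {
	rows []int64
}

// Sum is one example query executed against the in-memory partition.
func (p *Partition) Sum() int64 {
	var total int64
	for _, r := range p.rows {
		total += r
	}
	return total
}

func main() {
	// Hypothetical map-phase output; normally produced from input splits.
	part := &Partition{rows: []int64{3, 1, 4, 1, 5, 9}}

	// Instead of exiting after the map phase, the process stays around
	// and serves queries against its partition.
	http.HandleFunc("/sum", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, part.Sum())
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```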
What's left is to figure out how to express the functions to be executed against the data (which may be in any format) in a consistent way. Most of them are aggregation functions (a sketch of a uniform interface follows the list):
- sum
- group?
- etc.
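One way to express these consistently is a small interface that every aggregation implements; this is a sketch under the assumption of numeric rows (the `Aggregation` interface and the types below are illustrative, not an existing API):

```go
package main

import "fmt"

// Aggregation is a hypothetical uniform shape for functions executed
// against an in-memory partition, regardless of the data's format.
type Aggregation interface {
	Add(value int64) // fold one value into the running result
	Result() int64   // final value once the partition is exhausted
}

// Sum implements Aggregation by keeping a running total.
type Sum struct{ total int64 }

func (s *Sum) Add(v int64)   { s.total += v }
func (s *Sum) Result() int64 { return s.total }

// Count is a second aggregation, showing the interface stays generic.
type Count struct{ n int64 }

func (c *Count) Add(int64)     { c.n++ }
func (c *Count) Result() int64 { return c.n }

// run applies any Aggregation to a partition's rows.
func run(agg Aggregation, rows []int64) int64 {
	for _, r := range rows {
		agg.Add(r)
	}
	return agg.Result()
}

func main() {
	rows := []int64{3, 1, 4, 1, 5, 9}
	fmt.Println(run(&Sum{}, rows), run(&Count{}, rows)) // 23 6
}
```

A group-by would fit the same shape with a keyed result type instead of a single value.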
Joins are a lot harder to achieve. Maybe the mapper/reducer process itself can implement specific functions that dictate how a join is performed, so that the framework doesn't become overly generic and hard to read.
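For instance, a process that owns both partitions could implement the join directly as a classic hash join, rather than relying on a generic framework mechanism. A sketch, assuming keyed string records (`Row` and `JoinByKey` are made up for illustration):

```go
package main

import "fmt"

// Row is a hypothetical keyed record held in a mapper's partition.
type Row struct {
	Key   string
	Value string
}

// JoinByKey builds a hash table over one side and probes it with the
// other, pairing up values that share a key (a simple hash join).
func JoinByKey(left, right []Row) [][2]string {
	index := make(map[string][]string, len(left))
	for _, l := range left {
		index[l.Key] = append(index[l.Key], l.Value)
	}
	var out [][2]string
	for _, r := range right {
		for _, lv := range index[r.Key] {
			out = append(out, [2]string{lv, r.Value})
		}
	}
	return out
}

func main() {
	users := []Row{{"u1", "alice"}, {"u2", "bob"}}
	orders := []Row{{"u1", "book"}, {"u1", "pen"}, {"u3", "mug"}}
	fmt.Println(JoinByKey(users, orders)) // [[alice book] [alice pen]]
}
```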