Read the input file in parallel #189
Comments
Would it be faster to not open multiple file handles, given that no memory mapping is used? What would it even look like?
Right now, memory mapping is supported by the library implicitly, by allowing the user to pass any reader.
I don't think so. If you have a single file descriptor for everything, that means you cannot issue parallel reads. If nothing else, there is only one offset value for a given file descriptor, so you cannot issue reads from several places at once. Beyond that, a file descriptor is not safe to access in parallel; you'd need a mutex, which defeats the point - see https://stackoverflow.com/a/823525. We'll just need to open several file descriptors and seek each of them independently.
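A minimal sketch of what that could look like, assuming the chunk offsets and sizes are already known from the file's chunk table (the `ChunkLocation` type and the one-thread-per-chunk spawning are placeholders; a real implementation would use a worker pool):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::thread;

/// Placeholder: where one compressed block lives in the file.
struct ChunkLocation {
    offset: u64,
    size: usize,
}

fn read_chunks_in_parallel(path: &str, chunks: &[ChunkLocation]) -> std::io::Result<Vec<Vec<u8>>> {
    thread::scope(|scope| {
        let workers: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                scope.spawn(move || -> std::io::Result<Vec<u8>> {
                    // Every worker opens its own descriptor for the same path,
                    // so each one has its own independent seek position.
                    let mut file = File::open(path)?;
                    file.seek(SeekFrom::Start(chunk.offset))?;
                    let mut buffer = vec![0u8; chunk.size];
                    file.read_exact(&mut buffer)?;
                    Ok(buffer)
                })
            })
            .collect();
        workers.into_iter().map(|w| w.join().unwrap()).collect()
    })
}
```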
I thought so too. I just thought it sounded like there was an alternative to opening multiple handles. Let's try that then :)
Yes, but it would be horrifically unsafe. Memory-mapped data changes in memory whenever the file is changed. That means that if you create an immutable slice over the mapped region, the bytes behind that reference can still change underneath you.
Okay, but what if you use the mapped memory only through the interface of the memory-mapping crate?
Rust just doesn't allow creating a safe slice that points into a memory map. You can take a raw pointer to the memory map and read through it with unsafe code. So to actually take advantage of mmap, you'd need to adapt basically all of your code to use raw pointers instead of slices.
Thanks for that explanation :) good to know.
I'm wondering how this would happen. From my perusing of the docs, it seems the crate wraps a BufReader from which it gets its bytes; those don't implement Clone, the same one can't be sent to multiple threads, and creating multiple buffered readers over the same underlying reader is explicitly mentioned in the docs as causing data loss.
When reading from a file, you'll need to open multiple file handles. For reading from an in-memory buffer, you can just send the same slice to every thread. AFAIK there's no way to implement this for a generic buffered reader. There is, for example, no reasonable way to read a stream from the network in parallel, let alone seek the network stream. So if that's a use case you'd like to support, reading needs to stay single-threaded in that case.
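For the in-memory case, "send the slice" could look roughly like this (the chunk boundaries and the `decompress` stand-in are made up for illustration):

```rust
use std::thread;

/// Decode every chunk of an in-memory file in parallel by handing each
/// worker a subslice of the same shared buffer. `bounds` are (offset, length)
/// pairs that would come from the file's chunk table.
fn decode_chunks(bytes: &[u8], bounds: &[(usize, usize)]) -> Vec<Vec<u8>> {
    thread::scope(|scope| {
        let workers: Vec<_> = bounds
            .iter()
            .map(|&(offset, len)| {
                let chunk = &bytes[offset..offset + len];
                // A shared `&[u8]` is Sync, so scoped threads can read it freely.
                scope.spawn(move || decompress(chunk))
            })
            .collect();
        workers.into_iter().map(|w| w.join().unwrap()).collect()
    })
}

/// Stand-in for the real per-chunk decompression.
fn decompress(chunk: &[u8]) -> Vec<u8> {
    chunk.to_vec()
}
```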
The metadata should be read with buffered readers, since a lot of tiny read requests are made. The pixel data, which will be read in parallel, will not benefit from any buffering.
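A small sketch of that split, assuming the same file is simply opened twice (the path handling and function name are illustrative only):

```rust
use std::fs::File;
use std::io::{self, BufReader};

/// Open the same file twice: a buffered handle for the many tiny metadata
/// reads, and an unbuffered one for the large per-chunk pixel reads.
fn open_readers(path: &str) -> io::Result<(BufReader<File>, File)> {
    let metadata_reader = BufReader::new(File::open(path)?);
    // Each pixel chunk is read in one large request, so an extra copy
    // through a buffer gains nothing.
    let pixel_reader = File::open(path)?;
    Ok((metadata_reader, pixel_reader))
}
```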
The cloning of the byte source will have to be a custom abstraction. Alternatively, we abstract over the ability to create a new reader from scratch, by remembering the path of the file and opening it again. Those individual readers can then be buffered where it makes sense.
And what about data passed as a slice? Won't that remove parallelism from it? Ideally what you want is a trait. For a file, it creates a new file handle (we can't use clone, because a seek on one handle would affect all the other instances). For data behind a cursor, it creates a new cursor with the same referenced data and returns it. The trait should also provide independent seeking, i.e. each independent clone should allow seeking without affecting the others. I'm not sure if this is possible; I've never really done it, I'm just describing what I think may work.
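A sketch of such a trait, under the assumptions above (the name `IndependentReads` and both implementations are hypothetical, not part of the crate):

```rust
use std::fs::File;
use std::io::{self, Cursor, Read, Seek};
use std::path::PathBuf;

/// Hypothetical abstraction: a byte source that can hand out any number of
/// readers, each with its own independent seek position.
trait IndependentReads {
    type Reader: Read + Seek;
    fn independent_reader(&self) -> io::Result<Self::Reader>;
}

/// For files, remember the path and open a fresh handle every time.
/// (`File::try_clone` would not help here: the clones share one cursor.)
struct FileSource {
    path: PathBuf,
}

impl IndependentReads for FileSource {
    type Reader = File;
    fn independent_reader(&self) -> io::Result<File> {
        File::open(&self.path)
    }
}

/// For in-memory data, every reader is just a new cursor over the same bytes.
struct SliceSource<'a> {
    bytes: &'a [u8],
}

impl<'a> IndependentReads for SliceSource<'a> {
    type Reader = Cursor<&'a [u8]>;
    fn independent_reader(&self) -> io::Result<Self::Reader> {
        Ok(Cursor::new(self.bytes))
    }
}
```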
Yes! I'm pretty sure that calling open again on the same path gives us exactly that.
What can be improved or is missing?
The OpenEXR format allows the file to be read in parallel. That alone is considerably faster than a single-threaded read; see #183 (comment)
It will also let us initialize buffers in parallel (until #186 becomes feasible) and allows performing pixel format conversions in worker threads more naturally (see #182) - AFAIK doing that with the current architecture would require a memcpy().
Implementation Approach
Memory-mapping is tempting, but it is going to result in lots of very spooky UB the moment someone modifies the file we're reading. So we'll probably have to open a file descriptor per thread and seek it.