Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadFile: semantics of bytesRead #352

Open
lostmsu opened this issue Mar 19, 2024 · 27 comments
Open

ReadFile: semantics of bytesRead #352

lostmsu opened this issue Mar 19, 2024 · 27 comments

Comments

@lostmsu
Copy link
Contributor

lostmsu commented Mar 19, 2024

Just spent a couple of hours trying to figure out an issue with my FS.

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

I was calling .NET Stream.Read with the buffer, and my filesystem would return garbage to some programs. The reason is Stream.Read by its contract does not have to fill your entire buffer. For example, when reading from a TCP connection if you specify 1MiB buffer and 1KiB packet arrives, NetworkStream will fill 1KiB and return 1024 even though the connection is still open and it will be possible to read more data in the future.

Now I don't know what the kernel APIs expect in this case, but IMHO one of the following should be done:

  • documenting this behavior
  • if this behavior is the result of a bug (perhaps the value returned in bytesRead is not used everywhere it should be), fixing that bug
  • considering changing .NET wrapper code to repeat ReadFile calls until either buffer is filled or end of file reached (this would remove the need for documenting the discrepancy)
@Liryna
Copy link
Member

Liryna commented Mar 19, 2024

Hi @lostmsu ,

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

Dokan does use bytesRead to inform the system how much was actually read. It does not expect that the buffer is always fill.
That said, the calling application must handle the bytesRead it will receive and reissue a read to get the next data.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 19, 2024 via email

@LTRData
Copy link
Contributor

LTRData commented Mar 19, 2024

I assume that you have seen this line:
https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet%2FDokanOperationProxy.cs#L411

And yes, it looks like it copies the whole buffer. But the value of rawReadLegth is reported back to the OS because it is a ref parameter, so this should never cause any issues. Have you checked and made absolutely sure that you return the correct read length in the bytesRead output parameter?

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 19, 2024

Reasonably sure. This is the end of the method before the fix

bytesRead = stream.Read(buffer, 0, buffer.Length);
return DokanResult.Success;

@LTRData
Copy link
Contributor

LTRData commented Mar 19, 2024

Okay. I think it would be interesting in your case to just try and change the Marshal.Copy code in the library to use the reported length instead of initial length and see if that makes any change in this case.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 29, 2024

So I tried the suggested change, and dumpbin is still misbehaving when I read less than requested. Could still be bug elsewhere in dokan, but maybe it is a bug in dumpbin, where they unreasonably expect to read the exact amount requested.

This is what I put into DokanOperationProxy:

if (rawReadLength < 0)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be greater than or equal to zero.");
if (rawReadLength > rawBufferLength)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be less than or equal to the length of the buffer.");
Marshal.Copy(buffer, 0, rawBuffer, rawReadLength);

This is what I put into Mirror.ReadFile (+ further down replaced all buffer.Length with read):

int read = Math.Min(buffer.Length, 128);

I copied apphost.exe to the mirrored folder and tried to dumpbin /dependents mirror\apphost.exe which gave

LINK : warning LNK4094: 'apphost.exe' is an MS-DOS executable; use EXEHDR to dump it

When I replace with read = buffer.Length it works correctly.

The sad part is that if Microsoft tool is misbehaving like that, how many other apps silently rely on a similar behavior? I guess I will have to ensure full reads now in all file system implementations for compatibility purposes. 😭

@LTRData
Copy link
Contributor

LTRData commented Mar 29, 2024

Thanks for testing! Based on your results, it seems like something we probably need to address somehow. I'll take a closer look at this next week when I am back in office.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 29, 2024

BTW, if you are considering forcing the compatibility thing at Dokan level, I think simply repeating reads until buffer is filled is going to break potential scenarios where FS is backed by something like sockets.

@Liryna
Copy link
Member

Liryna commented Mar 30, 2024

I have confirmed that bytesRead behave as expected. The caller will receive the value set by the FS implementation.
The only thing I saw is that the library did not use this value to copy the data to the destination buffer 2f15c8a Not sure if this solves dumpbin case but worth a try.

Procmon might also be useful to see what is going on with dumpbin.

@lostmsu Are you able to reproduce the issue with the C# Mirror ? C Mirror / Memfs ? It would help find out where the issue is

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

But that is not a correct implementation. If fewer than requested bytes are returned for a file by the file system, it means that the read reached end of file. It is not like reading from devices such as communication channels. So, effectively, dumpbin will obviously treat the file as just 128 bytes long if you do this (which means it will only get the MZ header and not the PE header and assume it is a DOS exe etc etc).

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

It is a bit of an issue when an implementation actually wants to return fewer than requested bytes. That should be perfectly fine to do, to indicate that only that much data was available towards the end of the file.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

@LTRData the guards are against negative values and values that are higher than requested.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

From MSDN it sounds like @LTRData is correct image

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached just to prevent accidental incorrect implementations that follow .NET Stream behavior. But there should be a way for a file system implementation to opt out of this behavior somehow.

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

From MSDN it sounds like @LTRData is correct image

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached, but there's should be a way for an FS implementation to opt out of this behavior somehow.

Sorry but I do not agree. That would limit possibilities for implementations that actually want to return fewer bytes read to indicate reached end of file and to avoid further unnecessary reads. Dokan should not send another unnecessary read request to the implementation in that case.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

Yes, but I think it should be the other way around in that case. There should be some kind of compatiblity layer to implement if this behavior with subsequent reads are needed by the implementation. Because this is not how file system implementations normally handle requests.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

The problem with the "other way around" is that is just calls for bugs. BTW, the whole conversation should also apply to dokan native repository, because in C standard library read call has the same semantics as .NET Stream.Read - the only requirement is that 0 means end of file.

For instance, I have not looked at SSHFS implementation, but there's a good chance that it exhibits the same issue because it reads from the network, and network does not have the semantics used in Windows ReadFile call for files.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

@Liryna
Copy link
Member

Liryna commented Mar 30, 2024

Agree there is no good reason for the library to retry reads. It is unable to know why the file system returned this value. Retry could be bad. Even if it is an option, maybe one read case returned less bytes for a good reason and a retry could make it worse.

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.

No. That is not an issue. For regular files, .NET Stream reads and C library reads always return the number of requested bytes except towards end of file. The concept of returning fewer bytes and reads should be retried only applies to things like communication devices, network sockets and similar.

Edit: to be more specific, the read that you linked to in sshfs is not a read from a network socket in a way that would be an issue here. And if it was, it would have been really bad and practically would never had worked.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns
Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

@LTRData
Copy link
Contributor

LTRData commented Mar 30, 2024

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns
Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

It is never guaranteed as a general concept for everything that could be read from. But for regular files it definitely is, on all platforms and frameworks. There are a huge number of things in many places that would break otherwise.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

I'm sorry, but what you say just doesn't match the documentation that I quoted.

Unlike the ReadFile documentation I quoted above, FileStream.Read doesn't make any guarantees like this. And same applies for read in C spec.

@lostmsu
Copy link
Contributor Author

lostmsu commented Mar 30, 2024

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

@LTRData
Copy link
Contributor

LTRData commented Mar 31, 2024

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

Even Win32 ReadFile frequently returns fewer bytes in cases like network sockets, serial ports etc (the documentation for ReadFile that you quoted is wrong/incomplete in this way).

But this does not happen for regular files.

In any case, it is something that you need to take into account if you are reading from sources that could behave like this. But the code in sshfs that you linked to is not an example of such scenario because there they read from a steam buffer that is known to the implementation and the exact behavior is well defined. You can compare that to when they read from sockets and similar, which is entirely different in this aspect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@lostmsu @Liryna @LTRData and others