`ReadFile`: semantics of `bytesRead` #352

lostmsu · 2024-03-19T00:06:43Z

Just spent a couple of hours trying to figure out an issue with my FS.

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

I was calling .NET Stream.Read with the buffer, and my filesystem would return garbage to some programs. The reason is Stream.Read by its contract does not have to fill your entire buffer. For example, when reading from a TCP connection if you specify 1MiB buffer and 1KiB packet arrives, NetworkStream will fill 1KiB and return 1024 even though the connection is still open and it will be possible to read more data in the future.

Now I don't know what the kernel APIs expect in this case, but IMHO one of the following should be done:

documenting this behavior
if this behavior is the result of a bug (perhaps the value returned in bytesRead is not used everywhere it should be), fixing that bug
considering changing .NET wrapper code to repeat ReadFile calls until either buffer is filled or end of file reached (this would remove the need for documenting the discrepancy)

The text was updated successfully, but these errors were encountered:

Liryna · 2024-03-19T02:18:21Z

Hi @lostmsu ,

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

Dokan does use bytesRead to inform the system how much was actually read. It does not expect that the buffer is always fill.
That said, the calling application must handle the bytesRead it will receive and reissue a read to get the next data.

lostmsu · 2024-03-19T03:51:30Z

The scenario I worked with is `dumpbin` from C++ build tools, and it failed with my FS until I did fill the buffer (but didn't fail on a raw file). There might be a bug in .NET wrapper. In the decompilation it looked like it copies the whole buffer regardless of `bytesRead` for some reason, which is suspect. Perhaps some programs rely on only `bytesRead` to be overwritten. E.g. if they read into a circular buffer.

LTRData · 2024-03-19T06:20:53Z

I assume that you have seen this line:
https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet%2FDokanOperationProxy.cs#L411

And yes, it looks like it copies the whole buffer. But the value of rawReadLegth is reported back to the OS because it is a ref parameter, so this should never cause any issues. Have you checked and made absolutely sure that you return the correct read length in the bytesRead output parameter?

lostmsu · 2024-03-19T06:37:03Z

Reasonably sure. This is the end of the method before the fix

bytesRead = stream.Read(buffer, 0, buffer.Length);
return DokanResult.Success;

LTRData · 2024-03-19T06:44:03Z

Okay. I think it would be interesting in your case to just try and change the Marshal.Copy code in the library to use the reported length instead of initial length and see if that makes any change in this case.

lostmsu · 2024-03-29T21:49:40Z

So I tried the suggested change, and dumpbin is still misbehaving when I read less than requested. Could still be bug elsewhere in dokan, but maybe it is a bug in dumpbin, where they unreasonably expect to read the exact amount requested.

This is what I put into DokanOperationProxy:

if (rawReadLength < 0)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be greater than or equal to zero.");
if (rawReadLength > rawBufferLength)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be less than or equal to the length of the buffer.");
Marshal.Copy(buffer, 0, rawBuffer, rawReadLength);

This is what I put into Mirror.ReadFile (+ further down replaced all buffer.Length with read):

int read = Math.Min(buffer.Length, 128);

I copied apphost.exe to the mirrored folder and tried to dumpbin /dependents mirror\apphost.exe which gave

LINK : warning LNK4094: 'apphost.exe' is an MS-DOS executable; use EXEHDR to dump it

When I replace with read = buffer.Length it works correctly.

The sad part is that if Microsoft tool is misbehaving like that, how many other apps silently rely on a similar behavior? I guess I will have to ensure full reads now in all file system implementations for compatibility purposes. 😭

LTRData · 2024-03-29T22:44:19Z

Thanks for testing! Based on your results, it seems like something we probably need to address somehow. I'll take a closer look at this next week when I am back in office.

lostmsu · 2024-03-29T23:26:34Z

BTW, if you are considering forcing the compatibility thing at Dokan level, I think simply repeating reads until buffer is filled is going to break potential scenarios where FS is backed by something like sockets.

…ffer #352

Liryna · 2024-03-30T02:43:55Z

I have confirmed that bytesRead behave as expected. The caller will receive the value set by the FS implementation.
The only thing I saw is that the library did not use this value to copy the data to the destination buffer 2f15c8a Not sure if this solves dumpbin case but worth a try.

Procmon might also be useful to see what is going on with dumpbin.

@lostmsu Are you able to reproduce the issue with the C# Mirror ? C Mirror / Memfs ? It would help find out where the issue is

lostmsu · 2024-03-30T20:30:36Z

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

lostmsu · 2024-03-30T20:32:19Z

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

LTRData · 2024-03-30T20:35:55Z

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

But that is not a correct implementation. If fewer than requested bytes are returned for a file by the file system, it means that the read reached end of file. It is not like reading from devices such as communication channels. So, effectively, dumpbin will obviously treat the file as just 128 bytes long if you do this (which means it will only get the MZ header and not the PE header and assume it is a DOS exe etc etc).

LTRData · 2024-03-30T20:39:06Z

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

It is a bit of an issue when an implementation actually wants to return fewer than requested bytes. That should be perfectly fine to do, to indicate that only that much data was available towards the end of the file.

lostmsu · 2024-03-30T20:42:30Z

@LTRData the guards are against negative values and values that are higher than requested.

lostmsu · 2024-03-30T20:44:55Z

From MSDN it sounds like @LTRData is correct

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached just to prevent accidental incorrect implementations that follow .NET Stream behavior. But there should be a way for a file system implementation to opt out of this behavior somehow.

LTRData · 2024-03-30T20:54:43Z

From MSDN it sounds like @LTRData is correct

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached, but there's should be a way for an FS implementation to opt out of this behavior somehow.

Sorry but I do not agree. That would limit possibilities for implementations that actually want to return fewer bytes read to indicate reached end of file and to avoid further unnecessary reads. Dokan should not send another unnecessary read request to the implementation in that case.

lostmsu · 2024-03-30T20:56:00Z

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

LTRData · 2024-03-30T20:58:45Z

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

Yes, but I think it should be the other way around in that case. There should be some kind of compatiblity layer to implement if this behavior with subsequent reads are needed by the implementation. Because this is not how file system implementations normally handle requests.

lostmsu · 2024-03-30T21:02:07Z

The problem with the "other way around" is that is just calls for bugs. BTW, the whole conversation should also apply to dokan native repository, because in C standard library read call has the same semantics as .NET Stream.Read - the only requirement is that 0 means end of file.

For instance, I have not looked at SSHFS implementation, but there's a good chance that it exhibits the same issue because it reads from the network, and network does not have the semantics used in Windows ReadFile call for files.

lostmsu · 2024-03-30T21:03:15Z

And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.

Liryna · 2024-03-30T21:03:28Z

Agree there is no good reason for the library to retry reads. It is unable to know why the file system returned this value. Retry could be bad. Even if it is an option, maybe one read case returned less bytes for a good reason and a retry could make it worse.

LTRData · 2024-03-30T21:15:48Z

And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.

No. That is not an issue. For regular files, .NET Stream reads and C library reads always return the number of requested bytes except towards end of file. The concept of returning fewer bytes and reads should be retried only applies to things like communication devices, network sockets and similar.

Edit: to be more specific, the read that you linked to in sshfs is not a read from a network socket in a way that would be an issue here. And if it was, it would have been really bad and practically would never had worked.

lostmsu · 2024-03-30T21:41:58Z

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns
Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

LTRData · 2024-03-30T21:52:09Z

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns
Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

It is never guaranteed as a general concept for everything that could be read from. But for regular files it definitely is, on all platforms and frameworks. There are a huge number of things in many places that would break otherwise.

lostmsu · 2024-03-30T21:54:16Z

I'm sorry, but what you say just doesn't match the documentation that I quoted.

Unlike the ReadFile documentation I quoted above, FileStream.Read doesn't make any guarantees like this. And same applies for read in C spec.

lostmsu · 2024-03-30T22:03:57Z

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

LTRData · 2024-03-31T09:51:35Z

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

Even Win32 ReadFile frequently returns fewer bytes in cases like network sockets, serial ports etc (the documentation for ReadFile that you quoted is wrong/incomplete in this way).

But this does not happen for regular files.

In any case, it is something that you need to take into account if you are reading from sources that could behave like this. But the code in sshfs that you linked to is not an example of such scenario because there they read from a steam buffer that is known to the implementation and the exact behavior is well defined. You can compare that to when they read from sockets and similar, which is entirely different in this aspect.

Liryna added a commit that referenced this issue Mar 30, 2024

Library - Read - Only copy the actual read data to the destination bu…

2f15c8a

…ffer #352

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`ReadFile`: semantics of `bytesRead` #352

`ReadFile`: semantics of `bytesRead` #352

lostmsu commented Mar 19, 2024 •

edited

Loading

Liryna commented Mar 19, 2024

lostmsu commented Mar 19, 2024 via email •

edited

Loading

LTRData commented Mar 19, 2024

lostmsu commented Mar 19, 2024

LTRData commented Mar 19, 2024

lostmsu commented Mar 29, 2024

LTRData commented Mar 29, 2024

lostmsu commented Mar 29, 2024

Liryna commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024 •

edited

Loading

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024 •

edited

Loading

Liryna commented Mar 30, 2024 •

edited

Loading

LTRData commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024

LTRData commented Mar 31, 2024

ReadFile: semantics of bytesRead #352

ReadFile: semantics of bytesRead #352

Comments

lostmsu commented Mar 19, 2024 • edited Loading

Liryna commented Mar 19, 2024

lostmsu commented Mar 19, 2024 via email • edited Loading

LTRData commented Mar 19, 2024

lostmsu commented Mar 19, 2024

LTRData commented Mar 19, 2024

lostmsu commented Mar 29, 2024

LTRData commented Mar 29, 2024

lostmsu commented Mar 29, 2024

Liryna commented Mar 30, 2024 • edited Loading

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024 • edited Loading

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024 • edited Loading

lostmsu commented Mar 30, 2024

lostmsu commented Mar 30, 2024 • edited Loading

Liryna commented Mar 30, 2024 • edited Loading

LTRData commented Mar 30, 2024 • edited Loading

lostmsu commented Mar 30, 2024

LTRData commented Mar 30, 2024

lostmsu commented Mar 30, 2024 • edited Loading

lostmsu commented Mar 30, 2024

LTRData commented Mar 31, 2024

`ReadFile`: semantics of `bytesRead` #352

`ReadFile`: semantics of `bytesRead` #352

lostmsu commented Mar 19, 2024 •

edited

Loading

lostmsu commented Mar 19, 2024 via email •

edited

Loading

Liryna commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024 •

edited

Loading

LTRData commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024 •

edited

Loading

Liryna commented Mar 30, 2024 •

edited

Loading

LTRData commented Mar 30, 2024 •

edited

Loading

lostmsu commented Mar 30, 2024 •

edited

Loading