Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad assumption on read() results #8

Open
ayende opened this issue Jan 20, 2022 · 5 comments
Open

Bad assumption on read() results #8

ayende opened this issue Jan 20, 2022 · 5 comments

Comments

@ayende
Copy link

ayende commented Jan 20, 2022

In https://github.com/skerkour/kerkour.com/blob/main/2022/rust_file_encryption_with_password/src/main.rs#L75 (and in many other places), you are assumption that the only scenario where you'll get less than the number of requested bytes is when you are at the end of the file.
This is wrong.

See the docs here:

https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read

In particular:

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.

There are many reasons why this may be the case.

A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet. The file system will return what it has right now. That can lead to really hard to figure out bugs.

@sylvain101010
Copy link
Member

sylvain101010 commented Jan 20, 2022

Hey @ayende,

It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.

Indeed i'm aware of this warning but in my experience when reading a local file, read never returned less than BUFF_LEN,

Because the docs also say:

While for File, it is possible to reach the end of file and get zero as result

This function does not provide any guarantees about whether it blocks waiting for data, but if an object needs to block for a read and cannot, it will typically signal this via an Err return value.

So I believe it should be safe to assume that when reading a local file read will prefer to block rather than return a misleading result.

But, you also state:

A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet.

I'm learning this today, Thank you very much. But I also find that very surprising!

Do you have a recommendation to fix this potential issue? Use a BufReader?

@ayende
Copy link
Author

ayende commented Jan 21, 2022

You need to keep reading until read() returns a zero, that is the only scenario where you can be sure that the file is done.

Something like this:

 loop {
        let read_count = source_file.read(&mut buffer)?;

        if read_count == 0{
               break;
        } else {
            let ciphertext = stream_encryptor
                .encrypt_last(&buffer[..read_count])
                .map_err(|err| anyhow!("Encrypting large file: {}", err))?;
            dist_file.write(&ciphertext)?;
        }
    }

That work for all scenarios.

@ayende
Copy link
Author

ayende commented Jan 21, 2022

I noticed that you have encrypt_next vs. encrypt_last, you should probably just call encrypt_next always, and pass encrypt_last with empty buffer.

@sylvain101010
Copy link
Member

sylvain101010 commented Jan 21, 2022

Thnaks!
The problem is that it would make the encryption/decryption way more complex: we need to operate on the same chunks when decrypting than when encrypting.

Rust's std File.read uses directly the read of the OS's libc (https://doc.rust-lang.org/src/std/fs.rs.html#618, https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/fs.rs#L852).

Where can I find information about the behavior you have described above (that File.read can return less than the requested number of bytes for a local file, so far I can only find information stating that read can return less than the requested number of bytes when reaching end of file)?

@ayende
Copy link
Author

ayende commented Jan 23, 2022

If you need to make the distinction between next & last, read to the buffer until it is full, even if you get partial read.
And only stop if you get 0 back.

For reference, see:
It is not an error if this number is smaller than the number of
bytes requested

https://man7.org/linux/man-pages/man2/read.2.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants