Skip to content

Commit a4527e1

Browse files
fdmananakdave
authored andcommitted
btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents
When doing a direct IO read or write, we always return -ENOTBLK when we find a compressed extent (or an inline extent) so that we fallback to buffered IO. This however is not ideal in case we are in a NOWAIT context (io_uring for example), because buffered IO can block and we currently have no support for NOWAIT semantics for buffered IO, so if we need to fallback to buffered IO we should first signal the caller that we may need to block by returning -EAGAIN instead. This behaviour can also result in short reads being returned to user space, which although it's not incorrect and user space should be able to deal with partial reads, it's somewhat surprising and even some popular applications like QEMU (Link tag #1) and MariaDB (Link tag #2) don't deal with short reads properly (or at all). The short read case happens when we try to read from a range that has a non-compressed and non-inline extent followed by a compressed extent. After having read the first extent, when we find the compressed extent we return -ENOTBLK from btrfs_dio_iomap_begin(), which results in iomap to treat the request as a short read, returning 0 (success) and waiting for previously submitted bios to complete (this happens at fs/iomap/direct-io.c:__iomap_dio_rw()). After that, and while at btrfs_file_read_iter(), we call filemap_read() to use buffered IO to read the remaining data, and pass it the number of bytes we were able to read with direct IO. Than at filemap_read() if we get a page fault error when accessing the read buffer, we return a partial read instead of an -EFAULT error, because the number of bytes previously read is greater than zero. So fix this by returning -EAGAIN for NOWAIT direct IO when we find a compressed or an inline extent. Reported-by: Dominique MARTINET <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://jira.mariadb.org/browse/MDEV-27900?focusedCommentId=216582&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-216582 Tested-by: Dominique MARTINET <[email protected]> CC: [email protected] # 5.10+ Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 037e127 commit a4527e1

File tree

1 file changed

+13
-1
lines changed

1 file changed

+13
-1
lines changed

fs/btrfs/inode.c

+13-1
Original file line numberDiff line numberDiff line change
@@ -7681,7 +7681,19 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
76817681
if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) ||
76827682
em->block_start == EXTENT_MAP_INLINE) {
76837683
free_extent_map(em);
7684-
ret = -ENOTBLK;
7684+
/*
7685+
* If we are in a NOWAIT context, return -EAGAIN in order to
7686+
* fallback to buffered IO. This is not only because we can
7687+
* block with buffered IO (no support for NOWAIT semantics at
7688+
* the moment) but also to avoid returning short reads to user
7689+
* space - this happens if we were able to read some data from
7690+
* previous non-compressed extents and then when we fallback to
7691+
* buffered IO, at btrfs_file_read_iter() by calling
7692+
* filemap_read(), we fail to fault in pages for the read buffer,
7693+
* in which case filemap_read() returns a short read (the number
7694+
* of bytes previously read is > 0, so it does not return -EFAULT).
7695+
*/
7696+
ret = (flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOTBLK;
76857697
goto unlock_err;
76867698
}
76877699

0 commit comments

Comments
 (0)