
@niebowen666 niebowen666 commented Dec 18, 2025

ext4file introduction

An eBPF tool that traces reads, writes, and deletions of every file on an ext4 file system.

What it is for

ext4file monitors the I/O pattern (buffered or direct) of each file in the target ext4 file system. Because different files can share the same name, the output also includes each file's inode number, the inode number of its parent directory, and whether the file has been deleted.
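A minimal C sketch of why this metadata matters, assuming a simple user-space table (struct file_entry, get_entry() and mark_deleted() are hypothetical names, not ext4file's actual code). Entries are keyed by inode number rather than by name, and an entry marked deleted is never reused, so a recycled inode number gets its own row. The sample output later in this description lists inode 33 for both a deleted test1 and a live test3, which is the situation this handles.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_FILES 1024  /* hypothetical capacity for this sketch */
#define NAME_LEN  32    /* hypothetical name buffer size */

/* Hypothetical per-file record. Identity is the inode number plus the
 * "deleted" flag, not the file name, which may repeat across directories. */
struct file_entry {
    uint64_t ino;          /* inode number of the file */
    uint64_t pa_ino;       /* inode number of the parent directory */
    char name[NAME_LEN];   /* last path component only */
    bool deleted;          /* set once the file is unlinked */
};

static struct file_entry table[MAX_FILES];
static size_t nfiles;

/* Return the live (not deleted) entry for @ino, creating one if needed.
 * Deleted entries are skipped, so a reused inode number gets a new row.
 * Returns NULL when the table is full. */
static struct file_entry *get_entry(uint64_t ino, uint64_t pa_ino,
                                    const char *name)
{
    for (size_t i = 0; i < nfiles; i++)
        if (table[i].ino == ino && !table[i].deleted)
            return &table[i];
    if (nfiles == MAX_FILES)
        return NULL;
    struct file_entry *e = &table[nfiles++];
    memset(e, 0, sizeof(*e));
    e->ino = ino;
    e->pa_ino = pa_ino;
    strncpy(e->name, name, NAME_LEN - 1);  /* memset above keeps it NUL-terminated */
    return e;
}

/* Mark the live entry for @ino as deleted; it stays in the table for reporting. */
static void mark_deleted(uint64_t ino)
{
    for (size_t i = 0; i < nfiles; i++)
        if (table[i].ino == ino && !table[i].deleted)
            table[i].deleted = true;
}
```

With this layout, two files named test3 in different directories never collide, and a file deleted and replaced under the same inode number remains visible as a separate, deleted row.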

Why ext4file

The repository already contains block-layer tools that monitor the I/O patterns of a whole disk (such as biopattern and biolatency) and VFS-layer tools that trace file lifecycle and I/O behavior across the entire VFS (such as filelife and vfsstat).
However, none of these tools track I/O at the granularity of individual files.

  • biopattern only reports the proportion of random versus sequential I/O on a whole disk device (other block-layer tools share this limitation: they cannot attribute I/O to individual files).
  • vfsstat counts file creation, deletion, reads, writes, and other operations across the whole VFS, without distinguishing between individual files.
  • filelife ignores I/O and only tracks file creation and deletion.
  • To limit output, filetop displays only a subset of the data, and it cannot tell whether a file's I/O is buffered or direct.

How to use ext4file

Start ext4file before running your workload. Run ./ext4file -h to see the usage:

Show I/O pattern of every ext4 file.

Usage: ./ext4file [-h, --help] [-d <dir>, --dir=<dir>] [-o <file>, --output=<file>] [interval] [count]

Options:
  -d, --dir=<dir>              Trace the file system (device) mounted on this dir
  -o, --output=<file>          Write output to the given file (optional)
  -h, --help                   Show this help
  interval                     Specify the amount of time in seconds between each report
  count                        Limit the number of reports (default: unlimited)

Examples:
  ./ext4file -d /mnt/ext4                      # Trace the device mounted on '/mnt/ext4'
  ./ext4file -d /mnt/ext4 1 10                 # Print 10 reports at 1 second intervals
  ./ext4file -d /mnt/ext4 -o output 1 10       # Print 10 reports at 1 second intervals to ./output
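How ext4file actually parses its arguments is not shown here (most libbpf-tools use argp). As an illustration only, the documented command line (-d/--dir, -o/--output, -h/--help, then optional interval and count) could be handled with getopt_long as below; struct ext4file_opts, parse_opts() and the defaults (interval 0 meaning "not given", count -1 meaning unlimited) are hypothetical.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical option container; field names are illustrative only. */
struct ext4file_opts {
    const char *dir;     /* -d/--dir: mount point of the ext4 FS to trace */
    const char *output;  /* -o/--output: optional output file */
    long interval;       /* seconds between reports (0 = not given) */
    long count;          /* number of reports (-1 = unlimited) */
};

/* Returns 0 on success, 1 if help was requested, -1 on a usage error. */
static int parse_opts(int argc, char **argv, struct ext4file_opts *o)
{
    static const struct option longopts[] = {
        { "dir",    required_argument, NULL, 'd' },
        { "output", required_argument, NULL, 'o' },
        { "help",   no_argument,       NULL, 'h' },
        { NULL, 0, NULL, 0 },
    };
    int c;

    o->dir = NULL;
    o->output = NULL;
    o->interval = 0;
    o->count = -1;

    optind = 0;  /* in glibc, 0 forces getopt_long to reinitialize */
    while ((c = getopt_long(argc, argv, "d:o:h", longopts, NULL)) != -1) {
        switch (c) {
        case 'd': o->dir = optarg; break;
        case 'o': o->output = optarg; break;
        case 'h': return 1;
        default:  return -1;  /* unknown option or missing argument */
        }
    }

    /* Remaining positional arguments: [interval] [count] (no validation here). */
    if (optind < argc)
        o->interval = strtol(argv[optind++], NULL, 10);
    if (optind < argc)
        o->count = strtol(argv[optind++], NULL, 10);
    return optind == argc ? 0 : -1;  /* reject extra positional arguments */
}
```

For the third example above (-d /mnt/ext4 -o output 1 10), this sketch yields dir "/mnt/ext4", output "output", interval 1 and count 10.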

Example output:

root@server:/home/nbw/OpenSource/biohint/libbpf-tools# ./ext4file -d /mnt/ext4File/
EXT4 FS Info: blocks_count=3750232064 blocks_per_group=32768 bg_cnt=114448
Tracing Ext4 read/write... Hit Ctrl-C to end.
2026-01-14 13:58:21
file_name            inode      pa_inode   hint   buffer_read     direct_read     buffer_write    direct_write    delete
test3                83361794   83361793   0      0               0               0               0               False
test2                34         2          0      8               0               1               0               False
dir1                 83361793   2          0      0               0               0               0               False
dir2                 440467457  2          0      0               0               0               0               True
test3                33         2          0      8               0               1               0               False
test1                33         2          0      8               0               1               0               True
test3                440467458  440467457  0      0               0               0               0               True
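To populate both the buffered and the direct counters, the test workload has to issue both kinds of I/O. Below is a minimal C sketch of such a workload. buffered_write() and direct_write() are hypothetical helpers, and the 4096-byte alignment is an assumption: the real O_DIRECT alignment requirement depends on the device and file system.

```c
#define _GNU_SOURCE  /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIO_ALIGN 4096  /* assumed alignment for O_DIRECT buffer, length and offset */

/* One buffered write through the page cache. Returns 0 on success. */
static int buffered_write(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    const char msg[] = "hello ext4file\n";
    ssize_t n = write(fd, msg, sizeof(msg) - 1);
    close(fd);
    return n == (ssize_t)(sizeof(msg) - 1) ? 0 : -1;
}

/* One direct write that bypasses the page cache. O_DIRECT requires the
 * buffer address, length and file offset to be suitably aligned.
 * Returns 0 on success, -1 on failure (e.g. EINVAL where O_DIRECT is unsupported). */
static int direct_write(const char *path)
{
    void *buf;
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    memset(buf, 'x', DIO_ALIGN);
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    ssize_t n = pwrite(fd, buf, DIO_ALIGN, 0);
    close(fd);
    free(buf);
    return n == DIO_ALIGN ? 0 : -1;
}
```

For example, with ./ext4file -d /mnt/ext4 running, calling buffered_write("/mnt/ext4/a") and direct_write("/mnt/ext4/b") should, if the tool classifies I/O as described above, show file a under buffer_write and file b under direct_write.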

The fields are described below:

  • file_name: name of the file (the last path component, not the full path).
  • inode: inode number of the file.
  • pa_inode: inode number of the file's parent directory.
  • hint: the hint attribute used by FDP (Flexible Data Placement) SSDs.
  • buffer_read: number of buffered reads on the file.
  • direct_read: number of direct reads on the file.
  • buffer_write: number of buffered writes to the file.
  • direct_write: number of direct writes to the file.
  • delete: whether the file has been deleted (True/False).
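To illustrate how one row maps onto these fields, here is a C sketch of a row type and a formatter that reproduces the column widths of the sample output. The struct ext4file_row name, format_row(), and the field types (in particular unsigned int for hint) are assumptions, not taken from the tool's source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical row type mirroring the documented columns. */
struct ext4file_row {
    const char *file_name;  /* file name only, not the full path */
    uint64_t inode;         /* inode number of the file */
    uint64_t pa_inode;      /* inode number of the parent directory */
    unsigned int hint;      /* FDP hint; the real type is an assumption */
    uint64_t buffer_read;   /* number of buffered reads */
    uint64_t direct_read;   /* number of direct reads */
    uint64_t buffer_write;  /* number of buffered writes */
    uint64_t direct_write;  /* number of direct writes */
    bool deleted;           /* printed as True/False */
};

/* Format one row (without a trailing newline) using the column widths of
 * the sample output. Returns the snprintf() result. */
static int format_row(char *buf, size_t len, const struct ext4file_row *r)
{
    return snprintf(buf, len,
                    "%-20s %-10llu %-10llu %-6u %-15llu %-15llu %-15llu %-15llu %s",
                    r->file_name,
                    (unsigned long long)r->inode,
                    (unsigned long long)r->pa_inode,
                    r->hint,
                    (unsigned long long)r->buffer_read,
                    (unsigned long long)r->direct_read,
                    (unsigned long long)r->buffer_write,
                    (unsigned long long)r->direct_write,
                    r->deleted ? "True" : "False");
}
```

Formatting { "test2", 34, 2, 0, 8, 0, 1, 0, false } with this sketch lines its values up under the same headers as the test2 row in the sample output.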

Target Audience

This tool is intended for ext4 file system developers who want to examine the I/O patterns of specific files.

@Bojun-Seo
Contributor

Bojun-Seo commented Jan 6, 2026

Here are my quick notes:

  • Docs: Need more explanation
  • Naming: ext4File -> ext4file
  • Patch splitting: Please split patches functionally or logically

Thanks

@niebowen666
Author

Thanks for your reply.
Could you clarify what kind of documentation I should provide and which directory it should go in?
Also, I have submitted two other PRs: #5439 and #5429.
Could you take a look at them when you have time? Thanks a lot!

@Bojun-Seo
Contributor

When I said docs, I actually meant the commit message.
I want you to provide the purpose, necessity, value, and usage instructions in the commit message.

@niebowen666 niebowen666 force-pushed the ext4File branch 2 times, most recently from 7cc4f5e to a5912d8 on January 15, 2026 11:43
@niebowen666
Author

Hi Bojun,
I have fixed my code and updated the commit message.

  • A detailed explanation has been added to the commit message.
  • The tool has been renamed to ext4file.
  • I have removed tracking of file creation, deletion, and access times. ext4file now focuses only on file-level I/O patterns.

tcptop \
vfsstat \
wakeuptime \
ext4file \
Collaborator


Why do we need this new tool? We already have fsdist/fsslower/filelife/filetop?

Author


ext4file tracks buffered and direct I/O at the file level.
If a file is expected to be accessed only through direct I/O, ext4file can detect unexpected buffered accesses to it.
I have read the source code of the tools you listed:

  • fsdist measures the latency of operations such as read, write, open, and fsync, which is a different question from the one ext4file answers (buffered vs. direct I/O).
  • fsslower is more detailed than fsdist: it reports file names and I/O sizes, and it only traces operations slower than a latency threshold. Although it reports file names, it does not achieve true file-level tracking, because a file name does not uniquely identify a file. It also reports I/O size rather than the split between buffered and direct I/O, so ext4file's results complement those of fsslower.
  • filelife ignores I/O and only tracks file creation and deletion.
  • To limit output, filetop displays only a subset of the data, and it cannot tell whether a file's I/O is buffered or direct.

ext4file complements the tools above and can reveal whether a file receives unexpected I/O under complex workloads.
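As a sketch of how such a check might be applied to ext4file's report: for a file that is expected to use only direct I/O, any nonzero buffered counter signals unexpected I/O. struct io_counts, has_unexpected_buffered_io() and buffered_ratio() are hypothetical names; their inputs would come from one ext4file output row.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of one ext4file output row. */
struct io_counts {
    uint64_t buffer_read;
    uint64_t direct_read;
    uint64_t buffer_write;
    uint64_t direct_write;
};

/* For a file expected to use only direct I/O, any buffered read or
 * write is unexpected. */
static bool has_unexpected_buffered_io(const struct io_counts *c)
{
    return c->buffer_read != 0 || c->buffer_write != 0;
}

/* Fraction of the file's reads and writes that went through the page
 * cache; 0.0 when no I/O was recorded. */
static double buffered_ratio(const struct io_counts *c)
{
    uint64_t buffered = c->buffer_read + c->buffer_write;
    uint64_t total = buffered + c->direct_read + c->direct_write;
    return total ? (double)buffered / (double)total : 0.0;
}
```

For instance, the test2 row in the sample output (8 buffered reads, 1 buffered write, no direct I/O) would be flagged by this check if test2 were expected to be accessed only through direct I/O.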

@Bojun-Seo
Contributor

I'm someone who believes that each commit/patch should be self-contained and complete (self-contained atomic unit). I think developers should be able to understand the full context and intent just by reading the commit message alone, without having to dig through the PR description or conversation thread.

Therefore, it would be great if you could include the PR description into the commit message(s). Also, if you revise the patches so that each individual commit/patch maintains its own completeness (rather than scattering fixes across multiple small follow-up commits), it would make the review much easier.

Additionally, it would be helpful to add your answer to @chenhengqi's question directly to the explanation in the Why ext4file section.
