Conversation
Use io_uring read_multishot to get eventfd notification: - event read is done automatically by io_uring, save one read() syscall - more importantly, read_multishot() is race free since the read is always done automatically by io_uring - then we can optimize event write in batch Add one command line for using read_mshot, which is only available since linux kernel v6.7. Signed-off-by: Ming Lei <tom.leiming@gmail.com>
sahlberg
left a comment
There was a problem hiding this comment.
Unfortunate to have to add complexity but it is what it is.
Looks good to me.
Reviewed-by me
| pthread_spin_unlock(&q_data->io_list_lock); | ||
| if (!use_mshot_evt(q_data)) { | ||
| pthread_spin_lock(&q_data->io_list_lock); | ||
| ublksrv_queue_handled_event(q); |
There was a problem hiding this comment.
This is from the old code, I was just about to send a change to remove the spinlock as I don't think we need it around the call to ublksrv_queue_handled_event(). Can you remove the spinlock in your PR?
There was a problem hiding this comment.
Can you send PR to remove it first? I can rebase this PR on yours.
|
Done |
| }; | ||
| static const struct option lo_longopts[] = { | ||
| { "nfs", 1, NULL, 1024 }, | ||
| { "mshot_evt", 0, NULL, 0 }, |
There was a problem hiding this comment.
This would need to be added to the usage function.
But do we need it? If it helps performance it should be unconditionally enabled, always?
There was a problem hiding this comment.
Or just ignore my comment. I can add it to usage() and add an entry in the manpage for ublk.nfs and mention this argument too and that it only is available in 6.7 or later.
|
This can be closed as it is no longer relevant |
Use io_uring read_multishot to get eventfd notification:
event read is done automatically by io_uring, save one read() syscall
more importantly, read_multishot() is race free since the read is always done automatically by io_uring
then we can optimize event write in batch
Add one command line '--mshot_evt' for using read_mshot, which is only available since linux kernel v6.7.
This way improves perf by ~5% in my test vm:
taskset -c $cpu ~/fio/t/io_uring -p 0 /dev/ublkb0
$cpu: the cpu which is in the queue affinity of ublkb0, since ubl.nfs offloads IO in libnfs pthread