Add vdev property to bypass vdev queue #16591

Open · wants to merge 1 commit into base: master
Conversation

@MigeljanImeri (Contributor)

Added a vdev property to bypass the vdev queue when reading from or writing to a vdev. The intention behind this property is to improve performance when using O_DIRECT.

Motivation and Context

Bypassing the vdev queue yields significant performance improvements when doing O_DIRECT random reads on NVMe SSDs. We noticed a contention point on vq->vq_lock in a flamegraph and wondered whether this contention could be reduced by bypassing the vdev queue.

Description

An additional check is needed in vdev_queue_io_done to see if the ZIO was actually added to the queue, since the property may have been toggled in between vdev_queue_io and vdev_queue_io_done.

Using a gen 5 NVMe SSD, we were able to get roughly 3.5x the IOPS (136k -> 469k) by bypassing the vdev queue when doing O_DIRECT random reads.
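
The completion path then needs an early return for ZIOs that never entered the queue. A minimal sketch of that check (it mirrors the suggested change adopted later in this review, with io_queue_state assumed to be set to ZIO_QS_NONE on the bypass path):

    /* vdev_queue_io_done(): skip queue teardown for bypassed ZIOs. */
    if (zio->io_queue_state == ZIO_QS_NONE)
            return;         /* never queued, nothing to remove */
    mutex_enter(&vq->vq_lock);
    vdev_queue_pending_remove(vq, zio);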

How Has This Been Tested?

Added a test case that checks whether toggling the bypassqueue property causes a kernel panic; a sketch of this kind of toggle test follows below. While creating this vdev property, I ran into a situation where, if the property was toggled while a ZIO was going through the pipeline, the ZIO was expected to be in a queue it had never been added to, which caused a panic. We now check that the ZIO was added to the vdev queue before removing it.
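
For reference, a rough sketch of the kind of toggle-under-load test described above (written against the queue_io name the property ends up with later in the review; pool name, device variable, and fio parameters are illustrative, not the actual test in the PR):

    # Flip the property repeatedly while direct IO is in flight.
    fio --filename=/testpool/foobar --name=randread --rw=randread --bs=4k \
        --size=1g --direct=1 --ioengine=sync --runtime=30 &
    typeset fio_pid=$!
    while kill -0 $fio_pid 2>/dev/null; do
            log_must zpool set queue_io=off testpool $DISK
            log_must zpool set queue_io=on testpool $DISK
    done
    wait $fio_pid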

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@amotin (Member) left a comment:

I guess it might make sense for some very fast NVMe devices with "unlimited" queue depth and I/O rate, since others may suffer from the loss of I/O aggregation as a side effect.

Also, it seems that you are still obtaining the queue lock on completion, aren't you?

Also, I don't see what would initialize the io_queue_state you check on completion if the queue is bypassed.

Also, the property seems to have a negative meaning. It is preferable to use positive ones to avoid double negation.

@MigeljanImeri (Contributor, Author):

> I guess it might make sense for some very fast NVMe devices with "unlimited" queue depth and I/O rate, since others may suffer from the loss of I/O aggregation as a side effect.

> Also, it seems that you are still obtaining the queue lock on completion, aren't you?

Changed to avoid obtaining the queue lock if nothing gets removed from the queue.

> Also, I don't see what would initialize the io_queue_state you check on completion if the queue is bypassed.

Changed to set io_queue_state if not queueing the ZIO.

> Also, the property seems to have a negative meaning. It is preferable to use positive ones to avoid double negation.

Changed the property to have a positive meaning.

@@ -156,6 +156,8 @@ If this device should perform new allocations, used to disable a device
when it is scheduled for later removal.
See
.Xr zpool-remove 8 .
.It Sy queue_io
Add io to the vdev queue when reading or writing to this vdev.
Review comment (Contributor):

Might want to give a hint to the user about when they would want to change this prop:

Add IO to the vdev queue when reading or writing to this vdev.
Disabling this property can sometimes improve performance for direct IOs.

Feel free to re-word this ⬆️
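
For context, vdev properties are managed with zpool set and zpool get on a specific vdev; assuming the property keeps the queue_io name used in this revision, usage would look roughly like:

    # Disable queueing on a fast NVMe leaf vdev, then inspect the setting.
    zpool set queue_io=off tank nvme0n1
    zpool get queue_io tank nvme0n1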

Author (MigeljanImeri):

Added.

Comment on lines 70 to 83
log_must eval "fio --filename=$mntpnt/foobar --name=write-file \
--rw=write --size=$MINVDEVSIZE --bs=128k --numjobs=1 --direct=1 \
--ioengine=sync --runtime=10"

Review comment (Contributor):

Would it make sense to also do the toggle_queue_io test while doing the write?

Author (MigeljanImeri):

Sure, makes sense to me.
Added.

verify_runnable "global"

command -v fio > /dev/null || log_unsupported "fio missing"

Review comment (Contributor):

I think you need to save/set/restore the zfs_dio_enabled module param to use Direct IO

log_must save_tunable DIO_ENABLED
log_must set_tunable32 DIO_ENABLED 1
...
log_must restore_tunable DIO_ENABLED
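
Expanded into the usual shape of a test-suite script, that suggestion would look roughly like the following sketch (the cleanup arrangement is illustrative, not the exact code added to the PR):

    log_must save_tunable DIO_ENABLED
    log_must set_tunable32 DIO_ENABLED 1

    function cleanup
    {
            log_must restore_tunable DIO_ENABLED
    }
    log_onexit cleanup

    log_must eval "fio --filename=$mntpnt/foobar --name=write-file \
        --rw=write --size=$MINVDEVSIZE --bs=128k --numjobs=1 --direct=1 \
        --ioengine=sync --runtime=10"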

Author (MigeljanImeri):

Added.

@robn (Member)

robn commented Oct 7, 2024

I am uncomfortable with this change.

I agree that the vq_lock runs white-hot when the disks underneath are very very fast. I've spent many months trying to reorganise this lock, with mixed success, mostly because in all my testing it just moves the contention to other locks. That was all pre-direct-IO though, so that may be different now, though my own workloads are not O_DIRECT.

So the main thing I know that this will "break" is that the IO is no longer tracked for the "slow IO" checks to find (vdev_deadman()). That's actually the main reason this lock runs hot, because removing the IO from the list is the only reason we take it on the IO return path.

The other part I'm just nervous about is that I don't know how it will interact with IO that is on the queue. Seems like it could starve other high-priority IO. This sort of stuff is why I've spent so long on it without a conclusive path forward.

I would weakly support this being done as a module parameter so it's at least available in the field for experimentation on specific customer workload, though it's a little tough with module parameters being scoped far wider than a single vdev instance. Though maybe that's fine to start for an experimental feature; the other vdev_queue_* parameters have the same issue.

For now though I'd be against having this as a vdev property, because that is both more visible and more permanent, and I don't have the confidence to support it yet.

If I were overruled here, I would at least make the scope far more narrow: maybe limit it to Direct IO only, and definitely make the name reflect that narrower scope. I would suggest something like your original name, "queue bypass" or similar (eg "directio queue bypass" if it were limited to those).

(I appreciate the point about avoiding double-negatives, but I think it's possible to claim that "bypass" is a specific, positive action. At least, for me, "queue io" is way too vague, and might cause me to say "why wouldn't I turn that off? Queues sound slow!". But this is only a tiny part of my objection to this whole thing, so if it does stay in its current form, I'm unlikely to worry too much about naming.)

@MigeljanImeri (Contributor, Author):

> I am uncomfortable with this change.

> I agree that the vq_lock runs white-hot when the disks underneath are very very fast. I've spent many months trying to reorganise this lock, with mixed success, mostly because in all my testing it just moves the contention to other locks. That was all pre-direct-IO though, so that may be different now, though my own workloads are not O_DIRECT.

I have been working with @bwatkinson on improving IOPS performance with reads, and we have also noticed that getting around this lock just seems to shift contention elsewhere. I am hoping that with further research and testing we can reduce this contention further; bypassing the vdev queue is just one step.

> So the main thing I know that this will "break" is that the IO is no longer tracked for the "slow IO" checks to find (vdev_deadman()). That's actually the main reason this lock runs hot, because removing the IO from the list is the only reason we take it on the IO return path.

> The other part I'm just nervous about is that I don't know how it will interact with IO that is on the queue. Seems like it could starve other high-priority IO. This sort of stuff is why I've spent so long on it without a conclusive path forward.

> I would weakly support this being done as a module parameter so it's at least available in the field for experimentation on specific customer workload, though it's a little tough with module parameters being scoped far wider than a single vdev instance. Though maybe that's fine to start for an experimental feature; the other vdev_queue_* parameters have the same issue.

As for implementing this as a module parameter, I am not too sure which way we should go. I am okay with it either way; maybe others (@amotin, @tonyhutter) can provide some insight into what should be done here?

> For now though I'd be against having this as a vdev property, because that is both more visible and more permanent, and I don't have the confidence to support it yet.

> If I were overruled here, I would at least make the scope far more narrow: maybe limit it to Direct IO only, and definitely make the name reflect that narrower scope. I would suggest something like your original name, "queue bypass" or similar (eg "directio queue bypass" if it were limited to those).

I agree that this property should probably be limited to O_DIRECT IOs only, as from the testing that I have done, it doesn't seem to improve performance with non-O_DIRECT workloads.

> (I appreciate the point about avoiding double-negatives, but I think it's possible to claim that "bypass" is a specific, positive action. At least, for me, "queue io" is way too vague, and might cause me to say "why wouldn't I turn that off? Queues sound slow!". But this is only a tiny part of my objection to this whole thing, so if it does stay in its current form, I'm unlikely to worry too much about naming.)

@bwatkinson (Contributor)

bwatkinson commented Oct 23, 2024

@behlendorf do you have an opinion on whether we should stick with a vdev property that is limited to Direct I/O, or whether a module parameter would make more sense at this point? I see good arguments for both.

@tonyhutter (Contributor):

My preference would be for a vdev property, since you could set it on NVMe vdevs but not spinning disks. This assumes we're 100% comfortable with this feature though.

@MigeljanImeri do you have any thoughts on @robn's comments regarding breaking the deadman IO checks and the IO priority starvation? For example, if you submit an unqueued IO, and the drive "eats" the IO with no response, how does this codepath handle that?

@MigeljanImeri (Contributor, Author):

> My preference would be for a vdev property, since you could set it on NVMe vdevs but not spinning disks. This assumes we're 100% comfortable with this feature though.

> @MigeljanImeri do you have any thoughts on @robn's comments regarding breaking the deadman IO checks and the IO priority starvation? For example, if you submit an unqueued IO, and the drive "eats" the IO with no response, how does this codepath handle that?

If we restrict this to only O_DIRECT reads and writes, the O_DIRECT code path should propagate up any errors that occur. In terms of priority starvation, I don't foresee this being an issue with newer NVMe SSDs that are very fast, although this could be an issue with slower devices. Maybe we could include a warning in the property description that starvation could occur if the underlying disk can't keep up with the requests?

@tonyhutter (Contributor):

@robn I was actually able to get this PR to trigger deadman events even with queue_io=off.

I set the ZFS params to:

zfs_deadman_checktime_ms=100
zfs_deadman_ziotime_ms=500
zfs_deadman_enabled=1

I then created a scsi_debug device with these params set:

modprobe scsi_debug max_luns=1 num_tgts=1 add_host=1 dev_size_mb=1000

... then created the pool with the scsi_debug device. I then set these additional scsi_debug params:

delay=100
every_nth=1000

Then I copied the zfs/build directory from the zfs source workspace to the pool and saw deadman events generated. This was NOT using O_DIRECT writes - just normal writes.
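
A rough end-to-end version of that reproduction, assuming a Linux host where the ZFS tunables live under /sys/module/zfs/parameters and scsi_debug exposes its runtime knobs under /sys/bus/pseudo/drivers/scsi_debug (pool name, device name, and the queue_io property from this PR are illustrative):

    # Aggressive deadman settings so slow IOs are reported quickly.
    echo 100 > /sys/module/zfs/parameters/zfs_deadman_checktime_ms
    echo 500 > /sys/module/zfs/parameters/zfs_deadman_ziotime_ms
    echo 1   > /sys/module/zfs/parameters/zfs_deadman_enabled

    # Back the pool with a scsi_debug device that can inject delays.
    modprobe scsi_debug max_luns=1 num_tgts=1 add_host=1 dev_size_mb=1000
    zpool create testpool sdX                 # sdX = the scsi_debug disk
    zpool set queue_io=off testpool sdX

    # Delay every 1000th command, generate normal writes, watch for events.
    echo 100  > /sys/bus/pseudo/drivers/scsi_debug/delay
    echo 1000 > /sys/bus/pseudo/drivers/scsi_debug/every_nth
    cp -r ./zfs/build /testpool/
    zpool events | grep -i deadman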

@tonyhutter (Contributor):

I'm going to see if I can starve out a scrub and MMP write with queue_io=off.

@robn (Member)

robn commented Oct 24, 2024

@tonyhutter thanks for trying to test those cases; I'm mostly going from memory and gut feel, and it's good to have things confirmed, or not!

From memory the deadman path takes the front (oldest) zio from the queue active list and uses it for the slow report, because we know it's the oldest one outstanding. It therefore follows that if the slow IO is not on that list, we won't report on it. The slow IO however will still be holding up the txg sync, and with no reporting, all the knock-on "deadman fired" events (like suspending the pool, zed actions, etc) won't fire.
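
For readers unfamiliar with that path, a simplified sketch of the pattern being described (not the actual OpenZFS vdev_deadman() code; the list and field names follow the vdev queue structures discussed elsewhere in this PR):

    /*
     * The deadman only looks at ZIOs sitting on the per-vdev active list,
     * so a ZIO that bypassed the queue is invisible to this check.
     */
    mutex_enter(&vq->vq_lock);
    zio_t *oldest = list_head(&vq->vq_active_list);
    if (oldest != NULL && gethrtime() - oldest->io_timestamp >
        MSEC2NSEC(zfs_deadman_ziotime_ms)) {
            /* report a slow IO and fire the knock-on deadman events */
    }
    mutex_exit(&vq->vq_lock);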

That might be ok! It might be ok to just document, "you lose all these other things" or something, so long as we're aware of it.

But this is part of why, for now, I'm against a vdev property: because there's knock-on effects that are to some extent tied up in implementation details that I definitely want to change, and I'm not aware of any policy or process around deprecating/removing a property in the future.

To be clear, I'm not against having the option available. If it had started as a module parameter, I would not have flinched. I think it's crucial we have ways to change internal behaviour in the field to gain more experience without too much overhead. My only concern with any of this is that we can support it into the future. Properties feel to me like real "user interface", like buttons, switches or lights on the dashboard, while module parameters feel more like messing around under the hood.

If I'm overblowing it, feel free to tell me to go away: I get it, the "best" way isn't always possible, and I'm not gonna be the person blocking this - it's important we can learn more about these use-cases.

Other possible options:

  • make it a hidden property; a secret switch under the dashboard is perhaps a bit more defensible
  • add some helpers so we can target module parameters to specific objects (vdevs, datasets, pools)
  • add some sort of "transient" property type, that can be set, but isn't stored
  • gate the property behind a module parameter, so it does nothing unless a parameter is also set

I volunteer to make any/all of these possible if it would help (I'm not gonna request a bunch of unrelated work from someone, that's rude).

@robn (Member)

robn commented Oct 24, 2024

(incidentally, I love how many subtle and weird things properly fast IO is throwing up. It's nice to be solving hard problems!)

@tonyhutter (Contributor):

@robn - some random thoughts:

One alternative is to pull it in as a vdev property and then keep it in master until 2.4.x. So don't backport it to 2.3.x or 2.2.x. That will give it some soak time and will help expose any unknowns. One thing I didn't think of though is that you can set vdev properties on the root vdev, top-level vdevs, and leaf vdevs, which could make this complicated. Like, what happens if you have queue_io=off on the top-level vdev, queue_io=on on the mirror-0 vdev, and queue_io=off on one of the leaf mirror vdevs?

Another idea would be to not have the vdev property at all and just overload zfs_vdev_def_queue_depth=0 to mean "don't queue anything". That makes me curious what setting zfs_vdev_def_queue_depth=0 does without this PR...

@robn (Member)

robn commented Oct 24, 2024

There's currently no inheritance on vdev properties, so nothing happens. We've talked a little internally at Klara about what property inheritance should mean, and turns out it's kinda weird no matter how you slice it. But yeah, for now, nothing.

I did think about getting it on master and then not backporting it, but I don't think it gets us much, and possibly makes it worse. "Soak time" assumes that there are people running master and kicking the tyres on new things, and to me this feels too niche for many (any?) people to actually do that. If they don't, then we're just having to make the same decisions when it's time to cut 2.4. If they do, then it's the same problem, in that we have a user-accessible lever tied to an implementation detail, with the bonus that it's already enabled on pools already out there.

If we were doing a module parameter, I would just make a new one, not overload an existing one. Based on precedent, I don't consider module parameters to be stable between release series, and I'd have no problem shipping one in (say) 2.3 and then removing it in 2.4. We already do this with lots of different parameters (most notably ARC related).


(Below follows an extended ramble where I ask a bunch of questions, sort of answer them along the way, and come to the same conclusion. The joys of vacation; got time on my hands).

Maybe there's some clarity to be found in considering when and why an operator might want to enable this option, or maybe better, when not to. That's something for me too; I initially saw a mention of vq_lock and kinda rubbed my hands together because I've been training for this moment 💪 and maybe I haven't thought about it on its merits.

So, I'm an operator, and I've got my shiny new NVMe pool, and I feel like it's going a bit slow. I search around and find people talking about an option that makes it go faster. I turn it on and indeed, it goes faster. I am happy! Hurrah! Or, I turn it on, and it doesn't go faster because my workload (or pool topology or overall system architecture) isn't a fit for it, and I'm now annoyed, but I don't bother turning it off because eh, it doesn't do anything anyway.

Alternatively, I'm not using an NVMe pool, just regular spinners, but I read about the option and wonder, and I turn it on. Everything suddenly goes slower, so I turn it off again, fine. But maybe it doesn't go slower, because I have some funky setup with tons of block cache "under" ZFS. (I dunno, I'm trying to invent a scenario I'm pretty sure isn't real).

So! Is this only a GO FAST / GO SLOW toggle? Or does it affect something deeper?

Let's say I'm right for the moment, and it does have two knock-on effects: no slow IO warnings for bypassed IOs, and priority queued IO may not get serviced if bypassed IOs are hogging all the worker threads. As an operator, am I going to be surprised if I toggle this and other things stop working properly? What if I didn't know that they might beforehand? What if I did now but I didn't really understand it?

What if it didn't affect something deeper? What if we "fixed" those? Like, maybe queue bypass only works when we're not scrubbing or other things that "naturally" degrade pool performance, though maybe we want to prioritise ZIL writes still, or MOS updates, or...? Maybe queue bypass only works when the pool is definitely in good health and responding well (ie when we can be more certain that slow IOs wouldn't have happened anyway).

I hope it's obvious that these are examples; these two may not end up being problems at all. There might be different things we haven't thought of. New ones might come up as a result of future changes; how does that affect the property being enabled or not? What if there was a future where vq_lock didn't exist (I'm working on it I swear). What's the user's expectation then?

If it's only limited to direct IO, does that change the answers? Does it add any new problems?

And then, if it was possible for it to be only a GO FAST / GO SLOW toggle, would we even need it? Is there ever a time where we wouldn't bypass the queue for "safe" IO, and so don't even need a toggle?

What if it wasn't an explicit "bypass the queue", but more of a hint: "prioritise direct IO when safe to do so?". That would sidestep most of the questions, because it clearly says "idk, give it a try, I promise I won't mess up your pool". Is there any value in such a thing?

But if we had that, then how is that any different in concept to the existing priority system? Note "in concept"; I know it would suck under the existing priority system because of how it's currently implemented, but that's not the point.

So I don't have complete answers for all of this, and I'll admit that probably my partial answers are informing my questions a bit, so again, I am not assuming that I'm definitely right here. But, all this is why this seems like a module parameter to me, because those are things we currently have to influence implementation-specific behaviour. Lifting the lid to flip a "bypass queue" switch carries with it an acknowledgement that the operator knows they're doing something weird, and at least, they now know there is thing called "queue" and presumably it's there for a reason. So they flip it and everything goes wrong and at least maybe they know why, and if they ask for help we can say "ok, turn that off, and thanks for the feedback on your scenario".

The one shortcoming here is about wanting to apply this to a specific vdev. I know that properties are the only comfortable way to do that right now, but that doesn't mean they're the right way necessarily, and the only reason we don't have some more transient way is that we haven't needed one. Is this the change that forces our hand? Ask yourself, if I had said at the start "this should be a vdev-specific module parameter" because that's a facility we already had, would this discussion even have happened? If it's "I don't think so", then I will go and write that facility this weekend (and use the hell out of it myself; I have a patch to lift the vdev queue limit parameters into vdev properties that I finished but shelved for all the reasons above: I didn't want to commit to surfacing those implementation details yet).

Conclusion: don't put a "bypass queue" button on the dashboard without first (a) defining "queue", (b) committing to "queue" being a thing in the long term and (c) making sure it does only that. But feel free to put whatever you want under the hood, because it's already expected that (a) there's loads of things in there you don't understand, (b) the next model could have a totally different engine and (c) you have no expectation that it won't do weird things to other things in there, because it's a whole engine and everything is there for a reason, right?

(I genuinely did not start out with car analogies, but there's a reason they're so prevalent in engineering disciplines!)

@tonyhutter (Contributor):

I think you raise some good points on the vdev prop vs module param discussion. I'll have to think it over a little more.

> What if it didn't affect something deeper? What if we "fixed" those? Like, maybe queue bypass only works when we're not scrubbing or other things that "naturally" degrade pool performance, though maybe we want to prioritise ZIL writes still, or MOS updates, or...?

Hmm.. could we do a hybrid approach where we allow O_DIRECT queue bypass if and only if the pending queue is empty (vq->vq_active == 0)?:

zio_t *
vdev_queue_io(zio_t *zio)
...
        if (!zio->io_vd->vdev_queue_io && vq->vq_active == 0) {
                zio->io_queue_state = ZIO_QS_NONE;
                return (zio);
        }

So if you had a workload that was 99% O_DIRECT writes and 1% queued writes, you'd get most (but not all) of your O_DIRECT writes to bypass the queue. That would solve the starvation problem but not the slow IO detection. I don't know if this would actually work, just thinking aloud. You could get stuck in a "pileup" problem where there exists one queued write, and so the next 99 O_DIRECT writes get queued up too, and then you never get back into bypass mode since the queue is backed up. Maybe there are enough gaps where the queue goes back to zero that you can get into bypass mode fairly easily. I'd be curious to test this on real hardware.

> Maybe queue bypass only works when the pool is definitely in good health and responding well (ie when we can be more certain that slow IOs wouldn't have happened anyway).

If your disk was unhealthy and not responding to IOs, and you had at least a handful of queued IOs (even just 1% of the writes), then presumably those would get reported as slow IOs, and you'd at least have somewhat of a canary in the coal mine, right? So maybe this slow IO detection isn't as much of an issue?

@tonyhutter (Contributor):

@MigeljanImeri I've been trying to test/benchmark this PR, but in the process I've uncovered some unrelated bugs in ZFS (#16707, #16695) that are slowing down my testing. Can you describe your setup for getting those 3.5x gains you mentioned? Did you get those running fio with O_DIRECT? What did your pool config and record size look like?

@MigeljanImeri (Contributor, Author):

> @MigeljanImeri I've been trying to test/benchmark this PR, but in the process I've uncovered some unrelated bugs in ZFS (#16707, #16695) that are slowing down my testing. Can you describe your setup for getting those 3.5x gains you mentioned? Did you get those running fio with O_DIRECT? What did your pool config and record size look like?

I was running just a single pool with one vdev, with recordsize set to 4k. I wrote out 256 GB over 256 files (each 1 GiB) with O_DIRECT, then did O_DIRECT random 4k reads with fio: 256 jobs, each job reading from its own individual file (we saw some weird contention on range locks and this was how we got around it).
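
A rough fio invocation matching that description (pool path and exact parameters are illustrative, not the command actually used):

    zfs set recordsize=4k testpool
    # With --directory and no shared --filename, each of the 256 jobs gets
    # its own 1 GiB file, avoiding the range-lock contention mentioned above.
    fio --directory=/testpool --name=randread --rw=randread --bs=4k \
        --direct=1 --ioengine=psync --numjobs=256 --size=1g \
        --time_based --runtime=60 --group_reporting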

@tonyhutter (Contributor)

tonyhutter commented Oct 30, 2024

I just tested with a single NVMe pool. Sure enough, turning off queuing does improve IOPS with O_DIRECT:

-- IOPS --
fio, blocks=4k, seq read, direct=1
queue on: 150k
queue off: 280k

fio, blocks=4k, randread, direct=1
queue on: 136k
queue off: 297k

@tonyhutter (Contributor):

I did get over 500k IOPS with direct=0, but I'm assuming that's due to caching.

@tonyhutter (Contributor):

I'll try to run my "scrub while FIOing" test again, but this time using the NVMe drive, since there seems to be weird stuff going on with brd + ZFS.

break;
}
if (vd->vdev_ops->vdev_op_leaf) {
vd->vdev_queue_io = intval & 1;
Review comment (Member):

Do you plan for it to be a bitmask? If not, why the & 1?

Author (MigeljanImeri):

Ah, no. I removed the & 1.

Comment on lines 986 to 988
        if (zio->io_queue_state == ZIO_QS_ACTIVE) {
                mutex_enter(&vq->vq_lock);
                vdev_queue_pending_remove(vq, zio);
        }
        else
                return;
Review comment (Member):

Suggested change:
-       if (zio->io_queue_state == ZIO_QS_ACTIVE) {
-               mutex_enter(&vq->vq_lock);
-               vdev_queue_pending_remove(vq, zio);
-       }
-       else
-               return;
+       if (zio->io_queue_state == ZIO_QS_NONE)
+               return;
+       mutex_enter(&vq->vq_lock);
+       vdev_queue_pending_remove(vq, zio);

Author (MigeljanImeri):

Added.

Added a vdev property to toggle queueing IO to the vdev queue when
reading from / writing to a vdev. The intention behind this
property is to improve performance when using O_DIRECT.

Signed-off-by: MigeljanImeri <[email protected]>
@MigeljanImeri (Contributor, Author):

I limited the queue bypass to Direct IO only. I think I've come to the conclusion that this should probably be a module parameter rather than a vdev property, due to the potential issues that could arise with starvation and slow IO detection. The only concern I have is that the current vdev module parameters aren't specific to a vdev. Ideally, we would be able to toggle this for individual vdevs.

@tonyhutter (Contributor):

I'm having trouble seeing the benefits of the "Direct IOs can bypass the queue" special case. What does that get us over just having queue_io control all IOs? If the idea is "my workload runs Direct IO, and I know what I'm doing, so I'm fine with bypassing the queue and accept the risks that go along with it", that's fine, but keep in mind that other users on your system may be using Direct IO as well. In fact a user may be running an application that uses Direct IO without their knowledge (like a database), so they could be buying into the relaxed safety assumptions unwittingly. That's why I lean towards queue_io controlling all IOs. It's just simpler, and it allows the user to bypass the queue for normal, non-Direct IOs as well.

There's one other "hybrid" option we could consider:

queue_io=on|off|hybrid

hybrid: If this is a "normal IO" and the queue is empty, then do not queue the IO. That should prevent starvation of the important IOs. Assuming most of the IOs are "normal", most of the time you're not going to be queuing, which is what you want. Important IOs would be things like ZIO_FLAG_IO_REPAIR, ZIO_FLAG_SELF_HEAL, ZIO_FLAG_RESILVER, ZIO_FLAG_SCRUB, ZIO_FLAG_IO_RETRY... In those cases you would always queue. So if you're running a resilver, a bunch of resilver IOs are going to get queued, which means your "normal IOs" get queued as well, and so you don't starve out the resilver.

Hybrid mode would look a little like this (untested):

vdev_queue_io(zio_t *zio)
...
    zio->io_flags |= ZIO_FLAG_DONT_QUEUE;
    zio->io_timestamp = gethrtime();

    if (!zio->io_vd->vdev_queue_io) {
        int length = vdev_queue_class_length(zio->io_vd, zio->io_priority);
        if (length == 0) {
            /* Queue is empty - bypass queue */
            zio->io_queue_state = ZIO_QS_NONE;
            return (zio);
        }
    }

I don't think we'd ever default queue_io to anything other than "on". You would have to manually switch it to "off" or "hybrid" to signal buy-in for the relaxed protections.

I don't think this should be a module param because you can have big discrepancies in latency and bandwidth between disks in a pool. Imagine you have a pool comprised of spinning disks, plus some NVMe drives for a special device. You would want queue_io=on for the spinning disks and queue_io=off|hybrid for the NVMe. This is also true since you'd want slow IO detection for the spinning disks, but probably wouldn't care for NVMe. Now consider the same pool with spinning disks and NVMe, but you only had a single, monolithic zfs_queue_io module parameter. Would you set it to "on" or "off" or "hybrid"?
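
To illustrate the per-vdev argument, mixed settings would look roughly like this if the property ships (pool and device names are illustrative; note that vdev properties don't currently inherit, so each leaf is set individually):

    # Keep queueing (and slow-IO detection) on the spinning leaf vdevs...
    zpool set queue_io=on tank sda
    zpool set queue_io=on tank sdb
    # ...but let the NVMe special-device leaf bypass the queue.
    zpool set queue_io=off tank nvme0n1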

Labels: Status: Code Review Needed (Ready for review and testing)
6 participants