-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vioscsi] Fix IntLock regression in commit 15e64ace. #1293
base: master
Are you sure you want to change the base?
[vioscsi] Fix IntLock regression in commit 15e64ace. #1293
Conversation
99569b3
to
3ca1c1c
Compare
@benyamin-codez - We spotted this regression from commit: 15e64ac Copying some of the internal repro data from @sb-ntnx
|
I would note this might be a race condition at start before passive initialization has completed, and I would like to do some more instrumentation and testing of that, but time is short, and I'm now on PTO. Expect further analysis in a week or so. |
Thanks for the heads-up and the data, Jon. I'm curious under which circumstances the NMI is produced. In the parent PR (#1175), the crash dump / hibernation path also had a mention, reproduced here for your convenience.
Regarding Martin's commit, it appears the changes in I note it also appears this commit has an SDV failure, but I cannot immediately see a reason as to why... imho it would be good if we can retain the mnemonic semantics of So give me a moment to run that up and I will raise a PR for your team's consideration and further testing. |
Jon and Sergey, Firstly, I have been unable to reproduce an NMI (using Win11 24H2), even when using v266, whilst performing hibernation. It's possible we have an issue when asking Windows to suspend, but when using In any case, I've raised PR #1294 to potentially address this regression in what I guess we historically called the crashdump + resume / hibernation pathway. I should note the As mentioned above, in my testing hibernate and suspend/resume didn't cause any NMIs, and I was only able to observe the Are you able to share anything further about the environment or the test scenario?
If you are using a I guess it is also possible this is something related to vhost plumbing? Could I ask you to try the commit in PR #1294 with your reproducer to see if that resolves the issue? Using |
@benyamin-codez , the problem happens during vioscsi driver initialization on either installation / upgrade of a driver or SCSI controller start (disable and enable it from the device manager). The problem is that a VM hangs and seems to be due a spinlock not being released. NMI part is not relevant to the problem and corresponds NMI interrupt injections to get a memory dump. I can share the memory dump with vioscsi if it can be of any use. |
Does it BSOD or just hang?
I sent you an email if you want to send me a link. The values of these variables (Locals) from the debugger might be sufficient.
|
@sb-ntnx
Except in the event crashdump processing requires that no spinlocking occurs. Also use of |
… (regression) Credit to Nutanix and in particular @MartinCHarvey-Nutanix for his work in PR virtio-win#1293. Background: We previously ignored calls for a spinlock with isr=TRUE in VioScsiVQLock() and VioScsiVQUnlock(). This was replaced with a call to InterruptLock in the (!IsCrashDumpMode && adaptExt->dpc_ok) = FALSE pathway. In testing, suspend/resume/hibernate did not use this pathway but instead issued DPCs. The InterruptLock was presumed to be used when IsCrashDumpMode=TRUE. Also, using PVOID LockContext = NULL, and / or then setting LockContext to &adaptExt->dpc[vq_req_idx], appears to cause a HCK Flush Test failure. Created new overloaded enumeration called CUSTOM_STOR_SPINLOCK which adds some new (invalid) spinlock types such as Skip_Locking and No_Lock. Also provides InvalidLock for builds prior to NTDDI_WIN11_GE (Windows 11, version 24H2, build 26100) via Invalid_Lock. In similar vein, Dpc_Lock = DpcLock, StartIo_Lock = StartIoLock, Interrupt_Lock = InterruptLock, ThreadedDpc_Lock = ThreadedDpcLock & ThreadedDpc_Lock = ThreadedDpcLock. This fix has two components: 1. Only DpcLock type spinlocks are processed in ProcessBuffer() with all other types presently being ignored ; and 2. The (PVOID)LockContext is no longer used, with calls to StorPortAcquireSpinLock() for DpcLock type spinlocks using &adaptExt->dpc[vq_req_idx] directly. Note: Use of InvalidLock requires Win11 and both InvalidLock and DpcLevelLock require using StorPortAcquireSpinLockEx. Consider for future use. Signed-off-by: benyamin-codez <[email protected]>
… (regression) Credit to Nutanix and in particular @MartinCHarvey-Nutanix for his work in PR virtio-win#1293. Background: We previously ignored calls for a spinlock with isr=TRUE in VioScsiVQLock() and VioScsiVQUnlock(). This was replaced with a call to InterruptLock in the (!IsCrashDumpMode && adaptExt->dpc_ok) = FALSE pathway. In testing, suspend/resume/hibernate did not use this pathway but instead issued DPCs. The InterruptLock was presumed to be used when IsCrashDumpMode=TRUE. Also, using PVOID LockContext = NULL, and / or then setting LockContext to &adaptExt->dpc[vq_req_idx], appears to cause a HCK Flush Test failure. Created new overloaded enumeration called CUSTOM_STOR_SPINLOCK which adds some new (invalid) spinlock types such as Skip_Locking and No_Lock. Also provides InvalidLock for builds prior to NTDDI_WIN11_GE (Windows 11, version 24H2, build 26100) via Invalid_Lock. In similar vein, Dpc_Lock = DpcLock, StartIo_Lock = StartIoLock, Interrupt_Lock = InterruptLock, ThreadedDpc_Lock = ThreadedDpcLock & ThreadedDpc_Lock = ThreadedDpcLock. This fix has two components: 1. Only DpcLock type spinlocks are processed in ProcessBuffer() with all other types presently being ignored ; and 2. The (PVOID)LockContext is no longer used, with calls to StorPortAcquireSpinLock() for DpcLock type spinlocks using &adaptExt->dpc[vq_req_idx] directly. Note: Use of InvalidLock requires Win11 and both InvalidLock and DpcLevelLock require using StorPortAcquireSpinLockEx. Consider for future use. Signed-off-by: benyamin-codez <[email protected]>
@benyamin-codez OK, this looks pretty good. I still have a bit more testing to go, but this looks pretty sound so far. |
vioscsi/vioscsi.c
Outdated
if (adaptExt->dpc_ok) | ||
{ | ||
RhelDbgPrint(TRACE_LEVEL_WARNING, "Unexpected: DPC already initialized.\n"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would likely be hit following ScsiRestartAdapter
and the call to VioScsiHwReinitialize()
.
It might be worth appending " Adapter may have restarted." or similar to the debug message.
Perhaps another reason adaptExt->dpc_ok
should be adaptExt->dpc_ready
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Made the debug print less worrying. The restart cases are not easy for me to test manually, but I note your HCK tests do test PnP stop / restart / rebalance etc, so they should check this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might find wide support too:
adaptExt->dpc_ok
should beadaptExt->dpc_ready
...
vioscsi/vioscsi.c
Outdated
//Sharing of interrupts and "isServiced" based on reading ISR regs to disambig | ||
//line based interrupts. Interrupt is ours, but at the wrong time, | ||
//do not change isInterruptServiced value. | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this else
path where !adaptExt->dump_mode && !adaptExt->dpc_ok
is a new pathway that skips ProcessBuffer()
. The previous behaviour was to run ProcessBuffer()
without spinlocks. Are we assuming here that the interrupt has already achieved the purpose for which it was called and we can now safely return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should never happen now that we have moved virtio_device_ready to the point where we know we have dpc's initialized. To get an interrupt at this point would be early on in the initialization (no previous reinitialization), and would (IMO) be considered a hardware (...emaulated hardware) error. Devices shouldn't interrupt until you've told them you can handle interrupts.
Some previous diagnostic code I had or'ed in a status bit such that the buffer processing was subsequently done when the dpc was ready, but it got tricky for line-based interrupts because you had to store the queue number somewhere.
I notice that most int processing for "non message" queues (ie control q, and events q) is done in the handler, with later message processing deferred. The great thing about queues is that you can leave stuff queued up, provided there is a subsequent interrupt. Indicating virtio_device_ready subsequently should produce that interrupt.
Note that this is in the line-based interrupt handling, and current impl of the device (and likely all subsequent impl) is MSI based. Similar logic for the MSI case is in DispatchQueue
, where we assume it's okay to leave things queued in the expectation of an interrupt when we call virtio_device_ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this
else
path where!adaptExt->dump_mode && !adaptExt->dpc_ok
is a new pathway
Second thoughts. Maybe this allowance for processing with no buffering is a previous workaround for someting else? In that case, I can change the behaviour back to the way it was. I'd still like to move the virtio_device_ready call to later,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second thoughts. Maybe this allowance for processing with no buffering is a previous workaround...
That might be the safest option.
Only when ((virtio_read_isr_status(&adaptExt->vdev) == 1) || adaptExt->dump_mode)
is FALSE
do we return early. So when TRUE
we process further, and that's usually for a reason...
I'd still like to move the virtio_device_ready call to later.
Provided we only use DPCs, I don't see any reason why virtio_device_ready()
can't stay in VioScsiPassiveInitializeRoutine()
. We do return FALSE;
when !StorPortEnablePassiveInitialization()
so I don't think this will be an issue, but @vrozenfe is the one who would likely know best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still keeping this new pathway...?
vioscsi/vioscsi.c
Outdated
else | ||
{ | ||
RhelDbgPrint(TRACE_LEVEL_WARNING, "Spurious interrupt (Messageid 0x%x), ditching\n", MessageId); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add your comment in VioScsiInterrupt()
re this here as well?
I have PR #1214 in the pipe, which if merged, will consolidate these calls into this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment in VioScsiInterrupt refers to the disambiguation of interrupt ownership by reading the ISR, not the subsequent DPC processing. The comment in the line based ISR routine is that should accurately return to the OS whether the interrupt was generated by your device, not whether you have your house in order well enough to perform all subsequent processing.
This allows the OS to refrain from calling all later handlers on anything that shares some legacy PIC/APIC line, and then getting confused about whether the int has actually been services by any device.
It does not apply to MSI(X) handlers, which (when real hardware exists) are effectively in-band pci/read write operations, and always confined to a physical or logical device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you will still keep this code path unlike in the line based pathway?
Methinks the same logic would apply here too, perhaps more so, certainly if PR #1214 merges.
We should definitely still refactor the call to ProcessBuffer()
into the else
branch though...!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still keeping this new pathway...?
Interesting approach, Martin.
Per the MS doco you linked in the other PR, provided the Others will likely need to weigh in on whether we can skip
Can you confirm that's working now? |
Once I've placated my boss, juggled the 5 other tasks I have on my plate, set up some VM's, and got a SDV build to work. I'll have an answer for you. Normal operation works. I will push changes to this review when satisfied crashdump handling has not regressed, |
Only 5...!?!?! 8^d 8^D Thanks for giving the above some consideration. I appreciate your collaborative approach. I'll mod my WIP - with something close to what I presume you will end up with - and have a play... 8^d |
Changes following code feedback/review. Also retested both normal operation, and crashdump writing to secondary drive. @YanVugenfirer if passes HCK, request merge soon as regresses a lot of our VM's. |
@benyamin-codez If you're feeling keen, you could merge this across to virtio-stor, because the issue is there was well. And, depending on merge order of PR's please don't undo it by merging over the top, thanks. |
Seems only fair. Thanks for cleaning up in aisle 5... 8^d
No problem, refactoring already.... |
1d8f864
to
3d8c6b1
Compare
Reverting to simplest and most conservative version of this fix. |
Why not just revert the patch that caused the problem? Best, |
Primarily because there was a performance improvement removing the spinlock managers. |
There was also a fair bit of mnemonic, semantic refactoring and some preparatory changes, e.g.:
This latter work is in PR #1214, which I am presently updating. |
I'm happy with any solution which changes the locking behaviour back to the way it was - my only reason in include additional definitions here is to make clear what the software is doing in order to prevent the same mistake in any subsequent refactors._ |
I also think this solution is much more elegant than the previous spinlock managers. The mnemonic changes Martin mentions would have helped to avoid my mistake in the first place too. I had mistakenly thought that a combination of |
@vrozenfe @MartinCHarvey-Nutanix Also, let's not forget, the relocation of
Vadim, perhaps you have a different view on this change..?
|
Martin, I've closed PR #1294, so hopefully all eyes are on this one. |
I saw some of the auto-HCK tests (flush test and disk logo tests) not looking so good, so reverted it to something more conservative. It would be nice to have a better statistical idea of what works/fails in the automated tests and what's made them better or worse. For the moment, on the basis of 3 or 4 tests passed or failed, can't say either way. |
Oh, I didn't realise you dropped that part. The new pathways ( Subject to @vrozenfe's advice, I would put it back in, but perhaps he and Yan would prefer a new PR for it. Re
I agree. Even with the Here too is my implementation with tracing in situ per PR #1214: kvm-guest-drivers-windows/vioscsi/vioscsi.c Lines 1450 to 1481 in 38fba18
The remainder is implemented in PR #1228, but we will see how far that gets up... |
@benyamin-codez I am waiting for feedback / approval / denial from a maintainer before proceeding on this. |
Yes. @vrozenfe and @YanVugenfirer still need to review and approve. LGTM though. I think it best to raise new PRs for relocation of |
Plus maybe edit your OP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at firsh blush, will defer to Vadim et al
Martin, you may want to fix that clang format issue I mentioned here. |
I'd be OK with that, but this is perhaps a little clearer about what the code is doing. |
This corrects interrupt/spinlock acquision code regressed previously. Locking in the driver is now the same as prior to that point, and basic functionality is restored. Initial regression testing has been performed. There are however, some outstanding questions as to whether the driver performs correctly in the crashdump/hibernation path. I will be performing additional regression testing for those cases, and MSI(X), and issuing further patches if necessary. Ref: Nutanix ENG-741981. Signed-off-by: Martin Harvey <[email protected]>
3d8c6b1
to
ff9dc88
Compare
No clang-format... 8^( EDIT: Sorted..! 🙏 |
@vrozenfe @benyamin-codez It is important we fix this regession, however, given various other things I'm working on at the mo, I'm afraid I don't have time to babysit this particular PR. Can I leave it to one of you to run with this? |
I think you've done all that you can. I'm sure Yan and Vadim will review ASAP. The fact multiple HCK-CI checks have run and passed (the failures are known issues with the HCK) is perhaps indicative that visibility of their review and (hopefully) approval and merging will occur soon. Can you confirm the regression broke crash dumping...? If so, it's likely of high importance this is merged ASAP. For the reviewers, perhaps it is prudent to briefly revisit:
After further consideration, I should have said, primarily because I didn't split the commits enough to do a proper migration. The performance improvement and the other reasons mentioned above are still valid, but this is the real reason we couldn't easily just revert, or revert without consequence. I'll try to do better with that in the future. I also deferred these issues to a subsequent, speculative, future PR (to be raised after PR #1214 had merged), which in hindsight, was ill-advised. It would have been better to flesh out the remaining issue, i.e. how can
In any case, IMHO I would suggest merging this PR is the superior solution to the problem. For the benefit of future readers, I am also minded to share a couple of observations:
Anyway, thank you again Martin for your work on this. |
@MartinCHarvey-Nutanix @benyamin-codez Thanks a lot for your work. I am going to review the PR in next couple of days. It is also important that Vadim will review it as well. |
This corrects interrupt/spinlock acquision code regressed previously. The original code previously skipped lock acquision if invoked directly from the ISR. The regression then made it acquire an interrupt spinlock.
This PR restores the original behaviour.
Subsequent PR's also needed to deal with:
Ref: Nutanix ENG-741981.