Flush stor #486

Merged: 4 commits, Oct 13, 2024


kostyanf14
Contributor

No description provided.

The flush test requires sending UDP packets to a PDU device.
We will emulate this device on the host, so we need a connection
between client VMs and the host.

Attaching br_world to the client is undesirable because the
client could then download updates/drivers/applications from
the Internet and waste CPU/RAM resources.

Signed-off-by: Kostiantyn Kostiuk <[email protected]>

akihikodaki left a comment


https://oidref.com/1.3.6.1.4.1.318.1.1.12.3.3.1.1.4 says the OID also has a delayedReboot value, so you may want to check whether that value is used. I don't require it, as Microsoft's documentation does not mention it.

The flush test sends an SNMP request to the PDU to reset the
power of the physical PC. As the flush test sends only one
type of request, we don't need to parse it; we can just wait
for any UDP packet and perform a hard reset of the VM.
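A minimal Python sketch of the approach this commit message describes, under stated assumptions: every UDP datagram is treated as a reboot request without parsing, and the reset is issued as a QMP `system_reset` over a TCP QMP socket. The function names, port numbers and QMP endpoint (for example a VM started with `-qmp tcp:127.0.0.1:4444,server=on,wait=off`) are hypothetical; this is not AutoHCK's actual implementation.

```python
import json
import socket
from typing import BinaryIO, Callable, Optional


def qmp_command(name: str) -> bytes:
    """Encode a QMP command as one newline-terminated JSON object."""
    return (json.dumps({"execute": name}) + "\n").encode()


def _read_reply(stream: BinaryIO) -> dict:
    """Skip asynchronous QMP events until a "return" or "error" reply arrives."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def qmp_system_reset(host: str, port: int) -> dict:
    """Hard-reset a VM via QMP (equivalent to pressing the reset button)."""
    with socket.create_connection((host, port)) as conn, conn.makefile("rwb") as stream:
        stream.readline()  # QMP greeting banner
        for cmd in ("qmp_capabilities", "system_reset"):
            stream.write(qmp_command(cmd))
            stream.flush()
            reply = _read_reply(stream)
            if "error" in reply:
                raise RuntimeError(reply["error"])
        return reply


def serve_reset_requests(sock: socket.socket, on_reset: Callable[[], None],
                         max_requests: Optional[int] = None) -> int:
    """Treat every UDP datagram as a reboot request; the payload is not parsed.

    Loops forever when max_requests is None.
    """
    handled = 0
    while max_requests is None or handled < max_requests:
        sock.recvfrom(65535)  # consume the datagram; its contents are ignored
        on_reset()
        handled += 1
    return handled
```

For example, `serve_reset_requests(sock, lambda: qmp_system_reset("127.0.0.1", 4444))`, on a UDP socket bound to the host side of the client network, would reset the VM once per received packet.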

Signed-off-by: Kostiantyn Kostiuk <[email protected]>
kostyanf14 changed the title from "RFC: Flush stor" to "Flush stor" on Oct 13, 2024
kostyanf14 merged commit 4d188c2 into HCK-CI:master on Oct 13, 2024
9 checks passed
kostyanf14 deleted the flush-stor branch on October 13, 2024 10:15
@benyamin-codez

@kostyanf14

Thanks for the reference.

I noticed the MS documentation lists the following values as required:

  • ImmediatePowerOn = 1
  • ImmediatePowerOff = 2
  • ImmediateReboot = 3

[sic] I presume these are typos, and they should actually be:

  • immediateOn = 1
  • immediateOff = 2
  • immediateReboot = 3

...from https://oidref.com/1.3.6.1.4.1.318.1.1.12.3.3.1.1.4 (the relevant OID):

rPDUOutletControlOutletCommand OBJECT-TYPE
   SYNTAX INTEGER {
      immediateOn             (1),
      immediateOff            (2),
      immediateReboot         (3),
      delayedOn               (4),
      delayedOff              (5),
      delayedReboot           (6),
      cancelPendingCommand    (7)
   }
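To make the raw integers seen in SNMP traffic easier to read, the enumeration above can be mirrored in code. This is a small Python sketch; the class and helper names are hypothetical, and treating delayedReboot as a reboot (as the review comment suggests checking) is an assumption, since an emulated PDU has no notion of a delay.

```python
from enum import IntEnum


class OutletCommand(IntEnum):
    """rPDUOutletControlOutletCommand values, copied from the MIB excerpt above."""
    immediateOn = 1
    immediateOff = 2
    immediateReboot = 3
    delayedOn = 4
    delayedOff = 5
    delayedReboot = 6
    cancelPendingCommand = 7


# Assumption: delayedReboot is handled like immediateReboot (no delay is emulated).
REBOOT_COMMANDS = frozenset({OutletCommand.immediateReboot, OutletCommand.delayedReboot})


def describe(value: int) -> str:
    """Map a raw integer from an SNMP request to its MIB name, or 'unknown(N)'."""
    try:
        return OutletCommand(value).name
    except ValueError:
        return f"unknown({value})"


def should_reboot(value: int) -> bool:
    """True if the written value asks the outlet to reboot."""
    return value in REBOOT_COMMANDS
```

With this, the Microsoft names map directly: `describe(1)` is `"immediateOn"` and `describe(3)` is `"immediateReboot"`.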

As the first two, immediateOn and immediateOff, are read-write (i.e. they can be both set and read), perhaps we perform the system reset via QMP not only when the test issues the command, but also when it queries the power state.

So when asked to check the power state, perhaps we are resetting instead... 8^O

This might explain the erratic results seen in virtio-win...
Perhaps we need to tee the socket data to an SNMP parser to check...
...but it might be quicker/easier to add the requisite features to the listener here... 8^d
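One way to add such a feature to the listener would be a tiny BER decoder that distinguishes SNMP GET from SET requests and extracts the integer written to the outlet command OID. This Python sketch assumes SNMPv1/v2c messages (no SNMPv3), handles only what a simple GET/SET needs, and uses hypothetical names; a real implementation might prefer an existing SNMP library.

```python
from __future__ import annotations

# Column OID of rPDUOutletControlOutletCommand; the outlet index follows it.
OUTLET_COMMAND_OID = "1.3.6.1.4.1.318.1.1.12.3.3.1.1.4"
PDU_TYPES = {0xA0: "get", 0xA1: "getnext", 0xA2: "response", 0xA3: "set"}


def _read_tlv(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Read one BER tag-length-value element; return (tag, value, next_pos)."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long-form length: low 7 bits = number of length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated BER element")
    return tag, value, pos + length


def _decode_oid(raw: bytes) -> str:
    arcs = [raw[0] // 40, raw[0] % 40]  # first byte packs the first two arcs
    acc = 0
    for b in raw[1:]:
        acc = (acc << 7) | (b & 0x7F)  # base-128; high bit means "more bytes follow"
        if not b & 0x80:
            arcs.append(acc)
            acc = 0
    return ".".join(map(str, arcs))


def parse_snmp_request(packet: bytes) -> tuple[str, list[tuple[str, int | None]]]:
    """Return (pdu_type, [(oid, int_value_or_None), ...]) for an SNMPv1/v2c message."""
    tag, msg, _ = _read_tlv(packet, 0)
    if tag != 0x30:
        raise ValueError("not an SNMP message")
    _, _version, pos = _read_tlv(msg, 0)
    _, _community, pos = _read_tlv(msg, pos)
    pdu_tag, pdu, _ = _read_tlv(msg, pos)
    pos = 0
    for _ in range(3):  # skip request-id, error-status, error-index
        _, _, pos = _read_tlv(pdu, pos)
    _, varbinds, _ = _read_tlv(pdu, pos)
    result = []
    pos = 0
    while pos < len(varbinds):
        _, varbind, pos = _read_tlv(varbinds, pos)
        _, oid_raw, vpos = _read_tlv(varbind, 0)
        vtag, vraw, _ = _read_tlv(varbind, vpos)
        # INTEGER values are decoded; NULL (as sent by GET) and others become None
        value = int.from_bytes(vraw, "big", signed=True) if vtag == 0x02 else None
        result.append((_decode_oid(oid_raw), value))
    return PDU_TYPES.get(pdu_tag, hex(pdu_tag)), result


def is_reboot_request(packet: bytes) -> bool:
    """True only for a SET of immediateReboot (3) on an outlet command OID."""
    try:
        pdu_type, varbinds = parse_snmp_request(packet)
    except (ValueError, IndexError):
        return False
    return pdu_type == "set" and any(
        oid.startswith(OUTLET_COMMAND_OID + ".") and value == 3
        for oid, value in varbinds
    )
```

With a filter like this, a GET of the power state would be ignored and only a SET of immediateReboot would trigger a reset.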

@kostyanf14
Contributor Author

@benyamin-codez

No, you are not correct. We analyze requests, and HLK always sends ImmediateReboot and nothing else.

@kostyanf14
Contributor Author

@benyamin-codez Also, if you open the HLKX results where the Flush test failed, you can see the error. It looks like this: System reset was requested but system not rebooted.

@benyamin-codez

@kostyanf14

Thanks for the prompt reply.

> No, you are not correct. We analyze requests, and HLK always sends ImmediateReboot and nothing else.

I saw the socket.recvfrom(0), with the zero-sized buffer, and presumed you weren't looking...

> Also, if you open HLKX results where Flush test failed, you can see the error. Error looks like this:
> System reset was requested but system not rebooted.

Oh, OK. I was checking WttEA.log instead (Failed: Flush Test.zip > Test Run > WttEA.log).
I've only seen 0x80070002 (ERROR_FILE_NOT_FOUND) errors in the dozen or so logs I've checked.

I'll grab the Studio and have a peek... 8^d
Can you tell I'm trying to avoid installing the whole HLK...? 8^D It seems like quite a beast...

@kostyanf14
Contributor Author

> I saw the socket.recvfrom(0), with the zero-sized buffer, and presumed you weren't looking...

During development, we checked this data and found that it can be ignored.

For example:
virtio-win/kvm-guest-drivers-windows#1304 - viostor-Win2022x64

The error is not related to the storage driver; it is something in the HLK.

@benyamin-codez

Yeah, that's what I see in the few I peeked into.
The actual test passes and then an error occurs during test cleanup.
Apparently it occurs whilst "Initializing and formatting the disk"...
So it doesn't appear to be related to this SNMP kit or reboots at all - at least in these cases.

Is it possible the disk is OFFLINE and so it cannot be initialised or formatted?
... If so, this is something we might need to address in the driver...
Is it perhaps something to do with the type of disk backing, i.e. COW or RAW?
This appears to be running (and failing) on non-bootable disks (as expected - except the failing part)...
I presume there's no AV or anything else interfering with anything in the boot block...

Quite annoying...! 8^(

@benyamin-codez

> System reset was requested but system not rebooted.

Do you perhaps see that in the QEMU logs...?

The 0x8007007E error (missing module), e.g.:
Caught std::exception: onecore\drivers\storage\storutil\disk.cpp(478)\Storage.Tests.FlushTest.dll!...
... would seem to indicate part of the HLK is missing or misconfigured.

Possibly the test is also wiping the disk where the module is expected...?!?!
As in, format is called and wipes the disk where Storage.Tests.FlushTest.dll resides...?
e.g. C:\HLK\JobsWorkingDir\Tasks\WTTJobRunDCAC4D33-DFF4-EF11-BD88-56000200DDDD\Storage.Tests.FlushTest.dll

The 0x80070002 errors are usually related to the trace file being missing (that could also be a hint):
e.g. C:\HLK\JobsWorkingDir\Tasks\WTTJobRunDCAC4D33-DFF4-EF11-BD88-56000200DDDD\Te.wtl

Is the storage::FlushFUATest::TestCleanup task misconfigured with, or hard-coded to, C:\ or \\?\ or \Device.....?!?

@kostyanf14
Contributor Author

You can configure only the following options:

  • IP: IP address of the Remote PDU
  • OID: OID of the Remote PDU outlet
  • Outlet: port of the Remote PDU outlet
  • Community: community of the Remote PDU (e.g. private)
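A hypothetical Python representation of these four settings, for example to validate them before a run. The field names are assumptions, and so is building the instance OID as `<OID>.<Outlet>`: that follows the APC MIB, where rPDUOutletControlOutletCommand is a table column indexed by outlet number, but it is not confirmed how the HLK combines the two values.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RemotePduConfig:
    """The four Remote PDU options the HLK exposes (field names are hypothetical)."""
    ip: str         # IP address of the Remote PDU (here: the host running the emulator)
    oid: str        # OID of the Remote PDU outlet
    outlet: int     # port (outlet) number on the Remote PDU
    community: str  # SNMP community string, e.g. "private"

    def __post_init__(self) -> None:
        ipaddress.ip_address(self.ip)  # raises ValueError for a malformed address

    def instance_oid(self) -> str:
        """Assumed instance OID: table column OID followed by the outlet index."""
        return f"{self.oid}.{self.outlet}"
```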

All other settings are configured automatically by the HLK (not by AutoHCK).

@kostyanf14
Contributor Author

@benyamin-codez Let's continue in discussion thread #608. This PR is harder to find, and commenting here creates extra mail noise for other people.

4 participants