Regular system freezing (independently of whether a VM is running or not) #6918

Closed
q4747 opened this issue Sep 25, 2021 · 1 comment
Labels
C: other hardware support P: default Priority: default. Default priority for new issues, to be replaced given sufficient information.

Comments

@q4747

q4747 commented Sep 25, 2021

Qubes OS release

4.0

Brief summary

I experience regular freezes. They usually happen when I leave the PC idle for a while (about 1 hour) without interacting with it.
But I have also experienced freezes while working, during which I couldn't move the mouse and neither the USB keyboard nor the PS/2 keyboard responded.
Sometimes the clock in XFCE keeps running correctly; sometimes it freezes as well.
I experienced the freezes both with the USB qube and without it (= USB devices in Dom0).
When a freeze occurs, I can only force a power-off (by holding down the PC power button). No keyboard combination seems to work, and the Num Lock LED on the keyboard no longer responds (not even on the PS/2 keyboard).
I did not experience any freezes during the installation.

Steps to reproduce

I let the system run for a bit more than an hour with no interaction and no VMs running (except for Dom0).

Expected behavior

The system should continue running without a freeze.

Actual behavior

The system freezes.

Hardware

  • Mainboard: AMD B550
  • CPU: Ryzen 7 5800X
  • Graphics Card: GeForce GTX 1060
  • RAM: 2*16 GB, 3600 MHz (with overclocking currently disabled)
  • SSD: Samsung 870 EVO

What I tried so far

  • I searched for similar freezing tickets but couldn't find any that match my situation (hopefully I didn't miss anything).
  • Switching off all display power management settings in the Xfce Power Manager: This didn't reliably switch everything off (the system is still locked after a while), but as long as the freeze hasn't happened yet, I can still unlock the system without problems. So I didn't observe a clear correlation between power management kicking in and the system freeze.
  • Upgrading to the latest Dom0 kernel (5.13.6-1): No change.
  • Reinstalling the system from scratch with BTRFS instead of LVM-thin: This solved my other issue with very slow shutdowns of large VMs (see Shutting down domain takes a long time #5426) but didn't change the freezing problem described above.
  • Trying various usage patterns (only running certain VMs, only using the PS/2 keyboard, removing the network cable, ...): No pattern identified; the freeze always happens after some time.
  • I tried to find something in the logs, but nothing seems to show up at the time the freeze happens. Attached below is a log example of a freezing instance where I turned on the PC, shut down all the VMs (leaving only Dom0 running) and then did nothing except check every once in a while whether the keyboard's Num Lock LED could still be toggled. At 17:50 the LED still worked; at 18:08 it no longer did (= I couldn't switch it on/off). So the freeze occurred between 17:50 and 18:08.
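
The manual Num Lock check above narrows the freeze to an 18-minute window. A tighter bound could come from a small heartbeat script, sketched here under assumptions: the file path, the 10-second interval, and the function names are all hypothetical and not part of the original report. It appends a timestamp to a file and forces it to disk, so after a forced power-off the last line approximates the last moment the system was alive. If the storage device itself stalls, the final write may never land, so the last timestamp is only a lower bound.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, now=None):
    """Append one ISO timestamp line to `path` and force it to disk."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        # fsync so the line is not lost in the page cache on a hard power-off.
        os.fsync(f.fileno())
    return stamp


def last_heartbeat(path):
    """Return the last recorded timestamp, or None if nothing was written."""
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    return datetime.fromisoformat(lines[-1]) if lines else None


def run(path, interval_s=10):
    """Write a heartbeat every `interval_s` seconds until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_s)
```

One possible use: start `run("/var/tmp/heartbeat.log")` (a hypothetical path) in a Dom0 terminal before leaving the machine idle, then call `last_heartbeat` on the same path after the forced reboot.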

Output of journalctl -b -2 -p 0..4 -x (which doesn't seem to show anything related to my issue, since all errors were logged before the actual freeze):

Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 0, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 0, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 0, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 0, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 0, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 0 spinlock event irq 57
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 1, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 1, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 1, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 1, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 1, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 1 spinlock event irq 67
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 2, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 2, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 2, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 2, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 2, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 2 spinlock event irq 73
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 3, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 3, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 3, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 3, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 3, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 3 spinlock event irq 79
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 4, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 4, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 4, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 4, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 4, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 4 spinlock event irq 85
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 5, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 5, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 5, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 5, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 5, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 5 spinlock event irq 91
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 6, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 6, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 6, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 6, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 6, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 6 spinlock event irq 97
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 7, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 7, failed to setup threshold interrupt for bank 0, block 0 (MSRC0002003=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 7, try to use APIC510 (LVT offset 1) for vector 0xf9, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: mce: [Firmware Bug]: cpu 7, failed to setup threshold interrupt for bank 1, block 0 (MSRC0002013=0xd010000000000000)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: cpu 7, try to use APIC520 (LVT offset 2) for vector 0xf4, but the register is already in use for vector 0x0 on this cpu
Sep 23 16:09:25 dom0 kernel: cpu 7 spinlock event irq 103
Sep 23 16:09:25 dom0 kernel: Grant table initialized
Sep 23 16:09:25 dom0 kernel: wait_for_initramfs() called before rootfs_initcalls
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 23 16:09:25 dom0 kernel: hpet_acpi_add: no address or irqs in _CRS
Sep 23 16:09:25 dom0 kernel: i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
Sep 23 16:09:25 dom0 kernel: ata7.00: supports DRM functions and may not be fully accessible
Sep 23 16:09:25 dom0 kernel: ata7.00: supports DRM functions and may not be fully accessible
Sep 23 16:09:25 dom0 kernel: pciback 0000:04:00.0: no suspend buffer for PTM
Sep 23 16:09:25 dom0 kernel: acpi PNP0C14:01: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 23 16:09:25 dom0 kernel: acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 23 16:09:25 dom0 kernel: acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 23 16:09:25 dom0 kernel: acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Sep 23 16:09:25 dom0 kernel: usb: port power management may be unreliable
Sep 23 16:09:25 dom0 systemd-vconsole-setup[489]: /usr/bin/setfont failed with error code 71.
Sep 23 16:09:25 dom0 systemd-udevd[417]: Process '/usr/lib/systemd/systemd-vconsole-setup' failed with exit code 1.
Sep 23 16:09:25 dom0 kernel: usb 3-4: config 1 has an invalid interface number: 2 but max is 1
Sep 23 16:09:25 dom0 kernel: usb 3-4: config 1 has no interface number 1
Sep 23 16:09:40 dom0 kernel: kauditd_printk_skb: 3 callbacks suppressed
Sep 23 16:09:40 dom0 kernel: printk: systemd: 18 output lines suppressed due to ratelimiting
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU1
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU3
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU5
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU7
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU9
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU11
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU13
Sep 23 16:09:41 dom0 kernel: xen_acpi_processor: (CX): Hypervisor error (-14) for ACPI CPU15
Sep 23 16:09:41 dom0 kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Sep 23 16:09:41 dom0 kernel: sp5100-tco sp5100-tco: Watchdog hardware is disabled
Sep 23 16:09:41 dom0 kernel: Bluetooth: hci0: MSFT filter_enable is already on
Sep 23 16:09:41 dom0 systemd-udevd[847]: could not read from '/sys/module/pcc_cpufreq/initstate': No such device
Sep 23 16:09:41 dom0 xenstored[1675]: Checking store ...
Sep 23 16:09:41 dom0 xenstored[1675]: Checking store complete.
Sep 23 16:09:42 dom0 xenstored[1675]: write rate limit: domain 0 is affected
Sep 23 16:09:43 dom0 kernel: pciback 0000:04:00.0: no suspend buffer for PTM
Sep 23 16:09:46 dom0 kernel: kauditd_printk_skb: 91 callbacks suppressed
Sep 23 16:09:56 dom0 kernel: nouveau 0000:08:00.0: gr: intr 00000040
Sep 23 16:10:03 dom0 kernel: kauditd_printk_skb: 20 callbacks suppressed
Sep 23 16:10:03 dom0 lightdm[2604]: Could not chown user data directory /var/lib/lightdm-data/q: Error creating directory /var/lib/lightdm-data/q: File exists
Sep 23 16:10:04 dom0 qui-updates[2814]: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
Sep 23 16:10:07 dom0 upowerd[1648]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:08.0/0000:05:00.1/usb1/1-6/1-6:1.2/0003:046D:C52B.0
Sep 23 16:10:30 dom0 kernel: pciback 0000:04:00.0: no suspend buffer for PTM
Sep 23 16:10:58 dom0 kernel: kauditd_printk_skb: 11 callbacks suppressed
Sep 23 16:14:02 dom0 xenstored[1675]: write rate limit: not in force recently
Sep 23 16:48:40 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): conversation failed
Sep 23 16:48:40 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): auth could not identify password for [q]
Sep 23 16:48:58 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): conversation failed
Sep 23 16:48:58 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): auth could not identify password for [q]
Sep 23 17:21:36 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): conversation failed
Sep 23 17:21:36 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): auth could not identify password for [q]
Sep 23 17:50:45 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): conversation failed
Sep 23 17:50:45 dom0 xscreensaver[2786]: pam_unix(xscreensaver:auth): auth could not identify password for [q]
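
To check where the log stops relative to the freeze window, the output above can be summarised with a short script. This is a minimal sketch, not a Qubes tool: the function names are hypothetical, the year is assumed to be 2021 because the short journalctl format omits it, and month parsing assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as "Sep 23 16:09:25 dom0 kernel: Grant table initialized".
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) ([^:]+): (.*)$")


def parse_line(line, year=2021):
    """Return (timestamp, source, message), or None for non-log lines."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    stamp, _host, source, message = m.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, source, message


def last_entry(lines, year=2021):
    """Return the latest parsed entry, i.e. the last thing logged before the freeze."""
    entries = [e for e in (parse_line(l, year) for l in lines) if e]
    return max(entries, key=lambda e: e[0]) if entries else None


def count_by_source(lines, year=2021):
    """Count entries per source, with the "[pid]" suffix stripped."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line, year)
        if entry:
            counts[re.sub(r"\[\d+\]$", "", entry[1])] += 1
    return counts
```

Fed the log above, `last_entry` would return the 17:50:45 xscreensaver line, which falls right at the start of the 17:50 to 18:08 window, with nothing logged later in that window.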

Update: Updated the freezing description and added hardware descriptions.

q4747 added the P: default and T: bug labels Sep 25, 2021
andrewdavidwong added the C: other hardware support and needs diagnosis labels Sep 25, 2021
andrewdavidwong added this to the Release 4.0 updates milestone Sep 25, 2021
@q4747
Author

q4747 commented Oct 4, 2021

Update: After much more testing, I have strong evidence that it's a problem with the SSD.

I have now noticed that my Qubes 4.0 installation on another SSD doesn't seem to be affected by those freezes (at least I didn't get one all day). Both setups are basically identical (BTRFS file system; running on exactly the same PC; …). The only difference seems to be the type of SSD; I'm not aware of anything else that differs.

With this SSD I’m experiencing the freezes: Samsung 870 EVO 4TB
With this SSD I only experienced one freeze so far: Samsung 860 EVO 2TB (the one freeze was after having made several changes and upgrades and the system seemed to be stable after the next reboot and didn't freeze again for a long time)

I'm closing this issue, as I'm currently assuming that it's some kind of SSD issue (either a hardware defect or some kind of SSD controller upgrade that would be required).

This topic is further discussed in this thread.

q4747 closed this as completed Oct 4, 2021
andrewdavidwong removed the needs diagnosis label Oct 5, 2021