PCIe Bus Errors with Realtek 8822BE when ASPM Enabled #296

Open
besworks opened this issue Jan 23, 2025 · 6 comments

@besworks

Description of the issue:

I am experiencing PCIe Bus Errors with a Realtek 8822BE Wi-Fi card on my system when ASPM (Active State Power Management) is enabled. Disabling ASPM globally resolves the issue, but lspci still reports ASPM as enabled for the device, which seems inconsistent.

Hardware and system configuration:

  • Wi-Fi Card: Realtek 8822BE
  • Kernel Version: 6.12.10 (Arch Linux)
  • PCI Device ID: 10ec:b822
  • Platform: Lenovo ThinkPad E580

Steps to reproduce the issue:

  1. Use the built-in rtw88 driver or the rtw_8822be driver from this repo.
  2. Boot with ASPM enabled (default behavior).
  3. Monitor dmesg logs for PCIe Bus Errors.

Observed behavior:

The following errors are observed repeatedly in the kernel log:

Jan 22 16:29:44 JET kernel: pcieport 0000:00:1d.2: AER: Multiple Correctable error message received from 0000:05:00.0
Jan 22 16:29:44 JET kernel: rtw_8822be 0000:05:00.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
Jan 22 16:29:44 JET kernel: rtw_8822be 0000:05:00.0:   device [10ec:b822] error status/mask=00000001/00006000
Jan 22 16:29:44 JET kernel: rtw_8822be 0000:05:00.0:    [ 0] RxErr                  (First)
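For anyone reading the log above, the status/mask pair can be decoded by hand. Here is a minimal sketch, assuming the bit layout of the PCIe AER Correctable Error Status register and the short names the kernel's AER code prints (both written from memory, so treat the table as an assumption to check against the spec). The function names are made up for illustration.

```python
# Bit positions in the AER Correctable Error Status/Mask registers,
# labelled with the short names seen in dmesg (assumed, not verified here).
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(value):
    """Return the names of all bits set in a correctable status or mask word."""
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def parse_status_mask(line):
    """Extract (status, mask) as integers from an 'error status/mask=' log line."""
    field = line.rsplit("status/mask=", 1)[1].strip()
    status, mask = field.split("/")
    return int(status, 16), int(mask, 16)


line = ("rtw_8822be 0000:05:00.0:   device [10ec:b822] "
        "error status/mask=00000001/00006000")
status, mask = parse_status_mask(line)
print("reported:", decode_correctable(status))  # reported: ['RxErr']
print("masked:", decode_correctable(mask))      # masked: ['NonFatalErr', 'CorrIntErr']
```

So the only bit set in the status word is the receiver error that the `[ 0] RxErr` line already names, and the mask covers two unrelated error types.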

Disabling ASPM globally via the kernel parameter pcie_aspm=off prevents these errors from appearing, but running lspci -vvv -s 05:00.0 still reports LnkCtl: ASPM L0s L1 Enabled. This suggests that either ASPM is still enabled for the device or the lspci output is inaccurate.
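One way to cross-check the lspci text is to read the raw Link Control register, for example with setpci -s 05:00.0 CAP_EXP+10.w, and decode the two low bits, which hold the ASPM control field. The sketch below assumes that register layout; the sample values are hypothetical, not readings from this machine. If I read the kernel's command-line documentation correctly, pcie_aspm=off tells the kernel to leave whatever ASPM configuration the firmware set up untouched, which would at least be consistent with the register still showing ASPM enabled.

```python
# ASPM Control field: bits 1:0 of the PCIe Link Control register.
ASPM_STATES = {
    0b00: "Disabled",
    0b01: "L0s Enabled",
    0b10: "L1 Enabled",
    0b11: "L0s L1 Enabled",
}


def decode_lnkctl_aspm(lnkctl_hex):
    """Decode the ASPM field from a Link Control word given as a hex string."""
    value = int(lnkctl_hex, 16)
    return ASPM_STATES[value & 0b11]


# Hypothetical sample readings, for illustration only.
for sample in ("0143", "0140"):
    print(sample, "->", decode_lnkctl_aspm(sample))
```

If the decoded value agrees with lspci, the tool is reporting the register accurately and the question becomes who last wrote it.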

Driver options tested:

I tried setting the disable_aspm option in the rtw_pci driver, but this did not disable ASPM for the device. The errors persisted.

Expected behavior:

When ASPM is disabled via driver options or globally (pcie_aspm=off), lspci should not report ASPM as enabled for the device. Additionally, there should be no PCIe Bus Errors related to the device.

Additional notes:

This issue is critical because it causes frequent, brief disconnects that interrupt my workflow, and disabling ASPM globally is not a proper solution for a laptop. I would appreciate guidance on, or a fix for, properly disabling ASPM for the Realtek 8822BE device.

Thank you for your time and effort!

@dubhater
Collaborator

disable_aspm is a bit special. It's not enough to reboot after you change it. You must shut down the computer. Did you shut it down?

Did you use the correct module name? It's rtw88_pci for the built-in driver, rtw_pci for this repository.

@besworks
Author

I did do a full shutdown recently for a firmware update on my system after installing this driver. I'm pretty sure that I tried adding disable_aspm before this, but I'm not 100% certain of the sequence of events now. I am definitely using the correct driver.

/etc/modprobe.d $ lsmod|grep rtw
rtw_8822be             12288  0
rtw_8822b             233472  1 rtw_8822be
rtw_pci                40960  1 rtw_8822be
rtw_core              335872  2 rtw_8822b,rtw_pci
mac80211             1638400  2 rtw_core,rtw_pci
cfg80211             1396736  2 rtw_core,mac80211
/etc/modprobe.d $ cat 50-rtw_pci.conf 
options rtw_pci disable_aspm=y disable_msi=y
options rtw_core disable_lps_deep=y
#options rtw_core support_bf=y
#options rtw_core debug_mask=0
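To rule out the modprobe.d file simply not being picked up, the values the loaded modules actually received can be read back from sysfs. This is a minimal sketch, assuming the parameters are exported under /sys/module/<module>/parameters (which depends on how the driver declares them). The function name and the sysfs_root argument are made up for illustration.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys"):
    """Return {parameter: value} for a loaded module, or {} if none are exported."""
    param_dir = Path(sysfs_root) / "module" / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            params[entry.name] = None  # present but not readable by this user
    return params


if __name__ == "__main__":
    for module in ("rtw_pci", "rtw_core"):
        print(module, read_module_params(module))
```

Boolean parameters normally read back as Y or N, so disable_aspm showing N would point at the option never reaching the module rather than at the driver ignoring it.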

I can't shut down right at the moment, but I will try removing pcie_aspm=off from my kernel parameters and do a hard reboot later to be sure.

@besworks
Author

I can now 100% confirm that adding the option disable_aspm=y to the rtw_pci driver does not disable ASPM for this card.

I performed the following test:

  • Removed the pcie_aspm=off kernel parameter from my boot stanza.
  • Disconnected my external power source.
  • Rebooted into my firmware config and put the system into Repair Mode, which internally disconnects the battery until the next time the charger is connected.
  • Powered off the system for several minutes then plugged it in and started it back up.

After performing these steps, lspci -vvv -s 05:00.0 still reports LnkCtl: ASPM L0s L1 Enabled, and the PCIe Bus Errors have reappeared in my logs.
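A per-device route that might be worth trying, independent of the driver option, is the link/ attribute directory that recent kernels expose for each PCIe device (l0s_aspm, l1_aspm and friends). The sketch below assumes those attributes exist on this kernel, that ASPM is under kernel control (so not booted with pcie_aspm=off), and that the script runs as root when writing. Whether it actually stops the errors on this card is untested. The function names and the sysfs_root argument are made up for illustration.

```python
from pathlib import Path


def link_dir(bdf, sysfs_root="/sys"):
    """Path of the per-device ASPM attribute directory for a PCI address."""
    return Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "link"


def read_link_states(bdf, sysfs_root="/sys"):
    """Return {attribute: value} for the device's link states, or {} if absent."""
    directory = link_dir(bdf, sysfs_root)
    if not directory.is_dir():
        return {}
    return {p.name: p.read_text().strip()
            for p in sorted(directory.iterdir()) if p.is_file()}


def disable_link_states(bdf, names=("l0s_aspm", "l1_aspm"), sysfs_root="/sys"):
    """Write 0 to each named attribute that exists; return the ones changed."""
    changed = []
    for name in names:
        attr = link_dir(bdf, sysfs_root) / name
        if attr.is_file():
            attr.write_text("0")
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Read-only by default; call disable_link_states("0000:05:00.0") as root to change.
    print(read_link_states("0000:05:00.0"))
```

A change made this way does not survive a reboot, so it would need a udev rule or a boot-time unit if it turns out to help.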

@briansune

@besworks

There's a good chance I'm wrong, but:

"internally disconnects the battery until the next time the charger is connected."

Would this also clear the CMOS if the coin-cell battery is getting old, resetting the BIOS back to its default settings?

Just curious; I may have been wrong from the start.

@besworks
Author

Will this also clear the CMOS...?

It's UEFI, not BIOS. The settings are stored in NVRAM and are not erased when power is lost.

@briansune

@besworks

Okay, now I understand. Sorry, I'm an old man, still stuck in the 90s.
