grml-hwinfo: Include some more error outputs #12

Open
jkirk opened this issue Jun 8, 2024 · 3 comments

jkirk commented Jun 8, 2024

Some more error output is missing:

  root@grml ~ # DISPLAY=:0.0 grml-hwinfo
  grml-hwinfo 0.17.1 - collect hardware information
  Output file:      /root/grml-hwinfo-2024-06-08--18-21-24-UTC.tar.bz2

  This might take a few seconds/minutes. Please be patient...
  pcilib: sysfs_read_vpd: read failed: No such device
  Starting sysdump...
    NOTE: if it seems to be hanging at this stage file a bug report with output of:
          lsof -p $(pgrep -f $(which sysdump))
  Execution of sysdump finished.
  Error: /dev/sda: unrecognised disk label
  MODE SENSE(10): Malformed SCSI command

  root@grml ~ # lspci -vvnn > /dev/null
  pcilib: sysfs_read_vpd: read failed: No such device
  root@grml ~ # parted -s /dev/sda print > /dev/null
  Error: /dev/sda: unrecognised disk label
  1 root@grml ~ # sdparm --all --long /dev/sdb > /dev/null
  MODE SENSE(10): Malformed SCSI command
  97 root@grml ~ #   

I was using an older Grml daily, so the sdparm problem is already fixed in #10. We should still capture the error output of lspci and parted, and most probably of some other tools as well.
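To find out which of the "other tools" are affected, each command can be run with stdout discarded and stderr captured, as in the manual checks above. The following is only a minimal sketch: `check_stderr` is a hypothetical helper name and is not part of grml-hwinfo.

```shell
# Hypothetical helper (not part of grml-hwinfo): run a command, discard its
# stdout, and report the command line together with anything it wrote to stderr.
check_stderr() {
  # Inside $(...), "2>&1" points stderr at the capture pipe first;
  # ">/dev/null" then sends stdout away, so only stderr is captured.
  err=$("$@" 2>&1 >/dev/null)
  if [ -n "$err" ]; then
    printf '%s:\n%s\n' "$*" "$err"
  fi
}

# Example usage, with the commands from the transcript above:
#   check_stderr lspci -vvnn
#   check_stderr parted -s /dev/sda print
```

A command that stays silent on stderr produces no output at all, so a loop over all tools only lists the ones that still need their error output collected.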

I also think that putting the error output in a separate file is problematic, as one can not see where the error actually occurs.
On the other hand, look at this: when stdout and stderr are merged through a pipe, the message pcilib: sysfs_read_vpd: read failed: No such device ends up "somewhere else", in the middle of an unrelated line:

root@grml ~ # lspci -vvnn 2>&1 | grep -C 10 pcilib 
	Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
		DevCtl:	CorrErr+ NonFatalErr+ Fatapcilib: sysfs_read_vpd: read failed: No such device
lErr+ UnsupReq+
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

root@grml ~ # lspci -vvnn 
[...]
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
		Not readable
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 62-04-00-00-68-4c-e0-00
	Kernel driver in use: r8169
	Kernel modules: r8169

FTR, pcilib: sysfs_read_vpd: read failed: No such device is not an "actual" error; there is simply no VPD EEPROM present: https://bugzilla.kernel.org/show_bug.cgi?id=199467

(But this is a bug and this isn't handled well in pcilib.)

mika added a commit that referenced this issue Aug 2, 2024
Otherwise executing `parted -s /dev/nvme0n1 print` might throw something
like this for empty disks:

  Error: /dev/nvme0n1: unrecognised disk label

Related to #12
mika added a commit that referenced this issue Aug 2, 2024
Otherwise executing lspci might spill something like:

  # lspci -vvnn > /dev/null
  pcilib: sysfs_read_vpd: read failed: No such device

Which according to https://bugzilla.kernel.org/show_bug.cgi?id=199467
isn't an "actual" error, but there is just no VPD EEPROM present.
But as long as it shows up on stderr, it behaves like an error, so
let's also treat it like an error. :)

Related to #12
Thanks: Darshaka Pathirana for the bug report

mika commented Aug 2, 2024

So I also stumbled upon the parted issue on my own and took care of this, see commit 8293591

The sdparm issue was already taken care of in 5f91136 AKA #11

The lspci issue is interesting, though I don't agree with https://bugzilla.kernel.org/show_bug.cgi?id=199467#c6, quoting:

> It's neither a bug nor an actual error.
> The message simply means that the optional VPD EEPROM isn't present.
> The ticket should be closed.

Either it is an error and gets reported to stderr, or it is not and should not show up there. ;)

Instead I fully agree with:

> (But this is a bug and this isn't handled well in pcilib.)

So for the time being let's also report lspci's stderr to a separate file, as we tend to do, done in commit bbfd3b1

But I agree also with @jkirk's:

> I also think that putting the error output in a separate file is problematic,
> as one can not see where the error actually occurs.

This needs a further redesign of how grml-hwinfo works, though, so maybe let's discuss it before closing this issue?


jkirk commented Aug 6, 2024

Quick idea: What about a third(?) "full" log file for every output where we put 'stdout' and 'stderr' in one file?


mika commented Aug 6, 2024

> Quick idea: What about a third(?) "full" log file for every output where we put 'stdout' and 'stderr' in one file?

Sorry, I don't understand your idea or what exactly that should look like 🤔
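One possible reading of the "full" log idea, as a rough sketch only: every command is run once, and its output lands in three files, one with stdout, one with stderr, and one with both. The helper name `run_logged` and the `.error` / `.full` suffixes are assumptions made for illustration; they are not taken from grml-hwinfo.

```shell
# Hypothetical helper (not part of grml-hwinfo).
# Usage: run_logged NAME COMMAND [ARGS...]
#   NAME        receives stdout only
#   NAME.error  receives stderr only
#   NAME.full   receives both streams
run_logged() {
  name=$1
  shift
  {
    # Inner pipeline: the command's stdout goes through the first tee,
    # which writes NAME and also copies everything to fd 3 (NAME.full).
    # "2>&1" on the group sends the command's stderr into the outer pipe,
    # where the second tee writes NAME.error and copies to fd 3 as well.
    { "$@" 3>&- | tee "$name" >&3; } 2>&1 | tee "${name}.error" >&3
  } 3>"${name}.full"
  # Note: the return status is that of the last tee, not of the command.
}

# Example usage:
#   run_logged ./lspci lspci -vvnn
#   run_logged ./parted_sda parted -s /dev/sda print
```

The order of lines in the `.full` file is only approximate. Both streams pass through pipes, and a program that uses C stdio typically buffers its stdout when it is not writing to a terminal, which is the same effect that puts the pcilib message "somewhere else" in the `lspci -vvnn 2>&1 | grep` output above. Prefixing the command with `stdbuf -oL` (GNU coreutils) may reduce this for such programs, but that is untested here.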
