Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in createTimeWithContext when run on FreeBSD arm64 #1636

Open
sdalu opened this issue May 2, 2024 · 18 comments
Open

Crash in createTimeWithContext when run on FreeBSD arm64 #1636

sdalu opened this issue May 2, 2024 · 18 comments

Comments

@sdalu
Copy link

sdalu commented May 2, 2024

Describe the bug
When running telegraph 1.31.0~553d972c, with gopsutil v3.24.4, it crash in the createTimeWithContext.

2024-05-02T13:09:40Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 147 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d410370)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x67aa400, 0xc587b20})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process_freebsd.go:121 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8232a44, 0xc9983c0}, 0x3744)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x3744)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:209 +0x848
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:166 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d410370, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:227 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 120
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x4d310b68)
	/usr/local/go/src/runtime/sema.go:62 +0x3c
sync.(*WaitGroup).Wait(0
2024-05-02T13:09:40Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

Issue initially reported: influxdata/telegraf#13933

Environment (please complete the following information):
FreBSD:

14.0-RELEASE-p4
14.0-RELEASE-p4
14.0-RELEASE-p4

FreeBSD brain.home.sdalu.com 14.0-RELEASE-p4 FreeBSD 14.0-RELEASE-p4 #0 releng/14.0-n265400-4edf3b80733e: Wed Jan 17 19:32:19 CET 2024 [email protected]:/usr/obj/usr/src/arm64.aarch64/sys/BROADCOM arm64

@shirou
Copy link
Owner

shirou commented May 3, 2024

I can pass go122 test -run Test_Process_CreateTime on my QEMU environment. Any other information or could you provide a minimal reproducible code?

# go122 version
go version go1.22.1 freebsd/arm64

# uname -a
FreeBSD generic 14.0-RELEASE FreeBSD 14.0-RELEASE #0 releng/14.0-n265380-f9716eee8ab4: Fri Nov 10 08:59:18 UTC 2023     [email protected]:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

@sdalu
Copy link
Author

sdalu commented May 3, 2024

I'm just testing with the artifact freebsd_armv7.tar.gz provided influxdata/telegraf#15272 . Hoping that @powersj could tell you more about the build.
The computer on which I tested is a Raspberry PI 4.

I'm not a go user, how do I run go test -run Test_Process_CreateTime ?

I tried:

git clone  https://github.com/shirou/gopsutil
cd gopsutil
go test -run Test_Process_CreateTime

#  result I got:
?   	github.com/shirou/gopsutil/v3	[no test files]

@powersj
Copy link
Contributor

powersj commented May 13, 2024

@sdalu,

Can you run go test -v -run Test_Process_CreateTime ./process/...

@shirou,

In the original issue, I had the reporter use the following code. You can see his response for the output.

@sdalu
Copy link
Author

sdalu commented May 13, 2024

root@brain:~/gopsutil # go test -v -run Test_Process_CreateTime ./process/...
go: downloading golang.org/x/sys v0.20.0
go: downloading github.com/stretchr/testify v1.9.0
=== RUN   Test_Process_CreateTime
--- PASS: Test_Process_CreateTime (0.00s)
PASS
ok  	github.com/shirou/gopsutil/v3/process	0.016s

@shirou
Copy link
Owner

shirou commented May 14, 2024

Thank you for the information. This means that, at least in the test environment, the CreateTime() is being retrieved without any issues. In other words, it is not a simple implementation mistake in the code or a problem with the struct definition. Do you have any other potential issues in mind?

@powersj
Copy link
Contributor

powersj commented May 14, 2024

@shirou,

Well gopsutil should not panic :)

In the original issue, I had sdalu, first get a process using the scripts own PID:

myself, err := process.NewProcess(int32(currentPid))

This returned nothing, yet the test does.

@sdalu,

Can you run the following go code:

go.mod:

module test-process

go 1.22

require github.com/shirou/gopsutil/v3 v3.24.4

require (
	github.com/go-ole/go-ole v1.2.6 // indirect
	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
	github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
	github.com/shoenig/go-m1cpu v0.1.6 // indirect
	github.com/tklauser/go-sysconf v0.3.14 // indirect
	github.com/tklauser/numcpus v0.8.0 // indirect
	github.com/yusufpapurcu/wmi v1.2.4 // indirect
	golang.org/x/sys v0.20.0 // indirect
)

main.go:

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/v3/process"
)

func main() {
	currentPid := os.Getpid()
	myself, err := process.NewProcess(int32(currentPid))
	if err != nil {
		panic(err)
	}
	fmt.Println(myself.Name())
	fmt.Println(myself.String())
	fmt.Println(myself.NumThreads())
	fmt.Println(myself.RlimitUsage(true))
	fmt.Println(myself.Status())

	time, err := myself.CreateTime()
	if err != nil {
		panic(err)
	}
	fmt.Println(time)
}

And either run this directly via go run . or build it go build . and run the test-process binary.

@sdalu
Copy link
Author

sdalu commented May 14, 2024

I'm using go 1.21

root@brain:~/t # go run .
test-process <nil>
{"pid":38018}
6 <nil>
[] not implemented yet
[running] <nil>
1715697644691

@powersj
Copy link
Contributor

powersj commented May 14, 2024

Well that is not what I was expecting. Just to confirm, this was run on your FreeBSD system that was showing the issue with Telegraf?

@sdalu
Copy link
Author

sdalu commented May 14, 2024

Yes. I re-run the telegraf artifact, just in case. It's crashing with the same panic. Should I try to build it instead of runing the artifact ?

@powersj
Copy link
Contributor

powersj commented May 14, 2024

@sdalu I have put up influxdata/telegraf#15355 which will hopefully print out what the result of getKProc call and the pid we are looking at. In 20-30mins can you download another Telegraf artifact and see what you get?

Thanks!

@sdalu
Copy link
Author

sdalu commented May 14, 2024

@powersj Here is the output:

root@brain:~/t/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-14T20:07:16Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-14T20:07:16Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-14T20:07:16Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-14T20:07:16Z I! Starting Telegraf 1.31.0-74404b62 brought to you by InfluxData the makers of InfluxDB
2024-05-14T20:07:16Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 4 secret-stores
2024-05-14T20:07:16Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-14T20:07:16Z I! Loaded aggregators: 
2024-05-14T20:07:16Z I! Loaded processors: converter override (4x) regex
2024-05-14T20:07:16Z I! Loaded secretstores: 
2024-05-14T20:07:16Z I! Loaded outputs: influxdb_v2
2024-05-14T20:07:16Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-14T20:07:16Z W! Deprecated inputs: 0 and 1 options
2024-05-14T20:07:16Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-14T20:07:16Z D! [agent] Initializing plugins
2024-05-14T20:07:16Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-14T20:07:16Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
2024-05-14T20:07:16Z D! [agent] Connecting outputs
2024-05-14T20:07:16Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-14T20:07:16Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-14T20:07:16Z D! [agent] Starting service inputs
getting process with pid:  40544
k: (*process.KinfoProc)(nil)
2024-05-14T20:07:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 149 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d638140)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x696a390, 0xc88a150})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process_freebsd.go:123 +0xac
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:312 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x843a4c4, 0xcc9ccd8}, 0x9e60)
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:220 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(0x9e60)
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:205 +0x90
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x9e60)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x1c
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d36cf78, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d36cf78, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d638140, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 146
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine
2024-05-14T20:07:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

@powersj
Copy link
Contributor

powersj commented May 14, 2024

Thanks for trying that out!

getting process with pid: 40544
k: (*process.KinfoProc)(nil)

@shirou any reason why the kinfoproc would be nil? Does that mean gopsutil failed to parse the information but didn't generate an error?

That print statement is from my own fork, printing out the k value before we attempt to create time with context:

https://github.com/powersj/gopsutil/blob/919283ff5f489d3697e01c5267ccdd0a9273cd41/process/process_freebsd.go#L122

@shirou
Copy link
Owner

shirou commented May 23, 2024

I read the code and it doesn't seem like err is nil and kInfoProc is nil at the same time. However, this issue is occurring in practice. Therefore, I have created a PR #1653 to add a nil check. Could you please take a look?

@powersj
Copy link
Contributor

powersj commented May 23, 2024

@sdalu,

As before, can you grab the telegraf artifacts in few minutes from this post at: influxdata/telegraf#15397

@shirou - thanks for looking into this!

@sdalu
Copy link
Author

sdalu commented May 23, 2024

@powersj here is the result (looks the same):

root@brain:~/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-23T19:41:13Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-23T19:41:13Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-23T19:41:13Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-23T19:41:13Z I! Starting Telegraf 1.31.0-ef79d2b9 brought to you by InfluxData the makers of InfluxDB
2024-05-23T19:41:13Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 4 secret-stores
2024-05-23T19:41:13Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-23T19:41:13Z I! Loaded aggregators: 
2024-05-23T19:41:13Z I! Loaded processors: converter override (4x) regex
2024-05-23T19:41:13Z I! Loaded secretstores: 
2024-05-23T19:41:13Z I! Loaded outputs: influxdb_v2
2024-05-23T19:41:13Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-23T19:41:13Z W! Deprecated inputs: 0 and 1 options
2024-05-23T19:41:13Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-23T19:41:13Z D! [agent] Initializing plugins
2024-05-23T19:41:13Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-23T19:41:13Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead for protocol stats
2024-05-23T19:41:13Z D! [agent] Connecting outputs
2024-05-23T19:41:13Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-23T19:41:13Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-23T19:41:13Z D! [agent] Starting service inputs
2024-05-23T19:41:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 61 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d36c5a0)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x68b6d10, 0xc708b80})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process_freebsd.go:122 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8361124, 0xcb1af28}, 0x624)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x624)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d29e8f8, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d29e8f8, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d36c5a0, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 112
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [
2024-05-23T19:41:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

I downloaded: telegraf-1.31.0~ef79d2b9_freebsd_armv7.tar.gz

@powersj
Copy link
Contributor

powersj commented May 23, 2024

That's the same line, you originally saw:

return int64(k.Start.Sec)*1000 + int64(k.Start.Usec)/1000, nil

Ah, @shirou, the nil check is checking if the memory address is nil, so that will never be nil.

- &x == nil
+ x == nil

@shirou
Copy link
Owner

shirou commented May 27, 2024

The parseKinfoProc function returns a struct instance, not a pointer. Therefore, I agree that it should be impossible for a pointer to this value to be nil. However, it is actually happening, so I am very confused.

@powersj
Copy link
Contributor

powersj commented May 29, 2024

@shirou - can that be changed? Otherwise you would need to check if any of the pointer fields are nil?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants