Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid nil values when unwrapping errors #1677

Conversation

robinverduijn
Copy link
Contributor

We sometimes see Buildkite fail to add annotations to a build and when this happens, it fails with the following error:

{"log.level":"error","ecs.version":"1.6.0","gobld.version":"7.25.2","gobld.gitCommit":"8eae7d1","error.message":"Message: An error occurred while executing your GraphQL query. Please contact [email protected] for help and provide this query in the email., Locations: [], Extensions: map[], Path: []","error":[{"message":"An error occurred while executing your GraphQL query. Please contact [email protected] for help and provide this query in the email.","extensions":null,"locations":null,"path":null}],"@timestamp":"2024-11-22T20:25:12.306Z","log.origin":{"file.name":"/go/src/gobld/buildkite/buildkite.go","file.line":157},"trace.id":"d9ab02ae11e462233dfbcdb189fc043a","transaction.id":"d9ab02ae11e46223","trace.id":"325cf6b488e72c6e7457be8ecaf46eca","transaction.id":"325cf6b488e72c6e","span.id":"af9de4294dd6daf0","message":"Failed to annotate build"}
panic: reflect: call of reflect.Value.Type on zero Value

goroutine 319 [running]:
reflect.Value.typeSlow({0x0?, 0x0?, 0x40eb9f?})
	/usr/local/go/src/reflect/value.go:2699 +0x113
reflect.Value.Type(...)
	/usr/local/go/src/reflect/value.go:2694
go.elastic.co/apm/v2.(*exceptionDataBuilder).init(0xc0010eef50, 0xc004a800a0, {0x0, 0x0})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:414 +0xf6
go.elastic.co/apm/v2.(*exceptionDataBuilder).init(0xc0010eef50, 0xc00af0a020, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:499 +0x8e5
go.elastic.co/apm/v2.initException(...)
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:402
go.elastic.co/apm/v2.(*Tracer).NewError(0x3461720?, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:103 +0xec
go.elastic.co/apm/v2.CaptureError({0x3461720, 0xc0008022d0}, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/gocontext.go:132 +0xed
http://github.com/elastic/gobld/buildkite.(*buildkite ).AnnotateBuild(0xc0006a0c20, {0x3461720?, 0xc00a55b080?}, {0xc0009d1440, 0x3c}, {0x2ce4a15, 0x11}, {0xc000969880, 0x370}, {0x2cce7a8, ...}, ...)
	/go/src/gobld/buildkite/buildkite.go:158 +0x585
http://github.com/elastic/gobld/creator.ProcessJob ({_, }, , {{0x343ee20, 0xc00a4ccb40}, {0xc00073e260, 0xd}, {0x3462308, 0xc0006a0c20}, {0xc000a14540, ...}, ...}, ...)
	/go/src/gobld/creator/creator.go:356 +0xfad
http://github.com/elastic/gobld/creator.ProcessJobs (, {, _}, {{0x343ee20, 0xc00a4ccb40}, {0xc00073e260, 0xd}, {0x3462308, 0xc0006a0c20}, {0xc000a14540, ...}, ...})
	/go/src/gobld/creator/creator.go:257 +0x3ae
created by http://github.com/elastic/gobld/creator.(*creator ).StartWorkers in goroutine 321
	/go/src/gobld/creator/creator.go:218 +0x151

One hunch of mine is that, somehow, the err passed in to init(e *exceptionData, err error) is nil and therefor, the reflectValue.Type() call on its value returns an error "Type on zero Value". If that supposition is correct, then changing the following block to confirm err isn't nil before passing it in would obviate the problem:

for _, err := range e.ErrorDetails.Cause {
  if b.errorCount >= maxErrorTreeNodes {
    break
  }
  var data exceptionData
  if b.init(&data, err) {
    e.cause = append(e.cause, data)
  }
}

However, I may well be (very) wrong about this so I'd welcome any feedback from people more knowledgeable around this code than I am. For one thing, I have not been able to run this modified code to confirm it has any effect on the above Buildkite crash.

We sometimes see Buildkite fail to add annotations to a build and when this happens, it fails with the following error:

```
{"log.level":"error","ecs.version":"1.6.0","gobld.version":"7.25.2","gobld.gitCommit":"8eae7d1","error.message":"Message: An error occurred while executing your GraphQL query. Please contact [email protected] for help and provide this query in the email., Locations: [], Extensions: map[], Path: []","error":[{"message":"An error occurred while executing your GraphQL query. Please contact [email protected] for help and provide this query in the email.","extensions":null,"locations":null,"path":null}],"@timestamp":"2024-11-22T20:25:12.306Z","log.origin":{"file.name":"/go/src/gobld/buildkite/buildkite.go","file.line":157},"trace.id":"d9ab02ae11e462233dfbcdb189fc043a","transaction.id":"d9ab02ae11e46223","trace.id":"325cf6b488e72c6e7457be8ecaf46eca","transaction.id":"325cf6b488e72c6e","span.id":"af9de4294dd6daf0","message":"Failed to annotate build"}
panic: reflect: call of reflect.Value.Type on zero Value

goroutine 319 [running]:
reflect.Value.typeSlow({0x0?, 0x0?, 0x40eb9f?})
	/usr/local/go/src/reflect/value.go:2699 +0x113
reflect.Value.Type(...)
	/usr/local/go/src/reflect/value.go:2694
go.elastic.co/apm/v2.(*exceptionDataBuilder).init(0xc0010eef50, 0xc004a800a0, {0x0, 0x0})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:414 +0xf6
go.elastic.co/apm/v2.(*exceptionDataBuilder).init(0xc0010eef50, 0xc00af0a020, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:499 +0x8e5
go.elastic.co/apm/v2.initException(...)
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:402
go.elastic.co/apm/v2.(*Tracer).NewError(0x3461720?, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/error.go:103 +0xec
go.elastic.co/apm/v2.CaptureError({0x3461720, 0xc0008022d0}, {0x343fcc0, 0xc001316588})
	/root/go/pkg/mod/go.elastic.co/apm/[email protected]/gocontext.go:132 +0xed
http://github.com/elastic/gobld/buildkite.(*buildkite ).AnnotateBuild(0xc0006a0c20, {0x3461720?, 0xc00a55b080?}, {0xc0009d1440, 0x3c}, {0x2ce4a15, 0x11}, {0xc000969880, 0x370}, {0x2cce7a8, ...}, ...)
	/go/src/gobld/buildkite/buildkite.go:158 +0x585
http://github.com/elastic/gobld/creator.ProcessJob ({_, }, , {{0x343ee20, 0xc00a4ccb40}, {0xc00073e260, 0xd}, {0x3462308, 0xc0006a0c20}, {0xc000a14540, ...}, ...}, ...)
	/go/src/gobld/creator/creator.go:356 +0xfad
http://github.com/elastic/gobld/creator.ProcessJobs (, {, _}, {{0x343ee20, 0xc00a4ccb40}, {0xc00073e260, 0xd}, {0x3462308, 0xc0006a0c20}, {0xc000a14540, ...}, ...})
	/go/src/gobld/creator/creator.go:257 +0x3ae
created by http://github.com/elastic/gobld/creator.(*creator ).StartWorkers in goroutine 321
	/go/src/gobld/creator/creator.go:218 +0x151
```

One hunch of mine is that, somehow, the `err` passed in to `init(e *exceptionData, err error)` is nil and therefor, the `reflectValue.Type()` call on
its value returns an error "Type on zero Value". If that supposition is correct, then changing the following block to
confirm `err` isn't nil before passing it in would obviate the problem:

```
for _, err := range e.ErrorDetails.Cause {
  if b.errorCount >= maxErrorTreeNodes {
    break
  }
  var data exceptionData
  if b.init(&data, err) {
    e.cause = append(e.cause, data)
  }
}
```

However, I may well be (very) wrong about this so I'd welcome any feedback from people more knowledgeable around this
code than I am. For one thing, I have not been able to run this modified code to confirm it has any effect on the above
Buildkite crash.
@robinverduijn robinverduijn requested a review from a team as a code owner January 7, 2025 00:26
Copy link
Member

@kruskall kruskall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix! 🙇

One hunch of mine is that, somehow, the err passed in to init(e *exceptionData, err error) is nil and therefor, the reflectValue.Type() call on its value returns an error "Type on zero Value". If that supposition is correct, then changing the following block to confirm err isn't nil before passing it in would obviate the problem:

I think this is correct but I'm not sure about adding the if check in there.

Here's a test to reproduce the issue:

diff --git a/error_test.go b/error_test.go
index 6d105f9..636cbb8 100644
--- a/error_test.go
+++ b/error_test.go
@@ -583,6 +583,23 @@ func TestErrorCauseUnwrap(t *testing.T) {
        assert.Equal(t, "cause", payloads.Errors[0].Exception.Cause[0].Message)
 }
 
+func TestErrorCauseNilElementUnwrap(t *testing.T) {
+       err := errorJoin{errors.New("foo"), nil, errors.New("bar")}
+
+       tracer, recorder := transporttest.NewRecorderTracer()
+       defer tracer.Close()
+       tracer.NewError(err).Send()
+       tracer.Flush(nil)
+
+       payloads := recorder.Payloads()
+       require.Len(t, payloads.Errors, 1)
+       assert.Equal(t, "TestErrorCauseNilElementUnwrap", payloads.Errors[0].Culprit)
+
+       require.Len(t, payloads.Errors[0].Exception.Cause, 2)
+       assert.Equal(t, "foo", payloads.Errors[0].Exception.Cause[0].Message)
+       assert.Equal(t, "bar", payloads.Errors[0].Exception.Cause[1].Message)
+}
+
 func assertErrorTransactionSampled(t *testing.T, e model.Error, sampled bool) {
        assert.Equal(t, &sampled, e.Transaction.Sampled)
        if sampled {
@@ -691,6 +708,16 @@ func (es errorslice) Cause() error {
        return es[0]
 }
 
+type errorJoin []error
+
+func (es errorJoin) Error() string {
+       return "errorjoin"
+}
+
+func (es errorJoin) Unwrap() []error {
+       return es
+}
+
 type runtimeStackTracer struct {
        message string
        trace   []uintptr

We're guarding against adding nil elements to error details cause but missed a check at

apm-agent-go/error.go

Lines 471 to 474 in f0ac7a6

case interface{ Unwrap() []error }:
if causes := err.Unwrap(); causes != nil {
e.ErrorDetails.Cause = append(e.ErrorDetails.Cause, causes...)
}

The returned value is a slice so != nil is not enough, WDYT about iterating over the slice and adding the nil check there before appending each element ?

Instead of aborting after encountering a `nil` error and thus missing other potential causes, add missing nil check in
the `Unwrap` method that takes a slice.

Also include a test for actually exercising this code and reproducing the problem seen in the wild (in Buildkite).
Thanks to @kruskall for providing the test and the excellent suggestion.
@robinverduijn
Copy link
Contributor Author

Thank you for the fix! 🙇

One hunch of mine is that, somehow, the err passed in to init(e *exceptionData, err error) is nil and therefor, the reflectValue.Type() call on its value returns an error "Type on zero Value". If that supposition is correct, then changing the following block to confirm err isn't nil before passing it in would obviate the problem:

I think this is correct but I'm not sure about adding the if check in there.

Here's a test to reproduce the issue:

diff --git a/error_test.go b/error_test.go
index 6d105f9..636cbb8 100644
--- a/error_test.go
+++ b/error_test.go
@@ -583,6 +583,23 @@ func TestErrorCauseUnwrap(t *testing.T) {
        assert.Equal(t, "cause", payloads.Errors[0].Exception.Cause[0].Message)
 }
 
+func TestErrorCauseNilElementUnwrap(t *testing.T) {
+       err := errorJoin{errors.New("foo"), nil, errors.New("bar")}
+
+       tracer, recorder := transporttest.NewRecorderTracer()
+       defer tracer.Close()
+       tracer.NewError(err).Send()
+       tracer.Flush(nil)
+
+       payloads := recorder.Payloads()
+       require.Len(t, payloads.Errors, 1)
+       assert.Equal(t, "TestErrorCauseNilElementUnwrap", payloads.Errors[0].Culprit)
+
+       require.Len(t, payloads.Errors[0].Exception.Cause, 2)
+       assert.Equal(t, "foo", payloads.Errors[0].Exception.Cause[0].Message)
+       assert.Equal(t, "bar", payloads.Errors[0].Exception.Cause[1].Message)
+}
+
 func assertErrorTransactionSampled(t *testing.T, e model.Error, sampled bool) {
        assert.Equal(t, &sampled, e.Transaction.Sampled)
        if sampled {
@@ -691,6 +708,16 @@ func (es errorslice) Cause() error {
        return es[0]
 }
 
+type errorJoin []error
+
+func (es errorJoin) Error() string {
+       return "errorjoin"
+}
+
+func (es errorJoin) Unwrap() []error {
+       return es
+}
+
 type runtimeStackTracer struct {
        message string
        trace   []uintptr

We're guarding against adding nil elements to error details cause but missed a check at

apm-agent-go/error.go

Lines 471 to 474 in f0ac7a6

case interface{ Unwrap() []error }:
if causes := err.Unwrap(); causes != nil {
e.ErrorDetails.Cause = append(e.ErrorDetails.Cause, causes...)
}

The returned value is a slice so != nil is not enough, WDYT about iterating over the slice and adding the nil check there before appending each element ?

That's an excellent catch. My original change would have missed any subsequent errors if there was a single nil in the slice. I have added your test and fixed Unwrap as you suggested. Please let me know how this looks. I can confirm the test failed with my original fix and is passing now.

@robinverduijn
Copy link
Contributor Author

The GH actions fail after my latest push, but I'm not sure it's caused by my changes.

The coverage check is failing on other branches as well, and the newly failing precheck appears to be a SIGSEGV that does not happen when I run make precheck locally.

I wonder if rerunning it will fix the precheck failure?

@robinverduijn robinverduijn requested a review from kruskall January 7, 2025 18:17
@robinverduijn robinverduijn changed the title Avoid nil values when building up exception data. Avoid nil values when unwrapping errors Jan 7, 2025
@dmathieu dmathieu merged commit 189bc90 into elastic:main Jan 8, 2025
10 of 11 checks passed
@robinverduijn robinverduijn deleted the INFRA-3218-avoid-nil-values-exception-data branch January 8, 2025 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants