-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
systemtest: Fix flaky tests #15992
base: main
Are you sure you want to change the base?
systemtest: Fix flaky tests #15992
Conversation
This pull request does not have a backport label. Could you fix it @ericywl? 🙏
|
wg.Add(1) | ||
go func() { | ||
defer wg.Done() | ||
h.SendBatchesInLoop(ctx) | ||
}() | ||
eg.Go(func() error { | ||
sendErr := h.SendBatchesInLoop(ctx) | ||
if sendErr != nil && !errors.Is(sendErr, context.DeadlineExceeded) { | ||
return sendErr | ||
} | ||
return nil | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[For reviewers] The behavior is changed slightly here, which might affect apmbench
warmup i.e. this benchmark will fail if warmup agents fail to send due to other errors aside from context deadline.
Not sure if that is better or not for benchmarking. I will revert it if anyone have concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If config.RunForever
(see code ref) is configured in the benchmarks, then I don't think this change should lead to problems with current setup (please double check the configuration for the daily benchmarks). In this case the apm-perf
handling already decides what should be returned as an error.
// Report idle APM Server. | ||
w.Write([]byte(`{"libbeat.output.events.active":0}`)) | ||
_, _ = w.Write([]byte(`{"libbeat.output.events.active":0}`)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this explicit style actually.
systemtest/otlp_test.go
Outdated
attribute.String("service.name", "service2"), | ||
attribute.String("telemetry.sdk.name", "iOS"), | ||
attribute.String("telemetry.sdk.language", "swift"), | ||
) | ||
return sendOTLPTrace(ctx, newOTLPTracerProvider(exporter, sdktrace.WithResource(resource))) | ||
return sendOTLPTrace(ctx, newOTLPTracerProvider(exporter, sdktrace.WithResource(res))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of res
is religious, since res
can also indicate result
or response
. So I'd keep it explicit with the full name.
systemtest/benchtest/main_test.go
Outdated
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
if r.URL.Path == "/debug/vars" { | ||
// Wait until there are completed requests before reporting idle. | ||
for completedRequests.Load() == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: better use a channel to signal or at least add a call to Gosched here, since spinning in a tight loop could be problematic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I used channel but switched to this for some reason 🤔, don't think there was an issue.
EDIT: Seems like combining both would be better.
Summary
Fixes #15991.
How to test these changes
Trigger CI system tests multiple times and confirm that the listed tests don't fail anymore.