systemtest: Fix flaky tests #15992

Open
wants to merge 9 commits into
base: main
from

Conversation

ericywl
Contributor

@ericywl ericywl commented Mar 4, 2025

Summary

Fixes #15991.

How to test these changes

Trigger CI system tests multiple times and confirm that the listed tests don't fail anymore.

@ericywl ericywl requested a review from a team as a code owner March 4, 2025 06:55
Contributor

mergify bot commented Mar 4, 2025

This pull request does not have a backport label. Could you fix it @ericywl? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-9./d is the label to automatically backport to the 9./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

Comment on lines -275 to +282
-   wg.Add(1)
-   go func() {
-       defer wg.Done()
-       h.SendBatchesInLoop(ctx)
-   }()
+   eg.Go(func() error {
+       sendErr := h.SendBatchesInLoop(ctx)
+       if sendErr != nil && !errors.Is(sendErr, context.DeadlineExceeded) {
+           return sendErr
+       }
+       return nil
+   })
Contributor Author

@ericywl ericywl Mar 4, 2025

[For reviewers] The behavior changes slightly here, which might affect apmbench warmup: the benchmark will now fail if warmup agents fail to send because of any error other than context deadline exceeded.

I'm not sure whether that is better or worse for benchmarking. I will revert it if anyone has concerns.

Contributor

If config.RunForever (see code ref) is configured in the benchmarks, I don't think this change will cause problems with the current setup (please double-check the configuration for the daily benchmarks). In that case, the apm-perf handling already decides what should be returned as an error.
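
For anyone following the thread, here is a minimal sketch of the behavior change under discussion: a deadline error counts as a normal end of warmup, while any other send error is surfaced. It uses only the standard library, so a sync.WaitGroup plus sync.Once stands in for the errgroup used in the PR. sendLoop, warmup, ignoreDeadline, and errSend are hypothetical names for illustration, not code from apm-server or apm-perf.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSend is a hypothetical non-deadline failure used for illustration.
var errSend = errors.New("send failed")

// ignoreDeadline treats context.DeadlineExceeded as a normal end of warmup
// and passes every other error through unchanged.
func ignoreDeadline(err error) error {
	if err != nil && !errors.Is(err, context.DeadlineExceeded) {
		return err
	}
	return nil
}

// sendLoop is a hypothetical stand-in for h.SendBatchesInLoop: it fails
// immediately with failWith if set, otherwise it runs until the context ends.
func sendLoop(ctx context.Context, failWith error) error {
	if failWith != nil {
		return failWith
	}
	<-ctx.Done()
	return ctx.Err()
}

// warmup starts one sender per entry in failures and returns the first
// error that is not a deadline error, or nil if there is none.
func warmup(timeout time.Duration, failures []error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, f := range failures {
		wg.Add(1)
		go func(f error) {
			defer wg.Done()
			if err := ignoreDeadline(sendLoop(ctx, f)); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(f)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// All senders stop at the deadline: warmup is considered successful.
	fmt.Println(warmup(10*time.Millisecond, []error{nil, nil}))
	// One sender fails for another reason: warmup now reports it.
	fmt.Println(warmup(10*time.Millisecond, []error{nil, errSend}))
}
```

With the old wg-based code, the second case would have been silently ignored, which is the trade-off raised above.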

    // Report idle APM Server.
-   w.Write([]byte(`{"libbeat.output.events.active":0}`))
+   _, _ = w.Write([]byte(`{"libbeat.output.events.active":0}`))
Contributor

I like this explicit style actually.

        attribute.String("service.name", "service2"),
        attribute.String("telemetry.sdk.name", "iOS"),
        attribute.String("telemetry.sdk.language", "swift"),
    )
-   return sendOTLPTrace(ctx, newOTLPTracerProvider(exporter, sdktrace.WithResource(resource)))
+   return sendOTLPTrace(ctx, newOTLPTracerProvider(exporter, sdktrace.WithResource(res)))
Contributor

@rubvs rubvs Mar 5, 2025

The use of res is ambiguous, since res can also mean result or response. I'd keep it explicit and use the full name.

    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if r.URL.Path == "/debug/vars" {
            // Wait until there are completed requests before reporting idle.
            for completedRequests.Load() == 0 {
Member

nit: it would be better to use a channel to signal, or at least to add a call to runtime.Gosched here, since spinning in a tight loop could be problematic

Contributor Author

@ericywl ericywl Mar 6, 2025

Good idea. I originally used a channel but switched to this for some reason 🤔; I don't think there was an issue with it.

EDIT: Seems like combining both would be better.
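
As a rough illustration of the channel-based suggestion, the sketch below has the /debug/vars handler block on a channel that is closed by the first completed request, instead of spinning on a counter. It is a sketch under stated assumptions, not the test code from this PR: newFakeServer and waitForIdle are hypothetical helpers, and the /intake/v2/events path is used only as an example of a request that is not /debug/vars.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// newFakeServer returns a test server where any request other than
// /debug/vars counts as completed work, and /debug/vars blocks until at
// least one such request has completed before reporting idle.
func newFakeServer() *httptest.Server {
	var once sync.Once
	completed := make(chan struct{})
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/debug/vars" {
			// Block without spinning; give up if the client goes away.
			select {
			case <-completed:
			case <-r.Context().Done():
				return
			}
			// Report idle APM Server.
			_, _ = w.Write([]byte(`{"libbeat.output.events.active":0}`))
			return
		}
		// Closing the channel wakes every waiter; Once makes it safe to repeat.
		once.Do(func() { close(completed) })
		w.WriteHeader(http.StatusAccepted)
	}))
}

// waitForIdle polls /debug/vars in the background, sends one request to
// unblock it, and returns the body that /debug/vars eventually reports.
func waitForIdle() (string, error) {
	srv := newFakeServer()
	defer srv.Close()
	// Deferred after srv.Close, so it runs first and aborts a pending GET.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	type result struct {
		body string
		err  error
	}
	results := make(chan result, 1)
	go func() {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL+"/debug/vars", nil)
		if err != nil {
			results <- result{err: err}
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			results <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		results <- result{body: string(b), err: err}
	}()

	resp, err := http.Post(srv.URL+"/intake/v2/events", "application/x-ndjson", nil)
	if err != nil {
		return "", err
	}
	resp.Body.Close()

	r := <-results
	return r.body, r.err
}

func main() {
	body, err := waitForIdle()
	fmt.Println(body, err)
}
```

The select on r.Context().Done() keeps a handler from hanging forever if the polling client disconnects before any request completes. An atomic counter can still be kept alongside the channel if the test also needs the number of completed requests.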

1pkg previously approved these changes Mar 6, 2025
@ericywl ericywl requested review from rubvs and 1pkg March 6, 2025 02:49
Successfully merging this pull request may close these issues.

Flaky system tests
4 participants