Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Enable race detector #15932

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

test: Enable race detector #15932

wants to merge 9 commits into from

Conversation

ericywl
Copy link
Contributor

@ericywl ericywl commented Feb 28, 2025

Motivation/summary

Enable race detector for tests, continuation of #13827.

How to test these changes

Run unit test, system test and CI.

Copy link
Contributor

mergify bot commented Feb 28, 2025

This pull request does not have a backport label. Could you fix it @ericywl? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-9./d is the label to automatically backport to the 9./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@ericywl ericywl mentioned this pull request Feb 28, 2025
2 tasks
@ericywl ericywl force-pushed the ci/test-race branch 3 times, most recently from 40c948f to a6f9618 Compare February 28, 2025 07:00
@ericywl ericywl marked this pull request as ready for review February 28, 2025 07:14
@ericywl ericywl requested a review from a team as a code owner February 28, 2025 07:14
@ericywl ericywl added the backport-skip Skip notification from the automated backport with mergify label Feb 28, 2025
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the part that fixes flaky TestServerElasticsearchOutput related to the race detector? If no, I'd prefer it to be a separate PR to fix #13606 , as we might want to backport it to make our lives easier.

@ericywl
Copy link
Contributor Author

ericywl commented Feb 28, 2025

is the part that fixes flaky TestServerElasticsearchOutput related to the race detector? If no, I'd prefer it to be a separate PR to fix #13606 , as we might want to backport it to make our lives easier.

Not exactly, I can extract that PR out.

1pkg
1pkg previously approved these changes Mar 1, 2025
"elasticsearch.events.count": 5,
"elasticsearch.events.queued": 5,
// Disable this flaky test to be fixed later
// "elasticsearch.bulk_requests.available": 9,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please create a follow up task to fix this issue and link from here?
cc @raultorrecilla for scheduling work to look into why this is flaky.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericywl paste it here once created please!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup apparently its a long standing issue: #13606

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericywl can you revert this change in this PR? The flaky test shouldn't block this PR. (Even if the bugfix isn't merged and it unfortunately fails, rerunning the CI should be good enough)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor

mergify bot commented Mar 3, 2025

This pull request is now in conflicts. Could you fix it @ericywl? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b ci/test-race upstream/ci/test-race
git merge upstream/main
git push upstream ci/test-race

@@ -55,16 +55,13 @@ jobs:
with:
go-version-file: go.mod
cache: true
- run: go test -v -race ./...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing CGO_ENABLED: "0" here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race detector unfortunately requires CGO to be enabled: https://go.dev/doc/articles/race_detector.

The race detector requires cgo to be enabled, and on non-Darwin systems requires an installed C compiler. The race detector supports linux/amd64, linux/ppc64le, linux/arm64, linux/s390x, freebsd/amd64, netbsd/amd64, darwin/amd64, darwin/arm64, and windows/amd64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we explicitly set it to 0 to make sure we do not include any C based library. Has this change been discussed previously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, @kruskall I based this off your previous PR to enable the race detector. Not sure if that's something that has been discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed with @kruskall that this is ok. systemtests run without CGO, so we are covered on that side.
I would add a comment to document this for future reference

@@ -55,16 +55,13 @@ jobs:
with:
go-version-file: go.mod
cache: true
- run: go test -v -race ./...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: can you enable the race detector for the fips tests too ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me update it.

@@ -20,6 +20,7 @@ services:
interval: 1s
environment:
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
- _JAVA_OPTIONS # Workaround on OSX for https://bugs.openjdk.org/browse/JDK-8345296
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Is the key _JAVA_OPTIONS without value good enough to workaround your issue? I don't see any mention of _JAVA_OPTIONS in the bug report.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mainly so that I can run the system tests using _JAVA_OPTIONS="-XX:UseSVE=0" make system-test without having to update the docker compose.

@@ -57,8 +61,9 @@ func NewHTTPHandlers(logger *zap.Logger, processor modelpb.BatchProcessor, semap

// TODO we should add an otel counter metric directly in the
// apm-data consumer, then we could get rid of the callback.
metricRegistrationMu.Lock() // Temporary locking to fix data race issue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this a bugfix? If so, I prefer a separate PR, and if it is user-facing, it needs a changelog as well.

Copy link
Contributor Author

@ericywl ericywl Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has data race because the NewHTTPHandlers can be called by different goroutines concurrently in tests. I believe it's not user-facing since its only used internally in beater.

Copy link
Contributor

mergify bot commented Mar 3, 2025

This pull request is now in conflicts. Could you fix it @ericywl? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b ci/test-race upstream/ci/test-race
git merge upstream/main
git push upstream ci/test-race

@ericywl
Copy link
Contributor Author

ericywl commented Mar 4, 2025

System test failed previously due to Test_warmup but is suddenly passing again when I tried to look into it 🤡

EDIT: There seems to be another flaky system test, I will extract it into a new PR instead: #15992.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Skip notification from the automated backport with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants