[elastic-agent] Re-introduce instrumentation #30385

Closed

Conversation

stuartnelson3
Contributor

cf. #29031 for original description.

seems the issue might have been forgetting to add credentials when starting up the gRPC servers: 7638367

@stuartnelson3 stuartnelson3 requested a review from ph February 14, 2022 18:16
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 14, 2022
@mergify
Contributor

mergify bot commented Feb 14, 2022

This pull request does not have a backport label. Could you fix it @stuartnelson3? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Feb 14, 2022
@elasticmachine
Collaborator

elasticmachine commented Feb 14, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-03-04T11:42:23.909+0000

  • Duration: 269 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 43023
Skipped 3846
Total 46869

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ph ph added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Feb 15, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 15, 2022
@stuartnelson3 stuartnelson3 added backport-v8.2.0 Automated backport with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Feb 16, 2022
@stuartnelson3 stuartnelson3 marked this pull request as ready for review February 16, 2022 09:44
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

seems changes to metricbeat.yml were somehow
picked up.
@stuartnelson3
Contributor Author

/test

Member

@AndersonQ AndersonQ left a comment

It isn't critical, but I believe the code would profit from the simplifications I suggested.

@stuartnelson3
Contributor Author

@AndersonQ thanks for the suggestions!

golint doesn't allow us to discard a sub-context's
cancel function. Discussion is available here:
golang/go#51160
@stuartnelson3
Contributor Author

@AndersonQ i updated the code as per your suggestions, thank you! the one change I made was wrt discarding the cancel() func. golint won't let this happen, so I added a defer cancel(). I looked through the code and it looked like this should be fine (it seems each ctx except for ad.ctx is scoped to the function call?), but let me know if there's a reason we shouldn't be canceling this sub context!

Contributor

@ph ph left a comment

This looks OK on my side, though I haven't tested it yet.
Just want to make sure this goes in the next minor, correct?

@stuartnelson3
Contributor Author

@ph Yes, definitely. And if at all possible, getting it merged before elastic-agent is migrated to its own repo and this PR needs to be re-opened there :)

@ph
Contributor

ph commented Feb 28, 2022

@AndersonQ Can you do a manual test of this PR so we can unblock @stuartnelson3 ?

@stuartnelson3 stuartnelson3 requested a review from a team as a code owner March 1, 2022 14:15
@stuartnelson3
Contributor Author

/test

@stuartnelson3
Contributor Author

@ph @AndersonQ any idea what the failures are? they only started with 61da5fc but i have no idea why.

from the e2e failure:

[2022-03-02T12:04:01.357Z] tar: /tmp/beats-events-20220302.ndjson: Cannot stat: No such file or directory
[2022-03-02T12:04:01.357Z] tar: Exiting with failure status due to previous errors

@ph
Contributor

ph commented Mar 2, 2022

@stuartnelson3 I've never seen this one, and I'm not sure what beats-events*.json is. I will check when this CI run is complete.

@stuartnelson3
Contributor Author

/test

@AndersonQ
Member

It does ring a bell, but I don't quite remember. Anyway, I haven't found this log in the current build logs.
However, I've found:

[2022-03-03T11:07:39.532Z] TRAC[2022-03-03T11:07:39Z] Output                                        output="Elastic Agent has been uninstalled."

[2022-03-03T11:07:39.532Z] TRAC[2022-03-03T11:07:39Z] Attaching service for configuration           installType=tar service="{elastic-agent []  false 1 []}"

[2022-03-03T11:07:39.532Z] TRAC[2022-03-03T11:07:39Z] Executing command                             args="[-l /opt/Elastic/Agent]" command=ls env="map[]"

[2022-03-03T11:07:39.532Z] ERRO[2022-03-03T11:07:39Z] Error executing command                       args="[-l /opt/Elastic/Agent]" baseDir=. command=ls env="map[]" error="exit status 2" stderr="ls: cannot access '/opt/Elastic/Agent': No such file or directory\n"

which seems to be correct: when the elastic-agent uninstalls itself, it removes the Agent folder, and what is left is just /opt/Elastic

perhaps ping the robots, they're the ones owning the e2e tests.

@mdelapenya
Contributor

I'd say the error could be related to a network glitch while downloading the elastic-agent from Elastic's artifacts API:

[2022-03-03T10:59:24.323Z] DEBU[2022-03-03T10:59:21Z] The Elastic artifacts API is available elapsedTime=141.051415ms retries=1 statusEndpoint="https://artifacts-api.elastic.co/v1/search/8.2.0-fdde08ec/elastic-agent?x-elastic-no-kpi=true"
[2022-03-03T10:59:24.323Z] TRAC[2022-03-03T10:59:21Z] Artifact found artifact=elastic-agent artifactName=elastic-agent-8.2.0-SNAPSHOT-linux-arm64.tar.gz elapsedTime=141.343787ms retries=1 version=8.2.0-fdde08ec
[2022-03-03T10:59:24.323Z] TRAC[2022-03-03T10:59:21Z] Downloading file path=/tmp/b331ed45-42e7-4c67-97d1-b368749727a7/94807414-5062-4430-9b98-cf6792ba546a url="https://snapshots.elastic.co/8.2.0-fdde08ec/downloads/beats/elastic-agent/elastic-agent-8.2.0-SNAPSHOT-linux-arm64.tar.gz"
[2022-03-03T10:59:24.323Z] TRAC[2022-03-03T10:59:21Z] File downloaded elapsedTime=381.276435ms path=/tmp/b331ed45-42e7-4c67-97d1-b368749727a7/94807414-5062-4430-9b98-cf6792ba546a retries=1 url="https://snapshots.elastic.co/8.2.0-fdde08ec/downloads/beats/elastic-agent/elastic-agent-8.2.0-SNAPSHOT-linux-arm64.tar.gz"
[2022-03-03T10:59:28.535Z] ERRO[2022-03-03T10:59:27Z] Could not write file error="stream error: stream ID 1; INTERNAL_ERROR" path=/tmp/b331ed45-42e7-4c67-97d1-b368749727a7/94807414-5062-4430-9b98-cf6792ba546a url="https://snapshots.elastic.co/8.2.0-fdde08ec/downloads/beats/elastic-agent/elastic-agent-8.2.0-SNAPSHOT-linux-arm64.tar.gz"
[2022-03-03T10:59:28.535Z] ERRO[2022-03-03T10:59:27Z] Could not download the binary for the Elastic artifact arch=arm64 artifact=elastic-agent error="stream error: stream ID 1; INTERNAL_ERROR" extension=tar.gz os=linux version=8.2.0-fdde08ec-SNAPSHOT
[2022-03-03T10:59:28.535Z] ERRO[2022-03-03T10:59:27Z] Could not download the binary arch=arm64 artifact=elastic-agent error="stream error: stream ID 1; INTERNAL_ERROR" extension=tar.gz os=linux version=8.2.0-fdde08ec-SNAPSHOT

It happens at the very beginning of the execution of that stage, which correlates with the failing scenario (the first one that is executed). After that, further downloads of the binary work.

I'd say that the error is not caused by this PR and could be ignored, as any other execution should work.

@stuartnelson3
Contributor Author

@ph @AndersonQ how do you two feel about @mdelapenya's comment?

this sounds like a transient network error, so I'll re-run to see if it works

@stuartnelson3
Contributor Author

/test

@simitt
Contributor

simitt commented Mar 3, 2022

@ph and @AndersonQ please take some time to support getting this merged in before the agent is moved to the new repo. This has been ongoing for quite a while and I think the important part is to get the base merged. Any further details could be ironed out on a follow up; I assume you would like to extend the instrumentation over time anyways.

@mdelapenya
Contributor

The current e2e test failure is on our radar: elastic/e2e-testing#2203

@mergify
Contributor

mergify bot commented Mar 4, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b re-introduce-instrumentation upstream/re-introduce-instrumentation
git merge upstream/main
git push upstream re-introduce-instrumentation

@ph
Contributor

ph commented Mar 7, 2022

@stuartnelson3 @simitt Sorry for the delay, I was looking at an ongoing SDH that took way more time than I wanted to solve. Looking at the latest changes, it looks good to me. @AndersonQ, can you do a final review?

@simitt
Contributor

simitt commented Mar 7, 2022

Thanks for the feedback @ph. @stuartnelson3 is on PTO, can you or @AndersonQ please take ownership of merging this in or specifying what exactly is missing?

@simitt
Contributor

simitt commented Mar 7, 2022

I see that the Elastic Agent has moved to its own repository. @ph can you please own transferring this PR over and getting it over the finish line?

@stuartnelson3
Contributor Author

my PTO is being shifted slightly so I'll take a look at transferring this PR over today. I'll catch up with @ph if there is anything I need help with when transferring

stuartnelson3 added a commit to stuartnelson3/elastic-agent that referenced this pull request Mar 8, 2022
@mergify
Contributor

mergify bot commented Mar 17, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b re-introduce-instrumentation upstream/re-introduce-instrumentation
git merge upstream/main
git push upstream re-introduce-instrumentation

@ph
Contributor

ph commented Mar 17, 2022

Going to close this one in favor of the one on the Elastic Agent repository.

@ph ph closed this Mar 17, 2022
Labels
backport-v8.2.0 Automated backport with mergify Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

7 participants