This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

Audit test pipelines #3053

Open
cachedout opened this issue Sep 28, 2022 · 18 comments

@cachedout
Contributor

cachedout commented Sep 28, 2022

This is a master tracking issue for auditing which E2E test pipelines need to remain enabled.

Beats CI pipelines

| Pipeline | Main Health | Triggers | Stakeholders | Issue(s) | Removal planned |
| --- | --- | --- | --- | --- | --- |
| Docker images | ⭕ Stale | None | Robots [@cachedout and @kuisathaverat] | | |
| Fleet E2E | 🔴 Broken | Daily build | Fleet [@joshdover] | elastic/elastic-agent#1174 | |
| Observability Helm Charts | 🟢 Healthy | Daily build | Robots [@cachedout and @kuisathaverat] | Issue located in private repo | |
| K8S Autodiscover | 🟡 Flaky | Daily build | Cloud Native Monitoring [@gizas] | | |
| Observability MacOS | 🔴 Broken | Daily build | Elastic Agent [@cmacknz and @jlind23] | https://github.com/elastic/ci/issues/705 | |
| Fleet Server | ⭕ Stale | None | Fleet [@joshdover] | elastic/fleet-server#1927 | |
| Fleet UI | ⭕ Stale | None | Fleet and Integrations [@kpollich] | | |

Fleet CI pipelines

| Pipeline | Main Health | Triggers | Stakeholders | Issue |
| --- | --- | --- | --- | --- |
| Pipeline helper | 🔴 Broken | Push to main; PR labeled | Elastic Agent [@cmacknz and @jlind23] | elastic/elastic-agent#1174 |

⚠️ If you are listed as a stakeholder, we would like to know the following:

  1. Should the pipeline be removed from the CI or should it remain?
    1.1 If the pipeline remains and is broken, what is the link to an issue tracking a fix?
    1.2 If the pipeline should remain, how is it monitored by the team to ensure that build artifacts are not produced when the tests fail?

Next steps

Proposed pipeline criteria

I am proposing that we remove all pipelines that do not meet at least one of the following criteria:

  1. Necessary for the ongoing health of the E2E test suite itself
  2. Used by a product team as a quality gate. Concretely, this means that a failing test blocks a PR from being merged or a build artifact from being produced.
  3. Exists to ensure the quality of a supported product.
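
The rule above can be sketched as a small check, assuming each pipeline is described by three yes/no answers from its stakeholders. The `Pipeline` class, its field names, and the sample entries are hypothetical illustrations, not data from this audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical record of a stakeholder's answers for one pipeline."""
    name: str
    needed_for_e2e_suite_health: bool  # criterion 1
    used_as_quality_gate: bool         # criterion 2: a failure blocks a merge or an artifact
    covers_supported_product: bool     # criterion 3


def should_remove(p: Pipeline) -> bool:
    """A pipeline is a removal candidate only if it meets none of the criteria."""
    return not (
        p.needed_for_e2e_suite_health
        or p.used_as_quality_gate
        or p.covers_supported_product
    )


# Illustrative entries only; the real answers come from the stakeholders.
examples = [
    Pipeline("example-gated", False, True, False),
    Pipeline("example-unowned", False, False, False),
]
removal_candidates = [p.name for p in examples if should_remove(p)]
print(removal_candidates)
```

Meeting any single criterion is enough to keep a pipeline, so only pipelines with no positive answer end up in the candidate list.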

Timeline

  1. All existing E2E pipelines have stakeholders assigned no later than: October 1, 2022
  2. All stakeholders agree upon the proposed pipeline criteria no later than: October 20, 2022
  3. Non-conforming pipelines will be removed from Jenkins, and their code will be removed from the E2E test suite, beginning on: November 1, 2022

Related efforts

There is a separate effort to try and reduce the scope of E2E testing back to a point where stability can be maintained, but it is limited to tests for the Agent. That effort can be found here: elastic/elastic-agent#1174

@v1v
Member

v1v commented Sep 28, 2022

The MacOS daily pipeline was originally implemented in #2626 using the Orka ephemeral workers, and it superseded #2336.

The error is something that @elastic/ci-systems might need to help with:

[2022-09-28T04:58:54.763Z] + .ci/scripts/deployment.sh create
[2022-09-28T04:58:54.887Z] Cloning into '.obs'...
[2022-09-28T04:58:55.095Z] Host key verification failed.
[2022-09-28T04:58:55.095Z] fatal: Could not read from remote repository.
[2022-09-28T04:58:55.095Z] 
[2022-09-28T04:58:55.095Z] Please make sure you have the correct access rights
[2022-09-28T04:58:55.095Z] and the repository exists.

IIUC, the recent upgrade of the CI controllers enabled host key verification by default. We reported this in the past and it was partially fixed, since we no longer see the error below but a new one:

image

The error now happens in a subsequent stage that clones a private repository; see the console log above.

It worked in the past

image
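
One way to investigate a "Host key verification failed" error like the one in the console log above is to check whether the Git host has an entry in the worker's `known_hosts` file before the clone stage runs. The following is a minimal sketch, assuming OpenSSH's plain (unhashed) `known_hosts` format. The function name, the host `git.example.com`, and the sample contents are hypothetical, and this is not a description of what the CI controllers actually do.

```python
from typing import Iterable


def host_has_plain_entry(known_hosts_lines: Iterable[str], host: str) -> bool:
    """Return True if `host` is listed literally in an unhashed known_hosts entry.

    Limitations of this sketch: hashed entries (starting with "|1|") are
    skipped, and wildcard, negated, or "[host]:port" patterns are compared
    as literal strings, so a False result is not conclusive.
    """
    for raw in known_hosts_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        # An optional marker (@cert-authority or @revoked) precedes the host list.
        if fields[0].startswith("@"):
            if fields[0] == "@revoked":
                continue  # a revoked key must not count as a trusted entry
            fields = fields[1:]
        if len(fields) < 3:
            continue  # not a well-formed "hosts keytype key" entry
        if fields[0].startswith("|1|"):
            continue  # hashed host names cannot be compared as plain text
        if host in fields[0].split(","):
            return True
    return False


# Hypothetical file contents; the key material is a placeholder.
sample = [
    "# example known_hosts",
    "git.example.com,192.0.2.10 ssh-ed25519 AAAAPLACEHOLDER",
]
print(host_has_plain_entry(sample, "git.example.com"))
```

If the host is missing, the usual remedy is to provision its public key on the worker through a trusted channel rather than to disable host key checking.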

@kuisathaverat
Contributor

The Docker images pipeline generates the systemd Docker images used in the E2E tests, so we are probably the stakeholders.

@cachedout
Contributor Author

@v1v Thanks, that helps. I'm also trying to figure out what it actually does so that I can figure out who the stakeholders should be. I'm code-diving right now a bit to try and get a sense of that.

@kuisathaverat
Contributor

Observability Helm Charts can be removed

@cachedout
Contributor Author

@kuisathaverat Thanks! Regarding the Docker images -- that pipeline hasn't been executed for over a year. Does it still need to exist?

@kuisathaverat
Contributor

kuisathaverat commented Sep 28, 2022

Does it still need to exist?

It is the only way to generate those images, so it should be executed whenever they change. These images are used to test installation in a systemd environment. The main changes they can have are bumps to the systemd version or the Linux version.

@cachedout
Contributor Author

@cmacknz and @jlind23 Are you tracking any issues for the flakiness in the K8s Autodiscover pipeline?

@v1v
Member

v1v commented Sep 28, 2022

@v1v Thanks, that helps. I'm also trying to figure out what it actually does so that I can figure out who the stakeholders should be. I'm code-diving right now a bit to try and get a sense of that.

There was an original request to test on MacOS. It was initially attempted with AWS MacOS, but that was declined for various reasons:

  1. Cost: IIRC, machines are created and paid for with a 24-hour minimum; see Add Mac OS12 as part of the platform tested #2336 (comment)
  2. Implementation: the Ansible ec2 integration didn't work well; see Add Mac OS12 as part of the platform tested #2336 (comment)
  3. Ephemeral Orkas were available; see Add Mac OS12 as part of the platform tested #2336 (comment)

I guess the stakeholder might be @jlind23 as he was the original requester for the MacOS in AWS

@jlind23
Contributor

jlind23 commented Sep 28, 2022

@cachedout this is the issue we will use for the first half of 8.6. @AndersonQ is already assigned to this and will work closely with you in order to get back to a better place.

@cachedout
Contributor Author

@jlind23 That link seems wrong? :)

@jlind23
Contributor

jlind23 commented Sep 28, 2022

Sorry, this one - elastic/elastic-agent#1174

@cmacknz
Member

cmacknz commented Sep 28, 2022

@cmacknz and @jlind23 Are you tracking any issues for the flakiness in the K8s Autodiscover pipeline?

No, it may make sense to follow up with the Observability Cloudnative monitoring team to see if they have interest in fixing these tests faster than the agent team can get to them. They have done the majority of the recent work for autodiscovery features in agent.

@cachedout
Contributor Author

No, it may make sense to follow up with the Observability Cloudnative monitoring team to see if they have interest in fixing these tests faster than the agent team can get to them.

Looping in @gizas. We are trying to stabilize the E2E test suite. Are you aware of the flakiness in the K8s autodiscover tests, and if so, is anybody on your team investigating it?

@joshdover
Contributor

I have disabled most of the tests in the Fleet E2E suite while we eval what to do with the remaining: elastic/elastic-agent#1174 (comment)

@gizas
Contributor

gizas commented Oct 5, 2022

Sorry for the delayed answer, @cachedout, @cmacknz. I am just now checking the K8s Autodiscover pipeline. Can you point me to a failing run so I can have a look?

Indeed, we have provided some fixes in the past.

@gizas gizas closed this as completed Oct 5, 2022
@gizas gizas reopened this Oct 5, 2022
@joshdover
Contributor

Observability Helm Charts can be removed

Any reason this hasn't been done yet? Seeing it fail on a few PR runs recently and couldn't find the issue to track removing these.

@cachedout
Contributor Author

cachedout commented Oct 17, 2022

Any reason this hasn't been done yet?

Hi @joshdover . The issue is this one: https://github.com/elastic/observability-robots/issues/1325

We were considering this blocked until they sorted out the future regarding charts, but TBH it's probably not a big deal if we just pull it out now if it's failing in PRs. LMK what you think.

@joshdover
Contributor

joshdover commented Oct 17, 2022 via email
