[QUESTION] Why are our Windows distribution tests failing? #4635

Open
Swiddis opened this issue Apr 12, 2024 · 4 comments

Comments

@Swiddis
Contributor

Swiddis commented Apr 12, 2024

I think this is meant to be a bug report (either in our plugin or in this build repo), but I'm not 100% sure what the bug is or where it is. I'd like help finding the root cause so I can convert this into a proper bug report.

Problem

In dashboards-observability we've been getting an autocut issue for a failing distribution since February. The pipeline in question seems to fluctuate a lot between passing, unstable, and failing. I've been digging through the logs to figure out what the issue is and am coming up blank. Usually there's an integ-test section that says which tests are failing, but it hasn't been present in any of the recent runs I've checked. The log says "observabilityDashboards" is in the failing plugins list, but I can't locate the error.

Possibly-Related Info

One hint I can find is that there are bootstrap issues in some of the logs. My suspicion is that we may need to tweak the pipeline to run with --single-version=loose or --single-version=ignore:

ERROR [single_version_dependencies] Multiple version ranges for the same dependency
      were found declared across different package.json files. Please consolidate
      those to match across all package.json files. Different versions for the
      same dependency is not supported.

      If you have questions about this please reach out to the operations team.

      The conflicting dependencies are:

        cypress
          9.5.4 => opensearch-dashboards
          ^13.6.0 => observability-dashboards
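
To illustrate what this check is complaining about, here is a minimal Python sketch (not the actual OpenSearch Dashboards bootstrap code) that scans a set of package.json files and reports any dependency declared with more than one version range. The function name find_conflicting_ranges and the choice to look only at dependencies and devDependencies are assumptions made for illustration, and the sketch compares range strings literally rather than resolving semver.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_conflicting_ranges(package_json_paths):
    """Return {dependency: {range: [package names]}} for every dependency
    that is declared with more than one distinct range string.

    Hypothetical helper for illustration; ranges are compared as plain
    strings, so "9.5.4" and "^13.6.0" count as different ranges.
    """
    declared = defaultdict(lambda: defaultdict(list))
    for path in package_json_paths:
        manifest = json.loads(Path(path).read_text(encoding="utf-8"))
        package_name = manifest.get("name", str(path))
        for section in ("dependencies", "devDependencies"):
            for dep, version_range in manifest.get(section, {}).items():
                declared[dep][version_range].append(package_name)
    # Keep only dependencies that appear with two or more different ranges.
    return {
        dep: dict(ranges)
        for dep, ranges in declared.items()
        if len(ranges) > 1
    }
```

Run against the core package.json and the plugin's package.json, a helper like this would list cypress with both of the ranges shown in the error above, which makes it easier to see which manifest needs to change.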

I also asked some coworkers about the logs and was directed to this Dashboards issue about failing Windows tests (opensearch-project/OpenSearch-Dashboards#5688), which also seems related. It notes that the logs mention permission issues when deleting test files (or perhaps something about running a shell on Windows):

Traceback (most recent call last):
  File "C:\Users\Administrator\jenkins\workspace\distribution-build-opensearch-dashboards\src\run_build.py", line 113, in <module>
    sys.exit(main())
  File "C:\Users\Administrator\jenkins\workspace\distribution-build-opensearch-dashboards\src\run_build.py", line 93, in main
    builder.build(build_recorder)
  File "C:\Users\Administrator\jenkins\workspace\distribution-build-opensearch-dashboards\src\build_workflow\builder_from_source.py", line 56, in build
    self.git_repo.execute(build_command)
  File "C:\Users\Administrator\jenkins\workspace\distribution-build-opensearch-dashboards\src\git\git_repository.py", line 85, in execute
    subprocess.check_call(command, cwd=cwd, shell=True)
  File "C:\Users\ContainerAdministrator\scoop\apps\python39\3.9.13\lib\subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)

subprocess.CalledProcessError: Command 'bash C:\Users\Administrator\jenkins\workspace\distribution-build-opensearch-dashboards\scripts\components\OpenSearch-Dashboards\build.sh -v 3.0.0 -p windows -a x64 -d zip -s false -o builds' returned non-zero exit status 1.

script returned exit code 1
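
One thing worth noting about this traceback: subprocess.check_call does not capture the child's output, so the CalledProcessError only carries the command and its exit status. Whatever build.sh printed before exiting would appear earlier in the same log, above the traceback. As a rough sketch of how a wrapper could surface that output next to the failure, here is a hypothetical helper (run_and_report is an illustrative name, not something from the build repo) that captures the combined output and prints its tail on a non-zero exit.

```python
import subprocess
import sys


def run_and_report(command, cwd=None, tail_lines=20):
    """Run a command and return (exit_code, last lines of its output).

    Hypothetical helper for illustration. On failure, the tail of the
    combined stdout/stderr is echoed to stderr so the real error sits
    right next to the exit status.
    """
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like a console log
        text=True,
    )
    tail = result.stdout.splitlines()[-tail_lines:]
    if result.returncode != 0:
        print(f"command failed with exit status {result.returncode}", file=sys.stderr)
        for line in tail:
            print(line, file=sys.stderr)
    return result.returncode, tail
```

The sketch takes the command as a list and does not use shell=True, unlike the call in the traceback. That choice is only to keep the example portable.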

Question

Why is the distribution failing? Is it a problem with our plugin, or is it an issue in the build pipeline? Is pipeline.log even the right place to look to debug this?

Context

I'm working on understanding and resolving these issues for our goal to fix all the flaky distribution tests by 2.14: opensearch-project/dashboards-observability#1670.

github-actions bot added the untriaged label ("Issues that have not yet been triaged") on Apr 12, 2024
@derek-ho
Contributor

@peterzhuamazon can you take a look and help out here? I think all/most plugins are getting this autocut on main

@bbarani
Member

bbarani commented Apr 25, 2024

Tagging @rishabh6788 to help here.

@prudhvigodithi
Collaborator

[Triage]
As the 2.14.0 release is moving forward, I assume this issue is fixed. @rishabh6788, can you please let us know?
Thanks

prudhvigodithi added the integtest label and removed the untriaged label ("Issues that have not yet been triaged") on May 9, 2024
@AMoo-Miki
Contributor

One hint I can find is that there are bootstrap issues in some of the logs. My suspicion is that we may need to tweak the pipeline to run with --single-version=loose or --single-version=ignore

We should be using --single-version=loose wherever OSD is being bootstrapped. The ignore option should only be used for debugging and never in production or test builds.

... got directed to opensearch-project/OpenSearch-Dashboards#5688 about failing Windows tests

Failures in cleaning all empty folders recursively, and other folder-deletion failures, are caused by different parallel processes racing to delete a folder and its parent. I am working on rewriting this functionality in OSD. If this is a widespread pain, I can prioritize it.
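
To make the race concrete, here is a minimal Python sketch of a deletion routine that tolerates another process removing the same tree at the same time. This is not the OSD rewrite mentioned above. The helper name remove_tree_tolerant and the retry parameters are assumptions chosen for illustration.

```python
import os
import shutil
import time


def remove_tree_tolerant(path, attempts=5, delay_seconds=0.1):
    """Delete a directory tree even if another process is deleting it too.

    Hypothetical helper for illustration. Returns True once the path no
    longer exists, False if it is still present after all attempts.
    """
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Either the whole tree is already gone, or a child vanished
            # while we were walking it because another deleter got there first.
            if not os.path.exists(path):
                return True
        except OSError:
            # For example "directory not empty" or a permission error while
            # another process still holds a handle; wait and try again.
            pass
        time.sleep(delay_seconds)
    return not os.path.exists(path)
```

The idea is that "already deleted" is treated as success rather than an error, and transient failures are retried a bounded number of times instead of failing the build on the first collision.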

Projects
Status: Backlog
Development

No branches or pull requests

5 participants