Error with diff in github actions #666

Open
tropnikovvl opened this issue May 11, 2024 · 10 comments

Comments

@tropnikovvl

tropnikovvl commented May 11, 2024

Hello!

This error appears intermittently for no apparent reason, roughly once in every 10 runs.
I'm using version 5.2.0, but I also saw it on version 5.1.0.

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmpeo1pvo2y/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.command:Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 118, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 112, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 120, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file
@allenporter
Owner

I wonder if this is specific to a certain version of helm. It seems similar to helm/helm#7600, where the helm command is sometimes not resilient to multiple instances running at once.

@tropnikovvl
Author

I have several jobs running in parallel (via a GitHub Actions matrix).
They are most likely executed on different hosts.

@allenporter
Owner

Can you try a newer version of helm and see if that helps?

@tropnikovvl
Author

Hello!
Thanks for the update!

I'll keep an eye on it. On the previous version I hit the problem on average once in every 10-15 runs.
If anything happens, I will write here.

@tropnikovvl
Author

@allenporter
Unfortunately, the problem persists:

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 2381 is closed
Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 120, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 114, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 122, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7fd7689a2a70>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 746, in close
    self.write_eof()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 732, in write_eof
    self._loop.call_soon(self._call_connection_lost, None)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed

@allenporter
Owner

Hi, what version of helm are you using? Thanks!

@tropnikovvl
Author

Hello.

I have the latest version of Helm, but I don't really understand why that matters here.
The diff runs in the GitHub runner, and I do not pre-install anything on it.
I'm just using this workflow:

name: "Flux Diff"

on:
  push:
    branches: ["renovate/*"]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true

jobs:
  diffs:
    name: Compute diffs
    runs-on: ubuntu-22.04
    steps:
      - name: Setup Flux CLI
        uses: fluxcd/flux2/[email protected]

      - uses: allenporter/flux-local/action/[email protected]
        id: diff
        with:
          live-branch: develop
          path: clusters/path
          resource: helmrelease
          debug: true

      - name: PR Comments
        uses: mshick/add-pr-comment@v2
        if: ${{ steps.diff.outputs.diff != '' }}
        with:
          message-id: ${{ github.ref }}/flux-diff
          message-failure: Unable to post HelmRelease diff
          message: |
            `````diff
            ${{ steps.diff.outputs.diff }}
            `````

@allenporter
Owner

allenporter commented Jul 6, 2024

What's the "concurrency" about? Does that run in parallel on the same filesystem?

Basically, we can't have multiple processes clobbering the local filesystem. Flux build creates temp files that may get corrupted if two runs happen at once in the same directory.

To do multiple runs at once, each run may need its own checked-out file path.
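
For the Helm repository cache specifically, one way to picture "each run gets its own path" is the sketch below. It only builds the `helm repo update` argument lists that appear in the debug logs and gives every run a fresh temporary directory for `--repository-cache`. The function names `helm_repo_update_argv` and `isolated_argvs` are hypothetical and are not part of flux-local; nothing here is the project's actual implementation.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path


def helm_repo_update_argv(cache_dir: Path, config_file: Path) -> list[str]:
    """Build a `helm repo update` argv like the one in the logs (hypothetical helper)."""
    return [
        "helm", "repo", "update",
        "--registry-config", "/dev/null",
        "--repository-cache", str(cache_dir),
        "--repository-config", str(config_file),
    ]


def isolated_argvs(config_files: list[Path]) -> list[list[str]]:
    """Give each run its own temporary cache directory instead of a shared one."""
    with ExitStack() as stack:
        argvs = []
        for config_file in config_files:
            # A fresh directory per run, removed when the ExitStack closes.
            cache_dir = Path(stack.enter_context(tempfile.TemporaryDirectory()))
            argvs.append(helm_repo_update_argv(cache_dir, config_file))
        # A real caller would execute the commands here, before cleanup happens.
        return argvs


argvs = isolated_argvs([Path("a/repository-config.yaml"), Path("b/repository-config.yaml")])
for argv in argvs:
    print(" ".join(argv))
```

The cost of this layout is that each run downloads the repository index on its own, so nothing is shared between runs.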

@tropnikovvl
Author

tropnikovvl commented Jul 7, 2024

All runs execute in parallel, but each works in its own GitHub runner container, so they should not affect each other.
Screenshot 2024-07-07 at 14 04 47
Screenshot 2024-07-07 at 14 05 05

That's why I'm confused when I see duplicate logs:

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file
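
The duplication in this log can be checked mechanically. The sketch below is a small diagnostic, not part of flux-local: it scans debug lines for `helm repo update` commands and reports any `--repository-cache` directory that is used together with more than one `--repository-config` file. The function name `shared_caches` is hypothetical, and the parsing assumes flags appear as `--name value` pairs, as they do in the lines above.

```python
import shlex
from collections import defaultdict

PREFIX = "DEBUG:flux_local.command:Running command: "


def shared_caches(log_lines):
    """Return cache dirs used by `helm repo update` with more than one repository config."""
    configs_by_cache = defaultdict(set)
    for line in log_lines:
        if not line.startswith(PREFIX):
            continue
        argv = shlex.split(line[len(PREFIX):])
        if argv[:3] != ["helm", "repo", "update"]:
            continue
        # Assumes "--flag value" pairs after the subcommand, as in the logs.
        flags = dict(zip(argv[3::2], argv[4::2]))
        cache = flags.get("--repository-cache")
        config = flags.get("--repository-config")
        if cache and config:
            configs_by_cache[cache].add(config)
    return {c: sorted(v) for c, v in configs_by_cache.items() if len(v) > 1}


log = [
    "DEBUG:flux_local.helm:Updating 1 repositories",
    PREFIX + "helm repo update --registry-config /dev/null "
    "--repository-cache /tmp/tmpw73lrcdp "
    "--repository-config /tmp/tmps6n4o81n/repository-config.yaml",
    PREFIX + "helm repo update --registry-config /dev/null "
    "--repository-cache /tmp/tmpw73lrcdp "
    "--repository-config /tmp/tmp5m55fxx_/repository-config.yaml",
]
print(shared_caches(log))
```

On the two `helm repo update` lines from this log, the sketch reports `/tmp/tmpw73lrcdp` as shared by two repository configs, which suggests the two updates come from one process writing to one cache rather than from separate runner containers.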

@allenporter
Owner

OK, this still seems consistent with Helm's cache not working with multiple instances in parallel. People say the solution is to use a separate temporary directory for every instance. The reason for a shared repository cache is to avoid pulling the same repositories multiple times, especially when running diffs (everything is loaded twice). We could work around it with a lock held on each repo as a hack, but I'm not necessarily a fan of that. We could also add more controls to tune helm concurrency.

I'd prefer that the helm CLI were fixed to be more resilient to running in parallel, of course.

Need to think about this more.
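
As a rough illustration of the lock idea, here is a minimal sketch that serializes work per repository cache directory with one `asyncio.Lock` each. It is an assumption about how such a workaround could look, not flux-local's code: `RepoCacheLocks` and `fake_repo_update` are hypothetical names, and the fake action stands in for spawning `helm repo update`.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RepoCacheLocks:
    """One asyncio.Lock per repository cache directory (hypothetical helper)."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, cache_dir: str, action: Callable[[], Awaitable[T]]) -> T:
        # Only one action at a time may touch a given cache directory;
        # actions on different cache directories still run concurrently.
        async with self._locks[cache_dir]:
            return await action()


async def demo() -> int:
    locks = RepoCacheLocks()
    active = 0
    peak = 0

    async def fake_repo_update() -> None:
        # Stand-in for `helm repo update`; a real caller would spawn the process here.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1

    # Two updates against the same (made-up) cache path, started together.
    await asyncio.gather(
        *(locks.run("/tmp/shared-cache", fake_repo_update) for _ in range(2))
    )
    return peak


print(asyncio.run(demo()))
```

The peak number of concurrent updates on the shared cache stays at 1. An in-process lock like this only covers tasks inside a single Python process; separate processes sharing a cache directory would need a different mechanism.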
