Constantly creating new releases for charts even when no changes #457
It seems version 1.1.0 constantly creates new releases for all charts.
I'm facing a similar issue with the Strimzi Kafka operator. With 1.0.2 it works well, but with 1.1.0 it keeps syncing the chart even though there is no change.
I'm seeing the same issue with version 1.1.0 and cert-manager 0.15.0.
We're seeing the same issue with v1.1.0 and cert-manager v0.15.0.
We had the same issue, but in our case it was caused by memory limits for the Helm operator that were too low, which caused it to be restarted by Kubernetes. I would prefer if the dry-run action were called here.
I'm also seeing this problem with some custom charts. Each reconcile iteration seems to create a new release upgrade even though nothing changes.

EDIT: Figured it out after doing some debugging. It looks like it happens if we use relative chart versions. My issue doesn't seem to be related to the cert-manager issue described here; I think it would deserve its own ticket.

EDIT2: Relevant issue is #490 (resolved in v1.2.0, my comment can be ignored).
I am also having this issue with a lot of charts; info can be found in Slack. Can we have a maintainer chime in, or at least rename this issue? It can happen on any random chart from what I've seen. I am not seeing the helm-operator pod being restarted like @twendt has. https://cloud-native.slack.com/archives/CLAJ40HV3/p1597320334119100
@onedr0p I have renamed the issue to be more generic. Still, we need an official response. This issue has been open for 2 months without any feedback, and I think it's quite critical. I want to fully dive into GitOps, but an issue being open for so long without any feedback doesn't give much confidence. @stefanprodan, can some maintainer look at this, please?
I too observed this today, up to revision 2291(!) of a Helm Operator (v1.2.0) controlled HelmRelease.
Same here. I have just completed the 'get-started' tutorial and the demo apps are upgraded every 5 minutes:
Helm list:
Sorry for the late response, all; the last months have been hectic in terms of workload and GitOps Toolkit developments, and I was enjoying time off the last two weeks. I have been trying to reproduce the issue.

@davidholsgrove, may it be possible that the revision drift up to 2291 was due to the behaviour described in #469?
@hiddeco I've been using helm-operator v1.2.0. I'm using fixed chart versions, so not the same as #469. I've killed the fluxcd and helm-operator pods in each cluster to stop the Helm history being trashed.

Cluster 1: prometheus-operator and helm-operator continually upgrading;
Upgrade occurring every 3 minutes (the last one manually, after helm-operator had been stopped for a day):
Cluster 2: gitlab continually upgrading;
Hi there. I reported #469 in 1.1.0, but we are now running into the same issue for all releases in 1.2.0, just like most of the people here reported. Rolling back to 1.0.1 :(
I think I have found the problem. I will try to have a prerelease ready for you by tomorrow.
Still unsuccessful in replicating the issue where the releases keep getting upgraded. Given you all seem to have installed the helm-operator using Helm itself, can you please provide me with the output showing the version of the HelmRelease CRD installed in your cluster? (Another option would be to apply the latest CRD and see if that resolves it.)
Thanks @hiddeco - looks like it was because the version of the CRD wasn't updated, which caused the runaway helm-operator upgrades. I wasn't using the Helm Operator chart option to create the CRD; previously I had a (stale) version of the CRD in the git repo my Flux enforces. Would it be good if the Helm Operator had an initContainer or other check and refused to start if its CRD was the wrong version, maybe?
The right way is to not install the CRDs using the Helm chart, but to apply them manually / synchronize them using Flux (as written out in the install instructions).
If possible, that would likely be an improvement, but I do not think it will be implemented at this time (or in the near future), as we are working on a next-gen successor (the helm-controller) as part of the GitOps Toolkit.
@hiddeco we've been having this issue on v1.1.0 with the correct CRD managed by Flux. Are you saying that this issue affects v1.1.0 and v1.2.0 if the CRD isn't updated?
After applying the updated CRD, the issue was resolved. Thanks @hiddeco!
Just a friendly reminder that Helm is not suitable for managing the lifecycle of Kubernetes CRD controllers. CRDs have to be extracted from charts and applied on the cluster with Flux as plain YAMLs, otherwise the controller version will diverge from its API and that can break production in various ways.
@stefanprodan can you elaborate on how CRDs should be handled?
@talmarco it is described here, however these notes should cover upgrading too, not only installation: https://github.com/fluxcd/helm-operator/tree/master/chart/helm-operator#installation
@onedr0p I already have the helm-operator CRD installed. My question was how to handle other CRDs (like cert-manager's), as I understand from @stefanprodan's answer that this was the root cause of the charts constantly upgrading.
When upgrading cert-manager you also need to manually apply the CRDs, or commit them to your repo for Flux to apply. The same goes here.
@stevehipwell
Thanks @hiddeco, that was what I thought and what we've seen when testing the v1.2.0 release.
@stefanprodan, related to the above statement, could you confirm that the CRDs and the controller are always expected to be upgraded together?
@stevehipwell
Well, we assume people follow our install/upgrade instructions and always update both the CRDs and the controller, as they are part of the same thing.
@stefanprodan I agree that there isn't anything wrong and that the instructions are correct, but specifically calling out CRD changes in the release notes would be further designing for success.
I used to include them in the release notes (see, for example, the notes of previous releases).
Given that bumping the Helm Operator from v1.1.0 to v1.2.0 without applying the new CRD resulted in a change in the behaviour of the controller, I'd suggest this wasn't a backwards-compatible upgrade of the controller?
Thanks - I had the CRD from v1.0.0 managed by inclusion as YAML in my Flux-monitored git repo. Thanks again for the assist; looking forward to v2 in the GitOps Toolkit!
@stefanprodan Hmm, I don't remember doing any updates to the Helm Operator since I installed it in my cluster. I have version 1.1.0 and here is the output of the CRD definition:
I could try updating to 1.2.0 and updating the CRD manually as described.
I think I've tracked down this bug. In my case, I had a release being recreated non-stop. I decided to enable release diff logging.
All of this fun led me to a clean, readable diff, where I've identified the following in my case:

```
  		"metadata_agent": map[string]interface{}{
  			"DEFAULT": map[string]interface{}{
  				... // 2 identical entries
  				"nova_metadata_host": string("metadata.openstack.svc.cluster.local"),
  				"nova_metadata_ip":   string("metadata.openstack.svc.cluster.local"),
- 				"nova_metadata_port": float64(80),
+ 				"nova_metadata_port": int(80),
  			},
```

It looks like during the comparison there is a mismatch, with Helm thinking one side of this value is a float64 and the other an int. I haven't dug into why this is happening exactly, or where, but I'll be doing that. Help is appreciated. :)
I think my theory right now is that we're currently building the dry-run using the original values, which are parsed from YAML. This probably parses things into the correct type. However, the data we pull out of Helm to compare against (stored inside a secret in this case) goes through a JSON unmarshal, which, as per the documentation, decodes JSON numbers into float64 when the target is an interface{}.
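A minimal sketch of the type mismatch described above (illustration only, not the helm-operator's actual code; the choice of gopkg.in/yaml.v3 for the YAML side is an assumption made for this example):

```go
package main

import (
	"encoding/json"
	"fmt"

	"gopkg.in/yaml.v3"
)

func main() {
	// Values decoded from YAML keep integer types.
	var fromYAML map[string]interface{}
	if err := yaml.Unmarshal([]byte("nova_metadata_port: 80"), &fromYAML); err != nil {
		panic(err)
	}

	// The same value round-tripped through encoding/json comes back as float64,
	// because JSON numbers are decoded into float64 when the target is an interface{}.
	var fromJSON map[string]interface{}
	if err := json.Unmarshal([]byte(`{"nova_metadata_port": 80}`), &fromJSON); err != nil {
		panic(err)
	}

	fmt.Printf("from YAML: %T, from JSON: %T\n",
		fromYAML["nova_metadata_port"], fromJSON["nova_metadata_port"])
	// Prints: from YAML: int, from JSON: float64
}
```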
I've dug around and it looks like we can work around this; there seems to be something documented inside google/go-cmp#222 which describes a similar issue along with a resolution.
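One possible workaround along those lines, sketched under the assumption that normalizing numeric types before diffing is acceptable (this may not be the exact resolution referenced in google/go-cmp#222): round-trip both value maps through JSON so every number becomes a float64, then diff with go-cmp.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/google/go-cmp/cmp"
)

// normalize round-trips a value through JSON so that all numbers end up as
// float64, regardless of whether they started out as int or float64.
func normalize(v interface{}) interface{} {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	var out interface{}
	if err := json.Unmarshal(b, &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	stored := map[string]interface{}{"nova_metadata_port": float64(80)} // as read back from the release secret
	desired := map[string]interface{}{"nova_metadata_port": 80}         // as produced for the dry-run

	fmt.Println("raw diff:\n", cmp.Diff(stored, desired))
	fmt.Println("normalized diff:\n", cmp.Diff(normalize(stored), normalize(desired)))
}
```

With the raw maps, cmp.Diff reports float64(80) vs int(80) as a difference; after normalization the diff is empty.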
Having the same issue here using version 1.2.
This is particularly problematic for us because we have migration hooks that are constantly being run. The chart in question has a job with upgrade hooks (note that before-hook-creation is the default delete policy), so every spurious upgrade re-runs the migration job.
@scott-grimes can you please try out the prerelease mentioned above?
Same. Here are logs from the new helm-operator:
Some further info here. Enabling diff logging on the helm-operator shows the following:
the "values" in both blocks ( |
Also, one value is changed to another between the two releases.
The issue may be coming from helm-operator/pkg/helm/v3/release.go (line 70 in 4dc90d7), ultimately surfacing at https://github.com/fluxcd/helm-operator/blob/master/pkg/helm/utils.go#L19. Does the comparison that determines whether an update is needed use a dry-run to generate the hypothetical new release, and then compare that (hypothetical) new release to the existing release? Several places which should be map[string]interface{} are instead chartutil types (Values, Capabilities, etc.).
A dry-run is indeed used to determine if an upgrade needs to be performed. This approach has, however, proven to be brittle from the beginning, and we are using a different approach in the helm-controller so we simply do not have to deal with issues like the above (or more complex questions, like determining what should be taken into account as a divergence that triggers an upgrade).
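Very roughly, and only as a hypothetical sketch (the names and structure below are invented for illustration and are not the helm-controller's actual code), that kind of approach amounts to comparing recorded state, such as the chart version and a checksum of the composed values, against the desired state, instead of diffing a dry-run render:

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// valuesChecksum computes a stable digest of the composed values.
// encoding/json sorts map keys, which is good enough for this sketch.
func valuesChecksum(values map[string]interface{}) string {
	b, _ := json.Marshal(values)
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

// needsUpgrade decides whether to run a Helm upgrade by comparing what was
// last attempted (as recorded in status) with the currently desired chart
// version and values digest, instead of diffing a dry-run render.
func needsUpgrade(lastChartVersion, lastValuesChecksum, desiredChartVersion string, desiredValues map[string]interface{}) bool {
	return desiredChartVersion != lastChartVersion ||
		valuesChecksum(desiredValues) != lastValuesChecksum
}

func main() {
	last := map[string]interface{}{"replicaCount": 2}
	desired := map[string]interface{}{"replicaCount": 2}
	fmt.Println(needsUpgrade("1.2.3", valuesChecksum(last), "1.2.3", desired)) // false: nothing to do
}
```

The trade-off is that such a check only reacts to changes in the desired state; detecting drift in the cluster itself has to be handled separately.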
/stale |
This is going to continue to be stale, as the Helm Operator is no longer actively maintained. We advise people to upgrade to the helm-controller.
Sorry @hiddeco, I was closing stale issues I'd opened and accidentally ended up commenting on this one that I didn't even open.
Closing. Users are recommended to upgrade to Flux v2 and the Helm Controller ASAP: https://fluxcd.io/docs/migration/helm-operator-migration/ (Edit: I've reopened this issue to avoid new duplicate reports, since there are still recent fresh reports of this issue. But our position remains the same: we cannot devote resources to fixing bugs in the Helm Operator, which has been in the code-freeze phase of maintenance mode for longer than one year.)
We will be archiving the Helm Operator repo very soon, as previously announced. Please upgrade to Helm Controller and Flux v2, where issues like this can get attention and be solved by the very active team of maintainers.

Closing now, as there will be no more work on the Helm Operator in this repo. For migration support, please consult the migration guide: https://fluxcd.io/flux/migration/flux-v1-migration/ or contact us in the Flux channel on CNCF Slack, where we still offer migration assistance and workshops sponsored by the Flux project members and their supportive team at Weaveworks.
Describe the bug
Hello.
I am starting to experiment with Flux and the Helm Operator on a new cluster, and everything went fine until I deployed the cert-manager Helm chart.
Each time the sync runs, the Helm Operator tries to do an update and creates a new release even without any change in the chart.
This is causing some instability in the network of my cluster (maybe due to excessive load on the API server resulting from the constant updates).
What is strange is that if I run:
kubectl -n cert-manager get helmreleases.helm.fluxcd.io
the latest update date is the initial deploy. Still, a new secret with the Helm release information is being created every time and the pods are restarted.

To Reproduce
Just a basic helm release manifest:
Expected behavior
Cert-manager should only be deployed when there is any change.
Logs
Here is the log output:
Sometimes I also found these errors:
Not exactly sure what this is but it seems to take some time to run.
Additional context