Update common (#96)
* Updated namespaces template to include labels and annotations functionality

* Added schema validation to support additional format for labels and annotations

* Updated the values-example.yaml to include new format for namespaces

* Updated Changes.md to include new namespaces functionality.

* Updating CI tests

* Fixed Markdown errors

* Added functionality to support the following format for labels and annotations:
      labels:
        openshift.io/node-selector: ""
      annotations:
        openshift.io/cluster-monitoring: "true"

* Fixed CI Issues

* Avoid exited containers proliferation

When running the `pattern.sh` script multiple times, many exited
podman containers are left on the machine. Adding the `--rm`
parameter to `podman run` makes podman automatically delete
exited containers, leaving the machine cleaner.

* Handling of pre-release builds is too complex for a helm chart

Generating the ICSP and allowing insecure registries is best done prior
to helm upgrade, and requires VPN access to registry-proxy.engineering.redhat.com

* Fixing issues with operator groups

* Adding CI test

* Updated operator group template

* Updating CI issues

* Removed duplicate code for operatorgroup by using multiple conditions

* Allow overriding the pattern's name

This is especially useful when multiple people are working on a pattern
and have been using different names:

    $ make help |grep Pattern:
    Pattern: multicloud-gitops
    $ make NAME=foobar help |grep Pattern:
    Pattern: foobar

* Add precise instruction to upgrade the vault subchart

* Upgrade vault-helm to v0.24.1

* Add an item to README.md

* Fix up common/ tests

* Fix super linter

* Set gitOpsSpec.operatorSource

After merging validatedpatterns/patterns-operator@235b303
it is now effectively possible to pick a different catalogSource for
the gitops operator. This is needed in order to allow CI to install
the gitops operator from an IIB

* Introduce EXTRA_HELM_OPTS

This variable can be set in order to pass additional helm arguments from
the command line, i.e. we can set things without having to tweak values files.
So it is now possible to run something like the following:

  ./pattern.sh make install \
  EXTRA_HELM_OPTS="--set main.gitops.operatorSource=iib-49232"

* Disable var-naming[no-role-prefix] in ansible lint

* Add new ansible role to deal with IIBs

* Simplify load-iib target

* Add templates folder

* Fix a couple of linting warnings

* Fix some super-linter complaints

* Skip the iib-ci playbook

* Drop var-naming[no-role-prefix] linter

* Allow for multiple images when calling load-iib

* Add help for load-iib

* Output index_image in make

* Output index_image in make (2)

* Set facts later in the playbook not in defaults/

* Fix how we export vars in make load-iib

* Fix how we export vars in make load-iib (2)

* Use machineCount to register the number of nodes that need to be ready

* Add helpful debug messages

* Add | on shell now that we call pipefail

* Test dropping nevercontact source

* Skip insecure tls when logging in

* Also allow gchr.io

* Revert "Test dropping nevercontact source"

This reverts commit d8746a37fce2663018f52203c892f00b825e32a7.

* Fix typo

* Clarify instructions in the README file

* Automate the channel example

* Find out KUBEADMINAPI programmatically

* Use command instead of shell

* Do not grep for operator bundle unless it is the gitops operator

* Also whitelist ghcr.io

* Fetch the operator bundle itself in a more robust way

It seems that the operator bundle image itself is nowhere to be found
inside any OCP cluster object (it's not in packagemanifests nor
catalogsource). Resorting to parsing the IIB via opm alpha commands
to fetch the exact image.

* Add more mirrors

* Some more work to support MCE

* Cleanup spacing

* Fix super-linter

* Move task in right folder

* Drop last mention of operator instead of item

* Improve the grepping for the operator bundle

Without also grepping for the default_channel we can end up getting
multiple results, which breaks everything.

Tested this and it fixed the issue I was seeing with the
openshift-gitops-operator this morning

* Drop display_skipped_hosts

display_skipped_hosts=False has a horrible side-effect:
when a task takes a long time, the slow task is always the *next* one and not
the one printed on the screen/log. That is because ansible has to wait for
the task to finish before printing it, as it does not know beforehand if
the host will be skipped and hence whether the task should be displayed at
all.

* Be more specific about the steps in the README

* Upgrade ESO to v0.8.2

* Update README.md

* Update tests after eso 0.8.2 upgrade

* Move to new spec format for dex/sso

Via https://issues.redhat.com/browse/GITOPS-2761 we are told that the
dex configuration has a new format.
Old format:

    spec:
      dex:
        openShiftOAuth: true
        resources:
        ...

New format:

    spec:
      sso:
        provider: dex
        dex:
          openShiftOAuth: true
          resources:
          ...

This format is only supported starting with gitops-1.8.0, so we should
merge this only when we are absolutely sure that no pattern needs an
older gitops version in any situation.

Tested on MCG with gitops-1.8.2

Note: with this change gitops < 1.8 is not supported. Starting with
gitops-1.9 the old format will be unsupported.
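
As an illustration of the format change above, here is a minimal sketch of
the old-to-new conversion on a plain dictionary. The function name
`migrate_dex_spec` is hypothetical, and the empty `resources` value is only a
stand-in for the elided content; this is not code from the chart itself.

```python
import copy


def migrate_dex_spec(spec):
    """Return a copy of `spec` with a top-level `dex` block moved under `sso`."""
    new_spec = copy.deepcopy(spec)
    dex = new_spec.pop("dex", None)
    if dex is not None:
        # New format: spec.sso.provider names the provider, spec.sso.dex holds
        # what used to live in spec.dex.
        new_spec["sso"] = {"provider": "dex", "dex": dex}
    return new_spec


# Placeholder values: `resources` is left empty because the source elides it.
old_spec = {"dex": {"openShiftOAuth": True, "resources": {}}}
new_spec = migrate_dex_spec(old_spec)
print(new_spec)
```

The input is copied rather than modified, so the old layout stays available
for comparison.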

* Disable ArgoCD from kubeconform

The reason is that most of the tools we used to generate the json
schema seem to be unmaintained, so it is getting hard to update
our schemas in our GH org. We'll need to revisit this in the future.

* Add a short line about username/token for the iib role on OCP <= 4.12

* Drop https:// from podman login

Seems we hit https://www.github.com/containers/podman/issues/13691 at
least with older podman versions.

If this turns out to break podman 4.5.0 I will special case it later

* Set the mce-subscription-spec annotation

We set it to .Values.clusterGroup.subscriptions.acm.source if that is defined
and to "redhat-operators" otherwise.
The reason we do this is the following:
1. In a default deployment scenario MCE has to be deployed as normal
   from the redhat-operators catalogSource, just as ACM is.
2. When we deploy gitops-operator from an IIB instead, MCE would be
   installed trying to get it from the IIB because
   https://www.github.com/stolostron/multiclusterhub-operator/pull/975
   made it so that it picks the latest version looking at all catalog
   sources. But since we only mirrored the gitops operator in the
   cluster, this breaks as the images for MCE from the IIB are not there.
   By setting the default to "redhat-operators" we fix this case.
3. Now in the case where we want to install ACM from an IIB we need to
   be able to override this and we will pick whatever value is set in
   .Values.clusterGroup.subscriptions.acm.source, which will need to be
   defined for this to work when testing ACM+MCE from an IIB.

Note: Currently point 3. works only if you set it in a values file.
Setting .Values.clusterGroup.subscriptions.acm.source via extraParams
won't be passed down from the clusterGroup app to the applications.
It's a bug that we need to fix.

Note(2): We surround this with an 'if kindIs "map" .Values.clusterGroup.subscriptions'
because we do not want to break things if subscriptions is a list and not
a map. If we ever manage to drop subscriptions as a list, then we can
remove that 'if'.
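
The selection logic described above can be sketched in Python as follows.
This is only one reading of the commit message, not the template itself: the
function name `mce_subscription_source` is hypothetical, and returning `None`
for list-style subscriptions assumes that the 'kindIs "map"' guard skips the
annotation entirely in that case.

```python
DEFAULT_SOURCE = "redhat-operators"


def mce_subscription_source(values):
    """Return the catalog source for the annotation, or None when it is skipped."""
    subscriptions = values.get("clusterGroup", {}).get("subscriptions")
    if not isinstance(subscriptions, dict):
        # Assumed behaviour of the 'kindIs "map"' guard: list-style
        # subscriptions are left untouched and no annotation is emitted.
        return None
    acm = subscriptions.get("acm") or {}
    # Helm's `default` falls back on empty values too, hence `or`.
    return acm.get("source") or DEFAULT_SOURCE


# "my-iib-source" is a made-up catalog source name for illustration.
print(mce_subscription_source({"clusterGroup": {"subscriptions": {"acm": {}}}}))
print(mce_subscription_source(
    {"clusterGroup": {"subscriptions": {"acm": {"source": "my-iib-source"}}}}
))
```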

* Fix typo in README for iib

* Simplify the README a bit

* Add support for extraParams being passed down to all applications

Via validatedpatterns/patterns-operator#74
we add the extraParams in an extraParametersNested dictionary that holds
the extraParams key/value pairs. If they exist, let's add them as
parameters.

This allows them to end up in the applications.
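
A rough sketch of that pass-through, assuming the nested dictionary simply
maps parameter names to values and that each pair becomes one name/value
entry in the application's helm parameters. The function name and the sample
key are hypothetical; the real work is done in the chart template.

```python
def nested_to_parameters(extra_parameters_nested):
    """Turn a {name: value} dictionary into a list of name/value parameters."""
    if not extra_parameters_nested:
        # Nothing was passed down, so no extra parameters are added.
        return []
    return [
        {"name": name, "value": str(value)}
        for name, value in extra_parameters_nested.items()
    ]


params = nested_to_parameters({"main.gitops.operatorSource": "iib-49232"})
print(params)
```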

* Add a lookup playbook to figure out IIB numbers

* Allow overriding channel and source when installing the patterns-operator

This will allow us to test the patterns-operator using a different
catalogsource (potentially installed via an IIB). So we can run:

make EXTRA_HELM_OPTS="\
  --set main.extraParameters[0].name=main.patternsOperator.channel --set main.extraParameters[0].value=slow \
  --set main.extraParameters[1].name=main.patternsOperator.source --set main.extraParameters[1].value=patten-index" install
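
Writing the indexed `--set` pairs by hand gets error-prone with more than a
couple of parameters. The helper below is a small sketch that builds the same
string from a dictionary; `extra_parameters_to_helm_opts` is a hypothetical
name and not part of the repository.

```python
def extra_parameters_to_helm_opts(params):
    """Build the indexed `--set main.extraParameters[i]...` flags for a dict."""
    flags = []
    for index, (name, value) in enumerate(params.items()):
        flags.append(f"--set main.extraParameters[{index}].name={name}")
        flags.append(f"--set main.extraParameters[{index}].value={value}")
    return " ".join(flags)


# Same names and values as in the command shown in this commit message.
opts = extra_parameters_to_helm_opts({
    "main.patternsOperator.channel": "slow",
    "main.patternsOperator.source": "patten-index",
})
print(f'make EXTRA_HELM_OPTS="{opts}" install')
```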

* Fix small typo in iib instructions

* Drop a redirect and up retries when pushing the IIB to the internal registry

* Update ESO to v0.8.3

* WIP add presync for eso that waits for vault to be up

* Add tests

* Fix image and comment

* Adding rbac to support the vault sa checking on the vault-0 pod status.

* Make Test

* Revert "Make Test"

This reverts commit 64e9dc7.

* Revert "Adding rbac to support the vault sa checking on the vault-0 pod status."

This reverts commit 598bc74.

* Revert "Fix image and comment"

This reverts commit d4d3fe1.

* Revert "Add tests"

This reverts commit ab5532a.

* Revert "WIP add presync for eso that waits for vault to be up"

This reverts commit 2797699.

* Increase the default retry limit when syncing

ArgoCD will retry 5 times by default to sync an application in case of
errors and then will give up. So if an application contains a reference
to a CRD that has not been installed yet (say because it will be
installed by another application), it will error out and retry later.
This happens by default for a maximum of 5 times [1]. After those 5 times
the application will give up and will stay in Degraded mode and
eventually move to Failed. In this case a manual sync will usually fix
the application just fine (i.e. as long as the missing CRD has been
installed in the meantime).

Now to solve this issue we can add complex preSync Jobs that wait for
the needed resources, but this fundamentally breaks the simplicity of
things and introduces unneeded dependencies. In this change we just
increase the default retry limit to something larger (20) that should
cover most cases. The retry limit functionality is currently rather
undocumented but is defined at [2] and also shown at [3].

In our patterns' case the concrete issue happened as follows:
1. ESO ClusterSecrets were often not synced/degraded
2. We introduced a Job in a preSync hook for the ESO chart that would
   wait on vault to be ready before applying the rest of ESO
3. MCG started failing because the config-demo app had already tried to
   sync 5 times and failed every time because the ESO CRDs were not
   installed yet (due to ESO waiting on vault)

So instead of adding yet another job, let's just try a lot more often.
We picked 20 as a sane default because that should have argo try for
up to about 60 minutes (20 retries at the 3min default maximum backoff).
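
The estimate can be sanity-checked with a short sketch. It assumes that the
wait before each retry is `duration * factor^n`, capped at a maximum, and
that the defaults are 5s, factor 2 and 3min. Only the 3min maximum is stated
in this message; the other two values and the exact formula are assumptions
and should be checked against the Argo CD version in use.

```python
from datetime import timedelta


def retry_delays(limit, base=timedelta(seconds=5), factor=2,
                 cap=timedelta(minutes=3)):
    """Return the assumed wait before each retry: base * factor**n, capped."""
    return [min(base * (factor ** attempt), cap) for attempt in range(limit)]


# Upper bound: every retry waits for the full maximum backoff.
upper_bound = 20 * timedelta(minutes=3)
# Estimate under the assumed exponential ramp-up.
total = sum(retry_delays(20), timedelta())

print(upper_bound)  # 1:00:00
print(total)        # 0:47:15 under the assumptions above
```

Under these assumptions the first six waits are shorter than the cap, so the
total comes out somewhat below the 60-minute upper bound.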

Tested with two MCG installations (with the ESO Job hook included) and
both worked out of the box, whereas before I managed to get three
failures out of three installs.

[1] https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L1680
[2] https://github.com/argoproj/argo-cd/blob/master/manifests/crds/application-crd.yaml#L1476
[3] https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/application.yaml#L202C18-L202C100

* Add Changes.md entry

* Removed previous version of common to convert to subtree from https://github.com/hybrid-cloud-patterns/common.git main

* make test

---------

Co-authored-by: Lester Claudio <[email protected]>
Co-authored-by: Lorenzo Dalrio <[email protected]>
Co-authored-by: Michele Baldessari <[email protected]>
Co-authored-by: Andrew Beekhof <[email protected]>
Co-authored-by: Martin Jackson <[email protected]>
6 people authored Jul 8, 2023
1 parent 41fb72b commit cf3e687
Showing 36 changed files with 143 additions and 704 deletions.
5 changes: 5 additions & 0 deletions common/Changes.md
@@ -1,5 +1,10 @@
# Changes

## Jul 8, 2023

* Introduced a default of 20 for sync failures retries in argo applications (global override via global.options.applicationRetryLimit
and per-app override via .syncPolicy)

## May 22, 2023

* Upgraded ESO to 0.8.2
2 changes: 2 additions & 0 deletions common/acm/templates/policies/application-policies.yaml
@@ -95,6 +95,8 @@ spec:
automated:
prune: false
selfHeal: true
retry:
limit: {{ default 20 $.Values.global.options.applicationRetryLimit }}
ignoreDifferences:
- group: apps
kind: Deployment
4 changes: 4 additions & 0 deletions common/clustergroup/templates/plumbing/applications.yaml
@@ -35,6 +35,8 @@ spec:
{{- else }}
syncPolicy:
automated: {}
retry:
limit: {{ default 20 $.Values.global.options.applicationRetryLimit }}
{{- end }}
{{- if .ignoreDifferences }}
ignoreDifferences: {{ .ignoreDifferences | toPrettyJson }}
@@ -239,6 +241,8 @@ spec:
{{- else }}
syncPolicy:
automated: {}
retry:
limit: {{ default 20 $.Values.global.applicationRetryLimit }}
# selfHeal: true
{{- end }}
---
4 changes: 4 additions & 0 deletions common/clustergroup/values.schema.json
@@ -192,6 +192,10 @@
"type": "string",
"deprecated": true,
"description": "This is used to approval strategy for the subscriptions of OpenShift Operators being installed. You can choose Automatic or Manual updates. NOTE: This setting is now available in the subcriptions description in the values file."
},
"applicationRetryLimit": {
"type": "integer",
"description": "Number of failed sync attempt retries; unlimited number of attempts if less than 0"
}
},
"required": [
1 change: 1 addition & 0 deletions common/clustergroup/values.yaml
@@ -5,6 +5,7 @@ global:
useCSV: True
syncPolicy: Automatic
installPlanApproval: Automatic
applicationRetryLimit: 20

enabled: "all"


This file was deleted.

This file was deleted.

This file was deleted.

30 changes: 0 additions & 30 deletions common/golang-external-secrets/values.yaml
@@ -18,33 +18,3 @@ external-secrets:
certController:
image:
tag: v0.8.3-ubi

rbac:
roles:
- name: view-pods
createRole: true
apiGroups:
- '""'
scope:
cluster: false
namespace: vault
resources:
- pods
verbs:
- "get"
- "list"
- "watch"
roleBindings:
- name: view-pods-rb
createBinding: true
scope:
cluster: false
namespace: vault
subjects:
kind: ServiceAccount
name: vault
namespace: vault
apiGroup: ""
roleRef:
kind: Role
name: view-pods
2 changes: 2 additions & 0 deletions common/tests/acm-industrial-edge-hub.expected.yaml
@@ -246,6 +246,8 @@ spec:
automated:
prune: false
selfHeal: true
retry:
limit: 20
ignoreDifferences:
- group: apps
kind: Deployment
2 changes: 2 additions & 0 deletions common/tests/acm-medical-diagnosis-hub.expected.yaml
@@ -237,6 +237,8 @@ spec:
automated:
prune: false
selfHeal: true
retry:
limit: 20
ignoreDifferences:
- group: apps
kind: Deployment
4 changes: 4 additions & 0 deletions common/tests/acm-normal.expected.yaml
@@ -655,6 +655,8 @@ spec:
automated:
prune: false
selfHeal: true
retry:
limit: 20
ignoreDifferences:
- group: apps
kind: Deployment
@@ -747,6 +749,8 @@ spec:
automated:
prune: false
selfHeal: true
retry:
limit: 20
ignoreDifferences:
- group: apps
kind: Deployment
[file header missing in source]
@@ -145,6 +145,7 @@ data:
localClusterDomain: apps.region.example.com
namespace: pattern-namespace
options:
applicationRetryLimit: 20
installPlanApproval: Automatic
syncPolicy: Manual
useCSV: true
@@ -382,6 +383,8 @@ spec:
}
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -428,6 +431,8 @@ spec:
value: apps.region.example.com
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/argocd.yaml
17 changes: 17 additions & 0 deletions common/tests/clustergroup-industrial-edge-hub.expected.yaml
@@ -306,6 +306,7 @@ data:
localClusterDomain: apps.region.example.com
namespace: pattern-namespace
options:
applicationRetryLimit: 20
installPlanApproval: Automatic
syncPolicy: Manual
useCSV: true
@@ -713,6 +714,8 @@ spec:
]
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -759,6 +762,8 @@ spec:
value: apps.region.example.com
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -805,6 +810,8 @@ spec:
value: apps.region.example.com
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -881,6 +888,8 @@ spec:
]
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -927,6 +936,8 @@ spec:
value: apps.region.example.com
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -973,6 +984,8 @@ spec:
value: apps.region.example.com
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -997,6 +1010,8 @@ spec:
}
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/applications.yaml
@@ -1061,6 +1076,8 @@ spec:
value: "1.10.3-ubi"
syncPolicy:
automated: {}
retry:
limit: 20
# selfHeal: true
---
# Source: pattern-clustergroup/templates/plumbing/argocd.yaml