Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETCD-604: Prune revisioned resources #1292

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tjungblu
Copy link
Contributor

@tjungblu tjungblu commented Jul 3, 2024

This cleans up some certificates that were copied but are not longer needed. At the same time, updating the docs to the state of today.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 3, 2024

@tjungblu: This pull request references ETCD-604 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

This cleans up some certificates that were copied but are not longer needed. At the same time, updating the docs to the state of today.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 3, 2024
@openshift-ci openshift-ci bot requested review from dusk125 and Elbehery July 3, 2024 15:06
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 3, 2024
@tjungblu
Copy link
Contributor Author

tjungblu commented Jul 3, 2024

seems kube-apiserver has this bundle hardcoded (wtf)

  - lastTransitionTime: "2024-07-03T15:26:51Z"
    message: |-
      GuardControllerDegraded: [Missing operand on node ip-10-0-107-23.us-east-2.compute.internal, Missing operand on node ip-10-0-121-222.us-east-2.compute.internal, Missing operand on node ip-10-0-8-249.us-east-2.compute.internal]
      InstallerControllerDegraded: missing required resources: [configmaps: etcd-serving-ca-1,kube-apiserver-audit-policies-1,kubelet-serving-ca-1,sa-token-signing-certs-1, secrets: etcd-client-1,localhost-recovery-client-token-1,localhost-recovery-serving-certkey-1]
      RevisionControllerDegraded: configmap "etcd-serving-ca" not found
    reason: GuardController_SyncError::InstallerController_Error::RevisionController_ContentCreationError
    status: "True"

edit: https://github.com/openshift/cluster-kube-apiserver-operator/blob/a88d4ae8e0a9c8d3457e8488d9da7f04680240d3/pkg/operator/resourcesynccontroller/resourcesynccontroller.go#L27-L32

I'll probably open a PR over there to use the "proper" bundle

@tjungblu
Copy link
Contributor Author

tjungblu commented Jul 4, 2024

okay, I pass, this is littered almost everywhere in openshift by now:
https://github.com/search?q=org%3Aopenshift%20%22etcd-serving-ca%22&type=code

reverting this part

This cleans up some certificates that were copied but are not longer
needed. At the same time, updating the docs to the state of today.

Signed-off-by: Thomas Jungblut <[email protected]>
* openshift-etcd/etcd-ca-bundle (etcd server, configmap, source of truth)
* openshift-etcd-operator/etcd-ca-bundle (for the operator to reach etcd)
* openshift-config/etcd-serving-ca (for apiserver and others to connect to etcd)
* openshift-config/etcd-ca-bundle (just for consistency’s sake, should replace etcd-serving-ca, but is very cumbersome)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting a change but just for consistency’s sake made me realize I forgot who is consuming openshift-config/etcd-ca-bundle since it's not the operator, apiserver or etcd as they all have their own copies of the CA.

Was this required for somewhere in bootstrap? Trying to recall if there's more here that we need to expand on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, let me try to remove it and see what's breaking :) If it's not needed we'll just kick it out here too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems unused. I leave this as its own commit, in case we need to revert it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1292/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-serial/1811018907415220224

[sig-api-machinery] API data in etcd [It] should be stored at the correct location and version for all resources [Serial] [Suite:openshift/conformance/serial]
  github.com/openshift/origin/test/extended/etcd/etcd_test_runner.go:33

    [FAILED] Unexpected error:
        <*errors.StatusError | 0xc000cb0e60>: 
        configmaps "etcd-ca-bundle" not found
        {
            ErrStatus: 
                code: 404
                details:
                  kind: configmaps
                  name: etcd-ca-bundle
                message: configmaps "etcd-ca-bundle" not found
                metadata: {}
                reason: NotFound
                status: Failure,
        }
    occurred
    In [It] at: github.com/openshift/origin/test/extended/etcd/etcd_test_runner.go:96 @ 07/10/24 14:58:27.997

We might be hitting this test preventing us from removing the configmap I think. Need to look more closely at what that test is doing or where we can update the test data. But in the interest of this PR we can also leave this as is.

https://github.com/openshift/origin/blob/bd4b3b87e16002820359555330093c05ef00ba52/test/extended/etcd/etcd_storage_path.go#L218-L222

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then let's roll it back for now. I've just also looked into the 4.14 branch, where we have pulled it from openshift-config:
https://github.com/openshift/cluster-etcd-operator/blob/release-4.14/pkg/operator/resourcesynccontroller/resourcesynccontroller.go#L27-L32

which means the render command must've put it there in the first place. Then let's leave it for backward compatibility :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted the commit and adjusted the documentation accordingly

@tjungblu
Copy link
Contributor Author

/retest

@hasbro17
Copy link
Contributor

/lgtm
/retest-required

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 11, 2024
Copy link
Contributor

openshift-ci bot commented Jul 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hasbro17, tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +946 to +949
$ cd /etc/kubernetes/static-pod-resources/
$ mv etcd-pod-10 backup-etcd-pod-10
$ cp -r etcd-pod-9 etcd-pod-10
$ cp etcd-pod-9/etcd-pod.yaml /etc/kubernetes/manifests/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rue the day we might have to use this but this is good to know :)

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD f161fde and 2 for PR HEAD 7352ccc in total

Copy link
Contributor

openshift-ci bot commented Jul 11, 2024

@tjungblu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-operator-fips 7352ccc link false /test e2e-operator-fips
ci/prow/e2e-metal-ovn-ha-cert-rotation-shutdown 7352ccc link false /test e2e-metal-ovn-ha-cert-rotation-shutdown
ci/prow/e2e-aws-ovn-etcd-scaling 7352ccc link true /test e2e-aws-ovn-etcd-scaling
ci/prow/e2e-metal-ovn-sno-cert-rotation-shutdown 7352ccc link false /test e2e-metal-ovn-sno-cert-rotation-shutdown
ci/prow/e2e-aws-etcd-recovery 7352ccc link false /test e2e-aws-etcd-recovery
ci/prow/e2e-aws-etcd-certrotation 7352ccc link false /test e2e-aws-etcd-certrotation

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants