
E2E K8s Context Protection #4391

Open · reedjosh opened this issue Nov 25, 2024 · 8 comments
Labels
kind/feature · Categorizes issue or PR as related to a new feature.
triage/needs-information · Indicates an issue needs more information in order to work on it.

Comments

reedjosh commented Nov 25, 2024

What do you want to happen?

The initial E2E suite created via kubebuilder does not protect against using a production Kubernetes context.

This can result in removing monitoring and cert-manager from production clusters.

Tilt, for example, only allows the user to run against a context matching kind-* by default, and requires the user to intentionally allow other contexts.

🙏 Please add this protection. It is otherwise hard to ensure this doesn't crop up across a company.
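
A rough sketch of the kind of guard being requested, in the spirit of Tilt's allowlist. This is a hypothetical helper, not part of the current scaffold; the function name and message wording are illustrative:

package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// verifyTestContext (hypothetical) refuses to run the suite unless the active
// kubeconfig context looks like a local Kind cluster, mirroring Tilt's default
// kind-* allowlist. Other contexts would have to be allowed intentionally.
func verifyTestContext() error {
	out, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("failed to read current kubeconfig context: %w", err)
	}
	ctx := strings.TrimSpace(string(out))
	if !strings.HasPrefix(ctx, "kind-") {
		return fmt.Errorf("refusing to run e2e tests against context %q: only kind-* contexts are allowed by default", ctx)
	}
	return nil
}

Calling something like Expect(verifyTestContext()).To(Succeed()) at the top of BeforeSuite would abort the run before anything touches the cluster.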

Extra Labels

No response

reedjosh added the kind/feature label Nov 25, 2024
camilamacedo86 (Member) commented Nov 26, 2024

Hi @reedjosh,

Thank you for raising this issue.

Could you please let me know if you are using a project with the latest scaffold? I'm asking because this was already addressed.

  • See that in the Makefile, we check that Kind is installed and that a Kind cluster is running:

# TODO(user): To use a different vendor for e2e tests, modify the setup under 'tests/e2e'.
# The default setup assumes Kind is pre-installed and builds/loads the Manager Docker image locally.
# Prometheus and CertManager are installed by default; skip with:
# - PROMETHEUS_INSTALL_SKIP=true
# - CERT_MANAGER_INSTALL_SKIP=true
.PHONY: test-e2e
test-e2e: manifests generate fmt vet ## Run the e2e tests. Expected an isolated environment using Kind.
	@command -v kind >/dev/null 2>&1 || { \
		echo "Kind is not installed. Please install Kind manually."; \
		exit 1; \
	}
	@kind get clusters | grep -q 'kind' || { \
		echo "No Kind cluster is running. Please start a Kind cluster before running the e2e tests."; \
		exit 1; \
	}
	go test ./test/e2e/ -v -ginkgo.v

  • Then, see in the suite test scaffold that, before all tests, we try to load the image onto Kind. If the tests are executed against a cluster that is not Kind, this step should fail. Kind is not intended to be used in production, so the suite will not run against a production cluster without intentional changes:

// TODO(user): If you want to change the e2e test vendor from Kind, ensure the image is
// built and available before running the tests. Also, remove the following block.
By("loading the manager(Operator) image on Kind")
err = utils.LoadImageToKindClusterWithName(projectImage)
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to load the manager(Operator) image into Kind")

// The tests-e2e are intended to run on a temporary cluster that is created and destroyed for testing.
// To prevent errors when tests run in environments with Prometheus or CertManager already installed,
// we check for their presence before execution.
// Setup Prometheus and CertManager before the suite if not skipped and if not already installed
if !skipPrometheusInstall {
	By("checking if prometheus is installed already")
	isPrometheusOperatorAlreadyInstalled = utils.IsPrometheusCRDsInstalled()
	if !isPrometheusOperatorAlreadyInstalled {
		_, _ = fmt.Fprintf(GinkgoWriter, "Installing Prometheus Operator...\n")
		Expect(utils.InstallPrometheusOperator()).To(Succeed(), "Failed to install Prometheus Operator")
	} else {
		_, _ = fmt.Fprintf(GinkgoWriter, "WARNING: Prometheus Operator is already installed. Skipping installation...\n")
	}
}
if !skipCertManagerInstall {
	By("checking if cert manager is installed already")
	isCertManagerAlreadyInstalled = utils.IsCertManagerCRDsInstalled()
	if !isCertManagerAlreadyInstalled {
		_, _ = fmt.Fprintf(GinkgoWriter, "Installing CertManager...\n")
		Expect(utils.InstallCertManager()).To(Succeed(), "Failed to install CertManager")
	} else {
		_, _ = fmt.Fprintf(GinkgoWriter, "WARNING: CertManager is already installed. Skipping installation...\n")
	}
}
})

  • Moreover, we added checks to avoid re-installing Prometheus and CertManager, along with more comments to clarify how it works and help users.

If you are using this version, then we might add logic to exit(1) when it is not possible to load the image.
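
For illustration, that exit(1) logic might look roughly like this in the suite setup. This is only a sketch that reuses the scaffold's utils.LoadImageToKindClusterWithName and assumes an os import; the message wording is illustrative:

By("loading the manager(Operator) image on Kind")
if err := utils.LoadImageToKindClusterWithName(projectImage); err != nil {
	// Hypothetical hard stop: a failed image load usually means the target
	// cluster is not a Kind cluster, so abort instead of continuing.
	_, _ = fmt.Fprintf(GinkgoWriter, "failed to load the manager(Operator) image into Kind: %v\n", err)
	os.Exit(1)
}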

We are looking forward to your reply.

camilamacedo86 added the triage/needs-information label Nov 26, 2024
damsien (Contributor) commented Nov 26, 2024

What if the KUBECONFIG points to a production cluster and a kind cluster also exists? I think the image will still be loaded into the kind cluster (since kind load docker-image --name <CLUSTER_NAME> specifies which cluster receives the image) while the KUBECONFIG points at another cluster, so the tests themselves would run against that other cluster.
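
One way to close exactly that gap would be to assert, before loading anything, that the active context points at the Kind cluster the suite targets. A sketch only; kindClusterName is a hypothetical variable, and the exec/strings imports are assumed:

By("verifying the active kubeconfig context points at the Kind test cluster")
// kind load honors --name, but kubectl and the tests follow the kubeconfig's
// current context, so assert that the two actually agree before proceeding.
out, err := exec.Command("kubectl", "config", "current-context").Output()
Expect(err).NotTo(HaveOccurred(), "Failed to read the current kubeconfig context")
Expect(strings.TrimSpace(string(out))).To(Equal("kind-"+kindClusterName),
	"current context does not match the Kind cluster targeted by 'kind load'")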

monteiro-renato (Contributor) commented:

Slight tangent, but this check is probably not working as intended.
When you create a cluster with, for example, kind create cluster -n my-cluster, the output of kind get clusters will not have the kind- prefix; it will show the same name that was passed to the -n flag, e.g.

$ kind get clusters
linkerd-cluster
my-cluster

The kind- prefix is, however, added in the kubeconfig file, e.g.

$ cat ~/.kube/config | yq '.clusters.[].name'
"kind-linkerd-cluster"
"kind-my-cluster"

damsien (Contributor) commented Nov 26, 2024

@monteiro-renato I think that's how they ensure the cluster is not a production cluster, because production clusters are usually not prefixed with kind-. If you want to run the tests, you must have a kind- prefixed local cluster.

monteiro-renato (Contributor) commented:

Yeah, I can see it now. When I got that error a while back I found it weird, since I knew I had a Kind cluster running and the correct context configured.
I'm usually quite paranoid about tools that default to whatever is configured in the kubeconfig, so I prefer to create separate environments (container or VM) to have peace of mind that I will never target a cluster I'm not expecting.
But yeah, I guess the error message could be a bit more explicit about its intent.

monteiro-renato (Contributor) commented:

But yeah, I think OP's concern is still valid. The validation should probably be done against the context configured in the kubeconfig instead of relying on the output of a kind command.
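
As a sketch of that direction, the check could load the kubeconfig directly. This uses k8s.io/client-go/tools/clientcmd, which kubebuilder projects typically already pull in via controller-runtime; the helper names are hypothetical:

package e2e

import (
	"fmt"
	"strings"

	"k8s.io/client-go/tools/clientcmd"
)

// currentKubeContext reads the current context straight from the kubeconfig
// (honoring the KUBECONFIG env var) instead of inferring anything from the
// output of a kind command.
func currentKubeContext() (string, error) {
	cfg, err := clientcmd.NewDefaultClientConfigLoadingRules().Load()
	if err != nil {
		return "", fmt.Errorf("failed to load kubeconfig: %w", err)
	}
	return cfg.CurrentContext, nil
}

// isKindContext applies the kind-* naming convention to the context name
// itself, which is where the prefix actually appears.
func isKindContext(ctx string) bool {
	return strings.HasPrefix(ctx, "kind-")
}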

camilamacedo86 (Member) commented Nov 26, 2024

Hi @monteiro-renato, @damsien, @reedjosh,

How can we validate the context or name to determine whether it’s a production environment for the user?
This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment, such as make deploy, make install, or make undeploy, they may encounter serious issues.

Additionally, frameworks generally don’t handle this kind of validation. For example, tools like Terraform, Helm, etc., don’t verify whether a command targets a production environment. At some level, the user must understand the commands they’re running and the clusters they’re targeting.

On top of that:

  • Cluster admins should safeguard production environments by applying RBAC to restrict access for developers.
  • Developers should avoid configuring a production context in their local dev environment and just leaving it there. They should keep actions that target production separate from the development environment to prevent accidental operations. This is not specific to e2e tests; it applies to every action a developer might execute with kubectl.

This was already discussed: we cannot inspect the context developers have configured when they run the commands and decide, "Ah, it's production, and it is not for the e2e tests," just as we do not perform such validation for any other Makefile target or feature.

I hope that clarifies things. I am looking forward to learning whether @reedjosh is using a project with all the changes related to this request (as described in #4391 (comment)). I would really appreciate it if @reedjosh could share this information with us.

Thank you.

monteiro-renato (Contributor) commented:

Hey @camilamacedo86,

We could create a kind cluster based on the project's name and then use kind's get kubeconfig subcommand to generate the kubeconfig file to use in the e2e tests. The tests would set the KUBECONFIG env var at the start of the tests to point to the location of the kubeconfig file generated by kind.
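
A sketch of that idea as a setup helper; the function name and file location are illustrative, but kind get kubeconfig is a real subcommand that prints the cluster's kubeconfig to stdout:

package e2e

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// useKindKubeconfig (hypothetical) exports the kubeconfig for the named Kind
// cluster into a dedicated file and points KUBECONFIG at it, so the suite can
// never accidentally follow whatever context the developer has configured.
func useKindKubeconfig(clusterName string) error {
	out, err := exec.Command("kind", "get", "kubeconfig", "--name", clusterName).Output()
	if err != nil {
		return fmt.Errorf("failed to export kubeconfig for Kind cluster %q: %w", clusterName, err)
	}
	path := filepath.Join(os.TempDir(), clusterName+"-kubeconfig")
	if err := os.WriteFile(path, out, 0o600); err != nil {
		return fmt.Errorf("failed to write kubeconfig to %s: %w", path, err)
	}
	return os.Setenv("KUBECONFIG", path)
}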
