
E2E K8s Context Protection #4391

Open · reedjosh opened this issue Nov 25, 2024 · 8 comments
Labels
kind/feature · Categorizes issue or PR as related to a new feature.
triage/needs-information · Indicates an issue needs more information in order to work on it.

Comments

reedjosh commented Nov 25, 2024

What do you want to happen?

The initial E2E suite created via kubebuilder does not protect against using a production Kubernetes context.

This can result in removing monitoring and cert-manager from production clusters.

Tilt, for example, only allows the user to run against a context matching kind-* by default, and requires the user to intentionally allow other contexts.

🙏 Please add this protection. It is otherwise hard to ensure this doesn't crop up across a company.
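
A rough sketch of the kind of guard being requested, in the spirit of Tilt's allowlist. This is a hypothetical helper, not part of the current scaffold; the function name and message wording are illustrative:

package e2e

import (
	"fmt"
	"os/exec"
	"strings"
)

// verifyTestContext (hypothetical) refuses to run the suite unless the active
// kubeconfig context looks like a local Kind cluster, mirroring Tilt's default
// kind-* allowlist. Other contexts would have to be allowed intentionally.
func verifyTestContext() error {
	out, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		return fmt.Errorf("failed to read current kubeconfig context: %w", err)
	}
	ctx := strings.TrimSpace(string(out))
	if !strings.HasPrefix(ctx, "kind-") {
		return fmt.Errorf("refusing to run e2e tests against context %q: only kind-* contexts are allowed by default", ctx)
	}
	return nil
}

Calling something like Expect(verifyTestContext()).To(Succeed()) at the top of BeforeSuite would abort the run before anything touches the cluster.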

Extra Labels

No response

reedjosh added the kind/feature label Nov 25, 2024
camilamacedo86 (Member) commented Nov 26, 2024

Hi @reedjosh,

Thank you for raising this issue.

Could you please let me know if you are using a project with the latest scaffold? I'm asking because this was already addressed.

  • See that in the Makefile, we check that Kind is installed and that a Kind cluster is running:

# TODO(user): To use a different vendor for e2e tests, modify the setup under 'tests/e2e'.
# The default setup assumes Kind is pre-installed and builds/loads the Manager Docker image locally.
# Prometheus and CertManager are installed by default; skip with:
# - PROMETHEUS_INSTALL_SKIP=true
# - CERT_MANAGER_INSTALL_SKIP=true
.PHONY: test-e2e
test-e2e: manifests generate fmt vet ## Run the e2e tests. Expected an isolated environment using Kind.
	@command -v kind >/dev/null 2>&1 || { \
		echo "Kind is not installed. Please install Kind manually."; \
		exit 1; \
	}
	@kind get clusters | grep -q 'kind' || { \
		echo "No Kind cluster is running. Please start a Kind cluster before running the e2e tests."; \
		exit 1; \
	}
	go test ./test/e2e/ -v -ginkgo.v

  • Then, see in the suite test scaffold that, before all tests, we try to load the image onto Kind. If the tests are executed against a cluster that is not Kind, this step should fail. Kind is not intended to be used in production, so the suite will not run against a production cluster without intentional changes:

// TODO(user): If you want to change the e2e test vendor from Kind, ensure the image is
// built and available before running the tests. Also, remove the following block.
By("loading the manager(Operator) image on Kind")
err = utils.LoadImageToKindClusterWithName(projectImage)
ExpectWithOffset(1, err).NotTo(HaveOccurred(), "Failed to load the manager(Operator) image into Kind")

// The tests-e2e are intended to run on a temporary cluster that is created and destroyed for testing.
// To prevent errors when tests run in environments with Prometheus or CertManager already installed,
// we check for their presence before execution.
// Setup Prometheus and CertManager before the suite if not skipped and if not already installed
if !skipPrometheusInstall {
	By("checking if prometheus is installed already")
	isPrometheusOperatorAlreadyInstalled = utils.IsPrometheusCRDsInstalled()
	if !isPrometheusOperatorAlreadyInstalled {
		_, _ = fmt.Fprintf(GinkgoWriter, "Installing Prometheus Operator...\n")
		Expect(utils.InstallPrometheusOperator()).To(Succeed(), "Failed to install Prometheus Operator")
	} else {
		_, _ = fmt.Fprintf(GinkgoWriter, "WARNING: Prometheus Operator is already installed. Skipping installation...\n")
	}
}
if !skipCertManagerInstall {
	By("checking if cert manager is installed already")
	isCertManagerAlreadyInstalled = utils.IsCertManagerCRDsInstalled()
	if !isCertManagerAlreadyInstalled {
		_, _ = fmt.Fprintf(GinkgoWriter, "Installing CertManager...\n")
		Expect(utils.InstallCertManager()).To(Succeed(), "Failed to install CertManager")
	} else {
		_, _ = fmt.Fprintf(GinkgoWriter, "WARNING: CertManager is already installed. Skipping installation...\n")
	}
}
})

  • Moreover, we added checks to avoid re-installing Prometheus and CertManager, along with more comments to clarify how it works and help users.

If you are using this version, then we might add logic to exit(1) when it is not possible to load the image.
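
For illustration, that exit(1) logic might look roughly like this in the suite setup. This is only a sketch that reuses the scaffold's utils.LoadImageToKindClusterWithName and assumes an os import; the message wording is illustrative:

By("loading the manager(Operator) image on Kind")
if err := utils.LoadImageToKindClusterWithName(projectImage); err != nil {
	// Hypothetical hard stop: a failed image load usually means the target
	// cluster is not a Kind cluster, so abort instead of continuing.
	_, _ = fmt.Fprintf(GinkgoWriter, "failed to load the manager(Operator) image into Kind: %v\n", err)
	os.Exit(1)
}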

We are looking forward to your reply.

camilamacedo86 added the triage/needs-information label Nov 26, 2024
damsien (Contributor) commented Nov 26, 2024

What if the KUBECONFIG points to a production cluster and a kind cluster also exists? I think the image will still be loaded into the kind cluster (since kind load docker-image --name <CLUSTER_NAME> specifies which cluster receives the image) while the KUBECONFIG points at another cluster, so the tests themselves would run against that other cluster.
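
One way to close exactly that gap would be to assert, before loading anything, that the active context points at the Kind cluster the suite targets. A sketch only; kindClusterName is a hypothetical variable, and the exec/strings imports are assumed:

By("verifying the active kubeconfig context points at the Kind test cluster")
// kind load honors --name, but kubectl and the tests follow the kubeconfig's
// current context, so assert that the two actually agree before proceeding.
out, err := exec.Command("kubectl", "config", "current-context").Output()
Expect(err).NotTo(HaveOccurred(), "Failed to read the current kubeconfig context")
Expect(strings.TrimSpace(string(out))).To(Equal("kind-"+kindClusterName),
	"current context does not match the Kind cluster targeted by 'kind load'")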

monteiro-renato (Contributor) commented:

Slight tangent, but this check is probably not working as intended.
When you create a cluster with, for example, kind create cluster -n my-cluster, the output of kind get clusters will not have the kind- prefix; it will show the same name that was passed to the -n flag, e.g.

$ kind get clusters
linkerd-cluster
my-cluster

The kind- prefix is, however, added in the kubeconfig file, e.g.

$ cat ~/.kube/config | yq '.clusters.[].name'
"kind-linkerd-cluster"
"kind-my-cluster"

damsien (Contributor) commented Nov 26, 2024

@monteiro-renato I think that's how they ensure the cluster is not a production cluster, because production clusters are usually not prefixed with kind-. If you want to run the tests, you must have a kind- prefixed local cluster.

monteiro-renato (Contributor) commented:

Yeah, I can see it now. When I got that error a while back I found it weird, since I knew I had a Kind cluster running and the correct context configured.
I'm usually quite paranoid about tools that default to whatever is configured in the kubeconfig, so I prefer to create separate environments (container or VM) to have peace of mind that I will never target a cluster I'm not expecting.
But yeah, I guess the error message could be a bit more explicit about its intent.

monteiro-renato (Contributor) commented:

But yeah, I think OP's concern is still valid. The validation should probably be done against the context configured in the kubeconfig instead of relying on the output of a kind command.
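
As a sketch of that direction, the check could load the kubeconfig directly. This uses k8s.io/client-go/tools/clientcmd, which kubebuilder projects typically already pull in via controller-runtime; the helper names are hypothetical:

package e2e

import (
	"fmt"
	"strings"

	"k8s.io/client-go/tools/clientcmd"
)

// currentKubeContext reads the current context straight from the kubeconfig
// (honoring the KUBECONFIG env var) instead of inferring anything from the
// output of a kind command.
func currentKubeContext() (string, error) {
	cfg, err := clientcmd.NewDefaultClientConfigLoadingRules().Load()
	if err != nil {
		return "", fmt.Errorf("failed to load kubeconfig: %w", err)
	}
	return cfg.CurrentContext, nil
}

// isKindContext applies the kind-* naming convention to the context name
// itself, which is where the prefix actually appears.
func isKindContext(ctx string) bool {
	return strings.HasPrefix(ctx, "kind-")
}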

camilamacedo86 (Member) commented Nov 26, 2024

Hi @monteiro-renato, @damsien, @reedjosh,

How can we validate the context or name to determine whether it’s a production environment for the user?
This goes beyond what we can reasonably enforce. Moreover, if users execute any Makefile target against a production environment, such as make deploy, make install, or make undeploy, they may encounter serious issues.

Additionally, frameworks generally don’t handle this kind of validation. For example, tools like Terraform, Helm, etc., don’t verify whether a command targets a production environment. At some level, the user must understand the commands they’re running and the clusters they’re targeting.

On top of that:

  • Cluster admins should safeguard production environments by applying RBAC to restrict access for developers.
  • Developers should avoid configuring a production context in their local dev environment and just leaving it there. They should keep actions that target production separate from the development environment to prevent accidental operations. This is not specific to e2e tests; it applies to every action a developer might execute with kubectl.

This was already discussed: we cannot inspect the context developers have configured when they run the commands and decide, "Ah, it's production, and it is not for the e2e tests," just as we do not perform such validation for any other Makefile target or feature.

I hope that clarifies things. I am looking forward to learning whether @reedjosh is using a project with all the changes related to this request (as described in #4391 (comment)). I would really appreciate it if @reedjosh could share this information with us.

Thank you.

monteiro-renato (Contributor) commented:

Hey @camilamacedo86,

We could create a kind cluster based on the project's name and then use kind's get kubeconfig subcommand to generate the kubeconfig file to use in the e2e tests. The tests would set the KUBECONFIG env var at the start of the tests to point to the location of the kubeconfig file generated by kind.
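
A sketch of that idea as a setup helper; the function name and file location are illustrative, but kind get kubeconfig is a real subcommand that prints the cluster's kubeconfig to stdout:

package e2e

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// useKindKubeconfig (hypothetical) exports the kubeconfig for the named Kind
// cluster into a dedicated file and points KUBECONFIG at it, so the suite can
// never accidentally follow whatever context the developer has configured.
func useKindKubeconfig(clusterName string) error {
	out, err := exec.Command("kind", "get", "kubeconfig", "--name", clusterName).Output()
	if err != nil {
		return fmt.Errorf("failed to export kubeconfig for Kind cluster %q: %w", clusterName, err)
	}
	path := filepath.Join(os.TempDir(), clusterName+"-kubeconfig")
	if err := os.WriteFile(path, out, 0o600); err != nil {
		return fmt.Errorf("failed to write kubeconfig to %s: %w", path, err)
	}
	return os.Setenv("KUBECONFIG", path)
}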
