
refactor: Upgrade to v18 of EKS module #64

Merged
merged 5 commits on Jul 20, 2022

Conversation

bryantbiggs (Contributor) commented Jun 9, 2022

This updates the project to use the latest v18 EKS module, keeping the overall layout similar to what was previously defined.

  • Managed node groups are used instead of self-managed node groups because of the number of benefits they provide to users operating EKS clusters.
  • The lockfile was removed and an entry added to the gitignore file - this is going to be specific to how users initialize their project, so I don't believe it's valid here
  • See other comments below for more context

Motivation

We would like to move users away from v17, and any newcomers should start off using v18. I don't know where the associated article repo is located, but I suspect it will need to be updated to reflect these changes: https://learn.hashicorp.com/tutorials/terraform/eks

Closes #14
Closes #19
Closes #22
Closes #25
Closes #28
Closes #29
Closes #31
Closes #33
Closes #35
Closes #42
Closes #43
Closes #51
Closes #56
Closes #61
Closes #63

Please let me know if there are any questions or concerns

eks-cluster.tf Outdated
workers_group_defaults = {
root_volume_type = "gp2"
# Extend cluster security group rules
cluster_security_group_additional_rules = {
bryantbiggs (Contributor Author)

The security group rules added here put the definition back in line with the v17 version. In v18 the security group rules were reduced to only the bare minimum required for a cluster to provision successfully, while allowing users to open or extend access as they see fit for their workload.
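
To illustrate, a minimal sketch of how access can be extended through the v18 module's rule maps; the specific rule names, ports, and protocols below are placeholders for illustration, not the exact rules added in this PR:

# Hedged sketch: attribute names follow the v18 terraform-aws-eks inputs,
# but the rules themselves are illustrative placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  # ... cluster configuration ...

  # Extend the cluster (control plane) security group
  cluster_security_group_additional_rules = {
    egress_nodes_ephemeral_ports_tcp = {
      description                = "Control plane to node ephemeral ports"
      protocol                   = "tcp"
      from_port                  = 1025
      to_port                    = 65535
      type                       = "egress"
      source_node_security_group = true
    }
  }

  # Extend the shared node security group
  node_security_group_additional_rules = {
    ingress_self_all = {
      description = "Node to node, all ports and protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }
  }
}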

host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

exec {
bryantbiggs (Contributor Author)

The exec plugin method is recommended for EKS by the provider to avoid token expiration: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins
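
As a rough sketch of that pattern (assuming the v18 module outputs referenced elsewhere in this diff, not a verbatim copy of the PR's provider block):

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # A fresh token is requested on each Terraform run, so it cannot expire mid-apply
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}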

@@ -1,12 +1,31 @@
# Kubernetes provider
bryantbiggs (Contributor Author)

It's common practice to have a generic main.tf for bits that are generic and not resource specific. This captures the providers, common local/data sources, and the random string resource used in the naming scheme.
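
A hedged sketch of that layout; the region variable is an assumption for illustration, while the data source, random string, and cluster name local are referenced elsewhere in this diff:

provider "aws" {
  region = var.region # assumed variable name
}

# Used to pick availability zones for the VPC subnets
data "aws_availability_zones" "available" {}

# Random suffix so repeated runs don't collide on cluster names
resource "random_string" "suffix" {
  length  = 8
  special = false
}

locals {
  cluster_name = "education-eks-${random_string.suffix.result}"
}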

value = module.eks.cluster_security_group_id
}

output "kubectl_config" {
bryantbiggs (Contributor Author)

This output (or an equivalent) is no longer provided in v18; users can retrieve a kubeconfig through the AWS CLI:

aws eks update-kubeconfig --region <REGION> --name <CLUSTER_NAME>

@@ -28,20 +27,3 @@ resource "aws_security_group" "worker_group_mgmt_two" {
]
}
}

resource "aws_security_group" "all_worker_mgmt" {
bryantbiggs (Contributor Author)

This resource looks like it was never used anywhere, so it was removed.


resource "aws_security_group" "worker_group_mgmt_one" {
name_prefix = "worker_group_mgmt_one"
resource "aws_security_group" "node_group_one" {
bryantbiggs (Contributor Author)

Naming modified to reflect what is used by the module and AWS: instead of "workers" it's "self-managed node groups", and instead of "node groups" it's "EKS managed node groups". Here I have just used the generic "node group" for brevity.

}

random = {
source = "hashicorp/random"
version = "3.1.0"
}

local = {
bryantbiggs (Contributor Author)

The local and null providers are not utilized here, so they were removed.

@@ -2,27 +2,17 @@ terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 3.20.0"
version = ">= 3.72"
bryantbiggs (Contributor Author)

Not really pertinent here, but set to match the module's version requirements.

cluster_name = "education-eks-${random_string.suffix.result}"
}
cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)
bryantbiggs (Contributor Author)

Adding slice() since some regions have more than 3 AZs, and we want to ensure we only pull 3 to match the prescribed subnets.

enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true

tags = {
bryantbiggs (Contributor Author)

Not required; removed.

@BrewedCoffee

Thanks for doing this :) would have definitely helped me a few weeks ago, and I'm sure it'll help hundreds or thousands to come!

@someshkoli

@bryantbiggs can you point me to the change where it's fixing terraform-aws-modules/terraform-aws-eks#978?

bryantbiggs (Contributor Author)

@im2nguyen any thoughts on this change?

im2nguyen (Collaborator)

Hi @bryantbiggs, thank you for updating this!

@judithpatudith can you have the team look into updating the tutorial? This is a high-value update.

alanszlosek (Contributor)

Thank you @bryantbiggs, this is a fantastic contribution!

I'm on the education team at HashiCorp and am hoping to update our EKS tutorial to work with the changes you've made in this PR. Using version 18.24.1 of the EKS module and version 4.21.0 of the AWS provider, the infrastructure deploys just fine (I upgraded to get rid of some TF warnings). But I'm unable to access the Kubernetes dashboard through the kubectl proxy. Can you see whether it works for you?

I can make other k8s API calls (like curl http://127.0.0.1:8001/api/v1/namespaces/kubernetes-dashboard/services/) through the proxy, but when I try to visit the dashboard URL it spins and then eventually returns this:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "error trying to reach service: dial tcp 10.0.1.76:8443: connect: connection timed out",
  "reason": "ServiceUnavailable",
  "code": 503
}

I have a hunch it has something to do with security groups, but I'm not familiar enough with EKS to know what to modify. Do you have any ideas?

Thanks again for your help, we really appreciate it.

bryantbiggs (Contributor Author)

ah yes, I think I know what it is - let me try on my end and update the PR shortly

bryantbiggs (Contributor Author) commented Jul 6, 2022

Validation steps:

  1. terraform init && terraform apply
  2. kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-controller-token | awk '{print $1}')
  3. Copy token from step 2
  4. kubectl proxy
  5. Open http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:https/proxy/
  6. Paste token

Please note that the project now provisions the metrics-server and kubernetes-dashboard through the Helm provider. The versions of those resources used in the current guide are quite a bit out of date, and now there are fewer manual tasks required by users (unless that is desired).
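
For context, a hedged sketch of what Helm-based provisioning of those add-ons can look like; the chart repositories are the upstream defaults and no versions are pinned here, so this is illustrative rather than the exact resources in the PR:

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
    }
  }
}

# Metrics server for kubectl top and the dashboard's resource graphs
resource "helm_release" "metrics_server" {
  name       = "metrics-server"
  repository = "https://kubernetes-sigs.github.io/metrics-server/"
  chart      = "metrics-server"
  namespace  = "kube-system"
}

# Kubernetes dashboard, reachable via kubectl proxy as described in the steps above
resource "helm_release" "kubernetes_dashboard" {
  name             = "kubernetes-dashboard"
  repository       = "https://kubernetes.github.io/dashboard/"
  chart            = "kubernetes-dashboard"
  namespace        = "kubernetes-dashboard"
  create_namespace = true
}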

cc @alanszlosek

alanszlosek (Contributor) commented Jul 12, 2022

Thanks, @bryantbiggs, this is great! Those changes fixed the proxy issue. Based on what I'm reading, attach_cluster_primary_security_group attaches the cluster security group to all servers within the cluster and permits all traffic between any servers with that SG attached. I'm guessing that change fixed the proxy issue by allowing ingress to nodes FROM non-node (control plane?) servers, whereas in earlier PR commits, nodes were only permitting connections from servers with the node SG and nothing else. Am I understanding things correctly?

Switching gears ...

I've been talking to my team about the k8s dashboard installation section of the tutorial. The goal of the section was to "verify" the cluster, but we think it would be sufficient and simpler to have the user run kubectl cluster-info and kubectl get nodes instead. What are your thoughts?

If you agree, this means we can remove the helm provider as well as the service account, role binding, metrics server, and dashboard resources from the PR entirely. I think you can revert some of your changes to the kubernetes provider as well.

Lastly, can you restore the terraform lock file? It's a recommendation within our docs, and it helps us ensure that learners don't encounter any version-related hiccups while going through a tutorial. Does that make sense?

Again, thank you!

bryantbiggs (Contributor Author)

I'm guessing that change fixed the proxy issue by allowing ingress to nodes FROM non-node (control plane?) servers. Whereas in earlier PR commits, nodes were only permitting connections from servers with the node SG and nothing else. Am I understanding things correctly?

Close - the EKS module in v18 was designed to only contain the bare minimum security group requirements to bring up a cluster successfully. Users then have the option to start opening up additional access to support the workloads they run. It lets users start from a secure, least-access perspective and only open up access for their specific workloads, maintaining a stronger security posture.

Using this cluster primary security group means that all traffic among the resources attached to the security group is permitted, and all egress to the public internet is allowed. This is the EKS "default behavior" when a security group isn't provided to nodes.
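
A minimal sketch of opting into that behavior with the v18 module; placing the flag under the node group defaults is an assumption for illustration, not necessarily how this PR sets it:

module "eks" {
  # ... cluster configuration ...

  eks_managed_node_group_defaults = {
    # Attach the cluster primary security group created by EKS so that all
    # attached resources can communicate, mirroring the EKS default behavior
    attach_cluster_primary_security_group = true
  }
}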

If you agree, this means we can remove the helm provider as well as the service account, role binding, metrics server, and dashboard resources from the PR entirely.

Absolutely. I wasn't sure of the intentions, so I just maintained the status quo, but I have now removed those.

Lastly, can you restore the terraform lock file?

Done! @alanszlosek

@alanszlosek alanszlosek changed the base branch from main to update-eks-module-18.0.4 July 20, 2022 14:45
@alanszlosek alanszlosek changed the base branch from update-eks-module-18.0.4 to update-eks-module-v18 July 20, 2022 14:46
alanszlosek (Contributor)

@bryantbiggs, that looks great. I'm going to merge this into a testing branch, update the tutorial text, and test it all end-to-end before release. Thanks for helping us make things better!

@alanszlosek alanszlosek merged commit 127b9fe into hashicorp:update-eks-module-v18 Jul 20, 2022
bryantbiggs (Contributor Author) commented Jul 20, 2022

Thank you @alanszlosek - let me know if I can be of any further assistance or if you have any questions on the module usage

Also, I think you can probably safely close out most, if not all, of the open issues/PRs with this. Not sure why it didn't automatically close them on this PR

Edit: Ah, nvm - I see it went into a temp branch and not main so they wouldn't be automatically closed on merge

@bryantbiggs bryantbiggs deleted the refactor/v18-upgrade branch July 20, 2022 15:00
alanszlosek (Contributor)

Thanks for your willingness, @bryantbiggs. I do have a question about the manage_aws_auth_configmap param. The EKS docs say this:

The aws-auth ConfigMap is automatically created and applied to your cluster when you create a managed node group or when you create a node group using eksctl.

Assuming I'm reading it correctly, it implies we shouldn't need Terraform to manage the ConfigMap, at least not for the tutorial. I'm hoping the mappings for each node group's IAM Role will be added automatically to the cluster when Terraform creates each group.

I know it's just one param, but if we don't need it, then I believe I can also remove the exec from the kubernetes provider. I'm on the lookout for additional ways to make the config simpler. :-)

Thanks for the heads-up on the Issues that can be closed. I'll make a note to close them.

bryantbiggs (Contributor Author)

Correct - with EKS managed node groups and Fargate profiles, those are automatically added to the configmap. If you want to remove that line, you can remove the Kubernetes provider block as well.
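
For reference, a hedged sketch of the attribute under discussion; the surrounding module configuration is elided:

module "eks" {
  # ... cluster configuration ...

  # Candidate for removal: EKS managed node groups get their IAM role mapped
  # into aws-auth automatically, so Terraform may not need to manage it here
  manage_aws_auth_configmap = true
}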
