Terraform Best Practices for AWS users.
Table of Contents
- Run terraform command with var-file
- Manage s3 backend for tfstate files
- Manage multiple Terraform modules and environments easily with Terragrunt
- Retrieve state metadata from a remote backend
- Turn on debug when you need to troubleshoot
- Use shared modules
- Isolate environment
- Use terraform import to include as many resources as you can
- Avoid hard coding the resources
- Validate and format terraform code
- Enable version control on terraform state files bucket
- Generate README for each module with input and output variables
- Update terraform version
- Run terraform in docker container
- Run tests
- Minimum AWS permissions necessary for a Terraform run
- Tips to deal with lambda functions
- Usage of variable "self"
- Use pre-installed Terraform plugins
- Tips to upgrade to terraform 0.12
- Useful documents you should read
The README for Terraform 0.11 and earlier has been renamed to README.0.11.md.
Run terraform command with var-file
$ cat config/dev.tfvars
name = "dev-stack"
s3_terraform_bucket = "dev-stack-terraform"
tag_team_name = "hello-world"
$ terraform plan -var-file=config/dev.tfvars
With var-file, you can easily manage per-environment (dev/stag/uat/prod) variables.
With var-file, you avoid running terraform with a long list of key-value pairs (-var foo=bar).
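Each key in config/dev.tfvars also needs a matching variable declaration in the configuration. Terraform warns about values for variables that are not declared. The following is a minimal sketch of those declarations. It assumes all three values are strings. The descriptions and the aws_sns_topic resource are illustrative and not part of the original setup:

```hcl
# variables.tf: declares the keys used in config/<env>.tfvars.
variable "name" {
  description = "Stack name, e.g. dev-stack"
  type        = string
}

variable "s3_terraform_bucket" {
  description = "S3 bucket holding the Terraform state files"
  type        = string
}

variable "tag_team_name" {
  description = "Team name applied as a tag"
  type        = string
}

# Illustrative consumer: the same code gets per-environment values
# from whichever tfvars file is passed with -var-file.
resource "aws_sns_topic" "alerts" {
  name = "${var.name}-alerts"

  tags = {
    Team = var.tag_team_name
  }
}
```

Switching environments then only changes the file name, for example terraform plan -var-file=config/prod.tfvars.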
Manage s3 backend for tfstate files
Terraform doesn't support interpolated variables in the backend configuration. A common workaround is a separate script that sets the S3 backend bucket name for each environment. I recommend hard-coding the values in a per-environment backend config file instead, as below.
Add the code below to your Terraform configuration files.
$ cat main.tf
terraform {
required_version = "~> 0.12"
backend "s3" {
encrypt = true
}
}
Define the backend variables for a particular environment:
$ cat config/backend-dev.conf
bucket = "<unique_bucket_name>-terraform-development"
key = "development/service-1.tfstate"
encrypt = true
region = "ap-southeast-2"
kms_key_id = "alias/terraform"
dynamodb_table = "terraform-lock"
- bucket - S3 bucket name; it has to be globally unique.
- key - set a meaningful name for each service and application, such as vpc.tfstate, application_name.tfstate, etc.
- dynamodb_table - optional; set it when you want to enable state locking.
After you set config/backend-dev.conf and config/dev.tfvars properly (for each environment), you can easily run terraform as below:
env=dev
terraform get -update=true
terraform init -backend-config=config/backend-${env}.conf
terraform plan -var-file=config/${env}.tfvars
terraform apply -var-file=config/${env}.tfvars
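The backend settings above assume that the DynamoDB lock table and the KMS alias already exist. The following is a minimal sketch of both. Keep them in a separate bootstrap configuration, because they must exist before terraform init can use them. The names match config/backend-dev.conf, and on-demand billing is an assumption. The S3 backend requires the lock table's partition key to be a string attribute named LockID.

```hcl
# Bootstrap resources for the S3 backend; apply these once, from a
# configuration that does not itself use this backend.

# Lock table: the S3 backend requires a string hash key named exactly "LockID".
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-lock"  # matches dynamodb_table in config/backend-dev.conf
  billing_mode = "PAY_PER_REQUEST" # on-demand; provisioned capacity also works
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# KMS key and the alias referenced by kms_key_id = "alias/terraform".
resource "aws_kms_key" "terraform" {
  description = "Encrypts Terraform state files"
}

resource "aws_kms_alias" "terraform" {
  name          = "alias/terraform"
  target_key_id = aws_kms_key.terraform.key_id
}
```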
Manage multiple Terraform modules and environments easily with Terragrunt
Terragrunt is a thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules. https://www.gruntwork.io
Sample for reference: https://github.com/gruntwork-io/terragrunt-infrastructure-live-example
Its README is long; if you need a quick start, follow the steps below:
# Install terraform and terragrunt
# Make sure you are in the right AWS account
$ aws s3 ls
# use terragrunt to deploy
$ git clone https://github.com/gruntwork-io/terragrunt-infrastructure-live-example.git
$ cd terragrunt-infrastructure-live-example
# For example, to deploy mysql to the stage environment in non-prod, region us-east-1:
$ cd non-prod/us-east-1/stage/mysql
$ terragrunt plan
# Confirm everything works
$ terragrunt apply
If you follow the Terragrunt settings properly, you don't need to care about backend state files and variable file paths in different environments. You can even run terragrunt plan-all to plan all modules together.
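The setting that makes this work is a remote_state block that Terragrunt reads from a parent folder. The following is a minimal sketch. It assumes Terragrunt 0.19 or later, which is the release line for Terraform 0.12 and is configured through terragrunt.hcl files. The bucket, region and table names are placeholders, and the folder layout follows the example repository. The Terraform code of each module still needs a backend "s3" block, such as the one in main.tf above, for Terragrunt to fill in.

```hcl
# --- terragrunt.hcl at the repository root ---
# Declares the S3 backend once for every module below it.
# Bucket, region and table names are placeholders.
remote_state {
  backend = "s3"
  config = {
    bucket = "<unique_bucket_name>-terraform-state"
    # One state file per module, keyed by its folder, e.g.
    # non-prod/us-east-1/stage/mysql/terraform.tfstate
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

# --- non-prod/us-east-1/stage/mysql/terragrunt.hcl ---
# Pulls in the root settings, so this module needs no backend config file.
include {
  path = find_in_parent_folders()
}
```

Because key is derived from the folder path, every module gets its own state file without a per-environment backend config file.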
Retrieve state metadata from a remote backend
Normally we manage Terraform resources in several layers, such as network, database and application layers. After you create the basic network resources (VPC, security groups, subnets, NAT gateway) in the vpc stack, your database and application layers should always refer to those resources directly via the terraform_remote_state data source.
Note: in Terraform v0.12+, you need to add the extra outputs attribute when referencing remote state values (data.terraform_remote_state.vpc.outputs.vpc_id instead of data.terraform_remote_state.vpc.vpc_id). Otherwise you will get an Unsupported attribute error.
data "terraform_remote_state" "vpc" {
backend = "s3"
config = {
bucket = var.s3_terraform_bucket
key = "${var.environment}/vpc.tfstate"
region = var.aws_region
}
}
# Retrieves the vpc_id and subnet_ids directly from remote backend state files.
resource "aws_xx_xxxx" "main" {
# ...
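# data_subnets is assumed to be a comma-separated string output; if it is a list, drop split()
subnet_ids = split(",", data.terraform_remote_state.vpc.outputs.data_subnets)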
vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
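Only root-module outputs of the vpc layer are visible through terraform_remote_state, so that layer has to export everything the other layers read. The following is a sketch of its outputs.tf. It assumes the layer defines a hypothetical aws_vpc.main and data subnets created with count as aws_subnet.data:

```hcl
# outputs.tf in the vpc layer. aws_vpc.main and aws_subnet.data are
# hypothetical names; aws_subnet.data is assumed to use count.
output "vpc_id" {
  value = aws_vpc.main.id
}

# Joined into one comma-separated string so consumers can split(",", ...) it.
# In 0.12 you could export the list itself (aws_subnet.data[*].id) instead,
# in which case consumers use the value directly without split().
output "data_subnets" {
  value = join(",", aws_subnet.data[*].id)
}
```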
Turn on debug when you need to troubleshoot
TF_LOG=DEBUG terraform <command>
# or if you run with terragrunt
TF_LOG=DEBUG terragrunt <command>
Use shared modules
Manage Terraform resources with shared modules; this saves a lot of coding time. No need to reinvent the wheel!
You can start from the links below:
Terraform modules don't support the count parameter currently (Terraform 0.12). You can follow this ticket for updates: hashicorp/terraform#953. Terraform 0.13 later added count and for_each support for module blocks.
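For example, the VPC layer can come from a community module instead of hand-written resources. The following sketch assumes the terraform-aws-modules/vpc/aws module from the Terraform Registry, with its 2.x releases assumed to be the ones targeting Terraform 0.12. The names and CIDR ranges are illustrative, so check the inputs documented for the version you pin.

```hcl
# Community VPC module from the Terraform Registry; the argument names
# are the module's inputs, the values are only examples.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.0"

  name = "dev-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["ap-southeast-2a", "ap-southeast-2b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}

# Module outputs are read as module.<name>.<output>.
output "vpc_id" {
  value = module.vpc.vpc_id
}
```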
Isolate environment
Sometimes developers like to create a security group and share it across all non-prod (dev/qa) environments. Don't do that; create resources with a different name for each environment and each resource.
variable "application" {
description = "application name"
default = "<replace_with_your_project_or_application_name, use short name if possible, because some resources have length limit on its name>"
}
variable "environment" {
description = "environment name"
default = "<replace_with_environment_name, such as dev, svt, prod, etc. Use short name if possible, because some resources have length limit on its name>"
}
locals {
name_prefix = "${var.application}-${var.environment}"
}
resource "<any_resource>" "custom_resource_name" {
name = "${local.name_prefix}-<resource_name>"
# ... other arguments
}
With that, you can easily give every resource a meaningful and unique name. You can also build more copies of the same application stack for different developers without changing much, for example by setting environment to dev, staging, uat, prod, etc.
Tip: some AWS resource names have length limits, such as less than 24 characters, so use short names when you define the application and environment variables.
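Here is a concrete case of the pattern: one security group per environment, named from local.name_prefix, so dev and qa never share a group. This is a minimal sketch. vpc_id is a hypothetical input that you would feed from the vpc layer, for example via terraform_remote_state.

```hcl
variable "vpc_id" {
  description = "VPC to create the group in (hypothetical input)"
  type        = string
}

# Named from the shared prefix, e.g. "myapp-dev-web" or "myapp-qa-web",
# so each environment gets its own group instead of sharing one.
resource "aws_security_group" "web" {
  name        = "${local.name_prefix}-web"
  description = "Web tier for ${local.name_prefix}"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "${local.name_prefix}-web"
    Environment = var.environment
  }
}
```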
Use terraform import to include as many resources as you can
Sometimes developers create resources manually. You need to identify these resources and use terraform import to bring them under Terraform management.
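In Terraform 0.12, terraform import only writes to the state and does not generate code, so you write the resource block first. The following is a sketch for a manually created S3 bucket. The resource name legacy and the bucket name are hypothetical.

```hcl
# Step 1: describe the manually created bucket in code.
# "legacy" and "mycompany-legacy-assets" are hypothetical names.
resource "aws_s3_bucket" "legacy" {
  bucket = "mycompany-legacy-assets"
}

# Step 2: attach the existing bucket to that address in the state:
#   terraform import -var-file=config/dev.tfvars aws_s3_bucket.legacy mycompany-legacy-assets
#
# Step 3: run terraform plan and adjust the block until the plan shows
# no changes, so the code matches what already exists in AWS.
```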
Avoid hard coding the resources
A sample of hard-coded values:
account_number="123456789012"
account_alias="mycompany"
region="us-east-2"
Instead, the current AWS account ID, account alias and region can be read directly from data sources.
# The attribute `${data.aws_caller_identity.current.account_id}` will be current account number.
data "aws_caller_identity" "current" {}
# The attribute `${data.aws_iam_account_alias.current.account_alias}` will be current account alias
data "aws_iam_account_alias" "current" {}
# The attribute `${data.aws_region.current.name}` will be current region
data "aws_region" "current" {}
# Set as [local values](https://www.terraform.io/docs/configuration/locals.html)
locals {
account_id = data.aws_caller_identity.current.account_id
account_alias = data.aws_iam_account_alias.current.account_alias
region = data.aws_region.current.name
}
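The locals can then replace every hard-coded account number and region. The following is a sketch. The /myapp/ log group prefix and the bucket suffix are hypothetical. It also assumes the standard aws partition; the aws_partition data source covers other partitions.

```hcl
# ARNs and names built from the locals instead of literal account IDs and
# regions. Assumes the standard "aws" partition.
data "aws_iam_policy_document" "write_logs" {
  statement {
    actions   = ["logs:CreateLogStream", "logs:PutLogEvents"]
    resources = ["arn:aws:logs:${local.region}:${local.account_id}:log-group:/myapp/*"]
  }
}

resource "aws_s3_bucket" "artifacts" {
  # S3 bucket names are global, so the alias and region help keep them unique.
  bucket = "${local.account_alias}-${local.region}-artifacts"
}
```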
Validate and format terraform code
Always run terraform fmt to format Terraform configuration files and keep them neat.
I use the code below in a Travis CI pipeline (you can reuse it in any pipeline) to validate and format-check the code before it can be merged to the master branch. In Terraform 0.12, terraform validate needs an initialized working directory, so the pipeline runs init without a backend first.
- terraform init -backend=false
- terraform validate
- terraform fmt -check=true -write=false -diff=true
One more check you can add is tflint:
- find . -type f -name "*.tf" -exec dirname {} \; | sort -u | while read -r dir; do pushd "$dir"; docker run --rm -v "$(pwd)":/data -t wata727/tflint; popd; done
Enable version control on terraform state files bucket
Always use S3 as the backend and enable versioning on the state bucket.
If you'd like to manage the Terraform state bucket with Terraform as well, I recommend tf_aws_tfstate_bucket, a repository I wrote that creates the bucket and replicates it to other regions automatically.
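If you prefer to write the bucket yourself, the following is a minimal sketch of a versioned, encrypted state bucket. It assumes AWS provider 2.x/3.x, where versioning and encryption are inline blocks. From provider 4.x they move to separate aws_s3_bucket_versioning and aws_s3_bucket_server_side_encryption_configuration resources. The bucket name is the placeholder from config/backend-dev.conf.

```hcl
# Versioning keeps every previous copy of each .tfstate file, so a
# corrupted or accidentally overwritten state can be restored.
resource "aws_s3_bucket" "terraform_state" {
  bucket = "<unique_bucket_name>-terraform-development"
  acl    = "private"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }

  # Guard against an accidental terraform destroy of the state bucket.
  lifecycle {
    prevent_destroy = true
  }
}
```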
Generate README for each module with input and output variables
You needn't manually maintain the USAGE section about input variables and outputs. A tool named terraform-docs can do the job for you.
Currently the original terraform-docs doesn't support Terraform 0.12+; follow this issue (terraform-docs/terraform-docs#62) for updates.
For now, there is a workaround:
# [Terraform >= 0.12]
docker run --rm \
-v $(pwd):/data \
cytopia/terraform-docs \
terraform-docs-012 --sort-inputs-by-required --with-aggregate-type-defaults md . > README.md
For details on how to run terraform-docs, check this repository: https://github.com/cytopia/docker-terraform-docs
For a simple sample to start with, see tf_aws_acme; its README is generated by terraform-docs.
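terraform-docs only documents what the code declares, so the generated README is only as useful as the description fields. The following is a sketch of module inputs and outputs written with that in mind. The names and the aws_acm_certificate.main resource are hypothetical.

```hcl
# terraform-docs reads these blocks to build the Inputs and Outputs tables:
# description, type and default for variables, description for outputs.
# A variable without a default is treated as required.
variable "domain_name" {
  description = "Domain name to request the certificate for"
  type        = string
}

variable "tags" {
  description = "Tags to apply to all resources"
  type        = map(string)
  default     = {}
}

output "certificate_arn" {
  description = "ARN of the issued certificate"
  value       = aws_acm_certificate.main.arn # hypothetical resource
}
```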
Update terraform version
HashiCorp doesn't have a good QA/build/release process for its software and does not follow semantic versioning rules.
For example, terraform init isn't compatible between 0.8 and 0.9. In 0.10, providers were split out of the core binary and are installed as plugins by terraform init.
So I recommend keeping Terraform updated to the latest version.
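Keeping up to date works best when the versions are pinned explicitly, so that an upgrade is a deliberate, reviewed change. The alternative is whatever binary happens to be installed. The following sketch uses Terraform 0.12 syntax. The constraints are examples, so set them to the versions you have tested.

```hcl
terraform {
  # Any 0.12.x release; refuses to run with 0.11 or 0.13.
  required_version = "~> 0.12.0"

  # Any AWS provider 2.x release; 0.12 accepts a plain constraint string here.
  required_providers {
    aws = "~> 2.0"
  }
}
```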
Run terraform in docker container
HashiCorp releases official Docker images for Terraform, so you can easily control which version you run.
I recommend running Terraform in a Docker container when you set up build jobs in a CI/CD pipeline.
TERRAFORM_IMAGE=hashicorp/terraform:0.12.3
TERRAFORM_CMD="docker run -ti --rm -w /app -v ${HOME}/.aws:/root/.aws -v ${HOME}/.ssh:/root/.ssh -v $(pwd):/app ${TERRAFORM_IMAGE}"
${TERRAFORM_CMD} init
${TERRAFORM_CMD} plan
Or with Terragrunt:
# (1) Mount the local folder to /apps in the container.
# (2) Mount the AWS credentials and SSH config folders in the container.
$ docker run -ti --rm -v $HOME/.aws:/root/.aws -v ${HOME}/.ssh:/root/.ssh -v `pwd`:/apps alpine/terragrunt:0.12.3 bash
# cd to terragrunt configuration directory, if required.
$ terragrunt plan-all
$ terragrunt apply-all
Run tests
I recommend adding awspec tests through kitchen and kitchen-terraform.
Reference: repo terraform-aws-modules/terraform-aws-eks
Reference: README for terraform awspec container
Minimum AWS permissions necessary for a Terraform run
There is no single answer to this, but the IAM policy below makes it easy to get started.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowSpecifics",
"Action": [
"lambda:*",
"apigateway:*",
"ec2:*",
"rds:*",
"s3:*",
"sns:*",
"states:*",
"ssm:*",
"sqs:*",
"iam:*",
"elasticloadbalancing:*",
"autoscaling:*",
"cloudwatch:*",
"cloudfront:*",
"route53:*",
"ecr:*",
"logs:*",
"ecs:*",
"application-autoscaling:*",
"events:*",
"elasticache:*",
"es:*",
"kms:*",
"dynamodb:*"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Sid": "DenySpecifics",
"Action": [
"iam:*User*",
"iam:*Login*",
"iam:*Group*",
"iam:*Provider*",
"aws-portal:*",
"budgets:*",
"config:*",
"directconnect:*",
"aws-marketplace:*",
"aws-marketplace-management:*",
"ec2:*ReservedInstances*"
],
"Effect": "Deny",
"Resource": "*"
}
]
}
Depending on your company or project requirements, you can update the services in the Allow statement to match what your Terraform runs need. You can also add rules to the Deny statement for permissions that are not required.
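If you manage this policy with Terraform itself, the aws_iam_policy_document data source keeps it in HCL and lets you attach it to the role your pipeline uses. The following is a shortened sketch that repeats only a few of the actions above. The policy name and the terraform-ci role are hypothetical.

```hcl
data "aws_iam_policy_document" "terraform_run" {
  statement {
    sid       = "AllowSpecifics"
    effect    = "Allow"
    actions   = ["ec2:*", "s3:*", "dynamodb:*", "iam:*", "kms:*", "logs:*"]
    resources = ["*"]
  }

  # An explicit Deny always wins over the broad Allow above.
  statement {
    sid       = "DenySpecifics"
    effect    = "Deny"
    actions   = ["iam:*User*", "iam:*Login*", "iam:*Group*", "iam:*Provider*", "aws-portal:*", "budgets:*"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "terraform_run" {
  name   = "terraform-run" # hypothetical policy name
  policy = data.aws_iam_policy_document.terraform_run.json
}

resource "aws_iam_role_policy_attachment" "terraform_run" {
  role       = "terraform-ci" # hypothetical, pre-existing CI role
  policy_arn = aws_iam_policy.terraform_run.arn
}
```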
Tips to deal with lambda functions
Is it a headache to save Python packages from pip install into your source code and generate the Lambda zip file manually? Here is the full code for a solution.
The lambda folder includes all the code; here is the explanation.
$ tree
.
├── lambda.tf # terraform HCL to deal with lambda
├── pip.sh # script to install python packages with pip.
└── source
├── .gitignore # Ignore all other files
├── main.py # Lambda function, replace with yours
├── requirements.txt # python package list, replace with yours.
└── setup.cfg # Useful for mac users who installed python using Homebrew
Replace main.py and requirements.txt with your own application code and dependencies.
After you run terraform apply, it will:
- install all pip packages into the source folder
- zip the source folder to source.zip
- deploy the Lambda function with source.zip
- thanks to source/.gitignore, git ignores all the newly installed pip packages.
This solution is based on the comments in Ability to zip AWS Lambda function on the fly.
The same approach should work for Lambda functions using Node.js (npm install) or other languages.
You need Python and pip installed wherever you run terraform commands. If you run Terraform in a container, make sure Python and pip are installed in it.
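The original lambda.tf is not reproduced here. The following is only a minimal sketch of how the three steps can be wired together, under these assumptions: pip.sh takes the source folder as its only argument (a hypothetical interface), main.py defines a function named handler, and an IAM role for the function already exists.

```hcl
variable "lambda_role_arn" {
  description = "ARN of an existing IAM role for the function (hypothetical input)"
  type        = string
}

# Step 1: re-run pip.sh whenever requirements.txt changes.
resource "null_resource" "pip" {
  triggers = {
    requirements = filemd5("${path.module}/source/requirements.txt")
  }

  provisioner "local-exec" {
    # Hypothetical interface: pip.sh installs the packages into the given folder.
    command = "${path.module}/pip.sh ${path.module}/source"
  }
}

# Step 2: zip the source folder (code plus installed packages).
# Depending on the Terraform version, depends_on on a data source can defer
# the read to apply time, so plans may show the function as changed each run.
data "archive_file" "source" {
  type        = "zip"
  source_dir  = "${path.module}/source"
  output_path = "${path.module}/source.zip"

  depends_on = [null_resource.pip]
}

# Step 3: deploy; source_code_hash triggers an update when the zip changes.
resource "aws_lambda_function" "main" {
  function_name    = "my-function" # hypothetical
  filename         = data.archive_file.source.output_path
  source_code_hash = data.archive_file.source.output_base64sha256
  handler          = "main.handler"
  runtime          = "python3.8" # pick a runtime AWS currently supports
  role             = var.lambda_role_arn
}
```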
Usage of variable "self"
Quote from the Terraform documentation:
Attributes of your own resource
The syntax is self.ATTRIBUTE. For example ${self.private_ip} will interpolate that resource's private IP address.
Note: The self.ATTRIBUTE syntax is only allowed and valid within provisioners.
resource "aws_ecr_repository" "jenkins" {
name = var.image_name
provisioner "local-exec" {
command = "./deploy-image.sh ${self.repository_url} ${var.jenkins_image_name}"
}
}
variable "jenkins_image_name" {
default = "mycompany/jenkins"
description = "Jenkins image name."
}
You can easily get the ECR image URL (<account_id>.dkr.ecr.<aws_region>.amazonaws.com/<image_name>) with ${self.repository_url}.
Any attribute of this resource can be self-referenced this way.
Reference: https://github.com/shuaibiyy/terraform-ecs-jenkins/blob/master/docker/main.tf
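In Terraform 0.12, self can also be used inside connection blocks, which now require an explicit host. The following is a minimal sketch. The ami_id and key_name inputs, the ec2-user login (typical for Amazon Linux) and the key path are assumptions.

```hcl
variable "ami_id" {
  description = "AMI to launch (hypothetical input)"
  type        = string
}

variable "key_name" {
  description = "Existing EC2 key pair name (hypothetical input)"
  type        = string
}

resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  key_name      = var.key_name

  # self.public_ip is this instance's own address; referring to
  # aws_instance.web here instead would be rejected as a self-reference.
  connection {
    type        = "ssh"
    host        = self.public_ip
    user        = "ec2-user"
    private_key = file(pathexpand("~/.ssh/id_rsa"))
  }

  provisioner "remote-exec" {
    inline = ["echo private IP is ${self.private_ip}"]
  }
}
```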
Use pre-installed Terraform plugins
There is a way to use pre-installed Terraform plugins instead of downloading them with terraform init; the accepted answer to the question below gives the details:
Use pre-installed Terraform plugins instead of downloading them with terraform init
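The following sketch shows two related CLI mechanisms. The plugin cache is different from fully pre-installed plugins, but it avoids repeated downloads. It assumes a Unix-like home directory layout.

```hcl
# ~/.terraformrc (Terraform CLI configuration file).
# With plugin_cache_dir set, terraform init downloads each provider version
# once and reuses it in every working directory. The directory must already
# exist; Terraform does not create it.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

# To use only pre-installed plugins and skip downloads entirely, run:
#   terraform init -plugin-dir=/path/to/plugins
# -plugin-dir overrides the default search paths and disables
# automatic plugin installation.
```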
Tips to upgrade to terraform 0.12
If you have any code older than 0.12, please go through the official documents first:
- Terraform Input Variables: there are a lot of new features you need to know.
- Upgrading to Terraform v0.12
- terraform command 0.12upgrade
- Announcing Terraform 0.12
Then here are some extra tips for you.
- Upgrade to Terraform 0.11 first, if you have code on an older version.
- Upgrade Terraform modules to 0.12 first, because Terraform 0.12 can't work with 0.11 modules.
- Define type for each variable; otherwise you will get confusing error messages.
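The following sketch shows Terraform 0.12 type constraints for the common cases. The variable names and defaults are illustrative. Without type, a 0.12 variable accepts a value of any type, which is where confusing errors deep inside resources tend to come from.

```hcl
variable "environment" {
  type    = string
  default = "dev"
}

variable "subnet_ids" {
  type    = list(string)
  default = []
}

variable "tags" {
  type    = map(string)
  default = {}
}

# Structural type: a value missing one of these attributes is rejected
# with an error that names this variable.
variable "instance" {
  type = object({
    instance_type = string
    desired_count = number
  })
  default = {
    instance_type = "t3.micro"
    desired_count = 1
  }
}
```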