Merge pull request #4 from Tfmenard/feature/network-argument
Add network, subnetwork and machine_type arguments
morgante authored Jun 19, 2019
2 parents 7526b16 + ffa3bfd commit 93d950e
Showing 11 changed files with 79 additions and 7 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -49,13 +49,16 @@ Then perform the following commands on the root folder:

| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| machine\_type | The machine type to use for the job. | string | `""` | no |
| max\_workers | The number of workers permitted to work on the job. More workers may improve processing speed at additional cost. | string | `"1"` | no |
| name | The name of the dataflow job | string | n/a | yes |
| network\_self\_link | The network self link to which VMs will be assigned. | string | `"default"` | no |
| on\_delete | One of drain or cancel. Specifies behavior of deletion during terraform destroy. The default is cancel. | string | `"cancel"` | no |
| parameters | Key/Value pairs to be passed to the Dataflow job (as used in the template). | map | `<map>` | no |
| project\_id | The project in which the resource belongs. If it is not provided, the provider project is used. | string | n/a | yes |
| region | The bucket's region location | string | `"us-central1"` | no |
| service\_account\_email | The Service Account email that will be used to identify the VMs in which the jobs are running | string | `""` | no |
| subnetwork\_self\_link | The subnetwork self link to which VMs will be assigned. | string | `""` | no |
| temp\_gcs\_location | A writeable location on GCS for the Dataflow job to dump its temporary data. | string | n/a | yes |
| template\_gcs\_path | The GCS path to the Dataflow job template. | string | n/a | yes |
| zone | The zone in which the created job should run. If it is not provided, the provider zone is used. | string | `"us-central1-a"` | no |
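
For reference, a minimal sketch of a module call wiring up the new `network_self_link`, `subnetwork_self_link`, and `machine_type` inputs — the module `source`, names, and referenced resources below are placeholders, not part of this commit:

```hcl
module "dataflow-job" {
  source               = "terraform-google-modules/dataflow/google" # assumed registry source
  project_id           = "my-project-id"                            # placeholder
  name                 = "my-dataflow-job"                          # placeholder
  template_gcs_path    = "gs://dataflow-templates/latest/Word_Count"
  temp_gcs_location    = "my-temp-bucket"                           # bucket name; the module prefixes gs://
  network_self_link    = "${google_compute_network.net.self_link}"       # assumed pre-existing network
  subnetwork_self_link = "${google_compute_subnetwork.subnet.self_link}" # assumed pre-existing subnetwork
  machine_type         = "n1-standard-1"
}
```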
@@ -103,7 +106,8 @@ If you want to use the service_account_email input to specify a service account
### Enable APIs
In order to launch a Dataflow Job, the following APIs must be enabled (a Terraform sketch for enabling them follows the list):

- Dataflow API: `dataflow.googleapis.com`
- Compute Engine API: `compute.googleapis.com`
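
A minimal sketch for enabling both APIs from Terraform, assuming API enablement is managed in the same configuration (the project ID is a placeholder):

```hcl
# Sketch only — assumes the google provider is already configured.
resource "google_project_service" "dataflow" {
  project = "my-project-id" # placeholder
  service = "dataflow.googleapis.com"
}

resource "google_project_service" "compute" {
  project = "my-project-id" # placeholder
  service = "compute.googleapis.com"
}
```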

## Install

8 changes: 8 additions & 0 deletions examples/simple_example/README.md
@@ -1,6 +1,7 @@
# Simple Example

This example illustrates how to use the Dataflow module to start multiple jobs with a common bucket for temporary job data.
A network and subnetwork are also created to demonstrate how Dataflow instances can be created in a specific network and subnetwork.


## Best practices
@@ -9,6 +10,12 @@ This example illustrates how to use the Dataflow module to start multiple jobs w
As featured in this example, using a single regional bucket for storing your jobs' temporary data is recommended to optimize cost.
Also, to optimize your jobs' performance, this bucket should always be in the same region as the zones in which your jobs run.

## Running the example
Make sure you grant the additional permissions below to the service account executing the module (a hypothetical Terraform grant is sketched after the list):

- `roles/compute.networkAdmin`
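
A sketch of granting that role with Terraform — the project ID and service account email are placeholders:

```hcl
# Sketch only — substitute your own project and service account.
resource "google_project_iam_member" "dataflow_network_admin" {
  project = "my-project-id" # placeholder
  role    = "roles/compute.networkAdmin"
  member  = "serviceAccount:terraform@my-project-id.iam.gserviceaccount.com" # placeholder
}
```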


### Controller Service Account
This example features the use of a controller service account, which is specified with the `service_account_email` input variable.
We recommend using a custom service account with fine-grained access control to mitigate security risks. See more about controller service accounts [here](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#controller_service_account).
@@ -20,6 +27,7 @@ We recommend using a custom service account with fine-grained access control to

| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| force\_destroy | When deleting a bucket, this boolean option will delete all contained objects. If you try to delete a bucket that contains objects, Terraform will fail that run. | string | `"false"` | no |
| project\_id | The project ID to deploy to | string | n/a | yes |
| region | The region in which the bucket and the dataflow job will be deployed | string | n/a | yes |
| service\_account\_email | The Service Account email used to create the job. | string | n/a | yes |
39 changes: 34 additions & 5 deletions examples/simple_example/main.tf
@@ -15,7 +15,7 @@
*/

provider "google" {
version = "~> 2.0"
version = "~> 2.8.0"
region = "${var.region}"
}

@@ -27,23 +27,49 @@ locals {
  gcs_bucket_name = "tmp-dir-bucket-${random_id.random_suffix.hex}"
}

module "vpc" {
source = "terraform-google-modules/network/google"
version = "~> 0.8.0"
project_id = "${var.project_id}"
network_name = "dataflow-network"

subnets = [
{
subnet_name = "dataflow-subnetwork"
subnet_ip = "10.1.3.0/24"
subnet_region = "us-central1"
},
]

secondary_ranges = {
dataflow-subnetwork = [{
range_name = "my-secondary-range"
ip_cidr_range = "192.168.64.0/24"
}]
}
}

module "dataflow-bucket" {
source = "../../modules/dataflow_bucket"
name = "${local.gcs_bucket_name}"
region = "${var.region}"
name = "${local.gcs_bucket_name}"
region = "${var.region}"
project_id = "${var.project_id}"
force_destroy = "${var.force_destroy}"
}

module "dataflow-job" {
source = "../../"
project_id = "${var.project_id}"
name = "wordcount-terraform-example"
name = "wordcount-terraform-example"
on_delete = "cancel"
zone = "${var.region}-a"
max_workers = 1
template_gcs_path = "gs://dataflow-templates/latest/Word_Count"
temp_gcs_location = "${module.dataflow-bucket.name}"
service_account_email = "${var.service_account_email}"
network_self_link = "${module.vpc.network_self_link}"
subnetwork_self_link = "${module.vpc.subnets_self_links[0]}"
machine_type = "n1-standard-1"

parameters = {
inputFile = "gs://dataflow-samples/shakespeare/kinglear.txt"
@@ -54,13 +80,16 @@ module "dataflow-job" {
module "dataflow-job-2" {
source = "../../"
project_id = "${var.project_id}"
name = "wordcount-terraform-example-2"
name = "wordcount-terraform-example-2"
on_delete = "cancel"
zone = "${var.region}-a"
max_workers = 1
template_gcs_path = "gs://dataflow-templates/latest/Word_Count"
temp_gcs_location = "${module.dataflow-bucket.name}"
service_account_email = "${var.service_account_email}"
network_self_link = "${module.vpc.network_self_link}"
subnetwork_self_link = "${module.vpc.subnets_self_links[0]}"
machine_type = "n1-standard-2"

parameters = {
inputFile = "gs://dataflow-samples/shakespeare/kinglear.txt"
5 changes: 5 additions & 0 deletions examples/simple_example/variables.tf
@@ -25,3 +25,8 @@ variable "region" {
variable "service_account_email" {
description = "The Service Account email used to create the job."
}

variable "force_destroy" {
description = "When deleting a bucket, this boolean option will delete all contained objects. If you try to delete a bucket that contains objects, Terraform will fail that run."
default = "false"
}
3 changes: 3 additions & 0 deletions main.tf
@@ -25,4 +25,7 @@ resource "google_dataflow_job" "dataflow_job" {
  temp_gcs_location     = "gs://${var.temp_gcs_location}/tmp_dir"
  parameters            = "${var.parameters}"
  service_account_email = "${var.service_account_email}"
  network               = "${replace(var.network_self_link, "/(.*)/networks/(.*)/", "$2")}"
  subnetwork            = "${replace(var.subnetwork_self_link, "/(.*)/regions/(.*)/", "regions/$2")}"
  machine_type          = "${var.machine_type}"
}
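
For reference, the two `replace()` interpolations above trim the self links down to the short forms passed to `google_dataflow_job`. A sketch of the transformation on hypothetical self links:

```hcl
# Assumed inputs (hypothetical project, region, and names):
#   network_self_link    = "https://www.googleapis.com/compute/v1/projects/my-project/global/networks/dataflow-network"
#   subnetwork_self_link = "https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/dataflow-subnetwork"
#
# Results of the replace() calls above:
#   network    = "dataflow-network"
#   subnetwork = "regions/us-central1/subnetworks/dataflow-subnetwork"
```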
1 change: 1 addition & 0 deletions modules/dataflow_bucket/README.md
@@ -20,6 +20,7 @@ See [here](../example/simple_example) for a multi-job example.

| Name | Description | Type | Default | Required |
|------|-------------|:----:|:-----:|:-----:|
| force\_destroy | When deleting a bucket, this boolean option will delete all contained objects. If you try to delete a bucket that contains objects, Terraform will fail that run. | string | `"false"` | no |
| name | The name of the bucket. | string | n/a | yes |
| project\_id | The project_id to deploy the example instance into. (e.g. "simple-sample-project-1234") | string | n/a | yes |
| region | The GCS bucket region. This should be the same region as your Dataflow job's zone to optimize performance. | string | `"us-central1"` | no |
1 change: 1 addition & 0 deletions modules/dataflow_bucket/main.tf
@@ -20,4 +20,5 @@ resource "google_storage_bucket" "tmp_dir_bucket" {
  location      = "${var.region}"
  storage_class = "REGIONAL"
  project       = "${var.project_id}"
  force_destroy = "${var.force_destroy}"
}
2 changes: 1 addition & 1 deletion modules/dataflow_bucket/outputs.tf
@@ -6,4 +6,4 @@ output "name" {
output "region" {
description = "The bucket's region location"
value = "${var.region}"
}
}
5 changes: 5 additions & 0 deletions modules/dataflow_bucket/variables.tf
@@ -10,3 +10,8 @@ variable "region" {
variable "name" {
description = "The name of the bucket."
}

variable "force_destroy" {
description = "When deleting a bucket, this boolean option will delete all contained objects. If you try to delete a bucket that contains objects, Terraform will fail that run."
default = "false"
}
1 change: 1 addition & 0 deletions test/fixtures/simple_example/main.tf
@@ -19,4 +19,5 @@ module "example" {
  project_id            = "${var.project_id}"
  region                = "${var.region}"
  service_account_email = "${var.service_account_email}"
  force_destroy         = "true"
}
15 changes: 15 additions & 0 deletions variables.tf
@@ -58,3 +58,18 @@ variable "region" {
  description = "The bucket's region location"
  default     = "us-central1"
}

variable "subnetwork_self_link" {
  description = "The subnetwork self link to which VMs will be assigned."
  default     = ""
}

variable "network_self_link" {
  description = "The network self link to which VMs will be assigned."
  default     = "default"
}

variable "machine_type" {
  description = "The machine type to use for the job."
  default     = ""
}