Commit
Merge pull request #3 from theodoresiu/dlp_api_example
Create Terraform script to run Dataflow template for DLP API
Showing 4 changed files with 292 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# DLP API Example | ||
|
||
This Dataflow example runs the DLP Dataflow template at gs://dataflow-templates/latest/Stream_DLP_GCS_Text_to_BigQuery. It downloads a [zip file](http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20CC%20Records.zip) of fake credit card records, unzips it to a CSV, de-identifies the credit card number and PIN columns using the DLP API, and writes the result to a BigQuery dataset. | ||
|
||
This Terraform script can either use your own pre-created KMS key ring, key, and wrapped key (set `create_key_ring=false`) or create all of these resources for you (set `create_key_ring=true`). | ||
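
As a rough sketch, the two modes differ only in a few variable values. The snippet below writes a hypothetical `terraform.tfvars` for the pre-created-key case. The file name variable `TFVARS_FILE`, the project ID, both service account emails, the key names, and the wrapped key value are all placeholders chosen for illustration, not values from this repository.

```shell
#!/bin/sh
# Sketch: write a terraform.tfvars for the "bring your own key" mode.
# Every value below is a placeholder (assumption); substitute your own.
TFVARS_FILE="${TFVARS_FILE:-terraform.tfvars}"

cat > "$TFVARS_FILE" <<'EOF'
project_id                      = "my-project"
region                          = "us-central1"
service_account_email           = "dataflow-controller@my-project.iam.gserviceaccount.com"
terraform_service_account_email = "terraform@my-project.iam.gserviceaccount.com"
key_ring                        = "my-existing-key-ring"
kms_key_name                    = "my-existing-key"
wrapped_key                     = "BASE64_WRAPPED_KEY_FROM_KMS"
create_key_ring                 = "false"
EOF

echo "wrote $TFVARS_FILE"
```

To have Terraform create the key ring, key, and wrapped key instead, set `create_key_ring = "true"` and leave `wrapped_key` empty.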
|
||
|
||
## Best practices | ||
|
||
### Cost and Performance | ||
As featured in this example, using a single regional bucket for storing your jobs' temporary data is recommended to optimize cost. | ||
Also, to optimize your jobs' performance, this bucket should always be in the same region as the zones in which your jobs are running. | ||
### Terraform Service Account | ||
Make sure the Terraform service account used to run this example has the basic permissions required by the module, listed [here](../../README#configure-a-service-account-to-execute-the-module). | ||
Grant this service account the following additional roles needed to run the example: | ||
- roles/bigquery.admin | ||
- roles/iam.serviceAccountUser | ||
- roles/storage.admin | ||
- roles/cloudkms.admin | ||
- roles/dlp.admin | ||
- roles/cloudkms.cryptoKeyEncrypterDecrypter | ||
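
One way to grant the roles listed above is a small loop around `gcloud projects add-iam-policy-binding`. This is only a sketch: `PROJECT_ID`, `TF_SA`, the `DRY_RUN` switch, and the `run` and `grant_roles` helpers are names introduced here for illustration. By default it prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: grant the example's extra roles to the Terraform service account.
# PROJECT_ID and TF_SA are placeholders (assumptions); set them for your environment.
PROJECT_ID="${PROJECT_ID:-my-project}"
TF_SA="${TF_SA:-terraform@my-project.iam.gserviceaccount.com}"
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

ROLES="roles/bigquery.admin roles/iam.serviceAccountUser roles/storage.admin roles/cloudkms.admin roles/dlp.admin roles/cloudkms.cryptoKeyEncrypterDecrypter"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Add one project-level IAM binding per role.
grant_roles() {
  for role in $ROLES; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:$TF_SA" \
      --role "$role"
  done
}

grant_roles
```

Set `DRY_RUN=0` to apply the bindings. The same loop could be reused for the controller service account by changing the member and the role list.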
|
||
### Controller Service Account | ||
This example features the use of a controller service account, which is specified with the `service_account_email` input variable. | ||
We recommend using a custom service account with fine-grained access control to mitigate security risks. See more about controller service accounts [here](https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#controller_service_account). | ||
|
||
To execute this module, your controller service account needs the following project roles: | ||
- roles/dataflow.worker | ||
- roles/storage.admin | ||
- roles/bigquery.admin | ||
- roles/cloudkms.admin | ||
- roles/dlp.admin | ||
- roles/cloudkms.cryptoKeyEncrypterDecrypter | ||
|
||
### GCloud | ||
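This example uses shell commands (`gcloud`, `gsutil`, `curl`, `unzip`, and `python`) to create a wrapped key and download the sample credit card data. Please ensure that gcloud is [installed](https://cloud.google.com/sdk/install), that you are authenticated using `gcloud init`, and that the project is set with `gcloud config set project my-project`. The scripts obtain access tokens with `gcloud auth application-default print-access-token`, so application default credentials must also be available. You may need to enable the following APIs (see [here](https://cloud.google.com/apis/docs/enable-disable-apis)): | ||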
- Cloud Key Management Service (KMS) API: `cloudkms.googleapis.com` | ||
- Cloud Storage API: `storage-component.googleapis.com` | ||
- DLP API: `dlp.googleapis.com` | ||
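
The three APIs above can be enabled in one call to `gcloud services enable`. The sketch below assumes a placeholder `PROJECT_ID`; the `DRY_RUN` switch and the `run` and `enable_apis` helpers are illustrative names, and by default the command is only printed.

```shell
#!/bin/sh
# Sketch: enable the APIs this example relies on.
# PROJECT_ID is a placeholder (assumption).
PROJECT_ID="${PROJECT_ID:-my-project}"
# With DRY_RUN=1 (the default here) the command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Enable the KMS, Cloud Storage, and DLP APIs in a single request.
enable_apis() {
  run gcloud services enable \
    cloudkms.googleapis.com \
    storage-component.googleapis.com \
    dlp.googleapis.com \
    --project "$PROJECT_ID"
}

enable_apis
```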
|
||
|
||
[^]: (autogen_docs_start) | ||
|
||
## Inputs | ||
|
||
| Name | Description | Type | Default | Required | | ||
|------|-------------|:----:|:-----:|:-----:| | ||
| project\_id | The project ID to deploy to | string | n/a | yes | | ||
| region | The region in which the bucket and the dataflow job will be deployed | string | "us-central1" | no | | ||
| service\_account\_email | The Service Account email used to create the job. | string | n/a | yes | | ||
| terraform\_service\_account\_email | The Service Account email used by Terraform to create resources (the one referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable) | string | n/a | yes | | ||
| key\_ring | The KMS key ring used to create a wrapped key (can be existing or created) | string | n/a | yes | | ||
| kms\_key\_name | The KMS key within the key ring used to create a wrapped key (can be existing or created) | string | n/a | yes | | ||
| wrapped\_key | The wrapped key generated from KMS used to encrypt sensitive information (leave blank if generating from terraform) | string | "" | no | | ||
| create\_key\_ring | Whether to create the KMS key ring and key ("true") or use pre-created resources ("false") | string | "true" | no | | ||
|
||
## Outputs | ||
|
||
| Name | Description | | ||
|------|-------------| | ||
| bucket\_name | The name of the bucket | | ||
| df\_job\_id | The unique Id of the newly created Dataflow job | | ||
| df\_job\_name | The name of the newly created Dataflow job | | ||
| df\_job\_state | The state of the newly created Dataflow job | | ||
| project\_id | The project's ID | | ||
|
||
[^]: (autogen_docs_end) | ||
|
||
To provision this example, run the following from within this directory: | ||
- `terraform init` to get the plugins | ||
- `terraform plan` to see the infrastructure plan | ||
- `terraform apply` to apply the infrastructure build | ||
- `terraform destroy` to destroy the built infrastructure. (Note that KMS key rings and crypto keys cannot be deleted from a Google Cloud project, so they remain in the project after `terraform destroy`.) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
/** | ||
* Copyright 2019 Google LLC | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
provider "google" { | ||
version = "~> 2.4.0" | ||
region = "${var.region}" | ||
} | ||
|
||
resource "random_id" "random_suffix" { byte_length = 4 } | ||
|
||
locals { | ||
gcs_bucket_name = "tmp-dir-bucket-${random_id.random_suffix.hex}" | ||
} | ||
|
||
module "dataflow-bucket" { | ||
source = "../../modules/dataflow_bucket" | ||
name = "${local.gcs_bucket_name}" | ||
region = "${var.region}" | ||
project_id = "${var.project_id}" | ||
} | ||
|
||
resource "null_resource" "download_sample_cc_into_gcs" { | ||
provisioner "local-exec" { | ||
command = <<EOF | ||
curl http://eforexcel.com/wp/wp-content/uploads/2017/07/1500000%20CC%20Records.zip > cc_records.zip | ||
unzip cc_records.zip | ||
rm cc_records.zip | ||
mv 1500000\ CC\ Records.csv cc_records.csv | ||
gsutil cp cc_records.csv gs://${module.dataflow-bucket.name} | ||
rm cc_records.csv | ||
EOF | ||
} | ||
} | ||
|
||
resource "null_resource" "deinspection_template_setup" { | ||
provisioner "local-exec" { | ||
command = <<EOF | ||
if [ -f wrapped_key.txt ] && [ "${null_resource.create_kms_wrapped_key.count}" = "1" ]; then | ||
wrapped_key=$(cat wrapped_key.txt) | ||
else | ||
wrapped_key="${var.wrapped_key}" | ||
fi | ||
echo $wrapped_key | ||
curl https://dlp.googleapis.com/v2/projects/${var.project_id}/deidentifyTemplates -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \ | ||
-H "Content-Type: application/json" \ | ||
-d '{"deidentifyTemplate": {"deidentifyConfig": {"recordTransformations": {"fieldTransformations": [{"fields": [{"name": "Card Number"}, {"name": "Card PIN"}], "primitiveTransformation": {"cryptoReplaceFfxFpeConfig": {"cryptoKey": {"kmsWrapped": {"cryptoKeyName": "projects/${var.project_id}/locations/global/keyRings/${var.key_ring}/cryptoKeys/${var.kms_key_name}", "wrappedKey": "'$wrapped_key'"}}, "commonAlphabet": "ALPHA_NUMERIC"}}}]}}}, "templateId": "15"}' | ||
EOF | ||
} | ||
} | ||
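
The condition in `deinspection_template_setup` is meant to check whether Terraform generated the wrapped key. In POSIX shell, `[` treats a single non-empty word as true, so the operands and `=` have to be separate words for the comparison to mean anything. The short sketch below illustrates the difference; `count` is a stand-in variable for the interpolated resource count.

```shell
#!/bin/sh
# Sketch: why `[ 0=1 ]` is always true while `[ 0 = 1 ]` is a real comparison.
count=0   # stands in for the interpolated resource count (assumption)

# Without spaces, "0=1" is one non-empty word, so the test succeeds.
if [ $count=1 ]; then unspaced="true"; else unspaced="false"; fi

# With spaces, `=` is the string-equality operator and "0" differs from "1".
if [ "$count" = "1" ]; then spaced="true"; else spaced="false"; fi

echo "unspaced=$unspaced spaced=$spaced"
# prints: unspaced=true spaced=false
```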
|
||
resource "google_bigquery_dataset" "default" { | ||
project = "${var.project_id}" | ||
dataset_id = "dlp_demo" | ||
friendly_name = "dlp_demo" | ||
description = "This is the BQ dataset for running the dlp demo" | ||
location = "US" | ||
default_table_expiration_ms = 3600000 | ||
} | ||
|
||
resource "google_kms_key_ring" "create_kms_ring" { | ||
project = "${var.project_id}" | ||
count = "${var.create_key_ring == "true" ? 1 : 0}" | ||
name = "${var.key_ring}" | ||
location = "global" | ||
} | ||
|
||
resource "google_kms_crypto_key" "create_kms_key" { | ||
count = "${google_kms_key_ring.create_kms_ring.count}" | ||
name = "${var.kms_key_name}" | ||
key_ring = "${google_kms_key_ring.create_kms_ring.self_link}" | ||
} | ||
|
||
resource "null_resource" "create_kms_wrapped_key" { | ||
count = "${google_kms_crypto_key.create_kms_key.count}" | ||
|
||
provisioner "local-exec" { | ||
command = <<EOF | ||
rm -f original_key.txt | ||
rm -f wrapped_key.txt | ||
python -c "import os,base64; key=os.urandom(32); encoded_key = base64.b64encode(key).decode('utf-8'); print(encoded_key)" >> original_key.txt | ||
original_key="$(cat original_key.txt)" | ||
gcloud kms keys add-iam-policy-binding ${var.kms_key_name} --project ${var.project_id} --location global --keyring ${var.key_ring} --member serviceAccount:${var.terraform_service_account_email} --role roles/cloudkms.cryptoKeyEncrypterDecrypter | ||
curl -s -X POST "https://cloudkms.googleapis.com/v1/projects/${var.project_id}/locations/global/keyRings/${var.key_ring}/cryptoKeys/${var.kms_key_name}:encrypt" -d '{"plaintext":"'$original_key'"}' -H "Authorization:Bearer $(gcloud auth application-default print-access-token)" -H "Content-Type:application/json" | python -c "import sys, json; print(json.load(sys.stdin)['ciphertext'])" >> wrapped_key.txt | ||
EOF | ||
} | ||
} | ||
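
The key-wrapping step above generates 32 random bytes, base64-encodes them, and sends them as the `plaintext` field of a Cloud KMS `encrypt` request. A minimal sketch of the local part of that flow follows. It swaps the Python one-liner for `head` and `base64` (an assumption that both are available), and it stops short of calling the API.

```shell
#!/bin/sh
# Sketch: generate a 256-bit key and build the JSON body for the KMS encrypt call.
# Uses head + base64 in place of the python one-liner (assumption: both are installed).
original_key="$(head -c 32 /dev/urandom | base64)"

# The encrypt request carries the plaintext as a base64 string.
request_body="$(printf '{"plaintext":"%s"}' "$original_key")"

# Report only sizes so the key itself is not written to the terminal or logs.
echo "encoded key length: ${#original_key}"
echo "request body length: ${#request_body}"
```

The resulting `request_body` is what would be passed to `curl -d` against the `:encrypt` endpoint, and the `ciphertext` field of the response is the wrapped key.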
|
||
module "dataflow-job" { | ||
source = "../../" | ||
project_id = "${var.project_id}" | ||
name = "dlp_example_${null_resource.download_sample_cc_into_gcs.id}_${null_resource.deinspection_template_setup.id}" | ||
on_delete = "cancel" | ||
zone = "${var.region}-a" | ||
template_gcs_path = "gs://dataflow-templates/latest/Stream_DLP_GCS_Text_to_BigQuery" | ||
temp_gcs_location = "${module.dataflow-bucket.name}" | ||
service_account_email = "${var.service_account_email}" | ||
max_workers = 5 | ||
|
||
parameters = { | ||
inputFilePattern = "gs://${module.dataflow-bucket.name}/cc_records.csv" | ||
datasetName = "${google_bigquery_dataset.default.dataset_id}" | ||
batchSize = 1000 | ||
dlpProjectId = "${var.project_id}" | ||
deidentifyTemplateName = "projects/${var.project_id}/deidentifyTemplates/15" | ||
} | ||
} | ||
|
||
resource "null_resource" "destroy_deidentify_template" { | ||
provisioner "local-exec" { | ||
when = "destroy" | ||
command = <<EOF | ||
curl -s -X DELETE "https://dlp.googleapis.com/v2/projects/${var.project_id}/deidentifyTemplates/15" -H "Authorization:Bearer $(gcloud auth application-default print-access-token)" | ||
EOF | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
/** | ||
* Copyright 2019 Google LLC | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
output "project_id" { | ||
value = "${var.project_id}" | ||
description = "The project's ID" | ||
} | ||
|
||
output "df_job_state" { | ||
description = "The state of the newly created Dataflow job" | ||
value = "${module.dataflow-job.state}" | ||
} | ||
|
||
output "df_job_id" { | ||
description = "The unique Id of the newly created Dataflow job" | ||
value = "${module.dataflow-job.id}" | ||
} | ||
|
||
output "df_job_name" { | ||
description = "The name of the newly created Dataflow job" | ||
value = "${module.dataflow-job.name}" | ||
} | ||
|
||
output "bucket_name" { | ||
description = "The name of the bucket" | ||
value = "${module.dataflow-bucket.name}" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
/** | ||
* Copyright 2019 Google LLC | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
variable "project_id" { | ||
description = "The project ID to deploy to" | ||
} | ||
|
||
variable "region" { | ||
description = "The region in which the bucket and the dataflow job will be deployed" | ||
default = "us-central1" | ||
} | ||
|
||
variable "service_account_email" { | ||
description = "The Service Account email used to create the job." | ||
} | ||
|
||
variable "terraform_service_account_email" { | ||
description = "The Service Account email used by Terraform to create resources (the one referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable)" | ||
} | ||
|
||
variable "key_ring" { | ||
description = "The KMS key ring used to create a wrapped key (can be existing or created)" | ||
} | ||
|
||
variable "kms_key_name" { | ||
description = "The KMS key within the key ring used to create a wrapped key (can be existing or created)" | ||
} | ||
|
||
variable "wrapped_key" { | ||
description = "The wrapped key generated from KMS; leave blank if create_key_ring=true" | ||
default = "" | ||
} | ||
|
||
variable "create_key_ring" { | ||
description = "Whether to create the KMS key ring and key (true or false)" | ||
default = "true" | ||
} |