From 22f9b4e1067d655998e84fd3eb7b074e6c03276f Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Wed, 3 Jul 2024 12:11:55 -0300 Subject: [PATCH 01/13] adjust rag example documentation --- examples/genai-rag-multimodal/README.md | 75 ++++++++++++------------- 1 file changed, 36 insertions(+), 39 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 63f1f0a3..12f12dc4 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -20,7 +20,7 @@ The main modifications to the original example include: - Terraform v1.7.5 - [Authenticated Google Cloud SDK 469.0.0](https://cloud.google.com/sdk/docs/authorizing) -### Provision Infrastructure with Terraform +### Terraform Variables Configuration - Update the `terraform.tfvars` file with values from your environment. @@ -39,31 +39,11 @@ The main modifications to the original example include: - **KMS-PROJECT-ID**, **ML-ENV-KEYRING**, **ML-ENV-KEY**: Run `terraform output machine_learning_kms_keys` on `gcp-projects` repository, inside the Machine Learning business unit directory and on the development branch. - **REGION**: The chosen region. -### Allow file download from Google Notebook Examples Bucket on VPC-SC Perimeter - -When running the Notebook, you will reach a step that downloads an example PDF file from a bucket, you need to add the egress rule below on the VPC-SC perimeter to allow the operation. 
- -```yaml -- egressFrom: - identities: - - serviceAccount:rag-notebook-runner@.iam.gserviceaccount.com - egressTo: - operations: - - methodSelectors: - - method: google.storage.buckets.list - - method: google.storage.buckets.get - - method: google.storage.objects.get - - method: google.storage.objects.list - serviceName: storage.googleapis.com - resources: - - projects/200612033880 # Google Cloud Example Project -``` - ## Deploying infrastructure using Machine Learning Infra Pipeline -### Required Permissions for pipeline Service Account +### Required Permissions and VPC-SC adjustments for pipeline Service Account -- Give `roles/compute.networkUser` to the Service Account that runs the Pipeline. +- The Service Account that runs the Pipeline must have `roles/compute.networkUser` on the Shared VPC Host Project, you can give this role by running the command below: ```bash SERVICE_ACCOUNT=$(terraform -chdir="./gcp-projects/ml_business_unit/shared" output -json terraform_service_accounts | jq -r '."ml-machine-learning"') @@ -71,21 +51,7 @@ When running the Notebook, you will reach a step that downloads an example PDF f gcloud projects add-iam-policy-binding --member="serviceAccount:$SERVICE_ACCOUNT" --role="roles/compute.networkUser" ``` -- Add the following ingress rule to the Service Perimeter. - - ```yaml - ingressPolicies: - - ingressFrom: - identities: - - serviceAccount: - sources: - - accessLevel: '*' - ingressTo: - operations: - - serviceName: '*' - resources: - - '*' - ``` +- Add the build service account in the development VPC-SC perimeter. You can do this by adding "serviceAccount:" to `perimeter_additional_members` in `common.auto.tfvars` (development branch). If the command above executed succesfully, you can run `echo $SERVICE_ACCOUNT` to retrieve the Pipeline Service Account e-mail. 
### Deployment steps @@ -159,7 +125,38 @@ When running the Notebook, you will reach a step that downloads an example PDF f ## Deploying infrastructure using terraform locally -Run `terraform init && terraform apply -auto-approve`. +- Run `terraform init` inside this directory. +- Run `terraform apply` inside this directory. + +## Post-Deployment + +### Allow file download from Google Notebook Examples Bucket on VPC-SC Perimeter + +When running the Notebook, you will reach a step that downloads an example PDF file from a bucket, you need to add the egress rule below on the VPC-SC perimeter to allow the operation. You can do this by adding this rule to `egress_rule` variable on `gcp-networks/envs/development/development.auto.tfvars` on the development branch. + +```terraform +{ + "from" = { + "identity_type" = "" + "identities" = [ + "serviceAccount:rag-notebook-runner@.iam.gserviceaccount.com" + ] + }, + "to" = { + "resources" = ["projects/200612033880"] # Google Cloud Example Project + "operations" = { + "storage.googleapis.com" = { + "methods" = [ + "google.storage.buckets.list", + "google.storage.buckets.get", + "google.storage.objects.get", + "google.storage.objects.list", + ] + } + } + } +}, +``` ## Usage From 02acae8d26a68b8ca7de1c2d36b57ef7fbdc28fc Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Wed, 3 Jul 2024 14:39:33 -0300 Subject: [PATCH 02/13] update docs --- examples/genai-rag-multimodal/README.md | 92 ++++++++++++++++++++++++- 1 file changed, 91 insertions(+), 1 deletion(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 12f12dc4..cc8060eb 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -114,6 +114,29 @@ The main modifications to the original example include: } ``` +- Verify if `backend.tf` file exists at `ml-machine-learning/ml_business_unit/development`. 
+ - If there is a `backend.tf` file, proceed with the next step and ignore the sub-steps below. + - If there is no `backend.tf` file, follow the sub-steps below: + - Create the file and put the following content into it: + + ```terraform + terraform { + backend "gcs" { + bucket = "UPDATE_APP_INFRA_BUCKET" + prefix = "terraform/app-infra/ml_business_unit/development" + } + } + ``` + + - Run the command below to update `UPDATE_APP_INFRA_BUCKET`: + + ```bash + export backend_bucket=$(terraform -chdir="../gcp-projects/ml_business_unit/shared/" output -json state_buckets | jq '."ml-artifact-publish"' --raw-output) + echo "backend_bucket = ${backend_bucket}" + + for i in `find -name 'backend.tf'`; do sed -i "s/UPDATE_APP_INFRA_BUCKET/${backend_bucket}/" $i; done + ``` + - Commit and push ```terraform @@ -162,10 +185,77 @@ When running the Notebook, you will reach a step that downloads an example PDF f Once all the requirements are set up, you can start by running and adjusting the notebook step-by-step. -To run the notebook, open the Google Cloud Console on Vertex AI Workbench, open JupyterLab and upload the notebook (`multimodal_rag_langchain.ipynb`) to it. +To run the notebook, open the Google Cloud Console on Vertex AI Workbench (`https://console.cloud.google.com/vertex-ai/workbench/instances?referrer=search&project=`), click open JupyterLab on the created instance and upload the notebook (`multimodal_rag_langchain.ipynb`) in this repo to it. ### Optional: Use `terraform output` and bash command to fill in fields in the notebook +#### Infra Pipeline (Cloud Build) + +If you ran using Cloud Build, proceed with the steps below to use `terraform output`. 
+ +- Update `outputs.tf` file on `ml-machine-learning/ml_business_unit/development` with the following values: + + ```terraform + output "private_endpoint_ip_address" { + value = module.genai_example.private_endpoint_ip_address + } + + output "host_vpc_project_id" { + value = module.genai_example.host_vpc_project_id + } + + output "host_vpc_network" { + value = module.genai_example.host_vpc_network + } + + output "notebook_project_id" { + value = module.genai_example.notebook_project_id + } + + output "vector_search_bucket_name" { + value = module.genai_example.vector_search_bucket_name + } + ``` + +- Run `./tf-wrapper init development` on `ml-machine-learning`. + +- Extract values from `terraform output` and validate. You must run the commands below at `ml-machine-learning/ml_business_unit/development`. + + ```bash + export private_endpoint_ip_address=$(terraform output -raw private_endpoint_ip_address) + echo private_endpoint_ip_address=$private_endpoint_ip_address + + export host_vpc_project_id=$(terraform output -raw host_vpc_project_id) + echo host_vpc_project_id=$host_vpc_project_id + + export notebook_project_id=$(terraform output -raw notebook_project_id) + echo notebook_project_id=$notebook_project_id + + export vector_search_bucket_name=$(terraform output -raw vector_search_bucket_name) + echo vector_search_bucket_name=$vector_search_bucket_name + + export host_vpc_network=$(terraform output -raw host_vpc_network) + echo host_vpc_network=$host_vpc_network + ``` + +- Search and Replace using `sed` command. 
+ + ```bash + sed -i "s//$private_endpoint_ip_address/g" multimodal_rag_langchain.ipynb + + sed -i "s//$host_vpc_project_id/g" multimodal_rag_langchain.ipynb + + sed -i "s//$notebook_project_id/g" multimodal_rag_langchain.ipynb + + sed -i "s//$vector_search_bucket_name/g" multimodal_rag_langchain.ipynb + + sed -i "s::$host_vpc_network:g" multimodal_rag_langchain.ipynb + ``` + +#### Terraform Locally + +If you ran terraform locally, proceed with the steps below to use `terraform output`. + You can save some time adjusting the notebook by running the commands below: - Extract values from `terraform output` and validate. From 1619330f1a195c2a117243901ccc7b31f4a40b2d Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Thu, 4 Jul 2024 08:50:13 -0300 Subject: [PATCH 03/13] more info on usage --- examples/genai-rag-multimodal/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index cc8060eb..64ae052a 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -185,7 +185,9 @@ When running the Notebook, you will reach a step that downloads an example PDF f Once all the requirements are set up, you can start by running and adjusting the notebook step-by-step. -To run the notebook, open the Google Cloud Console on Vertex AI Workbench (`https://console.cloud.google.com/vertex-ai/workbench/instances?referrer=search&project=`), click open JupyterLab on the created instance and upload the notebook (`multimodal_rag_langchain.ipynb`) in this repo to it. +To run the notebook, open the Google Cloud Console on Vertex AI Workbench (`https://console.cloud.google.com/vertex-ai/workbench/instances?referrer=search&project=`), click open JupyterLab on the created instance. 
+ +After clicking "open JupyterLab" button, you will be taken to an interactive JupyterLab Workspace, you can upload the notebook (`multimodal_rag_langchain.ipynb`) in this repo to it. Once the notebook is uploaded to the environment, run it cell-by-cell to see process of building a RAG chain. ### Optional: Use `terraform output` and bash command to fill in fields in the notebook From 508c96c8cab2ce0db9f100c60dd5ca090efbea52 Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Thu, 4 Jul 2024 16:21:36 -0300 Subject: [PATCH 04/13] adjust tf init, terraform output with chdir flag and more information about products --- examples/genai-rag-multimodal/README.md | 16 +++++++++++----- .../multimodal_rag_langchain.ipynb | 5 +++-- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 64ae052a..4b1ee50b 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -12,9 +12,15 @@ The main modifications to the original example include: - Adaptations to comply with Cloud Foundation Toolkit security measures. - Installation of additional libraries in the Conda environment. -- Use of Vertex AI Workbench to run the notebook with a custom Service Account. +- Use of Vertex AI Workbench to run the notebook with a custom Service Account in a secure environment. - Implementation of Vector Search on Vertex AI with [Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect). 
+For more information about the technologies used in this example, please refer to the following resources: + +- [Vertex AI Workbench Introduction](https://cloud.google.com/vertex-ai/docs/workbench/introduction) +- [Vertex AI Vector Search Overview](https://cloud.google.com/vertex-ai/docs/vector-search/overview) +- [Ragas Documentation](https://docs.ragas.io/en/stable/) + ## Requirements - Terraform v1.7.5 @@ -33,10 +39,10 @@ The main modifications to the original example include: ``` - Assuming you are deploying the example on top of the development environment, the following instructions will provide you more insight on how to retrieve these values: - - **NETWORK-PROJECT-ID**: Run `terraform output -raw restricted_host_project_id` on `gcp-networks` repository, inside the development environment directory and branch. - - **NETWORK-NAME**: Run `terraform output -raw restricted_network_name` on `gcp-networks` repository, inside the development environment directory and branch. - - **MACHINE-LEARNING-PROJECT-ID**: Run `terraform output -raw machine_learning_project_id` on `gcp-projects` repository, inside the Machine Learning business unit directory and on the development branch. - - **KMS-PROJECT-ID**, **ML-ENV-KEYRING**, **ML-ENV-KEY**: Run `terraform output machine_learning_kms_keys` on `gcp-projects` repository, inside the Machine Learning business unit directory and on the development branch. + - **NETWORK-PROJECT-ID**: Run `terraform -chdir="envs/development" output -raw restricted_host_project_id` on `gcp-networks` repository at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. + - **NETWORK-NAME**: Run `terraform -chdir="envs/development" output -raw restricted_network_name` on `gcp-networks` repository at the development branch. 
Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. + - **MACHINE-LEARNING-PROJECT-ID**: Run `terraform -chdir="ml_business_unit/development" output -raw machine_learning_project_id` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. + - **KMS-PROJECT-ID**, **ML-ENV-KEYRING**, **ML-ENV-KEY**: Run `terraform -chdir="ml_business_unit/development" output machine_learning_kms_keys` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. - **REGION**: The chosen region. ## Deploying infrastructure using Machine Learning Infra Pipeline diff --git a/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb b/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb index 4c138eed..d13e3b28 100644 --- a/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb +++ b/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb @@ -708,7 +708,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "* Retrieve the value of the service attachment and execute the command below on your local machine:" + "* Retrieve the value of the service attachment and execute the `gcloud` command below on your local machine:\n", + "* NOTE: If you don't have permission to run the command below, you may need to open a PR to `gcp-networks` repository adding the forwarding rule or ask a network engineer to create this forwarding rule." 
] }, { @@ -720,7 +721,7 @@ "NETWORK=\"\"\n", "NETWORK_PROJECT_ID=\"\"\n", "\n", - "!gcloud compute forwarding-rules create vector-search-endpoint \\\n", + "!echo gcloud compute forwarding-rules create vector-search-endpoint \\\n", " --network={NETWORK} \\\n", " --address=vector-search-endpoint \\\n", " --target-service-attachment={SERVICE_ATTACHMENT} \\\n", From 6124a7582a216ae69226fa1db1716b4b50000879 Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Mon, 8 Jul 2024 16:33:08 -0300 Subject: [PATCH 05/13] fix ragas version --- examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb b/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb index d13e3b28..cef283e4 100644 --- a/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb +++ b/examples/genai-rag-multimodal/multimodal_rag_langchain.ipynb @@ -1025,7 +1025,7 @@ "metadata": {}, "outputs": [], "source": [ - "%pip install ragas" + "%pip install ragas==0.1.9" ] }, { From defb03e0c99aa6565d314a5fe9a24afddef72dcc Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Thu, 25 Jul 2024 10:23:19 -0300 Subject: [PATCH 06/13] add pricing info --- examples/genai-rag-multimodal/README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 4b1ee50b..957e854a 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -299,6 +299,10 @@ You can save some time adjusting the notebook by running the commands below: sed -i "s::$host_vpc_network:g" multimodal_rag_langchain.ipynb ``` +## Notes + +- The billable components deployed include, but are not limited to: Vertex AI Workbench Instance, Private Service Connect Endpoint and Vector Search Endpoint. 
+ ## Known Issues - `Error: Error creating Instance: googleapi: Error 400: value_to_check(https://compute.googleapis.com/compute/v1/projects/...) is not found`. From d768dc868e0b8d909a2f46bdfbb7dafea0a1f35f Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Fri, 2 Aug 2024 16:22:13 -0300 Subject: [PATCH 07/13] adjusting RAG procedure --- examples/genai-rag-multimodal/README.md | 92 ++++++++++++++++++++++--- 1 file changed, 84 insertions(+), 8 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 957e854a..528ce313 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -39,12 +39,70 @@ For more information about the technologies used in this example, please refer t ``` - Assuming you are deploying the example on top of the development environment, the following instructions will provide you more insight on how to retrieve these values: - - **NETWORK-PROJECT-ID**: Run `terraform -chdir="envs/development" output -raw restricted_host_project_id` on `gcp-networks` repository at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. - - **NETWORK-NAME**: Run `terraform -chdir="envs/development" output -raw restricted_network_name` on `gcp-networks` repository at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. - - **MACHINE-LEARNING-PROJECT-ID**: Run `terraform -chdir="ml_business_unit/development" output -raw machine_learning_project_id` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. 
- - **KMS-PROJECT-ID**, **ML-ENV-KEYRING**, **ML-ENV-KEY**: Run `terraform -chdir="ml_business_unit/development" output machine_learning_kms_keys` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper init development` on the directory. + - **NETWORK-PROJECT-ID**: Run `terraform -chdir="envs/development" output -raw restricted_host_project_id` on `gcp-networks` repository at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper.sh init development` on the directory. + - **NETWORK-NAME**: Run `terraform -chdir="envs/development" output -raw restricted_network_name` on `gcp-networks` repository at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper.sh init development` on the directory. + - **MACHINE-LEARNING-PROJECT-ID**: Run `terraform -chdir="ml_business_unit/development" output -raw machine_learning_project_id` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper.sh init development` on the directory. + - **KMS-PROJECT-ID**, **ML-ENV-KEYRING**, **ML-ENV-KEY**: Run `terraform -chdir="ml_business_unit/development" output machine_learning_kms_keys` on `gcp-projects` repository, at the development branch. Please note that if you have not initialized the environment you will need to run `./tf-wrapper.sh init development` on the directory. - **REGION**: The chosen region. + - Optionally, you may follow the series of steps below to automatically create the `terraform.tfvars` file: + - **IMPORTANT:** Please note that the steps below are assuming you are checked out on the same level as `terraform-google-enterprise-genai/` and the other repos (`gcp-bootstrap`, `gcp-org`, `gcp-projects`...). 
+ - Retrieve values from terraform outputs to bash variables: + + ```bash + (cd gcp-networks && git checkout development && ./tf-wrapper.sh init development) + + export restricted_host_project_id=$(terraform -chdir="gcp-networks/envs/development" output -raw restricted_host_project_id) + + export restricted_network_name=$(terraform -chdir="gcp-networks/envs/development" output -raw restricted_network_name) + + (cd gcp-projects && git checkout development && ./tf-wrapper.sh init development) + + export machine_learning_project_id=$(terraform -chdir="gcp-projects/ml_business_unit/development" output -raw machine_learning_project_id) + + export machine_learning_kms_keys_json=$(terraform -chdir="gcp-projects/ml_business_unit/development" output -json machine_learning_kms_keys) + ``` + + - Create region environment variable (if you are not using `us-central1`, remember to change the value below): + + ```bash + export region="us-central1" + ``` + + - Extract the kms key from the `json` variable by using `jq` (this step reads `$region`, so it must run after the previous one): + + ```bash + export machine_learning_kms_keys=$(echo "$machine_learning_kms_keys_json" | jq -r ".\"$region\".id") + ``` + + - Validate that the variable values are correct: + + ```bash + echo region=$region + echo restricted_host_project_id=$restricted_host_project_id + echo restricted_network_name=$restricted_network_name + echo machine_learning_project_id=$machine_learning_project_id + echo machine_learning_kms_keys=$machine_learning_kms_keys + ``` + + - Populate `terraform.tfvars` with the following command: + + ```bash + cat > terraform-google-enterprise-genai/examples/genai-rag-multimodal/terraform.tfvars < --member="serviceAccount:$SERVICE_ACCOUNT" --role="roles/compute.networkUser" + gcloud projects add-iam-policy-binding $restricted_host_project_id --member="serviceAccount:$SERVICE_ACCOUNT" --role="roles/compute.networkUser" ``` -- Add the build service account in the development VPC-SC perimeter. 
You can do this by adding "serviceAccount:" to `perimeter_additional_members` in `common.auto.tfvars` (development branch). If the command above executed succesfully, you can run `echo $SERVICE_ACCOUNT` to retrieve the Pipeline Service Account e-mail. +- Add the build service account in the development VPC-SC perimeter. + - Retrieve the service account value for your environment: + + ```bash + echo "serviceAccount:$SERVICE_ACCOUNT" + ``` + + - Add "serviceAccount:" to `perimeter_additional_members` field in `common.auto.tfvars` at `gcp-networks` repository on the development branch. + + - Commit and push the result by running the commands below: + ```bash + cd gcp-networks + git add common.auto.tfvars + git commit -m "Add machine learning build SA to perimeter" + git push origin development + ``` + ### Deployment steps **IMPORTANT:** Please note that the steps below are assuming you are checked out on the same level as `terraform-google-enterprise-genai/` and the other repos (`gcp-bootstrap`, `gcp-org`, `gcp-projects`...). @@ -137,7 +212,7 @@ For more information about the technologies used in this example, please refer t - Run the command below to update `UPDATE_APP_INFRA_BUCKET`: ```bash - export backend_bucket=$(terraform -chdir="../gcp-projects/ml_business_unit/shared/" output -json state_buckets | jq '."ml-artifact-publish"' --raw-output) + export backend_bucket=$(terraform -chdir="../gcp-projects/ml_business_unit/shared/" output -json state_buckets | jq '."ml-machine-learning"' --raw-output) echo "backend_bucket = ${backend_bucket}" for i in `find -name 'backend.tf'`; do sed -i "s/UPDATE_APP_INFRA_BUCKET/${backend_bucket}/" $i; done @@ -154,6 +229,7 @@ For more information about the technologies used in this example, please refer t ## Deploying infrastructure using terraform locally +- Only proceed with these steps if you have not deployed [using cloudbuild](#deploying-infrastructure-using-machine-learning-infra-pipeline). 
- Run `terraform init` inside this directory. - Run `terraform apply` inside this directory. @@ -225,7 +301,7 @@ If you ran using Cloud Build, proceed with the steps below to use `terraform out } ``` -- Run `./tf-wrapper init development` on `ml-machine-learning`. +- Run `./tf-wrapper.sh init development` on `ml-machine-learning`. - Extract values from `terraform output` and validate. You must run the commands below at `ml-machine-learning/ml_business_unit/development`. From a77e9e746a2bffd7b71dd7be4b1b483fe3fcd10c Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Mon, 5 Aug 2024 09:07:47 -0300 Subject: [PATCH 08/13] adjust rag populate tfvars command --- examples/genai-rag-multimodal/README.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 528ce313..891e3a15 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -90,8 +90,8 @@ For more information about the technologies used in this example, please refer t ```bash cat > terraform-google-enterprise-genai/examples/genai-rag-multimodal/terraform.tfvars < Date: Mon, 5 Aug 2024 15:02:52 -0300 Subject: [PATCH 09/13] adjusting rag example --- examples/genai-rag-multimodal/README.md | 6 +++++- examples/genai-rag-multimodal/outputs.tf | 4 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 891e3a15..2a6edf74 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -242,6 +242,8 @@ For more information about the technologies used in this example, please refer t When running the Notebook, you will reach a step that downloads an example PDF file from a bucket, you need to add the egress rule below on the VPC-SC perimeter to allow the operation. 
You can do this by adding this rule to `egress_rule` variable on `gcp-networks/envs/development/development.auto.tfvars` on the development branch. +> NOTE: If you are deploying this example on top of an existing foundation instance, the variable name might be `egress_policies`. + ```terraform { "from" = { @@ -272,7 +274,7 @@ Once all the requirements are set up, you can start by running and adjusting the To run the notebook, open the Google Cloud Console on Vertex AI Workbench (`https://console.cloud.google.com/vertex-ai/workbench/instances?referrer=search&project=`), click open JupyterLab on the created instance. -After clicking "open JupyterLab" button, you will be taken to an interactive JupyterLab Workspace, you can upload the notebook (`multimodal_rag_langchain.ipynb`) in this repo to it. Once the notebook is uploaded to the environment, run it cell-by-cell to see process of building a RAG chain. +After clicking "open JupyterLab" button, you will be taken to an interactive JupyterLab Workspace, where you can upload the notebook (`multimodal_rag_langchain.ipynb`) in this repo. Once the notebook is uploaded to the environment, run it cell-by-cell to see the process of building a RAG chain. The notebook contains placeholder variables that must be replaced; you may follow the instructions in the next section to replace these placeholders automatically using the `sed` command. ### Optional: Use `terraform output` and bash command to fill in fields in the notebook @@ -306,6 +308,8 @@ If you ran using Cloud Build, proceed with the steps below to use `terraform out - Run `./tf-wrapper.sh init development` on `ml-machine-learning`. +- Run `cd ml_business_unit/development && terraform refresh`, to refresh the outputs. + - Extract values from `terraform output` and validate. You must run the commands below at `ml-machine-learning/ml_business_unit/development`. 
```bash diff --git a/examples/genai-rag-multimodal/outputs.tf b/examples/genai-rag-multimodal/outputs.tf index 1cd4ae7a..dcf6aee0 100644 --- a/examples/genai-rag-multimodal/outputs.tf +++ b/examples/genai-rag-multimodal/outputs.tf @@ -25,8 +25,8 @@ output "host_vpc_project_id" { } output "host_vpc_network" { - description = "This is the Self-link of the Host VPC network" - value = google_workbench_instance.instance.gce_setup[0].network_interfaces[0].network + description = "This is the self-link of the Host VPC network, without the URL prefix (i.e. https://)" + value = var.network } output "notebook_project_id" { From 2e8e33e9fa4768f78ece49c59ca1d793537250bf Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Mon, 5 Aug 2024 15:08:52 -0300 Subject: [PATCH 10/13] Remove empty section --- examples/genai-rag-multimodal/README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 2a6edf74..0b33435c 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -134,9 +134,6 @@ For more information about the technologies used in this example, please refer t git push origin development ``` -### Required permission for notebook runner Service Account - -- Allow ### Deployment steps **IMPORTANT:** Please note that the steps below are assuming you are checked out on the same level as `terraform-google-enterprise-genai/` and the other repos (`gcp-bootstrap`, `gcp-org`, `gcp-projects`...). 
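
Reviewer note on the `terraform.tfvars` procedure (patches 07 to 09): the validation step prints each variable with `echo` and relies on manual inspection. A minimal sketch of an automated check follows, assuming the variable names exported in those steps. `require_vars` is a hypothetical helper for illustration only and is not part of the repository.

```bash
# Hypothetical helper (not in the repository): returns non-zero if any of the
# named shell variables is unset or empty, and lists the offenders on stderr.
require_vars() {
  missing=0
  for name in "$@"; do
    # Read the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage before populating terraform.tfvars (names taken from patch 07):
#   require_vars region restricted_host_project_id restricted_network_name \
#     machine_learning_project_id machine_learning_kms_keys \
#     || echo "fix the variables listed above first"
```

Failing early here would catch an empty `machine_learning_kms_keys`, which is what `jq` produces (as the string `null`, or empty) when the region key is wrong; note that the sketch only detects empty values, so a literal `null` would still need a manual look.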
From fbd6b68b4a31f630116a1f8c9643757783f082a1 Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Tue, 6 Aug 2024 14:30:47 -0300 Subject: [PATCH 11/13] adjust with more automated steps --- examples/genai-rag-multimodal/README.md | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 0b33435c..c2108b91 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -97,7 +97,7 @@ For more information about the technologies used in this example, please refer t EOF ``` - - Validate if all values are corrent in `terraform.tfvars` + - Validate if all values are correct in `terraform.tfvars` ```bash cat terraform-google-enterprise-genai/examples/genai-rag-multimodal/terraform.tfvars @@ -183,7 +183,8 @@ For more information about the technologies used in this example, please refer t - Create a file named `genai_example.tf` under `ml_business_unit/development` path that calls the module. - ```terraform + ```bash + cat > ml_business_unit/development/genai_example.tf < ml_business_unit/development/backend.tf < **IMPORTANT**: If you are planning to delete the notebook-runner service account at any moment, make sure you remove this policy before deleting it. + ## Usage Once all the requirements are set up, you can start by running and adjusting the notebook step-by-step. @@ -279,7 +285,7 @@ After clicking "open JupyterLab" button, you will be taken to an interactive Jup If you ran using Cloud Build, proceed with the steps below to use `terraform output`. 
-- Update `outputs.tf` file on `ml-machine-learning/ml_business_unit/development` with the following values: +- Update `outputs.tf` file on `ml-machine-learning/ml_business_unit/development` and add the following values to it, if the file does not exist create it: ```terraform output "private_endpoint_ip_address" { @@ -326,9 +332,11 @@ If you ran using Cloud Build, proceed with the steps below to use `terraform out echo host_vpc_network=$host_vpc_network ``` -- Search and Replace using `sed` command. +- Search and Replace using `sed` command at `terraform-google-enterprise-genai/examples/genai-rag-multimodal`. ```bash + cd ../../../terraform-google-enterprise-genai/examples/genai-rag-multimodal + sed -i "s//$private_endpoint_ip_address/g" multimodal_rag_langchain.ipynb sed -i "s//$host_vpc_project_id/g" multimodal_rag_langchain.ipynb From 5a15389813db97518819e86ca03e697ed2667edf Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Tue, 6 Aug 2024 14:35:57 -0300 Subject: [PATCH 12/13] add more info --- examples/genai-rag-multimodal/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index c2108b91..9af507cc 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -198,7 +198,7 @@ For more information about the technologies used in this example, please refer t ``` - Verify if `backend.tf` file exists at `ml-machine-learning/ml_business_unit/development`. - - If there is a `backend.tf` file, proceed with the next step and ignore the sub-steps below. + - If there is a `backend.tf` file, proceed with the next step (commit and push) and ignore the sub-steps below. 
- If there is no `backend.tf` file, follow the sub-steps below: - Create the file by running the command below: From 81378641827a1913bfefd5a351c0c1d6206b01f4 Mon Sep 17 00:00:00 2001 From: caetano-colin Date: Wed, 7 Aug 2024 08:35:03 -0300 Subject: [PATCH 13/13] add description about deployment options --- examples/genai-rag-multimodal/README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/examples/genai-rag-multimodal/README.md b/examples/genai-rag-multimodal/README.md index 9af507cc..598ddf41 100644 --- a/examples/genai-rag-multimodal/README.md +++ b/examples/genai-rag-multimodal/README.md @@ -21,6 +21,11 @@ For more information about the technologies used in this example, please refer t - [Vertex AI Vector Search Overview](https://cloud.google.com/vertex-ai/docs/vector-search/overview) - [Ragas Documentation](https://docs.ragas.io/en/stable/) +After ensuring all requirements are satisfied you can follow one of the two deployment options: + +1. [**Using Machine Learning Infra Pipeline**](#deploying-infrastructure-using-machine-learning-infra-pipeline): This is a robust option suitable for production environments and continuous deployment scenarios. +2. [**Using Terraform Locally**](#deploying-infrastructure-using-terraform-locally): This is a better option for one-time testing purposes where you know you will delete the example later. + ## Requirements - Terraform v1.7.5
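
Reviewer note on the `sed` replacement step (patches 02 and 11): the `host_vpc_network` substitution uses `:` as the `sed` delimiter while the others use `/`, presumably because the network value contains slashes. The sketch below illustrates the same idea with `|` as the delimiter. `fill_placeholder` and `<EXAMPLE_PLACEHOLDER>` are hypothetical names (the real placeholder strings are not visible in this patch text), and the function prints to stdout instead of editing the notebook in place.

```bash
# Hypothetical helper (not in the repository): replace every occurrence of a
# literal placeholder in a file with a value and print the result to stdout.
# '|' is the sed delimiter, so values containing '/' or ':' need no escaping.
# Assumption: the value itself contains no '|', '&', or backslash characters,
# and the placeholder contains no regex metacharacters.
fill_placeholder() {
  placeholder=$1
  value=$2
  file=$3
  sed "s|${placeholder}|${value}|g" "$file"
}

# Possible usage with an illustrative placeholder and network path:
#   fill_placeholder "<EXAMPLE_PLACEHOLDER>" \
#     "projects/my-host/global/networks/my-vpc" notebook.ipynb > notebook.filled.ipynb
```

Whichever delimiter is used, it must not appear in the substituted value; this may be why patch 09 describes `host_vpc_network` as a self-link without the `https://` prefix, since a `:` in the value would clash with the `:` delimiter used in the README.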