This repository demonstrates an end-to-end MLOps process using the Vertex AI platform and Smart Analytics technology capabilities.
In particular, two general Vertex AI Pipeline templates are provided:
- Training pipeline, including:
  - Data processing
  - Custom model training
  - Model evaluation
  - Endpoint creation
  - Model deployment
  - Deployment testing
  - Model monitoring
- Batch-prediction pipeline, including:
  - Data processing
  - Batch prediction using the deployed model
Note that, apart from the data processing steps, which use BigQuery, all other steps are built on top of Vertex AI platform capabilities.
The dataset used throughout the demonstration is the Banknote Authentication Data Set. Data were extracted from images taken from genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images. Attribute information:
- variance of Wavelet Transformed image (continuous)
- skewness of Wavelet Transformed image (continuous)
- curtosis of Wavelet Transformed image (continuous)
- entropy of image (continuous)
- class (integer)
Given the Banknote Authentication Data Set, a binary classification problem is adopted where the attribute `class` is chosen as the label and the remaining attributes are used as features.
LightGBM, a gradient boosting framework that uses tree-based learning algorithms, is used to train the model in order to demonstrate the custom training and custom serving capabilities of the Vertex AI platform, which otherwise provides more native support for frameworks such as TensorFlow, PyTorch, scikit-learn and XGBoost.
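As a rough illustration of the modeling task only (not the repository's training code), a minimal LightGBM binary classifier on the four banknote features might look like the sketch below; the column names and the local CSV path are assumptions.

```python
# Minimal sketch of the modeling task, assuming the banknote data is available
# locally as a CSV with the columns listed above; not the repo's training code.
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

FEATURES = ["variance", "skewness", "curtosis", "entropy"]  # assumed column names
LABEL = "class"

df = pd.read_csv("banknote_authentication.csv")  # hypothetical local export
X_train, X_val, y_train, y_val = train_test_split(
    df[FEATURES], df[LABEL], test_size=0.2, random_state=42
)

model = lgb.LGBMClassifier(objective="binary", n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])

val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {val_auc:.4f}")
```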
The repository contains the following:
.
├── components : custom vertex pipeline components
├── images : custom container images for training and serving
├── pipelines : vertex ai pipeline definitions and runners
├── configs : configurations for defining vertex ai pipeline
├── scripts : scripts for running local testing
└── notebooks : notebooks used for development and testing of vertex ai pipeline
In addition:
- `build_components_cb.sh`: build all components defined in the `components` folder using Cloud Build
- `build_images_cb.sh`: build custom images (training and serving) defined in the `images` folder using Cloud Build
- `build_pipeline_cb.sh`: build the training and batch-prediction pipelines defined in the `pipelines` folder using Cloud Build
The end-to-end process of creating and running the training pipeline contains the following steps:
- Set up the MLOps environment on Google Cloud
- Create an Artifact Registry for your organization to manage container images
- Develop the training and serving logic
- Create the components required to build and run the pipeline
- Prepare and consolidate the configurations of the various steps of the pipeline
- Build the pipeline
- Run and orchestrate the pipeline
Artifact Registry is a single place for your organization to manage container images and language packages (such as Maven and npm). It is fully integrated with Google Cloud’s tooling and runtimes and comes with support for native artifact protocols. More importantly, it supports regional and multi-regional repositories.
We have provided a helper script: `scripts/create_artifact_registry.sh`
Develop your machine learning programs and then containerize them as demonstrated in the `images` folder.
The requirements for writing training code can be found here as well.
Note that a custom serving image is not necessary if your chosen framework is supported by a pre-built container.
Pre-built containers are organized by machine learning (ML) framework and framework version, and
provide HTTP prediction servers that you can use to serve predictions with minimal configuration.
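For a framework such as LightGBM that lacks a pre-built serving container, the custom serving image only needs to expose an HTTP server that honors Vertex AI's serving contract: the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE` and `AIP_PREDICT_ROUTE` environment variables, and the `instances`/`predictions` JSON format. The Flask-based sketch below is only an illustration of that contract, not the repository's serving code; the `MODEL_PATH` location is an assumption.

```python
# Minimal sketch of a custom prediction server for Vertex AI, assuming a
# LightGBM model file available inside the image at MODEL_PATH (assumed name).
# Not the repository's actual serving code.
import os
import lightgbm as lgb
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
booster = lgb.Booster(model_file=os.environ.get("MODEL_PATH", "model.txt"))  # assumed location

@app.route(os.environ.get("AIP_HEALTH_ROUTE", "/health"))
def health():
    return "ok", 200

@app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
def predict():
    instances = request.get_json()["instances"]                  # Vertex AI request format
    scores = booster.predict(np.asarray(instances, dtype=float)).tolist()
    return jsonify({"predictions": scores})                      # Vertex AI response format

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))
```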
We have also provided helper scripts:
- `scripts/run_training_local.sh`: test the training program locally
- `scripts/run_serving_local.sh`: test the serving program locally
- `build_images_cb.sh`: build the images using the Cloud Build service
Vertex AI sets the following environment variables when it runs your training code:
- `AIP_MODEL_DIR`: a Cloud Storage URI of a directory intended for saving model artifacts.
- `AIP_CHECKPOINT_DIR`: a Cloud Storage URI of a directory intended for saving checkpoints.
- `AIP_TENSORBOARD_LOG_DIR`: a Cloud Storage URI of a directory intended for saving TensorBoard logs. See Using Vertex TensorBoard with custom training.
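For example, training code typically writes its final artifact to `AIP_MODEL_DIR` so that later pipeline steps (model upload and deployment) can find it. A hedged sketch of that step is shown below; it assumes a trained LightGBM `model` object and the `google-cloud-storage` client, and is not the repository's exact training code.

```python
# Sketch: saving a trained model to the Cloud Storage URI that Vertex AI passes
# via AIP_MODEL_DIR; assumes a trained LightGBM classifier `model` and the
# google-cloud-storage client (assumptions, not the repo's exact code).
import os
from google.cloud import storage

def save_model_to_aip_model_dir(model, filename: str = "model.txt") -> None:
    model_dir = os.environ["AIP_MODEL_DIR"]          # e.g. gs://bucket/path/model/
    model.booster_.save_model(filename)              # write the LightGBM model locally

    bucket_name, _, blob_prefix = model_dir.removeprefix("gs://").partition("/")
    blob = storage.Client().bucket(bucket_name).blob(os.path.join(blob_prefix, filename))
    blob.upload_from_filename(filename)              # copy the artifact to GCS
```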
The following template custom components are provided:
- `components/data_process`: read a BQ table, perform transformations in BQ and export to GCS
- `components/train_model`: launch a custom (distributed) training job on the Vertex AI platform
- `components/check_model_metrics`: check the metrics of a training job and verify whether it produces a better model
- `components/create_endpoint`: create an endpoint on the Vertex AI platform
- `components/deploy_model`: deploy a model artifact to a created endpoint on the Vertex AI platform
- `components/test_endpoint`: call the endpoint of the deployed model for verification
- `components/monitor_model`: track deployed model performance using Vertex Model Monitoring
- `components/batch_prediction`: launch a batch prediction job on the Vertex AI platform
We have also provided a helper script: `build_components_cb.sh`
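The actual component implementations live in the folders listed above. Purely as an illustration of the pattern, a lightweight KFP component in the spirit of `components/check_model_metrics` could look like the sketch below; the parameter names and comparison logic are assumptions, not the repository's implementation.

```python
# Illustrative sketch of a lightweight KFP component (KFP v2 syntax; older SDKs
# use kfp.v2.dsl). Parameter names and logic are assumptions.
from kfp.dsl import component

@component(base_image="python:3.10")
def check_model_metrics(
    metric_name: str,
    new_metric_value: float,
    metric_threshold: float,
) -> bool:
    """Return True if the newly trained model clears the configured threshold."""
    is_better = new_metric_value >= metric_threshold
    print(f"{metric_name}: {new_metric_value} (threshold {metric_threshold}) -> {is_better}")
    return is_better
```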
The sample pipeline definitions are:
- `pipelines/training_pipeline.py`
- `pipelines/batch_prediction_pipeline.py`

After compiling the training or batch-prediction pipeline, you may trigger a pipeline run using the provided runners:
- `pipelines/training_pipeline_runner.py`
- `pipelines/batch_prediction_pipeline_runner.py`
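Compilation itself is the standard KFP step: the pipeline function is compiled into a job spec that the runner later submits. A minimal sketch is shown below; the pipeline function name, import path and output filename are assumptions.

```python
# Sketch of compiling a pipeline definition into a job spec; the import path,
# function name and output path are assumptions.
from kfp import compiler  # kfp.v2.compiler in older SDK versions
from pipelines.training_pipeline import training_pipeline  # assumed import path

compiler.Compiler().compile(
    pipeline_func=training_pipeline,
    package_path="training_pipeline_spec.json",
)
```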
An example of running the training pipeline using the runner:
python training_pipeline_runner.py \
--project_id "$PROJECT_ID" \
--pipeline_region $PIPELINE_REGION \
--pipeline_root $PIPELINE_ROOT \
--pipeline_job_spec_path $PIPELINE_SPEC_PATH \
--data_pipeline_root $DATA_PIPELINE_ROOT \
--input_dataset_uri "$DATA_URI" \
--training_data_schema ${DATA_SCHEMA} \
--data_region $DATA_REGION \
--gcs_data_output_folder $GCS_OUTPUT_PATH \
--training_container_image_uri "$TRAIN_IMAGE_URI" \
--train_additional_args $TRAIN_ARGS \
--serving_container_image_uri "$SERVING_IMAGE_URI" \
--custom_job_service_account $CUSTOM_JOB_SA \
--hptune_region $PIPELINE_REGION \
--hp_config_max_trials 30 \
--hp_config_suggestions_per_request 5 \
--vpc_network "$VPC_NETWORK" \
--metrics_name $METRIC_NAME \
--metrics_threshold $METRIC_THRESHOLD \
--endpoint_machine_type n1-standard-4 \
--endpoint_min_replica_count 1 \
--endpoint_max_replica_count 2 \
--endpoint_test_instances ${TEST_INSTANCE} \
--monitoring_user_emails $MONITORING_EMAIL \
--monitoring_log_sample_rate 0.8 \
--monitor_interval 3600 \
--monitoring_default_threshold 0.3 \
--monitoring_custom_skew_thresholds $MONITORING_CONFIG \
--monitoring_custom_drift_thresholds $MONITORING_CONFIG \
--enable_model_monitoring True \
--pipeline_schedule "0 2 * * *" \
--pipeline_schedule_timezone "US/Pacific" \
--enable_pipeline_caching
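Under the hood, a runner like this is expected to create a Vertex AI PipelineJob from the compiled spec and submit it, roughly as in the sketch below; all argument values are placeholders, and this is not the runner's exact logic.

```python
# Rough sketch of what a pipeline runner does with the compiled spec;
# placeholder values throughout, not the repository's exact runner logic.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",                       # placeholder project ID
    location="us-central1",                     # placeholder pipeline region
    staging_bucket="gs://my-bucket",            # placeholder staging bucket
)

job = aiplatform.PipelineJob(
    display_name="training-pipeline",
    template_path="training_pipeline_spec.json",   # compiled pipeline spec
    pipeline_root="gs://my-bucket/pipeline-root",
    parameter_values={"input_dataset_uri": "bq://my-project.dataset.table"},
    enable_caching=True,
)
job.run(service_account="custom-job-sa@my-project.iam.gserviceaccount.com")
```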
We have also provided helper scripts:
- `scripts/build_pipeline_spec.sh`: compile and build the pipeline specs locally
- `scripts/run_training_pipeline.sh`: create and run the training Vertex AI Pipeline based on the specs
- `scripts/run_batch_prediction_pipeline.sh`: create and run the batch-prediction Vertex AI Pipeline based on the specs
- `build_pipeline_spec_cb.sh`: compile and build the pipeline specs using the Cloud Build service
| Field | Explanation |
|---|---|
| project_id | Your GCP project |
| pipeline_region | The region in which to run the Vertex AI Pipeline |
| pipeline_root | The GCS bucket used for storing artifacts of your pipeline runs |
| data_pipeline_root | The GCS staging location for custom jobs |
| input_dataset_uri | Full URI of the input dataset |
| data_region | Region of the input dataset |