This repository demonstrates how to build and install SIRF on an Azure VM using Packer and Terraform. The VM is described in Packer template files; Packer builds an image of the VM, and Terraform deploys the VMs and their associated infrastructure in the cloud, then performs further configuration. NVIDIA GPU support is currently included.

An Azure account is required for deployment. The Terraform state is stored remotely in an Azure storage account; this is optional, but recommended.

This configuration currently deploys the symposium2019 branch of the SyneRBI VM.

Prerequisites

Configure access to Azure

The following instructions make use of the Azure CLI. All of these steps can be carried out in the Azure web portal, but this is not covered here.

  • Log in to your account:
az login
  • Query your Azure account to get a list of subscription and tenant ID values:
az account show --query "{subscriptionId:id, tenantId:tenantId}"
  • Note the subscriptionId and tenantId for future use.
  • Set the environment variable SUBSCRIPTION_ID to the subscription ID returned by the az account show command. In Bash, this would be:
export SUBSCRIPTION_ID=your_subscription_id
  • Create an Azure service principal to use:
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/${SUBSCRIPTION_ID}"
  • Make a note of the appId and password returned (see the example output below).
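
The output of az ad sp create-for-rbac is a short JSON document; the example below uses placeholder values, and the exact field names can vary slightly between Azure CLI versions:

{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "azure-cli-2019-09-06-00-00-00",
  "password": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

The appId becomes the client ID and the password becomes the client secret in the next section.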

Configure Azure environment variables

Use the tenantId noted earlier for ARM_TENANT_ID, and the appId and password of the service principal for ARM_CLIENT_ID and ARM_CLIENT_SECRET:

export ARM_SUBSCRIPTION_ID=${SUBSCRIPTION_ID}
export ARM_TENANT_ID=<TENANT_ID>
export ARM_CLIENT_ID=<APP_ID>
export ARM_CLIENT_SECRET=<PASSWORD>
export ARM_ENVIRONMENT=public
export TF_VAR_shared_image_name=sirfImageDef
export TF_VAR_shared_image_rg_name=galleryRG
export TF_VAR_shared_image_gallery_name=imgGallery
export TF_VAR_vm_prefix=sirf
export TF_VAR_vm_username=sirfuser
export TF_VAR_vm_password="virtual%1"
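
As an optional sanity check (not part of the original workflow), the service principal credentials can be verified by logging in with them directly; log back in with your own account afterwards:

az login --service-principal --username ${ARM_CLIENT_ID} --password ${ARM_CLIENT_SECRET} --tenant ${ARM_TENANT_ID}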

Build image on Azure with Packer

  • Copy creds.json.skeleton to creds.json:
cp creds.json.skeleton creds.json
  • Create a resource group:
az group create -n sirf-rg -l uksouth
  • Create a storage account on Azure:
az storage account create -n sirfsa -g sirf-rg -l uksouth --sku Standard_LRS

These are the default names for the resource group and storage account. If they are changed, the changes should also be reflected in creds.json.
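
Note that storage account names must be globally unique across Azure, so sirfsa may already be taken. Availability can be checked before creation, for example:

az storage account check-name --name sirfsa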

  • Change into the packer directory: cd packer
  • Build the image:
packer build -var-file=../creds.json sirf-gpu.json

This will take some time (~30-40 minutes). If the image already exists and it is necessary to overwrite it, the -force flag must be used.
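
For example, to force a rebuild over an existing image:

packer build -force -var-file=../creds.json sirf-gpu.json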

Create shared image gallery

If there is a need to deploy across multiple geographical data centres, a shared image gallery can be used to replicate the image globally. As the image is currently ~100 GB, replication to distant data centres (e.g. UK to Australia) can take a significant amount of time, so this should be factored into the set-up time.

  • Create a resource group for the image gallery:
az group create --name galleryRG --location uksouth
  • Create an image gallery:
az sig create --resource-group galleryRG --gallery-name imgGallery
  • Create an image definition:
az sig image-definition create \
--resource-group galleryRG \
--gallery-name imgGallery \
--gallery-image-definition sirfImageDef \
--publisher INM \
--offer SIRF \
--sku 18.04-LTS \
--os-type Linux
  • Create an image version:
az sig image-version create \
--resource-group galleryRG \
--gallery-name imgGallery \
--gallery-image-definition sirfImageDef \
--gallery-image-version 0.0.1 \
--target-regions uksouth=2 eastus2 westeurope \
--managed-image "/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/sirf-rg/providers/Microsoft.Compute/images/sirf-gpu-UbuntuServer-18.04-LTS"

This creates two replicas in uksouth and one each in eastus2 and westeurope. As a rule of thumb, it is good practice to have one replica per region for every 20 VMs that will be deployed.
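
Replication progress can be checked with the Azure CLI; the --query expression below is illustrative and may need adjusting for your CLI version:

az sig image-version show \
--resource-group galleryRG \
--gallery-name imgGallery \
--gallery-image-definition sirfImageDef \
--gallery-image-version 0.0.1 \
--query "publishingProfile.targetRegions"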

Configure Terraform remote backend

  • Set the environment variable ARM_ACCESS_KEY to the access key of the storage account used for the Terraform state:
export ARM_ACCESS_KEY=<access_key>
  • Edit main.tf and replace the value of storage_account_name with the storage account name output by configure_azure.sh previously:
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateXXXXX"
    container_name       = "tfstate"
    key                  = "dev.terraform.tfstate"
  }
}
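
If the access key is not to hand, it can be retrieved with the Azure CLI, substituting the actual storage account name for the placeholder:

az storage account keys list --resource-group tfstate-rg --account-name tfstateXXXXX --query '[0].value' --output tsv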

Configure Terraform environment variables

export TF_VAR_shared_image_name=sirfImageDef
export TF_VAR_shared_image_rg_name=galleryRG
export TF_VAR_shared_image_gallery_name=imgGallery
export TF_VAR_vm_prefix=sirf
export TF_VAR_vm_username=sirfuser
export TF_VAR_vm_password="virtual%1"

Running the Terraform script

  • Initialise Terraform:
terraform init
  • To preview the actions that Terraform will take, run:
terraform plan 
  • To run the script:
terraform apply 
  • If this succeeded, a single VM called sirf-gpu-0 will be running on Azure. To create more VMs, modify vm_total_no_machines in the module(s) in main.tf, then run terraform plan and terraform apply again. This will add more machines while leaving the existing ones intact.

  • To access the machine via ssh:

ssh USERNAME@PUBLICIP

where USERNAME is the value set for vm_username (default: sirfuser) and PUBLICIP is the public IP address. The password for access to the virtual machine is the value of vm_password (default: virtual%1).
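
If the public IP address is not known, it can be listed with the Azure CLI, for example:

az vm list-ip-addresses --output table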

Jupyter

Once the VM is deployed, a Jupyter notebook server will be running. It can be accessed from a web browser at:

https://<PUBLICIP>:<JUPPORT>

where PUBLICIP is the IP address found previously and JUPPORT is the Jupyter server port set by vm_jupyter_port (default: 9999). The password for access to the notebook is controlled by the variable vm_jupyter_pwd (default: virtual%1).

Remote desktop

A remote desktop to the VM is available. See the instructions on the wiki.

Removing the infrastructure

terraform destroy 

To avoid incurring unexpected costs, it is highly recommended that you check the Azure web portal to ensure that all resources have successfully been destroyed.
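
As an alternative to the web portal, the remaining resource groups can be listed from the CLI:

az group list --output table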

Troubleshooting

If you get an error related to SkuNotAvailable, list the available machine types and check whether the chosen machine type is offered in the region:

az vm list-skus --output table
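
The output can also be narrowed to the deployment region, for example:

az vm list-skus --location uksouth --output table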