AiiDA on Azure

This is a step-by-step guide for setting up a computational environment using the AiiDA workflow manager on the Microsoft Azure cloud.

This guide assumes the following architecture:

  • An always-on VM which hosts the AiiDA installation and other required services. This will be called the aiida-machine.
  • A CycleCloud cluster where the calculations are performed. This will be called the compute-cluster.

Disclaimer: this guide is under active development and offered without warranty of any kind. Use at your own risk.

Features

  • Azure CycleCloud with an autoscaling cluster for compute
  • NFS for file sharing
  • Azure blob storage for backups
  • AiiDA on a standalone virtual machine, connected to the CycleCloud cluster

Steps

Broadly speaking there are four steps that need to be performed:

  1. Configuration of Azure account/services.
  2. Deployment/configuration of the aiida-machine with CycleCloud.
  3. Installation of AiiDA.
  4. Configuration/deployment of the compute-cluster.

This guide assumes that the user has an Azure account with the appropriate permissions to create resource groups, as well as a service principal. Many of the operations are shown using the Azure CLI, but the same operations can also be performed via the Azure portal or Azure PowerShell.
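All az commands below assume an authenticated CLI session pointed at the right subscription. A minimal sketch (the subscription id is a placeholder):

```shell
# Authenticate the Azure CLI (opens a browser) and select the subscription
# that the resources in this guide should be created in.
az login
az account set --subscription <subscription_id>
```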

1. Configuration of Azure account/storage services

To try to keep everything organized it is best to create a resource group where all the following services/resources will be placed:

az group create --name <resource_group_name> --location <location_name>

Once the resource group is created, you will want to create a storage account. This gives you a centralized place for storing data, especially by making use of the blob storage:

az storage account create --name <storage_account_name> --resource-group <resource_group_name> --location <location_name> --sku Standard_ZRS  --encryption-services blob

For simplicity, in this guide the location of all resources is the same. This is not necessary, however.

Important: Not all resources are available in all regions. This is particularly relevant for the compute-cluster: check which VM sizes you want to use, and choose the region accordingly.

With the storage account created, create a container where the data will actually be stored. The container will eventually be connected to the VM using the blobfuse application.

Important: When mounting with blobfuse you can choose between giving file permissions only to the user who mounts the container, or to all users (i.e. all users would have access to all folders/files present in this container). If you want to keep data strictly separated per user, it is advisable to create one container per user.

Below, the first command gives the user the capability of generating the container and the second creates the container itself.

az ad signed-in-user show --query objectId -o tsv | az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee @- \
    --scope "/subscriptions/<subscription>/resourceGroups/<resource_group_name>/providers/Microsoft.Storage/storageAccounts/<storage_account_name>"

az storage container create \
    --account-name <storage_account_name> \
    --name <container_name> \
    --auth-mode login

2. Deployment/configuration of the aiida-machine

This machine will host the CycleCloud server, which will be used to spawn the compute-cluster. The server can be set up in several ways: via the Azure Marketplace, an ARM template, a container, or a manual installation. This guide follows the manual installation route.

Start by deploying a VM in your resource group. The configuration of the machine will depend on the number of users and applications that will be deployed. Consider that this machine will be always-on, so it is important to strike the right balance of performance and cost. The VM should have a Linux OS - here we choose Ubuntu, though the procedure is similar for RHEL.

Create the VM with an SSH key, so that one connects using key-based authentication. Security can be improved further by requiring that all users connect with SSH keys only (i.e. disabling password logins). It is also recommended to set up fail2ban, to protect against brute-force attacks as much as possible.

The public IP address of the machine can be made static and given a DNS label. Though not strictly necessary (in contrast to the head node of the compute-cluster), this makes connecting to the VM easier.

Once the machine is created and the connectivity is sorted out (SSH keys, fail2ban, DNS label, etc.), update the VM with sudo apt-get update && sudo apt-get upgrade to ensure that the latest security patches and updates are applied.
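The SSH hardening mentioned above can be scripted. Below is a hedged sketch, assuming the stock Ubuntu sshd_config location; the function only rewrites the PasswordAuthentication setting, and the commented commands show how it would be applied on the VM:

```shell
# Sketch: disable SSH password logins so only key-based authentication is
# accepted. The sshd_config path in the usage comments is an assumption
# (stock Ubuntu location).
harden_sshd() {
    cfg="$1"  # path to an sshd_config file
    sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' "$cfg"
}

# On the aiida-machine one would run something like:
#   sudo cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak
#   harden_sshd /etc/ssh/sshd_config    # run with root privileges
#   sudo systemctl restart ssh
#   sudo apt-get -y install fail2ban    # default jail already covers sshd
```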

Install CycleCloud

The next step is to install CycleCloud, for which one first needs to install a couple of dependencies:

sudo apt update && sudo apt -y install wget gnupg2

These allow one to download the necessary trusted keys from Microsoft and to install CycleCloud using apt:

wget -qO - https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
echo 'deb https://packages.microsoft.com/repos/cyclecloud bionic main' | sudo tee /etc/apt/sources.list.d/cyclecloud.list
sudo apt update
sudo apt -y install cyclecloud8

Once that is done, all the pieces that comprise CycleCloud (except the CLI) will be installed on the VM; however, they still need to be configured to work properly. This must be done via a web browser. If the CycleCloud server can be accessed from its public IP, one can use the browser of the local machine; otherwise one needs to install a browser on the VM. In the following, Firefox is used:

sudo apt-get install firefox

After that, one can open the CycleCloud web interface locally via

firefox -url=http://<cycle_cloud_domain_name>:8080 -no-remote

Then one must configure the server by setting up an administrator user account with a password and SSH key.

Inside the CycleCloud web application one can configure which users have access to the CycleCloud clusters, and at which level. It is recommended to register each user's SSH key here, since that way, once the cluster is deployed, users will be able to log in to the head node via SSH.

One should also configure the CycleCloud application to use a service principal; this allows it to create nodes inside the resource group (or other resource groups).

Lastly, one can install the CycleCloud CLI. This can be done either via the web application or by downloading the installer from the terminal:

wget https://<cycle_cloud_domain_name>/static/tools/cyclecloud-cli.zip

Then extract the installer to a temporary folder and run the provided installation script:

cd /tmp
unzip /opt/cycle_server/tools/cyclecloud-cli.zip
cd /tmp/cyclecloud-cli-installer
./install.sh

After this, CycleCloud is configured and one can start deploying clusters.

Setup blobfuse

Setting up blobfuse is useful, as it makes it easy to transfer data between the aiida-machine and the compute-cluster. Depending on how one sets up the blob storage, it can also be one of the most cost-effective long-term storage solutions for data generated from simulations.

wget https://packages.microsoft.com/config/ubuntu/<ubuntu version 16.04 or 18.04 or 20.04>/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update
sudo apt-get install blobfuse fuse

Important: As of the 17th of July 2022, a deb package is not provided for Ubuntu 22.04. One can, however, build blobfuse from source and it will work; use at your own risk, as this is not officially supported.

Once blobfuse is installed, one must mount a container. Bear in mind that blobfuse is a process that can go down for many reasons, such as connectivity issues or a lack of resources. To mount the container, one must first create a folder where the cache will be stored; it is recommended to use the fastest drive available for better performance

sudo mkdir /mnt/blobfusetmp
sudo chown <youruser> /mnt/blobfusetmp

After this one can mount the container via

blobfuse path_where_to_mount --tmp-path=/mnt/blobfusetmp --use-attr-cache=true -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --config-file=path_to_config/connection.cfg

where connection.cfg is a file with the configuration for the connection, i.e. this is where the Azure secrets, container name, etc. are located. Keep this file secret.

accountName <account-name-here>
# Provide either an account key or a SAS token, and delete the other line.
accountKey <account-key-here-delete-next-line>
sasToken <shared-access-token-here-delete-previous-line>
# Set authType to the single authentication method you are using.
authType <MSI/SAS/SPN/Key/empty>
containerName <insert-container-name-here>
# If you are using a proxy server with https (the default protocol), set the caCertFile below.
caCertFile <insert the certfile name with full path>
# If you are using a proxy server with https (the default protocol), set the httpsProxy below.
httpsProxy <insert the https proxy server Eg: http://10.1.0.23:8080>
# If you are using a proxy server and have turned https off using --use-https=false, set the httpProxy below.
httpProxy <insert the http proxy server if any>

If blobfuse were to fail, or if one wants to stop the process, one can use fusermount -u path_where_to_mount to unmount the disk.
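To have the container mounted automatically at boot, an /etc/fstab entry can be used instead of mounting by hand (blobfuse accepts its command-line flags as fstab options). A sketch, reusing the example paths from this section:

```
# /etc/fstab entry (sketch; mount point and config path reuse the examples above)
blobfuse /path_where_to_mount fuse defaults,_netdev,--tmp-path=/mnt/blobfusetmp,--config-file=/path_to_config/connection.cfg,allow_other 0 0
```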

3. Installation of AiiDA

Installing AiiDA on an Azure VM works the same way as on a local machine. One should install it in a virtual environment to ensure that there are no dependency conflicts; either virtualenv or conda can be used.

First one should install the prerequisites

sudo apt install git python3-dev python3-pip postgresql postgresql-server-dev-all postgresql-client rabbitmq-server

Important: CycleCloud uses port 5672, which is also the default port of RabbitMQ. When RabbitMQ is installed it will therefore likely fall back to port 5673. To check which port is actually being used, one can run

sudo lsof -i -P -n | grep 'rabbitmq'
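If you want to use the detected port programmatically (for example to pass it to verdi setup later), the port number can be extracted from the lsof output. A small sketch, using a sample line in place of real lsof output:

```shell
# Extract the listening port from lsof-style output. The sample line below
# mimics typical output of: sudo lsof -i -P -n | grep 'rabbitmq'
get_rabbitmq_port() {
    grep -o 'TCP \*:[0-9]*' | head -n 1 | grep -o '[0-9]*$'
}

sample='beam.smp 1234 rabbitmq 17u IPv4 0t0 TCP *:5673 (LISTEN)'
printf '%s\n' "$sample" | get_rabbitmq_port   # prints 5673
```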

Once the pre-requisites have been installed one can install aiida-core using pip in a virtual environment via

python -m venv ~/envs/aiida
source ~/envs/aiida/bin/activate
pip install aiida-core

Lastly, one needs to configure an AiiDA profile. To ensure that the configuration is performed properly (mostly due to the possible issues with RabbitMQ), verdi setup is preferred over verdi quicksetup.

When configuring RabbitMQ via verdi setup, one must make sure that the proper port (not the default one) is passed to the configuration.
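For reference, such an invocation might look as follows. All values in <...> are placeholders, and only the broker port deviates from the defaults (here assumed to be 5673, as discussed above):

```shell
# Command sketch; flag names are those of aiida-core's 'verdi setup'.
verdi setup \
    --profile <profile_name> \
    --email <email> --first-name <first> --last-name <last> --institution <institution> \
    --db-host localhost --db-port 5432 \
    --db-name <db_name> --db-username <db_user> --db-password <db_pass> \
    --broker-host 127.0.0.1 --broker-port 5673
```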

With blobfuse ready and AiiDA installed, one can set up backups so that the data is stored in the blob container. This ensures that, should the VM fail, the data is stored in as redundant a manner as possible.
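As a starting point, a backup can be as simple as periodically syncing the AiiDA file repository (plus a database dump) into the mounted container. A minimal, hedged sketch; the paths in the usage comments are assumptions, not fixed by this guide:

```shell
# Sketch: copy a source directory (e.g. the AiiDA file repository) into the
# blobfuse mount point. A database dump should be backed up alongside it.
backup_to_blob() {
    src="$1"; dest="$2"
    mkdir -p "$dest"
    cp -a "$src"/. "$dest"/
}

# Example usage on the aiida-machine (paths are assumptions):
#   backup_to_blob ~/.aiida /path_where_to_mount/aiida-backup
#   pg_dump -Fc <db_name> > /path_where_to_mount/aiida-backup/db.dump
```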

4. Configuration/deployment of the compute-cluster

When creating a compute-cluster in CycleCloud, the first thing to select is the scheduler to use. CycleCloud supports a variety of schedulers, such as PBSPro, GridEngine and SLURM. Any of the default clusters provided with CycleCloud can be used, with only small modifications, to provision a cluster that works with AiiDA. The possible options and defaults for a given cluster are defined in a template file. The best way to generate a customized cluster is to take one of the templates found on the Azure GitHub, modify it, and upload it to the CycleCloud server so that it can be easily deployed.

To use the compute-cluster with AiiDA, one can provision one of the default clusters from a template and afterwards set a static public IP and a DNS label. Without this last step, the IP address of the head node will vary, and your users would have to define a different computer every time the IP changes.

The pre-defined clusters use one of the Azure HPC images, which come with several compilers and libraries pre-installed, allowing the user to install most of the required simulation software. They also come with the environment modules package to handle the environment variables that different software packages might require.

The pre-defined clusters also allow users to add NFS disks to the cluster. This is where the users' ${HOME} folders will be mounted. Note that this disk is persistent for both the head node and the compute nodes, so it is also a good place to store the simulation codes that will be used. If one has found a particular NFS configuration that should be used for every cluster, one can define it in the cluster template file. For example, the size of the default shared filesystem can be defined as

    [[[volume shared]]]
    Size = 1024
    SSD = True
    Mount = builtinshared
    Persistent = ${NFSType == "Builtin"}

A similar modification can be performed to ensure that the compute-cluster has a pre-defined static IP address and DNS label. First one needs to create a static public IP address using the Azure CLI

az network public-ip create \
    --resource-group <resource_group_name> \
    --name <public_static_ip_name> \
    --version IPv4 \
    --sku Standard \
    --zone 1 2 3

Once the IP address has been created, one can add it to the configuration of the cluster; this way the cluster is ready to be used in conjunction with AiiDA:

[[node scheduler]]
[[[network-interface eth0]]]
    PublicIp = /subscriptions/${subscription_id}/resourceGroups/${resource_group_name}/providers/Microsoft.Network/publicIPAddresses/${public_static_ip_name}
    AssociatePublicIpAddress = true
    PublicDnsLabel = myuniquename

One can also define different types of compute nodes in order to handle jobs with different resource requirements. For this, define several nodearrays with different configurations:

    [[node scheduler]]
    MachineType = $SchedulerMachineType
    ImageName = $SchedulerImageName
    IsReturnProxy = $ReturnProxy
    AdditionalClusterInitSpecs = $SchedulerClusterInitSpecs

    [[nodearray hpc]]
    MachineType = $HPCMachineType
    ImageName = $HPCImageName
    MaxCoreCount = $MaxHPCCoreCount

    Azure.MaxScalesetSize = $MaxScalesetSize
    AdditionalClusterInitSpecs = $HPCClusterInitSpecs

One can then define the default values for the different types of nodes:

    [[parameters Virtual Machines ]]
    Description = "The cluster, in this case, has two roles: the scheduler with shared filer and the execute hosts. Configure which VM types to use based on the requirements of your application."
    Order = 20

        [[[parameter Region]]]
        Label = Region
        Description = Deployment Location
        ParameterType = Cloud.Region

        [[[parameter SchedulerMachineType]]]
        Label = Scheduler VM Type
        Description = The VM type for scheduler and shared filer.
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_B4ms

        [[[parameter HPCMachineType]]]
        Label = HPC VM Type
        Description = The VM type for HPC nodes
        ParameterType = Cloud.MachineType
        DefaultValue = Standard_H16r
        Config.Multiselect = true

You can add as many node types as desired; jobs can then target them via the scheduler when submitting a calculation.

After modifying the template file, one can upload it to the CycleCloud application via the CycleCloud CLI. Important: give your template a unique name in order to avoid conflicts with previously uploaded templates.

cyclecloud import_template -c path_to_template/template.txt

After this the cluster can be provisioned via the CycleCloud application.

Contribute

Suggestions for improvements and bug fixes are highly welcome. Just open an issue or a pull request.
