In this section, we walk through the following instructions:
- Create a deployment environment.
- Install prerequisite software packages.
- Create a HealthImaging data store.
- Configure the `metadata-index` solution.
- Deploy the `metadata-index` solution.
- (Optional) Launch an Amazon EC2 Windows Server instance and install MySQL Workbench.
We will use the EC2 Windows Server instance and MySQL Workbench to query the Aurora MySQL metadata store.
Please set up a local deployment environment with Docker and Node.js installed on a supported Linux platform. Please make sure that your local environment has sufficient compute power (at least 2 vCPUs), memory (at least 8 GiB), and storage (at least 10 GB).
Alternatively, you can deploy an m5.large EC2 Amazon Linux instance with 10 GB of gp3 storage as a remote deployment environment. For instructions, please refer to the Get started with EC2 documentation, and install Docker and Node.js on the EC2 instance.
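The minimums above can be checked with a short script before you start. This is a minimal sketch, not part of the solution: the function names are hypothetical, the memory reading assumes a Linux host, and the operating system may report slightly less memory than the nominal instance size, so treat a near miss as informational.

```python
import os
import shutil

# Minimums quoted from the text above.
MIN_VCPUS = 2
MIN_MEMORY_GIB = 8
MIN_STORAGE_GB = 10


def shortfalls(vcpus, memory_gib, storage_gb):
    """Return one message per unmet minimum; an empty list means all are met."""
    checks = [
        ("vCPUs", vcpus, MIN_VCPUS),
        ("memory (GiB)", memory_gib, MIN_MEMORY_GIB),
        ("storage (GB)", storage_gb, MIN_STORAGE_GB),
    ]
    return [
        f"{name}: {value:.1f} < {minimum}"
        for name, value, minimum in checks
        if value < minimum
    ]


def measure(path="/"):
    """Read local values (assumes Linux; these os.sysconf names may not exist elsewhere)."""
    vcpus = os.cpu_count() or 0
    memory_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    storage_gb = shutil.disk_usage(path).total / 10**9  # total size of the volume
    return vcpus, memory_gib, storage_gb


if __name__ == "__main__":
    for message in shortfalls(*measure()):
        print("Below minimum:", message)
```

The comparison is kept separate from the measurement so the thresholds can be checked against the specification of a planned instance as well as the current machine.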
On your deployment environment, please install AWS Command Line Interface (AWS CLI) v2 and AWS Cloud Development Kit (AWS CDK), and configure your AWS credentials. Please make sure that your AWS credentials are associated with an IAM role or user with sufficient IAM permissions for CDK deployment.
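A quick way to confirm that the tools are reachable is to look them up on `PATH`. The sketch below is illustrative only: the executable names (`docker`, `node`, `aws`, `cdk`) are assumptions about how each tool is installed, and the helper name is hypothetical.

```python
import shutil

# Assumed executable names for Docker, Node.js, AWS CLI, and AWS CDK.
REQUIRED_TOOLS = ["docker", "node", "aws", "cdk"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
    # Credentials are not checked here; running `aws sts get-caller-identity`
    # shows which IAM identity the AWS CLI is currently using.
```

Finding a tool on `PATH` does not prove that its version is adequate or that the IAM permissions are sufficient, so this is only a first-pass check.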
You will need the following software packages installed locally to deploy this solution in your AWS account.
Python3/pip:
The deployment automation code is written in Python.
CDK:
Please refer to the AWS CDK documentation to install the framework and bootstrap your AWS environment.
Docker:
When you run the `cdk deploy` command, the container images are built automatically (via the `docker build` command) and pushed to Amazon ECR. Docker must therefore be present on the machine where the CDK deployment is executed. Docker Desktop is sufficient. Refer to the Docker Desktop documentation to install it on your machine.
Compatible region:
As of 09/01/2024, this project is compatible with the following Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), and Europe (Ireland).
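Because configuration files take Region codes rather than display names, it can help to map one to the other. The sketch below pairs the four Regions listed above with their standard AWS Region codes; the helper name is hypothetical, and the list reflects only the compatibility statement above.

```python
# Region codes for the compatible Regions named in the text above.
SUPPORTED_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "eu-west-1": "Europe (Ireland)",
}


def is_supported_region(region_code):
    """Return True if the Region code is in the compatibility list."""
    return region_code.strip().lower() in SUPPORTED_REGIONS


if __name__ == "__main__":
    for code in ("us-east-1", "eu-central-1"):
        status = "compatible" if is_supported_region(code) else "not listed"
        print(f"{code}: {status}")
```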
If you do not have an existing HealthImaging data store, you can create one by following the instructions for creating a data store in the Introduction to AWS HealthImaging workshop.
In this section, we walk through the following instructions:
- Download the `metadata-index` solution directory.
- Edit the `config.py` file: `[project root]/backend/config.py`.
- Edit the `cdk.context.json` file: `[project root]/backend/cdk.context.json`.
- From your deployment environment, clone the `aws-healthimaging-samples` repository:
git clone https://github.com/aws-samples/aws-healthimaging-samples.git
- Change your working directory to the `backend` folder of the `metadata-index` solution directory:
cd aws-healthimaging-samples/metadata-index/backend
The `config.py` file is located at: `[project root]/backend/config.py`.
At a minimum, you must perform the following changes:
- Copy the Amazon Resource Name (ARN) of your HealthImaging data store.
- Set the `AHI_DATASTORE_ARN` parameter to the ARN of your HealthImaging data store.
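A copy-and-paste error in the ARN is easy to miss, so a format check before editing `config.py` may save a failed deployment. The sketch below is an assumption-laden illustration: it assumes data store ARNs follow the shape `arn:<partition>:medical-imaging:<region>:<account>:datastore/<32-character id>`, and the helper name is hypothetical. Verify the pattern against the ARN shown in your own console.

```python
import re

# Assumed ARN shape (verify against your own data store ARN):
# arn:<partition>:medical-imaging:<region>:<12-digit account>:datastore/<32-char id>
DATASTORE_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):medical-imaging:"
    r"(?P<region>[a-z0-9-]+):(?P<account_id>\d{12}):"
    r"datastore/(?P<datastore_id>[0-9a-z]{32})$"
)


def parse_datastore_arn(arn):
    """Split a data store ARN into its parts, or raise ValueError if it looks malformed."""
    match = DATASTORE_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"Not a recognizable HealthImaging data store ARN: {arn!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Placeholder ARN for illustration only; it does not refer to a real data store.
    example = "arn:aws:medical-imaging:us-east-1:123456789012:datastore/" + "0123456789abcdef" * 2
    parts = parse_datastore_arn(example)
    print(parts["region"], parts["account_id"], parts["datastore_id"])
```

The parsed Region and account ID can then be compared with the values you set in `cdk.context.json`.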
In addition, you can change the following parameters:
Section | Parameter | Default value | Description |
---|---|---|---|
ROOT | CDK_APP_NAME | metadata-index | Name of the solution. This name will be used to tag all the resources created by the solution. If you intend to deploy multiple instances of this solution in the same AWS account, make sure to change this name for each deployment. |
ROOT | AHI_DATASTORE_ARN | "" | The ARN of the AHI data store for which the metadata should be indexed. The solution will set appropriate privileges for the Lambda parsers to request the metadata. |
RDBMS_CONFIG | enabled | False | Enables the RDBMS index mode. Enabling this will deploy an Aurora Serverless MySQL database and the RDBMS Lambda parser. |
RDBMS_CONFIG | populate_instance_level | False | Specifies whether the index should populate the instance level of the DICOM data. If set to `False`, only the issuer, patient, study and series tables will be populated. |
RDBMS_CONFIG | populate_frame_level | False | Specifies whether the index should populate the frame level of the DICOM data. If set to `True`, the instance level will also be populated regardless of the `populate_instance_level` setting. |
RDBMS_CONFIG | db_name | ahiindex | Name of the database. |
RDBMS_CONFIG | db_engine_pause | 20 | Number of idle minutes (no SQL operations) before the Aurora MySQL engine pauses. |
RDBMS_CONFIG | min_acu_capacity | 1 | Minimum resource allocation for the Aurora MySQL engine. |
RDBMS_CONFIG | max_acu_capacity | 16 | Maximum resource allocation for the Aurora MySQL engine. |
DATALAKE_CONFIG | enabled | True | Enables the Datalake index mode. Enabling this will deploy an S3 bucket if no existing bucket is specified, and the Datalake Lambda parser. |
DATALAKE_CONFIG | populate_instance_level | True | Specifies whether the index should populate the instance level of the DICOM data. If set to `False`, only the issuer, patient, study and series tables will be populated. |
DATALAKE_CONFIG | destination_bucket_name | "" | The name of the bucket to be used as the datalake repository. If left empty, the solution will create a new bucket for this purpose. |
DATALAKE_CONFIG | deploy_glue_default_config | True | Set to `True` to deploy a default database, tables and crawler in Glue. This allows for using Athena and QuickSight out of the box. Table schemas can be modified in the file datalake_tables_config. Set it to `False` if you plan on using your own schemas. |
OPENSEARCH_CONFIG | enabled | False | Enables the OpenSearch index mode. THIS MODE IS NOT IMPLEMENTED YET. |
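The interaction between `populate_instance_level` and `populate_frame_level` can be sketched as follows. The dictionary below mirrors the defaults in the table, but its layout is an assumption about how `config.py` is organized, and `populated_levels` is a hypothetical helper written only to illustrate the rule described above.

```python
# Assumed layout: one dictionary per table section, using the table's default values.
RDBMS_CONFIG = {
    "enabled": False,
    "populate_instance_level": False,
    "populate_frame_level": False,
    "db_name": "ahiindex",
    "db_engine_pause": 20,
    "min_acu_capacity": 1,
    "max_acu_capacity": 16,
}


def populated_levels(config):
    """Return the DICOM levels the index would populate for a given section."""
    levels = ["issuer", "patient", "study", "series"]  # always populated
    frame = config.get("populate_frame_level", False)
    # The frame level implies the instance level, whatever populate_instance_level says.
    if config.get("populate_instance_level", False) or frame:
        levels.append("instance")
    if frame:
        levels.append("frame")
    return levels


if __name__ == "__main__":
    print(populated_levels(RDBMS_CONFIG))
    print(populated_levels({**RDBMS_CONFIG, "populate_frame_level": True}))
```

With the defaults only the four base levels are listed; switching on `populate_frame_level` alone adds both the instance and frame levels.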
The `cdk.context.json` file is located at: `[project root]/backend/cdk.context.json`.
At a minimum, you must perform the following changes:
- Set the `ACCOUNT_NUMBER` to the AWS Account ID of your deployment account.
- Set the `REGION` to the AWS Region of your deployment. The default value is `us-east-1`.
In addition, you can change the following parameters:
Section | Parameter | Default value | Description |
---|---|---|---|
availability-zones | ACCOUNT_NUMBER | "" | AWS Account ID of the deployment account. |
availability-zones | REGION | us-east-1 | AWS Region of the deployment. If you change the default Region, please update the list of Availability Zones to match the new Region. |
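The sketch below shows how the account ID, Region, and Availability Zone list fit together. It assumes the file uses the context key format that CDK writes when it caches Availability Zone lookups (`availability-zones:account=<id>:region=<region>`); whether this project's file matches that layout exactly is an assumption, and both helper names are hypothetical. Open the file and compare before relying on it.

```python
import json
import re


def context_key(account_id, region):
    """Build the assumed CDK context key for a given account and Region."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("An AWS account ID is a 12-digit number.")
    return f"availability-zones:account={account_id}:region={region}"


def build_context(account_id, region, zones):
    """Return a context mapping, checking that every zone belongs to the Region."""
    mismatched = [zone for zone in zones if not zone.startswith(region)]
    if mismatched:
        raise ValueError(f"Zones not in {region}: {mismatched}")
    return {context_key(account_id, region): list(zones)}


if __name__ == "__main__":
    # Placeholder account ID and zone names, for illustration only.
    context = build_context("123456789012", "us-west-2", ["us-west-2a", "us-west-2b"])
    print(json.dumps(context, indent=2))
```

The zone check captures the note in the table: changing the Region without updating the Availability Zones leaves the two out of step.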
Perform the following instructions to deploy the `metadata-index` solution.
The `cdk deploy` command in the last step takes about 25 to 30 minutes to complete.
Before proceeding with the next steps, confirm that the solution deployment completes successfully.
- From `[project root]/`, create a Python virtual environment on macOS and Linux:
python3 -m venv .venv
- After the Python virtual environment has been created, use the following command to activate it:
source .venv/bin/activate
- If you are on a Windows platform, you should use the following command to activate your Python virtual environment:
.venv\Scripts\activate.bat
- Once the Python virtual environment has been activated, navigate to the `[project root]/backend/` folder.
cd backend
- Install the required dependencies.
pip install -r requirements.txt
- If it is the first time that you are using CDK to deploy in this account and region, bootstrap for CDK deployment:
cdk bootstrap
- Use CDK to synthesize and deploy the CloudFormation template for this code.
cdk deploy
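The sequence above differs only in the activation command (by platform) and in whether `cdk bootstrap` is needed. The sketch below lists the commands for a given case without running them; `deployment_steps` is a hypothetical helper, and the commands are copied from the steps above.

```python
import sys


def deployment_steps(platform=sys.platform, first_cdk_use=True):
    """Return the shell commands from the steps above, in order, without running them."""
    if platform.startswith("win"):
        activate = r".venv\Scripts\activate.bat"
    else:
        activate = "source .venv/bin/activate"
    steps = [
        "python3 -m venv .venv",            # run from [project root]/
        activate,
        "cd backend",
        "pip install -r requirements.txt",
    ]
    if first_cdk_use:
        steps.append("cdk bootstrap")       # only needed once per account and Region
    steps.append("cdk deploy")              # takes about 25 to 30 minutes
    return steps


if __name__ == "__main__":
    for number, command in enumerate(deployment_steps(), start=1):
        print(f"{number}. {command}")
```

Printing the plan first makes it easy to confirm the order before committing to the long-running `cdk deploy` step.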
To facilitate querying and testing on the Aurora MySQL metadata store, you can optionally deploy an m5.large EC2 Windows Server instance (Microsoft Windows Server 2022 Base) with 30 GB of gp3 storage into the `metadata-index-Lambdas-SG` security group, and install MySQL Workbench (version 8.0.41) on the EC2 Windows Server instance.
For instructions, please refer to the Get started with EC2 and Installing MySQL Workbench on Windows documentation.