Merge pull request #118 from materials-data-facility/deployment_docs
Add instructions on deploying MDF
blaiszik authored Jun 12, 2024
2 parents cda8b33 + 2066e0f commit 273ba38
Showing 1 changed file with 85 additions and 0 deletions.
85 changes: 85 additions & 0 deletions README.md
@@ -1,6 +1,91 @@
# MDF Connect
The Materials Data Facility Connect service is the ETL flow to deeply index datasets into MDF Search. It is not intended to be run by end-users. To submit data to the MDF, visit the [Materials Data Facility](https://materialsdatafacility.org).

# Architecture
The MDF Connect service is a serverless REST service deployed on AWS.
It consists of an AWS API Gateway that uses a Lambda function to authenticate
requests against Globus Auth. If a request is authorized, the endpoint triggers
its AWS Lambda function. Each endpoint is implemented as a Lambda function
contained in a Python file in the [aws/](aws/) directory. The Lambda functions
are deployed via GitHub Actions as described in a later section.
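
As an illustration of the authentication step, the sketch below shows the response shape that an API Gateway token-based Lambda authorizer returns. It is not the code in this repository: the use of a token-type authorizer is an assumption, and `validate_token` is a hypothetical stand-in for the Globus Auth check.

```python
from typing import Any, Callable, Dict


def build_policy(principal_id: str, effect: str, method_arn: str) -> Dict[str, Any]:
    """Build the IAM policy document an API Gateway Lambda authorizer returns."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,  # "Allow" or "Deny"
                    "Resource": method_arn,
                }
            ],
        },
    }


def make_authorizer(validate_token: Callable[[str], str]):
    """Return an authorizer handler.

    validate_token is hypothetical: it stands in for the Globus Auth check,
    returning a user identifier for a valid token and raising ValueError
    for an invalid one.
    """

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # Token authorizers receive the token and the ARN of the called method.
        token = event.get("authorizationToken", "")
        method_arn = event["methodArn"]
        try:
            user = validate_token(token)
        except ValueError:
            return build_policy("anonymous", "Deny", method_arn)
        return build_policy(user, "Allow", method_arn)

    return handler
```

Passing the validator in as a parameter keeps the policy-building logic testable without network access.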

The API endpoints are:
* [POST /submit](aws/submit.py): Submits a dataset to the MDF Connect service. This triggers a Globus Automate flow.
* [GET /status](aws/status.py): Returns the status of a dataset submission
* [POST /submissions](aws/submissions.py): Forms a query and returns a list of submissions

# Globus Automate Flow
The Globus Automate flow is a series of steps that are triggered by the POST
/submit endpoint. The flow is defined using a Python DSL that can be found
in [automate/minimus_mdf_flow.py](automate/minimus_mdf_flow.py). At a high
level the flow:
1. Notifies the admin that a dataset has been submitted
2. Checks whether the data files have been updated or whether this is a metadata-only submission
3. If there are data files to transfer, it starts a Globus transfer
4. Once the transfer is complete, it may trigger a curation step if the organization is configured to do so
5. A DOI is minted if the organization is configured to do so
6. The dataset is indexed in MDF Search
7. The user is notified of the completion of the submission
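
The branching in the steps above can be sketched as a plain Python function. This is an illustration only, not the DSL used in `automate/minimus_mdf_flow.py`. The step names and the three boolean flags are hypothetical, and the sketch assumes that curation depends only on the organization's configuration.

```python
from typing import List


def plan_flow_steps(
    data_updated: bool, curation_required: bool, mint_doi: bool
) -> List[str]:
    """Return the ordered steps the flow would run for one submission."""
    # Steps 1 and 2 always run.
    steps = ["notify_admin", "check_for_data_update"]
    if data_updated:
        # Step 3: skipped for metadata-only submissions.
        steps.append("globus_transfer")
    if curation_required:
        # Step 4: depends on the organization's configuration (assumption:
        # it does not depend on whether a transfer happened).
        steps.append("curation")
    if mint_doi:
        # Step 5: depends on the organization's configuration.
        steps.append("mint_doi")
    # Steps 6 and 7 always run.
    steps += ["index_in_mdf_search", "notify_user"]
    return steps
```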


# Development Workflow
Changes should be made in a feature branch based on the dev branch. Create a
PR and have a colleague review your changes. Once the PR is approved, merge it
into the dev branch. The dev branch is automatically deployed to the dev
environment. Once the changes have been tested in the dev environment, create a
PR from dev to main. Once the PR is approved, merge it into main. The main
branch is automatically deployed to the prod environment.

# Deployment
The MDF Connect service is deployed on AWS into development and production
environments. The automate flow is deployed into the Globus Automate service via
a second GitHub action.

## Deploy the Automate Flow
Changes to the Automate flow are deployed via a GitHub action that is triggered
when a new GitHub release is published. If the release is marked as a
pre-release, it is deployed to the dev environment; otherwise, it is deployed
to the prod environment.

The flow IDs for dev and prod are stored in
[automate/mdf_dev_flow_info.json](automate/mdf_dev_flow_info.json) and
[automate/mdf_prod_flow_info.json](automate/mdf_prod_flow_info.json)
respectively. The flow ID is stored in the `flow_id` key.
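
For illustration, a release's target environment and flow ID could be looked up as sketched below. The file paths and the `flow_id` key come from this README. The function names are hypothetical and the sketch is not the deployment action's actual code.

```python
import json
from pathlib import Path

# Flow info files for each environment, as named in this README.
FLOW_INFO_FILES = {
    "dev": "automate/mdf_dev_flow_info.json",
    "prod": "automate/mdf_prod_flow_info.json",
}


def environment_for_release(prerelease: bool) -> str:
    """Pre-releases target dev; all other releases target prod."""
    return "dev" if prerelease else "prod"


def load_flow_id(repo_root: str, prerelease: bool) -> str:
    """Read the flow ID for the environment that a release would deploy to."""
    env = environment_for_release(prerelease)
    info_path = Path(repo_root) / FLOW_INFO_FILES[env]
    info = json.loads(info_path.read_text(encoding="utf-8"))
    return info["flow_id"]
```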

### Deploy a Dev Release of the Flow
1. Merge your changes into the `dev` branch
2. On the GitHub website, click on the _Release_ link on the repo home page.
3. Click on the _Draft a new release_ button
4. Fill in the tag version as `X.Y.Z-alpha.1` where X.Y.Z is the version number. You can use subsequent alpha tags if you need to make further changes.
5. Fill in the release title and description
6. Select `dev` as the Target branch
7. Check the _Set as a pre-release_ checkbox
8. Click the _Publish release_ button

### Deploy a Prod Release of the Flow
1. Merge your changes into the `main` branch
2. On the GitHub website, click on the _Release_ link on the repo home page.
3. Click on the _Draft a new release_ button
4. Fill in the tag version as `X.Y.Z` where X.Y.Z is the version number.
5. Fill in the release title and description
6. Select `main` as the Target branch
7. Check the _Set as the latest release_ checkbox
8. Click the _Publish release_ button

You can verify deployment of the flows in the
[Globus Automate Console](https://app.globus.org/flows/library).
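
The tag conventions in the two release procedures above can be checked before publishing. The helper below is a hypothetical sanity check, not part of the repository. Note that, per this README, the deployment target is determined by the pre-release checkbox, so the tag format is only a naming convention.

```python
import re

# X.Y.Z for prod releases, X.Y.Z-alpha.N for dev pre-releases.
PROD_TAG = re.compile(r"\d+\.\d+\.\d+")
DEV_TAG = re.compile(r"\d+\.\d+\.\d+-alpha\.\d+")


def classify_tag(tag: str) -> str:
    """Return "prod" or "dev" for a well-formed tag, else raise ValueError."""
    if PROD_TAG.fullmatch(tag):
        return "prod"
    if DEV_TAG.fullmatch(tag):
        return "dev"
    raise ValueError(f"tag {tag!r} does not follow X.Y.Z or X.Y.Z-alpha.N")
```

For example, `classify_tag("1.2.3-alpha.2")` returns `"dev"`, and a tag such as `v1.2` raises `ValueError`.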

## Deploy the MDF Connect Service
The MDF Connect service is deployed via a GitHub action. The action is triggered
by a push to the dev or main branch. The action will deploy the service to the
dev or prod environment respectively.
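
The branch-to-environment rule can be expressed as a small lookup. This is a sketch of the rule described above, not the workflow definition itself. The function name is hypothetical, and it accepts either a bare branch name or a full Git ref such as `refs/heads/main`.

```python
from typing import Optional

# Branches that trigger a deployment, and the environment each one targets.
BRANCH_ENVIRONMENTS = {"dev": "dev", "main": "prod"}


def environment_for_branch(ref: str) -> Optional[str]:
    """Return the target environment for a pushed ref, or None if no deploy."""
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref
    return BRANCH_ENVIRONMENTS.get(branch)
```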

## Updating Schemas
Schemas and the MDF organization database are managed in the automate branch
of the [Data Schemas Repo](https://github.com/materials-data-facility/data-schemas/tree/automate).

The schema is deployed into the Docker images used to serve the Lambda
functions.

# Running Tests
To run the tests, first make sure that you are running Python 3.7.10. Then install the dependencies: