Commit
Merge pull request #123 from AllenNeuralDynamics/main
Merges main back into dev
jtyoung84 authored Jun 28, 2024
2 parents 2d50d5f + 338fe5d commit 2f3cfdd
Showing 22 changed files with 618 additions and 193 deletions.
14 changes: 14 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,14 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.9"

python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - dev
        - docs
192 changes: 1 addition & 191 deletions README.md
@@ -6,194 +6,4 @@

This service can be used to upload data stored on a VAST drive. It uses FastAPI to accept a job submission CSV file, which triggers a data transfer job on an on-prem HPC cluster. Based on the information provided in the file, the upload process fetches the appropriate metadata and starts the upload.

## Metadata Sources

The associated metadata files get pulled from different sources.

- subject from LabTracks
- procedures from NSB SharePoint, TARS
- instrument/rig from SLIMS


## Usage

There are two options for uploading data: a Python API or a browser UI.

### Browser UI
You can go to http://aind-data-transfer-service to submit a `.csv` or `.xlsx` file with the necessary parameters needed to launch a data upload job. Click on **Job Submit Template** to download a template which you may use as a reference.

What each column means in the job submission template:

- **project_name**: Project name. A full list can be downloaded at [Project Names](http://aind-metadata-service/project_names)
- **process_capsule_id**: Optional Code Ocean capsule or pipeline to run when data is uploaded
- **input_data_mount**: Optional data mount when running a custom pipeline
- **platform**: For a list of platforms click [here](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/main/src/aind_data_schema/models/platforms.py).
- **acq_datetime**: The time that the data was acquired
- **subject_id**: The unique id of the subject
- **modality0**: For a list of modalities, click [here](https://github.com/AllenNeuralDynamics/aind-data-schema/blob/main/src/aind_data_schema/models/modalities.py).
- **modality0.source**: The source (path to file) of **modality0** in VAST drive
- **metadata_dir**: An optional folder for pre-compiled metadata json files

Modify the job template as needed and click on **Browse** to upload the file. If the file is valid, a rendered table appears along with the message **Successfully validated jobs from file**. If there are errors in the job submission file, a message that says **Error validating jobs from file** appears instead.

To launch a data upload job, click on `Submit`. A message that says **Successfully submitted jobs** should appear.

After submission, click on `Job Status` to see the status of the data upload job process.
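
If you prefer to build the submission file programmatically instead of editing the template by hand, the sketch below writes a one-row `.csv` with the columns described above. The values reuse the example acquisition from the next section; treat the exact column names and datetime format as assumptions to be checked against the downloaded template.

```python
import csv

# Sketch only: one job row using the columns described above.
# Paths and ids mirror the example later in this README; adjust to your data.
row = {
    "project_name": "Ephys Platform",
    "process_capsule_id": "",
    "input_data_mount": "",
    "platform": "BEHAVIOR",
    "acq_datetime": "2024-02-19 11:25:17",
    "subject_id": "690165",
    "modality0": "BEHAVIOR",
    "modality0.source": "/shared_drive/vr_foraging/690165/20240219T112517/Behavior",
    "metadata_dir": "",
}

with open("job_submission.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
```

The resulting `job_submission.csv` can then be uploaded through **Browse** just like the hand-edited template.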

### Python API
It's also possible to submit a job via a Python API. Here is an example script that can be used.

Assuming that the data on a shared drive is organized as:
```
/shared_drive/vr_foraging/690165/20240219T112517
- Behavior
- Behavior videos
- Configs
```
then a job request can be submitted as:
```python
from pathlib import PurePosixPath
import json
import requests

from aind_data_transfer_models.core import ModalityConfigs, BasicUploadJobConfigs, SubmitJobRequest
from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.platforms import Platform
from datetime import datetime

source_dir = PurePosixPath("/shared_drive/vr_foraging/690165/20240219T112517")

s3_bucket = "private"
subject_id = "690165"
acq_datetime = datetime(2024, 2, 19, 11, 25, 17)
platform = Platform.BEHAVIOR

behavior_config = ModalityConfigs(modality=Modality.BEHAVIOR, source=(source_dir / "Behavior"))
behavior_videos_config = ModalityConfigs(modality=Modality.BEHAVIOR_VIDEOS, source=(source_dir / "Behavior videos"))
metadata_dir = source_dir / "Configs"  # This is an optional folder of pre-compiled metadata json files
project_name = "Ephys Platform"

upload_job_configs = BasicUploadJobConfigs(
    project_name=project_name,
    s3_bucket=s3_bucket,
    platform=platform,
    subject_id=subject_id,
    acq_datetime=acq_datetime,
    modalities=[behavior_config, behavior_videos_config],
    metadata_dir=metadata_dir,
)

# Add more to the list if needed
upload_jobs = [upload_job_configs]

# Optional email address and notification types if desired
user_email = "my_email_address"
email_notification_types = ["fail"]
submit_request = SubmitJobRequest(
    upload_jobs=upload_jobs,
    user_email=user_email,
    email_notification_types=email_notification_types,
)

post_request_content = json.loads(submit_request.model_dump_json(round_trip=True))
submit_job_response = requests.post(url="http://aind-data-transfer-service/api/v1/submit_jobs", json=post_request_content)
print(submit_job_response.status_code)
print(submit_job_response.json())
```
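
The `upload_jobs` list accepts more than one entry, so several acquisitions can be batched into a single request. The sketch below is illustrative only: the second session folder and its acquisition datetime are made-up placeholders, and only the Behavior modality is included for brevity.

```python
from datetime import datetime
from pathlib import PurePosixPath

from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.platforms import Platform
from aind_data_transfer_models.core import (
    BasicUploadJobConfigs,
    ModalityConfigs,
    SubmitJobRequest,
)

# Each tuple is (session folder on the VAST drive, acquisition datetime).
# The second session is a hypothetical placeholder.
sessions = [
    ("/shared_drive/vr_foraging/690165/20240219T112517", datetime(2024, 2, 19, 11, 25, 17)),
    ("/shared_drive/vr_foraging/690165/20240220T101010", datetime(2024, 2, 20, 10, 10, 10)),
]

upload_jobs = []
for folder, acq_dt in sessions:
    source_dir = PurePosixPath(folder)
    upload_jobs.append(
        BasicUploadJobConfigs(
            project_name="Ephys Platform",
            s3_bucket="private",
            platform=Platform.BEHAVIOR,
            subject_id="690165",
            acq_datetime=acq_dt,
            modalities=[
                ModalityConfigs(modality=Modality.BEHAVIOR, source=source_dir / "Behavior")
            ],
        )
    )

submit_request = SubmitJobRequest(upload_jobs=upload_jobs)
# Serialize and POST exactly as in the example above.
```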

## Installation
To use the software, in the root directory, run
```bash
pip install -e .
```

To develop the code, run
```bash
pip install -e .[dev]
```

## Local Development
Run uvicorn:
```bash
export AIND_METADATA_SERVICE_PROJECT_NAMES_URL='http://aind-metadata-service-dev/project_names'
export AIND_AIRFLOW_SERVICE_URL='http://localhost:8080/api/v1/dags/run_list_of_jobs/dagRuns'
export AIND_AIRFLOW_SERVICE_JOBS_URL='http://localhost:8080/api/v1/dags/transform_and_upload/dagRuns'
export AIND_AIRFLOW_SERVICE_PASSWORD='*****'
export AIND_AIRFLOW_SERVICE_USER='user'
uvicorn aind_data_transfer_service.server:app --host 0.0.0.0 --port 5000
```
You can now access `http://localhost:5000`.
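
Once the server is up, a quick smoke test from Python (this sketch assumes the default port 5000 used above; the Python API example can likewise be pointed at the local instance by swapping `http://aind-data-transfer-service` for `http://localhost:5000` in the request URL):

```python
import requests

# Assumes the uvicorn server started above is listening on localhost:5000.
response = requests.get("http://localhost:5000")
print(response.status_code)  # expect 200 if the app is serving the web portal
```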

## Contributing

### Linters and testing

There are several libraries used to run linters, check documentation, and run tests.

- Please test your changes using the **coverage** library, which will run the tests and log a coverage report:

```bash
coverage run -m unittest discover && coverage report
```

- Use **interrogate** to check that modules, methods, etc. have been documented thoroughly:

```bash
interrogate .
```

- Use **flake8** to check that code is up to standards (no unused imports, etc.):
```bash
flake8 .
```

- Use **black** to automatically format the code into PEP standards:
```bash
black .
```

- Use **isort** to automatically sort import statements:
```bash
isort .
```

### Pull requests

For internal members, please create a branch. For external members, please fork the repository and open a pull request from the fork. We'll primarily use [Angular](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit) style for commit messages. Roughly, they should follow the pattern:
```text
<type>(<scope>): <short summary>
```

where scope (optional) describes the packages affected by the code changes and type (mandatory) is one of:

- **build**: Changes that affect build tools or external dependencies (example scopes: pyproject.toml, setup.py)
- **ci**: Changes to our CI configuration files and scripts (examples: .github/workflows/ci.yml)
- **docs**: Documentation only changes
- **feat**: A new feature
- **fix**: A bugfix
- **perf**: A code change that improves performance
- **refactor**: A code change that neither fixes a bug nor adds a feature
- **test**: Adding missing tests or correcting existing tests

### Semantic Release

The table below, from [semantic release](https://github.com/semantic-release/semantic-release), shows which commit message gets you which release type when `semantic-release` runs (using the default configuration):

| Commit message | Release type |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------- |
| `fix(pencil): stop graphite breaking when too much pressure applied` | ~~Patch~~ Fix Release, Default release |
| `feat(pencil): add 'graphiteWidth' option` | ~~Minor~~ Feature Release |
| `perf(pencil): remove graphiteWidth option`<br><br>`BREAKING CHANGE: The graphiteWidth option has been removed.`<br>`The default graphite width of 10mm is always used for performance reasons.` | ~~Major~~ Breaking Release <br /> (Note that the `BREAKING CHANGE: ` token must be in the footer of the commit) |

### Documentation
To generate the rst source files for documentation, run
```bash
sphinx-apidoc -o doc_template/source/ src
```
Then to create the documentation HTML files, run
```bash
sphinx-build -b html doc_template/source/ doc_template/build/html
```
More info on sphinx installation can be found [here](https://www.sphinx-doc.org/en/master/usage/installation.html).
More information can be found at [readthedocs](https://aind-data-transfer-service.readthedocs.io).
File renamed without changes.
Binary file added docs/diagrams/system_container.png
26 changes: 26 additions & 0 deletions docs/diagrams/system_container.puml
@@ -0,0 +1,26 @@
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
' uncomment the following line and comment the first to use locally
' !include C4_Container.puml

' LAYOUT_TOP_DOWN()
' LAYOUT_AS_SKETCH()
LAYOUT_WITH_LEGEND()

title Container diagram for AIND Data Transfer Service

Person(user, "User", "A scientist or engineer that wants to upload data to the cloud.")

System_Boundary(c1, "AIND Data Transfer Service") {
    Container(app, "API Application", "FastAPI, Docker Container", "Validates and submits request to aind-airflow-service. Runs in K8s cluster managed by Central IT.")
}

System_Ext(aind_airflow_service, "AIND Airflow Service", "Receives job requests, does additional validation checks, submits and monitors jobs.")
System_Ext(slurm, "Slurm", "High performance computing cluster that runs data transformation and data upload jobs.")

Rel(user, app, "Uses", "HTTP, REST")

Rel_Back(user, aind_airflow_service, "Sends e-mails to", "SMTP")
Rel(app, aind_airflow_service, "Uses", "REST API")
Rel(aind_airflow_service, slurm, "Uses", "REST API")
@enduml
Binary file added docs/diagrams/system_context.png
19 changes: 19 additions & 0 deletions docs/diagrams/system_context.puml
@@ -0,0 +1,19 @@
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
' uncomment the following line and comment the first to use locally
' !include C4_Context.puml

LAYOUT_WITH_LEGEND()

title System Context diagram for AIND Data Transfer Service

Person(user, "User", "A scientist or engineer that wants to upload data to the cloud.")
System(transfer_service, "AIND Data Transfer Service", "Allows people to send job requests to compress (or transform) and upload raw data assets.")
System_Ext(aind_airflow_service, "AIND Airflow Service", "Receives job requests, does additional validation checks, submits and monitors jobs.")
System_Ext(slurm, "Slurm", "High performance computing cluster that runs data transformation and data upload jobs.")

Rel(user, transfer_service, "Uses", "web portal or REST API")
Rel_Back(user, aind_airflow_service, "Sends e-mails to", "SMTP")
Rel(transfer_service, aind_airflow_service, "Uses", "REST API")
Rel(aind_airflow_service, slurm, "Uses", "REST API")
@enduml
4 changes: 4 additions & 0 deletions docs/examples/example1.csv
@@ -0,0 +1,4 @@
project_name, process_capsule_id, modality0, modality0.source, modality1, modality1.source, s3-bucket, subject-id, platform, acq-datetime
Ephys Platform, , ECEPHYS, dir/data_set_1, ,, some_bucket, 123454, ecephys, 2020-10-10 14:10:10
Behavior Platform, 1f999652-00a0-4c4b-99b5-64c2985ad070, BEHAVIOR_VIDEOS, dir/data_set_2, MRI, dir/data_set_3, open, 123456, BEHAVIOR, 10/13/2020 1:10:10 PM
Behavior Platform, , BEHAVIOR_VIDEOS, dir/data_set_2, BEHAVIOR_VIDEOS, dir/data_set_3, scratch, 123456, BEHAVIOR, 10/13/2020 1:10:10 PM
File renamed without changes.