When registering datasets generated from tissue samples collected from human donors, providers such as HuBMAP Tissue Mapping Centers (TMCs) include clinical information about donors. Clinical information on a donor varies in detail and scope, ranging from a spreadsheet row with a few curated elements to scans of UNOS forms that accompany an organ donation.
In general, clinical information is both unstructured and considered Protected Health Information (PHI) under HIPAA. The University of Pittsburgh, acting as a HIPAA Honest Broker, curates clinical information, creating metadata that are:
- de-identified
- encoded, or associated with codes from standard biomedical vocabularies such as SNOMEDCT and NCI.
Up to now, curation has been manual: the curator encodes information from source files into a data entry spreadsheet, following this process. The spreadsheet becomes the input source for a script that inserts donor clinical metadata into the associated dataset record in the provenance database.
Manual curation is both tedious and prone to error. Curation can be automated.
Clinical metadata of interest is one of three types:
- Numeric (e.g., lab values, measurements such as height)
- Categorical (e.g., Cause of Death, blood type)
- Free text
Each type of metadata can be associated with discrete codes. Codes that describe a particular type of data are collected into valuesets and maintained in an online Valueset Manager spreadsheet.
Categorical metadata may be organized in the Valueset Manager in one of two ways:
- in a dedicated tab in the spreadsheet (e.g., Race)
- as a set of rows in a tab (e.g., ABO blood types in the Blood Type tab)
We have few expectations regarding the form of clinical data from providers beyond the minimum set required to build a DOI for a dataset:
- race
- sex
- age
- whether from an organ donor or a living donor
Clinical data from a provider often contains novel information--e.g., previously undocumented medical history conditions or measurements.
In general, it is necessary to update the Valueset Manager spreadsheet for every set of donors.
The curation solution features:
- A user interface that allows
  - data entry--e.g., selection of categorical values or entry of numeric or text values
  - validation--e.g., data type and range
  - multiple entries for patient medical history
- Encoding of each metadata element to appropriate valuesets
- Form content driven by the valueset spreadsheet to allow rapid changes in valuesets
- Ability to update a neo4j provenance database with structured clinical metadata for a donor
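To illustrate what "form content driven by the valueset spreadsheet" could look like, here is a minimal sketch that groups valueset rows by tab into (value, label) pairs. The row layout (`tab`, `concept_id`, `preferred_term`), the function name `build_choices`, and the concept codes are all assumptions made for illustration; they are not the actual sheet layout or real vocabulary codes.

```python
from collections import defaultdict

# Hypothetical rows, as they might be read from the Valueset Manager spreadsheet.
# Column names and concept codes are placeholders, not the real sheet layout.
VALUESET_ROWS = [
    {"tab": "Race", "concept_id": "PLACEHOLDER:R1", "preferred_term": "Unknown"},
    {"tab": "Blood Type", "concept_id": "PLACEHOLDER:B1", "preferred_term": "Blood type A"},
    {"tab": "Blood Type", "concept_id": "PLACEHOLDER:B2", "preferred_term": "Blood type B"},
]


def build_choices(rows):
    """Group valueset rows by tab into lists of (value, label) pairs."""
    choices = defaultdict(list)
    for row in rows:
        choices[row["tab"]].append((row["concept_id"], row["preferred_term"]))
    return dict(choices)


choices = build_choices(VALUESET_ROWS)
for tab, pairs in choices.items():
    print(tab, pairs)
```

Lists of (value, label) pairs are the shape that a WTForms `SelectField` accepts for its `choices`, so adding a row to the spreadsheet would change the form without a code change.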
The curator is a Python Web application involving:
tool | purpose |
---|---|
Python | application function |
Flask | Python web framework |
Flask Blueprints | modular Flask applications |
WTForms | forms in Flask applications |
Jinja | Web page templating |
JavaScript | Event handling and UI features (including a spinner control) |
Bootstrap | UI toolkit |
HuBMAP entity-api | Reads/updates donor metadata in HuBMAP provenance |
SenNet entity-api | Reads/updates source metadata in SenNet provenance |
The application uses app.cfg to obtain:
- consortium options (HuBMAP or SenNet)
- environment option (dev or production)
- entity-api environment (e.g., development or production)
- URI for the Valueset Manager Google sheet
- Globus client keys and secrets for HuBMAP and SenNet
The application expects to find the app.cfg file in a folder on the local machine named donor-metadata.
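A quick pre-flight check of the configuration might look like the sketch below. The key names are the ones this document mentions; the helper names `config_path` and `missing_keys`, the example values, and the idea of passing the parent folder as a parameter are assumptions for illustration, not the behavior of the appconfig helper class.

```python
from pathlib import Path

# Keys named in this document; the real app.cfg may define others.
REQUIRED_KEYS = (
    "ENDPOINT_BASE",
    "GLOBUS_HUBMAP_CLIENT",
    "GLOBUS_HUBMAP_SECRET",
    "GLOBUS_SENNET_CLIENT",
    "GLOBUS_SENNET_SECRET",
)


def config_path(base_dir):
    """Return the expected location of app.cfg under a donor-metadata folder."""
    return Path(base_dir) / "donor-metadata" / "app.cfg"


def missing_keys(config):
    """List required keys that are absent or empty in a config mapping."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical, partially filled configuration.
example = {
    "ENDPOINT_BASE": "https://entity-api.example.org",
    "GLOBUS_HUBMAP_CLIENT": "client-id",
}
print(config_path("/home/curator"))
print(missing_keys(example))
```

Reporting every missing key at once, rather than failing on the first, makes it easier to complete a configuration copied from app.cfg.example.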
The application works with three databases:
- the Valueset Manager, a Google Sheets document
- the neo4j provenance databases for the two consortia, abstracted by consortium-specific instances of entity-api.
The application:
- uses the valuesetmanager helper class to read from the Valueset Manager spreadsheet
- manages the Flask session
- registers Flask Blueprints
- customizes HTTP error handling, routing to 401.html
- The Home HTML page index.html includes a form that allows the user to specify
  - consortium (HuBMAP or SenNet)
  - Donor ID
- The WTForm globusform.py populates index.html with information from the app.cfg file.
- The Blueprint route globus.py:
  - authenticates the user in the appropriate Globus context
  - works with the donor helper class to verify that the donor is in provenance
  - redirects either to the Edit page or the custom 401 page
- The Edit page edit.html includes a form that allows metadata data entry.
- The WTForm editform.py:
  - works with the valuesetmanager helper class to obtain valueset content
  - populates the form in edit.html, including content of categorical lists
- The Blueprint route edit.py:
  - works with the donor helper class to obtain current donor metadata from provenance
  - populates the form in edit.html with current metadata values
  - translates form data into a revised metadata JSON that conforms to the donor metadata schema in provenance
  - sets defaults for required metadata--e.g., if no race is specified, sets the race to Unknown
  - converts linear and weight measurements to metric units
  - compares current and revised metadata JSON for the donor
  - posts JSONs for current metadata, revised metadata, and comparison to review.html
- The Review page review.html displays:
  - the current metadata JSON for the donor
  - the new metadata JSON for the donor
  - the comparison of the current and new metadata JSONs
- The Blueprint route review.py:
  - works with the donor helper class to update the donor metadata in provenance
  - redirects to index.html
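The steps attributed to edit.py (defaults, metric conversion, comparison) can be sketched roughly as follows. This is not the application's code: the flat dictionary layout, the field names, the `lb` unit, the rounding to one decimal, and the function names are assumptions for illustration, and the real donor metadata schema in provenance is more elaborate.

```python
IN_TO_CM = 2.54
LB_TO_KG = 0.45359237


def to_metric(value, unit):
    """Convert inches or pounds to metric; pass metric values through."""
    if unit == "in":
        return round(value * IN_TO_CM, 1), "cm"
    if unit == "lb":
        return round(value * LB_TO_KG, 1), "kg"
    if unit in ("cm", "kg"):
        return value, unit
    raise ValueError(f"unexpected unit: {unit!r}")


def revise(form_data):
    """Build a revised metadata dict from (hypothetical) form data."""
    revised = dict(form_data)
    # Default for required metadata, as described above for race.
    if not revised.get("race"):
        revised["race"] = "Unknown"
    if "height" in revised:
        revised["height"], revised["height_unit"] = to_metric(
            revised["height"], revised.get("height_unit")
        )
    return revised


def compare(current, revised):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = sorted(set(current) | set(revised))
    return {
        key: (current.get(key), revised.get(key))
        for key in keys
        if current.get(key) != revised.get(key)
    }


current = {"race": "Unknown", "height": 170.0, "height_unit": "cm"}
form = {"race": "", "height": 70, "height_unit": "in"}
revised = revise(form)
print(revised)
print(compare(current, revised))
```

In this sketch, an empty comparison result means there is nothing to write back to provenance.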
The 401.html page is a custom 401 error that explains potential causes and solutions for authentication errors.
The 404.html page is a custom 404 error. The 404 error in this case is "donor not found", not "file not found".
All HTML files in the application inherit from base.html, which includes:
- a navbar
- a message panel that displays Flask flash messages
- a spinner control to animate waiting in the search form
This file contains a custom Jinja script used to populate content from WTForms forms in an HTML page.
name | role | uses |
---|---|---|
appconfig | reads from the app.cfg | |
valuesetmanager | reads from the Valueset Manager spreadsheet | |
entity | reads from and writes to a provenance database | entity-api |
donor | represents donor metadata | entity |
DonorUI | encapsulates the Flask app | app |
- The application can only update metadata for an existing donor; it does not create donor entities in provenance.
- The application will not update metadata for a donor that is associated with published datasets.
- The application can document a maximum of 10 medical conditions. The application cannot update metadata for a donor if the current metadata includes more than 10 conditions.
- The application will only update metadata if there was a change.
- In SenNet, the application will only update donors that are human sources.
- The application attempts to standardize on units (e.g., in and cm for height). If an existing unit is unexpected, the application will require manual intervention.
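Several of these limitations amount to checks made before an update. The sketch below shows one way such checks could be expressed. The function name `update_blockers`, the `medical_history` key, and the `has_published_datasets` flag are assumptions for illustration, not the application's actual interface.

```python
MAX_CONDITIONS = 10


def update_blockers(current, revised, has_published_datasets):
    """Return reasons why an update should not proceed (empty list if none)."""
    reasons = []
    if has_published_datasets:
        reasons.append("donor is associated with published datasets")
    if len(current.get("medical_history", [])) > MAX_CONDITIONS:
        reasons.append("current metadata has more than 10 conditions")
    if current == revised:
        reasons.append("no change in metadata")
    return reasons


# Hypothetical metadata dictionaries.
current = {"race": "Unknown", "medical_history": ["condition"] * 11}
revised = {"race": "Unknown", "medical_history": ["condition"] * 11}
print(update_blockers(current, revised, has_published_datasets=False))
```

Returning a list of reasons instead of a single boolean lets the user interface show every blocking condition at once.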
The entity-api requires an authentication token, which is obtained from Globus. An authentication token for a consortium's entity-api is set via the consortium's Single Sign On.
The application caches the authentication token in a session cookie. Caching the token facilitates the curation of multiple donors in a session: it is not necessary to supply the token for each donor.
Although authentication tokens have a long expiration period (72 hours), the application clears the cached token from the session cookie after 5 minutes. It is also possible to clear the session cookie explicitly using a button in the form in the home page.
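How the application enforces the 5-minute limit is not described here. As a minimal sketch of the idea, the functions below cache a token with a timestamp in a dict-like session and discard it once it is older than five minutes. The session keys and function names are assumptions for illustration; a Flask application could also rely on the `PERMANENT_SESSION_LIFETIME` setting.

```python
import time

TOKEN_TTL_SECONDS = 5 * 60


def cache_token(session, token, now=None):
    """Store the token and the time at which it was cached."""
    session["token"] = token
    session["token_cached_at"] = time.time() if now is None else now


def clear_token(session):
    """Remove the cached token, as the home page button is described to do."""
    session.pop("token", None)
    session.pop("token_cached_at", None)


def get_cached_token(session, now=None):
    """Return the cached token, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    cached_at = session.get("token_cached_at")
    if cached_at is None or now - cached_at > TOKEN_TTL_SECONDS:
        clear_token(session)
        return None
    return session.get("token")


session = {}  # stands in for the Flask session, which is dict-like
cache_token(session, "example-token", now=0)
print(get_cached_token(session, now=299))
print(get_cached_token(session, now=301))
```

The optional `now` parameter exists only to make the expiry logic testable without waiting.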
The application will raise an HTTP 401 exception when:
- The authentication token expires at the consortium level.
- The authentication token is for the incorrect consortium--e.g., if the user provides a HuBMAP token for an update to a donor in SenNet.
Donors already containing metadata may diverge from the current schema. This is especially the case for donors in HuBMAP who were registered prior to the implementation of valuesets.
Examples of divergence include:
- Some early donors have a race of "Hispanic". The current practice is to code "Hispanic" as an ethnicity.
- Grouping concepts for some categorical metadata may have changed.
Donors with metadata that does not comply with the current schema may cause issues in the Edit form. For example, if the value for a categorical metadata element is not in the current valueset associated with the metadata element, the Edit form will raise a validation error.
Units are currently not encoded in metadata, but stored as free text. This has resulted in variance in units--i.e., different spellings or case. Because the Edit form emulates encoding of metadata using a list (e.g., only "in" and "cm" for height), there will be validation errors for measurements with variant units (e.g., "inches").
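To find donors likely to trigger these validation errors before opening them in the Edit form, a check along the following lines could be used. The metadata layout, the `_unit` suffix convention, the function name, and the example valueset content are assumptions for illustration only.

```python
def find_divergences(metadata, valuesets, allowed_units):
    """List values and units in metadata that fall outside the current lists."""
    issues = []
    for field, allowed in valuesets.items():
        value = metadata.get(field)
        if value is not None and value not in allowed:
            issues.append(f"{field}: {value!r} is not in the current valueset")
    for field, units in allowed_units.items():
        unit = metadata.get(f"{field}_unit")
        if unit is not None and unit not in units:
            issues.append(f"{field}: unit {unit!r} is not one of {sorted(units)}")
    return issues


# Hypothetical early donor record and current lists.
metadata = {"race": "Hispanic", "height": 70, "height_unit": "inches"}
valuesets = {"race": {"Unknown", "Asian"}}
allowed_units = {"height": {"in", "cm"}}

for issue in find_divergences(metadata, valuesets, allowed_units):
    print(issue)
```

A report like this only flags divergent records; deciding how to recode them (for example, moving "Hispanic" to ethnicity) remains a curation decision.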
The application can be Dockerized.
To run the application in a local container:
- Clone this repo.
- Copy app.cfg.example from the app/instance directory to a file named app.cfg in a directory named donor-metadata on the local machine.
- In app.cfg, edit the value of the ENDPOINT_BASE key to point to the desired instance of entity-api.
- Specify values for
  - GLOBUS_HUBMAP_CLIENT
  - GLOBUS_HUBMAP_SECRET
  - GLOBUS_SENNET_CLIENT
  - GLOBUS_SENNET_SECRET
- Install Docker on the local machine.
- Execute build_local.sh to create a Docker image named hmsn/donor-data-local.
- Execute run_local.sh to create a Docker container named donor-data.
- Execute compose-run.sh.
- Execute run_hub.sh. This script uses the latest release of the Docker image at hubmap/donor-metadata.
The containerized application is mapped to the URL http://127.0.0.1:5000 on the local machine. The run_local.sh script opens the default browser to the URL.