Note: this workflow only works on Unix systems (macOS and Linux); Windows is not supported.
The scripts in the `wa-doh-linelists` directory of the backoffice repo create linelists of ID3C and REDCap data for submission to the Washington Department of Health (WA DoH). See the specification for the linelist data format.
The `generate` script is the high-level script that exports ID3C data, merges it with REDCap report data, and performs transformations all in one. It requires a test result date as an argument in `YYYY-MM-DD` format. It also requires the REDCap API token environment variables and a connection to an ID3C database (see below).
There are multiple ways to configure your environment, but here we'll walk through one specific way that the dev team uses. Once you've configured your environment properly, you can generate linelists with one command.
The filepaths in the following examples may need to be modified according to the location of the `backoffice` and `env.d` directories on your local system.
Copy and paste the following example, using `2021-01-01` as the target results date. Note: don't forget to install the latest Python development environment (see below) before each run, otherwise you may run into unexpected bugs!
```sh
cd ~/backoffice/wa-doh-linelists

PGSERVICE=seattleflu-production \
envdir ../id3c-production/env.d/redcap \
pipenv run ../bin/wa-doh-linelists/generate --date 2021-01-01 \
    --output-dir /tmp
```
In this example, we run the command from the `backoffice/wa-doh-linelists` directory, which uses a different virtual environment than `backoffice/id3c-production`. The output linelist will be stored locally at `/tmp/linelist_2021-01-01.csv`. You can also specify a remote directory with the `--output-dir` argument using the S3 or SSH/SFTP protocols, e.g. `--output-dir s3://bucketname/dirname` or `--output-dir ssh://remote-hostname/dirname`. In the case of SSH, the scripts use the `DOH_USERNAME` and `DOH_PRIVKEY_PATH` environment variables to determine the SSH username and the path to the private key to use.
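For example, a minimal sketch of an SSH upload, where the username, key path, and remote host below are placeholders for your real DoH SFTP details:

```sh
# DOH_USERNAME, DOH_PRIVKEY_PATH, and the ssh:// URL are placeholder values.
DOH_USERNAME=doh-sftp-user \
DOH_PRIVKEY_PATH=~/.ssh/doh_id_rsa \
PGSERVICE=seattleflu-production \
envdir ../id3c-production/env.d/redcap \
pipenv run ../bin/wa-doh-linelists/generate --date 2021-01-01 \
    --output-dir ssh://remote-hostname/dirname
```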
Note: the `--output-dir` argument was historically optional but is now required. In normal (i.e., via cron) operation, this should be specified in the crontab.
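For reference, a purely illustrative crontab entry; the schedule, home directory, and bucket name here are made up, and the real values live in the production crontab:

```sh
# Illustrative only: run daily at 08:00 for today's date.
# In crontab syntax, % must be escaped as \%.
0 8 * * * cd /home/ubuntu/backoffice/wa-doh-linelists && PGSERVICE=seattleflu-production envdir ../id3c-production/env.d/redcap pipenv run ../bin/wa-doh-linelists/generate --date "$(date +\%Y-\%m-\%d)" --output-dir s3://bucketname/dirname
```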
The `generate` script invoked above is composed of two separate scripts. Calling these individual scripts is useful if you want lower-level control over the input and output of the generated linelists.
- One of them exports linelist data from ID3C given a date (and requires a connection to an ID3C database):

  ```sh
  cd ~/backoffice

  PGSERVICE=seattleflu-production \
  ./bin/wa-doh-linelists/export-id3c-hcov19-results 2021-01-01 \
      > ./bin/wa-doh-linelists/data/id3c-export-2021-01-01.csv
  ```
- The other script, `transform`, merges ID3C linelist data and REDCap report data, transforms and standardizes them into the format WA DoH expects, and uploads them to the output directory. This script requires three arguments:
  - `--date`, which specifies the testing date in `YYYY-MM-DD` format.
  - `--id3c-data`, the ID3C export, which, if you choose to run this script instead of `generate`, you'll need to produce with the `export-id3c-hcov19-results` script explained above.
  - `--output-dir`, which saves the transformed data to a file under the local or remote directory specified, named after the testing date. This may be specified multiple times, and the data will be uploaded to all of them.
  ```sh
  cd ~/backoffice/wa-doh-linelists

  envdir ../id3c-production/env.d/redcap \
  pipenv run ../bin/wa-doh-linelists/transform \
      --date 2021-01-01 \
      --id3c-data ../bin/wa-doh-linelists/data/id3c-export-2021-01-01.csv \
      --output-dir /tmp \
      --output-dir s3://bucketname/dirname \
      --output-dir ssh://sft-testing.wa.gov/dirname \
      --project-id 12345 \
      --upload-if-empty s3 \
      --upload-if-empty /tmp
  ```
This script may also take two optional arguments:
- `--project-id` filters the linelist export to only produce linelists for the single, given project ID.
- `--upload-if-empty` bypasses the check for an empty output file when using the given protocol or URL, and will upload the file even if it's empty. In other words, if DoH doesn't want us to send them empty files via SSH, but they do want to be able to see that one was generated in the S3 bucket, specifying `--upload-if-empty s3` will accomplish that.
See more information about the `transform` script with `./bin/wa-doh-linelists/transform --help`.
You only have to complete these steps once.
First, you'll need a username and password for the ID3C production database. If you don't have or know your credentials, please contact @dev-team in the #informatics channel on Slack.
We have to point our `generate` script to a database. Here, we'll walk through how to configure a PostgreSQL service definition.
First, open up a new terminal and type the following:

```sh
nano ~/.pgpass
```
Paste the following into this file, substituting `my-user-name` and `my-password` with your ID3C database credentials. Keep the colons (`:`) separating the fields.

```
production.db.seattleflu.org:5432:*:my-user-name:my-password
```
Then, press `CTRL+X` to exit. Remember to save your work! (Press `Y`, then hit `Enter` to confirm the original save location.)
IMPORTANT: Once you've exited the file, restrict its permissions with:

```sh
chmod 600 ~/.pgpass
```
Next, type the following:

```sh
nano ~/.pg_service.conf
```

Inside this file, paste the following configuration, substituting `my-user-name` with your ID3C database username.

```
[seattleflu-production]
host=production.db.seattleflu.org
user=my-user-name
dbname=production
port=5432
```
Exit nano and save your work. Now you're ready to point a command at the ID3C production database by calling it with your newly configured database service definition:

```sh
PGSERVICE=seattleflu-production example-command
```
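To verify the connection works end to end, one quick check (assuming the `psql` client is installed locally) is:

```sh
# Should print the server's PostgreSQL version if ~/.pgpass and
# ~/.pg_service.conf are set up correctly.
PGSERVICE=seattleflu-production psql -c 'select version();'
```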
You only have to complete these steps once.
- `envdir`

  Check if `envdir` is installed with `which envdir`. If you see no output, install it with the following:

  ```sh
  pip install envdir
  ```
- `pipenv`

  Check if `pipenv` is installed with `which pipenv`. If you see no output, install it with:

  ```sh
  pip install pipenv
  ```
- Python 3.6

  You'll need Python 3.6 installed. If you don't have it, one way to install it is with pyenv. We recommend first trying to complete the step on installing your Python development environment, and coming back to this step only if you get errors about not having Python 3.6 installed. If you need to install Python 3.6 and you're on macOS, run:

  ```sh
  brew install pyenv
  ```

  Make sure to follow pyenv's instructions to add its initialization to your shell initialization file. You can do this by running:

  ```sh
  echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
  ```

  You'll then need to start a new terminal session by opening a new tab or window. Then run:

  ```sh
  pyenv install 3.6.9
  ```

  If you get errors here, you may also need to run:

  ```sh
  xcode-select --install
  ```

  (A quick verification sketch follows this list.)
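After installing Python 3.6 with pyenv, one quick way to confirm the interpreter is visible (assuming the shell initialization above has been loaded) is:

```sh
pyenv versions       # 3.6.9 should appear in the list
pyenv shell 3.6.9    # select it for the current session
python --version     # should print: Python 3.6.9
```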
You only have to complete this step once.
If you haven't done so already, clone the `backoffice` git repository to your local machine. In this example, we'll put it in our home directory.

```sh
cd ~/
git clone https://github.com/seattleflu/backoffice.git
```
You should run this step every time.
Once you've installed `pipenv` and Python 3.6, install the Python development environment by running the following commands:

```sh
cd ~/backoffice
git pull --rebase
```

If you see the message "Already up to date.", you're done with this step! Otherwise, run the following, too:

```sh
PIPENV_PIPFILE=./id3c-production/Pipfile pipenv sync
```

This may take a few minutes.
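As an optional sanity check, you can confirm the virtualenv resolves (the exact patch version printed may differ):

```sh
# Should print a Python 3.6.x version from the id3c-production virtualenv.
PIPENV_PIPFILE=./id3c-production/Pipfile pipenv run python --version
```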
You need to do these steps once per REDCap project.
REDCap requires a separate environment variable for every project whose data we wish to export programmatically.
```sh
cd ~/backoffice/id3c-production/env.d/
ls
```
There, you'll see several folders. We'll be targeting two of them for creating linelists. You'll need to copy your REDCap API token from each project and create a new file containing only the token. We create file names using this pattern: `REDCAP_API_TOKEN_redcap.iths.org_{PROJECT_ID}`.
Here's an example of creating a new environment variable for the SCAN IRB English project (project ID 22461) with the text editor `nano`:

```sh
nano redcap/REDCAP_API_TOKEN_redcap.iths.org_22461
```

Note that you are literally typing the string `REDCAP_API_TOKEN` here. Once you're inside the file with `nano`, paste your actual token copied from REDCap.
Then, press `CTRL+X` to exit. Remember to save your work! (Press `Y`, then hit `Enter` to confirm the original save location.)
Repeat this process for every SCAN REDCap project you're targeting in the linelist configuration file (at `~/backoffice/etc/wa-doh-linelists.yaml`). Then, when you're done with the SCAN projects, repeat the process above for all the SFS projects.
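To double-check which project tokens you've created so far, you can list files matching the naming pattern described above:

```sh
ls ~/backoffice/id3c-production/env.d/redcap/REDCAP_API_TOKEN_redcap.iths.org_*
```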
Now, when you're calling a command that requires REDCap API tokens as environment variables, you can provide them by prefixing the script with the `envdir` call (but AFTER any other variable declarations like `PGSERVICE`), e.g.:

```sh
envdir ./id3c-production/env.d/redcap \
example-command
```
The linelist generation code will fail if duplicate collection barcodes exist in any one REDCap project. If duplicates exist, you'll see an error message like this:

```
Exception: Duplicate barcodes detected in project 11111: 'Seattle Flu Study - Example Project'
  record_id collection_barcode
0       123           aaaaaaaa
1       234           aaaaaaaa
```
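Before editing anything, it can help to eyeball the duplicates yourself. A minimal sketch, assuming you've exported the project's records to a CSV (`redcap-export.csv` is a hypothetical filename) and that the collection barcode is the second column:

```sh
# Print any barcode value that appears more than once in column 2.
cut -d, -f2 redcap-export.csv | sort | uniq -d
```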
If the duplicate collection barcode problem can't be easily fixed, it's possible to modify our linelist configuration file so that we can continue to generate linelists while dropping the problematic REDCap project.
To drop a REDCap project from the linelist generation, open up `./etc/wa-doh-linelists.yml` in a text editor (such as `nano`, described in the setup steps).
Sections of the configuration file are separated by three hyphens (`---`). Find the section pertaining to the problematic REDCap project. For example, let's pretend the linelist generation gave an error message complaining about:

```
project 19212: Seattle Flu Study - Asymptomatic Enrollments
```

Find the section in `./etc/wa-doh-linelists.yml` that starts with:

```yaml
---
name: Seattle Flu Study - Asymptomatic Enrollments
project_id: 19212
```
Delete everything from that section separator (`---`) down to the next separator (or to the end of the file if no other section follows). Save your edits. Now, when you run the linelist generation code, the SFS Asymptomatic Enrollments project will be excluded.
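Before re-running the generation, it can help to confirm that you removed exactly one section:

```sh
# Review what you deleted from the config file.
git diff ./etc/wa-doh-linelists.yml
```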
Once the duplicate collection barcode problem is fixed, you can refresh the `./etc/wa-doh-linelists.yml` config file back to its original contents by running:

```sh
git restore ./etc/wa-doh-linelists.yml
```
In some circumstances, records that should be included in the linelist may be dropped. If this happens, these records will need to be added to a new catchup linelist that can be sent to DoH. To aid in the generation of these linelists (especially when missing records span multiple days), there is a Jupyter notebook attached to this linelist Trello card. To run this notebook and generate the linelist with missing records, you will need five things:

1. Return of Results data with Sample Barcode
2. Presence/Absence Results with Sample Barcode
3. The EH&S Linelist Report
4. All records from `shipping.linelist_data_for_wa_doh_v1`
5. All linelists from `<S3 Path>` and its subdirectories
To obtain (1.), run:

```sql
select
    qrcode,
    RIGHT(identifier::text, 8) as sample_barcode,
    status_code,
    collect_ts,
    result_ts,
    swab_type,
    staff_observed,
    pre_analytical_specimen_collection
from
    shipping.return_results_v3
    left join warehouse.identifier ident on ident.barcode = qrcode
    left join warehouse.sample samp on samp.collection_identifier = ident.uuid::text
```
To obtain (2.), run:

```sql
select
    barcode,
    sample_id,
    hcov19_presence_absence_result_v1.result_ts,
    hcov19_presence_absence_result_v1.hcov19_result_release_date
from
    shipping.hcov19_presence_absence_result_v1
    join warehouse.sample using (sample_id)
    join warehouse.identifier on uuid::text = collection_identifier
```
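Both queries can be saved as CSVs with the same `psql` pattern used for (4.) below. A sketch, with a placeholder standing in for the real query:

```sh
# Replace "select 1" with query (1.) or (2.) above, without a trailing semicolon.
PGSERVICE=seattleflu-production \
psql --quiet --set ON_ERROR_STOP=1 <<<"
    copy (
        select 1
    ) to stdout with (format csv, header);
" > query-results.csv
```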
To obtain (3.), export the EH&S Linelist Report. Then, to obtain (4.), you can run the following command, redirecting the output to a file of your choosing (the output filename below is just an illustration):

```sh
PGSERVICE=seattleflu-production \
psql --quiet --no-align --tuples-only --set ON_ERROR_STOP=1 <<<"
    copy (select * from shipping.linelist_data_for_wa_doh_v1) to stdout with (format csv, header);
" > linelist_data_for_wa_doh.csv
```
Finally, to obtain (5.), linelist records can be copied from the `public-health-reporting` directory of the S3 BBI Shipping bucket. You'll want to recursively copy all the files from that directory.
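For example, a sketch using the AWS CLI (the bucket name here is a placeholder; substitute the real BBI Shipping bucket):

```sh
# Recursively copy every linelist file from the reporting directory.
aws s3 cp s3://bbi-shipping-bucket/public-health-reporting/ ./linelists/ --recursive
```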
Once you have obtained the necessary data, alter the filepaths in the second cell of the notebook. Also update the output path to `<date>_append.csv`. You will also want to alter the date range in cell 7 to exclude EH&S records that were created today, since they will not be in a linelist yet. You can then run all the cells and verify that the linelist output at the end of the process is as expected. As written, the notebook will output all records that are present in the EH&S project but not in any of your downloaded linelist files.