Tips for moving forward when things break.
- ETL processes
- Metabase
- SCAN RoR PDF generation
- Presence Absence Results
- Barcode collection sets
- SFS Switchboard
- Software Stack
- Linelists
- Husky Musher
When encountering an unknown barcode that looks like a typo or a close match to a real barcode, use the unknown barcode Metabase query to try to find a possible match. If the barcode is from the FHIR ETL, identify the underlying REDCap record this barcode belongs to, and add a new card to #record-troubleshooting describing the problem. If the barcode is from a different ETL, Slack #lab to prompt them to update the specimen manifest with the correct barcode (whatever it may be).
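If you'd rather check a candidate barcode directly in the database, a minimal sketch (using the warehouse.identifier columns referenced elsewhere in this document; the second barcode is a hypothetical close match) is:
-- Sketch: look up the unknown barcode alongside a suspected close match.
select barcode, identifier_set_id, uuid
from warehouse.identifier
where barcode in ('aaaaaaaa', 'aaaaaaab');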
Aborting with error: Specimen with unexpected «collections-clia-compliance» barcode «aaaaaaaa»
This is the wrong type of barcode, so delete this record and create a new Trello card in #record-troubleshooting. Be sure to tag Peter and Brian (of the lab) on the new card.
No sample with identifier «aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa» found
or
No sample found with identifier «aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa» or collection identifier «None»
This means that specimen information sent from the LIMS (Lab Inventory Management System) to ID3C is out of date. This is a time-sensitive problem that will impact the return of testing results to participants.
ID3C needs to be resynced from the LIMS. Contact the lab staff via Slack in the #bat-lab-pipeline-alerts channel to start a resync. If no one responds in that channel, you can also @ someone in the #informatics or #lab channel.
This error can also be caused by a duplicated collection barcode for a sample which is noted on the specimen manifest sheet.
One solution here is to manually create samples with just the sample identifiers from the lab's aliquot manifest. Once the collection barcode duplication issue is resolved, the manifest ETL will pick it up and update the newly created samples.
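For example, a hedged sketch of that manual creation, assuming warehouse.sample accepts a row with only the sample identifier populated (check the table's constraints and defaults before running this for real):
-- Sketch: create a bare sample record carrying only the sample identifier from the aliquot manifest.
insert into warehouse.sample (identifier)
values ('bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb');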
Aborting with error: Identifier found in set «samples-haarvi», not «samples»
This error means that a sample from a separate study arm that we're not supposed to be ingesting was not properly marked as experimental (_exp), so it ended up in our pipeline.
Contact the appropriate data-transfer channel in Slack. Copy-paste the error message along with the presence_absence_id/group number into the message. Find the presence_absence_id/group number by looking through /var/log/syslog on the backoffice server for
Rolling back to savepoint presence_absence group {the group number}
We can ask NWGC to re-send the same JSON bundle but with _exp designations on the affected samples.
To manually skip the bundle in receiving.presence_absence:
--NOTE: this is NOT the correct procedure for SampleNotFoundErrors.
update receiving.presence_absence
set processing_log = '[
{
"status": "manually skipped",
"revision": {the current revision number of the presence-absence ETL}
}
]'
where presence_absence_id = {the group number}
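Before running that update, it's worth confirming you have the right bundle. A small sketch using the same columns:
-- Sketch: inspect the bundle's current processing_log before marking it skipped.
select presence_absence_id, processing_log
from receiving.presence_absence
where presence_absence_id = {the group number};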
AssertionError: Collection identifier found in set «samples», not {'collections-environmental', 'collections-kiosks', 'http://collections-seattleflu.org', 'collections-self-test', 'collections-household-intervention', 'collections-swab&send-asymptomatic', 'collections-kiosks-asymptomatic', 'collections-swab&send', 'collections-household-observation', 'http://collections-fluathome.org'}
In this case, we need to ask Peter or someone in #lab to update the specimen manifest with new collection IDs. We may need to generate new ones for them. See example Slack threads (1, 2) of how this problem has been resolved previously.
Aborting with error: Collection identifier found in set «collections-haarvi», not {'collections-household-observation', 'http://collections-seattleflu.org', 'http://collections-fluathome.org', 'collections-household-intervention', 'collections-self-test', 'samples-haarvi', 'collections-environmental', 'collections-kiosks', 'collections-swab&send', 'collections-scan', 'collections-kiosks-asymptomatic', 'collections-swab&send-asymptomatic'}
Assuming we're supposed to actually ingest barcodes from this collection, we need to add the missing collection (e.g. collections-haarvi) to the manifest ETL. Sometimes we receive barcodes from collections we're not supposed to be ingesting (e.g. collections-clia-compliance). In that case, make a card for the incorrect barcode in #record-troubleshooting, and delete the affected rows from receiving.manifest (see the sketch below).
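A hedged sketch of that cleanup; whether document ->> 'collection' holds the raw barcode or a longer identifier depends on the manifest format, so run the select first and only switch it to a delete once the output looks right:
-- Sketch: find (then delete) the manifest rows referencing the incorrect collection barcode.
select *
from receiving.manifest
where document ->> 'collection' ilike 'aaaaaaaa%';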
Aborting with error: Sample identifier found in set «samples-haarvi», not {'samples'}
This means we've received a sample from a separate study arm that we're not supposed to be ingesting. Ask someone in the #lab Slack channel to update these sample identifiers to have a prefix of _exp so they won't get ingested in the next manifest upload. The original affected records should be deleted from receiving.manifest.
Aborting with error: More than one sample matching sample and/or collection barcodes: [Record(id=137871, identifier=None, collection_identifier='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', encounter_id=1754975), Record(id=138045, identifier='bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb', collection_identifier=None, encounter_id=None)]
This error can arise due to incomplete records within the manifest uploaded by the lab team. This usually affects a large batch of samples because the lab team completes one plate of samples at a time in the manifest. Search the receiving table to find affected records with:
with error_records as (
    select
        document ->> 'sample' as sample,
        document ->> 'collection' as collection,
        processing_log
    from receiving.manifest
    where received::date between '<date-before-error>'::date and '<date-of-error>'::date
      and document ->> 'collection' is null
),
complete_records as (
    select
        document ->> 'sample' as sample,
        document ->> 'collection' as collection
    from receiving.manifest
    where received::date = '<date-of-error>'::date
      and document ->> 'collection' is not null
)
select *
from warehouse.sample
where sample_id in (
    select distinct(cast(b.plog ->> 'sample_id' as int)) as error_sample_id
    from (
        select jsonb_array_elements(a.processing_log) as plog
        from (
            select err.processing_log
            from error_records err
            join complete_records comp using (sample)
        ) as a
    ) as b
    where b.plog @> '{"etl":"manifest"}'
    order by cast(b.plog ->> 'sample_id' as int)
);
Look at the output of that final select *. If that looks correct, change the select * to a delete to remove those records from the warehouse.sample table.
Aborting with error: More than one sample matching sample and/or collection barcodes: [Record(id=118997, identifier='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', collection_identifier='bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb', encounter_id=416344), Record(id=120434, identifier=None, collection_identifier='cccccccc-cccc-cccc-cccc-cccccccccccc', encounter_id=416344)]
This is one of the nefarious problems caused by duplicate barcodes. This situation arises when there are two samples associated with an encounter. Of the two samples, you should delete the one that does not have any presence-absence results attached to it. Then the manifest ETL will find only one matching sample in the warehouse and can update its collection identifier to the corrected one.
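A quick, hedged way to see which of the two records has results attached, assuming the Record ids in the error are warehouse.sample sample_ids:
-- Sketch: count presence/absence results per candidate sample; an id missing from the output has none.
select sample_id, count(*) as n_results
from warehouse.presence_absence
where sample_id in (137871, 138045)
group by sample_id;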
Aborting with error: More than one sample matching sample and/or collection barcodes: [Record(id=122250, identifier=None, collection_identifier='aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', encounter_id=418757), Record(id=119610, identifier='bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb', collection_identifier='cccccccc-cccc-cccc-cccc-cccccccccccc', encounter_id=420854)]
This is another duplicate-barcodes problem, but more insidious than the previous one because these samples are already linked to two different encounters.
We've solved this in the past by taking the following steps:
- Delete the sample that doesn't have any presence-absence results. In this case, it's sample 122250 (a check-and-delete is sketched after this list).
- Run the manifest ETL.
- Manually upload a DET for the affected REDCap records so they may be linked to the correct encounter and have a sample.
Note that samples that have not yet been aliquoted will resolve when they're added to the aliquoting manifest. In this case, cccccccc-cccc-cccc-cccc-cccccccccccc was one of the duplicate barcodes. The tangling that occurred here was probably due to the timing of fixes.
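A hedged sketch of the first step, using the ids from the example error above:
-- Sketch: confirm sample 122250 has no presence/absence results, then remove it.
select count(*) from warehouse.presence_absence where sample_id = 122250;
-- If the count is 0:
delete from warehouse.sample where sample_id = 122250;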
Found duplicate record id «999» in project 12345.
Duplicate record ids are commonly due to repeating instruments/longitudinal events in REDCap,
which the redcap-det ETL is currently unable to handle.
If your REDCap project does not have repeating instruments or longitudinal events,
then this may be caused by a bug in REDCap.
The warning message really says it all. While this issue doesn't cause this ETL pipeline to fail, we still want to post in the #record-troubleshooting Trello board to alert the REDCap team about this problem. If left unmitigated, duplicate record IDs in REDCap could cause our return of results PDF generation to fail. Be sure to tag Misja and Sarah (of the REDCap team) in the new card.
Duplicate record IDs are a known REDCap bug, so a duplicate record ID across two study arms (e.g. symptomatic and asymptomatic) is usually not surprising. However, if the problem seems especially bizarre (for example, if every single REDCap record ID in the priority code study arm is a duplicate record ID of another arm), send an additional message in the #redcap channel notifying Misja and Sarah of the situation.
Unknown discharge disposition value «...» for barcode «...».
The ETL pipeline attempts to map the raw discharge disposition value (indicating where the patient went upon discharge from the hospital) to a FHIR Encounter.hospitalization.dischargeDisposition code. This warning indicates that there is no mapping for the raw value and the REDCap DET has been skipped in the ETL process.
If the raw value is a human readable disposition value, add a mapping to the data source specific ETL pipeline code file in the id3c-customizations repository. For example, for the UW retrospectives pipeline, edit the discharge_disposition function in this file.
If the raw value isn't a human readable disposition (e.g., is «96») or if you need more information about how to map it, contact Becca at ITHS. She will research the data issue and generate a correction if necessary.
Once the mapping has been updated and deployed, be sure to manually generate/upload the DETs for the skipped records so that they can be ingested!
- Restarting Metabase may help.
- If restarting Metabase fails, you can check its Docker container status with:
docker container ls -a
- Try adjusting the Metabase cache.
- You can kill long, existing Metabase queries with this bit of SQL:
select pg_cancel_backend(pid) FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes' and usename = 'metabase' and state = 'active'
Metabase caching is enabled so that slow queries don't have to be executed every time the question is viewed. A query that takes longer than the MINIMUM QUERY DURATION setting to run will get cached. The global CACHE TIME-TO-LIVE (TTL) MULTIPLIER setting controls how long results get cached.
From Metabase:
To determine how long each saved question's cached result should stick around, we take the query's average execution time and multiply
that by whatever you input here. So if a query takes on average 2 minutes to run, and you input 10 for your multiplier, its cache entry
will persist for 20 minutes.
This TTL multiplier setting can be overridden for a question by setting a value in the cache_ttl column of the report_card table in the metabase database.
First, you may need to give your admin database user permissions by connecting as the postgres user and running:
grant metabase to "{your admin user name}"
Then update the correct row in the report_card table. To set a 0 multiplier (to disable caching altogether) for a question, run a command like:
update report_card
set cache_ttl = 0
where name = '{the name of the question}'
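To double-check that you're targeting the right question before or after the update, a small sketch against the same table:
-- Sketch: confirm the question's row and its current cache_ttl.
select id, name, cache_ttl
from report_card
where name = '{the name of the question}';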
ERROR Errors were encountered (n=1) during processing of: ['s3://dokku-stack-phi/covid19/results-scan-study/AAAAAAAA-2020-01-01-en.pdf']
- This problem is commonly caused by duplicate record IDs in REDCap.
It is related to a known REDCap bug that ITHS is working to fix.
If there are duplicate record IDs in or across study arms (e.g. asymptomatic or symptomatic), post a Slack message in the #redcap channel describing that there are one or more duplicate record_ids causing errors in our results PDF generation. Include which REDCap project contains the problem (e.g. English), and tag Misja.
- Rarely, this problem pops up when ID3C failed to create a Specimen resource for a given REDCap record. Manually generating a DET for the target REDCap record should resolve this issue. If this issue continues to arise, further debugging of our REDCap ingest process is warranted.
Dropped 2 REDCap records with duplicated barcodes: ['AAAAAAAA']
Until this warning is resolved, it will prevent returning results to the affected study participants. When you encounter this warning, please create a new Trello card in the #record-troubleshooting board, and assign it to:
- Annie if it's a UW Reopening, non lab-related issue, or
- Misja for all other issues
- We do not have plate name/info in a field within the JSONs we get from Samplify. However, it does seem like the controls of each plate have an investigatorId that starts with the plate name (e.g. BAT-049A*).
- So if we need to investigate the results we received for a specific plate, I recommend running the following query to find the presence_absence_id for the plate. In this example, we are searching for plate BAT-049A:
select presence_absence_id
from receiving.presence_absence,
     json_to_recordset(document -> 'samples') as s("investigatorId" text)
where "investigatorId" like 'BAT-049A%'
group by presence_absence_id;
- When this happens, the lab may create new samples and reattach the results to those samples (see an example in Slack).
- Our ETL would create new presence/absence records for the sample since the lims_id is part of the presence/absence identifier.
- The sample record would be associated with both LIMS IDs, and the previous presence/absence results would still exist. A manual deletion of results associated with the original LIMS ID may be necessary (see the sketch below).
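A hedged sketch of that cleanup, assuming (as described above) that the lims_id is embedded in the presence/absence identifier; verify the select output before turning it into a delete:
-- Sketch: find presence/absence results whose identifier still references the original LIMS id.
select *
from warehouse.presence_absence
where identifier like '%{the original lims_id}%';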
- When this happens, the lab will re-push the entire plate of results with the updated sample results marked as Fail.
- Our ETL skips Fail results, so we need to manually delete records in ID3C:
  - Find the sample record within warehouse.sample using the collection barcode or sample barcode.
  - Find all result JSON documents containing the sample within receiving.presence_absence (hopefully there are only two records: the original incorrect one and an updated one).
  - Download and diff the JSONs to verify that only the affected sample results are different.
    \copy (select document from receiving.presence_absence where presence_absence_id = ...) to 'fileA.json'
    \copy (select document from receiving.presence_absence where presence_absence_id = ...) to 'fileB.json'
    jq -S . fileA.json > fileA_fmt.json
    jq -S . fileB.json > fileB_fmt.json
    diff fileA_fmt.json fileB_fmt.json
  - Delete the incorrect result record(s) for the sample from warehouse.presence_absence.
  - Delete the incorrect results JSON document from receiving.presence_absence so we don't ingest the incorrect results again when we bump the ETL revision number (both deletions are sketched below).
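A hedged sketch of the two deletion steps, with placeholders for the ids found above (run a matching select first to be sure of what you're removing):
-- Sketch: remove the incorrect warehouse results for the sample, then the stale received document.
delete from warehouse.presence_absence where sample_id = {the affected sample_id};
delete from receiving.presence_absence where presence_absence_id = {the incorrect document's presence_absence_id};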
- Sometimes samples that are retroactively marked as never-tested for quality control reasons had presence-absence results ingested into ID3C. This can be counterintuitive, so when it occurs we should remove those presence-absence results from ID3C.
- There is a Metabase pulse set to alert if there are sample records marked as never-tested that have associated PA results. Note this only sends alerts for non-OpenArray assays.
- You can find PA results for samples marked as never-tested with the following query:
with never_tested as (
select distinct sample_id, sample.details from warehouse.sample
join warehouse.identifier on (sample.identifier = identifier.uuid::text)
join warehouse.presence_absence using (sample_id)
where sample.details ->> 'note' = 'never-tested' and presence_absence.details ->> 'device' != 'OpenArray'
) select * from warehouse.presence_absence where sample_id in (select sample_id from never_tested)
- You may want to check with the lab if something looks off. We have technically gotten the OK to remove these samples, but sometimes it is good to keep them if something atypical is going on. We generally leave OpenArray results since they are held to a higher standard than Taqman results.
- Remove results from warehouse.presence_absence by changing the select to a delete (sketched below).
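For reference, the delete form keeps the same CTE as the query above (Postgres allows a WITH clause before DELETE), with the final select * swapped for delete:
with never_tested as (
select distinct sample_id, sample.details from warehouse.sample
join warehouse.identifier on (sample.identifier = identifier.uuid::text)
join warehouse.presence_absence using (sample_id)
where sample.details ->> 'note' = 'never-tested' and presence_absence.details ->> 'device' != 'OpenArray'
) delete from warehouse.presence_absence where sample_id in (select sample_id from never_tested)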
Sometimes, the wrong barcode collection identifier set labels are used for an in-person enrollment event. For example, the study may have an event enrolling people in the SCAN In-Person Enrollments REDCap project, but on the way over, an RA accidentally grabbed Asymptomatic Kiosk collection labels instead of SCAN STAVE collection labels. If this happens, there are a few steps to take to remedy this:
- Retrieve the list of all affected collection barcodes that need to have their internal identifier set changed to the correct set. Following the example above, we want a list of every Asymptomatic Kiosk collection barcode that was accidentally used for SCAN In-Person Enrollments. Let's imagine that the affected barcodes are:
AAAAAAAA BBBBBBBB CCCCCCCC
- Verify that the given barcodes are of the correct identifier set:
select distinct identifier_set_id
from warehouse.identifier
where barcode in ('AAAAAAAA', 'BBBBBBBB', 'CCCCCCCC');
If there is only one identifier_set_id returned and it matches our expectations (e.g. it is 16, which is the identifier_set_id for collections-kiosks-asymptomatic), then we may proceed. Otherwise, we need to go back to the beginning and make sure we have a correct understanding of the problem.
- Then, update the collection identifier set IDs of the affected barcodes. Make sure to test locally first. Assuming the desired identifier_set_id is 25 (collections-scan-kiosks), our code may look like:
update warehouse.identifier
set identifier_set_id = 25
where barcode in ('AAAAAAAA', 'BBBBBBBB', 'CCCCCCCC');
- Now, check the #id3c-alerts channel or the processing log in the receiving tables to see if any ETL jobs skipped the record because the encounter's collection barcode was not in an expected set. For example, the FHIR ETL does not currently support ingesting Asymptomatic Kiosk barcodes. If these barcodes were skipped in any ETL job, upload manually generated REDCap DETs for the affected encounters.
Sometimes the SFS Switchboard's database fails to update. There isn't a single root cause here, so there are a few things to try when this happens.
- Check the systemd logs with
sudo systemctl status sfs-switchboard
- Check the journal logs with
sudo journalctl -fu sfs-switchboard
- Capture the state of the Switchboard data with:
cd $(mktemp -d)
ps aux > ps
cp -rp /opt/sfs-switchboard/data .
Try to capture as much information about the failed state of the service as possible before manually restarting the service for the lab.
To manually restart the service, depending on your problem, you may choose to:
- Manually generate the Switchboard data via
PIPENV_PIPFILE=/opt/sfs-switchboard/Pipfile \
envdir /opt/backoffice/id3c-production/env.d/redcap-sfs/ \
envdir /opt/backoffice/id3c-production/env.d/redcap-scan/ \
pipenv run make -BC /opt/sfs-switchboard/
- Restart the service with
sudo systemctl restart sfs-switchboard
We currently use Python 3.6 in production, but it is in security-fixes-only support, and neither the Apple Command-line Developer Tools nor Homebrew support Python 3.6 at this point. In addition, attempting to build it on Big Sur fails due to compilation problems. Here is how to fix that build error and get Python 3.6 on Big Sur. Open a Terminal window and do the following (commands to be run are prepended with a $):
- Download Homebrew from https://brew.sh: $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install the prerequisite packages (some may be installed as a matter of course for Homebrew): $ brew install bzip2 openssl readline sqlite zlib
- Download Python 3.6.x (currently 3.6.13, but check https://www.python.org/downloads/source/ for the most recent 3.6 update): $ curl -O https://www.python.org/ftp/python/3.6.13/Python-3.6.13.tgz
- Untar it: $ tar xzf Python-3.6.13.tgz
- cd into the new directory: $ cd Python-3.6.13
- Apply this diff to fix the build error:
diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
index 776a3d2..3b91180 100644
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -19,7 +19,10 @@
# pragma weak lchown
# pragma weak statvfs
# pragma weak fstatvfs
-
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/uio.h>
+#include <copyfile.h>
#endif /* __APPLE__ */
#define PY_SSIZE_T_CLEAN
Save it as patch.txt and apply it in the main Python source directory with: $ patch -p1 < patch.txt
Make sure the patch applies cleanly; it's possible that the location will need to be fuzzed (which is probably fine; you should still check it manually), but if it fails, consider these instructions to be broken.
- Configure the installation with:
CC=/usr/bin/clang \
CFLAGS="-I/usr/local/opt/zlib/include -I/usr/local/opt/openssl/include -I/usr/local/opt/ncurses/include -I/usr/local/opt/sqlite/include -I/usr/local/opt/bzip2/include -I/usr/local/opt/readline/include" \
LDFLAGS="-L/usr/local/opt/zlib/lib -L/usr/local/opt/openssl/lib -L/usr/local/opt/ncurses/lib -L/usr/local/opt/sqlite/lib -L/usr/local/opt/bzip2/lib -L/usr/local/opt/readline/lib" \
./configure --prefix=/usr/local/python --enable-optimizations --with-ensurepip
This command will configure Python 3.6 for placement in /usr/local/python; if you want it in a different prefix (which is a good idea, especially if you have multiple versions of Python on your machine), change the --prefix argument.
- Build and install it with:
$ make && make test
$ make altinstall
- Now you can access Python 3.6 by running:
$ /usr/local/python/bin/python3.6
(of course, change /usr/local/python to whatever you set the prefix to in the configure step above)
And now for the disclaimers. You should only expect these instructions to work with:
- Big Sur 11.4
- Intel Mac
- Python 3.6.13
- Command-Line Developer Tools v12.5.0.0.1.1617976050, as obtained by
pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
Please don't depend on this for anything resembling production, or use it with important data, or in the development of nuclear weapons.
If we get an alert that the linelist didn't successfully upload to its destination (e.g. from the /opt/backoffice/bin/wa-doh-linelists/generate script), we need to manually rerun and upload the linelist file.
Once the underlying issue is fixed, one way to do this is to update the crontab on the backoffice production instance: change the scheduled time to a few minutes in the future, reinstall the crontab, let it run, verify the output, then change the time back and reinstall the crontab.
If you need to run it on a different date than the original failed run, you will need to update the --date parameter to run for the missed day. (Note: this crontab is currently set to generate a linelist for the previous day's results.)
One other option is to just run the backoffice/bin/wa-doh-linelists/generate script locally, passing in the appropriate environment variables and parameters.
You can find additional information about running the linelist-generating scripts in our documentation.
When the new linelist is uploaded to its destination, post a note in the Slack channel #linelist-submissions for visibility.
If we get an alert that the available uWSGI workers for husky-musher has dropped to 0, we need to check the status of those workers and reload husky-musher if one or more workers are stuck.
First get the pids of husky-musher worker processes. With the current configuration (11/30/21) there are 3 workers, which should be listed under the parent process:
ps -auxf | grep husky-musher
ubuntu@ip-172-31-15-85:~$ ps -auxf | grep husky-musher
www-data 999 0.0 0.6 943776 111772 ? Ss 04:00 0:03 /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 2308 0.0 0.6 1273788 108844 ? Sl 04:00 0:11 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 2330 0.0 0.7 1278360 114892 ? Sl 04:00 0:18 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 2334 0.0 0.7 1278112 113716 ? Sl 04:00 0:18 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
ubuntu 13827 0.0 0.0 14852 1060 pts/0 S+ 11:59 0:00 \_ grep --color=auto husky-musher
prometh+ 2307 0.0 0.1 113628 18696 ? Ssl 04:00 0:07 uwsgi_exporter --web.listen-address localhost:46947 --stats.uri unix:///run/uwsgi/app/husky-musher/stats
Run strace on each worker pid (2308, 2330, and 2334 in the example above). For example:
sudo strace -f -p 2308
With the current configuration (11/30/2021) each worker has 4 threads, each with their own pid, which should be output by strace. Check for any threads that are stuck in a read state (i.e. read(4, <unfinished ...>)), and monitor strace for a few seconds to see if those pids remain in that state.
ubuntu@ip-172-31-15-85:~$ sudo strace -f -p 2308
strace: Process 2308 attached with 4 threads
[pid 2313] read(94, <unfinished ...>
[pid 2315] futex(0x555ce06f3bf8, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 2315] epoll_wait(9, <unfinished ...>
[pid 2314] futex(0x555ce06f3bf8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 2308] read(92, <unfinished ...>
...
If one or more worker threads is indeed stuck in "read" state, reload husky-musher:
sudo systemctl reload uwsgi@husky-musher
Wait a few seconds for it to reload workers gracefully, at which point the workers should have new pids:
ubuntu@ip-172-31-15-85:~$ ps -auxf | grep husky-musher
www-data 999 0.0 0.6 943848 112408 ? Ss 04:00 0:04 /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 14952 0.1 0.5 1154804 81212 ? Sl 12:07 0:00 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 14956 0.1 0.5 1158996 82144 ? Sl 12:07 0:00 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
www-data 14960 0.0 0.4 1145932 78984 ? Sl 12:07 0:00 \_ /usr/bin/uwsgi --ini /etc/uwsgi/base.ini --ini /etc/uwsgi/apps-available/husky-musher.ini
ubuntu 15028 0.0 0.0 14852 1008 pts/0 S+ 12:08 0:00 \_ grep --color=auto husky-musher
prometh+ 2307 0.0 0.1 113628 18696 ? Ssl 04:00 0:07 uwsgi_exporter --web.listen-address localhost:46947 --stats.uri unix:///run/uwsgi/app/husky-musher/stats
Repeat the steps above with strace to confirm all workers and threads are functioning and/or check Grafana (web dashboard, uWSGI workers available panel).
If a graceful reload does not work, restarting is a more forceful approach:
sudo systemctl restart uwsgi@husky-musher