
Troubleshooting

Dave Lawrence edited this page May 18, 2023 · 3 revisions

General debugging

It's handy to see all of Django's output in the console by running the development server with python3 manage.py runserver

VCF import hanging

A VCF import, including running the annotation pipeline on any new variants, should usually take less than 2 hours. You can spot a hanging import in two places:

  • Sequencing Runs - this page shows a "spinning" logo for a long time where the VCF icons usually are
  • Data - the import status stays at "importing"
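As a rule of thumb, an import that is still "importing" well past the usual 2 hours is probably hung. A minimal sketch of that check, assuming you already have the status string and a timezone-aware start time; the function name and arguments are hypothetical and not part of VariantGrid:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Usual upper bound for a VCF import plus annotation of new variants (from this page)
EXPECTED_MAX_DURATION = timedelta(hours=2)


def looks_hung(status: str, started: datetime, now: Optional[datetime] = None,
               limit: timedelta = EXPECTED_MAX_DURATION) -> bool:
    """Return True if an import is still 'importing' past the usual time limit.

    Hypothetical helper: status and started would come from the Data page or the DB.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return status.lower() == "importing" and now - started > limit
```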

There are 3 things that could have gone wrong, and you can tell which from the VCF's "Vcf import stage". To find it, click the "VCF" tab on the Data page, then click the link to the VCF that is stuck at "importing".

Empty - it failed during the import process. Click "View upload processing": a finished upload shows a pie chart, but an unfinished one just shows a grid with a few jobs that are not SUCCESS. Click the "retry import" button, which will hopefully work this time.

Annotating Variants -

Check the state of the Variant Annotation Runs via the menu: Annotation -> Variant Annotation Runs link on the page

Log in to the server and run these management commands (see "Running management commands" below):

# Annotation run completed but job stuck as annotating
python3 manage.py annotation_set_all_complete
# Annotation run not started
python3 manage.py annotate_unannotated_variants

You can run either of these commands multiple times, in any order, as they check the current state first and are safe to re-run.

Annotation run broken

If the Celery task has died but the run's state hasn't been set to errored (so you can't click "retry upload"), set an error on the AnnotationRun by hand:

python3.8 manage.py shell
In [1]: from annotation.models import AnnotationRun
In [2]: ar = AnnotationRun.objects.get(pk=2722)  # use the pk of the stuck run
In [3]: ar.error_exception = "blah"  # any error message will do
In [4]: ar.save()
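If several runs are stuck, the same steps can be wrapped in a small function and pasted into the manage.py shell. This is only a sketch: it assumes, as in the session above, that each run has an error_exception field and a save() method, and mark_runs_errored is a hypothetical name.

```python
from typing import Iterable


def mark_runs_errored(runs: Iterable, message: str = "Manually marked as errored") -> int:
    """Set error_exception on each run and save it, as in the manual steps above.

    Hypothetical helper for use in `manage.py shell`. Returns the number of runs updated.
    """
    count = 0
    for run in runs:
        run.error_exception = message
        run.save()
        count += 1
    return count
```

In the shell this could be called as mark_runs_errored(AnnotationRun.objects.filter(pk__in=[2722])).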

Calculating sample stats -

This is a very CPU- and database-intensive task, and we run up to 32 of them at a time, so Celery jobs sometimes crash, e.g. when the database refuses new connections.

Log in to the server and run this management command (see "Running management commands" below):

python3 manage.py calculate_sample_stats
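The three cases above boil down to a lookup from the VCF import stage to a fix. A sketch, assuming the stage strings as they appear on this page; the exact values VariantGrid stores may differ, and REMEDIES and suggest_fix are hypothetical names:

```python
from typing import Optional

# "Vcf import stage" (lower-cased) -> suggested remedy, taken from this page
REMEDIES = {
    "": 'Click "View upload processing", then "retry import"',
    "annotating variants": (
        "Check Variant Annotation Runs, then run "
        "python3 manage.py annotation_set_all_complete and/or "
        "python3 manage.py annotate_unannotated_variants"
    ),
    "calculating sample stats": "python3 manage.py calculate_sample_stats",
}


def suggest_fix(stage: Optional[str]) -> str:
    """Map an import stage (None or blank means empty) to the remedy above."""
    key = (stage or "").strip().lower()
    return REMEDIES.get(key, "Unknown stage - see 'If all else fails'")
```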

Running management commands

# ssh onto server
sudo su variantgrid
cd /opt/variantgrid  # on the SAPath server; on the VM it's /mnt/variantgrid
python3 manage.py  # with no arguments, lists all of the commands you can run
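To run a command from a script instead of switching user interactively, something like the sketch below could work. It assumes you have sudo rights and uses sudo -u in place of sudo su (the environment may differ from a login shell); the paths are the ones above, and manage_argv / run_manage_command are hypothetical helpers.

```python
import subprocess
from typing import List

SAPATH_DIR = "/opt/variantgrid"
VM_DIR = "/mnt/variantgrid"


def manage_argv(command: str, *args: str) -> List[str]:
    """Build argv to run a Django management command as the variantgrid user."""
    return ["sudo", "-u", "variantgrid", "python3", "manage.py", command, *args]


def run_manage_command(command: str, *args: str, project_dir: str = SAPATH_DIR,
                       dry_run: bool = False) -> List[str]:
    """Run the command from project_dir (so manage.py is found); dry_run only builds argv."""
    argv = manage_argv(command, *args)
    if not dry_run:
        subprocess.run(argv, cwd=project_dir, check=True)
    return argv
```

For example, run_manage_command("calculate_sample_stats", project_dir=VM_DIR) on the VM.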

Deleting everything and starting again

  • On the VCF page, click the Sharing/Permissions tab, then click delete

This should delete the project, which will then be uploaded again. This can take up to 2 hours; to speed it up, go to the Sequencing page, click "manage disk scans" and trigger a scan manually.

If the project isn't reloaded, try deleting the SequencingRun (click its link, then "Admin", then delete); this should reload everything.

Jobs not running

Go to the Server Status page (Settings -> Server Status, if you're an admin user).

The Celery workers should be shown in green. If they are red, something is wrong and Celery has crashed. In theory the service should restart itself, but if it doesn't, try:

sudo bash
~/stop_services.sh
# wait a while
# maybe check ps aux | grep variant - there should be nothing running except the grep command
~/start_services.sh
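The "nothing running except the grep command" check can also be scripted. A minimal sketch that filters ps aux output for lines containing "variant" and skips grep (and optionally its own PID); matching on that substring is the same heuristic as the comment above, not an exact list of VariantGrid processes, and the function names are hypothetical:

```python
import os
import subprocess
from typing import List, Optional


def leftover_processes(ps_output: str, pattern: str = "variant",
                       exclude_pid: Optional[int] = None) -> List[str]:
    """Return `ps aux` lines matching pattern, skipping the header and grep commands."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # first line is the ps header
        fields = line.split(None, 10)  # 11th field is the full command
        if len(fields) < 11:
            continue
        if exclude_pid is not None and fields[1] == str(exclude_pid):
            continue
        if pattern in line and "grep" not in fields[10]:
            matches.append(line)
    return matches


def services_stopped() -> bool:
    """True if no matching processes remain (run after ~/stop_services.sh)."""
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    return not leftover_processes(out, exclude_pid=os.getpid())
```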

Server Down

  • If the server is reset (e.g. after a power outage), it should come back up with the services running. If it was down for long enough, the IP address may have changed: turn on the monitor, use the keyboard under my desk to log in, run ifconfig, then tell everyone the new address.

If the services aren't running, see above to start them.

Database

To see a list of running processes in the database:

sudo su postgres -c 'psql -d snpdb'

Then run SQL:

-- To see what queries are running and their PIDs
SELECT * FROM pg_stat_activity;

-- To cancel a running query (replace PID with its pid from pg_stat_activity)
SELECT pg_cancel_backend(PID);

-- If that doesn't work, terminate the whole connection
SELECT pg_terminate_backend(PID);

To cancel every query in the current database (except your own session):

SELECT pg_cancel_backend(pg_stat_activity.pid)
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();
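Rather than cancelling everything, it is often enough to cancel only long-running queries. A sketch that takes rows from pg_stat_activity (as dicts with its pid, state and query_start columns, fetched with any DB driver) and builds pg_cancel_backend statements for active queries older than a threshold; the 30-minute default and the function name are arbitrary examples:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping, Optional


def cancel_statements(rows: Iterable[Mapping], now: Optional[datetime] = None,
                      older_than: timedelta = timedelta(minutes=30),
                      own_pid: Optional[int] = None) -> List[str]:
    """Build pg_cancel_backend() calls for active queries running longer than older_than."""
    if now is None:
        now = datetime.now(timezone.utc)
    statements = []
    for row in rows:
        # Skip our own session, idle connections, and rows with no query start time
        if row["pid"] == own_pid or row["state"] != "active" or row["query_start"] is None:
            continue
        if now - row["query_start"] > older_than:
            statements.append(f"SELECT pg_cancel_backend({int(row['pid'])});")
    return statements
```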

If all else fails

Check whether any errors show up here:

Copy the logs and email them to Dave Lawrence ([email protected])

mkdir vg_logs
scp -r [email protected]:/var/log/variantgrid vg_logs
tar czvf vg_logs.tar.gz vg_logs
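If you'd rather build the archive from Python, the standard library's tarfile module can create a gzip-compressed archive equivalent to tar czvf. A minimal sketch using the directory and file names above; archive_logs is a hypothetical name:

```python
import tarfile
from pathlib import Path


def archive_logs(log_dir: str = "vg_logs", archive: str = "vg_logs.tar.gz") -> Path:
    """Create a gzip-compressed tarball of log_dir, like `tar czvf archive log_dir`."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(log_dir, arcname=Path(log_dir).name)  # adds subdirectories recursively
    return Path(archive)
```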

Redis

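If you get Redis errors mentioning "Read Only Filesystem", you need to allow the Redis data directory in the systemd service file, /etc/systemd/system/redis.service, e.g. in the [Service] section (the leading - tells systemd to ignore the path if it doesn't exist):

ReadWriteDirectories=-/mnt/redis_database

Then run systemctl daemon-reload and restart the redis service.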

Redis can't save

(error) MISCONF Redis is configured to save RDB snapshots, but it is currently not able to persist on disk.

Disable saving, do whatever cleanup is needed (e.g. redis-cli flushall, which deletes every key, or celery purge --app variantgrid), then re-enable saving with the original value:

127.0.0.1:6379> config get save
1) "save"
2) "900 1 300 10 60 10000"
127.0.0.1:6379> config set save ""
OK
127.0.0.1:6379> config set save "900 1 300 10 60 10000"
OK
127.0.0.1:6379> config get save
1) "save"
2) "900 1 300 10 60 10000"
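The save value is a list of "seconds changes" pairs: Redis writes an RDB snapshot if at least that many keys changed within that many seconds, so "900 1 300 10 60 10000" means after 900 s if at least 1 change, 300 s if at least 10, and 60 s if at least 10000. An empty string disables snapshots. A small sketch to decode it, handy for noting the original value before setting it to ""; parse_save_config is a hypothetical name:

```python
from typing import List, Tuple


def parse_save_config(value: str) -> List[Tuple[int, int]]:
    """Turn Redis's `save` setting into (seconds, min_changes) pairs.

    An empty string means RDB snapshots are disabled, so [] is returned.
    """
    numbers = [int(n) for n in value.split()]
    if len(numbers) % 2:
        raise ValueError(f"expected pairs of integers, got {value!r}")
    return list(zip(numbers[::2], numbers[1::2]))
```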

Celery

Value: 'int' object has no attribute 'signature'.

as per:

Error: This file failed to import due to: Error:
  File "/home/dlawrence/localwork/variantgrid/upload/tasks/vcf/import_vcf_step_task.py", line 131, in schedule_pipeline_stage_steps
    parallel_tasks.append(task_class.si(upload_step.pk, 0))
  File "/usr/local/lib/python3.6/dist-packages/celery/app/task.py", line 784, in si
    return self.signature(args, kwargs, immutable=True)
Type: <class 'AttributeError'>, Value: 'int' object has no attribute 'signature'.

This is caused by not registering the Celery Task class. You need to register an instance of it, e.g.:

ClassificationImportLinkVariantsTask = app.register_task(ClassificationImportLinkVariantsTask())
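Why an int? In the traceback, si() is an ordinary method that calls self.signature(...), so calling it on the class rather than on an instance binds the first argument (upload_step.pk) to self. The plain-Python sketch below reproduces this with a stand-in class; FakeTask is hypothetical and does not use Celery.

```python
class FakeTask:
    """Stand-in for a Celery Task: si() builds a signature from self."""

    name = "fake_task"

    def signature(self, args, kwargs, immutable=False):
        return (self.name, args, immutable)

    def si(self, *args, **kwargs):
        # Same shape as the si() in the traceback: needs a task *instance* as self
        return self.signature(args, kwargs, immutable=True)


def call_on_class(pk):
    # Wrong: FakeTask.si is a plain function here, so pk becomes `self`
    return FakeTask.si(pk, 0)


def call_on_instance(pk):
    # Right: register_task(FakeTask()) would hand back an instance like this one
    task = FakeTask()
    return task.si(pk, 0)
```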

Clearing Celery Queues

Stop the workers first, otherwise you can't purge properly.

To clear just 1 queue:

celery -A variantgrid amqp queue.purge seqauto_single_worker

To clear all the queues:

celery --app variantgrid purge

Troubleshooting Variants - dupes, deleting bad inserts etc
