Investigate/mitigate acquisition lock timeouts #1600
Labels
acquisition
changes acquisition logic
bug
devops
building, running, deploying, environment stuff, handy utils, repository-related, engineer QoL, etc
mysql
mysql database related
There appears to be a conflicting interaction between covidcast acquisition and the "coverage_crossref_updater" that causes acquisition to encounter a lock timeout (
mysql.connector.errors.DatabaseError: 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
) which in turn causes failure on some input files. The failing files can be re-acquired, but this takes manual intervention and is suboptimal.We have noticed a pattern in timing of errors for some patching job runs that leads to this hypothesis (see slack thread for more detail and links to logs). It makes sense because the updater has a long-running query that is reading from the same table that acquisition writes into.
Look into this to verify that it is indeed the case, and if so, figure out a way around it (which could just be "figure out how to prevent these jobs from running concurrently in our scheduler").
The text was updated successfully, but these errors were encountered: