
Multiple conflicting time valid records can be inserted #2

Open
DanBuchan opened this issue Aug 9, 2019 · 3 comments

DanBuchan commented Aug 9, 2019

Somehow, when multiple jobs run concurrently, the database allows writes of multiple time-conflicting records, even though all read and write functions on the database are set to be atomic. That is, two records for a given md5 can be inserted on the same day and get the same expiry date.

At a guess, this is because the code running on each blast client is not aware of the others. But I thought atomicity was handled on the postgres instance. Hmmm.
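One likely explanation: an atomic transaction only guarantees that each client's check-then-insert commits as a unit, not that two such transactions can't both pass the "no record exists yet" check before either commits. A database-level unique constraint closes that window regardless of client behaviour. The sketch below is a minimal illustration using SQLite and an assumed table shape (md5, valid_from, expiry); in Postgres the same idea is a UNIQUE constraint plus `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

# Hypothetical schema standing in for the cache table. The UNIQUE
# constraint on (md5, valid_from) makes conflicting inserts fail at the
# database level, no matter how many clients race each other.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cache_record (
        md5        TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        expiry     TEXT NOT NULL,
        UNIQUE (md5, valid_from)
    )
""")

def insert_record(conn, md5, valid_from, expiry):
    # INSERT OR IGNORE: the second racing writer silently loses instead
    # of creating a duplicate time-valid record. The Postgres equivalent
    # is INSERT ... ON CONFLICT (md5, valid_from) DO NOTHING.
    conn.execute(
        "INSERT OR IGNORE INTO cache_record (md5, valid_from, expiry) "
        "VALUES (?, ?, ?)",
        (md5, valid_from, expiry),
    )
    conn.commit()

# Two clients insert the "same" record on the same day.
insert_record(conn, "abc123", "2019-08-09", "2019-09-09")
insert_record(conn, "abc123", "2019-08-09", "2019-09-09")

count = conn.execute("SELECT COUNT(*) FROM cache_record").fetchone()[0]
print(count)  # only one record survives
```

In Django this would translate to a `unique_together` (or `UniqueConstraint`) on the model, so the guarantee lives in Postgres rather than in the client scripts.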

DanBuchan added the bug label Aug 9, 2019
@DanBuchan (Collaborator, Author)

Here is a set of bad data that triggers this issue:
bad_data.csv.gz

@DanBuchan (Collaborator, Author)

Also, the psiblast-running client scripts should probably fail over to running psiblast directly (and not try to insert into the cache), then send a warning to the psipred-alerts Slack channel.

@DanBuchan (Collaborator, Author)

Probably we shouldn't be in the business of trying to write and maintain our own caching software, given how notoriously difficult a CS problem that is. The entire rest of the server uses standard frameworks for all the other hard problems we're not experts in (e.g. queuing), so I really have no idea why I thought I could fashion a caching server from scratch.

Probably the blast cache should be revisited. I assume there exists a nice journaling cache system built on top of redis or mongoDB that we could drop in instead, and it wouldn't be too much work to replace my current solution.

Configuring it to record the information we want and then updating the client scripts should be all that's needed. Probably a week to do the work, plus some number of days to add the new cache to the DB server and switch off this django-based version.
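For what it's worth, the expiry semantics the custom cache is fighting to get right are exactly what Redis gives for free via `SETEX`/`EXPIRE` (the whole class below collapses to `r.setex(md5, ttl, value)` against a real Redis). A minimal pure-Python sketch of those semantics, with an injectable clock so the behaviour is easy to test:

```python
import time

class TTLCache:
    """Sketch of md5 -> value caching with per-entry expiry,
    mirroring Redis SETEX/GET behaviour."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # md5 -> (value, expires_at)

    def set(self, md5, value, ttl_seconds):
        # Equivalent to Redis: r.setex(md5, ttl_seconds, value)
        self._store[md5] = (value, self._clock() + ttl_seconds)

    def get(self, md5):
        entry = self._store.get(md5)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily evict expired entries, as Redis does.
            del self._store[md5]
            return None
        return value
```

Because expiry lives in the cache layer itself, the client scripts never have to reason about "same day, same expiry" records at all.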
