Skip to content
This repository has been archived by the owner on Apr 26, 2023. It is now read-only.

Support of Postgresql database (or any other type of database) #2072

Open
ammarik opened this issue Jan 10, 2020 · 5 comments
Open

Support of Postgresql database (or any other type of database) #2072

ammarik opened this issue Jan 10, 2020 · 5 comments

Comments

@ammarik
Copy link

ammarik commented Jan 10, 2020

Hello,
is it possible to use for example Postgresql database instead of Sqlite database? I've found that there is a Postgresql mapper (https://github.com/I2PC/scipion/blob/master/pyworkflow/mapper/postgresql.py), but it seems to be outdated and unmaintained. I also found that database layer does not seem to be well separated from other parts of the code (manipulation with Sqlite db files at various parts of code, ...), there is no abstraction, which is a terrible sw design and it hinders of simple addition of any new db implementation.

We would love to use a different type of database than Sqlite. Is there any way to use database like Postgresql, Mysql, ... instead of the Sqlite db? Are you planning to add possibility to use some other database?

@delarosatrevin
Copy link
Member

Dear @ammarik

We started some proof of concept using Postgress, but it is far from complete. The Mapper layer is designed to be abstracted from the database, but as you already noticed, there are a few places where it is not so well isolated but should be only a few.

Regarding the use of other SQL engines, you should notice that we have two use of the mapper. One is for the project entities and their relations. The other is for each individual set. For the second, I think it can be complicated to use an SQL server that is not independent file, but it should be possible.

Anyway, Sqlite performs well in most of the cases, so I don't think this is a priority right now. Do you have any particular problem that you are interested in other mapper implementations?

@ammarik
Copy link
Author

ammarik commented Jan 10, 2020

Dear @delarosatrevin,
thank you for your answer. Where can I find the proof of concept that you mentioned? Is there separate branch with it, or is it the postgresql mapper that is in the master branch?

I would like to execute different protocols on different nodes (we have multiple computing nodes) and at the same time I don't want to copy sqlite databases between nodes. To accomplish this, one of things I need, Is to have a central (e.g. Postgresql) database.

Do you think this usecase is achievable?

I know that there is a support for Slurm queue system, but I think it isn't suitable for my usecase.

@pconesa
Copy link
Member

pconesa commented Jan 10, 2020

Hi @ammarik can you share the filesystem across your nodes?

@ammarik
Copy link
Author

ammarik commented Jan 13, 2020

Hello,
I tried to use AWS S3 storage mounted to AWS EC2 instances, but in the case of large projects, with many protocols, behavior of the Scipion was slow. I guess it was caused by frequent operations (accessing, copying) with sqlite files. That's why I'm trying to use database server.

I would like to be able execute different protocols on different AWS EC2 instances, while all the instances are connected to one database. I would like to distribute protocol executions to multiple different instances (computing nodes). For example I would like to be able to execute protocol that requires GPU on GPU EC2 instance and at the same time execute protocol that doesn't require GPU on some non-GPU instance, both connected to the same database.

@delarosatrevin
Copy link
Member

delarosatrevin commented Jan 13, 2020 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants