How to run pyensembl using multiple threads? #251

Open
damianosmel opened this issue May 6, 2021 · 1 comment

Comments

damianosmel commented May 6, 2021

Dear pyensembl team,

First, thank you again for developing pyensembl :)

In my application, I have a class that uses pyensembl extensively. I initialize this class as follows:

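from pyensembl import EnsemblRelease

class AnnotateVariants:
    def __init__(self, ...):
        # Each AnnotateVariants instance requests Ensembl release 75
        self.ensembl_data = EnsemblRelease(75)
        ...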

I need multiple threads to be able to use the pyensembl object, so that its functions can annotate variants in parallel.
For internal reasons, I use the multiprocessing.dummy library, so I work with threads rather than processes.

In my current implementation I assign to each distinct thread a new instance of the AnnotateVariants class.
However, looking at the log file I can see that the threads do not all run in parallel. For example, if I start with a pool of 16 threads, 5 of them run in parallel while the others wait; then the next subgroup of threads runs, and so on.
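
One way to turn the log-file observation into a number is to count how many tasks are inside the worker function at the same moment. The following is a minimal sketch using only the standard library; `annotate_stub` and `measure_peak_concurrency` are hypothetical names, and the `time.sleep` call is an assumed stand-in for the real annotation work, not pyensembl code.

```python
import threading
import time
from multiprocessing.dummy import Pool  # thread-based pool with the multiprocessing.Pool API

_lock = threading.Lock()
_active = 0  # number of tasks currently inside annotate_stub
_peak = 0    # highest value _active has reached


def annotate_stub(task_id):
    """Hypothetical stand-in for one annotation task."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.2)  # sleeping releases the GIL, much like blocking I/O does
    with _lock:
        _active -= 1
    return task_id


def measure_peak_concurrency(n_threads=16, n_tasks=16):
    """Run n_tasks stub tasks on n_threads threads and report the peak overlap."""
    global _active, _peak
    _active = _peak = 0
    pool = Pool(n_threads)
    try:
        results = pool.map(annotate_stub, range(n_tasks))
    finally:
        pool.close()
        pool.join()
    return results, _peak


if __name__ == "__main__":
    results, peak = measure_peak_concurrency()
    print(f"{len(results)} tasks finished, peak concurrency: {peak}")
```

If the stub reaches a peak close to the pool size while the real workload stays near 5, the serialization probably comes from the work itself rather than from the pool. In CPython, for instance, threads that execute pure Python bytecode take turns holding the GIL, so CPU-bound code does not speed up with more threads.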

Is this related to the pyensembl constructor (EnsemblRelease)? I see that the constructor returns the same Ensembl release instance if it is already cached (docs).

If this is true, then the same connection to the SQLite3 database is given to every thread in the pool, so the threads are in a race condition.
That's my interpretation of the observation. Please let me know your ideas.
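
If a shared SQLite connection did turn out to be the bottleneck, a common general pattern is to give each thread its own connection through `threading.local`. The sketch below illustrates only that pattern with the standard `sqlite3` module; the `gene` table, the helper names, and the demo database are all hypothetical, and nothing here shows how pyensembl manages its connection internally. Note that Python's `sqlite3` by default (`check_same_thread=True`) raises an error when a connection is used from a thread other than the one that created it.

```python
import os
import sqlite3
import tempfile
import threading
from multiprocessing.dummy import Pool  # thread-based pool

_local = threading.local()  # holds one connection per thread


def get_connection(db_path):
    """Return the calling thread's own connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(db_path)
        _local.conn = conn
    return conn


def lookup(args):
    """Look up one row using the calling thread's private connection."""
    db_path, gene_id = args
    conn = get_connection(db_path)
    row = conn.execute("SELECT name FROM gene WHERE id = ?", (gene_id,)).fetchone()
    return row[0], threading.get_ident()


def build_demo_db(db_path, n_rows=8):
    """Create a tiny throwaway database (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)", [(i, f"gene{i}") for i in range(n_rows)]
    )
    conn.commit()
    conn.close()


def run_demo(n_threads=4, n_rows=8):
    """Query every row from a thread pool; return the names and the thread ids used."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path, n_rows)
        pool = Pool(n_threads)
        try:
            out = pool.map(lookup, [(db_path, i) for i in range(n_rows)])
        finally:
            pool.close()
            pool.join()
    names = [name for name, _ in out]
    thread_ids = {tid for _, tid in out}
    return names, thread_ids


if __name__ == "__main__":
    names, thread_ids = run_demo()
    print(names, f"({len(thread_ids)} worker threads used)")
```

Whether a per-thread connection can be configured in pyensembl itself is a question for the maintainers; the sketch only shows what the pattern looks like in isolation.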

Second, please advise me on the recommended way to run pyensembl fully in parallel.

My pyensembl version is 1.9.0.

Thanks a lot!

damianosmel changed the title from "How to run pyensembl using multiple threads" to "How to run pyensembl using multiple threads?" on May 19, 2021
@damianosmel
Author

Dear developer team,

Is there any update on this question?

Thank you,
Damianos
