Dear pyensembl team,
First, thank you again for developing pyensembl :)
In my application, I have a class that uses pyensembl extensively. I initialize it as follows:
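```python
from pyensembl import EnsemblRelease

class AnnotateVariants:
    def __init__(self, *args, **kwargs):  # other constructor arguments elided
        # Load the annotation data for Ensembl release 75
        self.ensembl_data = EnsemblRelease(75)
        ...  # rest of the initialisation elided
```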
I need to let multiple threads use the pyensembl object so that variants can be annotated in parallel.
For internal reasons I use the multiprocessing.dummy module, so the workers are threads rather than processes.
In my current implementation, each thread gets its own instance of the AnnotateVariants class.
However, the log file shows that the threads do not all run in parallel. For example, with a pool of 16 threads, 5 run in parallel while the others wait; then the next subgroup runs, and so on.
Is this related to the EnsemblRelease constructor, which, as I read in the docs, returns the same instance for a release that is already cached?
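Assuming EnsemblRelease caches instances the way the docs describe, the effect can be illustrated with a small cached factory. `Release` and `cached_release` are hypothetical names, not pyensembl's code: every thread asks for "its own" release 75, as each AnnotateVariants instance would, yet all of them receive the same object, so anything that object owns (such as a database connection) is shared.

```python
import threading


class Release:
    """Hypothetical stand-in for an annotation object such as EnsemblRelease."""

    def __init__(self, release):
        self.release = release


_cache = {}
_cache_lock = threading.Lock()


def cached_release(release):
    # Build the object once per release number; later calls return the same object.
    with _cache_lock:
        if release not in _cache:
            _cache[release] = Release(release)
        return _cache[release]


seen = []
seen_lock = threading.Lock()


def worker():
    # Each thread requests release 75, as each AnnotateVariants instance would.
    obj = cached_release(75)
    with seen_lock:
        seen.append(obj)


threads = [threading.Thread(target=worker) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len({id(obj) for obj in seen}))  # 1: all 16 threads hold the same object
```

Calling the class directly, by contrast, gives a distinct object each time; only the cached path collapses them into one.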
If so, the same SQLite 3 database connection is handed to every thread in the pool, and the threads end up in a race condition.
That's my interpretation of the observation. Please let me know your ideas.
Second, could you advise me on how you would run pyensembl fully in parallel?
My pyensembl version is 1.9.0.
Thanks a lot!
damianosmel changed the title from "How to run pyensembl using multiple threads" to "How to run pyensembl using multiple threads?" on May 19, 2021.