Deadlock when deploying in FastAPI + Gunicorn/Uvicorn + Docker #12746
              
Unanswered · dandiep asked this question in Help: Coding & Implementations
            Replies: 2 comments 2 replies
- Hey dandiep, we have a few issues related to the deadlock you are experiencing. Please check out this one and let us know if you need more information: #4667
  (0 replies)
- OK, I can confirm. @kadarakos, can I recommend that this get added to the docs somewhere? Deploying spaCy for inference behind an API isn't that uncommon; I lost a good week figuring out what in the world was going on here and building a reproducible test case.
  (2 replies)
            
  
I've got a FastAPI application where we use spaCy with various transformer models to parse text. Under concurrent load, spaCy (or the transformers library spaCy uses) seems to deadlock: every thread is blocked in the same torch.nn.Linear call and nothing advances. I can only reproduce this in Docker; I haven't been able to reproduce it on my Mac.
Here's a thread dump from pystack:
The code itself is pretty simple. I have a function that loads spacy and caches it in a dict:
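The original snippet was not preserved in this copy of the thread. A minimal sketch of that caching pattern might look like the following; the function name, the model name used later, and the injectable loader parameter are my assumptions for illustration (the real code would simply call spacy.load):

```python
# Hypothetical module-level cache: each model is loaded once and reused
# for the lifetime of the process.
_models = {}

def get_nlp(name, loader=None):
    """Return a cached pipeline, loading it on first use.

    In the real app `loader` would just be spacy.load; it is injectable
    here only so the caching pattern can run without spaCy installed.
    """
    if name not in _models:
        if loader is None:
            import spacy  # imported lazily; assumes spaCy is installed
            loader = spacy.load
        _models[name] = loader(name)  # expensive load happens once
    return _models[name]
```

Because the cache lives at module level, every request handled by the single worker process shares one copy of each model.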
Then my FastAPI request is this:
A single request at a time executes fine, but if I fire off a bunch of concurrent requests, I inevitably hit the deadlock above.
I've set Gunicorn to 1 worker, with the Uvicorn worker class. I can't run more than 1 worker because each worker would load its own copy of every transformer model and I'd run out of memory.
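A launch command consistent with that setup might look like this (the `app:app` module path is an assumption; `uvicorn.workers.UvicornWorker` is the standard Uvicorn worker class for Gunicorn):

```shell
# One worker process so each transformer model is loaded only once;
# requests are handled concurrently by threads inside that worker.
gunicorn app:app \
  --workers 1 \
  --worker-class uvicorn.workers.UvicornWorker
```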
If I add locks around the spaCy code, I can no longer reproduce the issue. To do that, I create a wrapper:
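The wrapper itself is not preserved in this copy of the thread; a minimal sketch of a lock-based wrapper like the one described might be (the class name `SpacyWrapper` comes from the question; the rest is a reconstruction, and any callable stands in for a loaded spaCy pipeline):

```python
import threading

class SpacyWrapper:
    """Serialize every call into the wrapped pipeline behind one lock."""

    def __init__(self, nlp):
        # `nlp` is assumed to be a loaded spaCy pipeline, but any
        # callable works, which lets this sketch run without spaCy.
        self._nlp = nlp
        self._lock = threading.Lock()

    def __call__(self, text):
        # Only one thread at a time may run the pipeline, so the
        # concurrent code paths that deadlocked are never entered.
        with self._lock:
            return self._nlp(text)
```

The trade-off is that all requests are serialized through the model, so this trades the deadlock for reduced throughput.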
and then just do SpacyWrapper(nlp).

Any ideas on what could be going on, or investigative paths to proceed down?