I use Gunicorn as the web server for a Flask API and I see a performance issue compared with using Waitress as the web server for the same Flask app. When I compute a matrix multiplication with NumPy, there is no large difference in response time between Gunicorn and Waitress.

I evaluated this on:
machine spec: MacBook Air M2, 16 GB RAM

This is the client script that sends requests to the Gunicorn and Waitress servers:
```python
import asyncio
import time
from collections import defaultdict

import httpx
import numpy as np

N = 1
url_paths = ["numpy", "torch", "torch_no_grad"]
API_URLS = [
    "http://localhost:8001/",
    "http://localhost:8002/",
]
API_URLS_DICT = {
    "http://localhost:8001/": "waitress",
    "http://localhost:8002/": "gunicorn",
}

async def fetch(client, url, url_path):
    start_time = time.perf_counter()  # Start timing
    response = await client.get(url + url_path, timeout=20.0)
    end_time = time.perf_counter()  # End timing
    return {
        "url": url,
        "status": response.status_code,
        "response_time": end_time - start_time,  # Response time in seconds
        "data": response.json(),
    }

async def main(url_path):
    async with httpx.AsyncClient() as client:
        tasks = [fetch(client, url, url_path) for url in API_URLS for _ in range(N)]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    repeat_time = 5
    for url_path in url_paths:
        count = defaultdict(list)
        print(url_path)
        print('----------')
        for _ in range(repeat_time):
            for result in asyncio.run(main(url_path)):
                count[API_URLS_DICT[result['url']]].append(result['response_time'])
        for server, times in count.items():
            times = np.array(times)
            print(f"{server}: Mean={times.mean():.4f}s, Std={times.std():.4f}s")
        print()
```
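The server-side `app.py` is not shown in the issue. For context, here is a minimal sketch of what the three endpoints might look like; the endpoint names are taken from the client's `url_paths`, but the matrix size and the exact response shape are assumptions, not the author's actual code:

```python
# Hypothetical app.py matching the client's /numpy, /torch, /torch_no_grad paths.
import time

import numpy as np
from flask import Flask, jsonify

try:
    import torch  # torch endpoints degrade gracefully if torch is absent
except ImportError:
    torch = None

app = Flask(__name__)
SIZE = 512  # assumed matrix size; the issue does not state the actual one


@app.route("/numpy")
def numpy_matmul():
    a = np.random.rand(SIZE, SIZE)
    b = np.random.rand(SIZE, SIZE)
    start = time.perf_counter()
    _ = a @ b
    return jsonify({"backend": "numpy", "elapsed": time.perf_counter() - start})


@app.route("/torch")
def torch_matmul():
    if torch is None:
        return jsonify({"error": "torch not installed"}), 501
    a = torch.rand(SIZE, SIZE)
    b = torch.rand(SIZE, SIZE)
    start = time.perf_counter()
    _ = a @ b
    return jsonify({"backend": "torch", "elapsed": time.perf_counter() - start})


@app.route("/torch_no_grad")
def torch_no_grad_matmul():
    if torch is None:
        return jsonify({"error": "torch not installed"}), 501
    a = torch.rand(SIZE, SIZE)
    b = torch.rand(SIZE, SIZE)
    start = time.perf_counter()
    with torch.no_grad():  # disables autograd tracking during the matmul
        _ = a @ b
    return jsonify({"backend": "torch_no_grad", "elapsed": time.perf_counter() - start})
```

The same `app:app` module would be served by both `waitress-serve` and `gunicorn` to make the comparison fair.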
yothinsaengs changed the title from "Torch with Gunicorn+Flask api performance issue" to "Torch with Gunicorn + Flask API performance issue on Docker" on Feb 18, 2025.
How did you launch the test targets? Specifically, I am inquiring about the command lines containing the localhost:8001 (resp localhost:8002) listen address. I am assuming you are testing against Gunicorn 23.0 on Python 3.11, correct?
The Python version is 3.10. Here is the Dockerfile:
```dockerfile
# Use official Python image
FROM python:3.10

# Set the working directory
WORKDIR /app

# Copy the application files
COPY app.py requirements.txt ./

# Install dependencies
RUN pip install -r requirements.txt

# Install curl for health check
RUN apt-get update && apt-get install -y curl

# Expose port 8002
EXPOSE 8002

# Run the app with Gunicorn (use default worker count)
CMD ["gunicorn", "-b", "0.0.0.0:8002", "app:app"]
```
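One thing worth noting about this `CMD`: with no `--workers`/`--threads` option, Gunicorn starts a single synchronous worker, so every request is handled sequentially in one process, while Waitress serves requests from a thread pool by default. As an experiment (these flags and values are illustrative suggestions, not a confirmed fix for the torch slowdown), the launch command could be varied to change concurrency and to cap the OpenMP/BLAS thread count that torch's CPU matmul uses:

```shell
# More workers, plus threads per worker (values are illustrative):
gunicorn -b 0.0.0.0:8002 --workers 2 --threads 4 app:app

# Or cap OpenMP/BLAS threading inside the container, which affects
# torch's CPU matrix multiplication:
OMP_NUM_THREADS=4 gunicorn -b 0.0.0.0:8002 app:app
```

Comparing these variants against the Waitress numbers would help isolate whether the gap comes from Gunicorn's process model interacting with torch's internal threading.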
Note: there is no performance difference with or without the health check.
To restate the key observation: with NumPy the response times are similar between the two servers, but when I calculate the same operation with torch (both with `torch.no_grad` enabled and disabled), there is a huge difference in response time between Gunicorn and Waitress.

[Screenshots in the original issue: Numpy API, Torch API, and Torch_no_grad API response times]