Investigate strawberry performance using async + PyPy #104
Thank you @cirospaciari, I should have raised a separate issue, my bad! Here's how to reproduce it:

```python
from socketify import App
import strawberry
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

# this is the GQL helper as defined in the socketify docs
from graphiql import graphiql_from


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User | None:
        return User(name="Hello")


def run(app: App):
    app.get("/", lambda res, req: res.end(get_graphiql_html()))
    app.post("/", graphiql_from(Query))
    # you can also pass a Mutation as second parameter
    # app.post("/", graphiql_from(Query, Mutation))

    app.listen(
        8000,
        lambda config: print(f"Listening on port http://localhost:{config.port} now\n"),
    )
    app.run()


run(App())
```

If you remove that, I have never experienced this perf degradation with FastAPI + strawberry, so I doubt it's strawberry itself.

No problem, thanks for the feedback; this is really helpful to me and to the project.

Unfortunately, that doesn't make a difference.

Both CPython and PyPy have weird fluctuations here. Without async, both are way faster! The process is always async for socketify; the code, BTW:

```python
def graphiql_from(Query, Mutation=None):
    if Mutation:
        schema = strawberry.Schema(query=Query, mutation=Mutation)
    else:
        schema = strawberry.Schema(Query)

    async def post(res, req):
        # we can pass whatever we want to context: query, headers or params, cookies etc
        context_value = req.preserve()

        # get all incoming data and parse it as json
        body = await res.get_json()

        query = body["query"]
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )

    return post
```
In the socketify code path, get_json and req.preserve can be optimized (and we already have issues + tasks in progress for that), but outside of that it is only strawberry code that runs internally; I will need time to investigate strawberry's source code.

Maybe this is related:

Timing only the `schema.execute` part:

```python
start_time = time.monotonic()
data = await schema.execute(
    query,
    variables,
    context_value,
    root_value,
    operation_name,
)
end_time = time.monotonic()
print(timedelta(seconds=end_time - start_time))
```
Using uvloop instead of asyncio doesn't make much difference. In this case, just put this at the top:

```python
import uvloop

uvloop.install()
```

If it is something with our async processing, this may or may not be related:

strawberry itself, when sync, is faster on PyPy, with 3.7k req/s vs 1k req/s on CPython.

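For reference, a minimal sketch of what such a sync comparison could look like; this is my assumption of the setup, not the exact benchmark code from this thread. It uses a synchronous resolver plus strawberry's `Schema.execute_sync`, served from a plain (non-async) socketify handler with a hardcoded query and the request body ignored:

```python
import json

import strawberry
from socketify import App


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    def user(self) -> User:  # sync resolver, so execute_sync can run it
        return User(name="Hello")


schema = strawberry.Schema(Query)


def run(app: App):
    # plain sync handler: no await, no futures, just execute_sync per request
    def graphql_sync(res, req):
        result = schema.execute_sync("{ user { name } }")
        res.end(json.dumps({"data": result.data}))

    app.post("/", graphql_sync)
    app.listen(8000, lambda config: print(f"Listening on http://localhost:{config.port}"))
    app.run()


run(App())
```
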
@wedobetter I will continue this next week, I have a lot of work to do today, but maybe my comments shed some light here: when under stress and after warm-up, PyPy gets faster than CPython.

I had never tried FastAPI on PyPy, and I have just tried PyPy + FastAPI (async) + strawberry; with the same code I am getting ~15ms responses.

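As a point of reference, a minimal sketch of what that FastAPI + strawberry setup presumably looks like (my assumption; the thread does not include the actual FastAPI code), using strawberry's FastAPI integration (`strawberry-graphql[fastapi]`):

```python
import strawberry
from fastapi import FastAPI
from strawberry.fastapi import GraphQLRouter


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self) -> User:
        return User(name="Hello")


schema = strawberry.Schema(Query)

app = FastAPI()
# GraphQL endpoint lives under /graphql; run with e.g. `uvicorn main:app`
# (assuming this file is named main.py)
app.include_router(GraphQLRouter(schema), prefix="/graphql")
```
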
Maybe related: graphql-python/graphql-core#190

@cirospaciari if I decide to go sync, does your server implementation act like a Go webserver? I.e., does it handle each request in a system thread (concurrently), or does it block on each request?

Python does not have true threads: they block on CPU-bound work and are only viable when doing IO. socketify does not use threads for IO; we encourage async, but you can do it with a simple middleware or helper, take a look at Python's ThreadPoolExecutor. The socketify way is to go async only when you need async (IO); you can always use res.run_async(coroutine), which will be faster than a ThreadPoolExecutor for IO. Python 3.12 or later will start trying to make the GIL optional, and then threads will become viable.

In practice, just run some benchmarks on your workload. strawberry is not the best performer on PyPy at the moment and needs time to warm up; short tests will not show the true performance, as mentioned earlier in this issue. In my tests, after warming up, I can get PyPy async faster than CPython sync with strawberry. The JIT just takes its time: run wrk or some stress tool to warm up your application, and it may take more than one run to warm up.

In general I will keep this issue open because I want to investigate why strawberry performs this way and offer a better experience or alternative. Remember that for socketify, using strawberry will ALWAYS be async (see the helper implementation), so the problem is something about async in the strawberry implementation when running on the socketify loop. Also, you need to check whether strawberry handles a thread pool executor for its non-async methods for you; everything I said about the middleware/helper applies only to socketify, not strawberry.

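A minimal sketch of that ThreadPoolExecutor approach (my own illustration built on the handler signature and `res.cork_end` call shown elsewhere in this thread, not socketify's built-in middleware): off-load a blocking call to a pool so the shared event loop keeps serving other requests.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

from socketify import App

# shared pool for blocking (sync) work; size it for your workload
executor = ThreadPoolExecutor(max_workers=8)


def blocking_work():
    # stand-in for blocking code, e.g. a sync database driver call
    time.sleep(0.01)
    return {"data": {"user": {"name": "Hello"}}}


def run(app: App):
    async def handler(res, req):
        loop = asyncio.get_running_loop()
        # run the blocking call in a worker thread and await its result
        result = await loop.run_in_executor(executor, blocking_work)
        res.cork_end(result)

    app.post("/", handler)
    app.listen(8000, lambda config: print(f"Listening on http://localhost:{config.port}"))
    app.run()


run(App())
```

As noted above, for IO-bound work `res.run_async(coroutine)` is the encouraged path and should beat the thread pool; the pool is mainly for code that simply cannot be awaited.
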
Right, thanks for your reply. As you said, I need to experiment and see what works best. By the way, I have dropped PyPy for now because I can't build gRPC libs with it, an old, annoying problem. Thank you again!

The HPy project is bringing compatibility between Cython and PyPy, so this may solve itself this year or next. I will not recommend PyPy for now if you depend on Cython libraries, unless your benchmarks say it is worth it. socketify + CPython is faster than japronto at this stage and will be even faster by the end of this year. I hope I can solve the issues you have with PyPy when I have some time.

Some numbers. Under pressure, aka:

```
wrk --latency -d 5 -c 4096 --timeout 8 -t 8 http://localhost:8000 -s post.lua
```

post.lua:

```lua
wrk.method = "POST"
wrk.body = "{\"query\":\"query MyQuery {\\n user {\\n name\\n }\\n}\",\"operationName\":\"MyQuery\"}"
wrk.headers["Content-Type"] = "application/json"
```

The test app:

```python
from socketify import App
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

import strawberry
import strawberry.utils.graphiql
import time
from datetime import timedelta


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User:
        return User(name="Hello")


def run(app: App):
    schema = strawberry.Schema(Query)
    router = app.router()

    @router.get("/")
    def home(res, req):
        res.end(get_graphiql_html())

    @router.post("/")
    async def post(res, req):
        # we can pass whatever we want to context: query, headers or params, cookies etc
        context_value = req.preserve()

        # get all incoming data and parse it as json
        body = await res.get_json()

        query = body["query"]
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        start_time = time.monotonic()
        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )
        end_time = time.monotonic()
        print(timedelta(seconds=end_time - start_time))

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )
```

Run with:

```
pypy3 -m socketify strawberry_test:run
```

Results:

With no pressure, same code:

```
0:00:01.558609
0:00:01.405271
0:00:01.529131
0:00:01.549445
0:00:01.733283
0:00:00.119836
0:00:01.210686
0:00:01.441472
0:00:01.571545
```

With no warm-up and no pressure, strawberry does not perform well (1.5s); with warm-up and pressure we got more performance, around 35ms. This is with print for logging, so it is a bad way to measure; it is probably way lower if you remove the print statements. This is only measuring the `schema.execute` call.

Traced down to `from graphql import execute as original_execute`: the await call that is taking too long in this case is this `execute`.

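One way to confirm where the time goes (a hypothetical timing shim of mine, not code from this thread): wrap graphql-core's `execute` so the synchronous call and the awaited result are timed separately, and await this shim in place of calling `execute` directly while experimenting.

```python
import inspect
import time

from graphql import execute as original_execute


async def timed_execute(*args, **kwargs):
    call_start = time.monotonic()
    result = original_execute(*args, **kwargs)
    call_time = time.monotonic() - call_start  # the synchronous part of execute

    if inspect.isawaitable(result):
        await_start = time.monotonic()
        result = await result  # with async resolvers, the latency shows up here
        print(f"execute call: {call_time:.6f}s, awaiting result: "
              f"{time.monotonic() - await_start:.6f}s")
    else:
        print(f"execute call (sync result): {call_time:.6f}s")
    return result
```
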
After a quick look: when using futures (aka get_data/get_json etc.) graphql is impacted; when not using them, graphql is not affected. This is really odd behavior, and using futures or asyncio.Queue gives the same result. get_data/get_json itself performs really well, and the waiting time is all on the awaited `execute`.

The new helper avoids using get_data/get_json and is 3x faster when not under pressure; only when not under pressure can we feel the latency. This might be something in socketify's loop.py, and #102 can be related or not. I need more investigation, so I will take some time to look at it this weekend. It may also be something in asyncio itself that affects it down the road.

Today socketify runs the asyncio loop + the libuv loop on the same thread. The second hypothesis is that asyncio is calling libuv's run_once too much and getting blocked by the libuv loop itself.

New helper:

The second hypothesis was the right answer. The new helper will have a little less overhead, but in practice it will not make much of a difference.

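To make that hypothesis concrete, here is a small self-contained illustration (my own sketch with made-up names, not socketify's actual loop code): if every awaited future is only resolved after a blocking "run once" step of another loop sharing the thread, each await picks up roughly one step's worth of latency.

```python
import asyncio
import time


def blocking_native_step():
    # stand-in for a libuv uv_run(UV_RUN_ONCE)-style call that can block the shared thread
    time.sleep(0.005)


async def resolve_from_native_side(fut: asyncio.Future):
    # the data only becomes available after the blocking step has run,
    # so the coroutine awaiting `fut` wakes up a full step later
    blocking_native_step()
    fut.set_result(b'{"query": "..."}')


async def main():
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    for _ in range(10):
        fut = loop.create_future()
        loop.create_task(resolve_from_native_side(fut))
        await fut  # each await pays roughly one blocking step of latency
    print(f"10 awaits took {time.monotonic() - start:.3f}s")  # ~0.05s instead of ~0s


asyncio.run(main())
```
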
Closed in https://pypi.org/project/socketify/0.0.17/. Latency is now 3 to 6ms.

@wedobetter related some weird cases we need to investigate, not only async but also sync, and also on CPython and with other popular libraries; for this issue, investigating strawberry async + PyPy is the priority.
In recent raw async coroutine benchmarks, PyPy is faster than vibora and, on CPython, faster than Japronto, with really good results. The last version removed a lot of overhead from async, so this is probably some unusual use of async by the library itself.
Recent TFB preliminary tests show that version v0.0.16 (the same one as in the strawberry problem) is really fast in ASGI and async mode on both CPython and PyPy, so raw async is probably not the problem:
https://www.techempower.com/benchmarks/#section=test&runid=328ef78f-93c0-4e5d-a5c1-69b5ceb95064&test=plaintext&l=hra0hr-35r
Screenshot provided by @wedobetter
PyPy can be slower than CPython when accessing C extensions (Cython is a popular one; this will change with HPy), so maybe this is the case here.
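A quick and admittedly rough way to check that guess (my own sketch; the module names are just the ones mentioned in this thread) is to list any compiled extension files shipped inside the packages on the hot path; pure-Python packages will print nothing:

```python
import pathlib

import graphql
import socketify
import strawberry

for mod in (graphql, strawberry, socketify):
    # compiled extensions end in .so (Linux/macOS) or .pyd (Windows)
    pkg_dir = pathlib.Path(mod.__file__).parent
    compiled = [p.name for p in pkg_dir.rglob("*") if p.suffix in (".so", ".pyd")]
    print(mod.__name__, "->", compiled or "pure Python")
```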