Investigate strawberry performance using async + PyPy #104
Thank you @cirospaciari, I should have raised a separate issue, my bad! Here's how to reproduce it:

```python
from socketify import App
import strawberry
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

# this is the GQL helper as defined in the socketify docs
from graphiql import graphiql_from


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User | None:
        return User(name="Hello")


def run(app: App):
    app.get("/", lambda res, req: res.end(get_graphiql_html()))
    app.post("/", graphiql_from(Query))
    # you can also pass a Mutation as second parameter
    # app.post("/", graphiql_from(Query, Mutation))

    app.listen(
        8000,
        lambda config: print(f"Listening on port http://localhost:{config.port} now\n"),
    )
    app.run()


run(App())
```

If you remove that, I have never experienced this perf degradation with FastAPI + strawberry, so I doubt it's strawberry itself.

No problem, thanks for the feedback; this is really helpful to me and to the project.

Unfortunately, that doesn't make a difference.

Both CPython and PyPy have weird fluctuations here. Without async, both are way faster! The process is always async for socketify; the code, BTW:

```python
def graphiql_from(Query, Mutation=None):
    if Mutation:
        schema = strawberry.Schema(query=Query, mutation=Mutation)
    else:
        schema = strawberry.Schema(Query)

    async def post(res, req):
        # we can pass whatever we want to context: query, headers or params, cookies etc
        context_value = req.preserve()

        # get all incoming data and parse it as json
        body = await res.get_json()

        query = body["query"]
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )

    return post
```
In the socketify code path, get_json and req.preserve can be optimized (and we already have issues + tasks in progress for that), but outside of that it is only strawberry code that runs internally; I will need time to investigate strawberry's source code.

Maybe this is related:

Timing only the `schema.execute` part:

```python
start_time = time.monotonic()
data = await schema.execute(
    query,
    variables,
    context_value,
    root_value,
    operation_name,
)
end_time = time.monotonic()
print(timedelta(seconds=end_time - start_time))
```
Using uvloop instead of asyncio doesn't make much difference. In this case, just put this at the top:

```python
import uvloop

uvloop.install()
```

If it is something with our async processing, this may or may not be related:

strawberry itself, when sync, is faster on PyPy, with 3.7k req/s vs 1k req/s on CPython.

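For reference, a minimal sketch of what such a sync comparison could look like; this is my assumption of the setup, not the exact benchmark code from this thread. It uses a synchronous resolver plus strawberry's `Schema.execute_sync`, served from a plain (non-async) socketify handler with a hardcoded query and the request body ignored:

```python
import json

import strawberry
from socketify import App


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    def user(self) -> User:  # sync resolver, so execute_sync can run it
        return User(name="Hello")


schema = strawberry.Schema(Query)


def run(app: App):
    # plain sync handler: no await, no futures, just execute_sync per request
    def graphql_sync(res, req):
        result = schema.execute_sync("{ user { name } }")
        res.end(json.dumps({"data": result.data}))

    app.post("/", graphql_sync)
    app.listen(8000, lambda config: print(f"Listening on http://localhost:{config.port}"))
    app.run()


run(App())
```
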
@wedobetter I will continue this next week, I have a lot of work to do today, but maybe my comments shed some light here: when under stress and after warm-up, PyPy gets faster than CPython.

I had never tried FastAPI on PyPy, and I have just tried PyPy + FastAPI (async) + strawberry; with the same code I am getting ~15ms responses.

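As a point of reference, a minimal sketch of what that FastAPI + strawberry setup presumably looks like (my assumption; the thread does not include the actual FastAPI code), using strawberry's FastAPI integration (`strawberry-graphql[fastapi]`):

```python
import strawberry
from fastapi import FastAPI
from strawberry.fastapi import GraphQLRouter


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self) -> User:
        return User(name="Hello")


schema = strawberry.Schema(Query)

app = FastAPI()
# GraphQL endpoint lives under /graphql; run with e.g. `uvicorn main:app`
# (assuming this file is named main.py)
app.include_router(GraphQLRouter(schema), prefix="/graphql")
```
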
Maybe related: graphql-python/graphql-core#190

@cirospaciari if I decide to go sync, does your server implementation act like a Go webserver? I.e., does it handle each request in a system thread (concurrently), or does it block on each request?

Python does not have true threads: they block on CPU-bound work and are only viable when doing IO. socketify does not use threads for IO; we encourage async, but you can do it with a simple middleware or helper, take a look at Python's ThreadPoolExecutor. The socketify way is to go async only when you need async (IO); you can always use res.run_async(coroutine), which will be faster than a ThreadPoolExecutor for IO. Python 3.12 or later will start trying to make the GIL optional, and then threads will become viable.

In practice, just run some benchmarks on your workload. strawberry is not the best performer on PyPy at the moment and needs time to warm up; short tests will not show the true performance, as mentioned earlier in this issue. In my tests, after warming up, I can get PyPy async faster than CPython sync with strawberry. The JIT just takes its time: run wrk or some stress tool to warm up your application, and it may take more than one run to warm up.

In general I will keep this issue open because I want to investigate why strawberry performs this way and offer a better experience or alternative. Remember that for socketify, using strawberry will ALWAYS be async (see the helper implementation), so the problem is something about async in the strawberry implementation when running on the socketify loop. Also, you need to check whether strawberry handles a thread pool executor for its non-async methods for you; everything I said about the middleware/helper applies only to socketify, not strawberry.

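A minimal sketch of that ThreadPoolExecutor approach (my own illustration built on the handler signature and `res.cork_end` call shown elsewhere in this thread, not socketify's built-in middleware): off-load a blocking call to a pool so the shared event loop keeps serving other requests.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

from socketify import App

# shared pool for blocking (sync) work; size it for your workload
executor = ThreadPoolExecutor(max_workers=8)


def blocking_work():
    # stand-in for blocking code, e.g. a sync database driver call
    time.sleep(0.01)
    return {"data": {"user": {"name": "Hello"}}}


def run(app: App):
    async def handler(res, req):
        loop = asyncio.get_running_loop()
        # run the blocking call in a worker thread and await its result
        result = await loop.run_in_executor(executor, blocking_work)
        res.cork_end(result)

    app.post("/", handler)
    app.listen(8000, lambda config: print(f"Listening on http://localhost:{config.port}"))
    app.run()


run(App())
```

As noted above, for IO-bound work `res.run_async(coroutine)` is the encouraged path and should beat the thread pool; the pool is mainly for code that simply cannot be awaited.
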
Right, thanks for your reply. As you said, I need to experiment and see what works best. By the way, I have dropped PyPy for now because I can't build gRPC libs with it, an old, annoying problem. Thank you again!

The HPy project is bringing compatibility between Cython and PyPy, so this may solve itself this year or next. I will not recommend PyPy for now if you depend on Cython libraries, unless your benchmarks say it is worth it. socketify + CPython is faster than japronto at this stage and will be even faster by the end of this year. I hope I can solve the issues you have with PyPy when I have some time.

Some numbers. Under pressure, aka:

```
wrk --latency -d 5 -c 4096 --timeout 8 -t 8 http://localhost:8000 -s post.lua
```

post.lua:

```lua
wrk.method = "POST"
wrk.body = "{\"query\":\"query MyQuery {\\n user {\\n name\\n }\\n}\",\"operationName\":\"MyQuery\"}"
wrk.headers["Content-Type"] = "application/json"
```

The test app:

```python
from socketify import App
from strawberry.types import Info
from strawberry.utils.graphiql import get_graphiql_html

import strawberry
import strawberry.utils.graphiql
import time
from datetime import timedelta


@strawberry.type
class User:
    name: str


@strawberry.type
class Query:
    @strawberry.field
    async def user(self, info: Info) -> User:
        return User(name="Hello")


def run(app: App):
    schema = strawberry.Schema(Query)
    router = app.router()

    @router.get("/")
    def home(res, req):
        res.end(get_graphiql_html())

    @router.post("/")
    async def post(res, req):
        # we can pass whatever we want to context: query, headers or params, cookies etc
        context_value = req.preserve()

        # get all incoming data and parse it as json
        body = await res.get_json()

        query = body["query"]
        variables = body.get("variables", None)
        root_value = body.get("root_value", None)
        operation_name = body.get("operation_name", None)

        start_time = time.monotonic()
        data = await schema.execute(
            query,
            variables,
            context_value,
            root_value,
            operation_name,
        )
        end_time = time.monotonic()
        print(timedelta(seconds=end_time - start_time))

        res.cork_end(
            {
                "data": (data.data),
                **({"errors": data.errors} if data.errors else {}),
                **({"extensions": data.extensions} if data.extensions else {}),
            }
        )
```

Run with:

```
pypy3 -m socketify strawberry_test:run
```

Results:

With no pressure, same code:

```
0:00:01.558609
0:00:01.405271
0:00:01.529131
0:00:01.549445
0:00:01.733283
0:00:00.119836
0:00:01.210686
0:00:01.441472
0:00:01.571545
```

With no warm-up and no pressure, strawberry does not perform well (1.5s); with warm-up and pressure we got more performance, around 35ms. This is with print for logging, so it is a bad way to measure; it is probably way lower if you remove the print statements. This is only measuring the `schema.execute` call.

Traced down to `from graphql import execute as original_execute`: the await call that is taking too long in this case is this `execute`.

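One way to confirm where the time goes (a hypothetical timing shim of mine, not code from this thread): wrap graphql-core's `execute` so the synchronous call and the awaited result are timed separately, and await this shim in place of calling `execute` directly while experimenting.

```python
import inspect
import time

from graphql import execute as original_execute


async def timed_execute(*args, **kwargs):
    call_start = time.monotonic()
    result = original_execute(*args, **kwargs)
    call_time = time.monotonic() - call_start  # the synchronous part of execute

    if inspect.isawaitable(result):
        await_start = time.monotonic()
        result = await result  # with async resolvers, the latency shows up here
        print(f"execute call: {call_time:.6f}s, awaiting result: "
              f"{time.monotonic() - await_start:.6f}s")
    else:
        print(f"execute call (sync result): {call_time:.6f}s")
    return result
```
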
After a quick look: when using futures (aka get_data/get_json etc.) graphql is impacted; when not using them, graphql is not affected. This is really odd behavior, and using futures or asyncio.Queue gives the same result. get_data/get_json itself performs really well, and the waiting time is all on the awaited `execute`.

The new helper avoids using get_data/get_json and is 3x faster when not under pressure; only when not under pressure can we feel the latency. This might be something in socketify's loop.py, and #102 can be related or not. I need more investigation, so I will take some time to look at it this weekend. It may also be something in asyncio itself that affects it down the road.

Today socketify runs the asyncio loop + the libuv loop on the same thread. The second hypothesis is that asyncio is calling libuv's run_once too much and getting blocked by the libuv loop itself.

New helper:

The second hypothesis was the right answer. The new helper will have a little less overhead, but in practice it will not make much of a difference.

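To make that hypothesis concrete, here is a small self-contained illustration (my own sketch with made-up names, not socketify's actual loop code): if every awaited future is only resolved after a blocking "run once" step of another loop sharing the thread, each await picks up roughly one step's worth of latency.

```python
import asyncio
import time


def blocking_native_step():
    # stand-in for a libuv uv_run(UV_RUN_ONCE)-style call that can block the shared thread
    time.sleep(0.005)


async def resolve_from_native_side(fut: asyncio.Future):
    # the data only becomes available after the blocking step has run,
    # so the coroutine awaiting `fut` wakes up a full step later
    blocking_native_step()
    fut.set_result(b'{"query": "..."}')


async def main():
    loop = asyncio.get_running_loop()
    start = time.monotonic()
    for _ in range(10):
        fut = loop.create_future()
        loop.create_task(resolve_from_native_side(fut))
        await fut  # each await pays roughly one blocking step of latency
    print(f"10 awaits took {time.monotonic() - start:.3f}s")  # ~0.05s instead of ~0s


asyncio.run(main())
```
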
Closed in https://pypi.org/project/socketify/0.0.17/. Latency is now 3 to 6ms.

@wedobetter related some weird cases we need to investigate, not only async but also sync, and also on CPython and with other popular libraries; for this issue, investigating strawberry async + PyPy is the priority.
In recent raw async coroutine benchmarks, PyPy is faster than vibora and, on CPython, faster than Japronto, with really good results. The last version removed a lot of overhead from async, so this is probably some unusual use of async by the library itself.
Recent TFB preliminary tests show that version v0.0.16 (the same one as in the strawberry problem) is really fast in ASGI and async mode on both CPython and PyPy, so raw async is probably not the problem:
https://www.techempower.com/benchmarks/#section=test&runid=328ef78f-93c0-4e5d-a5c1-69b5ceb95064&test=plaintext&l=hra0hr-35r
Screenshot provided by @wedobetter
PyPy can be slower than CPython when accessing C extensions (Cython is a popular one; this will change with HPy), so maybe this is the case here.
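A quick and admittedly rough way to check that guess (my own sketch; the module names are just the ones mentioned in this thread) is to list any compiled extension files shipped inside the packages on the hot path; pure-Python packages will print nothing:

```python
import pathlib

import graphql
import socketify
import strawberry

for mod in (graphql, strawberry, socketify):
    # compiled extensions end in .so (Linux/macOS) or .pyd (Windows)
    pkg_dir = pathlib.Path(mod.__file__).parent
    compiled = [p.name for p in pkg_dir.rglob("*") if p.suffix in (".so", ".pyd")]
    print(mod.__name__, "->", compiled or "pure Python")
```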