Skip to content

Conversation

casperisfine
Copy link

@casperisfine casperisfine commented Nov 8, 2024

Basically a port of ruby/json#678

Rather than to allocate the container and push elements one by one, we accumulate them on a stack and then use the faster batch APIs to directly create the final container.

The original action stack remains, as we need to keep track of what we're currently parsing and how big it is.

But for the recursive cases, we no longer need to create a child stack.

@casperisfine casperisfine force-pushed the rvalue-stack-bulk-create branch 5 times, most recently from b13d5c4 to 2ffc3c9 Compare November 8, 2024 15:24
Basically a port of ruby/json#678

Rather than to allocate the container and push elements one by one,
we accumulate them on a stack and then use the faster batch APIs
to directly create the final container.

The original action stack remains, as we need to keep track of what
we're currently parsing and how big it is.

But for the recursive cases, we no longer need to create a child stack.
@casperisfine casperisfine force-pushed the rvalue-stack-bulk-create branch from 2ffc3c9 to ec22a5d Compare November 8, 2024 16:06
@casperisfine
Copy link
Author

Using ruby/json benchmarks.

Before

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   632.000 i/100ms
        msgpack.pool     1.155k i/100ms
Calculating -------------------------------------
                json      6.398k (± 1.0%) i/s  (156.30 μs/i) -     32.232k in   5.038193s
        msgpack.pool     11.546k (± 0.4%) i/s   (86.61 μs/i) -     57.750k in   5.002028s

Comparison:
                json:     6398.1 i/s
        msgpack.pool:    11545.5 i/s - 1.80x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    51.000 i/100ms
        msgpack.pool    89.000 i/100ms
Calculating -------------------------------------
                json    519.791 (± 0.6%) i/s    (1.92 ms/i) -      2.601k in   5.004149s
        msgpack.pool    890.718 (± 0.6%) i/s    (1.12 ms/i) -      4.539k in   5.096053s

Comparison:
                json:      519.8 i/s
        msgpack.pool:      890.7 i/s - 1.71x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    29.000 i/100ms
        msgpack.pool    50.000 i/100ms
Calculating -------------------------------------
                json    300.520 (± 1.0%) i/s    (3.33 ms/i) -      1.508k in   5.018400s
        msgpack.pool    500.175 (± 0.4%) i/s    (2.00 ms/i) -      2.550k in   5.098318s

Comparison:
                json:      300.5 i/s
        msgpack.pool:      500.2 i/s - 1.66x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     3.000 i/100ms
        msgpack.pool    31.000 i/100ms
Calculating -------------------------------------
                json     32.527 (± 0.0%) i/s   (30.74 ms/i) -    165.000 in   5.073787s
        msgpack.pool    301.562 (± 0.7%) i/s    (3.32 ms/i) -      1.519k in   5.037237s

Comparison:
                json:       32.5 i/s
        msgpack.pool:      301.6 i/s - 9.27x  faster

After:

== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json   634.000 i/100ms
        msgpack.pool     1.191k i/100ms
Calculating -------------------------------------
                json      6.404k (± 0.5%) i/s  (156.16 μs/i) -     32.334k in   5.049357s
        msgpack.pool     11.953k (± 0.7%) i/s   (83.66 μs/i) -     60.741k in   5.081737s

Comparison:
                json:     6403.8 i/s
        msgpack.pool:    11953.5 i/s - 1.87x  faster


== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    52.000 i/100ms
        msgpack.pool    95.000 i/100ms
Calculating -------------------------------------
                json    521.096 (± 0.8%) i/s    (1.92 ms/i) -      2.652k in   5.089552s
        msgpack.pool    949.552 (± 0.8%) i/s    (1.05 ms/i) -      4.750k in   5.002717s

Comparison:
                json:      521.1 i/s
        msgpack.pool:      949.6 i/s - 1.82x  faster


== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json    30.000 i/100ms
        msgpack.pool    50.000 i/100ms
Calculating -------------------------------------
                json    300.999 (± 0.3%) i/s    (3.32 ms/i) -      1.530k in   5.083143s
        msgpack.pool    521.542 (± 0.6%) i/s    (1.92 ms/i) -      2.650k in   5.081259s

Comparison:
                json:      301.0 i/s
        msgpack.pool:      521.5 i/s - 1.73x  faster


== Parsing float parsing (2251051 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                json     3.000 i/100ms
        msgpack.pool    38.000 i/100ms
Calculating -------------------------------------
                json     32.149 (± 0.0%) i/s   (31.11 ms/i) -    162.000 in   5.039300s
        msgpack.pool    363.761 (± 1.4%) i/s    (2.75 ms/i) -      1.824k in   5.015290s

Comparison:
                json:       32.1 i/s
        msgpack.pool:      363.8 i/s - 11.31x  faster

So it helps a bit, but really not as much. The reason is likely because batch APIs aren't that much faster if you already allocate the containers with the right size like we were doing, wheras with json we weren't so the saving was much bigger.

The one optimization we really ought to port though, is the hash key cache, as profiling shows that's where we spent the most time by far.

byroot added a commit that referenced this pull request Nov 11, 2024
Extracted from: #370

As initially implemented, we allocate a new stack whenever we
encounter a child stack. This means acquiring a whole 4kiB chunk
from the arena, which generally is overkill.

Instead we can introduce a new type of stack item, that behave just
like a Hash or Array.
@byroot
Copy link
Member

byroot commented Nov 11, 2024

Closing in favor of #370.

I don't think the batch API are worth it here, but it's worth no longer allocating a child stack.

@byroot byroot closed this Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants