
feat(electric): Proposed speedup of Sent Rows Graph popping method #1389

Merged
@msfstef merged 6 commits into main from msfstef/temp-test-improvement on Jun 25, 2024

Conversation


@msfstef (Contributor) commented on Jun 19, 2024

Addresses VAX-1977

While running Linearlite multi-project, I noticed that unsubscribes took a very long time, and that the `Graph.delete_vertices` operation is by far what takes the longest to run, scaling poorly.

It doesn't make sense to me that deleting a set of vertices from a tree graph should take this long, especially when traversing the whole graph is fast, so reconstructing it should not be too much trouble (?). My conjecture is that the `delete_vertices` implementation, which interacts with a map quite a lot, is simply suboptimal.

The proposed change is a minimal diff that does the same thing; with the same workload I'm observing a >100x improvement in speed, with much better scaling properties. Obviously, when you get this kind of speedup it casts doubt on the change's correctness.

Instead of deleting vertices, we keep track of all edges to be removed as well as all vertices to be removed. We then run `Graph.remove_edge` (which seems to have a fairly optimal implementation) for every edge to be popped, and instead of removing the vertices with `Graph.delete_vertices` we use `Graph.subgraph` with the diff of the graph's existing vertices minus the ones to be removed, which should generate a maximally connected subgraph containing the specified vertices.

Since the request-related edges have been removed, along with the disconnected vertices, the subgraph should be (?) the same as what we were producing before. I suspect it's more efficient because the implementation of `Graph.subgraph` is aimed at recreating a graph from scratch by traversing the existing one, rather than modifying the existing one in place. edit: I also suspect that once a vertex whose whole subtree is going to be removed is marked as removed, constructing a subgraph ignores the entire subtree, whereas `delete_vertices` will try to clean up a subtree that we're going to discard anyway (?)

Even if there is an issue with the correctness of this approach, I believe we should be able to traverse the graph from the root and recreate it without the specified edges and vertices much faster than with `Graph.delete_vertices`, even if that requires a more custom implementation.
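
To make the diff concrete, here's a minimal sketch of the two approaches, assuming libgraph's `Graph` API (`delete_vertices/2`, `delete_edge/3`, `subgraph/2` — the edge removal is written here as libgraph's `delete_edge/3`); the function and variable names are hypothetical, not the actual `pop_by_request_ids` implementation:

```elixir
# Hypothetical sketch, not the actual implementation.

# Old approach: delete the vertices directly; the library cleans up each
# vertex's edges in its internal maps one at a time.
old_pop = fn graph, vertices_to_remove ->
  Graph.delete_vertices(graph, vertices_to_remove)
end

# New approach: drop the request-related edges first, then rebuild the
# graph as a subgraph over only the vertices we want to keep.
new_pop = fn graph, vertices_to_remove, edges_to_remove ->
  graph =
    Enum.reduce(edges_to_remove, graph, fn {from, to}, g ->
      Graph.delete_edge(g, from, to)
    end)

  # Diff of the graph's existing vertices minus the ones to be removed.
  kept_vertices = Graph.vertices(graph) -- vertices_to_remove
  Graph.subgraph(graph, kept_vertices)
end
```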

Tested with Linearlite, varying the number of issues and comments, split over 3 projects/shapes. The unsubscribe `Shapes.SentRowsGraph.pop_by_request_ids` call takes:

For 3k rows / shape (single shape):

- old: ~1200ms
- new: ~30ms

For 10k rows / shape (single shape):

- old: ~11000ms
- new: ~50ms

For 20k rows / shape (multi-shape):

Old:
- sub 1 shape, unsub 1 shape: ~46000ms
- sub 2 shapes, unsub 1 shape: ~180000ms
- (didn't bother with 3 shapes)

New:
- sub 1 shape, unsub 1 shape: ~100ms
- sub 2 shapes, unsub 1 shape: ~300ms
- sub 3 shapes, unsub 1 shape: ~400ms

@msfstef changed the title from "proposed improvement" to "feat(electric): Proposed speedup of Sent Rows Graph popping method" on Jun 20, 2024
@msfstef marked this pull request as ready for review on June 20, 2024 15:31

@icehaunter (Contributor) left a comment


Great find!

@msfstef merged commit 5e16611 into main on Jun 25, 2024
7 checks passed
@msfstef deleted the msfstef/temp-test-improvement branch on June 25, 2024 09:59
msfstef added a commit that referenced this pull request Jun 26, 2024
Addresses
[VAX-1985](https://linear.app/electric-sql/issue/VAX-1985/large-unsubscribes-lead-client-with-wa-sqlite-driver-to-a-stack)

Apparently the method we were using, where we chained `OR` and `AND`
clauses in the `WHERE` clause, led to:
1. SQLite call stack overflow (at least with `wa-sqlite`) at around 6k
`OR` clauses, far below the max parameter limit
2. Sub-optimal performance in both SQLite and PGlite

I've converted the batching to use `IN` statements for both single-column queries and multi-column ones, such as those with composite primary keys. The performance is significantly improved, and there is room for further improvement (a sketch of the old and new query shapes follows the numbers below):

#### Old Batching
##### Shadow Table 30k deletes
- ~4700ms pglite
- ~500ms wa-sqlite (arbitrary small batching to avoid overflow)

##### Comment 9k deletes
- ~700ms pglite
- ~2000ms wa-sqlite (max 6k batches)

#### New Batching
##### Shadow Table 30k deletes
- ~100ms pglite
- ~450ms wa-sqlite

##### Comment 9k deletes
- ~40ms pglite
- ~700ms wa-sqlite
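
For illustration, here is a hypothetical sketch of the two generated query shapes for a 3-column composite key. The table and column names are made up, the actual client code is TypeScript, and Elixir is used here only to keep the examples in one language:

```elixir
# Hypothetical sketch of the generated SQL; names are illustrative only.
pks = [{"main", "comment", "1"}, {"main", "comment", "2"}]

# Old batching: one parenthesised AND-group per row, chained with OR.
# Deep OR/AND nesting is what overflowed wa-sqlite's call stack at ~6k clauses.
or_clauses =
  Enum.map_join(pks, " OR ", fn _ ->
    "(namespace = ? AND tablename = ? AND pk = ?)"
  end)

old_sql = "DELETE FROM shadow_table WHERE #{or_clauses}"

# New batching: a single row-value IN list, which both SQLite (>= 3.15)
# and Postgres support; it parses flat instead of as a deep expression tree.
in_list = Enum.map_join(pks, ", ", fn _ -> "(?, ?, ?)" end)
new_sql = "DELETE FROM shadow_table WHERE (namespace, tablename, pk) IN (#{in_list})"
```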


While the shadow table deletes take a similar time on wa-sqlite (~500ms), the shadow table uses a composite key of 3 string columns, and there's room for significant optimization there: concatenating the parameters and deleting based on the concatenated PK columns reduced the execution time from ~500ms to ~50ms (sketched below). pglite also saw a very small improvement, though it was hard to measure.
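
A hypothetical sketch of that concatenation variant, again with made-up table/column names and assuming a separator character that cannot appear in the key values:

```elixir
# Hypothetical sketch; assumes '|' never occurs in the key values.
pks = [{"main", "comment", "1"}, {"main", "comment", "2"}]

# Collapse the 3-column composite key into one string on both sides of
# the comparison, so the IN list holds flat single values.
placeholders = Enum.map_join(pks, ", ", fn _ -> "?" end)

concat_sql =
  "DELETE FROM shadow_table " <>
    "WHERE namespace || '|' || tablename || '|' || pk IN (#{placeholders})"

# The bound parameters are concatenated the same way client-side.
params = Enum.map(pks, fn {ns, table, pk} -> "#{ns}|#{table}|#{pk}" end)
```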

I think it's possible to avoid having a composite primary key for the shadow table altogether: since we know it's 3 string columns, they can be collapsed into one if that proves to be a significant optimization. That is outside the scope of this fix, though; I'm just raising it here as a potential optimization.


NOTE: this improvement is the result of the same investigation that yielded #1389; the two together make Linearlite usable with >2k issues, whereas otherwise we run into timeouts and call stack overflows.