[WIP] Rewrite the "sync_local" query #56

Draft: wants to merge 2 commits into main
Conversation

@rkistner (Contributor) commented Feb 3, 2025

#40 fixed a performance issue in initial/bulk sync when there are many duplicate row_ids, but slightly decreased performance for the general case. This PR attempts to optimize it again, mostly by removing the second temp b-tree used in query execution.
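For reviewers who want to check this locally: temp b-tree usage shows up directly in SQLite's query plan output. This is a generic illustration against the test schema further down, not the actual sync_local query:

-- Hypothetical example: without a covering index, a plan like this typically
-- includes "USE TEMP B-TREE FOR GROUP BY" / "USE TEMP B-TREE FOR ORDER BY" rows.
EXPLAIN QUERY PLAN
SELECT row_type, row_id, max(op_id)
FROM ps_oplog
GROUP BY row_type, row_id
ORDER BY max(op_id) DESC;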

This does not make a massive difference in overall initial sync performance. On my machine, with 1M ops, the query time drops from around 5s to 3s, versus a total initial sync time of 60s. So it's not a big gain overall, but this is the slowest query that locks the database for writes and cannot be split into smaller subqueries, so any optimization here helps with app responsiveness.

There is another query form added in the comments, which can take the initial sync query time in the above case down to under 2s (with no temp b-tree at all), but it doesn't cater for incremental updates. It needs some stats tracking / heuristics to know when to use one query or the other, so I'm leaving that for later.

The temp b-trees used by this query could also be related to the RangeError: Maximum call stack size exceeded errors seen on iOS, as well as the disk I/O errors occasionally seen on Android, when SQLite is configured to use files for temporary storage. While those issues have other workarounds, any change that reduces temporary b-trees here could help.
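For background (not a change in this PR): whether those temporary b-trees go to files or memory is controlled by the temp_store pragma, which is the existing workaround alluded to above.

-- Keep SQLite temporary structures in memory instead of temp files.
PRAGMA temp_store = 2;  -- 0 = default (compile-time setting), 1 = file, 2 = memory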

TODO:

  • Regression testing
  • Test real-world performance

@rkistner (Contributor, Author) commented Feb 7, 2025

I did some performance benchmarks on web, using 1M rows with large ids, which forces SQLite to write the temp b-tree to disk.

Takeaways:

  1. The optimized query here makes a minor but still significant difference with native and OPFS.
  2. The optimized "sync from scratch" query gives a major performance boost, so it's worth investigating that further.
  3. OPFSCoopSyncVFS appears to scale similarly to native, just taking around 2x as long for any query.
  4. IDBBatchAtomicVFS scales much worse, and can take 10x as long as OPFSCoopSyncVFS with large datasets, when it works at all.
  5. With IDBBatchAtomicVFS, we cannot sync large datasets at all without PRAGMA temp_store = 2.
  6. With IDBBatchAtomicVFS, there is a lot of filesystem overhead. We can however use PRAGMA temp_store = 2 and PRAGMA cache_size = -200000 to give SQLite read performance similar to OPFS, since everything ends up in memory (see the sketch after this list).
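The pragma combination from takeaway 6, roughly as applied before the IDBBatchAtomicVFS runs below (the exact benchmark setup isn't included here, so treat this as a sketch):

PRAGMA temp_store = 2;        -- temp b-trees in memory rather than on the VFS
PRAGMA cache_size = -200000;  -- negative value = size in KiB, i.e. roughly 200 MB of page cache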

Linux, native:

  1. Original query: 2.01s
  2. Optimized query: 1.42s
  3. Optimized "sync from scratch" query: 0.254s

Web, IDBBatchAtomicVFS:

  1. Original query: fails with RangeError: offset is out of bounds (IDBBatchAtomicVFS.js:553) after around 10-15s.
  2. Optimized query: fails with RangeError: offset is out of bounds (IDBBatchAtomicVFS.js:553) after around 10-15s.
  3. Optimized "sync from scratch" query: 5.81s

Web, OPFSCoopSyncVFS:

  1. Original query: 4.38s
  2. Optimized query: 2.68s
  3. Optimized "sync from scratch" query: 0.502s

Web, IDBBatchAtomicVFS, PRAGMA temp_store = 2:

  1. Original query: 40s.
  2. Optimized query: 21s.
  3. Optimized "sync from scratch" query: 5.81s

Web, IDBBatchAtomicVFS, PRAGMA temp_store = 2, PRAGMA cache_size = -200000, pre-warmed cache:

  1. Original query: 4.43s.
  2. Optimized query: 3.30s.
  3. Optimized "sync from scratch" query: 0.727s

The test data:

BEGIN TRANSACTION;

-- 1M ops
WITH RECURSIVE generate_rows(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM generate_rows WHERE n < 1000000
)
INSERT INTO ps_oplog (bucket, op_id, row_type, row_id, key, data, hash)
SELECT 
    (n % 10), -- Generate 10 different buckets
    n,
    'thisisatable',
    'thisismyrowid' || n,
    'thisisarowkeykey_' || n,
    '{"n": ' || n || '}',
    (n * 17) % 1000000000 -- Some pseudo-random hash
    
FROM generate_rows;

-- 10 buckets
WITH RECURSIVE generate_rows(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM generate_rows WHERE n < 10
)
INSERT INTO ps_buckets (id, name)
SELECT 
    (n % 10),
    'bucket' || n
    
FROM generate_rows;

COMMIT;
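My timing harness isn't included above; for the native numbers, one simple way to reproduce them is the sqlite3 shell's built-in timer:

.timer on
SELECT count(*) FROM ps_oplog;  -- placeholder; substitute the sync_local query under test
-- the shell prints "Run Time: real ... user ... sys ..." after each statement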
