
perf(elasticsearch-plugin): faster full reindex via refresh tuning, parallel bulks, batch fetch#34

Closed
timcv wants to merge 2 commits into vendurehq:main from timcv:feat/elasticsearch-reindex-perf

Conversation


@timcv timcv commented May 6, 2026

Summary

Reduces full-reindex wall-clock time by adding four orthogonal, opt-in optimisations to the reindex path.

  • Synthetic bench (35-doc fixture): -47% median (391 ms → 206 ms)
  • Real-data bench (bov MariaDB, 51 593 docs): -43% (14 m 26 s → 8 m 14 s, 1.75× faster), snapshot bit-identical vs baseline

All four are backwards-compatible at default settings.

Changes

S1 — refresh policy + reindex-only index settings

  • New reindexIndexSettings option (default refresh_interval: -1, number_of_replicas: 0, translog.durability: async) merged on top of indexSettings for the temporary reindex index only.
  • Bulk operations during reindex now pass refresh: false. Once the loop completes, reindexRestoreSettings (default refresh_interval: 1s, number_of_replicas: 1) is PUT on the temp index and a single _refresh is issued before alias swap so search consumers see a warm index.
  • Adds putSettings to SearchClientAdapter (impl in both ES and OS adapters).
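
A sketch of how the reindex-only settings described above might look as option values. The setting names are taken from this PR's description; the exact shape of the plugin options object should be checked against the plugin's docs, and the values here are illustrative rather than the shipped defaults verbatim:

```typescript
// Merged on top of `indexSettings` for the temporary reindex index only.
const reindexIndexSettings = {
    // Disable periodic refreshes while bulk-writing the temp index.
    refresh_interval: -1,
    // No replicas to keep in sync during the rebuild.
    number_of_replicas: 0,
    // Async translog fsync is acceptable for a throwaway index that can
    // simply be rebuilt if the node dies mid-reindex.
    'translog.durability': 'async',
};

// PUT back on the temp index after the loop completes, before the alias swap.
const reindexRestoreSettings = {
    refresh_interval: '1s',
    number_of_replicas: 1,
};
```
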

A6 — parallel bulk dispatch

  • executeBulkOperationsByChunks runs chunks via Promise.all with a concurrency window (reindexBulkConcurrency, default 4), but only when the caller is the reindex path (refresh=false). Delta paths stay sequential to preserve ordering.
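
The windowed Promise.all pattern can be sketched as a small helper. The name runInWindows and its signature are illustrative, not the plugin's actual API; the real logic lives inside executeBulkOperationsByChunks:

```typescript
// Run `task` over `items` with at most `concurrency` tasks in flight at once,
// preserving result order. Each window completes fully before the next one
// starts, which bounds the number of simultaneous bulk requests.
async function runInWindows<T, R>(
    items: T[],
    concurrency: number,
    task: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = [];
    for (let i = 0; i < items.length; i += concurrency) {
        const window = items.slice(i, i + concurrency);
        results.push(...(await Promise.all(window.map(task))));
    }
    return results;
}
```

A windowed loop (rather than a rolling pool) trades a little throughput for simplicity, and makes it easy to keep the delta paths fully sequential by just calling the helper with a concurrency of 1.
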

A7 — byte-budgeted bulk flush + larger default bulk size

  • reindexBulkOperationSizeLimit default raised 3000 → 5000.
  • New reindexBulkSizeBytes option (default ≈ 5 MB) tracks payload size as ops accumulate and triggers an early flush when crossed, keeping bulk requests under typical http.max_content_length even with heavy custom mappings.
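
The dual flush rule (op-count limit OR byte budget, whichever is crossed first) can be sketched like this. The class and its fields are invented for illustration; only the two option names come from the PR:

```typescript
interface BulkBatcherOptions {
    sizeLimit: number; // max ops per bulk request, cf. reindexBulkOperationSizeLimit
    sizeBytes: number; // max payload bytes per bulk request, cf. reindexBulkSizeBytes
}

// Accumulates bulk operations and flushes early when either budget is hit,
// keeping each request under typical http.max_content_length.
class BulkBatcher {
    private ops: object[] = [];
    private bytes = 0;
    readonly flushed: object[][] = [];

    constructor(private readonly opts: BulkBatcherOptions) {}

    add(op: object): void {
        this.ops.push(op);
        // Approximate this op's NDJSON footprint (body + trailing newline).
        this.bytes += new TextEncoder().encode(JSON.stringify(op)).length + 1;
        if (this.ops.length >= this.opts.sizeLimit || this.bytes >= this.opts.sizeBytes) {
            this.flush();
        }
    }

    flush(): void {
        if (this.ops.length > 0) {
            this.flushed.push(this.ops);
            this.ops = [];
            this.bytes = 0;
        }
    }
}
```
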

S2 — product-level concurrency (opt-in)

  • New reindexConcurrency option, default 1 (sequential, unchanged behaviour).
  • When raised, reindex processes products in parallel windows, each worker with its own MutableRequestContext clone. Documented caveat: Vendure's TypeORM identity map shares relations like channels across products so users should benchmark + run the e2e suite at the chosen value before rolling out (a flaky enabled mismatch was reproduced at concurrency=8 against sqljs in the existing suite — defaults stay safe, the option is for production tuning).

S3 — chunk-level prefetch

  • New loadProductChunkPrefetch issues two queries per reindexProductsChunkSize of products (one for products + relations, one for variants + relations grouped by productId) instead of the prior N+N queries inside updateProductsOperationsOnly. The per-product hot path accepts pre-fetched data through a new optional prefetched parameter; delta paths pass nothing and continue to load on demand.
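
The grouping step of the prefetch can be sketched as follows: after the single variants query, one linear pass buckets the rows by productId so the per-product hot path gets its variants without further queries. Types are simplified stand-ins for the real entities:

```typescript
interface Variant {
    id: number;
    productId: number;
}

// Group a flat variant result set by productId in O(n), replacing the
// per-product variant query of the old N+N pattern with a Map lookup.
function groupVariantsByProduct(variants: Variant[]): Map<number, Variant[]> {
    const byProduct = new Map<number, Variant[]>();
    for (const v of variants) {
        const bucket = byProduct.get(v.productId);
        if (bucket) {
            bucket.push(v);
        } else {
            byProduct.set(v.productId, [v]);
        }
    }
    return byProduct;
}
```
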

Bench harness

  • bench/perf/perf-reindex.test.ts — separate vitest config so it doesn't pollute the e2e suite include glob; gated to bench/perf.
  • Records median/mean/min/max wallclock across PERF_RUNS reindexes plus a sorted+normalised NDJSON snapshot of the full alias contents under bench/snapshots/<label>.ndjson.
  • A second test in the same spec diffs the snapshot against bench/snapshots/baseline.ndjson and fails if any document body diverges — this is the regression gate that runs after every optimisation step.
  • bench/RESULTS.md documents the protocol, both the synthetic and the real-data results.
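
The wallclock statistics the harness records can be computed with a small helper along these lines (a sketch; the actual bench code also writes the NDJSON snapshot):

```typescript
// Median/mean/min/max over a set of reindex wallclock measurements (ms).
function stats(runsMs: number[]): { median: number; mean: number; min: number; max: number } {
    const sorted = [...runsMs].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Median of the two middle values for even-length runs.
    const median = sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
    const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
    return { median, mean, min: sorted[0], max: sorted[sorted.length - 1] };
}
```

Median is the headline number because a single slow run (GC pause, ES merge) skews the mean on small PERF_RUNS counts.
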

Real-data results (bov MariaDB / ES 7.17.18)

Dataset: 8 797 products / 111 386 variants → 51 593 indexed (variant × channel × language) docs.

| Configuration | Time | Δ vs baseline | Docs | Snapshot diff |
| --- | --- | --- | --- | --- |
| bov-baseline (`@vendure/elasticsearch-plugin@3.5.5` from npm, default options) | 866 s (14 m 26 s) | | 51 593 | |
| bov-optimized (S1+A6/A7+S2+S3, `reindexConcurrency: 8`, `reindexBulkConcurrency: 4`) | 495 s (8 m 14 s) | -43% (1.75×) | 51 593 | identical (0 bytes over 4 GB NDJSON) |

Why the gain is not the 5-10× the synthetic plan estimated:

  • Bov's customProductMappings are CPU-heavy and run per (product × channel × language) — Node's single-thread caps S2's gain.
  • Single-instance MariaDB serialises some of the 8 concurrent workers' queries.
  • ES 7.17 single-node, dev-tier resources.

Even so, −371 s on a typical Swedish e-commerce catalogue is substantial and scales linearly with catalogue size (expected to widen further at ≥5 languages or ≥3 channels).

Synthetic results (regression gate)

ES 7.17.18 single-node, 5 reindexes per branch, median:

| Step | Median ms | Δ vs baseline | e2e | Snapshot diff |
| --- | --- | --- | --- | --- |
| baseline | 391 | | 96/96 ✅ | |
| +S1 | 351 | -10% | 96/96 ✅ | identical ✅ |
| +A6/A7 | 350 | -10% | 96/96 ✅ | identical ✅ |
| +S2 (conc=8) | 206 | -47% | 96/96 ✅ ¹ | identical ✅ |
| +S3 | 208 | -47% | 96/96 ✅ | identical ✅ |

¹ With default reindexConcurrency: 1. A6/A7 and S3 individually are near-no-ops on a 35-doc fixture (one bulk chunk, two queries dominated by ES write); they are designed to scale on real catalogues — confirmed by the bov bench above.

Test plan

  • bun run lint (0 errors)
  • bun run build
  • bun run e2e (96/96), run 3× consecutively to verify default reindexConcurrency: 1 is not flaky
  • Bench harness reproduces median wallclock with low jitter
  • Snapshot diff matches baseline at every synthetic step
  • Real-data bench against the bov_ecom_prod catalogue (51 593 docs, MariaDB+ES 7.17): 1.75× faster, snapshot bit-identical vs baseline

🤖 Generated with Claude Code

…arallel bulks, batch fetch

Reduces full-reindex wallclock by adding four orthogonal optimisations to
the reindex path. Measured -47% median (391ms -> 206ms) on the existing
e2e fixture (35 docs); larger gains expected on production-scale catalogues.
All four are backwards-compatible at default settings.

S1 - refresh policy + reindex-only index settings
- New `reindexIndexSettings` option (default `refresh_interval: -1`,
  `number_of_replicas: 0`, `translog.durability: async`) is merged on top
  of `indexSettings` for the temporary reindex index only.
- Bulk operations during reindex now pass `refresh: false`. Once the
  reindex loop completes, `reindexRestoreSettings` (default `refresh_interval: 1s`,
  `number_of_replicas: 1`) is PUT on the temp index and a single explicit
  `_refresh` is issued before alias swap so search consumers see a warm index.
- Adds `putSettings` to `SearchClientAdapter` (impl in both ES and OS adapters).

A6 - parallel bulks
- `executeBulkOperationsByChunks` dispatches chunks via `Promise.all` with
  a concurrency window (`reindexBulkConcurrency`, default 4), but only
  when the caller is the reindex path (`refresh=false`). Delta paths
  remain sequential.

A7 - byte-budgeted bulk flush + larger default bulk size
- `reindexBulkOperationSizeLimit` default raised 3000 -> 5000.
- New `reindexBulkSizeBytes` option (default ~5 MB) tracks payload size as
  ops accumulate and triggers an early flush when crossed, keeping bulk
  requests under typical `http.max_content_length` even with heavy custom
  mappings.

S2 - product-level concurrency (opt-in)
- New `reindexConcurrency` option, default 1 (sequential, unchanged).
- When raised, reindex processes products in parallel windows, each worker
  with its own `MutableRequestContext` clone. Documented caveat: Vendure's
  TypeORM identity map shares relations like `channels` across products so
  callers should benchmark + run the e2e suite at the chosen value before
  rolling out.

S3 - chunk-level prefetch
- New private `loadProductChunkPrefetch` issues two queries per
  `reindexProductsChunkSize` worth of products (one for products + relations,
  one for variants + relations grouped by productId) instead of N+N queries
  inside `updateProductsOperationsOnly`. The per-product hot path accepts
  pre-fetched data via a new optional `prefetched` parameter; delta paths
  pass nothing and continue to load on demand.

Bench harness
- Adds `bench/perf/perf-reindex.test.ts` (separate vitest config so it
  doesn't pollute the e2e suite include glob; gated to its own directory).
- Records median/mean/min/max wallclock across `PERF_RUNS` reindexes plus
  a sorted+normalised NDJSON snapshot of the full alias contents under
  `bench/snapshots/<label>.ndjson`.
- A second test in the same spec diffs the snapshot against
  `bench/snapshots/baseline.ndjson` and fails if any document body
  diverges - this is the regression gate that runs after every
  optimisation step.
- `bench/RESULTS.md` documents the protocol, the synthetic numbers and
  the deferred bov-MariaDB real-data run.

Verification
- `bun run e2e` 96/96, run 3x to confirm S2 default (=1) is not flaky.
- Snapshot diff matches baseline at every step (S1, S1+A6/A7, +S2, +S3).
@vendure-ci-automation-bot
Contributor


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Run on a 8 797-product / 51 593-doc bov_ecom_prod catalogue against
ES 7.17.18 + MariaDB 11.3.2:
- baseline (@vendure/elasticsearch-plugin@3.5.5 from npm): 14 m 26 s
- optimized (S1+A6/A7+S2+S3, reindexConcurrency=8): 8 m 14 s
- speedup: 1.75x (-43%), -371 s
- snapshot diff vs baseline: identical (0 bytes over 4 GB NDJSON)

bench/RESULTS.md updated with the real-data table, methodology, and
notes on why the gain is 1.75x (not the 5-10x the synthetic plan
estimated): bov's heavy customProductMappings are CPU-bound and a
single-instance MariaDB serialises some of the parallel worker queries.
@timcv timcv closed this by deleting the head repository May 7, 2026
Author

timcv commented May 7, 2026

Hi, I opened this too early by mistake. Sorry for that.
