From 10537e0dd02392a938794afeb7f631f83fa71e0a Mon Sep 17 00:00:00 2001
From: Tim Cifuentes Vargas <1415514+timcv@users.noreply.github.com>
Date: Wed, 6 May 2026 23:31:43 +0200
Subject: [PATCH 1/2] perf(elasticsearch-plugin): faster full reindex via
 refresh tuning, parallel bulks, batch fetch

Reduces full-reindex wallclock by adding four orthogonal optimisations to
the reindex path. Measured -47% median (391ms -> 206ms) on the existing
e2e fixture (35 docs); larger gains expected on production-scale
catalogues. All four are backwards-compatible at default settings.

S1 - refresh policy + reindex-only index settings
- New `reindexIndexSettings` option (default `refresh_interval: -1`,
  `number_of_replicas: 0`, `translog.durability: async`) is merged on top
  of `indexSettings` for the temporary reindex index only.
- Bulk operations during reindex now pass `refresh: false`. Once the
  reindex loop completes, `reindexRestoreSettings` (default
  `refresh_interval: 1s`, `number_of_replicas: 1`) is PUT on the temp
  index and a single explicit `_refresh` is issued before alias swap so
  search consumers see a warm index.
- Adds `putSettings` to `SearchClientAdapter` (impl in both ES and OS
  adapters).

A6 - parallel bulks
- `executeBulkOperationsByChunks` dispatches chunks via `Promise.all`
  with a concurrency window (`reindexBulkConcurrency`, default 4), but
  only when the caller is the reindex path (`refresh=false`). Delta paths
  remain sequential.

A7 - byte-budgeted bulk flush + larger default bulk size
- `reindexBulkOperationSizeLimit` default raised 3000 -> 5000.
- New `reindexBulkSizeBytes` option (default ~5 MB) tracks payload size
  as ops accumulate and triggers an early flush when crossed, keeping
  bulk requests under typical `http.max_content_length` even with heavy
  custom mappings.

S2 - product-level concurrency (opt-in)
- New `reindexConcurrency` option, default 1 (sequential, unchanged).
- When raised, reindex processes products in parallel windows, each
  worker with its own `MutableRequestContext` clone. Documented caveat:
  Vendure's TypeORM identity map shares relations like `channels` across
  products so callers should benchmark + run the e2e suite at the chosen
  value before rolling out.

S3 - chunk-level prefetch
- New private `loadProductChunkPrefetch` issues two queries per
  `reindexProductsChunkSize` worth of products (one for products +
  relations, one for variants + relations grouped by productId) instead
  of N+N queries inside `updateProductsOperationsOnly`. The per-product
  hot path accepts pre-fetched data via a new optional `prefetched`
  parameter; delta paths pass nothing and continue to load on demand.

Bench harness
- Adds `bench/perf/perf-reindex.test.ts` (separate vitest config so it
  doesn't pollute the e2e suite include glob; gated to its own
  directory).
- Records median/mean/min/max wallclock across `PERF_RUNS` reindexes plus
  a sorted+normalised NDJSON snapshot of the full alias contents under
  `bench/snapshots/