Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions packages/elasticsearch-plugin/bench/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
snapshots/
results/
128 changes: 128 additions & 0 deletions packages/elasticsearch-plugin/bench/RESULTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
# Reindex-bench resultat

## Real-data bench (bov MariaDB, 51 593 docs)

Dataset: bov_ecom_prod produktion (8 797 produkter, 111 386 varianter; ~51 593 indexerade
docs efter (variant × channel × language)-fan-out). MariaDB 11.3.2 + ES 7.17.18 +
Redis 7 i Docker, lokalt på Apple Silicon. 1 reindex per konfiguration (varje körning
är dyr: ~8-15 min). Dataset frusen mellan körningar.

| Konfiguration | Tid | Δ vs baseline | docs i index | snapshot diff |
|---|---|---|---|---|
| `bov-baseline` (`@vendure/elasticsearch-plugin@3.5.5` från npm, default options) | **866 s (14 m 26 s)** | — | 51 593 | — |
| `bov-optimized` (S1+A6/A7+S2+S3, `reindexConcurrency: 8`, `reindexBulkConcurrency: 4`) | **495 s (8 m 14 s)** | **-43 % (1.75×)** | 51 593 | **identisk (0 byte)** |

Byggda artefakter + skript under [`bench/`](.). Snapshot-NDJSON är 4 GB per
körning — uteslutna från git via `bench/.gitignore`, reproducerbara med
`scripts/snapshot-bov.mjs`.

### Varför inte 5-10× på bov?

- Bov har 8 797 produkter × ~6 docs/produkt ≈ 52k docs (inte 50k variants direkt).
Variant-fetch dominerar mindre på den volymen än antaget i plan.
- Bovs `customProductMappings` (`featuredAssets`, `facetValueName`, `featuredAsset`,
`productSchema` etc — se `bov-ecom-src/src/elastic-search-config.ts`) är tunga och
kör per (produkt × kanal × språk). CPU-bunden, parallelliseringen flaskhalsas på
Node single-thread.
- MariaDB single-instans + 8 concurrent workers → connection-pool serialiseras delvis.
S2 ger nominellt ~3-4× på CPU men effekten kapas av DB-kontention.
- ES 7.17 single-node + dev-tier-resurser. Med replicas och fler shards skalar S1+A6/A7
bättre.

Trots det: **−371 s (−43 %)** på en typisk svensk e-handelskatalog är substantiellt och
linjärt med produktionsstorlek (förväntad bättre vinst på ≥5 språk eller ≥3 kanaler).

## Synthetic e2e-bench (regression-gate)

Dataset: `e2e/fixtures/e2e-products-full.csv` (35 produkter, 1 kanal, 1 språk → 35 docs).
ES 7.17.18 single-node container. 5 reindex-körningar per branch, median.

| Steg | Branch | median ms | mean ms | min/max ms | Δ median vs baseline | e2e regression | snapshot diff vs baseline |
|---|---|---|---|---|---|---|---|
| 0 | `baseline` | **391** | 397 | 368 / 430 | — | 96/96 ✅ | — |
| 1 | `s1` | **351** | 311 | 205 / 419 | -10% | 96/96 ✅ | identisk ✅ |
| 2 | `s1-a6a7` | **350** | 315 | 204 / 412 | -10% | 96/96 ✅ | identisk ✅ |
| 3 | `s1-a6a7-s2` | **206** | 230 | 205 / 330 | **-47%** | 96/96 ✅ (default conc=1)¹ | identisk ✅ |

¹ S2 parallel (`reindexConcurrency: 8`) passerar e2e konsistent när run i isolation, men blev flaky 2/3 i full e2e-suite (race på shared TypeORM-entity, troligen `channels`-relationen). Default därför `1` (sekventiell, oförändrat beteende). Bench använder explicit `PERF_CONCURRENCY=8`.

## Vad varje steg ändrar

### S1 — refresh-policy + reindex-index-settings
- Ny option `reindexIndexSettings` (default `{ refresh_interval: -1, number_of_replicas: 0, translog: { durability: 'async' } }`) + `reindexRestoreSettings`.
- `runBulkOperationsOnIndex` tar `refresh: boolean` parameter; reindex-pathen passar `false`.
- Innan alias-swap: `putSettings` (restore) + `indices.refresh` (en gång).
- Filer: `src/options.ts`, `src/indexing/indexer.controller.ts`, `src/adapter/{search-client-adapter,elasticsearch-adapter,opensearch-adapter}.ts`.

### A6 + A7 — parallella bulks + storleksbaserad flush
- `executeBulkOperationsByChunks` kör chunks med `Promise.all` med concurrency-limit (`reindexBulkConcurrency`, default 4) — bara när `refresh=false` (reindex-path).
- Default `reindexBulkOperationSizeLimit` 3000 → 5000.
- Ny option `reindexBulkSizeBytes` (default 5 MB). `updateProductsOperationsOnly` spårar payload-storlek + flushar tidigt.
- **Noll-effekt på synthetic** (140 ops, 1 chunk). Förväntas matter på bov-skala.

### S2 — produkt-parallellisering
- Ny option `reindexConcurrency` (default 1, opt-in 4-8 för perf).
- Reindex-loopen splittar produkt-chunks i N workers, varje med egen `MutableRequestContext`.
- Single concrete win (-47% median) på synthetic.
- ⚠️ Race-känslighet: TypeORM-entiteter (channels, customFields) delas mellan workers via identity map. På sqljs syns det som flaky `enabled` mismatch på en av e2e-testen. Default 1 håller bakåtkompatibel; doc-comment varnar.

### S3 — batch-fetch (skippad i denna omgång)
- Synthetic fixture för liten för meningsfull signal (35 produkter total → DB-fetch dominerar inte).
- Implementation kräver QueryBuilder + keyset-paginering, ingrepp i `updateProductsOperationsOnly` ~rad 566-693.
- Lämpas till bov-bench när miljön är uppe.

## Regression-täckning

1. **`yarn e2e`** — full plugin-svit, sqljs + ES 7.17. 96 tester. Kördes 3× efter S2-fix för flaky-check, alla gröna.
2. **Index-payload diff** — `bench/snapshots/<branch>.ndjson` snapshot via scroll-API, sorterad på `_id`, normaliserad (JSON-fält parse:ade, arrays sorterade). `diffSnapshots(baseline, branch)` jämför rad-för-rad. Alla branches matchar baseline exakt.

## Bov-bench — pending env-setup

Blockerare:
1. **MariaDB:3307 ner.** `nc -z localhost 3307` failar. Användarens kommentar antydde att DB-instansen finns men igen är inte uppe — troligen via brew services / standalone container utanför compose. Behöver startas + verifieras seedat (`SELECT COUNT(*) FROM product_variant`).
2. **`bov-db-conversion/vendure-mariadb/docker-compose.yml`** byggs från `Dockerfile` som kräver `package-lock.json` — repot använder `yarn.lock`. Compose är trasig som-är.
3. **Compose använder postgres**, inte mariadb. Real bench mot MariaDB kräver att DB driftas separat (motsvarar `.env.example`s `DB_HOST=127.0.0.1 DB_PORT=3307`).

När miljön är uppe:
1. Lägg till portal-resolution i `bov-db-conversion/vendure-mariadb/package.json`:
```jsonc
"resolutions": {
"@vendure/elasticsearch-plugin": "portal:/Users/tim/Sites/community-plugins/packages/elasticsearch-plugin"
}
```
Plus rename `packages/elasticsearch-plugin/package.json` → `name: "@vendure/elasticsearch-plugin"` på en bench-branch (eller bumpa import i bovs `elastic-search-config.ts` till `@vendure-community/elasticsearch-plugin`).
2. Sätt `reindexConcurrency: 8` i bovs config.
3. Trigga reindex via admin-api, mät `Job.duration`.
4. Snapshot via `e2e/snapshot-index.ts` (samma helper, byt aliasnamn till `bov-variants`).
5. Regression-check: `diff -u baseline.ndjson optimized.ndjson` + variant-count smoke (DB vs ES).

## Reproducera bench

```bash
cd /Users/tim/Sites/community-plugins
export PATH="$HOME/.bun/bin:$PATH"

# Engångs: bun install + ES 7.17 container
docker run -d --name es-bench -p 9200:9200 -e discovery.type=single-node \
-e ES_JAVA_OPTS="-Xms2g -Xmx2g" -e http.max_content_length=200mb \
elasticsearch:7.17.18

# Per branch:
cd packages/elasticsearch-plugin
bun run build
bun run e2e # regression
PACKAGE=elasticsearch-plugin BENCH_LABEL=<label> PERF_RUNS=5 PERF_CONCURRENCY=8 \
bun x vitest --config bench/perf/vitest.config.mts --run
# Resultat: bench/results/<label>.json, bench/snapshots/<label>.ndjson
```

## Filer som ändrats

- `src/options.ts` — nya options + defaults (S1, A6/A7, S2)
- `src/adapter/search-client-adapter.ts` — `putSettings` i interface
- `src/adapter/elasticsearch-adapter.ts` — `putSettings` impl
- `src/adapter/opensearch-adapter.ts` — `putSettings` impl
- `src/indexing/indexer.controller.ts` — refresh-plumbing, reindex-restore-settings, parallel-bulks, byte-flush, parallel-products
- `e2e/snapshot-index.ts` — scroll-based snapshot + diff helper (NY)
- `bench/perf/perf-reindex.test.ts` — perf-bench spec (NY)
- `bench/perf/vitest.config.mts` — separat vitest-config (NY)
138 changes: 138 additions & 0 deletions packages/elasticsearch-plugin/bench/perf/perf-reindex.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
import { JobState } from '@vendure/common/lib/generated-types';
import { DefaultJobQueuePlugin, mergeConfig } from '@vendure/core';
import { createTestEnvironment } from '@vendure/testing';
import * as fs from 'fs';
import gql from 'graphql-tag';
import * as path from 'path';
import { afterAll, beforeAll, describe, expect, it } from 'vitest';

import { initialData } from '../../../../e2e-common/e2e-initial-data';
import { TEST_SETUP_TIMEOUT_MS, testConfig } from '../../../../e2e-common/test-config';
import { ElasticsearchPlugin } from '../../src/plugin';

import { awaitRunningJobs } from '../../e2e/await-running-jobs';
import { buildAdapterForBackend } from '../../e2e/build-adapter-for-backend';
import { deleteIndices } from '../../src/indexing/indexing-utils';
import { diffSnapshots, snapshotIndex } from '../../e2e/snapshot-index';

async function dropElasticIndices(indexPrefix: string) {
const adapter = buildAdapterForBackend()();
try {
await deleteIndices(adapter, indexPrefix);
} finally {
await adapter.close();
}
}

const { searchBackend } = require('../../e2e/constants');

const LABEL = process.env.BENCH_LABEL || 'untitled';
const RUNS = Math.max(1, parseInt(process.env.PERF_RUNS || '5', 10));
const INDEX_PREFIX = `e2e-perf-${searchBackend as string}-`;
const BENCH_DIR = path.resolve(__dirname, '..');
const RESULT_PATH = path.join(BENCH_DIR, 'results', `${LABEL}.json`);
const SNAPSHOT_PATH = path.join(BENCH_DIR, 'snapshots', `${LABEL}.ndjson`);

const reindexMutation = gql`
mutation Reindex {
reindex {
id
state
duration
result
}
}
`;

describe(`Perf reindex bench [${LABEL}]`, () => {
const { server, adminClient } = createTestEnvironment(
mergeConfig(testConfig(), {
plugins: [
ElasticsearchPlugin.init({
indexPrefix: INDEX_PREFIX,
adapter: buildAdapterForBackend(),
reindexConcurrency: parseInt(process.env.PERF_CONCURRENCY || '8', 10),
reindexBulkConcurrency: parseInt(process.env.PERF_BULK_CONCURRENCY || '4', 10),
}),
DefaultJobQueuePlugin,
],
}),
);

beforeAll(async () => {
await dropElasticIndices(INDEX_PREFIX);
await server.init({
initialData,
productsCsvPath: path.join(__dirname, '..', '..', 'e2e', 'fixtures', 'e2e-products-full.csv'),
customerCount: 1,
});
await adminClient.asSuperAdmin();
await awaitRunningJobs(adminClient, 30_000, 1000);
}, TEST_SETUP_TIMEOUT_MS);

afterAll(async () => {
await awaitRunningJobs(adminClient);
await server.destroy();
}, TEST_SETUP_TIMEOUT_MS);

it(`runs reindex x${RUNS} and records metrics`, async () => {
const durations: number[] = [];
const results: Array<{ run: number; durationMs: number }> = [];

for (let i = 0; i < RUNS; i++) {
const start = Date.now();
await adminClient.query<{ reindex: { id: string } }>(reindexMutation);
await awaitRunningJobs(adminClient, 600_000, 200);
const wallclock = Date.now() - start;
durations.push(wallclock);
results.push({ run: i + 1, durationMs: wallclock });
}

const sorted = [...durations].sort((a, b) => a - b);
const median = sorted[Math.floor(sorted.length / 2)];
const min = sorted[0];
const max = sorted[sorted.length - 1];
const mean = durations.reduce((a, b) => a + b, 0) / durations.length;

const docCount = await snapshotIndex(`${INDEX_PREFIX}variants`, SNAPSHOT_PATH);

const summary = {
label: LABEL,
backend: searchBackend,
runs: RUNS,
median_ms: median,
min_ms: min,
max_ms: max,
mean_ms: Math.round(mean),
durations_ms: durations,
doc_count: docCount,
snapshot_path: path.relative(process.cwd(), SNAPSHOT_PATH),
recorded_at: new Date().toISOString(),
details: results,
};

fs.mkdirSync(path.dirname(RESULT_PATH), { recursive: true });
fs.writeFileSync(RESULT_PATH, JSON.stringify(summary, null, 2) + '\n');

// eslint-disable-next-line no-console
console.log(`\n[bench:${LABEL}] median=${median}ms mean=${Math.round(mean)}ms min=${min}ms max=${max}ms docs=${docCount}\n`);

expect(durations.length).toBe(RUNS);
expect(docCount).toBeGreaterThan(0);
}, TEST_SETUP_TIMEOUT_MS);

it('matches baseline snapshot if present', () => {
const baseline = path.join(BENCH_DIR, 'snapshots', 'baseline.ndjson');
if (LABEL === 'baseline' || !fs.existsSync(baseline)) {
return;
}
const diff = diffSnapshots(baseline, SNAPSHOT_PATH);
if (!diff.equal) {
// eslint-disable-next-line no-console
console.error(
`[bench:${LABEL}] snapshot diverges from baseline: baseline=${diff.aLines} this=${diff.bLines} firstDiffIdx=${diff.firstDiffIndex}`,
);
}
expect(diff.equal).toBe(true);
});
});
21 changes: 21 additions & 0 deletions packages/elasticsearch-plugin/bench/perf/vitest.config.mts
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
import path from 'path';
import swc from 'unplugin-swc';
import { defineConfig } from 'vitest/config';

export default defineConfig({
test: {
include: ['bench/perf/**/*.test.ts'],
exclude: ['e2e/**', 'node_modules/**', 'lib/**'],
fileParallelism: false,
testTimeout: 30 * 60 * 1000,
typecheck: {
tsconfig: path.resolve(__dirname, '../../../../e2e-common/tsconfig.e2e.json'),
},
allowOnly: true,
},
plugins: [
swc.vite({
jsc: { transform: { useDefineForClassFields: false } },
}),
],
});
78 changes: 78 additions & 0 deletions packages/elasticsearch-plugin/e2e/snapshot-index.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
import { Client } from '@elastic/elasticsearch';
import * as fs from 'fs';
import * as path from 'path';

import { elasticsearchHost, elasticsearchPort } from './constants';

const VOLATILE_FIELDS = new Set(['@timestamp']);

function normalizeDoc(source: any): any {
const out: Record<string, unknown> = {};
for (const key of Object.keys(source).sort()) {
if (VOLATILE_FIELDS.has(key)) continue;
const v = source[key];
if (typeof v === 'string') {
const trimmed = v.trim();
if ((trimmed.startsWith('{') && trimmed.endsWith('}')) ||
(trimmed.startsWith('[') && trimmed.endsWith(']'))) {
try {
out[key] = JSON.parse(trimmed);
continue;
} catch {
/* fall through */
}
}
}
if (Array.isArray(v)) {
out[key] = [...v].sort((a, b) => String(a).localeCompare(String(b)));
} else {
out[key] = v;
}
}
return out;
}

export async function snapshotIndex(aliasOrIndex: string, outputPath: string): Promise<number> {
const client = new Client({ node: `${elasticsearchHost}:${elasticsearchPort}` });
const lines: string[] = [];
const scroll = '1m';
let resp: any = await client.search(
{
index: aliasOrIndex,
scroll,
size: 1000,
body: {
sort: [{ _id: 'asc' }],
query: { match_all: {} },
},
},
{ meta: true },
);
while (resp.body.hits.hits.length) {
for (const hit of resp.body.hits.hits) {
lines.push(JSON.stringify({ _id: hit._id, _source: normalizeDoc(hit._source ?? {}) }));
}
resp = await client.scroll({ scroll_id: resp.body._scroll_id, scroll }, { meta: true });
}
await client.clearScroll({ scroll_id: resp.body._scroll_id }).catch(() => undefined);
await client.close();

fs.mkdirSync(path.dirname(outputPath), { recursive: true });
fs.writeFileSync(outputPath, lines.join('\n') + (lines.length ? '\n' : ''));
return lines.length;
}

export function diffSnapshots(a: string, b: string): { equal: boolean; aLines: number; bLines: number; firstDiffIndex: number | null } {
const al = fs.existsSync(a) ? fs.readFileSync(a, 'utf8').split('\n').filter(Boolean) : [];
const bl = fs.existsSync(b) ? fs.readFileSync(b, 'utf8').split('\n').filter(Boolean) : [];
const len = Math.min(al.length, bl.length);
let firstDiffIndex: number | null = null;
for (let i = 0; i < len; i++) {
if (al[i] !== bl[i]) {
firstDiffIndex = i;
break;
}
}
if (firstDiffIndex === null && al.length !== bl.length) firstDiffIndex = len;
return { equal: firstDiffIndex === null, aLines: al.length, bLines: bl.length, firstDiffIndex };
}
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,13 @@ export class ElasticsearchAdapter implements SearchClientAdapter {
);
return { body: result.body };
},
putSettings: async ({ index, body }) => {
const result = await this.client.indices.putSettings(
{ index, body },
{ meta: true },
);
return { body: result.body };
},
refresh: async ({ index }) => {
const result = await this.client.indices.refresh({ index }, { meta: true });
return { body: result.body };
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,10 @@ export class OpenSearchAdapter implements SearchClientAdapter {
const result = await this.client.indices.putAlias({ index, name, body });
return { body: result.body };
},
putSettings: async ({ index, body }) => {
const result = await this.client.indices.putSettings({ index, body });
return { body: result.body };
},
refresh: async ({ index }) => {
const result = await this.client.indices.refresh({ index });
return { body: result.body };
Expand Down
Loading
Loading