speed up perf-tests by more efficient parallelism

# Alter Parallelism in Perf Measuring

## Context

**Today:** `bench/cli/commands/compare/index.ts:131` runs discovered `.abtest.ts`
tests one-at-a-time in a plain `for` loop. Inside each test,
`bench/core/run.ts:59-106` (`run()`) builds a per-test worker pool of size
`parallelism` (halved internally in simultaneous mode so Chrome-process count
roughly matches the flag). There is **no cross-test parallelism**: test 2
cannot start before test 1 fully finishes.

**Problem:** With 10 tests × 20 measurements each on a 16-core box, one test
at a time is a poor use of cores. When many tests exist, we want them to run
concurrently, each using the minimum parallelism (1 pair-worker = 2 Chrome
slots). When few tests remain, free slots should help in-flight tests finish
faster.

**Goal:** Introduce a single **global sampler pool** sized by `parallelism`
(in Chrome-process slots). Each test submits its warmup and measurements to
this shared pool. Cross-test parallelism emerges naturally: with FIFO
scheduling and pair tasks taking 2 slots each, up to `floor(parallelism / 2)`
pair tasks run at once, so that many tests can make progress concurrently;
as tests drain, spare workers migrate onto remaining work.
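
To make the slot arithmetic concrete, here is a small sketch. `maxTasksInFlight` is a hypothetical helper for illustration only, not a function in the codebase; it assumes a pair task always claims 2 slots and a single task claims 1.

```ts
type SamplingMode = 'simultaneous' | 'sequential';

// Hypothetical helper: upper bound on tasks running at once for a given
// number of Chrome slots. Pair tasks take 2 slots, single tasks take 1.
function maxTasksInFlight(parallelism: number, mode: SamplingMode): number {
  const slotsPerTask = mode === 'simultaneous' ? 2 : 1;
  return Math.floor(parallelism / slotsPerTask);
}

console.log(maxTasksInFlight(16, 'simultaneous')); // 8
console.log(maxTasksInFlight(16, 'sequential'));   // 16
console.log(maxTasksInFlight(5, 'simultaneous'));  // 2 (one slot stays idle)
```

With the 16-slot example from the problem statement, up to 8 of the 10 tests could have a pair in flight at the same moment.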

**Non-goals / constraints (must preserve):**
- **Both sampling modes stay functional.** The noise-resilience study isn't
  done; `sequential` is kept (with a runtime deprecation warning pointing at
  `NOISE_RESISTANT_PERF_TESTS_STUDY.md`) so simultaneous can still be proven
  superior.
- **Barrier-synced paired measurements** in simultaneous mode — control and
  experiment for a given iteration start together, both complete before the
  pair resolves. Breaking this destroys statistical sensitivity.
- **Shuffling of control/experiment order** within each iteration stays.
- **Per-test warmup** (trial iteration) stays, once per test, before measurements.

## Files

**Refactor:**
- `packages/shaka-perf/src/bench/core/run.ts` — replace the internal worker
  pool (lines 85-171) with calls into the new shared pool. Keep shuffling
  (line 272) and pair-barrier logic (line 206), but delegate scheduling.
  Remove the `parallelism / 2` halving at lines 81-83 (the pool handles it).
- `packages/shaka-perf/src/bench/cli/commands/compare/index.ts` — the
  `for (const testDef of tests)` loop at line 131 becomes
  `await Promise.all(tests.map(runOneTest))`. One shared pool instance is
  created before the map and disposed after.
- `packages/shaka-perf/src/compare/config.ts` — after Zod parse of
  `PerfConfigSchema` (lines 61-77), emit a `console.warn(...)` if
  `samplingMode === 'sequential'`, referencing the study file. Also update
  the JSDoc on the `samplingMode` enum entry.

**Create:**
- `packages/shaka-perf/src/bench/core/sampler-pool.ts` — the shared pool.

## New module: `sampler-pool.ts`

A single class `SamplerPool` owning up to `parallelism` concurrent Chrome
samplers ("slots") and a FIFO task queue.

```ts
// Each task is scheduled by runOneTest (warmup or measurement).
type Task<TSample> =
  | {
      kind: 'single';
      testKey: string;
      benchmark: Benchmark<TSample>;  // .setup() spawns a Chrome sampler
      iteration: number;
      isTrial: boolean;
      resolve: (s: TSample) => void;
      reject: (e: unknown) => void;
    }
  | {
      kind: 'pair';
      testKey: string;
      control: Benchmark<TSample>;
      experiment: Benchmark<TSample>;
      iteration: number;
      isTrial: boolean;
      resolve: (r: { control: TSample; experiment: TSample }) => void;
      reject: (e: unknown) => void;
    };

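// Callers pass only the task fields; the pool adds kind/resolve/reject on enqueue.
type TaskInput<T, K extends Task<T>['kind']> =
  Omit<Extract<Task<T>, { kind: K }>, 'kind' | 'resolve' | 'reject'>;

declare class SamplerPool<TSample> {
  constructor(slots: number, sampleTimeoutMs: number, raceCancellation?: RaceCancellation);

  submitSingle(task: TaskInput<TSample, 'single'>): Promise<TSample>;  // claims 1 slot
  submitPair(task: TaskInput<TSample, 'pair'>):
    Promise<{ control: TSample; experiment: TSample }>;                // claims 2 slots (barrier)

  dispose(): Promise<void>;               // tear down all live samplers
}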
```

**Worker model (internal to the pool):**
- The pool owns N "slots". Each slot may be empty or bound to an active
  `BenchmarkSampler` (one Chrome child process).
- A slot is addressed by `(testKey, group)` — i.e. the specific URL+test
  combination its Chrome is set up for.
- Scheduler loop (single coroutine):
  1. Pick the oldest task whose required slot(s) can be acquired. "Acquired"
     means: a slot already bound to the right `(testKey, group)` exists AND
     is idle, OR a free slot exists (empty or bound to a drained test) that
     we can bind/rebind for this task.
  2. If a pair task: we need two slots — one for `(testKey, 'control')` and
     one for `(testKey, 'experiment')`. Both must be simultaneously
     available; else skip this task and try the next one (head-of-line
     avoidance keeps the queue honest — see "FIFO vs HoL").
  3. Bind slots (spawning/disposing Chromes as needed), then start the
     sample(s). For pairs, shuffle the start order, then run both via
     `Promise.allSettled` for barrier-sync — same logic as today's
     `runOnePair` in `run.ts:206`.
  4. On completion: mark slots idle, wake the scheduler.
- `dispose()`: drain the queue (reject any pending tasks via
  `raceCancellation`), await in-flight tasks, dispose all bound samplers via
  `disposeAllSamplerSets`-style Promise.all.
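
The acquisition rule in steps 1 and 2 could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Slot` shape, the `findSlot` / `findPairSlots` helpers, and the `drained` set of test keys with no queued work left are hypothetical names, and the preference order (same binding, then empty, then drained) is one reasonable reading of the list above.

```ts
type Group = 'control' | 'experiment';

interface Slot {
  boundTo: { testKey: string; group: Group } | null; // null = empty slot
  busy: boolean;
}

// Returns the index of a slot usable for (testKey, group), or -1 if none.
// Preference: an idle slot already bound to the same (testKey, group),
// then an empty slot, then an idle slot whose test is drained.
function findSlot(
  slots: Slot[],
  testKey: string,
  group: Group,
  drained: Set<string>,
  exclude: number = -1,
): number {
  const idle = (s: Slot, i: number): boolean => !s.busy && i !== exclude;
  const bound = slots.findIndex(
    (s, i) => idle(s, i) && s.boundTo?.testKey === testKey && s.boundTo?.group === group,
  );
  if (bound !== -1) return bound;
  const empty = slots.findIndex((s, i) => idle(s, i) && s.boundTo === null);
  if (empty !== -1) return empty;
  return slots.findIndex(
    (s, i) => idle(s, i) && s.boundTo !== null && drained.has(s.boundTo.testKey),
  );
}

// A pair needs two distinct slots at the same time: both indices, or null.
function findPairSlots(
  slots: Slot[],
  testKey: string,
  drained: Set<string>,
): [number, number] | null {
  const control = findSlot(slots, testKey, 'control', drained);
  if (control === -1) return null;
  const experiment = findSlot(slots, testKey, 'experiment', drained, control);
  return experiment === -1 ? null : [control, experiment];
}

const slots: Slot[] = [
  { boundTo: { testKey: 'a', group: 'control' }, busy: false },
  { boundTo: { testKey: 'a', group: 'experiment' }, busy: true },
  { boundTo: null, busy: false },
  { boundTo: { testKey: 'b', group: 'control' }, busy: false },
];

console.log(JSON.stringify(findPairSlots(slots, 'a', new Set())));      // [0,2]
console.log(JSON.stringify(findPairSlots(slots, 'c', new Set())));      // null
console.log(JSON.stringify(findPairSlots(slots, 'c', new Set(['b'])))); // [2,3]
```

When `findPairSlots` returns `null`, the scheduler would leave the pair queued and look at the next task.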

**FIFO vs head-of-line blocking:** A strict FIFO would stall when the head
task is a pair needing 2 free slots but only 1 is free. To preserve the
"tasks executed in order" intent without starvation, use the convention:
try head first; if its slot requirement cannot be met *now*, scan forward
for the first task that can be served. (This is the same allowance the
current `run.ts` implicitly makes across its workers.) Shuffling and pair
barriers are unaffected.
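
A minimal sketch of the "try head first, then scan forward" rule, reduced to slot counting. `MiniSlotPool` is a hypothetical stand-in that ignores Chrome binding, timeouts, and cancellation; it only shows how a single can overtake a pair that does not fit yet. A job asking for more slots than the pool owns would wait forever, and the sketch does not guard against that.

```ts
interface QueuedJob {
  slots: number;               // 1 for a single, 2 for a pair
  start: () => Promise<void>;  // runs the job and settles the caller's promise
}

class MiniSlotPool {
  private free: number;
  private readonly queue: QueuedJob[] = [];

  constructor(slots: number) {
    this.free = slots;
  }

  submit<T>(slots: number, run: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      this.queue.push({ slots, start: () => run().then(resolve, reject) });
      this.pump();
    });
  }

  // Oldest first; a job that does not fit is skipped, not dropped.
  private pump(): void {
    let i = 0;
    while (i < this.queue.length) {
      const job = this.queue[i];
      if (job.slots > this.free) {
        i += 1;
        continue;
      }
      this.queue.splice(i, 1);
      this.free -= job.slots;
      void job.start().then(() => {
        this.free += job.slots;
        this.pump();
      });
    }
  }
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function demo(): Promise<string[]> {
  const started: string[] = [];
  const pool = new MiniSlotPool(3);
  const job = (name: string, ms: number) => async () => {
    started.push(name);
    await sleep(ms);
    return name;
  };
  await Promise.all([
    pool.submit(2, job('pairA', 30)),
    pool.submit(2, job('pairB', 10)),
    pool.submit(1, job('singleC', 10)),
  ]);
  return started;
}

demo().then((order) => console.log(order.join(' -> ')));
// pairA -> singleC -> pairB
```

With 3 slots, `pairA` leaves one slot free; `pairB` cannot fit, so `singleC` starts ahead of it, and `pairB` runs once `pairA` releases its slots.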

**Worker migration (the "free slots help remaining tests" behavior):** Falls
out of the design for free. When a test's work is drained, its slots go
idle. The next scheduler tick assigns those idle slots to tasks from
whichever tests still have queued work, rebinding Chromes as needed
(dispose old sampler, call `benchmark.setup()` for the new `(testKey,
group)`). Chrome respawn cost (~3s) is amortized across the rest of that
test's iterations.
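
The rebinding step might look roughly like this. `SamplerLike`, `BenchmarkLike`, `BoundSlot`, and `bindSlot` are hypothetical names for this sketch; the real `Benchmark.setup()` and sampler types in the codebase may take extra arguments (for example a cancellation handle) that are omitted here.

```ts
// Simplified stand-ins for the real sampler and benchmark types (assumed shapes).
interface SamplerLike {
  dispose(): Promise<void>;
}
interface BenchmarkLike {
  group: string;
  setup(): Promise<SamplerLike>; // spawns a Chrome sampler
}

interface BoundSlot {
  key: string | null;          // "<testKey>:<group>" or null when empty
  sampler: SamplerLike | null; // live Chrome for this slot, if any
}

// Make `slot` ready for (testKey, benchmark.group): reuse the live sampler
// when the binding already matches, otherwise dispose and respawn.
async function bindSlot(
  slot: BoundSlot,
  testKey: string,
  benchmark: BenchmarkLike,
): Promise<SamplerLike> {
  const key = `${testKey}:${benchmark.group}`;
  if (slot.key === key && slot.sampler !== null) {
    return slot.sampler; // same test and group: no respawn cost
  }
  if (slot.sampler !== null) {
    await slot.sampler.dispose(); // tear down the previous test's Chrome
  }
  slot.key = null;
  slot.sampler = null;
  const sampler = await benchmark.setup();
  slot.key = key;
  slot.sampler = sampler;
  return sampler;
}
```

The respawn cost is paid only on the first task after a slot changes hands; later iterations of the same test hit the early return.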

## Refactored `run()`

`run()` no longer owns a pool. It becomes a thin per-test coroutine that
submits into a pool passed in from the caller.

```ts
export default async function run<TSample>(
  benchmarks: [Benchmark<TSample>, Benchmark<TSample>],  // [control, experiment]
  iterations: number,
  progress: SampleProgressCallback,
  pool: SamplerPool<TSample>,
  options: { samplingMode: SamplingMode; testKey: string; ... }
): Promise<SampleGroup<TSample>[]> {
  const [control, experiment] = benchmarks;
  const { samplingMode, testKey } = options;

  // 1. Warmup — one trial per test, awaited before measurements.
  if (samplingMode === 'simultaneous') {
    await pool.submitPair({ testKey, control, experiment, iteration: 0, isTrial: true });
  } else {
    // sequential: shuffle the order of the two trial singles (preserves today's behavior)
    const order = shuffleTwo([control, experiment]);
    for (const b of order) {
      await pool.submitSingle({ testKey, benchmark: b, iteration: 0, isTrial: true });
    }
  }

  // 2. Measurements — all iterations submitted via Promise.all.
  const controlSamples: TSample[] = new Array(iterations);
  const experimentSamples: TSample[] = new Array(iterations);

  if (samplingMode === 'simultaneous') {
    await Promise.all(
      range(iterations).map(async (i) => {
        const { control: c, experiment: e } = await pool.submitPair({
          testKey, control, experiment, iteration: i + 1, isTrial: false,
        });
        controlSamples[i] = c;
        experimentSamples[i] = e;
        progress(...);
      })
    );
  } else {
    // sequential: per-iteration shuffled order of [control, experiment] singles
    await Promise.all(
      range(iterations).flatMap((i) => {
        const order = shuffleTwo([control, experiment]);
        return order.map(async (b) => {
          const s = await pool.submitSingle({
            testKey, benchmark: b, iteration: i + 1, isTrial: false,
          });
          (b === control ? controlSamples : experimentSamples)[i] = s;
          progress(...);
        });
      })
    );
  }

  return [
    { group: control.group, samples: controlSamples },
    { group: experiment.group, samples: experimentSamples },
  ];
}
```

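Indexing (`controlSamples[i]` / `experimentSamples[i]`) preserves pair
alignment regardless of completion order, so paired Wilcoxon /
Hodges-Lehmann stats keep working unchanged. `shuffleTwo` in the sketch
stands for today's `shuffle()` helper (`run.ts:272`) applied to the
two-element array, and `range(n)` is assumed to yield `0..n-1`.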
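
A small self-contained illustration of why writing by index keeps pairs aligned even when iterations finish out of order. The sample values and the `collectAligned` helper are made up for this sketch; timers stand in for real measurements.

```ts
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

interface Collected {
  control: number[];
  experiment: number[];
  completionOrder: number[];
}

// Later iterations are made to finish first to mimic out-of-order completion.
async function collectAligned(iterations: number): Promise<Collected> {
  const control: number[] = new Array(iterations);
  const experiment: number[] = new Array(iterations);
  const completionOrder: number[] = [];
  await Promise.all(
    Array.from({ length: iterations }, async (_, i) => {
      await sleep((iterations - i) * 10);
      control[i] = i * 10;        // fake control sample for iteration i
      experiment[i] = i * 10 + 1; // fake experiment sample for iteration i
      completionOrder.push(i);
    }),
  );
  return { control, experiment, completionOrder };
}

collectAligned(4).then((r) => {
  console.log(JSON.stringify(r.control));    // [0,10,20,30]
  console.log(JSON.stringify(r.experiment)); // [1,11,21,31]
});
```

Pushing results onto the arrays in completion order would instead give a different ordering per run, which would silently break any pairwise statistic.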

## Refactored compare entry

`bench/cli/commands/compare/index.ts` currently creates Benchmarks inside the
loop. The new version hoists one shared pool out and maps all tests into it:

```ts
const pool = new SamplerPool<NavigationSample>(
  compareFlags.parallelism as number,       // in Chrome slots
  (sampleTimeout ?? 120) * 1000,
  raceCancellation,
);

try {
  await Promise.all(tests.map(async (testDef) => {
    const testKey = slugify(testDef.name);
    const control    = createLighthouseBenchmark('control',    controlURL,    testDef, testOptions);
    const experiment = createLighthouseBenchmark('experiment', experimentURL, testDef, testOptions);

    const sampleGroups = await run(
      [control, experiment],
      numberOfMeasurements,
      progressCallback,
      pool,
      { samplingMode, testKey },
    );

    // existing per-test result-writing logic (unchanged):
    writeMeasurements(testResultsFolder, sampleGroups);
  }));
} finally {
  await pool.dispose();
}
```

`parallelism` now directly means Chrome-process slots — no internal halving.
The user-visible default stays `max(1, floor(os.cpus().length / 2))`
(`compare/config.ts:5`).

## Deprecation of `sampling-mode: sequential`

In `compare/config.ts`, after the Zod `safeParse` at line 98, add:

```ts
if (result.data.perf.samplingMode === 'sequential') {
  console.warn(
    '[shaka-perf] perf.samplingMode "sequential" is deprecated and retained ' +
    'only for scientific comparison against "simultaneous". ' +
    'See NOISE_RESISTANT_PERF_TESTS_STUDY.md for why.'
  );
}
```

Also add a JSDoc `@deprecated` tag on the enum in `PerfConfigSchema` (lines
69-71) with the same explanation. The enum value stays, so this is not a
breaking change for users.

## Shuffling + pair-barrier preservation (explicit)

- Simultaneous pair tasks start their two samples via `Promise.allSettled`
  of shuffled-order `sampleOne` calls, exactly like today's `runOnePair`
  (`run.ts:206`). Both must complete before the pair's `resolve` fires.
- Sequential mode shuffles the two group starts per iteration (same as
  today's `sampleOne` loop at `run.ts:196-201`), just expressed as two
  `submitSingle` calls in shuffled order within each iteration.
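
The barrier behavior can be sketched in isolation as below. `runPairBarrier` is a hypothetical name and is not the existing `runOnePair`; the injectable `random` parameter and the choice to rethrow the first rejection after both samples settle are assumptions of this sketch.

```ts
// Start control and experiment in shuffled order, then wait for BOTH to
// settle before resolving or rejecting (the barrier).
async function runPairBarrier<T>(
  sampleControl: () => Promise<T>,
  sampleExperiment: () => Promise<T>,
  random: () => number = Math.random,
): Promise<{ control: T; experiment: T }> {
  const starters = [
    { group: 'control' as const, start: sampleControl },
    { group: 'experiment' as const, start: sampleExperiment },
  ];
  if (random() < 0.5) starters.reverse(); // shuffle which group starts first

  // allSettled never short-circuits, so neither sample is abandoned early.
  const settled = await Promise.allSettled(starters.map((s) => s.start()));

  const out: { control?: T; experiment?: T } = {};
  for (let i = 0; i < starters.length; i++) {
    const result = settled[i];
    if (result.status === 'rejected') throw result.reason;
    out[starters[i].group] = result.value;
  }
  return { control: out.control as T, experiment: out.experiment as T };
}

runPairBarrier(async () => 101, async () => 99).then((pair) => {
  console.log(pair.control, pair.experiment); // 101 99
});
```

Results are keyed by group rather than by start position, so the shuffle changes only which Chrome starts first, never which sample lands in which array.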

## Verification

**Smoke:**
```bash
cd demo-ecommerce
yarn shaka-perf twins-start       # if not already up
yarn build
yarn shaka-perf compare --parallelism 4 --numberOfMeasurements 5
```
Check `compare-results/report.html` renders and each test's measurement
count equals `numberOfMeasurements`.

**Cross-test parallelism:** With ≥ 2 tests discovered and `parallelism >= 4`,
add a one-line `console.log(testKey, iteration)` in `runOneTest` and
confirm log lines interleave across testKeys (today they never interleave).

**Paired-stats integrity:** Run the noise-resilience jest suite
unchanged:
```bash
cd packages/shaka-perf
yarn jest noise-resilience
```
It exercises both sampling modes across `par=1` and `par=3`. If paired
alignment or shuffling broke, p-values degrade visibly (see Q1-Q7
predictions in `NOISE_RESISTANT_PERF_TESTS_STUDY.md`).

**Sequential warning:** Add `perf: { samplingMode: 'sequential' }` to a
local `abtests.config.ts` and verify the deprecation message prints to
stderr on `shaka-perf compare`.

**Disposal:** No orphan Chrome processes after `compare` exits
(`ps aux | grep -i chrome`).

