
speed up perf-tests by more efficient parallelism #11

Description

@Romex91

Alter Parallelism in Perf Measuring

Context

Today: bench/cli/commands/compare/index.ts:131 runs discovered .abtest.ts
tests one at a time in a plain for loop. Inside each test,
bench/core/run.ts:59-106 (run()) builds a per-test worker pool of size
parallelism (halved internally in simultaneous mode so the Chrome-process count
roughly matches the flag). There is no cross-test parallelism: test 2
cannot start before test 1 fully finishes.

Problem: With 10 tests × 20 measurements each on a 16-core box, one test
at a time is a poor use of cores. When many tests exist, we want them to run
concurrently, each using the minimum parallelism (1 pair-worker = 2 Chrome
slots). When few tests remain, free slots should help in-flight tests finish
faster.

Goal: Introduce a single global sampler pool sized by parallelism
(in Chrome-process slots). Each test submits its warmup and measurements to
this shared pool. Cross-test parallelism emerges naturally: FIFO scheduling
with pair-tasks taking 2 slots means up to floor(parallelism / 2) tests make
progress concurrently; as tests drain, spare workers migrate onto remaining
work.

Non-goals / constraints (must preserve):

  • Both sampling modes stay functional. The noise-resilience study isn't
    done; sequential is kept (with a runtime deprecation warning pointing at
    NOISE_RESISTANT_PERF_TESTS_STUDY.md) so simultaneous can still be proven
    superior.
  • Barrier-synced paired measurements in simultaneous mode — control and
    experiment for a given iteration start together, both complete before the
    pair resolves. Breaking this destroys statistical sensitivity.
  • Shuffling of control/experiment order within each iteration stays.
  • Per-test warmup (trial iteration) stays, once per test, before measurements.

Files

Refactor:

  • packages/shaka-perf/src/bench/core/run.ts — replace the internal worker
    pool (lines 85-171) with calls into the new shared pool. Keep shuffling
    (line 272) and pair-barrier logic (line 206), but delegate scheduling.
    Remove the parallelism / 2 halving at lines 81-83 (the pool handles it).
  • packages/shaka-perf/src/bench/cli/commands/compare/index.ts — the
    for (const testDef of tests) loop at line 131 becomes
    await Promise.all(tests.map(runOneTest)). One shared pool instance is
    created before the map and disposed after.
  • packages/shaka-perf/src/compare/config.ts — after Zod parse of
    PerfConfigSchema (lines 61-77), emit a console.warn(...) if
    samplingMode === 'sequential', referencing the study file. Also update
    the JSDoc on the samplingMode enum entry.

Create:

  • packages/shaka-perf/src/bench/core/sampler-pool.ts — the shared pool.

New module: sampler-pool.ts

A single class SamplerPool owning up to parallelism concurrent Chrome
samplers ("slots") and a FIFO task queue.

// Each task is scheduled by runOneTest (warmup or measurement).
type Task<TSample> =
  | {
      kind: 'single';
      testKey: string;
      benchmark: Benchmark<TSample>;  // .setup() spawns a Chrome sampler
      iteration: number;
      isTrial: boolean;
      resolve: (s: TSample) => void;
      reject: (e: unknown) => void;
    }
  | {
      kind: 'pair';
      testKey: string;
      control: Benchmark<TSample>;
      experiment: Benchmark<TSample>;
      iteration: number;
      isTrial: boolean;
      resolve: (r: { control: TSample; experiment: TSample }) => void;
      reject: (e: unknown) => void;
    };

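// Caller-supplied fields; the pool adds kind, resolve and reject itself.
type SingleInput<TSample> = Omit<Extract<Task<TSample>, { kind: 'single' }>, 'kind' | 'resolve' | 'reject'>;
type PairInput<TSample> = Omit<Extract<Task<TSample>, { kind: 'pair' }>, 'kind' | 'resolve' | 'reject'>;

declare class SamplerPool<TSample> {
  constructor(slots: number, sampleTimeoutMs: number, raceCancellation?: RaceCancellation);

  submitSingle(task: SingleInput<TSample>): Promise<TSample>;  // claims 1 slot
  submitPair(task: PairInput<TSample>): Promise<{ control: TSample; experiment: TSample }>;  // claims 2 slots (barrier)

  dispose(): Promise<void>;  // tear down all live samplers
}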
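The signatures above leave open how a submit call hands the caller a promise while the scheduler owns the queue. A common approach is the deferred pattern: capture the promise's resolve/reject next to the caller's fields and push the result onto the FIFO. A minimal sketch, assuming a hypothetical TinyQueuePool in which slot handling, Chrome binding and the scheduler body are left out:

```typescript
// Hypothetical queued-task shape: caller fields plus the promise's settle functions.
type Queued<TIn, TOut> = TIn & {
  resolve: (v: TOut) => void;
  reject: (e: unknown) => void;
};

class TinyQueuePool<TIn extends object, TOut> {
  readonly queue: Queued<TIn, TOut>[] = [];

  // Returns a promise that settles when the scheduler runs (or rejects) this task.
  submit(input: TIn): Promise<TOut> {
    return new Promise<TOut>((resolve, reject) => {
      this.queue.push({ ...input, resolve, reject });
      this.schedule();
    });
  }

  // Placeholder for the real scheduler tick (slot acquisition, Chrome binding, ...).
  protected schedule(): void {
    // intentionally empty in this sketch
  }

  // Rejects every task still waiting in the queue and empties it.
  rejectPending(reason: unknown): void {
    for (const task of this.queue.splice(0)) task.reject(reason);
  }
}
```

dispose() could call something like rejectPending() before awaiting in-flight work; that split is an assumption, not part of the plan.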

Worker model (internal to the pool):

  • The pool owns N "slots". Each slot may be empty or bound to an active
    BenchmarkSampler (one Chrome child process).
  • A slot is addressed by (testKey, group) — i.e. the specific URL+test
    combination its Chrome is set up for.
  • Scheduler loop (single coroutine):
    1. Pick the oldest task whose required slot(s) can be acquired. "Acquired"
      means: a slot already bound to the right (testKey, group) exists AND
      is idle, OR a free slot exists (empty or bound to a drained test) that
      we can bind/rebind for this task.
    2. If a pair task: we need two slots — one for (testKey, 'control') and
      one for (testKey, 'experiment'). Both must be simultaneously
      available; else skip this task and try the next one (head-of-line
      avoidance keeps the queue honest — see "FIFO vs HoL").
    3. Bind slots (spawning/disposing Chromes as needed), then start the
      sample(s). For pairs, shuffle the start order, then run both via
      Promise.allSettled for barrier-sync — same logic as today's
      runOnePair in run.ts:206.
    4. On completion: mark slots idle, wake the scheduler.
  • dispose(): drain the queue (reject any pending tasks via
    raceCancellation), await in-flight tasks, dispose all bound samplers via
    disposeAllSamplerSets-style Promise.all.
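The acquisition rule in step 1 and the pair requirement in step 2 can be sketched as two small functions. This is a sketch under assumptions: the Slot shape, findSlot, acquirePair and the isDrained callback are hypothetical names, not existing shaka-perf code, and slot state is reduced to a binding plus a busy flag:

```typescript
type Group = 'control' | 'experiment';

// Hypothetical slot state: what the slot's Chrome is set up for, and whether it is sampling.
interface Slot {
  binding: { testKey: string; group: Group } | null; // null = empty, no Chrome
  busy: boolean;
}

// Index of an idle slot able to serve (testKey, group), or -1. Prefers a slot
// already bound to the same (testKey, group), so no respawn is needed; otherwise
// takes an idle slot that is empty or bound to a drained test (it gets rebound).
function findSlot(
  slots: readonly Slot[],
  testKey: string,
  group: Group,
  isDrained: (testKey: string) => boolean,
  exclude: ReadonlySet<number> = new Set<number>(),
): number {
  const usable = (i: number) => !slots[i].busy && !exclude.has(i);
  const reuse = slots.findIndex(
    (s, i) => usable(i) && s.binding !== null && s.binding.testKey === testKey && s.binding.group === group,
  );
  if (reuse !== -1) return reuse;
  return slots.findIndex((s, i) => usable(i) && (s.binding === null || isDrained(s.binding.testKey)));
}

// A pair needs two distinct slots at the same time; null means "skip this task for now".
function acquirePair(
  slots: readonly Slot[],
  testKey: string,
  isDrained: (testKey: string) => boolean,
): [number, number] | null {
  const c = findSlot(slots, testKey, 'control', isDrained);
  if (c === -1) return null;
  const e = findSlot(slots, testKey, 'experiment', isDrained, new Set([c]));
  return e === -1 ? null : [c, e];
}
```

Preferring an already-bound idle slot is what avoids needless Chrome respawns; falling back to empty or drained slots is what lets idle capacity migrate between tests.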

FIFO vs head-of-line blocking: A strict FIFO would stall when the head
task is a pair needing 2 free slots but only 1 is free. To preserve the
"tasks executed in order" intent without starvation, use the convention:
try head first; if its slot requirement cannot be met now, scan forward
for the first task that can be served. (This is the same allowance the
current run.ts implicitly makes across its workers.) Shuffling and pair
barriers are unaffected.
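The head-first, scan-forward convention reduces to a linear search over the queue with a servability predicate. A minimal sketch; pickServable is a hypothetical name, and in a real pool the predicate would wrap the slot-acquisition check:

```typescript
// Returns the index of the oldest queued task that can be served right now,
// or -1 if none can. Skipped tasks keep their position, so the head is
// always tried first again on the next scheduler tick.
function pickServable<T>(queue: readonly T[], canServe: (task: T) => boolean): number {
  for (let i = 0; i < queue.length; i++) {
    if (canServe(queue[i])) return i;
  }
  return -1;
}
```

With one free slot, a pair at the head is passed over for a single queued behind it, but it stays at the head and is considered first once a second slot frees up.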

Worker migration (the "free slots help remaining tests" behavior): Falls
out of the design for free. When a test's work is drained, its slots go
idle. The next scheduler tick assigns those idle slots to tasks from
whichever tests still have queued work, rebinding Chromes as needed
(dispose old sampler, call benchmark.setup() for the new (testKey, group)). Chrome respawn cost (~3s) is amortized across the rest of that
test's iterations.
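Rebinding a slot during migration amounts to: reuse the live sampler if it is already set up for this (testKey, group), otherwise dispose the old one and set up a new one. A sketch with hypothetical minimal Sampler/SamplerFactory interfaces, where setup() stands in for benchmark.setup() (the real Benchmark and BenchmarkSampler types will differ):

```typescript
// Hypothetical minimal interfaces, not the real shaka-perf types.
interface Sampler { dispose(): Promise<void>; }
interface SamplerFactory { setup(): Promise<Sampler>; } // stands in for benchmark.setup()

interface BoundSlot {
  key: string | null;      // `${testKey}/${group}` the Chrome is set up for, or null
  sampler: Sampler | null; // live Chrome child, or null when the slot is empty
}

// Ensures `slot` hosts a sampler for `key`, respawning only when it is bound elsewhere.
async function bindSlot(slot: BoundSlot, key: string, factory: SamplerFactory): Promise<Sampler> {
  if (slot.key === key && slot.sampler) return slot.sampler; // reuse, no respawn
  if (slot.sampler) await slot.sampler.dispose();            // drop the drained test's Chrome
  slot.sampler = null;
  slot.key = null;
  const sampler = await factory.setup();                     // the respawn cost is paid here
  slot.sampler = sampler;
  slot.key = key;
  return sampler;
}
```

Clearing the binding before awaiting setup() means a failed spawn leaves the slot empty rather than pointing at a disposed sampler.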

Refactored run()

run() no longer owns a pool. It becomes a thin per-test coroutine that
submits into a pool passed in from the caller.

export default async function run<TSample>(
  benchmarks: [Benchmark<TSample>, Benchmark<TSample>],  // [control, experiment]
  iterations: number,
  progress: SampleProgressCallback,
  pool: SamplerPool<TSample>,
  options: { samplingMode: SamplingMode; testKey: string; ... }
): Promise<SampleGroup<TSample>[]> {
  const [control, experiment] = benchmarks;
  const { samplingMode, testKey } = options;

  // 1. Warmup — one trial per test, awaited before measurements.
  if (samplingMode === 'simultaneous') {
    await pool.submitPair({ testKey, control, experiment, iteration: 0, isTrial: true });
  } else {
    // sequential: shuffle the order of the two trial singles (preserves today's behavior)
    const order = shuffleTwo([control, experiment]);
    for (const b of order) {
      await pool.submitSingle({ testKey, benchmark: b, iteration: 0, isTrial: true });
    }
  }

  // 2. Measurements — all iterations submitted via Promise.all.
  const controlSamples: TSample[] = new Array(iterations);
  const experimentSamples: TSample[] = new Array(iterations);

  if (samplingMode === 'simultaneous') {
    await Promise.all(
      range(iterations).map(async (i) => {
        const { control: c, experiment: e } = await pool.submitPair({
          testKey, control, experiment, iteration: i + 1, isTrial: false,
        });
        controlSamples[i] = c;
        experimentSamples[i] = e;
        progress(...);
      })
    );
  } else {
    // sequential: per-iteration shuffled order of [control, experiment] singles.
    // Iterations run concurrently, but within one iteration the second single is
    // submitted only after the first resolves, so the two never overlap.
    await Promise.all(
      range(iterations).map(async (i) => {
        const order = shuffleTwo([control, experiment]);
        for (const b of order) {
          const s = await pool.submitSingle({
            testKey, benchmark: b, iteration: i + 1, isTrial: false,
          });
          (b === control ? controlSamples : experimentSamples)[i] = s;
          progress(...);
        }
      })
    );
  }

  return [
    { group: control.group, samples: controlSamples },
    { group: experiment.group, samples: experimentSamples },
  ];
}

Indexing (controlSamples[i] / experimentSamples[i]) preserves pair
alignment, so paired Wilcoxon / Hodges-Lehmann stats keep working
unchanged. shuffleTwo reuses today's shuffle() helper (run.ts:272).
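run() above also relies on range() and shuffleTwo(). If the existing shuffle() helper's signature does not fit directly, hypothetical minimal versions could look like this:

```typescript
// Hypothetical helper: [0, 1, ..., n - 1].
function range(n: number): number[] {
  return Array.from({ length: n }, (_, i) => i);
}

// Hypothetical helper: a new array with the two elements in random order.
// For two elements a Fisher-Yates shuffle is just a fair coin flip.
function shuffleTwo<T>(pair: readonly [T, T]): [T, T] {
  return Math.random() < 0.5 ? [pair[0], pair[1]] : [pair[1], pair[0]];
}
```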

Refactored compare entry

bench/cli/commands/compare/index.ts currently creates Benchmarks inside the
loop. The new version hoists one shared pool out and maps all tests into it:

const pool = new SamplerPool<NavigationSample>(
  compareFlags.parallelism as number,       // in Chrome slots
  (sampleTimeout ?? 120) * 1000,
  raceCancellation,
);

try {
  await Promise.all(tests.map(async (testDef) => {
    const testKey = slugify(testDef.name);
    const control    = createLighthouseBenchmark('control',    controlURL,    testDef, testOptions);
    const experiment = createLighthouseBenchmark('experiment', experimentURL, testDef, testOptions);

    const sampleGroups = await run(
      [control, experiment],
      numberOfMeasurements,
      progressCallback,
      pool,
      { samplingMode, testKey },
    );

    // existing per-test result-writing logic (unchanged):
    writeMeasurements(testResultsFolder, sampleGroups);
  }));
} finally {
  await pool.dispose();
}

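parallelism now directly means Chrome-process slots, with no internal halving.
The user-visible default stays max(1, floor(os.cpus().length / 2))
(compare/config.ts:5). Because a pair claims two slots, simultaneous mode needs
at least 2 slots; the pool should clamp or reject smaller values (the default
evaluates to 1 on machines with fewer than 4 cores), otherwise no pair task
could ever be scheduled.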
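The slot arithmetic can be made concrete. A sketch, assuming the pool clamps simultaneous mode to at least 2 slots because a pair claims two (effectiveSlots and maxConcurrentTasks are hypothetical helpers, and rejecting too-small values is an equally valid choice); cpuCount stands in for os.cpus().length:

```typescript
type SamplingMode = 'simultaneous' | 'sequential';

// Mirrors the documented default: max(1, floor(cpus / 2)) Chrome slots.
function defaultParallelism(cpuCount: number): number {
  return Math.max(1, Math.floor(cpuCount / 2));
}

// Hypothetical guard: a pair claims two slots, so simultaneous mode needs >= 2.
function effectiveSlots(parallelism: number, mode: SamplingMode): number {
  return mode === 'simultaneous' ? Math.max(2, parallelism) : Math.max(1, parallelism);
}

// Upper bound on concurrently running tasks (pairs or singles) for a slot count.
function maxConcurrentTasks(slots: number, mode: SamplingMode): number {
  return mode === 'simultaneous' ? Math.floor(slots / 2) : slots;
}
```

For example, a 16-core machine gets the default 8 slots, so at most 4 pairs run at once. Clamping silently starts one more Chrome than requested; raising an error instead makes the constraint visible to the user.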

Deprecation of sampling-mode: sequential

In compare/config.ts, after the Zod safeParse at line 98, add:

if (result.data.perf.samplingMode === 'sequential') {
  console.warn(
    '[shaka-perf] perf.samplingMode "sequential" is deprecated and retained ' +
    'only for scientific comparison against "simultaneous". ' +
    'See NOISE_RESISTANT_PERF_TESTS_STUDY.md for why.'
  );
}

And add a JSDoc @deprecated tag on the enum in PerfConfigSchema (lines
69-71) describing the same. The enum stays — no user breaking change.

Shuffling + pair-barrier preservation (explicit)

  • Simultaneous pair tasks start their two samples via Promise.allSettled
    of shuffled-order sampleOne calls, exactly like today's runOnePair
    (run.ts:206). Both must complete before the pair's resolve fires.
  • Sequential mode shuffles the two group starts per iteration (same as
    today's sampleOne loop at run.ts:196-201), just expressed as two
    submitSingle calls in shuffled order within each iteration.
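The barrier semantics can be checked in isolation. A sketch modeled on the description of runOnePair, not on its actual code: runPairSketch and the two sampler thunks are hypothetical, and the random parameter exists only so the start order can be fixed in tests:

```typescript
async function runPairSketch<T>(
  sampleControl: () => Promise<T>,
  sampleExperiment: () => Promise<T>,
  random: () => number = Math.random,
): Promise<{ control: Awaited<T>; experiment: Awaited<T> }> {
  const controlFirst = random() < 0.5;
  // Kick off both samples back to back; only the start order is shuffled.
  const [first, second] = controlFirst
    ? [sampleControl(), sampleExperiment()]
    : [sampleExperiment(), sampleControl()];
  // Barrier: allSettled waits for BOTH, even if one rejects.
  const [r1, r2] = await Promise.allSettled([first, second]);
  if (r1.status === 'rejected') throw r1.reason;
  if (r2.status === 'rejected') throw r2.reason;
  // Undo the shuffle so results stay aligned as { control, experiment }.
  return controlFirst
    ? { control: r1.value, experiment: r2.value }
    : { control: r2.value, experiment: r1.value };
}
```

Using Promise.allSettled rather than Promise.all matters here: if one sample fails, the pair still waits for the other to finish, so neither Chrome is mid-sample when the slots are released.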

Verification

Smoke:

cd demo-ecommerce
yarn shaka-perf twins-start       # if not already up
yarn build
yarn shaka-perf compare --parallelism 4 --numberOfMeasurements 5

Check compare-results/report.html renders and each test's measurement
count equals numberOfMeasurements.

Cross-test parallelism: With at least 2 tests discovered and parallelism >= 4,
add a one-line console.log(testKey, iteration) in runOneTest and
confirm log lines interleave across testKeys (today they never interleave).
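The interleaving check can also be scripted rather than eyeballed. A hypothetical helper that takes the testKey of each captured log line, in order, and reports whether any key reappears after another key's lines have started:

```typescript
// True if some testKey shows up again after its contiguous run was interrupted
// by another key. A one-test-at-a-time run yields one block per key, so false.
function interleaves(keys: readonly string[]): boolean {
  const closed = new Set<string>(); // keys whose contiguous run has ended
  for (let i = 0; i < keys.length; i++) {
    if (closed.has(keys[i])) return true;
    if (i + 1 < keys.length && keys[i + 1] !== keys[i]) closed.add(keys[i]);
  }
  return false;
}
```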

Paired-stats integrity: Run the noise-resilience jest suite
unchanged:

cd packages/shaka-perf
yarn jest noise-resilience

It exercises both sampling modes across par=1 and par=3. If paired
alignment or shuffling broke, p-values degrade visibly (see Q1-Q7
predictions in NOISE_RESISTANT_PERF_TESTS_STUDY.md).

Sequential warning: Add perf: { samplingMode: 'sequential' } to a
local abtests.config.ts and verify the deprecation message prints to
stderr on shaka-perf compare.

Disposal: No orphan Chrome processes after compare exits
(ps aux | grep -i chrome).

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type


    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
