
feat: per-queue resource awareness with measured CPU/memory estimates #18

Merged: sylvesterdamgaard merged 12 commits into main from per-queue-resource-aware on Apr 30, 2026

sylvesterdamgaard commented Apr 30, 2026

Summary

Resolves #14: Phase 1 plus the Phase 2 foundation for per-queue, resource-aware capacity calculations.

  • New ResourceEstimate value object: carries per-worker CPU-core and memory (MB) estimates, with per-dimension source metadata (Measured, Config, Default) and sample counts
  • New ResourceEstimateResolver singleton: applies a three-source precedence chain (measured runtime data > per-queue config override > global default); each dimension (CPU, memory) resolves independently
  • CapacityCalculator refactored: calculateMaxWorkers() now requires a ResourceEstimate parameter instead of reading from internal state; memory calculations are now adaptive (previously they used static config only)
  • AutoscaleManager writes per-queue measured estimates: replaces the old updateMeasuredWorkerCpuEstimate() (a global weighted average across all jobs) with updateMeasuredResourceEstimates(), which groups by connection:queue and tracks both CPU and memory
  • Per-queue resources config: operators can declare cpu_cores and memory_mb per queue as cold-start fallbacks until measured data is available
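Below is a minimal sketch of what the ResourceEstimate value object described above might look like. It assumes PHP 8.1 (readonly promoted properties and a backed enum). The EstimateSource enum, the property names, the positivity check and isFullyMeasured() are hypothetical illustrations, not the package's actual API. What the sketch shows is that each dimension carries its own source and its own sample count.

```php
<?php

declare(strict_types=1);

// Hypothetical enum: records where a single dimension's value came from.
enum EstimateSource: string
{
    case Measured = 'measured';
    case Config = 'config';
    case Default = 'default';
}

// Sketch of an immutable per-worker estimate. Property names are assumptions,
// not the package's real API.
final class ResourceEstimate
{
    public function __construct(
        public readonly float $cpuCores,
        public readonly float $memoryMb,
        public readonly EstimateSource $cpuSource,
        public readonly EstimateSource $memorySource,
        public readonly int $cpuSamples = 0,
        public readonly int $memorySamples = 0,
    ) {
        // Reject non-positive values so downstream capacity math never divides by zero.
        if ($cpuCores <= 0.0 || $memoryMb <= 0.0) {
            throw new InvalidArgumentException('Per-worker estimates must be positive.');
        }
    }

    // True only when both dimensions come from runtime measurements.
    public function isFullyMeasured(): bool
    {
        return $this->cpuSource === EstimateSource::Measured
            && $this->memorySource === EstimateSource::Measured;
    }
}

// Usage: CPU measured from 120 samples, while memory still comes from the per-queue config.
$estimate = new ResourceEstimate(0.25, 512.0, EstimateSource::Measured, EstimateSource::Config, 120);
```

Keeping the source per dimension, rather than per estimate, is what allows a partially measured queue to be represented honestly.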

What this enables

For heterogeneous workloads (e.g. fast 50 MB jobs vs slow 2 GB jobs on different queues), the autoscaler now uses the actual resource shape of each queue for capacity math instead of flattening everything to a single global average. This prevents over-provisioning fast queues and under-provisioning heavy queues.
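Here is a hypothetical worked example of the heterogeneous-workload argument. It assumes capacity is the tighter of two quotients: free CPU divided by per-worker CPU, and free memory divided by per-worker memory. The function name maxWorkersFor() and this formula are illustrative assumptions. The real calculateMaxWorkers() may also account for reserves, system metrics or limits.

```php
<?php

declare(strict_types=1);

// Hypothetical pure capacity function: how many workers of a given per-worker
// shape fit into the currently free CPU and memory. Not the package's formula.
function maxWorkersFor(
    float $freeCpuCores,
    float $freeMemoryMb,
    float $cpuPerWorker,
    float $memoryMbPerWorker,
): int {
    if ($cpuPerWorker <= 0.0 || $memoryMbPerWorker <= 0.0) {
        throw new InvalidArgumentException('Per-worker estimates must be positive.');
    }

    $byCpu = (int) floor($freeCpuCores / $cpuPerWorker);
    $byMemory = (int) floor($freeMemoryMb / $memoryMbPerWorker);

    // The tighter dimension caps the worker count.
    return max(0, min($byCpu, $byMemory));
}

// Same host budget for every case: 8 free cores and 16 GB (16384 MB) free memory.
$fastQueue = maxWorkersFor(8.0, 16384.0, 0.25, 50.0);    // light, short jobs
$heavyQueue = maxWorkersFor(8.0, 16384.0, 0.5, 2048.0);  // memory-heavy jobs
$flattened = maxWorkersFor(8.0, 16384.0, 0.375, 1049.0); // one global average of both shapes
```

Suppose both queues are sized from one flattened average of 0.375 cores and 1049 MB. Each queue then gets 15 workers. The fast queue is held well below the 32 that would actually fit. The heavy queue is allowed nearly twice the 8 workers its memory footprint supports.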

Architecture

ResourceEstimateResolver (singleton)
  ├─ AutoscaleManager → writes per-queue measured data each tick
  ├─ ScalingEngine    → reads resolved estimate before capacity call
  └─ CapacityCalculator → pure function of (workers, estimate, system metrics)
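The per-tick write path in the tree above can be sketched as follows. Job samples are grouped by connection:queue, and CPU and memory are averaged separately, instead of folding every job into one global average. This is a hypothetical illustration. The sample array shape and aggregatePerQueue() are assumptions, and a plain mean stands in for whatever weighting updateMeasuredResourceEstimates() actually applies.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical per-queue aggregation: groups job samples by "connection:queue"
 * and averages CPU and memory independently. The sample shape is an assumption.
 *
 * @param list<array{connection: string, queue: string, cpu_cores: float, memory_mb: float}> $samples
 * @return array<string, array{cpu_cores: float, memory_mb: float, samples: int}>
 */
function aggregatePerQueue(array $samples): array
{
    $totals = [];

    // Accumulate per-queue sums and counts.
    foreach ($samples as $sample) {
        $key = $sample['connection'] . ':' . $sample['queue'];
        $totals[$key] ??= ['cpu' => 0.0, 'memory' => 0.0, 'count' => 0];
        $totals[$key]['cpu'] += $sample['cpu_cores'];
        $totals[$key]['memory'] += $sample['memory_mb'];
        $totals[$key]['count']++;
    }

    // Turn sums into per-worker means and keep the sample count.
    $estimates = [];
    foreach ($totals as $key => $t) {
        $estimates[$key] = [
            'cpu_cores' => $t['cpu'] / $t['count'],
            'memory_mb' => $t['memory'] / $t['count'],
            'samples' => $t['count'],
        ];
    }

    return $estimates;
}

$estimates = aggregatePerQueue([
    ['connection' => 'redis', 'queue' => 'fast', 'cpu_cores' => 0.25, 'memory_mb' => 40.0],
    ['connection' => 'redis', 'queue' => 'fast', 'cpu_cores' => 0.25, 'memory_mb' => 60.0],
    ['connection' => 'redis', 'queue' => 'heavy', 'cpu_cores' => 0.5, 'memory_mb' => 2048.0],
]);
```

Because the key includes the connection, two queues with the same name on different connections do not share an estimate.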

Resolution order per dimension: measured → queue.resources config → limits.worker_*_estimate
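Here is a hypothetical sketch of the per-dimension resolution order described above. EstimateResolverSketch, recordMeasured() and keying the config by "connection:queue" are illustrative assumptions. The sketch also passes the queue.resources overrides and the limits.worker_*_estimate defaults in as plain arrays instead of reading config. Each dimension walks the chain on its own, so a queue can have measured CPU while its memory still comes from config.

```php
<?php

declare(strict_types=1);

// Hypothetical resolver: each dimension independently takes the first available
// source in the order measured -> per-queue config -> global default.
final class EstimateResolverSketch
{
    /** @var array<string, array<string, float>> measured values keyed by "connection:queue" */
    private array $measured = [];

    /**
     * @param array<string, array<string, float>> $queueConfig per-queue overrides keyed by "connection:queue"
     * @param array{cpu_cores: float, memory_mb: float} $defaults global fallbacks
     */
    public function __construct(
        private readonly array $queueConfig,
        private readonly array $defaults,
    ) {}

    public function recordMeasured(string $key, string $dimension, float $value): void
    {
        $this->measured[$key][$dimension] = $value;
    }

    /** @return array<string, array{value: float, source: string}> */
    public function resolve(string $connection, string $queue): array
    {
        $key = "{$connection}:{$queue}";
        $resolved = [];

        foreach (['cpu_cores', 'memory_mb'] as $dimension) {
            $resolved[$dimension] = match (true) {
                isset($this->measured[$key][$dimension]) => ['value' => $this->measured[$key][$dimension], 'source' => 'measured'],
                isset($this->queueConfig[$key][$dimension]) => ['value' => (float) $this->queueConfig[$key][$dimension], 'source' => 'config'],
                default => ['value' => $this->defaults[$dimension], 'source' => 'default'],
            };
        }

        return $resolved;
    }
}

$resolver = new EstimateResolverSketch(
    queueConfig: ['redis:heavy' => ['memory_mb' => 2048.0]],
    defaults: ['cpu_cores' => 0.5, 'memory_mb' => 256.0],
);
$resolver->recordMeasured('redis:heavy', 'cpu_cores', 0.9);

$heavy = $resolver->resolve('redis', 'heavy'); // CPU measured, memory from config
$fast = $resolver->resolve('redis', 'fast');   // nothing known yet: both defaults
```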

Not included (future phases)

  • Phase 3: resource-aware bin-packing in distributeClusterTarget(); the resolver and per-queue estimates are already in place, so distribution only needs to consume them
  • Confidence-based fallback (e.g. ignore measured data when sampleCount < 50)
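The proposed confidence-based fallback could work roughly like the gate below. It is not part of this PR. pickWithConfidence() is a hypothetical helper, and its default threshold of 50 simply mirrors the example in the bullet above. When too few samples back a measured value, the gate hands over to the next source in the chain.

```php
<?php

declare(strict_types=1);

// Hypothetical gate for the proposed future phase: trust a measured value only
// once it is backed by at least $minSamples samples; otherwise use $fallback
// (the per-queue config or global default from the precedence chain).
function pickWithConfidence(?float $measured, int $sampleCount, float $fallback, int $minSamples = 50): float
{
    if ($measured !== null && $sampleCount >= $minSamples) {
        return $measured;
    }

    return $fallback;
}

$tooFewSamples = pickWithConfidence(1.2, 10, 0.5);  // falls back to 0.5
$enoughSamples = pickWithConfidence(1.2, 50, 0.5);  // uses the measured 1.2
$noMeasurement = pickWithConfidence(null, 100, 0.5); // falls back to 0.5
```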

Test plan

  • 525 tests passing (1473 assertions)
  • PHPStan clean (1 pre-existing property.unusedType warning, not introduced by this PR)
  • Pint clean
  • New unit tests for ResourceEstimate, ResourceEstimateResolver (11 tests covering all precedence paths, clamping, partial measurement, queue isolation, reset)
  • Integration tests verifying heterogeneous workloads produce different capacity results
  • All existing CapacityCalculator, ScalingEngine, policy, cluster, and simulation tests updated and passing

sylvesterdamgaard and others added 12 commits April 30, 2026 09:18
Add AutoscaleConfiguration::queueResources() static method to read per-queue
CPU/memory resource overrides from config, with tests and updated config docs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oves internal estimate state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…city calculation

Injects ResourceEstimateResolver into ScalingEngine and calls resolve() with
the queue's connection/queue name in evaluate(), passing the resulting
ResourceEstimate to CapacityCalculator::calculateMaxWorkers(). Updates all
test consumers (unit, feature, simulation) to supply the new resolver argument.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lt()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nto resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…solver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sylvesterdamgaard merged commit 3e130f9 into main on Apr 30, 2026
19 checks passed
Development

Successfully merging this pull request may close these issues.

Per-queue resource awareness: track and use heterogeneous CPU/memory estimates from queue-metrics

1 participant