Skip to content

feat: add planner-wide LOC bounds with backend-specific repair #6

@vitali87

Description

@vitali87

Summary

Add planner-wide LOC bound support so pr-split can target both a lower and upper group-size bound instead of treating max_loc as a warning-only hint.

This should be framed as a split-planning capability across all backends, not only as an LLM refinement feature. The current system supports llm, graph, and cp_sat partitioning, and size-bound behavior should fit that architecture.

Why this matters

pr-split is explicitly trying to produce small, reviewable PRs. Today:

  • max_loc is a soft limit and only emits warnings.
  • There is no first-class min_loc.
  • Scoring already penalizes both LOC overflow and very small groups, but those are optimization pressures, not user-visible controls.

That leaves a gap between the product goal and the actual controls users can rely on.

Current state

  • split exposes --max-loc, but not --min-loc.
  • Validation only warns when a group exceeds max_loc.
  • Scoring penalizes overflow and tiny groups, but does not enforce user-supplied bounds.
  • The planner entry point routes through multiple backends:
    • llm
    • graph
    • cp_sat

Because of that, the problem should be defined as bound-aware planning, with backend-specific implementations.

Proposed behavior

User-facing controls

  1. Add --min-loc with a conservative default or None/disabled-by-default behavior.
  2. Keep --max-loc as the upper target bound.
  3. Add --strict-loc-bounds to fail instead of proceeding when a final plan still violates bounds.
  4. Add --max-refinement-iterations only for backends that actually use iterative repair.

Validation and plan metadata

  1. Add min_loc to runtime settings and persisted split-plan metadata.
  2. Validate min_loc < max_loc when both are set.
  3. Introduce structured LOC bound violation detection:
    • undersized group
    • oversized group
  4. Report violations consistently in logs and CLI output.

Planner behavior

  1. The final plan should be evaluated against both lower and upper size bounds.
  2. Backend support should be staged:
    • cp_sat: extend the objective/constraints to prefer or enforce the lower bound in the optimizer itself.
    • graph: add a deterministic repair/merge pass for undersized groups and better resistance to oversized groups.
    • llm: add an optional refinement loop that repairs violations after the initial plan.
  3. If a backend cannot satisfy the bounds exactly, it should return the best plan it can plus explicit violation reporting.
  4. Strict failure should be opt-in, not the default.

Recommended implementation plan

Phase 1: shared configuration and validation

  • Add:
    • min_loc
    • optional strict-bound mode
    • optional refinement-iteration cap
  • Persist min_loc in SplitPlan.
  • Add cross-field validation for min_loc and max_loc.
  • Replace warning-only overflow checks with structured bound-violation detection that supports both underflow and overflow.
  • Update logs, CLI summaries, and tests.

Phase 2: scoring and shared helpers

  • Update plan metrics so lower-bound violations are visible in scoring and logs.
  • Introduce shared helpers for:
    • detecting violations
    • formatting violation summaries
    • deciding whether a plan is acceptable, warning-only, or strict-fail

Phase 3: backend-specific support

cp_sat

  • Prefer implementing lower-bound handling in the model itself, since this backend already owns the most formal objective.
  • Possible options:
    • soft underflow penalty variables
    • optional hard lower-bound constraints where feasible

graph

  • Add a post-partition repair pass that merges or reassigns undersized groups where doing so does not create invalid overlap or pathological scatter.
  • Preserve the backend’s deterministic behavior.

llm

  • Add an optional repair loop that:
    • detects current violations
    • supplies the current plan plus focused violation context
    • asks for a complete revised plan
    • repeats until convergence or iteration limit
  • This loop should be one backend-specific repair strategy, not the core definition of the feature.

Non-goals

  • Do not make iterative LLM refinement the only implementation path.
  • Do not require exact bound satisfaction by default when the diff structure makes that unrealistic.
  • Do not introduce a fixed min_loc default that is obviously disconnected from max_loc without justification.

Open questions

  • Should min_loc default to disabled, or to a derived value such as a fraction of max_loc?
  • Should strict mode fail on any remaining violation, or only when the planner never improved from the initial plan?
  • For cp_sat, do we want soft lower-bound penalties first, or full hard bounds immediately?
  • For graph, is a merge-only repair pass sufficient, or do we need local reassignment as well?

Acceptance criteria

  • Users can configure lower and upper LOC bounds.
  • The plan model persists the configured bounds needed to understand later plan state.
  • Validation can identify both undersized and oversized groups.
  • Each backend has a clear strategy for handling bounds, even if the implementation lands in phases.
  • CLI/log output clearly distinguishes:
    • satisfied bounds
    • warning-only violations
    • strict-mode failures

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions