-
Notifications
You must be signed in to change notification settings - Fork 1
Open
Description
Summary
Add planner-wide LOC bound support so pr-split can target both a lower and upper group-size bound instead of treating max_loc as a warning-only hint.
This should be framed as a split-planning capability across all backends, not only as an LLM refinement feature. The current system supports llm, graph, and cp_sat partitioning, and size-bound behavior should fit that architecture.
Why this matters
pr-split is explicitly trying to produce small, reviewable PRs. Today:
max_locis a soft limit and only emits warnings.- There is no first-class
min_loc. - Scoring already penalizes both LOC overflow and very small groups, but those are optimization pressures, not user-visible controls.
That leaves a gap between the product goal and the actual controls users can rely on.
Current state
splitexposes--max-loc, but not--min-loc.- Validation only warns when a group exceeds
max_loc. - Scoring penalizes overflow and tiny groups, but does not enforce user-supplied bounds.
- The planner entry point routes through multiple backends:
llmgraphcp_sat
Because of that, the problem should be defined as bound-aware planning, with backend-specific implementations.
Proposed behavior
User-facing controls
- Add
--min-locwith a conservative default orNone/disabled-by-default behavior. - Keep
--max-locas the upper target bound. - Add
--strict-loc-boundsto fail instead of proceeding when a final plan still violates bounds. - Add
--max-refinement-iterationsonly for backends that actually use iterative repair.
Validation and plan metadata
- Add
min_locto runtime settings and persisted split-plan metadata. - Validate
min_loc < max_locwhen both are set. - Introduce structured LOC bound violation detection:
- undersized group
- oversized group
- Report violations consistently in logs and CLI output.
Planner behavior
- The final plan should be evaluated against both lower and upper size bounds.
- Backend support should be staged:
cp_sat: extend the objective/constraints to prefer or enforce the lower bound in the optimizer itself.graph: add a deterministic repair/merge pass for undersized groups and better resistance to oversized groups.llm: add an optional refinement loop that repairs violations after the initial plan.
- If a backend cannot satisfy the bounds exactly, it should return the best plan it can plus explicit violation reporting.
- Strict failure should be opt-in, not the default.
Recommended implementation plan
Phase 1: shared configuration and validation
- Add:
min_loc- optional strict-bound mode
- optional refinement-iteration cap
- Persist
min_locinSplitPlan. - Add cross-field validation for
min_locandmax_loc. - Replace warning-only overflow checks with structured bound-violation detection that supports both underflow and overflow.
- Update logs, CLI summaries, and tests.
Phase 2: scoring and shared helpers
- Update plan metrics so lower-bound violations are visible in scoring and logs.
- Introduce shared helpers for:
- detecting violations
- formatting violation summaries
- deciding whether a plan is acceptable, warning-only, or strict-fail
Phase 3: backend-specific support
cp_sat
- Prefer implementing lower-bound handling in the model itself, since this backend already owns the most formal objective.
- Possible options:
- soft underflow penalty variables
- optional hard lower-bound constraints where feasible
graph
- Add a post-partition repair pass that merges or reassigns undersized groups where doing so does not create invalid overlap or pathological scatter.
- Preserve the backend’s deterministic behavior.
llm
- Add an optional repair loop that:
- detects current violations
- supplies the current plan plus focused violation context
- asks for a complete revised plan
- repeats until convergence or iteration limit
- This loop should be one backend-specific repair strategy, not the core definition of the feature.
Non-goals
- Do not make iterative LLM refinement the only implementation path.
- Do not require exact bound satisfaction by default when the diff structure makes that unrealistic.
- Do not introduce a fixed
min_locdefault that is obviously disconnected frommax_locwithout justification.
Open questions
- Should
min_locdefault to disabled, or to a derived value such as a fraction ofmax_loc? - Should strict mode fail on any remaining violation, or only when the planner never improved from the initial plan?
- For
cp_sat, do we want soft lower-bound penalties first, or full hard bounds immediately? - For
graph, is a merge-only repair pass sufficient, or do we need local reassignment as well?
Acceptance criteria
- Users can configure lower and upper LOC bounds.
- The plan model persists the configured bounds needed to understand later plan state.
- Validation can identify both undersized and oversized groups.
- Each backend has a clear strategy for handling bounds, even if the implementation lands in phases.
- CLI/log output clearly distinguishes:
- satisfied bounds
- warning-only violations
- strict-mode failures
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels