
feat: add shared LOC bounds configuration and validation#28

Merged
vitali87 merged 2 commits into main from feat/loc-bounds-foundation
Mar 24, 2026
Conversation

@vitali87
Owner

Summary

  • add shared min/max LOC bounds configuration and persisted plan metadata
  • add structured LOC bound violation detection plus strict split-time enforcement
  • expose underflow metrics and document the new CLI/env options

Part of #6.

Testing

  • uv run ruff check pr_split tests
  • uv run pytest -q

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the pr-split tool's ability to manage the size of generated sub-PRs. It introduces granular control over lines of code (LOC) by allowing users to specify both minimum and maximum bounds. Furthermore, it provides a strict enforcement mechanism for these bounds and improves the reporting of plan metrics to include underflow, offering a more comprehensive understanding of the split plan's adherence to desired size constraints.

Highlights

  • Lines of Code (LOC) Bounds Configuration: Introduced new CLI options (--min-loc, --strict-loc-bounds) and environment variables (PR_SPLIT_MIN_LOC, PR_SPLIT_STRICT_LOC_BOUNDS) to allow users to define both minimum and maximum LOC bounds for sub-PRs, enhancing control over PR size.
  • Strict LOC Bound Enforcement: Implemented a strict mode (--strict-loc-bounds) that causes the pr-split tool to exit if the final plan violates the configured LOC bounds, ensuring adherence to defined limits.
  • Structured LOC Violation Detection and Metrics: Added structured detection for LOC bound violations, distinguishing between groups below the minimum and above the maximum. New 'underflow' metrics are now exposed in the plan summary, providing better insights into PR sizing.
  • Persisted Plan Metadata: The new min_loc and strict_loc_bounds settings are now persisted as part of the split plan metadata, ensuring consistency across runs and better traceability.
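To make the "underflow" idea above concrete, here is a hypothetical sketch (not the PR's actual scoring code): it assumes loc_underflow is the total LOC shortfall below min_loc and undersized_groups is the count of groups under that bound. The GroupStats type and underflow_metrics name are illustrative inventions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class GroupStats:
    """Hypothetical stand-in for a planned sub-PR group."""
    name: str
    estimated_loc: int

def underflow_metrics(groups: List[GroupStats], min_loc: Optional[int]) -> Tuple[int, int]:
    """Return (loc_underflow, undersized_groups) under the assumed definitions:
    total LOC shortfall below min_loc, and the number of groups under it."""
    if min_loc is None:
        # No lower bound configured: nothing can underflow.
        return 0, 0
    shortfalls = [min_loc - g.estimated_loc for g in groups if g.estimated_loc < min_loc]
    return sum(shortfalls), len(shortfalls)

groups = [GroupStats("core", 120), GroupStats("docs", 15), GroupStats("tests", 40)]
total_underflow, undersized = underflow_metrics(groups, min_loc=50)
# docs is 35 LOC short and tests is 10 LOC short, so (45, 2)
```

Folding both numbers into the plan summary gives reviewers a quick read on whether sub-PRs are too small, mirroring the existing overflow treatment.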
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@greptile-apps

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR adds shared min_loc/max_loc/strict_loc_bounds configuration to the Settings model, persists those bounds in the saved SplitPlan, and introduces structured LOC-bound violation detection with a new LocBoundViolation named-tuple. Violations are now surfaced as either non-fatal warnings (default) or hard exits (--strict-loc-bounds).

Key changes:

  • Settings / config.py: adds min_loc (Optional[int], ge=1), strict_loc_bounds, and a validate_loc_bounds model validator that enforces min_loc < max_loc; all ValueError-raising validators now correctly surface as pydantic.ValidationError and are caught accordingly in both cli.py and tests/test_config.py.
  • validator.py: splits validate_loc_bounds into detect_loc_bound_violations (returns structured LocBoundViolation objects) + formatting helpers, adding BELOW_MIN detection alongside the existing ABOVE_MAX path.
  • scoring.py: adds loc_underflow and undersized_groups to PlanMetrics and folds them into the objective function. loc_underflow is logged in PLAN_METRICS; undersized_groups is computed but not yet surfaced in the log message.
  • cli.py: new _handle_loc_bound_warnings helper centralises warn-vs-exit logic; Settings construction is wrapped in try/(ValidationError, ValueError) for a clean CLI error message.
  • Tests: comprehensive coverage added for all new paths, including env-var wiring, strict-mode exit, post-edit re-validation, and plan serialisation round-trip.
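The structured detection described above can be sketched roughly as follows. The LocBoundViolation, BELOW_MIN, ABOVE_MAX, and detect_loc_bound_violations names come from this summary, but the field names, input shape, and signature here are assumptions, not the PR's actual code:

```python
from enum import Enum
from typing import List, NamedTuple, Optional, Tuple

class BoundType(str, Enum):
    BELOW_MIN = "below_min"
    ABOVE_MAX = "above_max"

class LocBoundViolation(NamedTuple):
    """Structured violation record; string formatting is a separate concern."""
    group_name: str
    estimated_loc: int
    bound_type: BoundType
    bound_value: int

def detect_loc_bound_violations(
    groups: List[Tuple[str, int]],
    max_loc: int,
    min_loc: Optional[int] = None,
) -> List[LocBoundViolation]:
    """Return structured violations; rendering them as warnings (or exiting
    in strict mode) is left to separate helpers."""
    violations = []
    for name, loc in groups:
        if min_loc is not None and loc < min_loc:
            violations.append(LocBoundViolation(name, loc, BoundType.BELOW_MIN, min_loc))
        elif loc > max_loc:  # elif keeps the two bounds mutually exclusive per group
            violations.append(LocBoundViolation(name, loc, BoundType.ABOVE_MAX, max_loc))
    return violations

found = detect_loc_bound_violations([("docs", 10), ("core", 900)], max_loc=500, min_loc=50)
```

Returning structured tuples rather than preformatted strings is what lets the CLI choose between logging warnings and a strict-mode exit from the same detection pass.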

Confidence Score: 4/5

  • PR is safe to merge; the one logic-level gap (undersized_groups never logged) is a missing observability feature, not a correctness defect.
  • All previously flagged issues (duplicate warning output, ValueError vs ValidationError, help-text inconsistency) have been resolved. The ValidationError import and catch are correct. _handle_loc_bound_warnings behaves correctly in both branches and the typer.Exit it raises correctly escapes the surrounding except PRSplitError handler. One minor gap: undersized_groups is computed in score_plan and returned in PlanMetrics but is never logged or otherwise surfaced, unlike loc_underflow which was correctly wired into the log template. The test mock in test_client.py is also missing this field, which will become a latent AttributeError once the log call is updated.
  • In pr_split/planner/scoring.py and pr_split/planner/client.py, the undersized_groups metric needs to be added to the PLAN_METRICS log format to match the treatment of loc_underflow.

Important Files Changed

Filename Overview
pr_split/cli.py Adds --min-loc, --max-loc re-wiring and --strict-loc-bounds CLI options; properly catches ValidationError from Settings construction; introduces _handle_loc_bound_warnings helper used consistently in both initial and post-edit validation paths.
pr_split/config.py Adds min_loc, strict_loc_bounds fields and validate_loc_bounds model validator; correctly uses Field(ge=1) on the Optional[int] min_loc field.
pr_split/planner/scoring.py Introduces loc_underflow and undersized_groups metrics. loc_underflow is correctly added to the log format; undersized_groups is computed but never logged or surfaced anywhere — likely an oversight.
pr_split/planner/validator.py Refactors to separate structured LocBoundViolation detection from string formatting; adds BELOW_MIN violation type alongside the existing ABOVE_MAX path.
pr_split/schemas.py Persists min_loc and strict_loc_bounds into SplitPlan; backward-compatible defaults (None and False) preserve existing serialised plans.
tests/test_client.py Mock for score_plan was updated to add loc_underflow but is missing the new undersized_groups attribute; will cause AttributeError once that field is wired into the log call.
tests/test_config.py All existing tests updated to pytest.raises(ValidationError, ...) correctly; new TestSettingsLocBoundsValidation class covers the new min/max validation logic comprehensively.
tests/test_cli_new_features.py Adds well-structured tests for _handle_loc_bound_warnings and the new CLI env-var/strict-mode paths; correctly asserts save_plan is not called when strict mode exits early.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["CLI: split()"] --> B["Build Settings\n(min_loc, max_loc, strict_loc_bounds)"]
    B -->|"ValidationError"| C["Print error & Exit(1)"]
    B --> D["plan_split(parsed_diff, settings)"]
    D --> E["score_plan(groups, max_loc, min_loc)\n→ PlanMetrics incl. loc_underflow, undersized_groups"]
    E --> F["validate_plan(groups, …, max_loc, min_loc)"]
    F --> G["detect_loc_bound_violations()\n→ list[LocBoundViolation]"]
    G --> H["validate_loc_bounds()\n→ list[str warnings]"]
    H --> I["_handle_loc_bound_warnings(warnings, strict_loc_bounds)"]
    I -->|"strict=True & warnings"| J["Print violations & Exit(1)"]
    I -->|"strict=False or no warnings"| K["logger.warning each\n(non-fatal)"]
    K --> L["_interactive_edit()"]
    L --> M["Re-validate (same path F→I)"]
    M -->|"Exit(1)"| N["Abort — save_plan NOT called"]
    M -->|"OK"| O["SplitPlan(min_loc, max_loc, strict_loc_bounds)\nsave_plan / create PRs"]
```
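The warn-vs-exit branch in the flowchart can be sketched as below. This is a standalone approximation: the real helper is _handle_loc_bound_warnings and raises typer.Exit; this stdlib-only sketch raises SystemExit instead, and the console printing is simplified to print().

```python
import logging
from typing import List

logger = logging.getLogger("pr_split.sketch")

def handle_loc_bound_warnings(warnings: List[str], strict_loc_bounds: bool) -> None:
    """Central warn-vs-exit policy for LOC bound violations.

    Strict mode: surface every violation and abort with exit code 1, so
    save_plan is never reached. Non-strict: log each warning and continue.
    """
    if not warnings:
        return
    if strict_loc_bounds:
        for message in warnings:
            print(message)  # the real helper prints via the CLI's console
        raise SystemExit(1)  # stand-in for typer.Exit(1)
    for message in warnings:
        logger.warning(message)
```

Centralising this in one helper is what lets the initial validation and the post-edit re-validation paths share identical behaviour, as the flowchart's F→I loop shows.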

Comments Outside Diff (3)

  1. pr_split/planner/scoring.py, line 52-57 (link)

    undersized_groups computed but never logged

    undersized_groups is a new field added to PlanMetrics alongside loc_underflow, but only loc_underflow was wired into the PLAN_METRICS log template (logs.py) and the logger.info call in client.py. undersized_groups is computed here and returned in the dataclass but is never surfaced to the user anywhere — not in logs, not in console output. This is inconsistent with how loc_underflow was introduced (it was both added to the dataclass and to the log message).

    To surface it, update logs.PLAN_METRICS and client.py's logger call:

```python
# logs.py
PLAN_METRICS = (
    "Plan metrics: groups={groups}, max_group_loc={max_loc}, underflow={underflow}, "
    "undersized={undersized}, overflow={overflow}, width={width}, depth={depth}, "
    "scatter={scatter}, objective={objective}"
)
```

```python
# client.py
logger.info(
    logs.PLAN_METRICS.format(
        ...
        underflow=metrics.loc_underflow,
        undersized=metrics.undersized_groups,
        overflow=metrics.loc_overflow,
        ...
    )
)
```
  2. tests/test_client.py, line 237-250 (link)

    Mock missing undersized_groups attribute

    The score_plan mock return value was updated to add loc_underflow, but is missing the equally new undersized_groups field that was added to PlanMetrics in this PR. Currently this doesn't cause a test failure because client.py's logger call doesn't access metrics.undersized_groups, but once undersized_groups is wired into the log call (see the scoring.py comment), the test will raise AttributeError at runtime.

  3. pr_split/planner/validator.py, line 68-97 (link)

    A group can generate two violations simultaneously

    detect_loc_bound_violations appends a BELOW_MIN violation when estimated_loc < min_loc, and then separately appends an ABOVE_MAX violation when estimated_loc > max_loc. These two conditions are mutually exclusive only because the Settings validator enforces min_loc < max_loc. If that constraint ever changes or if a caller bypasses Settings, a single group could appear in the returned list twice — once for each bound — producing duplicate warning messages.

    An explicit elif would make the mutual exclusivity self-documenting and defensive:

```python
for group in groups:
    if min_loc is not None and group.estimated_loc < min_loc:
        violations.append(...)
    elif group.estimated_loc > max_loc:
        violations.append(...)
```
Suggested fix for tests/test_client.py (lines 237-250), adding the missing undersized_groups field to the score_plan mock:
```suggestion
        mock_score.return_value = type(
            "Metrics",
            (),
            {
                "total_groups": 1,
                "max_group_loc": 1,
                "loc_underflow": 0,
                "loc_overflow": 0,
                "dag_width": 1,
                "dag_depth": 1,
                "file_scatter": 0,
                "undersized_groups": 0,
                "objective": 1,
            },
        )()
```


Reviews (4). Last reviewed commit: "fix: align loc bounds CLI with validatio..."

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@vitali87 vitali87 force-pushed the feat/loc-bounds-foundation branch from b55b07d to 8591f42 Compare March 24, 2026 10:58
@vitali87
Owner Author

@greptileai review

@vitali87 vitali87 merged commit 4c6c727 into main Mar 24, 2026
4 checks passed