
feat: add graph min-loc repair pass #29

Merged
vitali87 merged 1 commit into main from feat/graph-loc-bounds-repair
Mar 27, 2026

feat: add graph min-loc repair pass#29
vitali87 merged 1 commit intomainfrom
feat/graph-loc-bounds-repair

Conversation

@vitali87 (Owner)

Summary

  • add a deterministic graph-backend repair pass for undersized groups when min_loc is set
  • keep the graph backend's max bound hard by only merging groups when the combined load still fits within max_loc
  • avoid merging same-file groups when doing so would interleave hunk order, and add focused graph repair tests
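The merge rules in these bullets can be sketched as a toy greedy loop. This is illustrative only: the real pass also weighs graph affinity, anchor positions, and hunk order, and the function and variable names here are invented for the sketch.

```python
# Toy sketch of a min_loc repair pass: greedily fold the smallest undersized
# group into a target whose combined load still fits under max_loc. This is
# NOT the real _repair_graph_min_loc; names and tie-breaks are simplified.
def repair_min_loc(groups: list[list[int]], min_loc: int, max_loc: int) -> list[list[int]]:
    groups = [list(g) for g in groups]  # each inner list holds per-unit LOC
    while True:
        undersized = sorted(
            (i for i, g in enumerate(groups) if sum(g) < min_loc),
            key=lambda i: (sum(groups[i]), i),  # smallest load first, index as tie-break
        )
        merged = False
        for src in undersized:
            fits = [
                t for t in range(len(groups))
                if t != src and sum(groups[t]) + sum(groups[src]) <= max_loc
            ]
            if not fits:
                continue  # every merge would bust the hard max_loc bound
            tgt = min(fits, key=lambda t: (-sum(groups[t]), t))  # prefer the largest target
            groups[tgt] += groups[src]
            del groups[src]
            merged = True
            break
        if not merged:
            return groups  # no legal merge left; residual undersized groups stay
```

With groups of load 3, 3, and 8 under min_loc=5 and max_loc=10 this folds the two small groups together and leaves the 8-LOC group alone; with two 6-LOC groups under min_loc=7 and max_loc=10 it returns them unchanged, since the only merge would overflow.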

Testing

  • uv run ruff check pr_split tests
  • uv run pytest -q

Part of #6

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the graph-based partitioning strategy by introducing a new repair pass. This pass intelligently merges undersized groups to meet a specified minimum size ("min_loc"), while strictly adhering to the maximum group size ("max_loc") and preserving the contiguous order of changes within shared files. This ensures more robust and well-formed code partitions.

Highlights

  • Graph-backend repair pass: Implemented a deterministic repair pass for the graph-based partitioning strategy to address undersized groups when a "min_loc" (minimum lines of code) is specified.
  • Group merging logic: Ensured that group merging respects the "max_loc" (maximum lines of code) constraint, only combining groups if their total load remains within this limit.
  • Hunk order preservation: Prevented the merging of groups from the same file if doing so would disrupt the original hunk order, and added specific tests for this graph repair logic.
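The hunk-order highlight can be illustrated with a toy contiguity predicate. This is an assumption-laden sketch: hunk positions are modelled as plain integers, and the name mirrors but is not the real _shared_file_merge_is_contiguous.

```python
# Toy contiguity guard: merging two groups that touch the same file is only
# safe if no other group owns a hunk strictly inside the merged position span,
# which would interleave hunk order. Positions are simplified to integers.
def merge_keeps_hunk_order(a: list[int], b: list[int], others: list[list[int]]) -> bool:
    span = sorted(a + b)
    lo, hi = span[0], span[-1]
    return not any(lo < pos < hi for group in others for pos in group)
```

Merging hunks [1, 2] with [3, 4] is fine when a third group holds [5, 6], but merging [1] with [4] while another group holds [2, 3] would interleave, so the guard rejects it.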

@gemini-code-assist bot left a comment
Code Review

This pull request introduces a new feature to the graph-based partitioning strategy, allowing for the repair of undersized groups by merging them based on a min_loc setting. This involves several new helper functions for calculating group load, affinity, and finding the best merge targets, as well as a new _repair_graph_min_loc function. Corresponding tests have been added to verify this new functionality, including its determinism. The review comment points out an inefficiency in the _merge_group_units function due to deep copying and suggests a more performant approach.

Comment on pr_split/planner/partitioning.py, lines +271 to +278
merged_group = sorted(
    grouped_units[target_idx] + grouped_units[source_idx],
    key=lambda unit: unit.position,
)
repaired_groups = [list(group_units) for group_units in grouped_units]
repaired_groups[target_idx] = merged_group
del repaired_groups[source_idx]
return repaired_groups


medium

The current implementation of _merge_group_units is inefficient as it creates a deep copy of all groups ([list(group_units) for group_units in grouped_units]) on every merge. This can be costly if there are many groups. Additionally, using del with indices can be subtle to reason about.

A clearer and more performant approach is to build a new list from scratch, which avoids the expensive copy and the del operation.

Suggested change:

merged_group = sorted(
    grouped_units[target_idx] + grouped_units[source_idx],
    key=lambda unit: unit.position,
)
repaired_groups = []
for i, group in enumerate(grouped_units):
    if i == source_idx:
        continue
    if i == target_idx:
        repaired_groups.append(merged_group)
    else:
        repaired_groups.append(group)
return repaired_groups

@greptile-apps

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR adds a deterministic post-processing repair pass (_repair_graph_min_loc) to the graph partitioning backend that iteratively merges undersized groups (those below min_loc) into their best neighbours, while respecting the hard max_loc ceiling and preserving hunk-order contiguity for shared files. The implementation fits cleanly into partition_diff as a single line after _group_units_graph.

Key changes:

  • Six new helpers: _group_load, _group_anchor_position, _group_affinity, _shared_file_merge_is_contiguous, _best_graph_merge_target, and _merge_group_units, plus the orchestrating _repair_graph_min_loc.
  • The repair loop is deterministic: sources are processed smallest-load-first, and targets are ranked by a stable 6-element key tuple.
  • partition_diff calls _repair_graph_min_loc only for PartitionStrategy.GRAPH; the CP-SAT backend is unaffected.
  • Two new tests (test_min_loc_merges_undersized_groups_when_possible, test_min_loc_repair_is_deterministic) and extended helper signatures for min_loc in the test file.
  • Both new tests exercise only the same simple two-group UNRELATED_DIFF scenario; the tiebreaking logic in _best_graph_merge_target and the blocked-merge graceful-degradation path are not covered by any test.

Confidence Score: 4/5

  • Safe to merge; implementation is logically correct and terminates, but test coverage for edge cases could be strengthened.
  • The core algorithm is correct: the repair loop terminates (bounded by the finite number of groups), all index manipulations are safe (indices are recomputed after each merge), and the Settings validator already rejects min_loc >= max_loc. The contiguity guard prevents hunk-order interleaving. The main gap is test coverage — both new tests use the same trivial two-group scenario, leaving the tiebreaking code path and the permanently-blocked-merge path unexercised. The greedy source-selection order also lacks a rationale comment, which could confuse future maintainers.
  • tests/test_partitioning_extensive.py — new tests duplicate the same scenario and miss important edge cases.

Important Files Changed

  • pr_split/planner/partitioning.py: adds the deterministic graph-backend repair pass (_repair_graph_min_loc) and its new helper functions; the logic is correct and terminates, but the greedy source-selection order lacks a comment explaining its rationale.
  • tests/test_partitioning_extensive.py: adds two graph repair tests and extends helper signatures for min_loc; both tests use the identical two-group UNRELATED_DIFF scenario, leaving the tiebreaking logic and the blocked-merge graceful-degradation path untested.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[partition_diff] --> B[_group_units_graph]
    B --> C[_repair_graph_min_loc]
    C --> D{min_loc set AND\ngroups >= 2?}
    D -- No --> E[Return groups as-is]
    D -- Yes --> F[Sort undersized groups\nby load, anchor, idx]
    F --> G{Any undersized group\nhas valid merge target?}
    G -- No --> H[Return repaired groups]
    G -- Yes --> I[_best_graph_merge_target\nfor first source]
    I --> J{merged_load\n<= max_loc?}
    J -- No --> K[Skip target]
    J -- Yes --> L{_shared_file_merge\n_is_contiguous?}
    L -- No --> K
    L -- Yes --> M{merged_underflow\n< current_underflow?}
    M -- No --> K
    M -- Yes --> N[Score merge via\n6-element key tuple]
    N --> O[_merge_group_units\nsource into target]
    O --> F

Comment on tests/test_partitioning_extensive.py, lines +187 to +197
def test_min_loc_repair_is_deterministic(self, monkeypatch: pytest.MonkeyPatch) -> None:
    settings = _settings(
        monkeypatch,
        max_loc=10,
        min_loc=5,
        partition_strategy=PartitionStrategy.GRAPH,
        priority=Priority.ORTHOGONAL,
    )
    parsed = parse_diff(UNRELATED_DIFF)
    signatures = {_group_signature(partition_diff(parsed, settings)) for _ in range(3)}
    assert len(signatures) == 1

P2 Duplicate test scenario reduces determinism-test value

test_min_loc_repair_is_deterministic uses the exact same UNRELATED_DIFF, max_loc=10, min_loc=5, and ORTHOGONAL priority as test_min_loc_merges_undersized_groups_when_possible. Because UNRELATED_DIFF produces exactly two groups with a single forced merge path, the determinism test doesn't exercise any branching in _best_graph_merge_target. A truly meaningful determinism test would use a diff with three or more groups so there are multiple valid merge candidates and the tiebreaking logic is actually exercised.

Consider replacing UNRELATED_DIFF here with a multi-group fixture, for example a diff with three files each adding 3 LOC (max_loc=10, min_loc=5), which would produce three undersized groups and have at least two possible merge paths to verify against.
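A hedged sketch of such a multi-group fixture follows. The file names and contents are invented, and whether parse_diff splits this into exactly three groups depends on the parser, so treat it as a starting point rather than a drop-in replacement.

```python
# Hypothetical fixture: three unrelated new files, each adding 3 LOC, giving
# three undersized groups under min_loc=5 with more than one legal merge order
# under max_loc=10.
THREE_GROUP_DIFF = "".join(
    f"""diff --git a/mod_{name}.py b/mod_{name}.py
new file mode 100644
--- /dev/null
+++ b/mod_{name}.py
@@ -0,0 +1,3 @@
+def {name}_one(): ...
+def {name}_two(): ...
+def {name}_three(): ...
"""
    for name in ("alpha", "beta", "gamma")
)
```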


Comment on tests/test_partitioning_extensive.py, lines +121 to +133
def test_min_loc_merges_undersized_groups_when_possible(
    self, monkeypatch: pytest.MonkeyPatch
) -> None:
    settings = _settings(
        monkeypatch,
        max_loc=10,
        min_loc=5,
        partition_strategy=PartitionStrategy.GRAPH,
        priority=Priority.ORTHOGONAL,
    )
    groups = partition_diff(parse_diff(UNRELATED_DIFF), settings)
    assert len(groups) == 1
    _assert_valid_plan(groups, UNRELATED_DIFF, 10, min_loc=5)

P2 No test coverage for permanently-blocked repair pass

There is no test for the case where every possible merge for an undersized group is blocked — either because every pair would exceed max_loc, or because contiguity constraints rule out all candidates. In that scenario _repair_graph_min_loc returns groups that still violate min_loc, which is valid and expected behaviour, but it is currently untested.

A suggested fixture: two files each with 6 LOC (max_loc=10, min_loc=7). Each file forms a single group of 6 LOC (below min_loc=7), but merging them would yield 12 LOC which exceeds max_loc=10. The repair pass must leave both groups as-is. Adding this as a test (and asserting len(groups) == 2) would confirm the graceful degradation path.
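The arithmetic behind this fixture can be checked directly. This is a standalone sanity sketch, not the real test, which would go through _settings, parse_diff, and partition_diff.

```python
# Two 6-LOC groups under min_loc=7, max_loc=10: both are undersized, yet the
# only possible merge (6 + 6 = 12) overflows max_loc, so a correct repair pass
# must leave both groups as-is.
loads = [6, 6]
min_loc, max_loc = 7, 10

undersized = [load for load in loads if load < min_loc]
blocked = loads[0] + loads[1] > max_loc

assert undersized == [6, 6]  # every group violates min_loc
assert blocked               # the single merge candidate exceeds max_loc
```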


Comment on pr_split/planner/partitioning.py, lines +289 to +301
while True:
    undersized_group_indices = sorted(
        (
            group_idx
            for group_idx, group_units in enumerate(repaired_groups)
            if _group_load(group_units) < settings.min_loc
        ),
        key=lambda group_idx: (
            _group_load(repaired_groups[group_idx]),
            _group_anchor_position(repaired_groups[group_idx]),
            group_idx,
        ),
    )

P2 Greedy source-selection order can miss globally better merge sequences

The undersized_group_indices list is sorted by (load, anchor, group_idx) — smallest-load group is always attempted first as the merge source. When there are three or more undersized groups with different affinities, this greedy priority can leave a higher-affinity merge unreachable.

For example, suppose groups A (load 2), B (load 2), C (load 3) all exist with min_loc=5, max_loc=6. A+C=5 (fits, resolves A's underflow) and B+C=5 (fits, resolves B's underflow), but A+B=4 (still undersized). The algorithm picks A (smallest load) first; if it finds C as the best target it merges A+C, leaving B alone and unable to merge (B+merged = 2+5 = 7 > max_loc). Had B+C been merged first, A would be equally stuck (2+5 = 7, over max), so here the outcome is the same either way; but in more complex scenarios the greedy source order can leave more residual undersized groups than an alternative ordering would.

This is a known limitation of greedy repair and may be acceptable for the current goals, but it is worth documenting with a comment above the sort so future maintainers understand why this ordering was chosen.
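The numbers in the reviewer's example can be checked in a standalone sketch of the arithmetic (toy loads only, not the real partitioner):

```python
# Reviewer's example: A (2), B (2), C (3) with min_loc=5, max_loc=6. Greedy
# smallest-first merges A+C; B is then stuck either way.
loads = {"A": 2, "B": 2, "C": 3}
min_loc, max_loc = 5, 6

assert loads["A"] + loads["C"] == 5            # fits max_loc, resolves A
assert loads["B"] + loads["C"] == 5            # fits max_loc, resolves B
assert loads["A"] + loads["B"] == 4 < min_loc  # A+B stays undersized
# After A+C merges (load 5), B cannot join the merged group:
assert loads["B"] + (loads["A"] + loads["C"]) == 7 > max_loc
```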


@vitali87 vitali87 merged commit df2302b into main Mar 27, 2026
4 checks passed