Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the graph-based partitioning strategy by introducing a new repair pass. The pass merges undersized groups to meet a specified minimum size (`min_loc`) while strictly adhering to the maximum group size (`max_loc`) and preserving the contiguous order of changes within shared files. This ensures more robust and well-formed code partitions.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to the graph-based partitioning strategy, allowing undersized groups to be repaired by merging them based on a `min_loc` setting. It adds several new helper functions for calculating group load and affinity and for finding the best merge targets, as well as a new `_repair_graph_min_loc` function. Corresponding tests verify the new functionality, including its determinism. The review comment below points out an inefficiency in the `_merge_group_units` function due to copying and suggests a more performant approach.
```python
merged_group = sorted(
    grouped_units[target_idx] + grouped_units[source_idx],
    key=lambda unit: unit.position,
)
repaired_groups = [list(group_units) for group_units in grouped_units]
repaired_groups[target_idx] = merged_group
del repaired_groups[source_idx]
return repaired_groups
```
The current implementation of `_merge_group_units` is inefficient: it copies every group (`[list(group_units) for group_units in grouped_units]`) on every merge, which can be costly when there are many groups. Additionally, deleting by index with `del` can be subtle to reason about.

A clearer and more performant approach is to build a new list from scratch, which avoids the expensive copy and the `del` operation.
Suggested change:

```python
merged_group = sorted(
    grouped_units[target_idx] + grouped_units[source_idx],
    key=lambda unit: unit.position,
)
repaired_groups = []
for i, group in enumerate(grouped_units):
    if i == source_idx:
        continue
    if i == target_idx:
        repaired_groups.append(merged_group)
    else:
        repaired_groups.append(group)
return repaired_groups
```
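As a standalone sketch of the rebuild approach (plain sorted integers stand in for units ordered by `unit.position`, and the helper name is illustrative, not the project's actual function):

```python
def merge_group_units(grouped_units, source_idx, target_idx):
    """Merge the source group into the target group, rebuilding the
    outer list instead of copying every group and deleting by index."""
    merged_group = sorted(grouped_units[target_idx] + grouped_units[source_idx])
    repaired_groups = []
    for i, group in enumerate(grouped_units):
        if i == source_idx:
            continue  # the source group is absorbed into the target
        repaired_groups.append(merged_group if i == target_idx else group)
    return repaired_groups

groups = [[1, 4], [2, 5], [3, 6]]
print(merge_group_units(groups, source_idx=2, target_idx=0))
# [[1, 3, 4, 6], [2, 5]]
print(groups)  # the input list of groups is left untouched
```

Only the merged group is a new list; untouched groups are shared between input and output, which is safe as long as callers treat groups as immutable.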
Greptile Summary

This PR adds a deterministic post-processing repair pass (`_repair_graph_min_loc`) to the graph-based partitioning strategy: when `min_loc` is set, undersized groups are merged into their best-affinity neighbours without exceeding `max_loc` and without breaking the contiguous order of changes within shared files.

Confidence Score: 4/5

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[partition_diff] --> B[_group_units_graph]
    B --> C[_repair_graph_min_loc]
    C --> D{min_loc set AND\ngroups >= 2?}
    D -- No --> E[Return groups as-is]
    D -- Yes --> F[Sort undersized groups\nby load, anchor, idx]
    F --> G{Any undersized group\nhas valid merge target?}
    G -- No --> H[Return repaired groups]
    G -- Yes --> I[_best_graph_merge_target\nfor first source]
    I --> J{merged_load\n<= max_loc?}
    J -- No --> K[Skip target]
    J -- Yes --> L{_shared_file_merge\n_is_contiguous?}
    L -- No --> K
    L -- Yes --> M{merged_underflow\n< current_underflow?}
    M -- No --> K
    M -- Yes --> N[Score merge via\n6-element key tuple]
    N --> O[_merge_group_units\nsource into target]
    O --> F
```
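The loop the flowchart describes can be sketched with plain integer loads. This is a simplified model, not the project's implementation: it keeps only the `max_loc` cap and the prefer-least-underflow target choice, and omits the shared-file contiguity check, the anchor tiebreaker, and the full 6-element scoring key; all names are illustrative.

```python
def repair_min_loc(loads, min_loc, max_loc):
    """Greedy sketch of the repair loop, over plain integer loads."""
    groups = list(loads)
    while len(groups) >= 2:
        # Undersized sources, smallest load first (ties broken by index).
        undersized = sorted(
            (i for i, load in enumerate(groups) if load < min_loc),
            key=lambda i: (groups[i], i),
        )
        merged = False
        for src in undersized:
            # Valid targets: the merged load must fit under max_loc.
            candidates = [
                t for t in range(len(groups))
                if t != src and groups[t] + groups[src] <= max_loc
            ]
            if not candidates:
                continue
            # Prefer the target whose merge leaves the least underflow.
            tgt = min(
                candidates,
                key=lambda t: (max(0, min_loc - (groups[t] + groups[src])), t),
            )
            groups[tgt] += groups[src]
            del groups[src]
            merged = True
            break
        if not merged:
            break  # no undersized group has a valid merge target
    return groups

print(repair_min_loc([3, 3, 3], min_loc=5, max_loc=10))  # [9] - three small groups fold into one
print(repair_min_loc([6, 6], min_loc=7, max_loc=10))     # [6, 6] - blocked: any merge exceeds max_loc
```

The two usage lines mirror the scenarios discussed in the review comments: three 3-LOC groups merge into one, while two 6-LOC groups stay untouched because merging would overflow `max_loc`.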
Reviews (1): Last reviewed commit: "feat: repair undersized graph groups"
```python
def test_min_loc_repair_is_deterministic(self, monkeypatch: pytest.MonkeyPatch) -> None:
    settings = _settings(
        monkeypatch,
        max_loc=10,
        min_loc=5,
        partition_strategy=PartitionStrategy.GRAPH,
        priority=Priority.ORTHOGONAL,
    )
    parsed = parse_diff(UNRELATED_DIFF)
    signatures = {_group_signature(partition_diff(parsed, settings)) for _ in range(3)}
    assert len(signatures) == 1
```
**Duplicate test scenario reduces determinism-test value**

`test_min_loc_repair_is_deterministic` uses the exact same `UNRELATED_DIFF`, `max_loc=10`, `min_loc=5`, and `ORTHOGONAL` priority as `test_min_loc_merges_undersized_groups_when_possible`. Because `UNRELATED_DIFF` produces exactly two groups with a single forced merge path, the determinism test doesn't exercise any branching in `_best_graph_merge_target`. A truly meaningful determinism test would use a diff with three or more groups, so there are multiple valid merge candidates and the tiebreaking logic is actually exercised.

Consider replacing `UNRELATED_DIFF` here with a multi-group fixture, for example a diff with three files each adding 3 LOC (`max_loc=10`, `min_loc=5`), which would produce three undersized groups and have at least two possible merge paths to verify against.
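A fixture along those lines could be generated rather than hand-written. The sketch below is hypothetical (the `make_three_file_diff` helper and the file names are not part of the project); it only shows the shape of a three-file diff where each file adds 3 LOC:

```python
def make_three_file_diff():
    """Hypothetical fixture: a unified diff adding three new files,
    each contributing exactly 3 lines of code."""
    parts = []
    for name in ("alpha.py", "beta.py", "gamma.py"):
        parts.append(
            f"diff --git a/{name} b/{name}\n"
            "new file mode 100644\n"
            "--- /dev/null\n"
            f"+++ b/{name}\n"
            "@@ -0,0 +1,3 @@\n"
            "+line 1\n"
            "+line 2\n"
            "+line 3\n"
        )
    return "".join(parts)

diff_text = make_three_file_diff()
print(diff_text.count("diff --git"))  # 3 files
print(diff_text.count("\n+line"))     # 9 added lines in total
```

With `max_loc=10` and `min_loc=5`, such a diff should yield three undersized 3-LOC groups, giving the determinism test real tiebreaking work to do.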
```python
def test_min_loc_merges_undersized_groups_when_possible(
    self, monkeypatch: pytest.MonkeyPatch
) -> None:
    settings = _settings(
        monkeypatch,
        max_loc=10,
        min_loc=5,
        partition_strategy=PartitionStrategy.GRAPH,
        priority=Priority.ORTHOGONAL,
    )
    groups = partition_diff(parse_diff(UNRELATED_DIFF), settings)
    assert len(groups) == 1
    _assert_valid_plan(groups, UNRELATED_DIFF, 10, min_loc=5)
```
**No test coverage for permanently-blocked repair pass**

There is no test for the case where every possible merge for an undersized group is blocked, either because every pair would exceed `max_loc` or because contiguity constraints rule out all candidates. In that scenario `_repair_graph_min_loc` returns groups that still violate `min_loc`, which is valid and expected behaviour, but it is currently untested.

A suggested fixture: two files, each with 6 LOC (`max_loc=10`, `min_loc=7`). Each file forms a single group of 6 LOC (below `min_loc=7`), but merging them would yield 12 LOC, which exceeds `max_loc=10`. The repair pass must leave both groups as-is. Adding this as a test (and asserting `len(groups) == 2`) would confirm the graceful degradation path.
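The arithmetic behind the suggested fixture can be checked directly with plain numbers, independent of any project code:

```python
# Two groups of 6 LOC each, with min_loc=7 and max_loc=10.
loads = [6, 6]
min_loc, max_loc = 7, 10

undersized = [i for i, load in enumerate(loads) if load < min_loc]
assert undersized == [0, 1]          # both groups violate min_loc

can_merge = loads[0] + loads[1] <= max_loc
assert not can_merge                 # 6 + 6 = 12 > 10: the only merge is blocked
# The repair pass must therefore leave both groups unchanged,
# so the expected assertion in the test is len(groups) == 2.
```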
```python
while True:
    undersized_group_indices = sorted(
        (
            group_idx
            for group_idx, group_units in enumerate(repaired_groups)
            if _group_load(group_units) < settings.min_loc
        ),
        key=lambda group_idx: (
            _group_load(repaired_groups[group_idx]),
            _group_anchor_position(repaired_groups[group_idx]),
            group_idx,
        ),
    )
```
**Greedy source-selection order can miss globally better merge sequences**

The `undersized_group_indices` list is sorted by `(load, anchor, group_idx)`, so the smallest-load group is always attempted first as the merge source. When there are three or more undersized groups with different affinities, this greedy priority can leave a higher-affinity merge unreachable.

For example, suppose groups A (load 2), B (load 2), and C (load 3) all exist with `min_loc=5`, `max_loc=6`. A+C=5 (fits, resolves A's underflow) and B+C=5 (fits, resolves B's underflow), but A+B=4 (still undersized). The algorithm picks A (smallest load) first; if it finds C as the best target it merges A+C, leaving B alone and unable to merge (B+merged=7 > `max_loc`). Had B+C been merged first, A would be stuck instead (A+merged=7 > `max_loc`), so in this example the residual count is the same either way; in more complex scenarios, however, the greedy source order can leave more residual undersized groups than an alternative ordering would.

This is a known limitation of greedy repair and may be acceptable for the current goals, but it is worth documenting with a comment above the sort so future maintainers understand why this ordering was chosen.
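The worked example above can be verified with plain arithmetic (A, B, and C are the hypothetical loads from the comment, not anything in the codebase):

```python
a, b, c = 2, 2, 3            # group loads
min_loc, max_loc = 5, 6

assert min_loc <= a + c <= max_loc   # A+C = 5: fits and resolves A's underflow
assert min_loc <= b + c <= max_loc   # B+C = 5: fits and resolves B's underflow
assert a + b < min_loc               # A+B = 4: still undersized

# Greedy order: A (smallest load) merges with C first...
merged = a + c
# ...after which B can no longer merge with anything:
assert b + merged > max_loc          # 2 + 5 = 7 > 6
```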
Summary

- Repair undersized graph groups when `min_loc` is set
- Never exceed `max_loc` when merging

Testing

Part of #6