Preserve remaining lines when splitting large diff chunks by Haraguroicha · Pull Request #530 · so-fancy/diff-so-fancy

Haraguroicha · 2026-04-29T17:33:11Z

Summary

This commit fixes a chunk-splitting bug in diff-so-fancy when processing large rename-only diffs (more than 100 lines).

Previously, when input exceeded 100 lines, the code cleared @lines entirely after chunking, which dropped the trailing partial chunk.
Now it preserves the last incomplete chunk with:

@lines = @{ pop(@chunks) // [] };

so remaining lines continue to be processed correctly.

Root Cause

In the large-input path:

get_diff_chunks(\@lines) returns complete chunks plus a trailing incomplete chunk.
The old code did @lines = ();, discarding that trailing data.
As a result, some diff entries were silently lost.

Observed Failure

Before this commit, the reproduction can trigger:

Use of uninitialized value $next in pattern match (m//) at diff-so-fancy line 409, <STDIN> line 202.

This warning is consistent with missing trailing lines after chunk splitting.

Debug Evidence

Using --debug, the output clearly stops right after the second line of block 51, showing the processing boundary ends too early.

Sample around block 51:

-------------- Chunk --------------
diff --git a/49.a.file b/49.b.file
similarity index 100%
rename from 49.a.file
rename to 49.b.file
-------------- Chunk --------------
diff --git a/50.a.file b/50.b.file
similarity index 100%
rename from 50.a.file
rename to 50.b.file
-------------- Chunk --------------
diff --git a/51.a.file b/51.b.file
similarity index 100%
-------------- Chunk --------------
rename from 51.a.file
rename to 51.b.file
-------------- Chunk --------------
diff --git a/52.a.file b/52.b.file
similarity index 100%
rename from 52.a.file
rename to 52.b.file

If the warning location is temporarily forced to use an empty string so execution continues, another breakpoint appears around 76, which confirms there is additional data that should have remained connected but was cut by chunk handling.

Sample around block 76:

-------------- Chunk --------------
diff --git a/75.a.file b/75.b.file
similarity index 100%
rename from 75.a.file
rename to 75.b.file
-------------- Chunk --------------
diff --git a/76.a.file b/76.b.file
similarity index 100%
rename from 76.a.file
-------------- Chunk --------------
rename to 76.b.file
-------------- Chunk --------------
diff --git a/77.a.file b/77.b.file
similarity index 100%
rename from 77.a.file
rename to 77.b.file

This is the key evidence that the issue is a chunk-truncation problem, not an upstream input issue.

Fix

Keep the trailing chunk in @lines and only process the completed chunks immediately.

Reproduction

Run:

for i in {1..100}; do
  echo -e "diff --git a/$i.a.file b/$i.b.file\nsimilarity index 100%\nrename from $i.a.file\nrename to $i.b.file"
done | ./diff-so-fancy

Behavior Comparison

Before this commit

Output may miss tail entries because remaining lines are dropped during chunk split.
The warning about uninitialized $next may appear at diff-so-fancy line 409.
--debug output ends early around block 51 (second line), indicating a broken chunk boundary.

After this commit

All rename entries are preserved and rendered; no tail loss after chunk split.
No early cutoff at the same debug boundary; processing continues into later blocks (including where 76 was previously observed after forcing continuation).

Verification

Reproduced with the command above.
Confirmed output consistency between input count and rendered rename blocks after the fix.

scottchiefbaker · 2026-04-30T00:10:10Z

I had to really read and process the line in the PR. It's not clear from the code what this does. I would consider this if you re-work it, add a comment about what it's doing, and make the code more obvious. Something like this:

# Leave the last chunk in @lines so we do not miss any lines on a split
my $last   = pop(@chunks) // [];
@lines     = @$last;

Haraguroicha · 2026-04-30T04:52:29Z

Hi @scottchiefbaker,

Thank you for the thoughtful feedback.

I have reworked this part to make the intent explicit in the code:

preserve the trailing chunk with a clearly named variable ($last_chunk)
add an inline comment explaining why the last chunk must remain in @lines

This keeps the split-boundary behavior self-documenting at the call site, reduces the chance of future misuse/regressions, and avoids extra history tracing for context.

Preserve remaining lines when splitting large diff chunks

fa1532e

Haraguroicha force-pushed the next branch from a9d9201 to fa1532e Compare April 30, 2026 04:48

scottchiefbaker merged commit 9a8b325 into so-fancy:next Apr 30, 2026
1 check passed

artem-nefedov mentioned this pull request May 5, 2026

Massive performance regression for git log #531

Closed

aydintb mentioned this pull request May 6, 2026

probably not important #532

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Preserve remaining lines when splitting large diff chunks#530

Preserve remaining lines when splitting large diff chunks#530
scottchiefbaker merged 1 commit into
so-fancy:nextfrom
Haraguroicha:next

Haraguroicha commented Apr 29, 2026

Uh oh!

scottchiefbaker commented Apr 30, 2026

Uh oh!

Haraguroicha commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Haraguroicha commented Apr 29, 2026

Summary

Root Cause

Observed Failure

Debug Evidence

Fix

Reproduction

Behavior Comparison

Before this commit

After this commit

Verification

Uh oh!

scottchiefbaker commented Apr 30, 2026

Uh oh!

Haraguroicha commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants