Skip to content

Preserve remaining lines when splitting large diff chunks#530

Merged
scottchiefbaker merged 1 commit into
so-fancy:nextfrom
Haraguroicha:next
Apr 30, 2026
Merged

Preserve remaining lines when splitting large diff chunks#530
scottchiefbaker merged 1 commit into
so-fancy:nextfrom
Haraguroicha:next

Conversation

@Haraguroicha
Copy link
Copy Markdown
Contributor

Summary

This commit fixes a chunk-splitting bug in diff-so-fancy when processing large rename-only diffs (more than 100 lines).

Previously, when input exceeded 100 lines, the code cleared @lines entirely after chunking, which dropped the trailing partial chunk.
Now it preserves the last incomplete chunk with:

@lines = @{ pop(@chunks) // [] };

so remaining lines continue to be processed correctly.

Root Cause

In the large-input path:

  • get_diff_chunks(\@lines) returns complete chunks plus a trailing incomplete chunk.
  • The old code did @lines = ();, discarding that trailing data.
  • As a result, some diff entries were silently lost.

Observed Failure

Before this commit, the reproduction can trigger:

Use of uninitialized value $next in pattern match (m//) at diff-so-fancy line 409, <STDIN> line 202.

This warning is consistent with missing trailing lines after chunk splitting.

Debug Evidence

Using --debug, the output clearly stops right after the second line of block 51, showing the processing boundary ends too early.

Sample around block 51:

-------------- Chunk --------------
diff --git a/49.a.file b/49.b.file
similarity index 100%
rename from 49.a.file
rename to 49.b.file
-------------- Chunk --------------
diff --git a/50.a.file b/50.b.file
similarity index 100%
rename from 50.a.file
rename to 50.b.file
-------------- Chunk --------------
diff --git a/51.a.file b/51.b.file
similarity index 100%
-------------- Chunk --------------
rename from 51.a.file
rename to 51.b.file
-------------- Chunk --------------
diff --git a/52.a.file b/52.b.file
similarity index 100%
rename from 52.a.file
rename to 52.b.file

If the warning location is temporarily forced to use an empty string so execution continues, another breakpoint appears around 76, which confirms there is additional data that should have remained connected but was cut by chunk handling.

Sample around block 76:

-------------- Chunk --------------
diff --git a/75.a.file b/75.b.file
similarity index 100%
rename from 75.a.file
rename to 75.b.file
-------------- Chunk --------------
diff --git a/76.a.file b/76.b.file
similarity index 100%
rename from 76.a.file
-------------- Chunk --------------
rename to 76.b.file
-------------- Chunk --------------
diff --git a/77.a.file b/77.b.file
similarity index 100%
rename from 77.a.file
rename to 77.b.file

This is the key evidence that the issue is a chunk-truncation problem, not an upstream input issue.

Fix

Keep the trailing chunk in @lines and only process the completed chunks immediately.

Reproduction

Run:

for i in {1..100}; do
  echo -e "diff --git a/$i.a.file b/$i.b.file\nsimilarity index 100%\nrename from $i.a.file\nrename to $i.b.file"
done | ./diff-so-fancy

Behavior Comparison

Before this commit

  • Output may miss tail entries because remaining lines are dropped during chunk split.
  • The warning about uninitialized $next may appear at diff-so-fancy line 409.
  • --debug output ends early around block 51 (second line), indicating a broken chunk boundary.

After this commit

  • All rename entries are preserved and rendered; no tail loss after chunk split.
  • No early cutoff at the same debug boundary; processing continues into later blocks (including where 76 was previously observed after forcing continuation).

Verification

  • Reproduced with the command above.
  • Confirmed output consistency between input count and rendered rename blocks after the fix.

@scottchiefbaker
Copy link
Copy Markdown
Contributor

I had to really read and process the line in the PR. It's not clear from the code what this does. I would consider this if you re-work it, add a comment about what it's doing, and make the code more obvious. Something like this:

# Leave the last chunk in @lines so we do not miss any lines on a split
my $last   = pop(@chunks) // [];
@lines     = @$last;

@Haraguroicha
Copy link
Copy Markdown
Contributor Author

Hi @scottchiefbaker,

Thank you for the thoughtful feedback.

I have reworked this part to make the intent explicit in the code:

  • preserve the trailing chunk with a clearly named variable ($last_chunk)
  • add an inline comment explaining why the last chunk must remain in @lines

This keeps the split-boundary behavior self-documenting at the call site, reduces the chance of future misuse/regressions, and avoids extra history tracing for context.

@scottchiefbaker scottchiefbaker merged commit 9a8b325 into so-fancy:next Apr 30, 2026
1 check passed
@aydintb aydintb mentioned this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants