Skip to content

Conversation

blinxen
Copy link

@blinxen blinxen commented Apr 24, 2025

As part of the effort to migrate bat away from libgit2, I’m exploring ways to enhance the file diffs generated by gitoxide, which relies on this library for the actual diff calculation. In this issue the gitoxide maintainer suggested looking into the slider problem and try to implement that for imara-diff. That is what I tried to do in this PR.

Before I continue to:

  • write some more tests
  • optimize the code
  • (maybe find better names for variables, structs and the Sink itself)
  • write documentation
  • check the licensing since it was heavily inspired by the git implementation

I wanted to have your thoughts on this. Does it even make sense to implement this as a Sink?

Copy link
Collaborator

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so cool! Thanks so much!

What follows is based on a brief look at the PR, and is no code-review.

Implementing it as a sink, if it's the 'best' way or not, appears to be the way to get it merged most quickly - after all it's just plugging into an existing mechanism.
And even if for whatever reason it was considered too specific here, gix-diff will certainly welcome it with open arms.

In theory, when computing line-counts for instance, gitoxide probably could still use its current approach as the line-counts are unaffected by hunk-sliding. But I might be wrong about that.

When it comes to actual diffing, gitoxide doesn't actually provide more than the UnifiedDIff type which also is a Sink. From what I can tell, the hunks in a ChangeGroup could be passed directly to UnifiedDiff and it would produce a reasonable patch, and once ChangeType is supported as well the patches should be even better - maybe that would affect newlines mostly?

I'd really be interested in what @pascalkuthe has to say about the approach. If a Sink is good, it seems like gix-diff really could be a home for this implementation, just because imara-diff doesn't necessarily have to tie itself to Git specifically, and maybe is better off for it, too.

Anyway, let me close by reiterating how exciting this contribution is to me, and how much I look forward to seeing it finalised and used in our tools :).

@blinxen
Copy link
Author

blinxen commented Apr 25, 2025

In theory, when computing line-counts for instance, gitoxide probably could still use its current approach as the line-counts are unaffected by hunk-sliding. But I might be wrong about that.

That is true, the sliding mechanism does not change the line counts of a change group. When a change group is moved, for whatever reason, then both start and end are affected.

and once ChangeType is supported as well the patches should be even better

Maybe ChangeType could be moved out of this Sink? I think that this kind of information is helpful in a lot of cases.

@Byron
Copy link
Collaborator

Byron commented Apr 25, 2025

Maybe ChangeType could be moved out of this Sink? I think that this kind of information is helpful in a lot of cases.

Where would it move? To me it seemed natural that the GitDiffimplementation determines the value forChangeType`. I am probably missing something.

@blinxen
Copy link
Author

blinxen commented Apr 25, 2025

The reason why I said we could move it out is because the change type is not tied to git itself but to the diff. So technically instead of passing a Range to process_change, the diffing algorithm could pass a data structure that contains both the Range and the change type.

@Byron
Copy link
Collaborator

Byron commented Apr 25, 2025

I see, this pertains to the question if support for this could also be built-in (or supported better) by the core algorithm.
My intuition here is that this wouldn't be the case as the core algorithm looks only at tokens, without assigning them meaning. Again, I might be wrong about that.

@blinxen
Copy link
Author

blinxen commented May 14, 2025

Closing this PR in favor of GitoxideLabs/gitoxide#2011

@blinxen blinxen closed this May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants