Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(diff): diff without producing temporary files #128

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lorenzleutgeb
Copy link
Contributor

@lorenzleutgeb lorenzleutgeb commented Apr 29, 2024

This takes a different approach to resolving #94. Uses the Git integration for Visual Studio Code (an extension that is bundled with VSCode and can neither be disabled or uninstalled, thus can be effectively considered part of VSCode) to read files directly off of Git, eliminating the need for temporary files, and thus all other dependent hacks.

I found that GitLens in fact registers it's own FileSystemProvider, which might have some additional benefits. However, I don't think that this is necessary in the case of the Radicle VSCode extension (yet).

For reference, the implementation of FileSystemProvider that this PR uses is at https://github.com/microsoft/vscode/blob/1.88.1/extensions/git/src/fileSystemProvider.ts

With regards to TextDocumentContentProvider refer to microsoft/vscode#55110 for a discussion of its limitations.

@lorenzleutgeb
Copy link
Contributor Author

lorenzleutgeb commented Apr 30, 2024

One more thing: I am not sure this handles renames/moves well. I am looking into using the Git extension API and Change to do the heavy lifting for us.

@maninak
Copy link
Collaborator

maninak commented Apr 30, 2024

Thank you so much for taking a stab at this, Lorenz!! 🙇

This PR is still marked as Draft, but one early feedback in the general direction is that we should reduce its scope and split it into two PRs. One replacing the existing solution with the in-mem diffing and a new PR removing the "ignore temp files" feature.

This helps both with reviewing and also would make it easier in the future if we need to bring the "ignore temp files" feature back.

@maninak
Copy link
Collaborator

maninak commented Apr 30, 2024

This would also be a good resource to consult regarding virtual documents straight from the docs https://code.visualstudio.com/api/extension-guides/virtual-documents

@lorenzleutgeb
Copy link
Contributor Author

This PR is still marked as Draft, but one early feed back in the general direction is that we should reduce it's scope and split it into two PRs. One replacing the existing solution with the in-mem diffing and a new PR removing the "ignore temp files" feature.

This helps both with reviewing and also would make it easier in the future if we need to bring the "ignore temp files" feature back.

Done, see #129.

This would also be a good resource to consult regarding virtual documents straight from the docs https://code.visualstudio.com/api/extension-guides/virtual-documents

Could you please elaborate how this is relevant? I fail to understand.

@lorenzleutgeb lorenzleutgeb changed the title refactor(diff): Don't create temporary files refactor(diff): use VSCode Git extension Apr 30, 2024
@lorenzleutgeb lorenzleutgeb marked this pull request as ready for review April 30, 2024 10:14
@lorenzleutgeb lorenzleutgeb force-pushed the use-git-extension branch 2 times, most recently from 5f71ff9 to 6a91efa Compare April 30, 2024 10:20
@maninak
Copy link
Collaborator

maninak commented Apr 30, 2024

Could you please elaborate how this is relevant? I fail to understand.

I believe "virtual documents" are what we (should?) use for in-memory diffing (vs on-disk saved temp) documents for the same purpose.

@lorenzleutgeb
Copy link
Contributor Author

lorenzleutgeb commented Apr 30, 2024

Could you please elaborate how this is relevant? I fail to understand.

I believe "virtual documents" are what we (should?) use for in-memory diffing (vs on-disk saved temp) documents for the same purpose.

No. This is the reason I refer to microsoft/vscode#55110 in OP. Specifically in microsoft/vscode#55110 (comment) the authors of the GitLens extension (who probably went far, far deeper down the rabbit hole), say that TextDocumentContentProvider is not the way to go.

throw new Error(`Failed access Git repository.`)
}

const changes: Change[] = await repo?.diffBetween(range.old, range.new)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const changes: Change[] = await repo?.diffBetween(range.old, range.new)
const changes = await repo?.diffBetween(range.old, range.new)

@maninak
Copy link
Collaborator

maninak commented Apr 30, 2024

@lorenzleutgeb as discussed synchronously, there's a major architectural design decision coming with this PR that hasn't been communicated in advance: the diff is no longer sourced via httpd, which can be configured to be any remote seed other than http://127.0.0.1:8080, but via the local system's git itself.

This could either be a great idea (yay "local-first") or the opposite (diffs must be local, or made local before shown, which involves using the rad cli). It needs to be decided how this aligns with the RIT Org's directive to depend as little as possible to the CLI and also fetch as much as possible data via httpd.

I'll need to come back to this with a clearer head in a few days, discuss it with the team and come back to you.


Regardless, that's an epic contribution in itself, without even taking into account that its on the same day you checked the repo's source for the first time, and working mainly on our most complicated file! 👏

The options onwards from here are:

  • keep the main approach and just proceed with cleaning up smaller things including aligning with the repo's style and conversions before merging
  • change the approach to use httpd but keep the in-mem diffing (if possible), then as above, polish and merge
  • close the PR but keep the overall approach for both referencing the git plugin (may be very useful for our new gitStore) and in-mem diffing to be done in future PRs.

My gut feeling says we might end up with the third option. I'll be able to share more in a few days.

Thank you so much for being excited about this project and going out of your way to actionably help!

@maninak maninak added the enhancement New feature extending the app's current capabilities label Apr 30, 2024
@maninak maninak changed the title refactor(diff): use VSCode Git extension feat(diff): diff without producing temporary files Apr 30, 2024
@lorenzleutgeb
Copy link
Contributor Author

lorenzleutgeb commented Apr 30, 2024

Alright. Let me just write down my view so that you have a record of it. I'd be very happy if you could include my opinion somehow in your discussion with the team.


Browsing Repositories on Remote Radicle Nodes

One of the reasons for communicating with radicle-httpd, and from what I understand, the most relevant with regards to the architectural changes in this PR, is that you would want to enable browsing (copies of) repositories on remote nodes (that expose radicle-httpd). A potential use-case is that someone sends you a rad:-URI and you want to use VSCode as a client to inspect patches on that repository, and of course, the resulting differences per patch. I acknowledge the usefulness of this feature. The difference lies in how I thought this would be implemented. In my opinion, whenever you have a local node (which you do, if you're running this extension) you should use it to leverage the Radicle network as directly and as much as possible.

In my imagination, the VSCode extension would instruct the local node to gossip with the network, in order to get a the state of the repository on the remote node. In short: It downloads the repository over the Radicle network. Then, since the user is not interested in actually checking out a worktree for this repository, you could still use the local bare clone of the repository to run diffs.

For example, to browse the issues of rad:z3Makm6fsQQXmpSFE43DZqwupaEhk without creating a worktree:

$ cd
$ git init tmp
Initialized empty Git repository in ~/tmp/.git/
$ cd tmp
$ git remote add rad rad://z3Makm6fsQQXmpSFE43DZqwupaEhk
$ rad issue list
╭────────────────────────────────────────────────────────────────────────────────╮
│ ●   ID        Title        Author                      Labels     Opened       │
├────────────────────────────────────────────────────────────────────────────────┤
│ ●   0c3634d   test issue   maninak   z6MkvAF…GphDRLV   in-review  5 months ago │
╰────────────────────────────────────────────────────────────────────────────────╯

The main reason for creating the (worktree) repository ~/tmp in this example is that the rad CLI is not compatible with GIT_DIR. For example, even though /tmp is not a Git repository, we can use GIT_DIR to have git read from the bare repository at ~/.radicle/storage/z3Makm6fsQQXmpSFE43DZqwupaEhk, while this (sadly) does not work for rad:

$ cd /tmp
$ GIT_DIR=~/.radicle/storage/z3Makm6fsQQXmpSFE43DZqwupaEhk git rev-parse HEAD
c0216c45b43444c589d95c638e090da7fe3ffd8b
$ GIT_DIR=~/.radicle/storage/z3Makm6fsQQXmpSFE43DZqwupaEhk rad issue list
✗ Error: git: could not find repository at '.'; class=Repository (6); code=NotFound (-3)

We can also do this for patches:

$ cd ~/tmp
$ git remote set-url rad rad://z3gqcJUoA1n9HaHKufZs5FCSGazv5
$ rad patch list
╭──────────────────────────────────────────────────────────────────────╮
│ ●  ID       Title                 Author                       Revi… │
├──────────────────────────────────────────────────────────────────────┤
│ ●  f2659b3  httpd: Decouple [..]  sebastinez  z6MkkfM…jXVsVz5  - - - │
╰──────────────────────────────────────────────────────────────────────╯
$ rad patch show f2659b3
╭─────────────────────────────────────────────────────────────────────────────╮
│ Title    httpd: Decouple repo stats from tree endpoint                      │
│ Patch    f2659b3488965a4d2f4f67beb2f2317b9bb62f02                           │
│ Author   sebastinez z6MkkfM…jXVsVz5                                         │
│ Head     766c1e250d87ab1ee2645a75002ebf9226b03b45                           │
│ Commits  ahead 1, behind 2                                                  │
│ Status   open                                                               │
├─────────────────────────────────────────────────────────────────────────────┤
│ 766c1e2 httpd: Decouple repo stats from tree endpoint                       │
├─────────────────────────────────────────────────────────────────────────────┤
│ ● opened by sebastinez z6MkkfM…jXVsVz5 (23d592b) 6 hours ago                │
│ ↑ updated to ff268203e1aa0243733a80cf658e42c021cd9a7b (f0a42c5) 6 hours ago │
│ ↑ updated to 4260aecaaa53f475fc7bd0c81ab28d384dd63690 (766c1e2) 5 hours ago │
╰─────────────────────────────────────────────────────────────────────────────╯
$ git fetch
$ git diff rad/master..rad/patches/f2659b3488965a4d2f4f67beb2f2317b9bb62f02
[shows diff]

Note that for all of this, I have not created a worktree of either of the two repos rad:z3Makm6fsQQXmpSFE43DZqwupaEhk and rad:z3gqcJUoA1n9HaHKufZs5FCSGazv5. And I have not used radicle-httpd either.

In fact this approach is very much "local first". Apart from the initial gossiping, I can browse the repository, without any further networking. Everything will be fast. You might now be inclined to argue that obtaining the initial copy for the repository is costly, but I concur. Many repositories are small. And as Radicle is used for bigger and bigger repositories, this problem will have to be solved on another layer anyway (probably enabling shallow fetching). Additionally, copies of the same repo will end up as remotes to the same bare Git repository. This means that Git objects are shared, thus browsing a copy of a repository that you already have locally will be very efficiently accelerated by Git.

What you need radicle-httpd for in this design is as a substitute for the rad CLI (probably also JSON or JSONL output from the rad CLI would be good enough) to command your local node to go and fetch some repository. On top of that, and even more importantly, you will probably need it to help with rendering COBs (projecting all operations on a COB into a state can be rendered in the UI).

Remote Development

There's one more angle: Remote development. It might be the case that you intended this design to grow into supporting remote development in some way, i.e. running the VSCode extension on a laptop but the Radicle node on a workstation/server. I am doing this. I run a VSCode client on Windows machine, and connect via VSCode Remote-SSH and Tailscale SSH to a NixOS workstation, which in turn runs Radicle. This works flawlessly, and already perfectly fits my remote development workflow. Building/compiling happens on the workstation. My Radicle node also runs on the workstation, and the extension is snappy and keeping up perfectly well. I even develop the VSCode extension using my remote setup. To make radicle-interface work, i.e. have app.radicle.at detect my workstation as the "local node", I have the following in my settings.json:

{ "remote.SSH.defaultForwardedPorts": [{ "name": "http-alt", "localPort": 8080, "remotePort": 8080 }]}

Light Nodes/Clients

It might be the case that Radicle develops in a direction where something like a "light node" (browsing only) is required. However, I don't think that the VSCode extension is the place that should pioneer anything in that direction. VSCode is for developers, and if you're somewhat serious about developing software on using Radicle, you will use a "full node". Adding support in this direction is premature at the current stage. Unneeded complexity that decreases your velocity.

Acting on Behalf of Remote Nodes

There might be a time where I can sign something locally using a different key than my own node key. E.g. I use node A to publish updates for node B. If this ever becomes a thing I really hope it's gonna be native to Heartwood (or at least very low level in the stack), and not via radicle-httpd. So I also don't see such a use-case.

Grokking Remote Nodes

Apart from the technical design, I also think that what I am proposing is much easier to reason about for the user. I think it is easier to conceptualize that the VSCode extension is a "supercharger" /"batteries" for rad and rad node, and not a remote interface for any node (via radicle-httpd). This crashes with the notion that you have a local node, run rad CLI locally, and so on. I don't see any use for it except "browsing other nodes", where radicle-interface is already well established, and I would prefer a local-fist approach anyway.

Conclusion

The Radicle VSCode extension SHOULD NOT be the interface to an arbitrary remote Radicle node. Instead, it SHOULD be the interface to the local node that is part of the Radicle network. It should wrap the steps I did above using git/rad in great great UX, at the same time making it as transparent as possible what is going on under the hood. This enables you to use all the goodies that come with VSCode (@maninak you know about file paths, integration with other extensions, etc.), be local first, and be a first-class citizen of the Radicle network.

This discussion definitely turned my view on this extension upside down. I misunderstood the design completely.

@lorenzleutgeb
Copy link
Contributor Author

lorenzleutgeb commented May 6, 2024

I have identified an issue with this PR: As things are currently set up, there might be objects in the object database of the bare repository in $RAD_HOME/storage/$ID that are not in the object database of the working copy (e.g. $HOME/foo). This results in Git not being able to produce diffs. The error produced in the output of the Git extension in VSCode looks as follows:

[info] > git diff --name-status -z --diff-filter=ADMR bd8e0eb…8a43fea...583592b…cea5780[2ms]
[info] fatal: Invalid symmetric difference expression bd8e0eb…8a43fea...583592b…cea5780

The result is that when an affected patch is expanded in the patch list, no file changes are shown.

There are at least three approaches to solve this.

1. Fetching

The VSCode Extension would have to figure out from which remote to fetch in order to obtain the required objects. I have no idea how that could be implemented.

2. Share the Git Object Database

2.A Use the Worktree Mechanism

Instead of having two decoupled repositories, make the working copy a worktree of the bare repo. This can be achieved as follows:(

$ rad seed $ID
$ git -C "$RAD_HOME/storage/$ID" worktree add $PWD/foo
$ cd foo
$ git remote add rad rad://$ID
$ git remote set-url --push rad rad://$ID/$(rad self --nid)

2.B Use the "alternates" Mechanism

It is possible to create a file in .git/objects/info/alternates in the working copy that contains the path of the bare repo to share objects with.

@lorenzleutgeb
Copy link
Contributor Author

rad-alternate implements proposal 2.B.

This takes a different approach to resolving
cytechmobile#94

Use the Git integration for Visual Studio Code (an extension that is
bundled with VSCode and neither be disabled or uninstalled, thus can be
effectively considered part of VSCode) to read files directly off of
Git.

Signed-off-by: Lorenz Leutgeb <[email protected]>
Missing typings. See cytechmobile#130

Signed-off-by: Lorenz Leutgeb <[email protected]>
@gsaslis
Copy link
Collaborator

gsaslis commented May 23, 2024

Thanks so much for this contribution @lorenzleutgeb ! 🙏🏼

I'd like to add some context:

Uses the Git integration for Visual Studio Code (an extension that is bundled with VSCode and can neither be disabled or uninstalled, thus can be effectively considered part of VSCode) to read files directly off of Git, eliminating the need for temporary files, and thus all other dependent hacks.

This is exactly the direction we've followed in the Radicle IntelliJ plugin, so it makes perfect sense for me.

It needs to be decided how this aligns with the RIT Org's directive to depend as little as possible to the CLI and also fetch as much as possible data via httpd.

There is a lot of speculation in the discussion above as to why we depend on httpd, so please allow me to clarify:

Originally, in the IntelliJ plugin (that predated this one, since it was the first one our team started working on) we were using only the Command Line Interface (CLI). This made sense for an Integrated Development Environment (IDE) where we wanted a single dependency - rather than needing to ask our user to define both the path to the CLI and the httpd endpoint.

However, the CLI does not return json (or machine readable output, in general). As the httpd already does, the core team advised us to just use that instead because adding --json output to all cli commands (and ensuring that output is the same across CLI + httpd) would involve quite a bit of work that - given the very early phase radicle was in back then (around a year ago) - it didn't make sense to prioritize.

This is the only reason we're relying on httpd today. I do still think the CLI is a better option for IDEs, but I do understand why it doesn't do that still and the httpd is a good compromise for the time being.

For broader context, I'm also linking here a zulip topic where it was discussed how to address this - and where it also mentioned that the upcoming Terminal UI (TUI) would also need this in the CLI.

I have identified an issue with this PR: As things are currently set up, there might be objects in the object database of the bare repository in $RAD_HOME/storage/$ID that are not in the object database of the working copy (e.g. $HOME/foo).

@lorenzleutgeb using the bare repo is an interesting approach, but, I am wondering: would we still have this problem if we had a full working copy of the radicle repo that we were working in ? Would cloning a radicle repository in $HOME/foo_radicle_repo and then opening that repo in VS Code allow us to overcome the issue you're highlighting above? And, if that is the case, would fetching be a simpler way to ensure we're up to date?

cc @JChrist

@lorenzleutgeb
Copy link
Contributor Author

lorenzleutgeb commented May 25, 2024

I'm reply to @gsaslis.

However, the CLI does not return json (or machine readable output, in general). As the httpd already does, the core team advised us to just use that instead because adding --json output to all cli commands (and ensuring that output is the same across CLI + httpd) would involve quite a bit of work that - given the very early phase radicle was in back then (around a year ago) - it didn't make sense to prioritize.

I disagree. rad cob show (see below) was rather straightforward to implement, and is the most important puzzle piece. Maybe its output format would have changed as the COB itself changed, yes, but I will probably never understand the decision for going with the much more complex design involving radicle-httpd.

I think what is crucial here is to acknowledge that just slapping --json is not gonna work, because this interferes too much with work on the porcelain CLI. You need dedicated, purposely stupid/simple, plumbing commands.

This is the only reason we're relying on httpd today. I do still think the CLI is a better option for IDEs, but I do understand why it doesn't do that still and the httpd is a good compromise for the time being.

I realized the situation about missing stable plumbing commands only after opening this PR. However, I also immediately took action, and Heartwood now contains a new rad cob show which allows to get materialized COBs as JSON. For now, only the three COBs xyz.radicle.{identity,issue,patch} are supported, but I am working on a more general extension already (looking towards CI, Planning Boards, ...).
I am also working on enabling write access via command, tentatively called rad cob create and rad cob act that could be used to create and interact with any COB.

My learnings from this PR and its discussion is the immediate motivation.

I'll probably step on toes, but: It appears to me that the first bottleneck for development of this extension was missing plumbing commands, but it is still bottlenecked by radicle-httpd.

I am trying to go back to the root here and implement what is necessary as commands. The deployment story for users will be simpler ("Why do I need to run a server?!"), more reliable (less moving parts without radicle-httpd), more performant (no HTTP overhead), more secure (people cannot accidentally bind radicle-httpd to 0.0.0.0). I think that there are only a few pieces missing to a foundation that will enable the VSCode Extension to pick up speed.

If anyone wants to be a step ahead: The VSCode Extension will likely need type definitions based on the datatypes defined in Rust, because that currently defines the shape of the data exchanged both via commands and the control socket.

I have identified an issue with this PR: As things are currently set up, there might be objects in the object database of the bare repository in $RAD_HOME/storage/$ID that are not in the object database of the working copy (e.g. $HOME/foo).

@lorenzleutgeb using the bare repo is an interesting approach, but, I am wondering: would we still have this problem if we had a full working copy of the radicle repo that we were working in ? Would cloning a radicle repository in $HOME/foo_radicle_repo and then opening that repo in VS Code allow us to overcome the issue you're highlighting above? And, if that is the case, would fetching be a simpler way to ensure we're up to date?

Fetching was my first suggestion, see Item in the comment you are quoting.

I am saying that the problem exists for a working copy. Fetching can of course copy the correct objects over, but only if you know what to fetch. I am not too sure on the Radicle terminology here, but the problem lies with patches on remotes that are not "followed" (?!). You'd have to go and fetch the head of a patch from a specific NID. So a simple git fetch rad does not do the trick. I don't know whether the logic for finding out the NID creating and pulling the required ref is is currently implemented in the extension. My guess would be that it is not. It would have to live on the code path that checks out a patch. If this just calls rad patch checkout then there'd be no need to duplicate the logic so far. To give a clear answer, I would have to look into the implementation of rad patch checkout as well.

Please note that there is another benefit to my workaround: With it, there's no need to copy files between the bare repo and your working copy. The Git object database of the working copy is extended with the one of the bare repo. One copy is enough! Also, this does not leak any private data, because the "extension" is one-way.

I'll go one step further and say that really working copies should use Git's Worktree feature. It was made for that.

}

return undefined
const repo = gitExtensionApi.getRepository(Uri.file(repoRoot))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of getting the working copy here, it would be possible to use the bare repository. This would resolve issues of missing objects.

@@ -0,0 +1,348 @@
/* ---------------------------------------------------------------------------------------------
Copy link
Collaborator

@maninak maninak May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a vendored file we should have a comment mentioning the fact and also its source, along with any changes vs the source. See https://github.com/cytechmobile/radicle-vscode-extension/blob/main/src/webviews/src/assets/codicon.css#L1-L11 for how it's already handled in the repo.

Also we could benefit from a filename that better communicates that its conents are typings specifically for the native git extension.


// Ignore, since this file is vendored from VSCode and should not be changed.
// See `CONTRIBUTING.md`.
'src/types/git.ts',
Copy link
Collaborator

@maninak maninak May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file only requires auto-fixable issues, it's preferable to just apply them, and not add this ignore.

If that's not the case then let's go with the contents as they are.

const gitExtension = vscode.extensions.getExtension<GitExtension>(gitExtensionId)

if (gitExtension === undefined) {
throw new Error(`Could not load VSCode Git extension with id '${gitExtensionId}'`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the chances that this ever breaks are very low, we still should handle the case of it throwing.

It could be either falling back to httpd or just notifying the user with an error message of what happened.

]
}

return changes
Copy link
Collaborator

@maninak maninak May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this repo we're avoiding any non-trivial return statements. All such cases should instead be an assignment to a well named function and then a return of that one.

Copy link
Collaborator

@maninak maninak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for spending your time on this fundamental improvement of the extension's tech! 🙇
I definitely would like to explore further opportunities for independence from httpd over time too, wherever those would make sense.

There are still a couple of things left that I'd like to check on this PR and I will get to it ASAP, but until then I wanna say that it looks great and I've only been able to find a couple of nits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature extending the app's current capabilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants