Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace GitPython with pygit2 #2120

Merged
merged 18 commits into from
Dec 13, 2024
Merged

Replace GitPython with pygit2 #2120

merged 18 commits into from
Dec 13, 2024

Conversation

mgorny
Copy link
Contributor

@mgorny mgorny commented Nov 6, 2024

Checklist

  • Added a news entry
  • Regenerated schema JSON if schema altered (python conda_smithy/schema.py)

Replace the use of GitPython package with pygit2. The latter seems to have better git support, in particular it supports the newer index versions 3 and 4. Since it is backed by the libgit2 library that is also used by Cargo, it seems to have the best chances of being updated for compatibility with new git versions.

Admittedly, the API feels very low-level. In particular, it is necessary to explicitly request writing changes to index back, and explicitly reread it when it's modified externally (e.g. via another pygit2.Repository instance, as in tests). On the plus side, it does not invoke git at all -- everything is done by the library.

Fixes #2116


So far focused on feedstock_io.py and its tests. I need to figure out how to test the changes to other files properly, given that the tests mock the entire git.Repo.clone_from call.

@mgorny mgorny requested a review from a team as a code owner November 6, 2024 16:16
@mgorny mgorny marked this pull request as draft November 6, 2024 16:16
@mgorny mgorny force-pushed the pygit2 branch 2 times, most recently from f6dadb8 to e90a067 Compare November 6, 2024 16:21
conda_smithy/feedstock_io.py Outdated Show resolved Hide resolved
Replace the use of GitPython package with pygit2.  The latter seems
to have better git support, in particular it supports the newer index
versions 3 and 4.  Since it is backed by the libgit2 library that is
also used by Cargo, it seems to have the best chances of being updated
for compatibility with new git versions.

Admittedly, the API feels very low-level.  In particular, it is
necessary to explicitly request writing changes to index back,
and explicitly reread it when it's modified externally (e.g. via another
`pygit2.Repository` instance, as in tests).  On the plus side, it does
not invoke `git` at all -- everything is done by the library.

Fixes conda-forge#2116
Remove the `search_parent_directories` kwarg that's never been used,
and instead always enable searching parent directories for better
cross-version pygit2 compatibility.
@mgorny mgorny marked this pull request as ready for review November 11, 2024 14:20
Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change and is not allowed under semantic versioning. Please put back this functionality.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

Restored, and instead added backwards compatibility for pygit2 < 1.14.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

@beckermr, another use is in conda_smithy.feedstocks.feedstocks_repos() function. This function is used only in conda_smithy.feedstocks.feedstocks_yaml(), and FWICS the latter isn't exposed anywhere in the CLI. On top of that, feedstocks_repos() is currently broken:

In [10]: list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 list(conda_smithy.feedstocks.feedstocks_repos(None, "/home/mgorny/git/conda"))

File ~/git/conda-smithy/conda_smithy/feedstocks.py:197, in feedstocks_repos(organization, feedstocks_directory, pull_up_to_date, randomise, regexp)
    195 for feedstock in feedstocks:
    196     repo = git.Repo(feedstock.directory)
--> 197     upstream = repo.remotes.upstream
    199     if pull_up_to_date:
    200         print("Fetching ", feedstock.package)

File ~/miniforge3/envs/conda-smithy/lib/python3.12/site-packages/git/util.py:1198, in IterableList.__getattr__(self, attr)
   1196         return item
   1197 # END for each item
-> 1198 return list.__getattribute__(self, attr)

AttributeError: 'IterableList' object has no attribute 'upstream'

FWICS git.Repo.remotes was always a list, so I don't think this code ever worked. Should I migrate it and fix it, or remove both this function and feedstocks_yaml()?

@beckermr
Copy link
Member

We need to migrate and fit.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

Ah, sorry — I was wrong, it wasn't broken. I've just realized it expected a remote called upstream, my bad.

environment.yml Outdated Show resolved Hide resolved
@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

Another question: do we need SSH support for feedstock_tokens? FWICS the default is to use the repository over HTTPS. If we were to support SSH as well, there are two snags:

  1. libgit2 on Conda has a bug that results in SSH support being disabled. I've filed Actually enabling SSH support while building libgit2-feedstock#80 to fix that.
  2. Since libgit2 is quite low-level by design, SSH support requires a bit of effort. I'm not sure how much yet, but the example is a bit worrying.

Of course, another option is to call git directly for the push (much like GitPython does).

@beckermr
Copy link
Member

Anything that is currently supported in the code needs to be supported in this PR. We cannot have API changes like this.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

Anything that is currently supported in the code needs to be supported in this PR. We cannot have API changes like this.

Is it okay to call git push as a subprocess then? That would guarantee the best compatibility (and be pretty much what GitPython did anyway). There's also a matter of whether to call git commit, or reimplement getting author/committer name from git config.

@beckermr
Copy link
Member

Subprocesses are fine, but we should be mindful of how complex this might get. The point of this PR is to support new git index versions. IDK anything about those. Are these in use? How have we not hit bugs for this before new?

@mgorny
Copy link
Contributor Author

mgorny commented Nov 11, 2024

Subprocesses will probably be less complex than doing everything via libgit2. Another option is to stick to GitPython for some of the code, at least until it actually breaks for someone.

From what I understand, to hit this issue you need to use new enough git to clone the repository. It is also possible that the repository itself must have some characteristics that actually trigger the use of new index format. Maybe I was just unlucky that the first feedstock that I've cloned triggered this, or it is possible that more people will hit this as they upgrade their git to new versions and clone new feedstocks.

@beckermr
Copy link
Member

I'd rather have one tool used and defer to subprocesses. Let's use a subprocess.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 13, 2024

I'm really sorry about that. Fixed now. Tests pass for me now with GitPython uninstalled, and I can't grep any more import git or from git.

Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a news item.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 13, 2024

Added.

Co-authored-by: Matthew R. Becker <[email protected]>
Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this PR changes some pretty low-level functionality. The test suite uses mocks which is great, but we don't have tests of functioning code for all of the changes. We need to test these changes live for both staged-recipes and feedstock token handling before we can merge them.

@beckermr
Copy link
Member

I am happy to do the testing of these changes live, but it will be a bit before I can get to it.

@mgorny
Copy link
Contributor Author

mgorny commented Nov 13, 2024

Not sure if you're asking me to do anything here. If you're asking whether I've tested them locally, I did test calling every function with some args suitable for local/semi-remote testing.

@beckermr
Copy link
Member

Yep, nothing for you to do here. Sorry for blocking this one. Testing much of the code in smithy is hard since it runs against external services. :/

@mgorny
Copy link
Contributor Author

mgorny commented Nov 13, 2024

Yeah, I know. If I only had more time, I'd have added some more tests.

@jaimergp
Copy link
Member

jaimergp commented Dec 9, 2024

Anything I can do here to help @beckermr?

@beckermr
Copy link
Member

beckermr commented Dec 9, 2024

Right. We need to patch staged recipes to test out this branch in making a few new feedstocks. If those are working ok, then it seems ok to merge.

@jaimergp
Copy link
Member

We need to patch staged recipes

I guess the patch goes here, right? Usual conda remove --force + pip install --no-deps?

@beckermr
Copy link
Member

Yep. I'd put the uninstall --force plus pip install --no-deps . right after that block of code. You will need to install pygit2 as well since current versions of smithy do not have that in their requirements.

@jaimergp
Copy link
Member

jaimergp added a commit to conda-forge/staged-recipes that referenced this pull request Dec 13, 2024
@jaimergp
Copy link
Member

Looks like it's working:

@jaimergp jaimergp merged commit c17de58 into conda-forge:main Dec 13, 2024
2 checks passed
jaimergp added a commit to conda-forge/staged-recipes that referenced this pull request Dec 13, 2024
jaimergp added a commit to conda-forge/staged-recipes that referenced this pull request Dec 13, 2024
@mgorny mgorny deleted the pygit2 branch December 13, 2024 17:04
@mgorny
Copy link
Contributor Author

mgorny commented Dec 13, 2024

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

conda smithy rerender -c auto failing with AssertionError on git repositories with index v3
3 participants