feat(gazelle): Remove integrity field from Gazelle manifest #1666
When I introduced this integrity verification, the intent was to have speedy checks on whether the manifest was up to date without regenerating it. Did you consider this?
@f0rmiga in my experience (and this is the reason I opened #1465), in a large repo the integrity field is an extremely common source of merge conflicts. The test itself pretty much only runs in our CI system; locally we always run update anyway, so we don't notice any speed gains from a faster test. (And if you've run update before, the test is fast anyway because everything gets cached.) All in all, no longer running into these merge conflicts is a huge benefit for us, because they take much more engineering time to deal with than running the test. What do you think?
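To illustrate why a checked-in content hash tends to conflict, here is a minimal Python sketch. It assumes the integrity value is a SHA-256 digest of the requirements file contents; the exact scheme used by the manifest generator is not shown in this thread, and the `integrity` helper and the pinned versions are hypothetical.

```python
import hashlib


def integrity(requirements_text: str) -> str:
    """Hypothetical stand-in for the manifest's integrity value."""
    return hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()


base = ["numpy==1.26.0", "requests==2.31.0", "six==1.16.0", "urllib3==2.0.7"]

# Branch A bumps the first pin; branch B bumps the last one.
branch_a = ["numpy==1.26.4"] + base[1:]
branch_b = base[:-1] + ["urllib3==2.1.0"]
# The edits touch different requirement lines; this is the combined result.
merged = [branch_a[0]] + base[1:-1] + [branch_b[-1]]

digests = {
    name: integrity("\n".join(lines) + "\n")
    for name, lines in [
        ("base", base),
        ("a", branch_a),
        ("b", branch_b),
        ("merged", merged),
    ]
}

# Both branches rewrite the single integrity line with different values,
# and neither value matches the digest of the merged requirements.
for name, value in digests.items():
    print(f"{name:>6}: {value[:12]}")
```

Under that assumption, every pair of branches that touches the requirements also touches the same manifest line, and the value has to be regenerated after the merge regardless of which side is picked.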
@adzenith, what is the time it takes to perform the check for staleness in the following scenarios:
If we are removing this for the sake of simplicity, it would be great to know the trade-offs. In practice I found that with multiple requirements files (one for each OS), this extra field may add mental overhead to understanding how everything works, especially if the requirements files are updated incrementally.
I just tested on our monorepo. With the integrity field, the check takes ~0.7s both with and without the pypi repository cache. Without the integrity field, it takes the same ~0.7s if our pip deps are cached. We depend on pytorch, which is 2.3GB, so a "cold" check takes about 3 minutes in our case. In practice we never encounter this, because pytorch (and most of the pip deps) are downloaded and cached during the course of normal work. In real use it would probably take just a few seconds to download any pip deps that the developer doesn't have yet, and this would only need to happen once.
I had a few questions, but at first glance this looks like the right direction.
I changed the permissions to 0644. Let me know your thoughts on the aspect dependency!
Everything is passing CI except this one Windows job:
Not sure what's going on there... have you seen this before?
This is a Windows CI flake.
You are missing the case when this test is performed on CI without a separate pypi cache, which is the most common case.
I would assume that CI would either maintain the Bazel cache between runs (this is what we do), or be running
I guess the thing is that
@f0rmiga, are we missing something else here? If I remember correctly, you were working at Aspect when you authored this, and I am wondering if this work was sponsored by some client who had some specific requirements?
We have to be careful every time we use assumptions to make changes to a repository that countless repositories depend on. We can't force everyone to do what we believe to be the best workflow. @mattem can you shed some light here? You have more recent experience with a very large Python repository.
Blocking merge on this while we discuss further.
I don't think we've ever hit conflicts on the integrity field, but I think that depends on whether the repo has one requirements file or many (we have 1000s), so the likelihood of conflicts is reduced (fwiw, the main contention I see is from the
I'm on the fence about the removal of the field; however, if it is removed, I don't think I agree with adding Aspect's bazel-lib as a dependency (it seems odd for a core ruleset to depend on a 3p library). I'm also a -1 on "manifest generator from Go to Python": for the speed of running the tool alone, I'd prefer to keep it as a native binary.
We have a monorepo with a 3k-line pip lockfile. When we submitted the Gazelle manifest last week, we had lots of conflicts -- so many that we nearly reverted the manifest. This PR was perfectly timed, so I grabbed the diff as a Bazel repository patch and we've been running with it for a few days now. It seems to be working well for us (anecdotally). I don't have strong feelings about the manner in which it is removed (i.e. 3p libs etc.), but it definitely has to go for this manifest to be usable in a monorepo (with a mono-lockfile). That's my two cents.
Perhaps it could be configurable/opt-in/opt-out?
I also wanted to propose making this an opt-out/opt-in setting. We could make passing the requirements file optional, which would select either the go_test (old code) or the new code. @lpulley's experience mirrors mine, and we opted to disable the integrity check by using a different file as the source for the integrity. @f0rmiga, any other ideas for how monorepos with many/single requirements lock files could coexist?
I think an opt-out feature is ideal since it doesn't require everyone else to opt in, and it still satisfies repos that change the requirements too often.
@f0rmiga so I guess we are on the same page here that the
@adzenith, what do you think?
Works for me. I'll add an argument to the manifest target that lets you pick whether there's an integrity field. I'll default it to having the integrity there for back compat.
Awesome. Thanks for the contribution, @adzenith!
Ok, updated with a new
@@ -52,6 +52,7 @@ gazelle_python_manifest(
         "//:requirements_windows.txt",
     ],
     tags = ["exclusive"],
+    use_integrity_field = False,
What is the thinking behind having `use_integrity_field` and `requirements` passed into the macro? IIRC, `requirements` are only used to generate the integrity field. Do you think we get a better API that way?
`requirements` is now a kwarg and the value is only used if `use_integrity_field` is set. I think having `use_integrity_field` be explicit is nice, but I'm also happy to make it just skip the integrity if `requirements` is unset. Let me know your thoughts.
I'd be fine with using the `requirements` arg as a switch, as I am not sure I see a use case where having two args would be useful. @f0rmiga, let us know if you would like to have this toggled explicitly.
I've removed the `use_integrity_field` argument in favor of switching based on whether `requirements` is set.
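The switching behavior settled on above can be sketched in Python. This is an illustration only, not the macro's implementation: `manifest_is_stale`, `digest`, and the dictionary layout are hypothetical, and the integrity value is assumed to be a SHA-256 digest of the requirements text.

```python
import hashlib
from typing import Callable, Optional


def digest(text: str) -> str:
    """Assumed integrity scheme: SHA-256 of the requirements text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def manifest_is_stale(
    checked_in: dict,
    regenerate: Callable[[], dict],
    requirements: Optional[str] = None,
) -> bool:
    """Return True if the checked-in manifest needs to be updated."""
    if requirements is not None:
        # Integrity mode: compare the recorded digest with the current
        # requirements; the manifest is never regenerated.
        return checked_in.get("integrity") != digest(requirements)
    # Diff mode: regenerate the manifest and compare it as a whole.
    return checked_in != regenerate()


reqs = "requests==2.31.0\n"
mapping = {"requests": "requests"}

with_field = {"modules_mapping": mapping, "integrity": digest(reqs)}
without_field = {"modules_mapping": mapping}

print(manifest_is_stale(with_field, lambda: with_field, requirements=reqs))
print(manifest_is_stale(without_field, lambda: {"modules_mapping": dict(mapping)}))
```

Leaving `requirements` unset trades the cheap digest comparison for a full regeneration, which is the cost discussed earlier in the thread, in exchange for a manifest with no hash line to conflict on.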
Ship it, fast, before Windows decides to throw a tantrum!
Thanks all for the reviews and the feedback!
As per the discussion in #1465 and in this PR, this PR does the following:
- Makes the `requirements` parameter to the `gazelle_python_manifest` macro optional.
- If `requirements` is not provided, no integrity field is added to the manifest, and `diff_test` is used to see if the manifest is stale instead of the existing `go_test` that just checks the integrity.
- Switches from `go_binary` to `genrule` to generate the manifest file that can be used with `diff_test`. (There's still a `go_binary` under the hood, but the manifest macro itself uses a genrule now rather than a wrapper `go_binary`.)
- A custom `copy_to_source.sh` script is included, which is used to update the manifest in the source tree.

Compatibility discussion:
- `//:gazelle_python_manifest.update` and `//:gazelle_python_manifest.test`.
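
As a rough illustration of the update step described above, the following Python sketch copies a generated manifest over the checked-in one and sets its mode to 0644, the permission mentioned earlier in the conversation. It is not the contents of `copy_to_source.sh`; the function name and the file names are hypothetical.

```python
import os
import shutil
import tempfile


def copy_to_source(generated: str, destination: str) -> None:
    """Overwrite the checked-in manifest with the generated one."""
    # copyfile copies contents only, so a read-only mode on the generated
    # file is not carried over to the source tree.
    shutil.copyfile(generated, destination)
    # Leave the checked-in file writable by its owner (rw-r--r--).
    os.chmod(destination, 0o644)


with tempfile.TemporaryDirectory() as workdir:
    generated = os.path.join(workdir, "generated_manifest.yaml")
    checked_in = os.path.join(workdir, "gazelle_python.yaml")

    with open(generated, "w", encoding="utf-8") as f:
        f.write("manifest:\n  modules_mapping: {}\n")
    os.chmod(generated, 0o444)  # simulate a read-only build output

    copy_to_source(generated, checked_in)
    print(oct(os.stat(checked_in).st_mode & 0o777))
```

Setting the mode explicitly means a later update can overwrite the file in place even when the generated input was read-only.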