Replies: 19 comments
-
@nickpape-msft FYI
-
The current format of shrinkwrap.yaml can't be merged automatically by Git. I think what you need is a custom git merge driver for it.
-
I didn't understand this comment. Aside from a minor bit of avoidable networking nondeterminism, the node_modules folder is basically a function of whatever's in your package.json files.
-
BTW on this page I saw a note:
This seems like it would complicate the setup for people who want to enlist in our monorepo.
-
As it is right now, one version spec can be resolved differently for different packages in node_modules:

```yaml
/foo/1.0.0:
  dependencies:
    qar: 1.0.0
...
/bar/1.0.0:
  dependencies:
    qar: 1.1.0
```

A postinstall script can configure the merge driver.
-
This problem is turning into a significant source of pain for us. Our shrinkwrap.yaml file now changes at least 3 times per day. That doesn't sound like a lot, but if the 3 people making those changes are trying to merge their PRs around the same time, it can play out like this:

Lately this cycle seems to be recurring several times per week.
Sure, but as long as that decision is made deterministically, it doesn't need to be stored. In other words, as long as Person 1 and Person 2 are making unrelated changes, in principle they aren't actually "conflicting". (Whereas if Person 1 tried to upgrade a library, but Person 2 tried to downgrade it, then that's a legitimate merge conflict, and no one would blame the tooling.)

The "qar" merge conflict arises only because shrinkwrap.yaml is storing the detailed output of the PNPM algorithm, instead of storing a minimal set of inputs. If I understand right, the reason is that the PNPM version selection algorithm is potentially nondeterministic about certain choices that it makes. (Even if this nondeterminism provides a 10% speed gain for installs, we'd be willing to forgo it if that's what's needed to make Person 2's life happier.)

CC @lahuey
-
I published a "beta" version of the merge driver: @pnpm/merge-driver. To set it up, first install the driver globally: Then run this command in the repo that gets the conflicts on merge:

P.S. The installation steps will get easier later, but I wanted to publish it sooner since this seems to be a big problem for you. The auto-merger will only fail if the same fields were modified in package.json, so it should solve all the issues.
-
Awesome, thanks! We'll give it a try and provide some feedback.
-
I read up on Git merge drivers this weekend. In order to use this, we would need to introduce setup scripts that enable the merge driver for each person's enlistment. We would also need to configure the lab machines to install and enable the merge driver before they perform the merge (e.g. for a PR build). This work would have some benefits (e.g. it would also give us a way to use Git hooks, which might be useful), but it's a nontrivial cost.

Currently VSTS does the merge at the start of the build definition, so we would need to set up some command that installs and configures the merge driver beforehand. Since it's independent of Git, this setup would need to be compatible with any branch at any point in the Git history. I'm also wondering about the implementation of the PNPM driver.

Anyway, I'll see what's involved to give this a try. If the algorithm doesn't work reliably, we will find out very quickly. Our current shrinkwrap.yaml gets a lot of activity throughout the day. We now have a steady influx of new projects coming into our monorepo every week, which has caused us to start thinking a lot more deeply about scalability.
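For reference, wiring up any custom Git merge driver involves two pieces: a driver definition in Git config and a .gitattributes rule routing the file to it. A generic sketch (the driver name and command below are placeholders, not the actual @pnpm/merge-driver invocation):

```shell
# Sketch of enabling a custom Git merge driver. "demo-driver" and
# "my-merge-tool" are illustrative names, not a real tool.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q

# 1. Define the driver in Git config. %O, %A, %B, %P are Git's
#    placeholders for the ancestor, current, other, and pathname.
git config merge.demo-driver.name "example lockfile merge driver"
git config merge.demo-driver.driver "my-merge-tool %O %A %B %P"

# 2. Route the file to the driver via .gitattributes.
echo "shrinkwrap.yaml merge=demo-driver" >> .gitattributes

git config merge.demo-driver.driver  # prints: my-merge-tool %O %A %B %P
```

Because the `git config` half lives outside the repository's history, every clone (including CI agents) has to run it before merging, which is exactly the enlistment-setup cost described above.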
-
BTW a third possible approach occurred to me: suppose that we treated shrinkwrap.yaml as a cache file that gets updated periodically by a bot.

The obvious problem with this approach is nondeterminism. However, suppose that PNPM had a hook that could say "when querying the NPM registry, ignore any package versions that were published after timestamp X". Rush would determine X by looking at the Git commit times (e.g. the maximum commit timestamp across the input package.json files). Since NPM registries don't allow a version to be republished, and people almost never unpublish a version, the behavior would be pretty deterministic.

A downside of this approach is that we would lose the integrity checks. An upside is that our Git history wouldn't be flooded with constant changes to a huge YAML file.
-
We can't figure out an easy way to enable a merge driver in our lab. Generally people seem to have a negative reaction to this design, because they don't like the idea of tampering with the behavior of a foundational and already complex system such as Git.

@zkochan what would be involved for the other approach? Today, is there a way to tell "pnpm install" to ignore any version from the registry that was published after timestamp X? If not, would it be a nontrivial work item? I believe that would give Rush the building block it needs to move shrinkwrap changes to be bot-managed operations on the master branch, rather than something people have to deal with on their PR branches.
-
@rarkins I noticed that @renovate-bot can solve the conflicts in lockfiles, and I don't think that a custom git merge driver is used. Could you please share how renovate resolves conflicts in the lockfiles?

EDIT: I guess I understand: the bot notices that there is a conflict, removes the shrinkwrap, creates a new one, and resubmits it.
-
I think that would be possible to do with npm-hosted dependencies.
-
There are several conflicting requirements that this idea has to satisfy:

Thinking about this more, a naive timestamp-based approach would not fully solve requirement 1. Counterexample:

This is counterintuitive because the final merge upgrades us to A 1.3.2, which was not tested in either branch. It could cause a build break in master. The idea could be refined by remembering multiple timestamps, but it gets a little complicated... But since the PR builds always use a hot merge with the latest master, perhaps these breaks would be very rare in practice. If it's our best option, it might be a tolerable way to avoid merge conflicts. The implementation seems fairly straightforward (since Rush is already managing a temp/shrinkwrap.yaml).
-
@zkochan yeah, you guessed it - no magic on Renovate's side
-
That would definitely cause merge conflicts for everyone's PR branches.
-
#1395 has been raised about merge conflicts specifically.
-
I made some updates to the formatting of the lockfile to reduce merge conflicts: #3195. Though we still cannot guarantee that lockfile merges don't cause issues, I created #3202 to address that.

I don't know if there is a silver bullet here. There are pros and cons to each approach. I don't think I will make a breaking change to the lockfile format in pnpm v6, but we can discuss alternative formats as proposed in this issue.
-
Lately we've had an increasing amount of churn in our shrinkwrap.yaml file, which is causing annoying merge conflicts for people. For example, if a person's PR build takes 45 minutes to finish, during that time an automated bot may commit package.json updates that cause a merge conflict when the build finally completes. Our repo’s .gitattributes prevents automerging of shrinkwrap.yaml (i.e. classifies it as a "binary"), because its complicated structure (a serialized directed-acyclic-graph) could be corrupted by a merge.
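Disabling automerge for a file in .gitattributes is done by assigning the built-in binary merge strategy; the rule for this scenario would look something like this (the exact line in our repo may differ):

```
shrinkwrap.yaml merge=binary
```

With `merge=binary`, Git keeps the current branch's version of the file and leaves the path in a conflicted state instead of attempting a textual merge.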
We realized that today’s shrinkwrap is solving several unrelated problems:
1. Determinism: Ensure different people on different days will install the same versions, for a given branch
2. Sanity: Ensure that versions are mapped to reasonable results (e.g. avoid side-by-side duplication, counterintuitive downgrades, etc)
3. Integrity: Use a hash to detect cases where an NPM version was republished, replacing its contents with something inconsistent
4. Caching: Store certain additional metadata to avoid blocking “pnpm install” behind an extra REST call
5. Reporting: Provide a complete report of all versions being installed, including the full topology of the directed-acyclic-graph
Hypothetically, if we only care about eliminating merge conflicts, we actually only need #1 and #2.
For an extreme thought experiment, consider this package.json:
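A package.json of the kind described might look like this (the specific versions and the fs-extra dependency are illustrative assumptions; only jju and @types/node use SemVer ranges):

```json
{
  "name": "thought-experiment",
  "dependencies": {
    "jju": "~1.3.0",
    "@types/node": "^8.0.0",
    "fs-extra": "5.0.0"
  }
}
```

Here fs-extra is pinned exactly; in this sketch it is what pulls in the indirect jsonfile dependency.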
We need shrinkwrap to deterministically choose a version for jju and @types/node, but all other versions (including the indirect dependency on jsonfile) are fully determined by the package.json file itself. In theory, PNPM could calculate a unique consistent shrinkwrap.yaml for the above package.json, using only this additional information:
The key idea is that this file format is only consulted when SemVer ranges are encountered.
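One way such a minimal decisions file could look (the format and versions are illustrative, not an actual pnpm format):

```yaml
# Hypothetical "range decisions" file: one entry per SemVer range that
# required a choice. Exact versions in package.json need no entry here,
# since they are fully determined by package.json itself.
rangeResolutions:
  "jju@~1.3.0": 1.3.2
  "@types/node@^8.0.0": 8.10.54
```

Because each entry is an independent key/value line, two branches that decide different ranges touch different lines and auto-merge cleanly.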
What about integrity checks (#3)? To support that, we would need to expand the file format to include every referenced dependency again, something like this:
This is going to churn more, but it's still completely safe for Git auto-merging, because additions/removals aren't reshuffling a serialized directed-acyclic-graph structure. One possible optimization would be to make these extra records optional; a human can avoid including them in their PR, if they know that the bot will come along an hour later and add them.
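Expanded for integrity checking, the file could additionally carry one flat record per referenced package, e.g. (format illustrative, hashes elided):

```yaml
# Flat, order-independent records: additions and removals don't reshuffle
# a serialized graph structure, so Git's line-based auto-merge stays safe.
integrity:
  "jju@1.3.2": "sha512-<hash>"
  "jsonfile@4.0.0": "sha512-<hash>"
  "@types/node@8.10.54": "sha512-<hash>"
```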
What about caching (#4)? The "dependencies" block could be reintroduced, if that avoids some REST calls.

What about reporting (#5)? This seems counterproductive to me. If we want a nice report of our tree structure, it would be better to generate a proper HTML report and attach it e.g. as a VSTS build artifact. We don't like to churn our Git history with machine-generated reports, and YAML isn't even a particularly readable format.
Thoughts?