Reproducible Build #171

Open
Stebalien opened this issue Mar 30, 2022 · 17 comments · May be fixed by #634

@Stebalien
Member

Stebalien commented Mar 30, 2022

Given that the built-in actors will be a critical part of the Filecoin infrastructure, and given that everyone will have to use the same pre-built actors, we really need to consider reproducible builds.

Proposal:

Provide a Dockerfile that:

  1. (Optionally) bootstraps Rust.
  2. Deterministically builds the actors.
@Kubuxu
Contributor

Kubuxu commented Mar 30, 2022

Agreed; I have it in my notes as one of the high-impact items.
Reproducible builds are essential for making actor code auditable end-to-end. Otherwise, without specialised tooling or an audit of the Wasm code itself, it is impossible to guarantee that the Rust code exactly corresponds to whatever is running on-chain.

@Stebalien
Member Author

@jennijuju jennijuju added the P2 label Aug 11, 2022
@jennijuju jennijuju added this to the Network v17 milestone Aug 11, 2022
@anorth anorth moved this to Todo in Network nv17 Aug 11, 2022
@anorth anorth moved this from Todo to Opportunistic in Network nv17 Aug 11, 2022
@lemmih
Contributor

lemmih commented Aug 23, 2022

A dockerized build environment isn't enough to get reproducible builds. A virtual machine might do the trick, but it's a cumbersome and inherently fragile solution.

It might be worthwhile to track down the root causes for the differences and fix them directly (in the crates we're using or in rustc). What are your thoughts on this, @jennijuju / @Stebalien?

@Stebalien
Member Author

It might be worthwhile to track down the root causes for the differences and fix them directly (in the crates we're using or in rustc).

Yeah, I think we need to do that. We can probably compile in two different containers, then compare the target directories to figure out what went wrong.

I assume you:

  • Picked a specific rustc version.
  • Built with cargo --locked.
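Comparing the target directories from two container builds, as suggested above, only needs a short std-only Rust program. This is a minimal sketch, not something in the repository: the helper name `diff_dirs` and the two directory arguments are hypothetical, and it only reports which files differ (or exist on one side only), not why they differ.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every file under `dir`, stored relative to `root`.
fn collect_files(root: &Path, dir: &Path, out: &mut BTreeSet<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(root, &path, out)?;
        } else {
            // strip_prefix cannot fail here because `path` was found under `root`.
            out.insert(path.strip_prefix(root).unwrap().to_path_buf());
        }
    }
    Ok(())
}

/// Hypothetical helper: returns the relative paths (sorted) of files that are
/// missing from one directory or whose bytes differ between the two builds.
fn diff_dirs(a: &Path, b: &Path) -> io::Result<Vec<PathBuf>> {
    let (mut files_a, mut files_b) = (BTreeSet::new(), BTreeSet::new());
    collect_files(a, a, &mut files_a)?;
    collect_files(b, b, &mut files_b)?;
    let mut differing = Vec::new();
    for rel in files_a.union(&files_b) {
        // Present on both sides and byte-for-byte identical, or it is reported.
        let same = files_a.contains(rel)
            && files_b.contains(rel)
            && fs::read(a.join(rel))? == fs::read(b.join(rel))?;
        if !same {
            differing.push(rel.clone());
        }
    }
    Ok(differing)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: {} <target-dir-a> <target-dir-b>", args[0]);
        std::process::exit(2);
    }
    for rel in diff_dirs(Path::new(&args[1]), Path::new(&args[2]))? {
        println!("differs: {}", rel.display());
    }
    Ok(())
}
```

Run it against the `target` directories copied out of the two containers; intermediate artifacts that differ (object files, build-script outputs) usually point closer to the root cause than the final `.wasm` does.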

@Stebalien
Member Author

@lemmih
Contributor

lemmih commented Aug 23, 2022

Hmm, if we're okay with aarch64 and x86-64 giving different results, then we might have reproducible results already. I added a bundle-mainnet-repro rule to the Makefile in my latest branch. It should be nearly as robust as CosmWasm.

@Kubuxu
Copy link
Contributor

Kubuxu commented Aug 23, 2022

It is the WASM build we are primarily concerned about.

Also, this might be of use, although I'm not sure: https://github.com/cbeuw/lotus

@jennijuju jennijuju moved this from Opportunistic to In Progress in Network nv17 Aug 25, 2022
@jennijuju
Member

@lemmih any updates on this?

@lemmih
Copy link
Contributor

lemmih commented Sep 10, 2022

@lemmih any updates on this?

I'll open a pull request. Then we can continue the discussion on the pros and cons.

@lemmih
Contributor

lemmih commented Sep 12, 2022

Draft PR created: #634

@lemmih lemmih linked a pull request Sep 12, 2022 that will close this issue
@anorth anorth moved this from In Progress to Opportunistic in Network nv17 Nov 7, 2022
@ianconsolata

It sounds like we aren't yet sure exactly what is not reproducible about the build, so the first step is to figure out what is causing the lack of reproducibility. The ultimate goal here is to get something that can be independently built on separate systems, including third-party systems, and that reliably generates a consistent CID.

We know that some things are already pinned to specific versions (like rust-toolchain and Cargo.lock), but other tools can still end up at different versions (for example, the Dockerfile currently uses apt to fetch the latest versions of packages like build-essential and clang), which may be responsible for the non-determinism.

The first step is to take the docker container in the PR and run it a few times to see if it generates a reproducible CID. If it does, we might be done. If it doesn't, we have more investigation to do.

@ianconsolata

Why do we use the nightly rustup toolchain? That seems like it would change very regularly, and would potentially cause different versions of the build tooling to be used on a daily / nightly basis.

@ianconsolata

Ok, first round of testing:

I've built it sequentially multiple times today and compared those CIDs to each other.

  • All CIDs built with docker match
  • All CIDs built today with docker match CIDs built yesterday / Friday with docker (same image)
  • All CIDs built in my local env match
  • CIDs built with docker and CIDs built locally do not match

Next is to determine whether the date matters at all (because of the nightly toolchain):

  • Rebuild the image today, and see if it matches the results from the image built yesterday.
  • Rebuild locally tomorrow and see if it matches the local builds I did today.
  • Build in CI (using both GitHub Actions and CircleCI) to see if similar environments in the cloud (Linux and Darwin) match my local docker/Linux and Darwin builds.

@anorth
Member

anorth commented Nov 15, 2022

Why do we use the nightly rustup toolchain? That seems like it would change very regularly, and would potentially cause different versions of the build tooling to be used on a daily / nightly basis.

Where do you see that? rust-toolchain says 1.63.0

@ianconsolata

ianconsolata commented Nov 16, 2022

In the Dockerfile created for #634, we are installing the nightly toolchain, though I'm not sure it's actually being used in the build at all. It sounds like we don't need to do that?

@ianconsolata
Copy link

Ok, CIDs are consistent from day to day, so the differences in the CIDs of the build artifacts seem to be due mostly to differences in the underlying platform used to build them:

  • CIDs built with docker on my Mac match CIDs built with docker running in an x86 Linux VM in CircleCI
  • CIDs built natively on my Mac don't match CIDs built using docker
  • CIDs built natively in an x86 Linux VM in CircleCI don't match CIDs built using docker
  • CIDs built natively in an Arm VM don't match CIDs built using docker on x86 architectures
  • CIDs built with docker in an Arm VM don't match CIDs built using docker on x86 architectures
  • CIDs are also different if an Alpine-based docker image is used instead of the default Debian one
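With this many build environments, it helps to group the reported CIDs so it is obvious which environments agree. The std-only Rust sketch below does that; `group_by_cid` is a hypothetical helper, and the environment labels and CID strings in `main` are placeholders for illustration, not results from this thread.

```rust
use std::collections::BTreeMap;

/// Groups build environments by the CID they produced. Builds are reproducible
/// across a set of environments only if all of them land in the same group.
fn group_by_cid<'a>(builds: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(env, cid) in builds {
        groups.entry(cid).or_default().push(env);
    }
    groups
}

fn main() {
    // Placeholder values: substitute the CID printed by each real build.
    let builds = [
        ("docker / macOS", "cid-A"),
        ("docker / x86 Linux VM (CircleCI)", "cid-A"),
        ("native / macOS", "cid-B"),
        ("docker / Arm VM", "cid-C"),
    ];
    let groups = group_by_cid(&builds);
    for (cid, envs) in &groups {
        println!("{cid}: {}", envs.join(", "));
    }
    if groups.len() == 1 {
        println!("all environments agree");
    }
}
```

A `BTreeMap` is used so the report is printed in a stable order, which keeps the output itself easy to diff between runs.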

So I think, based on these tests, that the existing solution should result in reproducible builds if always built on an x86 architecture. I have some minor tweaks to the image in #634, but it otherwise should resolve this issue for now.

@ianconsolata

New PR with the changes here: #865

@jennijuju jennijuju removed this from Network nv17 Nov 25, 2022
@jennijuju jennijuju moved this to 🏗 In progress in Network v18 Nov 25, 2022
@jennijuju jennijuju moved this from 🏗 In progress to 👀 In review in Network v18 Nov 25, 2022
@lemmih lemmih removed their assignment Jan 9, 2023
Status: 👀 In review

6 participants