Mainnet bootstrap strategy, how to get the power table? #596

Closed
5 tasks
Kubuxu opened this issue Aug 27, 2024 · 17 comments · Fixed by filecoin-project/lotus#12552

@Kubuxu
Contributor

Kubuxu commented Aug 27, 2024

Option 1: Save the power table in a new field in the PowerTable actor during migration
Option 2: Bootstrap from a chain lookback, the oh-shit-store, or snapshots of the initial power table CID; in the first update after the upgrade, Lotus includes the initial power table CID in the binary.

@Stebalien
Member

So... I still want to do option 2, but option 1 is nice because it requires no coordination and it doesn't preclude option 2. It does require a small FIP update, but I don't expect it'll be that controversial.

The issue with option 2 is that the CAR "roots" are currently expected to be a tipset. Ideally, we'd have a single root metadata object pointing to the chain and whatever else we want, but... that's not what we have right now.

@Stebalien
Member

So, I'd say go with option 1 and punt option 2 into the future.

Stebalien self-assigned this Sep 3, 2024
@Stebalien
Member

Stebalien commented Sep 3, 2024

Proposal:

  1. Add a new Option<Cid> field to the power actor that stores the power table, post bootstrap.
  2. When the network version bumps to v25, record the power table in the cron tick. We're using a network version so we can disable F3 by disabling a migration.
  3. Expose this power table via some F3InitialPowerTable -> Option<Cid> function.
  4. Change the bootstrap logic: At every epoch, lookback finality (900 epochs) and call F3InitialPowerTable. If it returns a power table, bootstrap F3 with that power table.
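The per-epoch check in step 4 could look roughly like the Go sketch below. It is not the Lotus implementation: `Cid`, `StateReader`, `maybeBootstrap`, and `fakeState` are hypothetical names, and `F3InitialPowerTable` is modeled only as the proposal describes it (an optional power table CID read from the power actor at the lookback epoch).

```go
package main

import "fmt"

// finality is the lookback from step 4 of the proposal (900 epochs).
const finality = 900

// Cid stands in for a real content identifier type (hypothetical).
type Cid string

// StateReader is a hypothetical view of chain state. F3InitialPowerTable
// mirrors the proposed function: it returns the power table CID recorded in
// the power actor as of the given epoch, if one has been recorded.
type StateReader interface {
	F3InitialPowerTable(epoch int64) (Cid, bool)
}

// maybeBootstrap is meant to run once per epoch. It looks back `finality`
// epochs from the head and returns the power table to bootstrap F3 with, if
// the power actor had recorded one by then.
func maybeBootstrap(sr StateReader, head int64) (Cid, bool) {
	lookback := head - finality
	if lookback < 0 {
		return "", false
	}
	return sr.F3InitialPowerTable(lookback)
}

// fakeState pretends the power table was recorded at epoch recordedAt.
type fakeState struct {
	recordedAt int64
	cid        Cid
}

func (f fakeState) F3InitialPowerTable(epoch int64) (Cid, bool) {
	if epoch >= f.recordedAt {
		return f.cid, true
	}
	return "", false
}

func main() {
	st := fakeState{recordedAt: 1000, cid: "bafy-example"}
	for _, head := range []int64{1899, 1900} {
		c, ok := maybeBootstrap(st, head)
		fmt.Println(head, ok, c)
	}
}
```

With the fake state above (power table recorded at epoch 1000), bootstrap first triggers at head epoch 1900. Looking back a full finality presumably ensures that the power table F3 starts from is already final under EC.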

@Stebalien
Member

Note: the alternative is to do this in the migration itself. However, I'd like to:

  1. Do some mainnet testing with the final-final version before launching F3.
  2. Avoid 2 state migrations.

@Stebalien
Member

Ah, so, we need all the worker keys. This is best done through a migration of some form, unfortunately.

@Stebalien
Member

Ok, discussed with @jennijuju: we can do two migrations but avoid migrating the actor code in the second migration. Instead, the second migration will just create the power table and attach it to the power actor.

@rjan90
Contributor

rjan90 commented Sep 11, 2024

Some open questions for option 2: how do we write the migration? Do we need to create an nv-skeleton in Lotus/GST/Filecoin-FFI? Will it be similar to the Lightning/Thunder upgrade?

We should also give Forest an early heads-up on our strategy here so that they can prepare for this migration.

@BigLep
Member

BigLep commented Sep 11, 2024

Additional 2024-09-11 conversation:

  • @Stebalien is going to write up options
  • This is going to involve a FIP update to document the change.

I added these tasks to the issue description:

  • Enumerate the options (with pros/cons or in a decision table)
  • Get decision made with stakeholders
  • Update FIP with decision
  • Lotus implementation work
  • Forest implementation work

Please update/correct where wrong or outdated.

BigLep added the P1 label Sep 11, 2024
@Stebalien
Member

We discussed the migration option in standup. Unfortunately, Forest would have to implement the migration as well, and the migration will likely be non-standard because, to keep it small, we don't want to bump the actors version. We can still do that, but we need to discuss it with them.

We also discussed some alternatives:

  1. When a user syncs from a snapshot after the F3 bootstrap epoch (syncs from a snapshot that doesn't include the power table from the bootstrap epoch), either (a) require that they use a new client version that hard-codes the bootstrap powertable CID or (b) require that they pass said power table CID on the command-line when importing a snapshot. The downside of this approach is that it requires user intervention and could cause issues for automated deployment setups.
  2. We could just add the CID to snapshots (e.g., stuff it in the CAR header). From what I can tell, this isn't too terrible, but it's kind of an abuse of the CAR format. This will require cooperation from ChainSafe (they produce the snapshots), but the effort should be minimal.
  3. We could preserve the F3 bootstrap state-tree (both in the datastore and in the snapshot). This will require some work, but not too much work. However, this will keep extra state around, which will grow the datastore a bit.

@Stebalien
Member

I've discussed this with the F3 team and @jennijuju and it sounds like option 1 isn't so bad after all.

We'd have two releases:

  1. Release A: Before the network upgrade.
  2. Release B: Immediately after the network upgrade.

Release A will have (a) an environment variable to specify the F3 bootstrap power table CID, (b) the ability to specify it when importing a snapshot, and (c) will be able to import snapshots without specifying the variable (?) (we'll have to assess the risk of this as the peer won't be able to participate in F3).

Release B will be identical to release A except the bootstrap power table CID will be set.

We'll need to coordinate with Forest/Venus to make sure this works for them.

@Stebalien
Member

While writing this up, I had another thought: technically, we can start late, and our certificate store even supports this. To bootstrap, we:

  1. Fetch the earliest finality certificate signed by a power table we have.
  2. Validate that finality certificate.
  3. Start from there.

@ruseinov

Correct me if I understand this wrong: we're looking for the earliest cert signed by the current PT and then just verifying all the subsequent certificates until the bootstrap is finished.

@Stebalien
Member

> Correct me if I understand this wrong: we're looking for the earliest cert signed by the current PT and then just verifying all the subsequent certificates until the bootstrap is finished.

Basically?

By bootstrap, I mean the F3 bootstrap (network-wide, not local to the current client). The issue here is that the F3 bootstrap epoch may be far enough in the past such that we no longer have a power table.

We have a bit of a lookback for the power table so it's a little more complex but the lookback is at most 990 epochs (+/-). So:

  1. I fetch the latest certificate (no verification yet). Call it cert A.
  2. I fetch the certificate 10 before that (still no verification). Call it cert B. The power table committed in the "head" of the chain finalized in this certificate should be the power table used to verify cert A.
  3. I load the power table from the head tipset referenced by cert B from my state (snapshot).
  4. Then I validate cert A with this power table.

All this tells me is that some 2/3rds of the power within the last 990 epochs claim that cert A is correct. But that should be good enough for our purposes here.
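The four steps can be sketched in Go as follows. Everything here is hypothetical: the `Certificate` and `PowerTable` types and the callback signatures are stand-ins, signature verification is reduced to counting signer power against a two-thirds threshold (the exact quorum rule and BLS checks in F3 are not modeled), and `certGap` is the "10 before" from the comment above.

```go
package main

import (
	"errors"
	"fmt"
)

// certGap mirrors "the certificate 10 before that".
const certGap = 10

// Certificate is a simplified, hypothetical finality certificate.
type Certificate struct {
	Instance   uint64
	HeadTipset string   // key of the head tipset of the finalized chain
	Signers    []string // participants that signed (no signature check here)
}

// PowerTable maps participant IDs to their power.
type PowerTable map[string]int64

// hasQuorum reports whether distinct signers hold at least 2/3 of the power.
func hasQuorum(c Certificate, pt PowerTable) bool {
	var total, signed int64
	for _, p := range pt {
		total += p
	}
	seen := map[string]bool{}
	for _, s := range c.Signers {
		if !seen[s] {
			seen[s] = true
			signed += pt[s]
		}
	}
	return total > 0 && 3*signed >= 2*total
}

// bootstrapFromLatest follows the four steps from the comment.
func bootstrapFromLatest(
	latest func() (Certificate, error),
	get func(instance uint64) (Certificate, error),
	loadPowerTable func(tipset string) (PowerTable, error),
) (Certificate, error) {
	a, err := latest() // 1. latest certificate, unverified
	if err != nil {
		return Certificate{}, err
	}
	if a.Instance < certGap {
		return Certificate{}, errors.New("not enough certificates to look back")
	}
	b, err := get(a.Instance - certGap) // 2. cert B, still unverified
	if err != nil {
		return Certificate{}, err
	}
	pt, err := loadPowerTable(b.HeadTipset) // 3. power table from local state
	if err != nil {
		return Certificate{}, fmt.Errorf("no power table for %s: %w", b.HeadTipset, err)
	}
	if !hasQuorum(a, pt) { // 4. validate cert A with that power table
		return Certificate{}, fmt.Errorf("certificate %d lacks a 2/3 quorum", a.Instance)
	}
	return a, nil
}

func main() {
	certs := map[uint64]Certificate{
		90:  {Instance: 90, HeadTipset: "ts-90"},
		100: {Instance: 100, HeadTipset: "ts-100", Signers: []string{"f01", "f02"}},
	}
	tables := map[string]PowerTable{"ts-90": {"f01": 40, "f02": 30, "f03": 30}}

	a, err := bootstrapFromLatest(
		func() (Certificate, error) { return certs[100], nil },
		func(i uint64) (Certificate, error) {
			c, ok := certs[i]
			if !ok {
				return Certificate{}, errors.New("missing certificate")
			}
			return c, nil
		},
		func(ts string) (PowerTable, error) {
			pt, ok := tables[ts]
			if !ok {
				return nil, errors.New("missing state")
			}
			return pt, nil
		},
	)
	fmt.Println(a.Instance, err) // 100 <nil>
}
```

As the comment says, passing this check only shows that signers holding two thirds of a recent power table vouch for cert A. It does not re-verify the certificate chain back to the F3 bootstrap.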

@ruseinov

Right, that makes sense. And an alternative would be to store the power table as part of the first migration so that we can then do the above without lookbacks, correct?
The lookback approach seems good to me as long as it goes smoothly. And if it doesn't, there's always another try, assuming that if F3 is somehow broken, we just fall back to normal EC.

@Stebalien
Member

> Right, that makes sense. And an alternative would be to store the power table as part of the first migration so that we can then do the above without lookbacks, correct?

Yes. The issue is getting that power table when restoring from snapshot without messing with the snapshot format this late in the game.

We considered snapshotting the power-table on-chain and storing it there, but we'd rather not touch the chain (that and it would have been two migrations in a row).

@ruseinov

> Yes. The issue is getting that power table when restoring from snapshot without messing with the snapshot format this late in the game.

Well, AFAIK we will still need to accommodate certificates when it comes to snapshots, but if it's possible to do without them, much better.

@Stebalien
Member

> Well, AFAIK we will still need to accommodate certificates when it comes to snapshots, but if it's possible to do without them, much better.

Yeah, I'd like to eventually ship certificates in snapshots but it's a bit late to try to ship that before the release.
