Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR-004 proposal on consumer expiration #712

Closed
wants to merge 2 commits into from

Conversation

albttx
Copy link
Contributor

@albttx albttx commented Feb 6, 2023

Description

Proposal to set an expiration date on the consumer chain updated on each VSC Packet received to ensure the chain is up-to-date.

Type of change

  • Docs: Adds documentation

@albttx
Copy link
Contributor Author

albttx commented Feb 6, 2023

I set ADR-003 because it came after #711

@shaspitz
Copy link
Contributor

shaspitz commented Feb 6, 2023

I set ADR-003 because it came after #711

See #711 (comment), lets change the ADR number to 4 for this one, thanks!

@albttx
Copy link
Contributor Author

albttx commented Feb 7, 2023

There is a PR of the code on my fork albttx#1

Once the ADR passed, i'll create the PR here on this repo

@jtremback
Copy link
Contributor

Hey @albttx very interested in the potential this has to make things much simpler, but we'll probably table this for now and pick it back up after launch.

docs/architecture/adr-003-consumer-expiration Outdated Show resolved Hide resolved
docs/architecture/adr-003-consumer-expiration Outdated Show resolved Hide resolved
docs/architecture/adr-003-consumer-expiration Outdated Show resolved Hide resolved
@albttx albttx changed the title feat: ADR proposal on consumer expiration ADR-004 proposal on consumer expiration Feb 13, 2023
@mpoke
Copy link
Contributor

mpoke commented Mar 8, 2023

@albttx Thank for the contribution. I have some concerns with the proposed solution.

Benefits of the existing protocol:

  • The default action is to have the collateral (that enables validators to produce blocks) locked. This collateral is unlocked in specific cases that we can clearly reason about.
  • Consumer chains have their own unbonding period, which doesn't depend on the provider unbonding period. This simplifies how IBC clients to the consumers are updated: Consumer chains are treated just like other chains.
  • The current ICS protocol is asynchronous. As a result, the assumptions concerning time delays are minimized.
  • The current ICS protocol is very robust. Given that ICS is crucial infrastructure in Cosmos, it's important to have a very secured protocol.

All these properties will be removed by the proposed solution.

  • The default action would be to have the collateral unlocked after the provider unbonding elapses, without any information of the state of consumer chains. This means that we would need to reason about the potential states consumer chains can be in. I find it hard to be confident that all relevant states were considered.
  • Consumer chains would not have their own unbonding period. IBC clients to these chains (even clients on other chains) would need to query the provider unbonding period and then estimate a lower bound on the consumer unbonding. This complicates the interaction with consumer chains.
  • More assumptions on the synchrony between the provider and consumers are introduced, which will clearly reduce the robustness of the protocol.

IMO, the only benefits of the proposed solutions are:

  • Simplify the protocol, which I agree that in general is good, but not at the cost of robustness. Plus, I do believe that the extra logic needed to deal with edge-cases of the proposed solution will result in a much more complex protocol than the one proposed.
  • Reduce the of packets that need to be relayed by a factor of 2x. This can be solved much easier by just introducing epochs which will have a much bigger impact on the relaying cost. For example, sending VSCPackets every 10m, with reduce the number of packets by a factor of ~100x.

cc @josef-widder @AdityaSripal @jtremback

@jtremback
Copy link
Contributor

@mpoke All of your points sounds good on their own, but allow me to introduce some counterpoints-

We cannot actually lock collateral for much longer than the provider chain's own unbonding period, it is not tenable from a financial or PR perspective to have people's money locked up with no end in sight. For this reason, we will be setting consumer chain unbonding periods to shorter than the provider's unbonding. We also introduced a timeout, so that if a consumer chain does not acknowledge VSCs for somewhat longer than the provider's unbonding period, that consumer chain is removed from the provider.

Let's briefly consider not what is described in this ADR but a hypothetical alternative where we just don't care about the consumer's unbonding period at all. In this hypothetical protocol VSCs are sent to the consumer and are not acknowledged. The only packets sent from the consumer to the provider would be slashing/jailing packets. Assuming perfect and synchronous transmission of VSCs to the consumer, this protocol would have no problem since the validator sets would always be perfectly in sync.

But in the real world, transmission over IBC is not perfect and synchronous. There are two possible sources of asynchrony that I am aware of- clock drift and IBC relaying delays.

First, let's look at clock drift. The clocks on two IBC connected chains cannot drift more than a few seconds (or minutes? I don't remember) apart before the IBC connection breaks. So I wouldn't consider clock drift to be significant. We should probably confirm this more rigorously, but I'm pretty certain that we can set an upper bound on clock drift to be a few minutes, or even an hour if we want to be extremely generous. But I do concede that this may need more thought.

Next, let's look at relaying delay. The delay we are concerned about is a delay in relaying a VSC from the provider to the consumer. If a validator begins unbonding on the provider, but the VSC containing this change is delayed, then they may double sign on the consumer during this delay. After this double signing, there will be less time on the provider to submit the equivocation evidence. For example, if there is a relaying delay of one week, and then a validator double signs on the consumer, they will already be through one week of their unbonding period on the provider. Then there will only be two weeks remaining to submit the evidence.

Stated simply, VSC relaying delays reduce the effective unbonding period of the consumer by the duration of the delay.

This is why this ADR makes sense. The amount of time it takes for the consumer chain to stop after it has stopped receiving VSCs is an upper bound on how much its unbonding period can effectively be reduced by relaying delays.

The benefit of simplifying the protocol is a big one. It will make future development much faster and is something to be strived for. You're talking about edge cases, but you haven't named any. Even one surprising edge case would convince me that this change would indeed not simplify the protocol. But as it is, it currently seems simpler and perhaps superior to our current protocol.

@jtremback
Copy link
Contributor

In conclusion, I guess I have a few concrete questions-

  • Is it true that VSC relaying delays reduce the effective unbonding period of the consumer by the duration of the delay?
  • Does clock drift have the same effect or is it more complicated?
  • Can we disregard clock drift because IBC itself cannot work in the presence of clock drift anyway?
  • What are some of the edge cases you are thinking of?

@josef-widder
Copy link

There is another argument for the asynchronous explicit mature packet design (compared to the synchronous design with additional assumptions on delays and relative clock drifts etc). To the best of my understanding, the mature packet design it is aligned with incentives. If a validator wants to unbond, it is in its interest

  • to behave nicely on the consumer chain
  • keep the channel open and relay the mature packet to get its stake back

If you don't need an explicit acknowledgement to get the stake back, this alignment is gone. This may lead to one possible edge case is that some +1/3 of the validators freeze the channel on the provider side by a lightclient attack. At this point we need to figure out what to do about the consumer chain. Even if we slash the +1/3, the consumer chain keeps running and is not secured.

In addition, every synchronous design based on assumption on delays (e.g., packet delays)

  • either breaks if the assumption is violated at run-time
  • needs to have machinery in place to check the assumption and go to a safe mode when the assumption is broken (edge cases)

More generally, I think that discussions about the possible solution are a bit tricky without understanding what are the precise properties we need to ensure under what assumptions. @mpoke has written about the benefits of this design. I think they are all undisputed. I added the alignment of the incentives here as additional benefit. If some of the benefits are not needed, we should identify which. Then I think we can have a better discussion about alternative solutions.

@mpoke
Copy link
Contributor

mpoke commented Mar 23, 2023

As a side note, to avoid the delays stacking up, each VSCPacket should have the timestamp of when the provider sent it. Under the assumption that the clock drift is bounded, the consumer can use this timestamp to decide whether to accept or not the VSCPacket. This would be similar to the IBC timeout timestamp.

@mpoke
Copy link
Contributor

mpoke commented May 5, 2023

Converting to draft as more discussion is needed.

@mpoke mpoke marked this pull request as draft May 5, 2023 10:23

This time must be:

- Shorter than 1/2 of the provider unbonding period.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it have to be less than 1/2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure someone don't have the time to unbond on the provider chain and harm the consumer chain then, if there is no VSC package.

@jaekwon
Copy link

jaekwon commented Dec 4, 2023

The intent is to halt the chain until the relevant updates come from the provider chain, so that operations on provider chains can be timely. In that sense, 8 days might be quite a while. And this is mostly about halting logic on the consumer side, but the unbonding considerations came later. I agree that it needs to be specified out first, but what is clear to me is that consumer chains will need to halt if it hasn't heard an update from the hub in a while because other things will depend on synchronicity, including the app itself.

Arguably it should be a per-app configuration, but where the default (maximum) allowable delay is set by the hub.
But if the hub doesn't have a constitution that describes the spec and rails for safety, then governance can become a threat to safety as well.

@mpoke
Copy link
Contributor

mpoke commented Aug 6, 2024

@albttx Closing in favor of https://github.com/cosmos/interchain-security/blob/main/docs/docs/adrs/adr-018-remove-vscmatured.md. The ADR was partially implemented by #2098 (the provider-side changes).

@mpoke mpoke closed this Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants