bug: RLN inconsistent state rejects valid messages #2928

Open
alrevuelta opened this issue Jul 24, 2024 · 3 comments
Labels: bug (Something isn't working), effort/weeks (Estimated to be completed in a few weeks)

**alrevuelta** (Contributor):

Originally reported in #2870, this issue covers the same problem but expands on its root cause and proposes a possible fix.

In short, valid messages containing valid RLN proofs are rejected by some nodes with `invalid message: provided root does not belong to acceptable window of roots`. The bug is difficult to reproduce, but it shows up more frequently when new RLN memberships are being inserted into the contract; with a static set of memberships it won't appear.

The problem is the following:

  • Each node maintains a Merkle tree containing all RLN memberships.
  • The tree is constructed from onchain information.
  • The blockchain is polled every `DefaultBlockPollRate`.
  • This means that when a new membership is added to the contract, nwaku nodes will add it to their trees at different times, anywhere from 0 to 6 seconds (`DefaultBlockPollRate`) apart.
  • During this interval, nodes may hold different Merkle trees, with different roots.
  • Hence the `invalid message: provided root does not belong to acceptable window of roots` error on some nodes (a minimal sketch of this polling loop follows the list).
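For illustration, a minimal sketch of the polling pattern described above (`DEFAULT_BLOCK_POLL_RATE`, `get_latest_block`, `fetch_new_memberships`, and `tree.insert` are hypothetical stand-ins, not nwaku's actual API):

```python
import time

DEFAULT_BLOCK_POLL_RATE = 6  # seconds; hypothetical stand-in for nwaku's DefaultBlockPollRate

def poll_memberships(tree, last_block, get_latest_block, fetch_new_memberships):
    """Periodically pull membership insertions from the contract into the local tree.

    Every node runs this loop on its own schedule, so after a new membership
    lands onchain, two nodes can disagree on the Merkle root for up to
    DEFAULT_BLOCK_POLL_RATE seconds.
    """
    while True:
        latest = get_latest_block()
        for membership in fetch_new_memberships(last_block + 1, latest):
            tree.insert(membership)  # the root changes here, at a node-specific time
        last_block = latest
        time.sleep(DEFAULT_BLOCK_POLL_RATE)
```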

As shown in the image, we have a race condition during this interval: if nwaku1 creates a message at t1 and propagates it to the network, nwaku2 will flag it as invalid while nwaku3 accepts it as valid. There is an inconsistency between the nodes, since their Merkle trees are not guaranteed to be identical.

*(image: diagram of the race condition between nwaku1, nwaku2 and nwaku3 during the ambiguity window)*

Given the issue, there are two ways to mitigate it (but not completely fix it):

  • Reduce `DefaultBlockPollRate`. A lower value makes the problem less likely by shortening the "ambiguity window", but the issue would still be there.
  • Use an event-based approach (e.g. websockets): subscribe to the tip of the blockchain and, on every new block head event, update the tree. This also shrinks the "ambiguity window", but in theory the problem can still happen. Moreover, syncing state from websocket-style event feeds has shown clear problems in the past, since afaik there is no guarantee that you won't miss an event (see the sketch after this list).
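For concreteness, the event-based alternative could look like the following sketch, written against a hypothetical async Ethereum client (`subscribe_new_heads`, `block_number`, and `fetch_new_memberships` are illustrative names, not a real library's API):

```python
async def follow_chain_head(tree, eth_client, fetch_new_memberships):
    """Update the tree on every new block head instead of polling.

    This shrinks the ambiguity window to roughly one block, but a dropped
    websocket frame means a silently missed membership insertion, which is
    exactly the fragility mentioned above.
    """
    last_block = await eth_client.block_number()
    async for head in eth_client.subscribe_new_heads():  # hypothetical subscription API
        for membership in await fetch_new_memberships(last_block + 1, head.number):
            tree.insert(membership)
        last_block = head.number
```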

So my proposal to fix this is the following: we need to somehow connect RLN timestamps (aka epochs) to blockchain timestamps (aka blocks or slots). This would allow a proper ordering of events, so we can tell whether an incoming RLN message is "from the future". In other words, if we receive an RLN message with timestamp 10 but have only synced the blockchain up to timestamp 9, then we know we can't process this message yet, since we are not up to date. In that case, the node should sync first and validate afterwards. With this, we would no longer reject a valid message, since we sync first and process after. The solution of course has to be well designed to avoid unnecessary delays.

It's up for discussion how "connecting RLN timestamps to blockchain timestamps" would work. On Ethereum L1, slots could be an option, since they happen at fixed 12-second intervals; it is unclear how this would work on L2s. And, as a requirement, we need a solution that is portable across chains/layers. A small sketch of the slot arithmetic follows.
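As a starting point, the slot arithmetic on Ethereum L1 could look like this (a minimal sketch; anchoring to the beacon-chain genesis timestamp is an assumption for illustration, not a proposed spec):

```python
SLOT_TIME = 12                   # seconds per Ethereum L1 slot
GENESIS_TIMESTAMP = 1606824023   # Ethereum beacon-chain genesis (Dec 1, 2020, 12:00:23 UTC)

def epoch_for_timestamp(ts: int) -> int:
    """RLN epoch derived from a timestamp, aligned to L1 slot boundaries."""
    return (ts - GENESIS_TIMESTAMP) // SLOT_TIME

def message_is_from_the_future(msg_ts: int, last_synced_block_ts: int) -> bool:
    """True when the sender's epoch is ahead of the chain state we have synced;
    in that case the receiver should sync first and validate afterwards."""
    return epoch_for_timestamp(msg_ts) > epoch_for_timestamp(last_synced_block_ts)
```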

**rymnc** (Contributor) commented Jul 24, 2024:

> Use an event-based approach (e.g. websockets): subscribe to the tip of the blockchain and, on every new block head event, update the tree. This also shrinks the "ambiguity window", but in theory the problem can still happen. Moreover, syncing state from websocket-style event feeds has shown clear problems in the past, since afaik there is no guarantee that you won't miss an event.

this approach is probably less robust than our current implementation

gabrielmer added the `effort/weeks` label on Jul 25, 2024.
**rymnc** (Contributor) commented Jul 26, 2024:

As discussed in our call, linking RLN epochs to the underlying blockchain's slots is a promising idea, with some noteworthy tradeoffs:

  1. The slot time becomes our epoch size: with Ethereum mainnet that's 12 seconds, so the RLN epoch size should be aligned to it.
  2. It is unclear whether this solution works on chains that do not have slot-based block production or fixed-time block intervals.
  3. It will need a reconciliation mechanism for missed block proposals.

Here's what the pseudocode would look like:

```
[cached]   lastProcessedBlockNumber
[cached]   lastProcessedBlockTimestamp
[constant] blockSlotSize

func onNewBlock(block) ->
  processBlock(block)
  lastProcessedBlockNumber = block.number
  lastProcessedBlockTimestamp = block.timestamp

func validateMessage(wakuMessage) ->
  validateEpoch(wakuMessage)

  if wakuMessage.timestamp > lastProcessedBlockTimestamp + blockSlotSize:
    # We may be lagging behind the block that the sender has processed,
    # so we force-process the latest block before verifying the proof.
    # Note: this opens a probable DoS vector where a sender intentionally
    # spams messages matching this condition, causing unnecessary calls
    # to the RPC provider / Ethereum node.
    forceProcessLatestBlock()

  # verify the proof in both cases, against a now up-to-date tree
  verifyProof(wakuMessage.proof)
```

We will have to do some preliminary analysis of the block-time variance of the L2s we may be interested in for the mainnet deployment, to estimate actual values of `blockSlotSize` and, subsequently, the `epochSize`. A rough sketch of such a survey is below.
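Such a survey could be as simple as the following sketch (`get_block` stands in for any JSON-RPC client call returning a block with a timestamp; the window size is an arbitrary assumption):

```python
from statistics import mean, stdev

def block_time_stats(get_block, latest: int, window: int = 1000):
    """Estimate the mean and spread of inter-block times over the last `window` blocks."""
    timestamps = [get_block(n).timestamp for n in range(latest - window, latest + 1)]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(deltas), stdev(deltas)
```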

cc: @alrevuelta

**alrevuelta** (Contributor, Author):

@rymnc thanks! In general I like the idea, but perhaps there is an attack vector here.

`wakuMessage.timestamp` is set by the sender. An attacker could artificially set a timestamp in the future, which would force `forceProcessLatestBlock` to be executed. Perhaps not a big deal, because if the node is already in sync, `forceProcessLatestBlock` will return without any onchain call? But perhaps we need to add some checks here to prevent that, for example rejecting timestamps greater than `lastHeadBlockTimestamp`. That has other problems though: if the sender's clock is off by a bit, we may reject valid messages. A sketch of a drift-bounded check follows.
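One possible shape for that check (a minimal sketch; `MAX_CLOCK_DRIFT` is an assumed tolerance, and `last_head_block_ts` refers to the cached timestamp from the pseudocode above):

```python
MAX_CLOCK_DRIFT = 12  # seconds of sender clock skew we tolerate; assumed value

def timestamp_is_plausible(msg_ts: int, last_head_block_ts: int) -> bool:
    """Reject timestamps too far in the future to be explained by clock drift,
    so an attacker cannot force repeated forceProcessLatestBlock() calls."""
    return msg_ts <= last_head_block_ts + MAX_CLOCK_DRIFT
```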

On the other hand, I'm concerned that this still doesn't deterministically ensure messages are processed correctly, since it relies on each node's local clock. The shorter the block time, the bigger the impact of a biased clock.

But in any case, I think it's a great way to start.
