Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MaxFilterResults is applied before filtering for eth logs #12636

Closed
rvagg opened this issue Oct 24, 2024 · 4 comments
Closed

MaxFilterResults is applied before filtering for eth logs #12636

rvagg opened this issue Oct 24, 2024 · 4 comments
Assignees

Comments

@rvagg
Copy link
Member

rvagg commented Oct 24, 2024

Noticed by @ArseniiPetrovich:

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D4"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | uniq | wc -l
32
$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D5"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | uniq | wc -l
32

and

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D4"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | head -1
"0x42b4ac"
ubuntu@ip-192-168-16-104:~$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"eth_getLogs","params":[{"fromBlock":"0x42B4AC","toBlock":"0x42B4D5"}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result[].blockNumber' | head -1
"0x42b4ae"

i.e. extending the range of epochs while leaving the start of the range static drops events from the front of the list.

This is because we apply MaxFilterResults max (the invisible cap) before filtering for eth events, not after.

For that same block range:

$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"Filecoin.GetActorEventsRaw","params":[{"fromHeight":4371628,"toHeight":4371668}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result|length'
10000
$ curl -s -X POST -H "Content-Type: application/json" --data '{"method":"Filecoin.GetActorEventsRaw","params":[{"fromHeight":4371628,"toHeight":4371669}],"id":2,"jsonrpc":"2.0"}' http://127.0.0.1:1234/rpc/v1 | jq '.result|length'
10000

10000 is the default value for MaxFilterResults.

  1. We should apply that max after filtering for eth events BUT this limit is there to deal with DoS potential and one of the DoS potentials we're dealing with is memory consumption in collecting and processing these events. We can't just uncap it when collecting and then cap it after filtering because it would be trivial to make a query that collects events off a huge range of epochs. So we need to move the eth filter closer in to the query. I think we might be able to either build in some SQL that can do it, or add something to ChainIndexer in the filtering that can do it.
  2. We should consider returning an error when we hit MaxFilterResults. The fact that you can't know is a big problem. The best signal you have is if you hit 10,000 events returned. I'd like to build pagination into GetActorEvents (maybe not Raw) but we can't do that for eth APIs, the user has to have a clear way of knowing that they need to craft finer filters. go-ethereum might have a precedent here that we should investigate.
@github-project-automation github-project-automation bot moved this to 📌 Triage in FilOz Oct 24, 2024
@aarshkshah1992
Copy link
Contributor

@rvagg
Copy link
Member Author

rvagg commented Oct 25, 2024

Some thoughts I jotted down in that thread, here for the record:

  • Filtering on events that only have entries that are raw is a good first pass at tackling this, it would certainly help with eth API performance and get us closer to fixing this. Probably a good option for an initial pass at this. But it doesn't solve the problem completely.
  • This function is the source of the biggest filtering differential, it's the second layer filter for eth events:
    func ethLogFromEvent(entries []types.EventEntry) (data []byte, topics []ethtypes.EthHash, ok bool) {
  • We could do all of that in SQL, but it wouldn't be pretty. The more we do, the more gnaly the SQL would be, but we could get close to all of the critical pieces. For now at least, I wouldn't expect to see events in there that are raw entries and don't match that schema, it would be weird, but not out of the question.
  • Pushing a filter function down into the index query so we never accumulate useless entries coming out of the db and are able to get a perfect match for maxResults instead of having maxResults applied at the bottom layer and a second filter trimming them down at an upper layer. If we had an optional function to filter by in this block then we could achieve that:
    if row.id != currentID {
    if ce != nil {
    ces = append(ces, ce)
    ce = nil
    // Unfortunately we can't easily incorporate the max results limit into the query due to the
    // unpredictable number of rows caused by joins
    // Break here to stop collecting rows
    if f.maxResults > 0 && len(ces) >= f.maxResults {
    break
    }
    }
    currentID = row.id
    ce = &CollectedEvent{
    EventIdx: row.eventIndex,
    Reverted: row.reverted,
    Height: abi.ChainEpoch(row.height),
    MsgIdx: row.messageIndex,
    }
    ce.EmitterAddr, err = address.NewFromBytes(row.emitterAddr)
    if err != nil {
    return xerrors.Errorf("parse emitter addr: %w", err)
    }
    ce.TipSetKey, err = types.TipSetKeyFromBytes(row.tipsetKey)
    if err != nil {
    return xerrors.Errorf("parse tipsetkey: %w", err)
    }
    ce.MsgCid, err = cid.Cast(row.messageCid)
    if err != nil {
    return xerrors.Errorf("parse message cid: %w", err)
    }
    }
    (ChainIndexer has a similar block of code).
  • Some combination of making our SQL more sophsticated for eth events and adding a filter function would (a) force us to pull fewer events out of the db and (b) allow us to strictly apply the limit.

I also think we should error in the eth APIs if we go over max, not having a signal for users makes for a big footgun. But let's deal with that separately.

@aarshkshah1992
Copy link
Contributor

aarshkshah1992 commented Oct 25, 2024

@rvagg @akaladarshi

A great first step that will get us most of the way there is to add a "filter for Raw codec" condition to every query made by all the ETH Event APIs.

@rvagg
Copy link
Member Author

rvagg commented Dec 16, 2024

#12671

@rvagg rvagg closed this as completed Dec 16, 2024
@github-project-automation github-project-automation bot moved this from 📌 Triage to 🎉 Done in FilOz Dec 16, 2024
@rjan90 rjan90 moved this from 🎉 Done to ☑️ Done (Archive) in FilOz Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ☑️ Done (Archive)
Development

No branches or pull requests

3 participants