
Add missing events to make it possible to better analyze network activity #4245

Open · MonsieurNicolas opened this issue Mar 14, 2024 · 3 comments


MonsieurNicolas commented Mar 14, 2024

#3847 took a first pass at this for transaction-level metrics.

Goal:
make it possible to run queries in BigQuery (based off the meta) to surface information that supports decisions when changing network settings, or to quantify the impact of certain protocol changes.

Design principle:
reduce coupling between systems by not duplicating complex logic (such as intricate formulas or subtle protocol behavior) between core and downstream systems.

In particular, this requires an understanding of:

  • how resources are used in general
  • resource fees
  • inclusion fees, and possibly why inclusion fees end up higher than expected

At the ledger level:

  • values of network settings should be accessible in downstream systems; derived values, while they technically can be recomputed, should be emitted
  • consider emitting information on a per-tx-set-component basis
    • we may be able to emit information that cannot be trivially computed outside of core (surge-pricing related, for example)
  • overall: do we have the right information to reason about ledger-wide resource utilization vs. limits? (see the sketch after this list)
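
To make this concrete, here is a minimal sketch of what a ledger-level payload could carry, written as Rust structs. All type and field names are hypothetical (they do not come from stellar-core or the actual meta schema); the point is only the shape of the data.

```rust
// Hypothetical sketch only: none of these names exist in stellar-core.

/// Snapshot of the network settings in force for a ledger, including
/// derived values, so downstream systems never have to recompute them.
pub struct NetworkSettingsSnapshot {
    pub ledger_max_instructions: u64,
    pub ledger_max_read_bytes: u64,
    pub ledger_max_write_bytes: u64,
    pub ledger_max_tx_count: u32,
}

/// Per tx-set-component information that cannot be trivially
/// recomputed outside of core (e.g. the surge-pricing base fee).
pub struct TxSetComponentInfo {
    pub component_index: u32,
    pub surge_pricing_base_fee: Option<i64>,
    pub tx_count: u32,
    pub total_instructions: u64,
}

/// Ledger-wide view: settings plus per-component utilization.
pub struct LedgerUtilizationEvent {
    pub settings: NetworkSettingsSnapshot,
    pub components: Vec<TxSetComponentInfo>,
}

/// Example of the kind of question downstream could then answer
/// directly: "how full was this ledger, instruction-wise?"
pub fn instruction_utilization(e: &LedgerUtilizationEvent) -> f64 {
    let used: u64 = e.components.iter().map(|c| c.total_instructions).sum();
    used as f64 / e.settings.ledger_max_instructions as f64
}
```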

At the transaction level:

  • total resource utilization per resource type (the aggregate of what the contract invocation uses plus whatever is charged on the C++ side)
    • we could opt to instead only emit the missing parts, such as the computed resource fee
  • per-contract-invocation resource utilization; this is needed to answer questions such as "what overhead is caused by a specific WASM, or a specific instance?" We may already have this.
  • overall: do we have the right information to reason about per-tx resource utilization vs. limits? (see the sketch after this list)
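
Similarly, a sketch of the transaction-level data discussed above, under the same caveat: illustrative Rust types with hypothetical names, not the actual meta schema.

```rust
// Hypothetical sketch only: illustrative names, not the meta schema.

/// Usage counters per resource type.
pub struct ResourceUsage {
    pub instructions: u64,
    pub read_bytes: u64,
    pub write_bytes: u64,
}

/// Placeholder for a contract identifier (a 32-byte hash on Stellar).
pub type ContractId = [u8; 32];

/// Per-transaction event: the aggregate across the contract invocation
/// and the C++-side charges, the computed resource fee (so downstream
/// never re-implements the fee formula), and a per-invocation breakdown
/// to attribute overhead to a specific WASM or instance.
pub struct TxResourceEvent {
    pub total: ResourceUsage,
    pub computed_resource_fee: i64,
    pub per_invocation: Vec<(ContractId, ResourceUsage)>,
}
```
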
MonsieurNicolas (Contributor, Author) commented:

@sydneynotthecity tagging you here

MonsieurNicolas (Contributor, Author) commented:

Also related to this: it may be worth standardizing (via SEPs?) certain types of diagnostic events, so that we can ensure some level of stability over time in their structure and semantics, and so that downstream processors can consume them reliably.
Note that diagnostic events are in general aggregated/parsed downstream; we want to be able to provide better/more accurate events over time without being tied to specific traces. Imagine a diagnostic event that gives information when entering/leaving some "scope" (the stable concept): the runtime could then emit events where the scope is as coarse as a "VM" or as granular as a method/host function (see the sketch below).
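
A minimal sketch of that enter/leave-scope idea, with hypothetical names (this is not an existing SEP or host API): the stable part downstream depends on is the enter/leave pairing, while the granularity of a scope can evolve.

```rust
// Hypothetical sketch only: not an existing SEP or host API.

/// How granular a scope is may change over time without breaking
/// consumers: from a whole VM down to a single host function.
pub enum ScopeKind {
    Vm,
    Method,
    HostFunction,
}

/// The stable concept downstream parsers rely on: paired
/// enter/leave events around a scope.
pub enum DiagnosticEvent {
    EnterScope { kind: ScopeKind, name: String },
    LeaveScope { kind: ScopeKind, name: String, instructions_used: u64 },
}

/// A consumer that only depends on the pairing, not the granularity.
pub fn total_instructions(events: &[DiagnosticEvent]) -> u64 {
    events
        .iter()
        .filter_map(|e| match e {
            DiagnosticEvent::LeaveScope { instructions_used, .. } => Some(*instructions_used),
            _ => None,
        })
        .sum()
}
```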

sydneynotthecity commented:

➕ Yes, I agree that diagnostic events should be standardized where applicable. Otherwise the code to parse and aggregate events will be brittle and the maintenance burden will be high.
