Fine grained CPU profiling #804

sandreim · 2022-06-24T11:48:18Z

There have been earlier efforts to solve this, such as paritytech/polkadot#4871, but the results did not provide enough insight because of the low sampling rate and the storage design (https://pyroscope.io/docs/storage-design/). We should continue the effort and implement something that works better for our use case. We need CPU profiling that is as fine-grained as possible (100us), with visualisation tooling, to increase accuracy and narrow the scope of debugging when dealing with node performance issues or optimization work.

The solution must also work easily with Zombienet, so that we can test for performance regressions in the CI pipeline.

ordian · 2022-08-19T12:54:48Z

Just to clarify, the profiler frequency used in pyroscope is configurable. The problem is that it accumulates the profiling info into 10s segments, and that value is hardcoded in many places in their server. Cf. grafana/pyroscope#901.
Thus the critical section we want to profile gets lost in the noise of other tasks such as networking.
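To make the dilution effect concrete, here is a minimal sketch with synthetic data. The 100 Hz sampling rate, the 100 ms burst, and the labels "critical" and "networking" are all assumptions chosen for illustration; nothing here is measured from a real node or taken from pyroscope's code.

```python
def synthetic_samples(duration_ms=10_000, period_ms=10, burst_start_ms=4_000, burst_len_ms=100):
    """Build (timestamp_ms, label) samples: one short 'critical' burst inside 'networking' work.

    All numbers are illustrative assumptions, not measurements.
    """
    samples = []
    for t in range(0, duration_ms, period_ms):
        in_burst = burst_start_ms <= t < burst_start_ms + burst_len_ms
        samples.append((t, "critical" if in_burst else "networking"))
    return samples


def share_of_samples(samples, window_start_ms, window_end_ms, target):
    """Fraction of samples in [window_start_ms, window_end_ms) whose label equals target."""
    in_window = [label for t, label in samples if window_start_ms <= t < window_end_ms]
    if not in_window:
        return 0.0
    return in_window.count(target) / len(in_window)


samples = synthetic_samples()
whole_segment = share_of_samples(samples, 0, 10_000, "critical")
burst_only = share_of_samples(samples, 4_000, 4_100, "critical")

print(f"10 s segment:  {whole_segment:.1%} of samples are critical")
print(f"100 ms window: {burst_only:.1%} of samples are critical")
```

With these assumed numbers the burst accounts for every sample in its own 100 ms window but only 1% of the aggregated 10 s segment, which is the effect described above.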

Also worth mentioning that we don't need to run it on every node. One validator and one collator per parachain would be fine.

We could try another tool in the same category of continuous profilers or try to fork their server.

alindima · 2023-07-28T14:08:46Z

After my latest experiences re-enabling pyroscope and pprof-rs, I can confirm that we need a different solution because:

pprof-rs has high overhead because its profiling is based on signal handlers (and pyroscope unregisters and re-registers the signal handlers every 10 seconds).
The libunwind mechanism is buggy and causes SIGABRTs: see "SIGABORT when profiling with pyroscope-rs", tikv/pprof-rs#219.
The pyroscope-rs repo does not seem to be actively maintained. I also discovered a bug there: "fix spurious exit when epoll_wait is interrupted by a signal", grafana/pyroscope-rs#125.
Even with frame pointer unwinding and the above fix, after a couple of hours of running the validator with pyroscope enabled, most of the node's tasks are killed, including the pyroscope agent (I assume there's a memory leak that triggers an OOM somewhere), which leaves the node in an inconsistent but still running state.

Ideally, we'd want to use perf on Linux to profile, since it's very mature and flexible. However, I don't know how feasible that is, considering that I couldn't find a continuous profiler with flamegraph visualisations that uses perf as a data source.

And there is also the storage problem. Profiling produces a ton of data. If the frequency is too low, the data is not representative. If the frequency is too high, it can overwhelm the server and become unmanageable.
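A rough back-of-the-envelope sketch of that trade-off follows. The thread count, average stack depth, and 8 bytes per frame address are assumptions picked for illustration, and the figure is for raw, unaggregated samples with every thread on-CPU, so it is an upper bound rather than what any particular profiler would actually store.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(interval_us, threads):
    """Raw sample count per day if every thread is sampled once per interval."""
    rate_hz = 1_000_000 // interval_us
    return rate_hz * threads * SECONDS_PER_DAY


def raw_bytes_per_day(interval_us, threads, avg_stack_depth, bytes_per_frame=8):
    """Upper-bound raw size: one address per frame, no aggregation or compression."""
    return samples_per_day(interval_us, threads) * avg_stack_depth * bytes_per_frame


# Assumed workload: 8 busy threads, average stack depth of 32 frames.
for interval_us in (10_000, 100):  # 100 Hz versus the 100us target from this issue
    size_gb = raw_bytes_per_day(interval_us, threads=8, avg_stack_depth=32) / 1e9
    print(f"interval {interval_us:>6} us: ~{size_gb:,.1f} GB/day raw")
```

Under these assumptions, moving from a 10 ms interval to the 100us target multiplies the raw volume by 100, from roughly 17.7 GB to roughly 1.77 TB per day per node, before any aggregation.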

petethepig · 2023-08-01T03:52:00Z

@alindima FWIW it's possible to send perf or eBPF profiling data to pyroscope in collapsed (or folded) format. We have documentation on how to upload that data to Pyroscope using an HTTP API here.
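For readers unfamiliar with the collapsed (folded) format: each line is a semicolon-separated stack, root frame first, followed by a space and a sample count. The sketch below shows one way to produce it from already-symbolized stacks. The frame names are hypothetical, and the upload step is left out because the exact HTTP parameters should be taken from the Pyroscope documentation rather than from this example.

```python
from collections import Counter


def fold_stacks(stacks):
    """Collapse sampled stacks into folded lines: 'root;...;leaf count'.

    `stacks` is an iterable of frame-name lists, root frame first.
    """
    counts = Counter(";".join(frames) for frames in stacks)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Hypothetical samples, e.g. as obtained after symbolizing perf output.
sampled = [
    ["main", "run_node", "approval_voting"],
    ["main", "run_node", "network_poll"],
    ["main", "run_node", "approval_voting"],
]

folded = fold_stacks(sampled)
print("\n".join(folded))
```

Identical stacks are merged and counted, so the output size depends on the number of distinct stacks rather than on the number of samples.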

alexggh · 2024-05-31T08:41:35Z

Superseded by using subsystem benchmarks with pyroscope.

ordian added the T4-parachains_engineering label Aug 16, 2022

korniltsev mentioned this issue Aug 1, 2023

pyroscope-rs feedback grafana/pyroscope-rs#127

Open

Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023

the-right-joyce added T8-parachains_engineering and removed T4-parachains_engineering labels Aug 25, 2023

the-right-joyce added this to parachains team board Oct 23, 2023

the-right-joyce moved this to Backlog in parachains team board Oct 23, 2023

the-right-joyce removed the T8-parachains_engineering label Oct 23, 2023

helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024

Add evm-chain-id pallet (paritytech#804)

cf0fae7

* Add evm-chain-id pallet Signed-off-by: koushiro <[email protected]> * Add some doc Signed-off-by: koushiro <[email protected]>

alexggh closed this as completed May 31, 2024

github-project-automation bot moved this from Backlog to Completed in parachains team board May 31, 2024
