
feat_: add ability to process prometheus metrics in telemetry client #5782

Draft
wants to merge 4 commits into base: develop from feat/telemetry-prometheus

Conversation

@vpavlin (Member) commented Aug 29, 2024

This PR adds a general ability for the telemetry client to work with Prometheus metrics exposed by status-go and its underlying libraries.

It then uses an extended waku2_envelopes_validated_total metric (the extension stores the pubsub topic and message type in labels, which allows us to filter for messages received via regular store queries - status-im/telemetry#23) to showcase how this new feature works: basically snapshotting and filtering the recorded values for that metric based on MessageType = missing and then adding them to the telemetry request cache (which is regularly published to the telemetry service).

This allows us to add new metrics via Prometheus, hence making them standardized and reusable, and also to tap into metrics provided by libraries (e.g. go-waku).

This is still a draft for the most part.
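
To make the intended flow concrete, here is a minimal standalone sketch of the snapshot-and-filter step against the default Prometheus registry. The helper name and the exact "missing" label value are illustrative assumptions, not the PR's code:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

// snapshotMissingMessages gathers the current state of all registered metrics
// and sums the waku2_envelopes_validated_total series whose "type" label marks
// store-query ("missing") messages.
func snapshotMissingMessages() (float64, error) {
	families, err := prometheus.DefaultGatherer.Gather()
	if err != nil {
		return 0, err
	}
	var total float64
	for _, mf := range families {
		if mf.GetName() != "waku2_envelopes_validated_total" {
			continue
		}
		for _, m := range mf.GetMetric() {
			for _, lp := range m.GetLabel() {
				if lp.GetName() == "type" && lp.GetValue() == "missing" {
					total += m.GetCounter().GetValue()
				}
			}
		}
	}
	return total, nil
}

func main() {
	total, err := snapshotMissingMessages()
	if err != nil {
		fmt.Println("gather failed:", err)
		return
	}
	// A real client would add this snapshot to the telemetry request cache.
	fmt.Println("validated missing-message envelopes:", total)
}
```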

@vpavlin force-pushed the feat/telemetry-prometheus branch from d10769e to e8f9704 on August 29, 2024 13:34
@status-im-auto (Member) commented Aug 29, 2024

Jenkins Builds

Older builds (13):

| | Commit | #️⃣ | Finished (UTC) | Duration | Platform | Result |
|---|---|---|---|---|---|---|
| ✔️ | e8f9704 | #1 | 2024-08-29 13:39:10 | ~4 min | ios | 📦zip |
| ✖️ | e8f9704 | #1 | 2024-08-29 13:41:29 | ~7 min | tests | 📄log |
| ✔️ | e8f9704 | #1 | 2024-08-29 13:41:57 | ~7 min | linux | 📦zip |
| ✔️ | e8f9704 | #2 | 2024-08-29 13:42:51 | ~3 min | ios | 📦zip |
| ✔️ | e8f9704 | #1 | 2024-08-29 13:43:04 | ~8 min | tests-rpc | 📄log |
| ✔️ | e8f9704 | #1 | 2024-08-29 13:43:37 | ~9 min | android | 📦aar |
| ✔️ | e8f9704 | #2 | 2024-08-29 13:44:10 | ~1 min | linux | 📦zip |
| ✔️ | e8f9704 | #2 | 2024-08-29 13:45:17 | ~1 min | android | 📦aar |
| ✔️ | acf238e | #2 | 2024-09-02 14:14:08 | ~2 min | tests-rpc | 📄log |
| ✖️ | acf238e | #2 | 2024-09-02 14:15:21 | ~3 min | tests | 📄log |
| ✔️ | acf238e | #3 | 2024-09-02 14:15:55 | ~4 min | linux | 📦zip |
| ✔️ | acf238e | #3 | 2024-09-02 14:17:04 | ~5 min | android | 📦aar |
| ✔️ | acf238e | #3 | 2024-09-02 14:17:12 | ~5 min | ios | 📦zip |

Newer builds:

| | Commit | #️⃣ | Finished (UTC) | Duration | Platform | Result |
|---|---|---|---|---|---|---|
| ✖️ | 4912bcb | #3 | 2024-09-02 16:12:18 | ~1 min | tests | 📄log |
| ✔️ | 4912bcb | #4 | 2024-09-02 16:13:24 | ~2 min | android | 📦aar |
| ✔️ | 4912bcb | #3 | 2024-09-02 16:13:25 | ~2 min | tests-rpc | 📄log |
| ✔️ | 4912bcb | #4 | 2024-09-02 16:14:15 | ~3 min | ios | 📦zip |
| ✔️ | 4912bcb | #4 | 2024-09-02 16:14:54 | ~3 min | linux | 📦zip |
| ✖️ | 722d47e | #4 | 2024-09-03 10:52:52 | ~1 min | tests | 📄log |
| ✔️ | 722d47e | #5 | 2024-09-03 10:53:41 | ~2 min | android | 📦aar |
| ✔️ | 722d47e | #4 | 2024-09-03 10:54:15 | ~2 min | tests-rpc | 📄log |
| ✔️ | 722d47e | #5 | 2024-09-03 10:55:41 | ~4 min | linux | 📦zip |
| ✔️ | 722d47e | #5 | 2024-09-03 10:57:56 | ~6 min | ios | 📦zip |

func (c *Client) ProcessReuglarStoryRetrievedMsgs(data MetricPayload) {
	fmt.Println(data)

	postBody := map[string]interface{}{
vpavlin (Member, Author) commented:

One thing to consider is to drop any processing on the client and just push the metric as structured by Prometheus to the telemetry server - then we can do whatever we want/need with it on the server. That simplifies the client quite a lot.

Contributor replied:

That would be great
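
To illustrate the suggestion above, here is a minimal sketch of pushing gathered metric families to the server unprocessed, using the Prometheus text exposition format; the helper name, package name, and endpoint handling are hypothetical, not part of this PR:

```go
package telemetry

import (
	"bytes"
	"fmt"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/common/expfmt"
)

// pushRawMetrics serializes every gathered metric family in the Prometheus
// text exposition format and POSTs the payload unmodified, leaving all
// filtering and aggregation to the telemetry server.
func pushRawMetrics(serverURL string) error {
	families, err := prometheus.DefaultGatherer.Gather()
	if err != nil {
		return err
	}
	var buf bytes.Buffer
	for _, mf := range families {
		if _, err := expfmt.MetricFamilyToText(&buf, mf); err != nil {
			return err
		}
	}
	resp, err := http.Post(serverURL, "text/plain", &buf)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("telemetry server returned %s", resp.Status)
	}
	return nil
}
```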

for {
	select {
	case <-ctx.Done():
		fmt.Println("exit")
Contributor commented:

should probably remove


@@ -413,3 +450,26 @@ func (c *Client) UpdateEnvelopeProcessingError(shhMessage *types.Message, proces
		c.logger.Error("Error sending envelope update to telemetry server", zap.Error(err))
	}
}

func (c *Client) ProcessReuglarStoryRetrievedMsgs(data MetricPayload) {
Contributor commented:
should this be ProcessRegularStoreRetrievedMsgs?


	w.wg.Add(1)
	go func() {
-		w.wg.Done()
+		defer w.wg.Done()
Contributor commented:

not sure how this wait group is intended to behave cc @richard-ramos

Member replied:

Adding the defer is correct. This is a bug I introduced when doing the refactoring.
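
For reference, the corrected pattern in isolation (a standalone sketch, not the PR's code):

```go
package main

import "sync"

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		// defer guarantees Done runs exactly once, however the goroutine exits.
		defer wg.Done()
		// ... publish telemetry ...
	}()
	wg.Wait()
}
```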

@vpavlin vpavlin marked this pull request as draft August 29, 2024 16:10
@@ -180,6 +193,8 @@ func (c *Client) Start(ctx context.Context) {
	c.telemetryCacheLock.Unlock()

	if len(telemetryRequests) > 0 {
		d, _ := json.MarshalIndent(telemetryRequests, "", " ")
		fmt.Println(string(d))
Member commented:

✂️

}

//client.promMetrics.Register("waku_connected_peers", GaugeType, nil, nil)
Member commented:

✂️

@@ -413,3 +450,26 @@ func (c *Client) UpdateEnvelopeProcessingError(shhMessage *types.Message, proces
		c.logger.Error("Error sending envelope update to telemetry server", zap.Error(err))
	}
}

func (c *Client) ProcessReuglarStoryRetrievedMsgs(data MetricPayload) {
	fmt.Println(data)
Member commented:

✂️

gatherer := prometheus.DefaultGatherer
metrics, err := gatherer.Gather()
if err != nil {
	log.Fatalf("Failed to gather metrics: %v", err)
@richard-ramos (Member) commented Aug 29, 2024:

Not sure if we should use Fatalf. If I remember correctly, this panics, and IMO not being able to push metrics isn't an end-of-the-world situation!
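
(Strictly speaking, log.Fatalf calls os.Exit(1) rather than panicking, but either way it terminates the process.) A minimal sketch of a non-fatal alternative, assuming a client holding a zap logger as elsewhere in this PR; the pared-down Client type and helper name are stand-ins, not the PR's code:

```go
package telemetry

import (
	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
	"go.uber.org/zap"
)

// Client is pared down to the single field this sketch needs.
type Client struct {
	logger *zap.Logger
}

// gatherOrSkip logs gather failures and returns nil so the caller can simply
// skip this push cycle instead of terminating the whole process.
func (c *Client) gatherOrSkip() []*dto.MetricFamily {
	families, err := prometheus.DefaultGatherer.Gather()
	if err != nil {
		c.logger.Error("failed to gather metrics", zap.Error(err))
		return nil
	}
	return families
}
```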

github-actions bot commented Sep 2, 2024

Thank you for opening this pull request!

We require commits to follow the Conventional Commits, but with _ for non-breaking changes.
And it looks like your PR needs to be adjusted.

Details:

checking commits between: origin/develop HEAD
Commit message "move metric processing to server" is not well-formed
Commit message "test commit" is not well-formed
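
For example, following the convention used in this PR's own title, the first offending commit could be reworded to something like `feat_: move metric processing to server`.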

@@ -52,7 +52,7 @@ var (

func init() {
	prom.MustRegister(EnvelopesReceivedCounter)
	prom.MustRegister(EnvelopesRejectedCounter)
Contributor commented:

Was this metric supposed to be removed?

@@ -1347,7 +1351,7 @@ func (w *Waku) OnNewEnvelopes(envelope *protocol.Envelope, msgType common.Messag
		trouble = true
	}

-	common.EnvelopesValidatedCounter.Inc()
+	common.EnvelopesValidatedCounter.With(prometheus.Labels{"pubsubTopic": envelope.PubsubTopic(), "type": msgType}).Inc()
Contributor commented:
Note that we need the meaning of common.MissingMessageType to be consistent moving forward, otherwise it can mess up our metrics; i.e., only messages returned by the periodic store query can have this type.
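
For reference, the label-extended counter presumably looks something like the following CounterVec (the metric name and label keys are taken from the diff above; the package name, help text, and registration style are assumptions):

```go
package common

import "github.com/prometheus/client_golang/prometheus"

// EnvelopesValidatedCounter is keyed by pubsub topic and message type, so
// store-query ("missing") traffic stays a distinct, filterable series.
var EnvelopesValidatedCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "waku2_envelopes_validated_total",
		Help: "Number of envelopes validated, by pubsub topic and message type",
	},
	[]string{"pubsubTopic", "type"},
)

func init() {
	prometheus.MustRegister(EnvelopesValidatedCounter)
}
```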
