We currently treat an instance as active if it is ingesting data. So if it receives traffic it will transmit transactions/spans/errors, and/or if metrics are enabled it will periodically transmit system metrics (CPU, memory).
However, it is theoretically possible for an instance to be active while receiving no traffic and having metrics disabled. In this case it will be treated as not active (terminated).
How would this be different to having cpu/memory metrics enabled? Are you suggesting that these metrics could not be disabled? Then what is the difference to not allowing cpu/memory metrics to be disabled?
Related to this, I think the agents should periodically send statistics to the server for monitoring the agents themselves, e.g. how many transactions/spans/errors have been sent, how many requests were sent to the server, and how many of those requests failed (e.g. due to a 503). We could use those as a sort of heartbeat.
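To make the idea concrete, here is a minimal sketch of what such a self-monitoring document could look like. Everything in it is an assumption for illustration only: the `AgentStats` class, the `build_heartbeat` function, and all field names are hypothetical and do not reflect any existing agent or server schema.

```python
import time
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AgentStats:
    """Hypothetical counters an agent keeps about its own activity."""
    transactions_sent: int = 0
    spans_sent: int = 0
    errors_sent: int = 0
    requests_sent: int = 0
    requests_failed: int = 0  # e.g. responses such as HTTP 503

    def record_request(self, status_code: int) -> None:
        # Count every request to the server; count it as failed on a 4xx/5xx status.
        self.requests_sent += 1
        if status_code >= 400:
            self.requests_failed += 1


def build_heartbeat(stats: AgentStats, instance_id: str,
                    now: Optional[float] = None) -> dict:
    """Build a heartbeat document (hypothetical shape) carrying the agent's counters."""
    return {
        "type": "heartbeat",
        "instance_id": instance_id,
        "timestamp": time.time() if now is None else now,
        "agent_stats": asdict(stats),
    }
```

An agent would call `build_heartbeat` on a fixed interval regardless of traffic, so the document doubles as a liveness signal even when all counters are zero.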
How would this be different to having cpu/memory metrics enabled? Are you suggesting that these metrics could not be disabled?
Yes, that's what I had in mind. If CPU/memory reporting has an overhead, it makes sense to allow disabling it, whereas the overhead of a heartbeat should be smaller (in theory at least).
I think the agents should periodically send statistics to the server for monitoring the agents themselves, e.g. how many transactions/spans/errors have been sent
Yes, this also came up at the UI weekly. @dgieselaar had some thoughts in the same vein.
mikker changed the title from "Proposal: Introducing heatbeat metric document" to "Proposal: Introducing heartbeat metric document" on Feb 5, 2021
We currently treat an instance as active if it is ingesting data. So if it receives traffic it will transmit transactions/spans/errors, and/or if metrics are enabled it will periodically transmit system metrics (CPU, memory).
However, it is theoretically possible for an instance to be active while receiving no traffic and having metrics disabled. In this case it will be treated as not active (terminated).
This has implications for how throughput for the instance is calculated.
Question
Would it make sense to have a "heartbeat" document that is broadcast periodically (Addendum: and that cannot be disabled)?
cc @felixbarny @axw
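As a rough illustration of the problem and the proposed fix, the sketch below classifies an instance as active if any document from it was seen recently. The function name `is_active`, the `max_silence_s` threshold, and the timestamps are all hypothetical; this is not how the server or UI currently decides, only a way to show why an always-on heartbeat closes the gap.

```python
from typing import Iterable


def is_active(doc_timestamps: Iterable[float], now: float,
              max_silence_s: float = 60.0) -> bool:
    """Treat an instance as active if any document arrived within max_silence_s.

    doc_timestamps holds the arrival times (in seconds) of every document the
    instance sent: transactions, spans, errors, metrics, or heartbeats.
    """
    latest = max(doc_timestamps, default=None)
    return latest is not None and now - latest <= max_silence_s


# Idle instance with metrics disabled: it sends nothing, so it looks terminated.
idle_without_heartbeat = is_active([], now=1000.0)

# Same instance with a heartbeat every 30 seconds: it stays active.
idle_with_heartbeat = is_active([940.0, 970.0, 1000.0], now=1000.0)
```

With a heartbeat that cannot be disabled, the list of timestamps is never empty for a running instance, so the idle case no longer collapses into "terminated".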