Add number shards by node #246

Rajaeelfarsi · 2024-01-05T10:12:34Z

Description

Following the issue opened by arob1n, I made some modifications to the source code to add a metric that gives the number of shards per node. I've named the metric nodes_shards_number.

All my commits include DCO.
DCO stands for Developer Certificate of Origin and it is your declaration that your contribution is correctly attributed and licensed. Please read more about how to attach DCO to your commits here (spoiler alert: in most cases it is as simple as using the -s option when doing git commit).
Please be aware that commits without DCO will cause the PR CI workflow to fail and cannot be merged.

Signed-off-by: Rajaeelfarsi <[email protected]>

lukas-vlcek

Hi @Rajaeelfarsi

First of all thanks a lot for your PR. (And my apologies that I did not get to it earlier... 🙏 )

Looking at the proposed implementation, is there actually any reason why we need to exclude UNASSIGNED shards and then sum all the rest under a single category? Did you consider a more general approach and simply report counts for all shard states, similar to how it is done at the cluster level?

It is always possible to create your own summary metrics using Prometheus recording rules, so this way you can provide an accumulated summary for all non-unassigned shards. (And if you are using jsonnet then you can use it to include your recording rules in a mixin such as the one found in the mixin branch of this repository. Or you can use a different mechanism, but the point is that it is enough if the plugin/exporter provides the raw metrics; you can then build custom metrics on top of them directly in Prometheus.)
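
To illustrate the aggregation described above, here is a minimal Python sketch of what such a summary would compute: given raw per-node, per-state shard counts, it sums every state except UNASSIGNED into one total per node. The node names, the sample numbers, and the function name `assigned_shards_per_node` are all made up for illustration; in practice this logic would live in a Prometheus recording rule rather than in Python.

```python
from collections import defaultdict

# Hypothetical raw samples: (node, state) -> shard count, in the shape an
# exporter might report them. Node names and values are invented.
raw_counts = {
    ("node-1", "STARTED"): 12,
    ("node-1", "RELOCATING"): 1,
    ("node-1", "INITIALIZING"): 0,
    ("node-2", "STARTED"): 10,
    ("node-2", "INITIALIZING"): 2,
}


def assigned_shards_per_node(counts):
    """Sum every state except UNASSIGNED into a single total per node."""
    totals = defaultdict(int)
    for (node, state), value in counts.items():
        if state != "UNASSIGNED":
            totals[node] += value
    return dict(totals)


if __name__ == "__main__":
    print(assigned_shards_per_node(raw_counts))
```

Because the summary is derived from the raw per-state counts, the exporter does not need to decide up front which states belong together.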

WDYT?

I am happy to help with that. Let me know if you want to give it a try, or I can jump on it and take your commit into a new PR.

Regards,
Lukáš

Rajaeelfarsi · 2024-02-09T13:10:29Z

Hi @lukas-vlcek,

Thank you for your response. I really appreciate your feedback and guidance.

I wanted to mention that I am relatively new to OpenSearch and development in general, so I may not have a deep understanding of all the intricacies yet. However, I am eager to learn and improve.

Regarding the corrections I plan to make, I based my implementation on the API GET /_cat/shards, which provides the different states (types) of a shard:

(Default) State of the shard. Returned values are:

**INITIALIZING**: The shard is recovering from a peer shard or gateway.
**RELOCATING**: The shard is relocating.
**STARTED**: The shard has started.
**UNASSIGNED**: The shard is not assigned to any node.

I will follow your example for the cluster and implement the same approach for each node.
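
As a rough sketch of the per-node, per-state approach, the Python below counts shards from rows shaped like the JSON output of GET /_cat/shards?format=json, using only the `node` and `state` columns. The sample rows are invented, the function name is hypothetical, and the `"(none)"` placeholder used for rows without a node is an assumption made for this example, not something the plugin does.

```python
from collections import Counter

# Hypothetical rows in the shape of `GET /_cat/shards?format=json`.
# Only the "node" and "state" columns are used; all values are invented.
sample_rows = [
    {"index": "logs", "shard": "0", "state": "STARTED", "node": "node-1"},
    {"index": "logs", "shard": "1", "state": "STARTED", "node": "node-2"},
    {"index": "logs", "shard": "2", "state": "INITIALIZING", "node": "node-2"},
    {"index": "logs", "shard": "3", "state": "UNASSIGNED", "node": None},
]


def count_shards_by_node_and_state(rows):
    """Return a Counter keyed by (node, state)."""
    counts = Counter()
    for row in rows:
        # Rows without a node are grouped under a placeholder label
        # (an assumption made for this sketch).
        node = row.get("node") or "(none)"
        counts[(node, row["state"])] += 1
    return counts


if __name__ == "__main__":
    for (node, state), n in sorted(count_shards_by_node_and_state(sample_rows).items()):
        print(node, state, n)
```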

However, I have a question regarding the inclusion of the "unassigned" state. As far as I understand, a shard in the "unassigned" state is not assigned to any node, unlike the cluster level where there can be unassigned shards. Therefore, I am unsure about the need to include the "unassigned" state for each node, as nodes do not typically contain shards with the "unassigned" state.

If you could provide further clarification on why I should include the "unassigned" state for each node, I would greatly appreciate it.

Thank you once again for your support.

Best regards, Rajae

Baarsgaard · 2024-04-18T05:51:12Z

Would you consider also exposing the currently configured cluster.max_shards_per_node?
This would allow for a dead simple alert when you're nearing the limit and not require an update to your monitoring if you change the max_shards_per_node.
Other than parsing the metric when scraping, it should be almost free to store in any vector database as it's essentially a static number.
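
A small Python sketch of the check such an alert could perform is shown below. It assumes, as I understand the setting, that the limit is enforced cluster-wide as cluster.max_shards_per_node multiplied by the number of data nodes; the function names and the 0.9 threshold are hypothetical choices for illustration.

```python
def shard_limit_usage(total_shards, max_shards_per_node, data_nodes):
    """Fraction of the cluster-wide shard limit currently in use.

    Assumes the limit is max_shards_per_node * data_nodes.
    """
    if max_shards_per_node <= 0 or data_nodes <= 0:
        raise ValueError("max_shards_per_node and data_nodes must be positive")
    return total_shards / (max_shards_per_node * data_nodes)


def near_shard_limit(total_shards, max_shards_per_node, data_nodes, threshold=0.9):
    """True when usage reaches the (hypothetical) alert threshold."""
    return shard_limit_usage(total_shards, max_shards_per_node, data_nodes) >= threshold


if __name__ == "__main__":
    # Invented numbers: 1500 shards on 2 data nodes with a limit of 1000 each.
    print(shard_limit_usage(1500, 1000, 2))
    print(near_shard_limit(1500, 1000, 2))
```

With the limit exposed as a metric, the threshold expression stays valid when the setting changes, which is the point made above.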

lukas-vlcek · 2024-04-18T06:12:28Z

@Baarsgaard Makes sense!

Baarsgaard · 2024-04-22T05:33:39Z

> nodes do not typically contain shards with the "unassigned" state.

This is the exact reason I would include unassigned shards as well, simply because it's atypical and I would therefore like it exposed.

smbambling · 2024-05-10T13:12:49Z

Any status on this? I would love to be able to have per-node shard monitoring / tracking as I move away from Elasticsearch.

lukas-vlcek · 2024-05-10T13:42:31Z

OK, let's make some progress with this over the next week. 💪

smbambling · 2024-06-25T11:21:49Z

Any movement on this? This is a key monitoring feature we are currently missing.

smbambling · 2024-08-06T11:50:22Z

I am unfortunately unable to assist with Java coding, and it looks like @Rajaeelfarsi might be a little busy at the moment and unable to complete this PR. Is anyone able to pick up where he left off? This feature will be key to some more robust monitoring.

Rajaeelfarsi and others added 3 commits January 4, 2024 17:09

add shards number by node

1f6ad45

Signed-off-by: Rajaeelfarsi <[email protected]>

add unit test

f1525d6

Signed-off-by: Rajaeelfarsi <[email protected]>

Merge branch 'Aiven-Open:main' into add_number_shards_by_node

6423ef4

Rajaeelfarsi requested a review from lukas-vlcek as a code owner January 5, 2024 10:12

lukas-vlcek reviewed Feb 1, 2024

lukas-vlcek mentioned this pull request Feb 1, 2024

Add number of shards by node #189

Open
