Gaps when displaying metrics in Grafana dashboards (not related to retention_period) #8664
mglaub asked this question in Help and support (unanswered, 0 replies)
I have 6 OpenShift clusters all sending metrics to the same Mimir via remote write. When I look at my Grafana dashboards I see gaps in the data.
This happens with every dashboard and every cluster.
Example:
- Every cluster has 2 Prometheus replicas that both send metrics to Mimir.
- I gave each cluster its own tenant ID / remote-write user with X-Scope-OrgID "c50-inte", "c50-preprod", etc. (roughly as sketched below).
- I can see that blocks are regularly pushed to the ingester PVCs with the correct tenant names.
- As suggested in #4696 I changed blocks_storage.tsdb.retention_period to > 12h, but this did not change anything in my case.
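To make the setup concrete, the remote write on each cluster looks roughly like the sketch below. This is a simplified illustration, not a copy of my real config: the URL, credentials and paths are placeholders, and on OpenShift the equivalent settings actually live in the cluster-monitoring-config remoteWrite section.

```yaml
# Simplified sketch of the per-cluster remote write (placeholders, not my real values).
remote_write:
  - url: https://mimir.example.com/api/v1/push   # placeholder Mimir endpoint
    headers:
      X-Scope-OrgID: c50-preprod                 # one tenant ID per cluster: c50-inte, c50-preprod, ...
    basic_auth:
      username: c50-preprod                      # the per-cluster "remote write user"
      password_file: /etc/prometheus/secrets/mimir-password  # placeholder path
```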
I can't see any pattern in the gaps; they seem very random. For example, the last 12 hours of every cluster:
(per-cluster dashboard screenshots showing the gaps over the last 12 hours were attached here)
In the Prometheus as well as the ingester logs I see a ton of "400 err-mimir-sample-duplicate-timestamp" errors, but AFAIK this wouldn't cause those gaps?
Example:
ts=2024-07-09T13:51:37.860893841Z caller=grpc_logging.go:43 level=warn method=/cortex.Ingester/Push duration=8.448342ms err="rpc error: code = Code(400) desc = user=c50-preprod: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2024-07-09T12:37:14.101Z and is from series {__name__=\"pod:container_cpu_usage:sum\", cluster=\"c50-preprod\", namespace=\"trident\", pod=\"trident-node-linux-nsbbj\", prometheus=\"openshift-monitoring/k8s\"}" msg=gRPC
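My assumption is that these duplicates come from the two Prometheus replicas of each cluster pushing the same series. Could missing HA deduplication be related to the gaps, or is it just harmless noise? As far as I understand the docs, enabling it on the Mimir side would look roughly like the sketch below; this is not something I currently have configured, and the option names are from memory, so please correct me if it is off.

```yaml
# Sketch only - NOT part of my current config; my reading of Mimir's HA-deduplication setup.
limits:
  accept_ha_samples: true       # let the distributor deduplicate HA Prometheus pairs
distributor:
  ha_tracker:
    enable_ha_tracker: true
    kvstore:
      store: etcd               # as I understand it, the HA tracker needs consul or etcd
```

If I read the docs right, the two replicas also need cluster/replica external labels that the HA tracker can match on; is that something the OpenShift monitoring stack already provides, or do I have to add it myself?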
I don't see any other relevant errors in Mimir's components.
This is my Mimir config:
mimir-values.yaml.txt
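For quick reference, the change from #4696 looks roughly like this in the values file (illustrative excerpt only; the exact value and the full configuration are in the attachment, and the structuredConfig nesting assumes the mimir-distributed Helm chart):

```yaml
# Illustrative excerpt - see the attached mimir-values.yaml.txt for the real file.
mimir:
  structuredConfig:
    blocks_storage:
      tsdb:
        retention_period: 13h   # set to a value > 12h as suggested in #4696
```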
I would really appreciate some help and ideas on this topic.
Please let me know if you need any more information.