
Opencost grafana dashboard partly not working #2

Open
andriktr opened this issue Oct 20, 2023 · 48 comments

Comments

@andriktr

Hi,
I'm trying to use the following Grafana dashboard to view OpenCost data: https://github.com/opencost/opencost-helm-chart/blob/main/examples/dashboard/kube-prometheus-stack-opencost-dashboard.json. In my case it only partly works, and some panels throw the following errors:
[screenshots of the failing panels showing the errors]

Any thoughts?

Thanks in advance.

@dwbrown2

@mattray any chance you can take a look?

@mattray mattray transferred this issue from opencost/opencost Oct 24, 2023
@mattray
Contributor

mattray commented Oct 24, 2023

@andriktr I'll try to recreate and see what I find. I'm moving it over to the opencost-helm-chart repository since that's where it originated.

@andriktr
Author

@mattray Thanks a lot.

@sossickd

I am also seeing the dashboard only partly working. If it helps, the partial failure is on an AWS EKS cluster; I am not experiencing the issue on an Azure AKS cluster.

@sossickd

sossickd commented Nov 7, 2023

Scratch my last comment: the dashboard is now only partially working on an AKS cluster where it was previously working.

@sossickd

sossickd commented Nov 14, 2023

Hi @mattray, by the looks of things the project is really active, so I know how busy you must be, but have you been able to replicate this issue? The dashboard was super useful when it was working, so any progress on this would be appreciated.

@mattray
Contributor

mattray commented Nov 17, 2023

Sorry, I've been swamped on other projects. A couple of folks brought it up at KubeCon that they'd be interested in working on it, but I haven't heard from anyone else yet. I'm not a Grafana expert by any means, so if someone's interested and wants to take this feel free. I'll try to circle back to this soon.

@sossickd

sossickd commented Dec 8, 2023

@dwbrown2 did you resolve a similar issue here?

kubecost/cost-analyzer-helm-chart#303

Could the same logic be applied?

@dwbrown2

@sossickd I'd have to dig in to say for sure. I'm unfortunately tied up on other projects right now and would love extra help if others are able to review. Will do my best to circle back when free.

@sossickd

@dwbrown2, @mattray OK, found the issue: it was caused by running the opencost deployment with more than one replica.

Created a PR.

opencost/opencost-helm-chart#157

This adds a variable to filter on pod; I amended each panel that had the many-to-many error to filter on the pod label.

Not too sure if this is the best solution but fixed it in my case.
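The idea behind that PR can be sketched as constraining the right-hand side of the join to a single opencost pod via a dashboard variable (names here are illustrative; see the PR for the actual change):

```promql
# Illustrative: restrict the cost metric to one opencost pod via a
# dashboard variable (here called $pod), so the on(instance) join
# has a unique right-hand side.
node_ram_hourly_cost{pod=~"$pod"}
```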

@andriktr
Author

Hmm... that sounds strange, as I'm running opencost with a single replica and still have the same issues.

@sossickd

@andriktr can you copy and paste one of the errors from one of the broken panels into a code snippet, so I can see if it's the same issue I was experiencing?

@andriktr
Author

The errors I got are in the very first post of this issue.

@sossickd

@andriktr would you mind pasting it into a code snippet so I can copy it more easily?

@andriktr
Author

Sure:

Here is the error output for the Top 20 by Namespace panel:

Status: 500. Message: execution: found duplicate series for the match group {instance="10.162.208.9:9003"} on the right hand-side of the operation: [{arch="amd64", container="opencost", endpoint="http", exported_instance="aks-default-29533205-vmss00000k", instance="10.162.208.9:9003", instance_type="Standard_D4s_v3", job="opencost", namespace="opencost", node="aks-default-29533205-vmss00000k", pod="opencost-858f6d4597-rr644", provider_id="azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/20", region="westeurope", service="opencost"}, {arch="amd64", container="opencost", endpoint="http", exported_instance="aks-default-29533205-vmss00000i", instance="10.162.208.9:9003", instance_type="Standard_D4s_v3", job="opencost", namespace="opencost", node="aks-default-29533205-vmss00000i", pod="opencost-858f6d4597-rr644", provider_id="azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/18", region="westeurope", service="opencost"}];many-to-many matching not allowed: matching labels must be unique on one side

@sossickd

Can you open up Prometheus and enter this query:

topk( 30, 
  sum(sum(container_memory_allocation_bytes{namespace=~"$namespace"}) by (container,instance,pod) * on(instance) group_left() (
				node_ram_hourly_cost{pod=~".*opencost.*",pod=~"opencost-858f6d4597-rr644"} / 1024 / 1024 / 1024
				+ on(node,instance_type,pod) group_left()
					label_replace
					(
						kube_node_labels{}, "instance_type", "$1", "label_node_kubernetes_io_instance_type", "(.*)"
					) * 0
			)
  + 
  sum(container_cpu_allocation{namespace=~"$namespace"}) by (container,instance,pod) * on(instance) group_left() (
	  			node_cpu_hourly_cost{pod=~".*opencost.*",pod=~"opencost-858f6d4597-rr644"} + on(node,instance_type,pod) group_left()
		  			label_replace
		  			(
		  				kube_node_labels{}, "instance_type", "$1", "label_node_kubernetes_io_instance_type", "(.*)"
		  			) * 0
		  	)) by (container)
)

Do you get the many-to-many matching not allowed: matching labels must be unique on one side error?

@andriktr
Author

Actually, I receive no data for this particular query:
[screenshot: empty query result]

@sossickd

sossickd commented Dec 12, 2023

OK, looking at the error a bit further, it looks like your issue may be slightly different from mine.

[{arch="amd64", 
container="opencost", 
endpoint="http", 
exported_instance="aks-default-29533205-vmss00000k", 
instance="10.162.208.9:9003", 
instance_type="Standard_D4s_v3", 
job="opencost", 
namespace="opencost", 
node="aks-default-29533205-vmss00000k", 
pod="opencost-858f6d4597-rr644", 
provider_id="azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/20", 
region="westeurope", 
service="opencost"}, 

{arch="amd64", 
container="opencost", 
endpoint="http", 
exported_instance="aks-default-29533205-vmss00000i",
instance="10.162.208.9:9003", 
instance_type="Standard_D4s_v3", 
job="opencost", 
namespace="opencost", 
node="aks-default-29533205-vmss00000i", 
pod="opencost-858f6d4597-rr644", 
provider_id="azure:///subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/18", 
region="westeurope", 
service="opencost"}]

The query is returning two matches from what I can see. The instance label looks like it might be being renamed to exported_instance; that's not happening for me. What doesn't look right to me is that the instance IP address is the same in both outputs.

Has the node recently been destroyed?
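To check whether the cost metric itself carries duplicated series per instance, a query along these lines (illustrative, not from the dashboard) can help:

```promql
# More than one series sharing the same instance label breaks
# "on(instance) group_left" joins with many-to-many errors.
count by (instance) (node_ram_hourly_cost) > 1
```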

@sossickd

sossickd commented Dec 12, 2023

@andriktr can you run the following query in Prometheus and paste the result in a code snippet:

node_ram_hourly_cost

Also, can you show me the output of kubectl get nodes -o wide on the AKS cluster?

@andriktr
Author

Hi,
The node name for opencost can change when the opencost pod is rescheduled to another node for one reason or another, e.g. cluster maintenance, workload rebalancing with descheduler, cluster upgrades, autoscaling, etc.
[screenshot: kubectl get nodes -o wide output]

And here is the error from same cluster

Status: 500. Message: execution: found duplicate series for the match group {instance="10.162.208.9:9003"} on the right hand-side of the operation: [{arch="amd64", container="opencost", endpoint="http", exported_instance="aks-default-29533205-vmss00000k", instance="10.162.208.9:9003", instance_type="Standard_D4s_v3", job="opencost", namespace="opencost", node="aks-default-29533205-vmss00000k", pod="opencost-858f6d4597-rr644", provider_id="azure:///subscriptions/000000-0000-0000-0000-00000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/20", region="westeurope", service="opencost"}, {arch="amd64", container="opencost", endpoint="http", exported_instance="aks-default-29533205-vmss00000i", instance="10.162.208.9:9003", instance_type="Standard_D4s_v3", job="opencost", namespace="opencost", node="aks-default-29533205-vmss00000i", pod="opencost-858f6d4597-rr644", provider_id="azure:///subscriptions/000000-0000-0000-0000-00000000000/resourceGroups/aks-nodes-west-dev/providers/Microsoft.Compute/virtualMachineScaleSets/aks-default-29533205-vmss/virtualMachines/18", region="westeurope", service="opencost"}];many-to-many matching not allowed: matching labels must be unique on one side

The mentioned query returns the following:
[screenshot: node_ram_hourly_cost query results]

@sossickd

sossickd commented Dec 13, 2023

@andriktr OK, this is different from what I am seeing. In my case the instance value matches the node value.

[screenshot: query result where instance matches node]

Are you using any relabelings of metrics? Can you determine what 10.162.208.9:9003 refers to? From what I can see, 10.162.208.9 isn't a node's IP address; is it the IP address of the opencost pod?

Also what version of opencost and helm chart are you using?

@andriktr
Author

Yes, 10.162.208.9:9003 is the opencost pod IP. The version is 1.107.0, but I saw the same in earlier versions as well.

@sossickd

sossickd commented Dec 13, 2023

@andriktr are you doing any relabelings of metrics? I'm trying to get my head around why you are getting an exported_instance label and why the instance label is being transformed to 10.162.208.9:9003.

If you are using Helm to deploy, could you share your values?

It would also be useful to share your kube-prometheus-stack Helm values.

@andriktr
Author

Hey, we do not do any relabelings of metrics. I have my own Helm chart, but in general it is more or less the same as the official one, plus some addons for AAD Pod Identity:

serviceAccount:
  create: true
  annotations: {}
  # eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/eksctl-opencost
  automountServiceAccountToken: true

annotations: {}
  #azure.workload.identity/inject-proxy-sidecar: "true"

service:
  annotations: {}
  labels: {}
  type: ClusterIP

opencost:
  exporter:
    # The GCP Pricing API requires a key. This is supplied just for evaluation.
    # cloudProviderApiKey: 'asdfasdfasdf'
    # Default cluster ID to use if cluster_id is not set in Prometheus metrics.
    defaultClusterId: "aks-experimental"
    image:
      registry: redacted
      repository: kubecost-cost-model
      tag: prod-1.107.0
    resources:
      requests:
        cpu: '10m'
        memory: '55M'
      limits:
        cpu: '999m'
        memory: '1G'
    extraEnv:
      {}
      # FOO: BAR

  metrics:
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: 'kube-prometheus-stack'
      ## The label to use to retrieve the job name from.
      ## jobLabel: "app.kubernetes.io/name"
      namespace: 'kube-prometheus-stack'
      namespaceSelector: {}
      ## Default: scrape .Release.Namespace only
      ## To scrape all, use the following:
      ## namespaceSelector:
      ##   any: true
      scrapeInterval: 30s
      # honorLabels: true
      targetLabels: []
      relabelings: []
      metricRelabelings: []

  prometheus:
    # username:
    # password:
    external:
      enabled: false
      url: 'https://mimir-dev-push.infra.alto.com/prometheus'
    internal:
      enabled: true
      serviceName: kube-prometheus-stack-prometheus
      namespaceName: kube-prometheus-stack
      port: 9090

  ui:
    enabled: true
    image:
      registry: redacted
      repository: opencost-ui
      tag: prod-1.107.0
    resources:
      requests:
        cpu: '10m'
        memory: '55M'
      limits:
        cpu: '999m'
        memory: '1G'

  tolerations: []

# Baltic IF Custom Values
customAzureConfig: 
  enabled: true
  azureTenantId: "redacted"
  azureSubscriptionId: "redacted"

azurePodIdentity:
  enabled: true
  azureIdentity:
    name: opencost-identity
    resourceID: "redacted"
    clientID: "redacted"
  azureIdentityBinding:
    name: opencost-identity-binding
    selector: opencost-identity

azureWorkloadIdentity:
  enabled: false
  clientID: ""

ingress:
  enabled: true
  annotations: {}
  labels: {}
  ingress-class: internal-nginx
  hosts:
    - host: opencost-experimental.eu
      paths:
      - path: /
        pathType: ImplementationSpecific
        serviceName: opencost
        servicePort: 9090
  tls:
  - hosts:
    - opencost-experimental.eu
    secretName: ""

@Dkaykay

Dkaykay commented Feb 13, 2024

Hey,
are there any updates on this issue? I am also seeing the "duplicate" issue in e.g. Top 20 by Namespace:

Status: 500. Message: execution: found duplicate series for the match group {instance_type="c5n.large", node="ip-10-250-0-163.eu-central-1.compute.internal"} on the right hand-side of the operation: [{cluster="cluster01-dmi", clusterType="dmi", instance="kube-state-metrics.monitoring.svc:8080", instance_type="c5n.large", job="kube-state-metrics", k8sType="aws", label_beta_kubernetes_io_arch="amd64", label_beta_kubernetes_io_instance_type="c5n.large", label_beta_kubernetes_io_os="linux", label_cluster_update_csi_sap_com_anytime="1", label_failure_domain_beta_kubernetes_io_region="eu-central-1", label_failure_domain_beta_kubernetes_io_zone="eu-central-1c", label_hana_cloud_workload_class_edge="1", label_kubernetes_io_arch="amd64", label_kubernetes_io_hostname="ip-10-250-0-163.eu-central-1.compute.internal", label_kubernetes_io_os="linux", label_networking_gardener_cloud_node_local_dns_enabled="false", label_node_kubernetes_io_instance_type="c5n.large", label_node_kubernetes_io_role="node", label_topology_ebs_csi_aws_com_zone="eu-central-1c", label_topology_kubernetes_io_region="eu-central-1", label_topology_kubernetes_io_zone="eu-central-1c", label_worker_garden_sapcloud_io_group="edge", label_worker_gardener_cloud_cri_name="containerd", label_worker_gardener_cloud_kubernetes_version="1.26.11", label_worker_gardener_cloud_pool="edge", landscape="cluster01", node="ip-10-250-0-163.eu-central-1.compute.internal", project="hc-dev", prometheus="monitoring/prometheus", region="eu-central-1"}, {cluster="cluster01-dmi", clusterType="dmi", container="opencost", endpoint="http", instance="100.96.10.107:9003", instance_type="c5n.large", job="opencost", k8sType="aws", label_beta_kubernetes_io_arch="amd64", label_beta_kubernetes_io_instance_type="c5n.large", label_beta_kubernetes_io_os="linux", label_cluster_update_csi_sap_com_anytime="1", label_failure_domain_beta_kubernetes_io_region="eu-central-1", label_failure_domain_beta_kubernetes_io_zone="eu-central-1c", 
label_hana_cloud_workload_class_edge="1", label_kubernetes_io_arch="amd64", label_kubernetes_io_hostname="ip-10-250-0-163.eu-central-1.compute.internal", label_kubernetes_io_os="linux", label_networking_gardener_cloud_node_local_dns_enabled="false", label_node_kubernetes_io_instance_type="c5n.large", label_node_kubernetes_io_role="node", label_topology_ebs_csi_aws_com_zone="eu-central-1c", label_topology_kubernetes_io_region="eu-central-1", label_topology_kubernetes_io_zone="eu-central-1c", label_worker_garden_sapcloud_io_group="edge", label_worker_gardener_cloud_cri_name="containerd", label_worker_gardener_cloud_kubernetes_version="1.26.11", label_worker_gardener_cloud_pool="edge", landscape="cluster01", namespace="opencost", node="ip-10-250-0-163.eu-central-1.compute.internal", pod="opencost-5f7549c7f8-8kxtf", project="hc-dev", prometheus="monitoring/prometheus", region="eu-central-1", service="opencost"}];many-to-many matching not allowed: matching labels must be unique on one side

I tested with 2 and with 1 replicas. Same result.

Any hints on how to solve / work around?

Thanks!

@andriktr
Author

Most probably the main reason here is that opencost duplicates the kube-state-metrics metrics (it uses the same names) for its own metrics.

[screenshot: duplicated metric names in the metric browser]

To check, you can simply search for the kube_node_info metric in the Grafana explorer; you will probably see it doubled, with additional instances related to opencost.
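One way to confirm the duplication (an illustrative query) is to count which scrape jobs expose the same KSM metric name:

```promql
# If both kube-state-metrics and opencost emit kube_node_info,
# this returns a separate count per scrape job.
count by (job) (kube_node_info)
```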

P.S. I have tried to adjust the settings mentioned in https://www.opencost.io/docs/installation/helm#example-configuration

...
opencost:
  exporter:
    extraEnv:
      EMIT_KSM_V1_METRICS: "false"
      EMIT_KSM_V1_METRICS_ONLY: "true"

however, for some reason this did not work, so I ended up uninstalling opencost and switching to the aks-cost-analysis addon, which is actually also based on opencost :)

@mattray mattray transferred this issue from opencost/opencost-helm-chart Apr 1, 2024
@dlahn

dlahn commented Apr 18, 2024

Does anyone have any update here? We are facing a similar issue.

@asdfgugus
Contributor

Does anyone have any update here? We are facing a similar issue.

I was able to fix the issue with these settings:

opencost:
  metrics:
    kubeStateMetrics:
      emitKsmV1Metrics: false
      emitKsmV1MetricsOnly: true

I deployed the changes and waited an hour. Afterwards, I changed the time range on the dashboard to 15 min to see if the changes worked. Note that the dashboard uses a large fixed time range for some visualizations; those take some time until they show the correct data, but some of them should already work.

@AjayTripathy

Is there a clear description of which dashboards don't work somewhere? Would love any community support on this.

@asdfgugus
Contributor

@dlahn there are two different issues here:

  • duplicate metric series
  • labels with prefix "exported_"

For the first issue, I mentioned the fix above.
For the second issue, you need to check your scrape job. In my case, I use vmagent and needed to honor the labels from opencost. Otherwise, labels like namespace get renamed to exported_namespace.
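With a Prometheus Operator ServiceMonitor (as in the chart values pasted earlier in this thread, where honorLabels appears commented out), this corresponds to enabling honorLabels, roughly:

```yaml
opencost:
  metrics:
    serviceMonitor:
      enabled: true
      # Keep opencost's own label values instead of renaming them
      # to exported_namespace, exported_instance, etc.
      honorLabels: true
```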

It seems to me that we need to document this somewhere.
@mattray any suggestions?

@mattray
Contributor

mattray commented Apr 22, 2024

We can put configuration/workaround notes in the README.

@asdfgugus
Contributor

We can put configuration/work-arounds notes in the README

Sounds good! I will add it to the draft PR.

@dlahn

dlahn commented Apr 24, 2024

@asdfgugus We have had this change for quite some time, but we are still running into this issue.

        - name: EMIT_KSM_V1_METRICS
          value: 'false'
        - name: EMIT_KSM_V1_METRICS_ONLY
          value: 'true'

We have also made sure to drop the exported_ labels. The instance is unique across all of these metrics.
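Dropping the exported_ labels at scrape time can be sketched in a Prometheus metric_relabel_configs block like this (illustrative, not necessarily the exact config used here):

```yaml
metric_relabel_configs:
  # labeldrop removes any label whose NAME matches the regex.
  - action: labeldrop
    regex: "exported_(instance|namespace|pod|service)"
```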

@dlahn

dlahn commented Apr 24, 2024

@asdfgugus Further to the above...

Should the instance be unique to the opencost pod? At the moment we are using k8s-monitoring-helm, and it sets the instance to be the same for the opencost scrape.

However, looking at the Top 20 Namespaces part of the dashboard as an example:

sum(container_memory_allocation_bytes) by (namespace,instance) * on(instance) group_left() (
				node_ram_hourly_cost{} / 1024 / 1024 / 1024 * 730
				+ on(node,instance_type) group_left()
					label_replace
					(
						kube_node_labels{}, "instance_type", "$1", "label_node_kubernetes_io_instance_type", "(.*)"
					) * 0
			)

The 2nd part, where it looks up node_ram_hourly_cost, is going to return multiple results because there are multiple nodes. Am I missing something here? Maybe we are very confused.

@dlahn

dlahn commented Apr 25, 2024

Just an update here: our issue was that the instance label was being rewritten by k8s-monitoring-helm, so it was the same across all of the opencost metrics being scraped. This is why we also ended up with exported_instance labels. We rewrote these back to the correct value for instance, and now our dashboard is working!
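For reference, rewriting the label back at scrape time can be sketched like this (an illustrative Prometheus-style relabeling, not the exact k8s-monitoring-helm configuration):

```yaml
metric_relabel_configs:
  # Copy the original value back into instance...
  - source_labels: [exported_instance]
    regex: "(.+)"
    target_label: instance
  # ...then drop the duplicated label.
  - action: labeldrop
    regex: "exported_instance"
```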

@asdfgugus
Contributor

asdfgugus commented Apr 29, 2024

Just an update here, our issue was that the instance label was being re-written by k8s-monitoring-helm, so they were all the same across all of the opencost metrics being scraped. This is why we also ended up with exported_instance labels. We re-wrote these back to the correct value for instance, and now our dashboard is working!

Thanks for sharing your solution! As we collaborated on debugging via Slack, I'd like to expand on it. It is crucial to honor the labels, as I mentioned earlier. By honoring the labels, I mean ensuring that the scrape job does not append the exported_ prefix: essentially, retaining the original source labels (e.g. instance) instead of dropping them, and avoiding the addition of renamed labels (e.g. exported_instance).

@dlahn do you re-write them when scraping or querying?

@dlahn

dlahn commented May 2, 2024

@asdfgugus I am rewriting them on the scrape side.

@dlahn

dlahn commented May 6, 2024

For anyone using k8s-monitoring-helm who may run into this issue, a fix has been made in the chart to add the honor_labels for the instance to make this work. grafana/k8s-monitoring-helm#514

@dholeshu

dholeshu commented Jun 18, 2024

I'm still seeing the duplicate issue, even with

        - name: EMIT_KSM_V1_METRICS
          value: 'false'
        - name: EMIT_KSM_V1_METRICS_ONLY
          value: 'true'

Would it be possible to disable certain duplicated metrics using a value in values.yaml?

    disabledMetrics:
      - <metric-to-be-disabled>
      - <metric-to-be-disabled> 

This was also mentioned in opencost/opencost#1571

How to identify the duplicated metrics?

@Momotoculteur

Hello guys,

Same issue for me: I can't use your dashboard because of duplicate metrics, even with your two KSM v1 env vars.

@asdfgugus
Contributor

@Momotoculteur, could you please check which metrics are affected?
Furthermore, could you provide more information about your setup?

  • Do you use a dedicated metrics store for OpenCost?
  • If not, which other services are storing metrics in the same store?
  • Do you use OpenCost on multiple clusters with one shared metrics store?
  • ...

@asdfgugus
Contributor

@dholeshu

Would it be possible to disable certain metrics that are duplicated using value in values.yaml?

    disabledMetrics:
      - <metric-to-be-disabled>
      - <metric-to-be-disabled> 

Yes, you can disable OpenCost metrics. If I remember correctly, the current dashboard only uses metrics produced by OpenCost, so I would first identify the duplicated metrics.
Do you deploy OpenCost multiple times?
Which metrics store are you using?

How to identify the duplicated metrics?

Edit the dashboard and check which queries are not working. You can also query the metrics store directly for these metrics: https://docs.kubecost.com/v/1.0x/architecture/user-metrics
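An illustrative query to surface cost series that appear more than once per node:

```promql
# Lists metric name / node combinations with duplicated series,
# which is what breaks the dashboard's group_left joins.
count by (__name__, node) ({__name__=~"node_(cpu|ram)_hourly_cost"}) > 1
```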

@Momotoculteur

Momotoculteur commented Jul 3, 2024

@Momotoculteur, could you please check which metrics are affected? Furthermore, could you provide more information about your setup?

  • Do you use a dedicated metrics store for OpenCost?
  • If not, which other services are storing metrics in the same store?
  • Do you use OpenCost on multiple clusters with one shared metrics store?
  • ...

Hello @asdfgugus thanks for your quick answer.

I have this setup:

  • Running on EKS with Kubernetes 1.28
  • kube-state-metrics at the latest version
  • OpenCost at the latest version

The metrics endpoints from KSM and OpenCost are scraped via Vector and sent to a Grafana Mimir TSDB backed by AWS S3 buckets.

I also scrape other services such as cAdvisor, metrics-server, custom Jenkins metrics for the Betclic company via a Prometheus push gateway, node-exporter, the nginx exporter, JFrog Artifactory, and other things, but I think those apps don't matter here.

Everything is deployed via Helm charts through ArgoCD.

I have tested 3 different dashboards, but got the same results:

  • KubeCost from the Grafana marketplace
  • OpenCost from your GitHub org, the basic one
  • OpenCost from your GitHub org, the detailed one

The error is: Status: 422. Message: execution: found duplicate series for the match group

Do you need extra information about which specific metrics cause the issue on specific dashboards?

I tried to set up OpenCost to expose no KSM metrics, as I already have KSM exposing the metrics OpenCost needs, like this:

kubecostMetrics:
  emitKsmV1Metrics: false
  emitKsmV1MetricsOnly: false

I have also tested this setup following some previous tips, but that doesn't fix my problem

kubecostMetrics:
  emitKsmV1Metrics: false
  emitKsmV1MetricsOnly: true

The last idea I have is to leave emitKsmV1MetricsOnly: true and comment out the corresponding metrics from my own KSM v2 exporter (https://docs.kubecost.com/architecture/ksm-metrics#ksm-metrics-emitted-by-kubecost) to avoid the duplicates, but currently that does not seem to work, as OpenCost needs some metrics in the v1 format.

@asdfgugus
Contributor

Thanks @Momotoculteur for the details.

Careful, this is the configuration for the official OpenCost Helm chart:

opencost:
  metrics:
    kubeStateMetrics:
      emitKsmV1Metrics: false
      emitKsmV1MetricsOnly: true

Btw. you can find the Helm chart on ArtifactHub: https://artifacthub.io/packages/helm/opencost/opencost

@Momotoculteur

Momotoculteur commented Jul 3, 2024

Thanks @Momotoculteur for the details.

Careful, this is the configuration for the official OpenCost Helm chart:

opencost:
  metrics:
    kubeStateMetrics:
      emitKsmV1Metrics: false
      emitKsmV1MetricsOnly: true

Btw. you can find the Helm chart on ArtifactHub: https://artifacthub.io/packages/helm/opencost/opencost

I have exactly this configuration; sorry, my copy/paste was wrong :)

Edit: I tried tonight to deactivate the KSM v2 metrics that are already emitted by OpenCost (described in the Kubecost documentation) in order to avoid the duplicates, but I still have the issue. I'm clearly lost now as to why I get this problem :(

@Momotoculteur

Momotoculteur commented Jul 9, 2024

The duplicate metrics causing this issue have these values:

[
    {__name__="node_cpu_hourly_cost", arch="amd64", instance="IP_XXX", instance_type="c6a.xlarge", node="IP_XXX", provider_id="aws:///eu-west-1", region="eu-west-1"
    },
    {__name__="node_cpu_hourly_cost", arch="amd64", instance="IP_XXX", instance_type="c6a.xlarge", node="IP_XXX", provider_id="aws:///eu-west-1", region="eu-west-1"
    }
]

I use Karpenter for node autoscaling and AWS spot instances.

EDIT: I made some tests to eliminate the duplicate items from the left join in the PromQL request. Using the min, max, or avg function over the right-hand side seems to work now.

Request from your dashboard:

sum by(namespace, container) (container_cpu_allocation * on (node) group_left node_cpu_hourly_cost)

Updated request:

sum by(namespace, container) (container_cpu_allocation * on (node) group_left avg(node_cpu_hourly_cost) by (node))
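The same deduplication can presumably be applied to the memory panel as well; a sketch (metric names as in the dashboard, written with the modern avg by syntax):

```promql
# Illustrative: averaging the per-node RAM cost collapses duplicate
# series (e.g. from rotated spot nodes) before the join.
sum by (namespace, container) (
  container_memory_allocation_bytes
  * on (node) group_left ()
  (avg by (node) (node_ram_hourly_cost) / 1024 / 1024 / 1024)
)
```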

@asdfgugus
Contributor

@Momotoculteur
I am glad you found a solution that works for you! We would love to see a contribution for that.
Do you always have duplicate metrics, or does this only happen in a specific case (e.g. when a rollout of OpenCost gets triggered)?

@Momotoculteur

@asdfgugus

I've only got 14 days of data because I installed the OpenCost Helm chart just recently, but I have no more problems now, even with a large time range in my dashboards.

10 participants