bug: apisix report duplicate metrics #11934

Open
FeiYing9 opened this issue Jan 22, 2025 · 1 comment
Labels
bug Something isn't working

FeiYing9 commented Jan 22, 2025

Current Behavior

We run APISIX in several k8s clusters, but only one of them (the prod cluster) has this problem: APISIX reports a large number of duplicate metrics.

For example:

apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/cluster_metric/list_task_dimension",method="POST"} 96
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/cluster_metric/list_task_dimension",method="POST"} 96
...
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/file/upload",method="POST"} 3188
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/file/upload",method="POST"} 3188

As a result, Prometheus logs many "Duplicate sample for timestamp" messages:

ts=2025-01-22T08:51:08.867Z caller=scrape.go:1793 level=debug component="scrape manager" scrape_pool=serviceMonitor/apisix/apisix/0 target=http://10.244.5.32:9091/apisix/prometheus/metrics msg="Duplicate sample for timestamp" series="apisix_http_latency_bucket{type=\"apisix\",route=\"3fb9d6c2\",service=\"\",consumer=\"\",node=\"10.244.10.60\",host=\"xxx\",upstream_addr=\"10.244.10.60:8080\",upstream_status=\"200\",uri=\"/api/v1/user/routes/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0\",method=\"GET\",le=\"100\"}"

The metrics output is also very large. We run 6 APISIX pods, and a curl against the metrics URL of a single pod returns about 100 MB.
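To quantify the duplicates rather than spot them by eye, the saved output of one pod can be scanned for series that appear more than once. The following is a minimal sketch, not part of APISIX: the function names and the file name `metrics.txt` are hypothetical, and it assumes the plain Prometheus text exposition format with one sample per line.

```python
from collections import Counter


def series_id(line: str) -> str:
    """Return the metric name plus label set, without the sample value."""
    end = line.rfind("}")
    if end != -1:
        # Labelled sample: everything up to the closing brace identifies it.
        return line[: end + 1]
    # Unlabelled sample: the first whitespace-separated token is the name.
    return line.split(None, 1)[0]


def find_duplicate_series(metrics_text: str) -> dict[str, int]:
    """Map each series that occurs more than once to its occurrence count."""
    counts: Counter[str] = Counter()
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        counts[series_id(line)] += 1
    return {series: n for series, n in counts.items() if n > 1}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_dups.py metrics.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for series, n in find_duplicate_series(fh.read()).items():
                print(n, series)
```

Running this against the output of each pod separately should show whether the duplicates are limited to certain metrics or label combinations.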

Expected Behavior

No response

Error Logs

All of the error logs are about the shdict:

2025/01/22 15:07:27 [error] 534#534: *2088505577 [lua] prometheus_resty_counter.lua:39: increasing counter in shdict: lru eviction: key=http_latency_bucket{type="request",route="3fb9d6c2",service="",consumer="",node="10.244.11.36",host="xxx",upstream_addr="10.244.11.36:8080",upstream_status="200",uri="/api/v1/notebook/7eb9852a-be8d-4fac-a593-31f5f7d864b0",method="GET",le="30000.0"}, context: ngx.timer
...
2025/01/22 16:53:00 [error] 499#499: *2098016584 [lua] prometheus.lua:973: log_error(): Shared dictionary used for prometheus metrics is full. REPORTED METRIC DATA MIGHT BE INCOMPLETE. Please increase the size of the dictionary or decrease metric cardinality.; key index: add key: idx=__ngx_prom__key_115158, key=http_latency_bucket{type="request",route="3fb9d6c2",service="",consumer="",node="10.244.11.36",host="xxx",upstream_addr="10.244.11.36:8080",upstream_status="200",uri="/api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727",method="GET",le="Inf"} while logging request, client: 10.245.13.201, server: _, request: "GET /api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727 HTTP/2.0", upstream: "http://10.244.11.36:8080/api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727", host: "qz.sii.edu.cn", referrer: "https://xxx/jobs/distributedTraining?spaceId=ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0"

We accept that the shared dict is running out of memory; we only want to understand why APISIX reports duplicate metrics.
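Since the error message asks to decrease metric cardinality, it may help to see which labels contribute the most distinct values. The keys in the logs above carry request paths with embedded IDs in the `uri` label, so that label is a likely candidate, but this is an assumption to check against real output. A minimal sketch follows, with a hypothetical function name and a simple regex-based label parser rather than a full exposition-format parser:

```python
import re
from collections import defaultdict

# Matches name="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def label_cardinality(metrics_text: str) -> dict[str, int]:
    """Count the distinct values seen for each label name."""
    values: dict[str, set[str]] = defaultdict(set)
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        start, end = line.find("{"), line.rfind("}")
        if start == -1 or end == -1:
            continue  # sample without labels
        for name, value in LABEL_RE.findall(line[start + 1 : end]):
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python cardinality.py metrics.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            result = label_cardinality(fh.read())
        for name, n in sorted(result.items(), key=lambda kv: -kv[1]):
            print(n, name)
```

Labels at the top of the sorted output are the ones that multiply the number of keys stored in the shared dict.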

Steps to Reproduce

Unknown; we have not found a way to reproduce this.

APISIX config:

    nginx_config:    # config used to render the template that generates nginx.conf
        lua_shared_dict:
          prometheus-metrics: 200m            # yes, it's 200m
...
    plugin_attr:
      opentelemetry:
        set_ngx_var: true
      prometheus:
        expire: 16
        export_addr:
          ip: 0.0.0.0
          port: 9091
        export_uri: /apisix/prometheus/metrics
        metric_prefix: apisix_
        metrics:
          bandwidth:
            extra_labels:
            - host: $host
            - upstream_addr: $upstream_addr
            - upstream_status: $upstream_status
            - uri: $uri
            - method: $request_method
          http_latency:
            extra_labels:
            - host: $host
            - upstream_addr: $upstream_addr
            - upstream_status: $upstream_status
            - uri: $uri
            - method: $request_method
          http_status:
            extra_labels:
            - host: $host
            - upstream_addr: $upstream_addr
            - upstream_status: $upstream_status
            - uri: $uri
            - method: $request_method
        prefer_name: true

Environment

  • APISIX version (run apisix version): 3.7.0 (helm version: 2.5.0)
  • Operating system (run uname -a): Linux cpu-001 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.2
  • k8s version:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.16", GitCommit:"cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a", GitTreeState:"clean", BuildDate:"2024-07-17T01:53:56Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.16", GitCommit:"cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a", GitTreeState:"clean", BuildDate:"2024-07-17T01:44:26Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
@dosubot dosubot bot added the bug Something isn't working label Jan 22, 2025
@yurkovoznyak

I believe the issue is with metric expiration logic.

I was able to reproduce it locally by setting the expiration time to a low value (such as expire: 10) and running more than one worker process (I tested with 6; more workers make it easier to reproduce).

Removing the metrics expiration configuration resolves the duplicates.

Environment

APISIX version (run apisix version): 3.11.0
Operating system (run uname -a): Linux 82435988518b 6.10.14-linuxkit #1 SMP Fri Nov 29 17:22:03 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.25.3.2
