Current Behavior
We deploy APISIX in a K8s cluster and have a problem with Prometheus metrics.
We noticed that the lua_shared_dict prometheus-metrics overflows; after that, apisix_nginx_metric_errors_total starts to grow and all metrics stop displaying correctly.
We tried increasing prometheus-metrics to 40m in the ConfigMap (config.yaml), but after 2 months this lua_shared_dict was full on all pods and the errors started again.
nginx_config:  # config for rendering the template to generate nginx.conf
  error_log: "/dev/stderr"
  error_log_level: "warn"  # warn,error
  worker_processes: "auto"
  enable_cpu_affinity: true
  worker_rlimit_nofile: 20480  # the number of files a worker process can open, should be larger than worker_connections
  event:
    worker_connections: 10620
  http:
    enable_access_log: true
    access_log: "/dev/stdout"
    access_log_format: '$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\"'
    access_log_format_escape: default
    keepalive_timeout: "60s"
    client_header_timeout: 60s  # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
    client_body_timeout: 60s  # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
    send_timeout: 10s  # timeout for transmitting a response to the client, then the connection is closed
    underscores_in_headers: "on"  # default enables the use of underscores in client request header fields
    real_ip_header: "X-Real-IP"  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
    real_ip_from:  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
      - 127.0.0.1
      - 'unix:'
    lua_shared_dict:
      prometheus-metrics: 40m
Plugins: basic-auth and kafka-logger are enabled on all routes.
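To relate the configured `prometheus-metrics: 40m` to the free-space gauge, a small helper can convert the `lua_shared_dict` size string to bytes and report roughly how much of the dict is in use. This is a sketch under assumptions: the function names are hypothetical, only the `k`/`m` suffixes are handled (binary multiples, as nginx uses), and it presumes `apisix_shared_dict_free_space_bytes` reflects `ngx.shared.DICT:free_space()`, which counts free memory pages and is therefore approximate (a reading of 0 can still leave some room in already-allocated slabs).

```python
MIB = 1024 * 1024

# nginx size suffixes are binary multiples: 1k = 1024 bytes, 1m = 1024 * 1024 bytes.
_MULTIPLIERS = {"k": 1024, "m": MIB}


def shared_dict_size_bytes(size: str) -> int:
    """Convert a lua_shared_dict size such as "40m" or "512k" to bytes."""
    s = size.strip().lower()
    if s and s[-1] in _MULTIPLIERS:
        return int(s[:-1]) * _MULTIPLIERS[s[-1]]
    return int(s)  # a bare number is taken as bytes


def utilization(configured_size: str, free_space_bytes: float) -> float:
    """Approximate fraction of the dict in use, given a free-space reading."""
    capacity = shared_dict_size_bytes(configured_size)
    return 1.0 - free_space_bytes / capacity


# Hypothetical reading: 10 MiB free in a 40m dict.
print(f"{utilization('40m', 10 * MIB):.0%}")  # 75%
```

Tracking this ratio per pod gives an earlier warning than waiting for apisix_nginx_metric_errors_total to move.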
Expected Behavior
No response
Error Logs
No response
Steps to Reproduce
1. Run APISIX with the default size of lua_shared_dict prometheus-metrics.
2. After 2-3 weeks, prometheus-metrics overflows, apisix_nginx_metric_errors_total starts to grow, and all metrics stop displaying correctly.
3. Change lua_shared_dict prometheus-metrics to 40m.
4. After 2-3 months, the lua_shared_dict overflows again and we get a similar problem with displaying metrics.
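The timings above point to steady growth rather than a sudden spike, so the time until the dict fills can be estimated from two readings of `apisix_shared_dict_free_space_bytes{name="prometheus-metrics"}`. A minimal sketch, assuming roughly linear consumption between the samples; the function name and the sample values are hypothetical, not measurements from this deployment.

```python
from datetime import datetime, timedelta
from typing import Optional

MIB = 1024 * 1024


def time_until_full(t0: datetime, free0: float,
                    t1: datetime, free1: float) -> Optional[timedelta]:
    """Linearly extrapolate when free space reaches zero.

    free0/free1 are free-space readings in bytes taken at t0 and t1.
    Returns None if free space did not shrink between the samples.
    """
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("t1 must be later than t0")
    consumed = free0 - free1  # bytes used up during the interval
    if consumed <= 0:
        return None
    # Remaining bytes divided by the consumption rate (consumed / elapsed).
    return timedelta(seconds=free1 * elapsed / consumed)


# Hypothetical readings: 30 MiB free, then 25 MiB free one week later.
eta = time_until_full(datetime(2024, 1, 1), 30 * MIB,
                      datetime(2024, 1, 8), 25 * MIB)
print(eta)  # 35 days, 0:00:00
```

If growth is linear, a larger dict only delays the overflow proportionally, which matches the jump from weeks to months after moving to 40m.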
Environment
APISIX version (run apisix version): 3.10.0
Operating system (run uname -a): Linux apisix-69cfdc5fbf-m7k27 5.14.0-362.13.1.el9_3.x86_64 SMP PREEMPT_DYNAMIC Fri Nov 24 01:57:57 EST 2023 x86_64 GNU/Linux
OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.25.3.2
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.5.0
APISIX Dashboard version, if relevant: 3.0.0
Plugin runner version, for issues related to plugin runners:
LuaRocks version, for installation issues (run luarocks --version):
The prometheus plugin runs on all pods, and its metrics output has 69000+ rows on each pod.
We display the metrics from Prometheus in Grafana (https://github.com/apache/apisix/blob/master/docs/assets/other/json/apisix-grafana-dashboard.json).
When apisix_shared_dict_free_space_bytes{name="prometheus-metrics"} reaches 0, apisix_nginx_metric_errors_total starts to grow and all APISIX metrics show incorrect values.
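To see which metric families and labels account for the 69000+ rows, the exposition text can be summarized offline. A minimal sketch with hypothetical function names: `series_per_family` counts sample rows per family (folding histogram and summary `_bucket`/`_sum`/`_count` rows into their family via the `# TYPE` lines), and `label_cardinality` collects the distinct values of each label. It assumes, as in nginx-lua-prometheus on which the plugin builds, that each exported series occupies its own key in `prometheus-metrics`, so the largest families and the labels with the most distinct values are the likeliest sources of growth.

```python
import re
from collections import Counter, defaultdict

# Metric name at the start of a sample line.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")
# label="value" pairs; values may contain backslash escapes.
_LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
# Per-sample suffixes added to histogram and summary family names.
_SUFFIXES = ("_bucket", "_sum", "_count")


def series_per_family(exposition: str) -> Counter:
    """Count sample rows per metric family in Prometheus text format."""
    grouped = set()  # families declared as histogram or summary
    counts = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) >= 4 and parts[3] in ("histogram", "summary"):
                grouped.add(parts[2])
            continue
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if not match:
            continue
        name = match.group(1)
        for suffix in _SUFFIXES:
            base = name[: -len(suffix)]
            if name.endswith(suffix) and base in grouped:
                name = base
                break
        counts[name] += 1
    return counts


def label_cardinality(exposition: str) -> dict:
    """Map each label name to the set of distinct values seen."""
    values = defaultdict(set)
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        start, end = line.find("{"), line.rfind("}")
        if start == -1 or end < start:
            continue
        for key, value in _LABEL_PAIR.findall(line[start + 1 : end]):
            values[key].add(value)
    return dict(values)
```

For example, pass it the body served by the plugin's export endpoint (assumed here to be the default `http://127.0.0.1:9091/apisix/prometheus/metrics`; adjust to your `plugin_attr.prometheus` settings) and print `series_per_family(text).most_common(10)` or `{k: len(v) for k, v in label_cardinality(text).items()}`.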