py-grpc-prometheus

Instrumentation library that provides Prometheus metrics for gRPC, similar to the Java and Go libraries.

Status

The library currently has metric parity with the Java and Go libraries.

Server side:

  • grpc_server_started_total
  • grpc_server_handled_total
  • grpc_server_msg_received_total
  • grpc_server_msg_sent_total
  • grpc_server_handling_seconds

Client side:

  • grpc_client_started_total
  • grpc_client_handled_total
  • grpc_client_msg_received_total
  • grpc_client_msg_sent_total
  • grpc_client_handling_seconds
  • grpc_client_msg_recv_handling_seconds
  • grpc_client_msg_send_handling_seconds

How to use

pip install py-grpc-prometheus

Client side:

Client metrics are collected by intercepting the gRPC channel.

import grpc
from prometheus_client import start_http_server
from py_grpc_prometheus.prometheus_client_interceptor import PromClientInterceptor

metrics_port = 8000  # Example value; use any free port.

channel = grpc.intercept_channel(grpc.insecure_channel('server:6565'),
                                 PromClientInterceptor())
# Start an endpoint to expose metrics.
start_http_server(metrics_port)

Server side:

Server metrics are exposed by adding the interceptor when the gRPC server is created. For a complete client example, see tests/integration/hello_world/hello_world_client.py.

import grpc
from concurrent import futures
from py_grpc_prometheus.prometheus_server_interceptor import PromServerInterceptor
from prometheus_client import start_http_server

Start the gRPC server with the interceptor. See tests/integration/hello_world/hello_world_server.py for the complete example.

metrics_port = 8000  # Example value; use any free port.

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10),
                     interceptors=(PromServerInterceptor(),))
# Start an endpoint to expose metrics.
start_http_server(metrics_port)
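
Once `start_http_server` is running, the metrics are served over HTTP in the Prometheus text format. As a quick sanity check, the sketch below sums the samples per metric name from such text. It rests on several assumptions: `sum_samples_by_metric` is a hypothetical helper (not part of this library), the sample text and its `grpc_method` label are made up for illustration, and sample lines are assumed to carry no trailing timestamp.

```python
from collections import defaultdict


def sum_samples_by_metric(exposition_text):
    """Sum sample values per metric name, ignoring labels and comments."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and HELP/TYPE comments.
        # A sample line looks like: name{label="value",...} 12.0
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


# Made-up sample text (illustrative only, not real output of this library).
sample = """\
# HELP grpc_server_started_total Illustrative help text.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="SayHello"} 3.0
grpc_server_started_total{grpc_method="SayGoodbye"} 2.0
grpc_server_handled_total{grpc_method="SayHello"} 3.0
"""

print(sum_samples_by_metric(sample))
```

To run this against a live process, the text could be fetched with `urllib.request.urlopen` from the host and port passed to `start_http_server`, for example `http://localhost:8000/` if `metrics_port` is 8000.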

Histograms

Prometheus histograms are a great way to measure the latency distributions of your RPCs. However, because high-cardinality metrics are bad practice, the latency metrics are disabled by default. To enable them, pass the corresponding flag when creating the interceptor:

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10),
                     interceptors=(PromServerInterceptor(enable_handling_time_histogram=True),))

After the call completes, its handling time will be recorded in a Prometheus histogram variable grpc_server_handling_seconds. The histogram variable contains three sub-metrics:

  • grpc_server_handling_seconds_count - the count of all completed RPCs by status and method
  • grpc_server_handling_seconds_sum - cumulative time of RPCs by status and method, useful for calculating average handling times
  • grpc_server_handling_seconds_bucket - contains the counts of RPCs by status and method in respective handling-time buckets. These buckets can be used by Prometheus to estimate SLAs
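
To illustrate how the three sub-metrics fit together, here is a minimal sketch in plain Python. The bucket bounds and counts are made-up sample values, not output from this library, and `average_handling_time` and `estimate_quantile` are hypothetical helpers. The quantile estimate mimics the linear interpolation that Prometheus applies in `histogram_quantile`; in practice you would normally run that query in Prometheus rather than compute it by hand.

```python
import math

# Made-up sample values for one method/status pair (not real output).
# Each entry is (upper bound in seconds, cumulative count), matching the
# `le` label of grpc_server_handling_seconds_bucket.
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
handling_seconds_sum = 30.0    # grpc_server_handling_seconds_sum
handling_seconds_count = 100   # grpc_server_handling_seconds_count


def average_handling_time(total_seconds, count):
    """Mean handling time: _sum divided by _count."""
    return total_seconds / count if count else 0.0


def estimate_quantile(q, cumulative_buckets):
    """Estimate the q-quantile by linear interpolation within a bucket."""
    total = cumulative_buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in cumulative_buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The +Inf bucket has no upper edge; use the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


print(average_handling_time(handling_seconds_sum, handling_seconds_count))
print(estimate_quantile(0.95, buckets))
```

The estimate is only as fine-grained as the bucket boundaries, so a quantile that falls in a wide bucket carries a correspondingly wide margin of error.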

Server side:

  • enable_handling_time_histogram: Enables 'grpc_server_handling_seconds'

Client side:

  • enable_client_handling_time_histogram: Enables 'grpc_client_handling_seconds'
  • enable_client_stream_receive_time_histogram: Enables 'grpc_client_msg_recv_handling_seconds'
  • enable_client_stream_send_time_histogram: Enables 'grpc_client_msg_send_handling_seconds'

Legacy metrics:

Metric names have been updated to be in line with those from https://github.com/grpc-ecosystem/go-grpc-prometheus.

The legacy metrics are:

Server side:

  • grpc_server_started_total
  • grpc_server_handled_total
  • grpc_server_handled_latency_seconds
  • grpc_server_msg_received_total
  • grpc_server_msg_sent_total

Client side:

  • grpc_client_started_total
  • grpc_client_completed
  • grpc_client_completed_latency_seconds
  • grpc_client_msg_sent_total
  • grpc_client_msg_received_total

To keep using these legacy metrics for backwards compatibility, set the legacy flag to True when initialising the server or client interceptor.

For example, to enable the server side legacy metrics:

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10),
                     interceptors=(PromServerInterceptor(legacy=True),))

How to run and test

make initialize-development
make test
