Prometheus exporter - Histograms are disappearing. #3089

mshebeko-twist · 2022-12-14T11:48:51Z

Describe your environment Describe any aspect of your environment relevant to the problem, including your Python version,
python 3.7.10
opentelemetry_api-1.14.0
opentelemetry_sdk-1.14.0
opentelemetry_exporter_prometheus-1.12.0rc1

Steps to reproduce
Create Histogram and expose it with prometheus-exporter

What is the expected behavior?
Histograms should consistently appear on /metrics endpoint

What is the actual behavior?
Histograms disappears from /metrics endpoint after request/scrape. A change to metrics (in my case would be query to application itself) will make histogram appear again until next request/scrape. This is not happening neither in prometheus_client nor in javascript opentelemetry prometheus exporter.

Additional context
This behaviour is not optimal when multiple prometheus servers are scrapping metrics.
I have both auto instrumented and manually instrumented histograms both of them are disappearing.

Edit:
Don't know if its actually related but I hope it may be helpful.
I have additional observation. When pushing Histograms to OTEL Collector I sporadically encounter following error:

[email protected]/accumulator.go:105	failed to translate metric	{"kind": "exporter", "data_type": "metrics", "name": "prometheus", "data_type": "\u0000", "metric_name": "http.server.duration"}

But its not the case with JS implementation its just works without any errors.

The text was updated successfully, but these errors were encountered:

robotadam · 2022-12-20T00:15:00Z

I was able to duplicate this, and I also found this related discussion: open-telemetry/opentelemetry-collector-contrib#13443

It appears to happen when there is no data collected for the histogram during an exporter interval.

Here's a minimal app.py that exhibits this, using python 3.10.9, otel 1.15.0/0.36b0

from fastapi import FastAPI
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.metrics import set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


exporter = OTLPMetricExporter(endpoint="http://otel-collector:4317")
console = ConsoleMetricExporter()
reader = PeriodicExportingMetricReader(exporter, export_interval_millis=5_000)
provider = MeterProvider(
    resource=Resource.create({SERVICE_NAME: "fastapi-test-app"}),
    metric_readers=[reader],
)
set_meter_provider(provider)

FastAPIInstrumentor.instrument_app(app)

and otel-collector-config.yaml, running on latest docker image otel/opentelemetry-collector-contrib, which is 0.67.0.

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [prometheus]

Running this app the first metric is exported after the first web request successfully. After that every export will emit the error message and traceback every 5 seconds:

2022-12-20T00:12:05.663Z	error	[email protected]/accumulator.go:105	failed to translate metric	{"kind": "exporter", "data_type": "metrics", "name": "prometheus", "data_type": "\u0000", "metric_name": "http.server.duration"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*lastValueAccumulator).addMetric
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/accumulator.go:105
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*lastValueAccumulator).Accumulate
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/accumulator.go:82
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*collector).processMetrics
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/collector.go:67
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*prometheusExporter).ConsumeMetrics
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/prometheus.go:96
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:65
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/common.go:208
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:358
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:136
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:294
go.opentelemetry.io/collector/exporter/exporterhelper.NewMetricsExporter.func2
	go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:116
go.opentelemetry.io/collector/consumer.ConsumeMetricsFunc.ConsumeMetrics
	go.opentelemetry.io/collector/[email protected]/metrics.go:36
go.opentelemetry.io/collector/receiver/otlpreceiver/internal/metrics.(*Receiver).Export
	go.opentelemetry.io/collector/receiver/[email protected]/internal/metrics/otlp.go:63
go.opentelemetry.io/collector/pdata/pmetric/pmetricotlp.rawMetricsServer.Export
	go.opentelemetry.io/collector/[email protected]/pmetric/pmetricotlp/grpc.go:71
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1._MetricsService_Export_Handler.func1
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/metrics/v1/metrics_service.pb.go:311
go.opentelemetry.io/collector/config/configgrpc.enhanceWithClientInformation.func1
	go.opentelemetry.io/[email protected]/config/configgrpc/configgrpc.go:412
google.golang.org/grpc.chainUnaryInterceptors.func1.1
	google.golang.org/[email protected]/server.go:1162
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/[email protected]/interceptor.go:341
google.golang.org/grpc.chainUnaryInterceptors.func1.1
	google.golang.org/[email protected]/server.go:1165
google.golang.org/grpc.chainUnaryInterceptors.func1
	google.golang.org/[email protected]/server.go:1167
go.opentelemetry.io/collector/pdata/internal/data/protogen/collector/metrics/v1._MetricsService_Export_Handler
	go.opentelemetry.io/collector/[email protected]/internal/data/protogen/collector/metrics/v1/metrics_service.pb.go:313
google.golang.org/grpc.(*Server).processUnaryRPC
	google.golang.org/[email protected]/server.go:1340
google.golang.org/grpc.(*Server).handleStream
	google.golang.org/[email protected]/server.go:1713
google.golang.org/grpc.(*Server).serveStreams.func1.2
	google.golang.org/[email protected]/server.go:965

aabmass · 2022-12-20T18:43:11Z

Hey folks, thanks for raising this issue and debugging. Would one of you be willing to send a PR to fix this?

Longer term, I'm wondering if we should move away from using the Prometheus client library altogether and just generate prometheus text format on our own. This is what the JS SIG went with.

robotadam · 2022-12-20T19:16:54Z

My test showed this failing with OTLP/grpc export to an opentelemetry collector exporting prometheus, not using the built in prometheus exporter. So it may be that there are two separate issues or that the underlying cause is deeper in the metrics generation rather than inside the exporter.

In the related discussion they called out the duplicate data_type fields in the error message:
failed to translate metric {"kind": "exporter", "data_type": "metrics", "name": "prometheus", "data_type": "\u0000", "metric_name": "http.server.duration"}
perhaps that can point the way? If you have a suggestion of where to look I can see if I spot something.

robotadam · 2022-12-22T00:12:44Z

I think I isolated it a little further. In my testing this only occurs if there is a period where a previously-active histogram has no data points & and there is a different metric with data. I dumped some data from the OTLP exporter before it went over the wire. When there is data in the histogram, the request looks like this:

...
scope_metrics {
    scope {
      name: "__main__"
    }
    metrics {
      name: "http.server.duration"
      description: "A histogram."
      unit: "ms"
      histogram {
        data_points {
          start_time_unix_nano: 1671666510029448000
          time_unix_nano: 1671666515034673000
          count: 1
          sum: 1
          bucket_counts: 0
...

When there is no data, the output looks like this:

...
  scope_metrics {
    scope {
      name: "__main__"
    }
    metrics {
      name: "http.server.duration"
      description: "A histogram."
      unit: "ms"
    }
...

So it's not just that data_points is empty, it's also that histogram doesn't exist as a key.

I don't know why this only causes an error when the collector receives another metric, but it does seem like if there are no data points the Histogram metric should not be included. Adding a continue here in that case avoids sending it along:

opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/metrics/_internal/metric_reader_storage.py

Line 182 in 0f9cfdd

data = Histogram(

I have no idea if that's a valid solution, but wanted to share what I learned.

mplachter · 2023-05-23T13:28:22Z

Any update on this?

ocelotl · 2023-06-07T17:42:26Z

@robotadam how are you running that script to produce histogram data points?

robotadam · 2023-06-07T17:52:49Z

@ocelotl I ran it with uvicorn app:app --host 0.0.0.0 --port $PORT --workers 2 in a docker container. Details:

pyproject.toml:

[tool.poetry]
name = "oteltestapp"
version = "0.1.0"
description = ""
authors = ["Your Name <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
fastapi = "^0.88.0"
opentelemetry-exporter-otlp = "^1.15.0"
uvicorn = "^0.20.0"
opentelemetry-instrumentation-fastapi = "^0.36b0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

docker-compose.yml:

version: "3.8"
services:
  app:
    build:
      context: .
    volumes:
      - ./app.py:/app/app.py
    ports:
      - "4000:4000"
  otel-collector:
    image: otel/opentelemetry-collector-contrib
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP http receiver
      - "55679:55679" # zpages

Dockerfile:

FROM python:3.10-slim-bullseye

ENV PYTHONUNBUFFERED=1 \
    # prevents python creating .pyc files
    PYTHONDONTWRITEBYTECODE=1 \
    \
    # pip
    PIP_NO_CACHE_DIR=off \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    \
    # poetry
    POETRY_VERSION=1.2.2 \
    # make poetry install to this location
    POETRY_HOME="/opt/poetry" \
    # We will use our own created virtualenv to make it easier to cache
    POETRY_VIRTUALENVS_CREATE=false \
    # do not ask any interactive question
    POETRY_NO_INTERACTION=1 \
    # the path with our venv
    VENV_PATH=/opt/venv \
    APP_PATH=/app \
    PORT=4000

# prepend poetry and our venv to path
ENV PATH="$POETRY_HOME/bin:$VENV_PATH/bin:$PATH"

# Install builder dependencies
RUN apt-get update && \
    apt-get install -y curl
RUN curl -sSL https://install.python-poetry.org | python3 -

# copy project requirement files here to ensure they will be cached.
WORKDIR $APP_PATH
COPY . $APP_PATH
WORKDIR $APP_PATH

# Install Deps
RUN python -m venv $VENV_PATH
ENV VIRTUAL_ENV=$VENV_PATH
RUN poetry install --no-ansi --only main

CMD uvicorn app:app --host 0.0.0.0 --port $PORT --workers 2

the poetry.lock file is noisy so I don't want to post it here, but you should be able to swap basically all of this out for a simple requirements file. I don't have time to do that today but if it will help I can see when I can free up time.

crisog · 2023-12-05T22:11:10Z

@ocelotl I have some spare time and a high interest into getting this fixed. If I come up with a fix, would you be able to review or will it have to go through a different process?

bastbu · 2023-12-22T13:51:04Z

@ocelotl to clarify - will this be addressed via #3407 (based on the discussions around #3277 (comment))?

ocelotl · 2024-01-11T23:07:03Z

@crisog I would surely, but it may be a fix already in #3429. @robotadam @mshebeko-twist @bastbu could you try again using the fix in #3429?

bastbu · 2024-01-12T05:28:47Z

@ocelotl can confirm, the fix solves the issue for me.

mshebeko-twist added the bug Something isn't working label Dec 14, 2022

aabmass added exporters prometheus labels Dec 20, 2022

DraggonFantasy added a commit to DraggonFantasy/opentelemetry-python that referenced this issue Feb 22, 2023

Fix open-telemetry#3089 'histograms are dissapearing'

e40098a

DraggonFantasy mentioned this issue Feb 22, 2023

Fix #3089 'histograms are dissapearing' #3196

Closed

7 tasks

ocelotl self-assigned this Jun 9, 2023

ocelotl mentioned this issue Feb 5, 2024

Fix explicit bucket histogram aggregation #3429

Merged

ocelotl closed this as completed in #3429 Feb 8, 2024

ocelotl mentioned this issue Jun 28, 2024

Fix ExponentialBucketHistogramAggregation #3978

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prometheus exporter - Histograms are disappearing. #3089

Prometheus exporter - Histograms are disappearing. #3089

mshebeko-twist commented Dec 14, 2022 •

edited

Loading

robotadam commented Dec 20, 2022

aabmass commented Dec 20, 2022

robotadam commented Dec 20, 2022

robotadam commented Dec 22, 2022

mplachter commented May 23, 2023

ocelotl commented Jun 7, 2023

robotadam commented Jun 7, 2023

crisog commented Dec 5, 2023

bastbu commented Dec 22, 2023

ocelotl commented Jan 11, 2024

bastbu commented Jan 12, 2024

Prometheus exporter - Histograms are disappearing. #3089

Prometheus exporter - Histograms are disappearing. #3089

Comments

mshebeko-twist commented Dec 14, 2022 • edited Loading

robotadam commented Dec 20, 2022

aabmass commented Dec 20, 2022

robotadam commented Dec 20, 2022

robotadam commented Dec 22, 2022

mplachter commented May 23, 2023

ocelotl commented Jun 7, 2023

robotadam commented Jun 7, 2023

crisog commented Dec 5, 2023

bastbu commented Dec 22, 2023

ocelotl commented Jan 11, 2024

bastbu commented Jan 12, 2024

mshebeko-twist commented Dec 14, 2022 •

edited

Loading