
Add Linux host metrics receiver for OTAP Dataflow #2840

Merged
lquerel merged 70 commits into open-telemetry:main from lalitb:lalitb/host-metrics-receiver-complete
May 7, 2026

Conversation

@lalitb
Member

@lalitb lalitb commented May 5, 2026

Change Summary

Adds a Linux host metrics receiver for OTAP Dataflow that collects host-level system.* metrics from procfs/sysfs and emits OTAP Arrow metrics directly.

Highlights:

  • Implements the Phase 1 host metric families with current semconv coverage: CPU, memory, paging/swap, system uptime, disk, filesystem, network, and aggregate process summary.
  • Emits metrics using OpenTelemetry semantic conventions pinned to schema 1.41.0.
  • Centralizes semconv metric/attribute constants and adds a semconv drift check.
  • Builds OTAP Arrow records directly without constructing intermediate OTLP/protobuf metric objects.
  • Uses OTAP metric rows efficiently by grouping device/interface datapoints under shared metric handles.
  • Supports per-family intervals inside one singleton receiver (see the scheduling sketch after this list).
  • Enforces the one-core host collection model and a duplicate-receiver lease guard.
  • Supports host root views for container/DaemonSet deployments, including host network namespace handling via /proc/1/net/dev.
  • Keeps system.process.count limited to registered process.state values; Linux procs_blocked is intentionally not emitted until semconv defines a matching state.
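
As an illustration of the per-family interval scheduling mentioned above (a sketch only; every name here is made up and not taken from the PR):

```rust
use std::time::{Duration, Instant};

// Hypothetical per-family scheduling state; names are illustrative only.
struct FamilyState {
    interval: Duration,
    next_due: Instant,
}

// On each receiver tick, collect the families whose interval has elapsed and
// advance their deadlines so a single scrape pass can serve every due family.
fn due_families(now: Instant, families: &mut [FamilyState]) -> Vec<usize> {
    let mut due = Vec::new();
    for (idx, family) in families.iter_mut().enumerate() {
        if now >= family.next_due {
            due.push(idx);
            // Schedule relative to `now` so a slow scrape does not trigger a catch-up burst.
            family.next_due = now + family.interval;
        }
    }
    due
}
```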

Notes:

  • The load family is intentionally not emitted in this PR because the current OpenTelemetry system semantic conventions do not define a stable system load metric.
  • Receiver self-observability is implemented with the current MetricSet support. Per-family / error-class labelled internal telemetry is a follow-up because the internal telemetry API does not currently support attributes on individual metric observations.

What issue does this PR close?

Covers the Linux v1 scope from #2741 (Implement Linux v1 OTAP-native host_metrics receiver).

How are these changes tested?

  • Rust unit/config/projection tests
  • Semconv drift check against OpenTelemetry semantic-conventions v1.41.0
  • df_engine runtime validation on Ubuntu
  • End-to-end validation: host metrics receiver → OTLP exporter → OpenTelemetry Collector → Prometheus → Grafana

Are there any user-facing changes?

Yes, a new Linux host metrics receiver is added.

[Two screenshots attached, captured 2026-05-04.]

Comment thread on rust/otap-dataflow/crates/core-nodes/src/receivers/host_metrics_receiver/mod.rs (outdated)
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//! Direct OTAP Arrow record construction for host metrics.
Contributor


@albertlockett or @JakeDern, could you take a look at this part of the PR? Thanks.

}

/// Collects one host snapshot for the due family set.
pub fn scrape_due(&mut self, due: ProcfsFamilies) -> io::Result<HostScrape> {
Contributor


What worries me a little is that this method is not async. It may not be a problem in this specific case, but we should make sure that:

  1. it cannot block, and
  2. the time spent in this method is not too long or dependent on the system configuration.

Member Author


I looked into this with the df-engine threading model in mind. I kept the procfs/sysfs reads synchronous because they are short kernel virtual-file reads served from in-memory kernel state. Routing them through tokio::fs would mostly hand the same reads to Tokio's global blocking pool, adding overhead without giving us a stronger bound, and weakening core locality.

The part that can really block is statvfs on remote/userspace mounts. I tightened that path in this PR: statvfs is isolated behind a dedicated bounded worker thread using sync_channel(1), each mount has a timeout, the filesystem scrape has a total budget, and remote/FUSE filesystems are skipped by default unless explicitly opted in via include_remote_filesystems.
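
As a rough sketch of that isolation pattern (simplified to one thread per call rather than a long-lived worker; the helper name and shape are illustrative, not the PR's exact code):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

// Run a potentially hanging filesystem call off the runtime thread and give up
// after `timeout`. A hung mount then costs at most one parked OS thread instead
// of blocking the scrape.
fn call_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    // Capacity 1: the single result is buffered, so the worker's send never blocks
    // even if the caller has already timed out and stopped listening.
    let (tx, rx) = sync_channel(1);
    thread::spawn(move || {
        let _ = tx.send(work());
    });
    // On timeout (or a worker panic) this yields None and the mount is skipped.
    rx.recv_timeout(timeout).ok()
}
```

Each per-mount statvfs read would be wrapped in a call like this, with the filesystem scrape additionally stopping once its total budget is spent.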

Does this direction look reasonable to you?

Member Author


Discussed this further with @utpilla as well, and added cooperative yield points around the heavier scrape phases, so a large scrape does not run as one uninterrupted task on the current-thread runtime. The direction is: keep short procfs/sysfs reads synchronous, bound the risky statvfs path, and yield between larger scrape phases.
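
The yield points are roughly of this shape (a simplified sketch assuming a Tokio current-thread runtime; the phase readers below are placeholders, not functions from the PR):

```rust
// Sketch: interleave cooperative yields between the heavier scrape phases so a
// large scrape is split across several polls instead of one long task.
async fn scrape_tick() {
    let cpu = read_cpu_phase(); // short synchronous procfs read
    tokio::task::yield_now().await; // let other tasks on the runtime make progress
    let disks = read_disk_phase(); // heavier: one entry per block device
    tokio::task::yield_now().await;
    let net = read_network_phase(); // heavier: one entry per interface
    // ...project the collected values into OTAP records here...
    let _ = (cpu, disks, net);
}

// Placeholder phase readers standing in for the real scrapers.
fn read_cpu_phase() -> u64 { 0 }
fn read_disk_phase() -> u64 { 0 }
fn read_network_phase() -> u64 { 0 }
```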

Contributor


Right now the synchronous scrape approach is probably completely fine for v1, especially since:

  • process metrics are currently limited to aggregate host summaries instead of full per-process scraping
  • remote/userspace filesystem collection is opt-in
  • statvfs calls already have timeout protection
  • collection is centralized into a shared scrape path instead of multiple independent scrapers repeatedly traversing procfs/sysfs

So I don’t think this PR should necessarily change direction immediately.

That said, I suspect we will eventually want to move toward a model where:

  • the runtime thread remains responsible for scheduling, control, projection, and downstream flow control
  • while the procfs/sysfs/statvfs traversal itself executes inside a dedicated bounded blocking subsystem

Something roughly like:

  • runtime schedules scrape tick
  • dedicated scraper worker builds a HostSnapshot
  • runtime projects/emits OTAP metrics

The idea would be to keep the existing one-core collection model while isolating potentially long-running synchronous host inspection work behind explicit resource and latency boundaries (see the sketch after this list):

  • max 1 in-flight scrape
  • bounded channels
  • scrape deadlines/budgets
  • overrun policies
  • stronger self-observability around scrape latency/cost
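
A rough shape of that boundary, with every name hypothetical (this is a sketch of the idea, not a proposed API):

```rust
use std::thread;
use tokio::sync::mpsc;

// Placeholder for the snapshot the dedicated scraper thread would produce.
struct HostSnapshot;

// Spawn a dedicated scraper thread fed by bounded channels. Capacity 1 on both
// sides means at most one scrape can be in flight at any time.
fn spawn_scraper() -> (mpsc::Sender<()>, mpsc::Receiver<HostSnapshot>) {
    let (tick_tx, mut tick_rx) = mpsc::channel::<()>(1);
    let (snap_tx, snap_rx) = mpsc::channel::<HostSnapshot>(1);
    thread::spawn(move || {
        // blocking_recv/blocking_send are fine here: this is a plain OS thread,
        // not a task on the async runtime.
        while tick_rx.blocking_recv().is_some() {
            let snapshot = HostSnapshot; // real procfs/sysfs/statvfs traversal goes here
            if snap_tx.blocking_send(snapshot).is_err() {
                break; // runtime side shut down
            }
        }
    });
    (tick_tx, snap_rx)
}
```

On each tick the runtime side would try_send(()) (dropping or counting the tick if a scrape is still in flight, which is one possible overrun policy) and await the snapshot under a deadline, keeping projection and emission on the runtime thread.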

Contributor

@lquerel lquerel left a comment


It is really great to see this critical receiver getting close to being integrated natively into our engine.

My 3 main pieces of feedback are:

  • scraping procfs, which has the potential to block the async runtime for too long if there are many network interfaces, disks, CPUs, and so on (one slow or hung mount could potentially leave that thread blocked indefinitely).
  • observe_key inserts and updates per-series counter state, but nothing ever removes entries for devices or interfaces that disappear. On long-running nodes with churny veth, loop, or ephemeral block devices, state grows monotonically, which violates the bounded resource requirement and gradually increases hash/allocation cost on every scrape.
  • the size of the files, which makes maintainability problematic.

Contributor

@jmacd jmacd left a comment


The host metrics receiver is one of the high-value components in the Collector, and it has a high degree of configurability. It has many "Scrapers" defined, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md as an overview.

See an example metadata.yaml where generated instruments are defined, which determines compile-time views much like in my PR #2623 and gives operators control over attribute dimensions. As with @lquerel's comment, I don't want to block this PR. We should consider it unstable and aim to converge with the Collector's hostmetrics receiver, which is close to stable.

@jmacd
Contributor

jmacd commented May 7, 2026

I think it will help (maybe now) to replace "Family" with "Scraper" as a nod to Collector's terminology.

@lalitb
Member Author

lalitb commented May 7, 2026

> It is really great to see this critical receiver getting close to being integrated natively into our engine.
>
> My 3 main pieces of feedback are:
>
>   • scraping procfs, which has the potential to block the async runtime for too long if there are many network interfaces, disks, CPUs, and so on (one slow or hung mount could potentially leave that thread blocked indefinitely).
>   • observe_key inserts and updates per-series counter state, but nothing ever removes entries for devices or interfaces that disappear. On long-running nodes with churny veth, loop, or ephemeral block devices, state grows monotonically, which violates the bounded resource requirement and gradually increases hash/allocation cost on every scrape.
>   • the size of the files, which makes maintainability problematic.

Thanks Laurent, this feedback was very helpful.

I addressed these in the latest commits:

  • Split the large config/procfs code into smaller modules for maintainability.
  • Added pruning for disappeared disk/network counter series, while preserving state across transient read failures (see the sketch after this list).
  • Kept procfs/sysfs reads synchronous because they are short kernel virtual-file reads served from in-memory kernel state.
  • Tightened the risky statvfs path: it now runs behind a dedicated bounded worker, has per-mount timeouts and a total filesystem scrape budget, and remote/FUSE filesystems are skipped by default unless explicitly enabled.
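
For the pruning piece, the shape is roughly as follows (an illustrative sketch with made-up names; the real code also has to decide how long a series may stay absent before it is dropped):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical per-series counter state keyed by device/interface name.
struct SeriesState {
    last_value: u64,
    misses: u32, // consecutive scrapes in which the series was not observed
}

// Drop state for devices/interfaces that have been absent for more than
// `max_misses` scrapes, so churny veth/loop devices cannot grow the map forever,
// while short gaps (e.g. a transient read failure) keep the series alive.
fn prune_missing(
    state: &mut HashMap<String, SeriesState>,
    seen_this_scrape: &HashSet<String>,
    max_misses: u32,
) {
    state.retain(|key, series| {
        if seen_this_scrape.contains(key) {
            series.misses = 0;
            true
        } else {
            series.misses += 1;
            series.misses <= max_misses
        }
    });
}
```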

Please let me know if this direction looks reasonable to you.

@lalitb
Member Author

lalitb commented May 7, 2026

> The host metrics receiver is one of the high-value components in the Collector, and it has a high degree of configurability. It has many "Scrapers" defined, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md as an overview.
>
> See an example metadata.yaml where generated instruments are defined, which determines compile-time views much like in my PR #2623 and gives operators control over attribute dimensions. As with @lquerel's comment, I don't want to block this PR. We should consider it unstable and aim to converge with the Collector's hostmetrics receiver, which is close to stable.

Thanks @jmacd, that makes sense. This PR is intended to cover the Step 1 / v1 Linux core scope from #2741.

I agree we should keep evolving this incrementally and continue converging with the Collector hostmetrics receiver where it makes sense. The Collector-style metadata.yaml / generated-instrument model and operator-controlled attribute dimensions seem like good follow-up work, since that is broader than the v1 scope and probably needs its own design pass.

We can align the long-term config and emitted metric surface there without blocking this first native integration.

@lalitb
Member Author

lalitb commented May 7, 2026

> I think it will help (maybe now) to replace "Family" with "Scraper" as a nod to Collector's terminology.

Great point, @jmacd. That makes sense as a convergence point. We kept "family" in this PR because it follows the v1 scheduler/config terminology from #2741, and renaming it now would add churn without changing behavior.

"Scraper" is closer to the Collector terminology, so could we consider that rename as part of the follow-up alignment work around the Collector hostmetrics model?

@lquerel
Contributor

lquerel commented May 7, 2026

> The host metrics receiver is one of the high-value components in the Collector, and it has a high degree of configurability. It has many "Scrapers" defined, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md as an overview.
>
> See an example metadata.yaml where generated instruments are defined, which determines compile-time views much like in my PR #2623 and gives operators control over attribute dimensions. As with @lquerel's comment, I don't want to block this PR. We should consider it unstable and aim to converge with the Collector's hostmetrics receiver, which is close to stable.

@jmacd, we tried to take into account the extensive feedback that was given on the Go version when designing this new host metrics receiver.

For example, instead of having many largely independent scrapers repeatedly traversing procfs / sysfs, we have a single scraper that inspects procfs / sysfs in one pass. That is significantly more efficient. This is why the concept of a "family" was introduced: 1 scraper -> n families.

We also chose to support the host-related semantic conventions directly, and more generally I think we are trying to adopt the semantic conventions + Weaver ecosystem as natively as possible. So I am not sure that the metadata.yaml direction is the one we want to pursue.

@lquerel
Contributor

lquerel commented May 7, 2026

> I think it will help (maybe now) to replace "Family" with "Scraper" as a nod to Collector's terminology.

I disagree with this one. See my previous comment.

Contributor

@lquerel lquerel left a comment


I am aligned with this first version. The proposal I made in that comment can absolutely be explored in a future PR.

@lquerel lquerel enabled auto-merge May 7, 2026 21:49
@lquerel lquerel added this pull request to the merge queue May 7, 2026
Merged via the queue into open-telemetry:main with commit 3c03f86 May 7, 2026
85 checks passed

Labels

ci-repo (Repository maintenance, build, GH workflows, repo cleanup, or other chores), rust (Pull requests that update Rust code)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Implement Linux v1 OTAP-native host_metrics receiver

5 participants