Add Linux host metrics receiver for OTAP Dataflow #2840

lquerel merged 70 commits into open-telemetry:main from
Conversation
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//! Direct OTAP Arrow record construction for host metrics.
@albertlockett or @JakeDern could you take a look at this part of the PR. Thanks
}

/// Collects one host snapshot for the due family set.
pub fn scrape_due(&mut self, due: ProcfsFamilies) -> io::Result<HostScrape> {
What worries me a little is that this method is not async. It may not be a problem in this specific case, but we should make sure that:
- it cannot block, and
- the time spent in this method is not too long or dependent on the system configuration.
I looked into this with the df-engine threading model in mind. I kept the procfs/sysfs reads synchronous because they are short kernel virtual-file reads served from in-memory kernel state. Routing them through tokio::fs would mostly hand the same reads to Tokio's global blocking pool, adding overhead without giving us a stronger bound, and weakening core locality.
The part that can really block is statvfs on remote/userspace mounts. I tightened that path in this PR: statvfs is isolated behind a dedicated bounded worker thread using sync_channel(1), each mount has a timeout, the filesystem scrape has a total budget, and remote/FUSE filesystems are skipped by default unless explicitly opted in via include_remote_filesystems.
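For concreteness, a minimal sketch of that isolation; `StatvfsWorker`, `stat_mount`, and the sequence-number pairing are hypothetical stand-ins, not the actual code in this PR:

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;
use std::time::{Duration, Instant};

/// Sketch: statvfs requests flow through a sync_channel(1) to a dedicated
/// worker thread, and a per-mount timeout bounds how long the scrape waits.
struct StatvfsWorker {
    seq: u64,
    req_tx: SyncSender<(u64, PathBuf)>,
    resp_rx: Receiver<(u64, std::io::Result<u64>)>,
}

impl StatvfsWorker {
    fn spawn() -> Self {
        let (req_tx, req_rx) = sync_channel::<(u64, PathBuf)>(1);
        let (resp_tx, resp_rx) = sync_channel::<(u64, std::io::Result<u64>)>(1);
        thread::spawn(move || {
            for (seq, path) in req_rx {
                // stat_mount stands in for the real statvfs call, which can
                // hang indefinitely on a dead remote/FUSE mount.
                let _ = resp_tx.send((seq, stat_mount(&path)));
            }
        });
        Self { seq: 0, req_tx, resp_rx }
    }

    /// Returns None when the mount does not answer within `timeout`; the
    /// worker may still be stuck on it, but the scrape moves on.
    fn query(&mut self, path: PathBuf, timeout: Duration) -> Option<std::io::Result<u64>> {
        self.seq += 1;
        let seq = self.seq;
        // Skip this mount entirely if the worker is still busy with an old call.
        self.req_tx.try_send((seq, path)).ok()?;
        let deadline = Instant::now() + timeout;
        loop {
            let remaining = deadline.checked_duration_since(Instant::now())?;
            match self.resp_rx.recv_timeout(remaining) {
                Ok((s, res)) if s == seq => return Some(res),
                Ok(_) => continue, // stale answer from a previously timed-out call
                Err(_) => return None,
            }
        }
    }
}

fn stat_mount(_path: &Path) -> std::io::Result<u64> {
    Ok(0) // placeholder for the actual statvfs(3) call
}
```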
Does this direction look reasonable to you?
Discussed this further with @utpilla as well, and added cooperative yield points around the heavier scrape phases, so a large scrape does not run as one uninterrupted task on the current-thread runtime. So the direction is: keep short procfs/sysfs reads synchronous, bound the risky statvfs path, and yield between larger scrape phases.
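In shape, the yield points look roughly like this (phase names are hypothetical stand-ins; assumes a Tokio current-thread runtime):

```rust
use std::io;

struct Scraper;

impl Scraper {
    fn scrape_cpu(&mut self) -> io::Result<u64> { Ok(0) }         // short /proc/stat read
    fn scrape_memory(&mut self) -> io::Result<u64> { Ok(0) }      // short /proc/meminfo read
    fn scrape_filesystems(&mut self) -> io::Result<u64> { Ok(0) } // bounded statvfs path

    async fn scrape_all(&mut self) -> io::Result<(u64, u64, u64)> {
        let cpu = self.scrape_cpu()?;
        tokio::task::yield_now().await; // let other tasks run between phases
        let mem = self.scrape_memory()?;
        tokio::task::yield_now().await;
        let fs = self.scrape_filesystems()?;
        Ok((cpu, mem, fs))
    }
}
```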
Right now the synchronous scrape approach is probably completely fine for v1, especially since:
- process metrics are currently limited to aggregate host summaries instead of full per-process scraping
- remote/userspace filesystem collection is opt-in
- statvfs calls already have timeout protection
- collection is centralized into a shared scrape path instead of multiple independent scrapers repeatedly traversing procfs/sysfs
So I don’t think this PR should necessarily change direction immediately.
That said, I suspect we will eventually want to move toward a model where:
- the runtime thread remains responsible for scheduling, control, projection, and downstream flow control
- while the procfs/sysfs/statvfs traversal itself executes inside a dedicated bounded blocking subsystem
Something roughly like:
- runtime schedules scrape tick
- dedicated scraper worker builds a HostSnapshot
- runtime projects/emits OTAP metrics
The idea would be to keep the existing one-core collection model while isolating potentially long-running synchronous host inspection work behind explicit resource and latency boundaries (roughly sketched after this list):
- max 1 in-flight scrape
- bounded channels
- scrape deadlines/budgets
- overrun policies
- stronger self-observability around scrape latency/cost
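A rough sketch of that shape; the names and the choice of tokio mpsc channels are assumptions, not a committed design:

```rust
use tokio::sync::mpsc;

// Hypothetical snapshot type produced by the dedicated scraper worker.
struct HostSnapshot { /* cpu, memory, network, ... */ }

fn spawn_scraper() -> (mpsc::Sender<()>, mpsc::Receiver<HostSnapshot>) {
    // Capacity 1 on both channels => at most one in-flight scrape.
    let (tick_tx, mut tick_rx) = mpsc::channel::<()>(1);
    let (snap_tx, snap_rx) = mpsc::channel::<HostSnapshot>(1);
    std::thread::spawn(move || {
        // The blocking worker thread owns all procfs/sysfs/statvfs traversal.
        while tick_rx.blocking_recv().is_some() {
            let snapshot = HostSnapshot {}; // real code would scrape here, under a deadline
            if snap_tx.blocking_send(snapshot).is_err() {
                break; // runtime side went away
            }
        }
    });
    (tick_tx, snap_rx)
}

// Runtime side: schedule ticks; try_send is the overrun policy, dropping a
// tick while a scrape is still in flight.
async fn on_tick(tick_tx: &mpsc::Sender<()>) {
    if tick_tx.try_send(()).is_err() {
        // previous scrape still running: skip this interval, count the overrun
    }
}
```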
lquerel left a comment
It is really great to see this critical receiver getting close to being integrated natively into our engine.
My 3 main pieces of feedback are:
- scraping procfs, which has the potential to block the async runtime for too long if there are many network interfaces, disks, CPUs, and so on (one slow or hung mount could potentially leave that thread blocked indefinitely).
- observe_key inserts and updates per-series counter state, but nothing ever removes entries for devices or interfaces that disappear. On long-running nodes with churny veth, loop, or ephemeral block devices, state grows monotonically, which violates the bounded-resource requirement and gradually increases hash/allocation cost on every scrape (see the sketch after this list).
- the size of the files, which makes maintainability problematic.
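One common fix for the second point is generation-based eviction. A minimal sketch; observe_key is the method named above, but the field names and String key are assumptions:

```rust
use std::collections::HashMap;

/// Per-series counter state stamped with the scrape generation that last saw it.
struct SeriesState {
    last_value: u64,
    gen: u64,
}

#[derive(Default)]
struct CounterTracker {
    gen: u64,
    series: HashMap<String, SeriesState>,
}

impl CounterTracker {
    fn begin_scrape(&mut self) {
        self.gen += 1;
    }

    /// Records a raw counter sample and returns the delta since the previous
    /// scrape (0 for a series seen for the first time).
    fn observe_key(&mut self, key: &str, value: u64) -> u64 {
        let gen = self.gen;
        let state = self
            .series
            .entry(key.to_owned())
            .or_insert(SeriesState { last_value: value, gen });
        let delta = value.wrapping_sub(state.last_value);
        state.last_value = value;
        state.gen = gen;
        delta
    }

    /// Drops state for devices/interfaces that did not show up in this scrape,
    /// so churny veth/loop/ephemeral block devices cannot grow the map forever.
    fn end_scrape(&mut self) {
        let gen = self.gen;
        self.series.retain(|_, s| s.gen == gen);
    }
}
```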
jmacd left a comment
The host metrics receiver is one of the high-value components in the Collector, and it has a high degree of configurability. It has many "Scrapers" defined, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/scraping-receivers.md as an overview.
See an example metadata.yaml where generated instruments are defined, which determines compile-time views much like in my PR #2623 and gives operators control over attribute dimensions. Like @lquerel's comment, I don't want to block this PR. We should consider it unstable and aim to converge with the Collector's hostmetrics receiver, which is close to stable.
I think it will help (maybe now) to replace "Family" with "Scraper" as a nod to the Collector's terminology.
Thanks Laurent, this feedback was very helpful. I addressed these in the latest commits.
Please let me know if this direction looks reasonable to you.
Thanks @jmacd, that makes sense. This PR is intended to cover the Step 1 / v1 Linux core scope from #2741. I agree we should keep evolving this incrementally and continue converging with the Collector hostmetrics receiver where it makes sense. We can align the long-term config and emitted metric surface there without blocking this first native integration.
Great point @jmacd - That makes sense as a convergence point. We kept
@jmacd, we tried to take into account the extensive feedback that was given on the Go version when designing this new host metrics receiver. For example, instead of having many largely independent scrapers repeatedly traversing procfs/sysfs, collection is centralized into a shared scrape path. We also chose to support the host-related semantic conventions directly, and more generally I think we are trying to adopt the semantic conventions + Weaver ecosystem as natively as possible. So I am not sure that the metadata.yaml approach is the right fit here.
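Concretely, the shared scrape path has roughly this shape (all type and function names below are hypothetical stand-ins):

```rust
use std::io;

// Stand-ins for the real snapshot types.
#[derive(Default)]
struct CpuStats { user_ticks: u64, idle_ticks: u64 }
#[derive(Default)]
struct MemStats { total_kb: u64, available_kb: u64 }

/// One traversal of procfs fills a single snapshot...
#[derive(Default)]
struct HostScrape { cpu: CpuStats, memory: MemStats }

fn scrape_once() -> io::Result<HostScrape> {
    // The real receiver parses /proc/stat, /proc/meminfo, etc. exactly once here.
    Ok(HostScrape::default())
}

/// ...and every metric family is a pure projection over that snapshot,
/// instead of each family re-reading /proc on its own schedule.
fn emit_families(scrape: &HostScrape) {
    emit_cpu_family(&scrape.cpu);       // builds OTAP rows for system.cpu.*
    emit_memory_family(&scrape.memory); // builds OTAP rows for system.memory.*
}

fn emit_cpu_family(_cpu: &CpuStats) {}
fn emit_memory_family(_mem: &MemStats) {}
```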
I disagree with this one. See my previous comment.
Change Summary
Adds a Linux host metrics receiver for OTAP Dataflow that collects host-level system.* metrics from procfs/sysfs and emits OTAP Arrow metrics directly.

Highlights:
- Semantic conventions pinned at v1.41.0.
- Network interface counters are read from /proc/1/net/dev (see the sketch after this list).
- system.process.count is limited to registered process.state values; the Linux procs_blocked count is intentionally not emitted until semconv defines a matching state.
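For illustration, a minimal sketch of such a read; the function name is hypothetical, and the field positions follow the standard two-header-line /proc/net/dev layout:

```rust
use std::fs;
use std::io;

/// Returns (interface, rx_bytes, tx_bytes) per interface from PID 1's netns.
fn read_net_dev() -> io::Result<Vec<(String, u64, u64)>> {
    let text = fs::read_to_string("/proc/1/net/dev")?;
    let mut out = Vec::new();
    for line in text.lines().skip(2) {
        // Lines look like "  eth0: 12345 67 0 0 ... 890 12 0 ..."
        let Some((iface, counters)) = line.split_once(':') else { continue };
        let fields: Vec<&str> = counters.split_whitespace().collect();
        // rx bytes is the 1st counter field, tx bytes the 9th.
        if let (Some(rx), Some(tx)) = (fields.first(), fields.get(8)) {
            if let (Ok(rx), Ok(tx)) = (rx.parse::<u64>(), tx.parse::<u64>()) {
                out.push((iface.trim().to_string(), rx, tx));
            }
        }
    }
    Ok(out)
}
```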
Notes:
- load is intentionally not emitted in this PR because current OpenTelemetry system semantic conventions do not define a stable system load metric.
- MetricSet support: per-family / error-class labelled internal telemetry is a follow-up because the internal telemetry API does not currently support attributes on individual metric observations.

What issue does this PR close?
How are these changes tested?
v1.41.0

Are there any user-facing changes?
Yes, a receiver.