Merged
70 commits
f74ff58
Add host metrics receiver
lalitb May 3, 2026
1b89403
Add host metrics family scheduler
lalitb May 3, 2026
8f99645
Add host view validation modes
lalitb May 3, 2026
6985bbe
Add host metrics receiver telemetry
lalitb May 3, 2026
204a41d
Add host architecture resource attribute
lalitb May 3, 2026
a709c89
Reject host metrics on unsupported platforms
lalitb May 3, 2026
ad0c973
Add host metrics semantic shape test
lalitb May 3, 2026
1b943c1
Handle CPU per-core config option
lalitb May 3, 2026
cc0d813
Test host metrics duplicate lease
lalitb May 3, 2026
502e1a2
Allow partial host metrics scrapes
lalitb May 3, 2026
5abf68e
Track host metrics source read errors
lalitb May 3, 2026
f2a06b8
Derive host metrics process mode default
lalitb May 3, 2026
fc6f3f2
Use system clock ticks for CPU metrics
lalitb May 3, 2026
26e4d9b
Add opt-in CPU utilization metric
lalitb May 3, 2026
38e0ee6
Project host updown metrics as sums
lalitb May 3, 2026
6d8801c
Add opt-in disk limit metric
lalitb May 3, 2026
8e747e4
Add host filesystem metrics
lalitb May 3, 2026
494fdaa
Add filesystem filters
lalitb May 3, 2026
8aeb3a8
Add memory opt-in metrics
lalitb May 3, 2026
3aa1428
Fix host metrics config test initializer
lalitb May 3, 2026
ddef929
Complete host metrics receiver phase 1
lalitb May 3, 2026
6602933
Add host memory hugepage metrics
lalitb May 3, 2026
77aa4da
Align host metric attributes with semconv
lalitb May 3, 2026
e715992
Add host metrics semconv drift check
lalitb May 3, 2026
25d09b9
Run host metrics semconv check in CI
lalitb May 3, 2026
4d0063d
Fix host metrics semconv conformance gaps
lalitb May 3, 2026
2b20d2f
Harden host metrics review findings
lalitb May 3, 2026
4c14e78
Fix host metrics review regressions
lalitb May 3, 2026
05ee33e
replace nix sysconf with libc in host metrics receiver
lalitb May 4, 2026
8657a6a
add direct OTAP Arrow builder for host metrics
lalitb May 4, 2026
2b6deae
replace proto encoding path with direct OTAP Arrow construction
lalitb May 4, 2026
9bf73f4
remove proto intermediate path and dead push_* helpers
lalitb May 4, 2026
f199def
fix clippy: allow dead_code on SEMCONV_VERSION constant
lalitb May 4, 2026
84a26b6
append flags=0 on every number datapoint
lalitb May 4, 2026
2550105
add compile-time assertion that schema URL matches semconv version
lalitb May 4, 2026
d4931c4
gate procfs and scheduler on linux, keep platform check for non-linux
lalitb May 4, 2026
3633343
implement CollectTelemetry handler to emit receiver metrics snapshot
lalitb May 4, 2026
2a86b04
propagate OTAP set errors instead of panicking in finish_batch; repla…
lalitb May 4, 2026
bea7421
cargo fmt
lalitb May 4, 2026
993fc3d
Centralize host metrics semconv constants
lalitb May 4, 2026
1e2f34e
Fix host metrics review issues
lalitb May 5, 2026
d403791
Harden host metrics scrape behavior
lalitb May 5, 2026
dbf01f2
Align host process state with semconv
lalitb May 5, 2026
6c12c98
Document host semconv constants ownership
lalitb May 5, 2026
4d86475
Harden host metrics validation and start times
lalitb May 5, 2026
bc47e78
Tighten host metrics OTAP builder API
lalitb May 5, 2026
759bda3
Fix paging operation counter starts
lalitb May 5, 2026
230cbe6
Reduce host metrics projection rows
lalitb May 5, 2026
a2f3922
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 5, 2026
c00ef86
Merge host metric shapes in semconv check
lalitb May 5, 2026
7480490
Group host metric datapoints by parent
lalitb May 5, 2026
055d8dd
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 5, 2026
c9d8b2c
Align host metrics docs with implementation
lalitb May 5, 2026
4a734c0
Fix host metrics partial scrape test
lalitb May 5, 2026
a8a36ec
Fix host metrics netdev test fixture
lalitb May 5, 2026
7091f44
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 5, 2026
1e8e422
Accept Unix host roots on Windows tests
lalitb May 6, 2026
b9ad3b7
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 6, 2026
dce01e4
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 6, 2026
131b7d6
Address host metrics review feedback
lalitb May 7, 2026
0b4821e
Split remote filesystem filtering
lalitb May 7, 2026
a8e9862
Fix host metrics dev-tools imports
lalitb May 7, 2026
6eff210
Skip more virtual filesystems
lalitb May 7, 2026
6626c5f
Stabilize host metrics start time fallback
lalitb May 7, 2026
2b1bbba
Add cooperative host metrics scrape yields
lalitb May 7, 2026
41efd71
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 7, 2026
69eb42c
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 7, 2026
6d0fb5b
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 7, 2026
27fc046
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lalitb May 7, 2026
38df871
Merge branch 'main' into lalitb/host-metrics-receiver-complete
lquerel May 7, 2026
31 changes: 31 additions & 0 deletions .github/workflows/rust-ci.yml
@@ -656,6 +656,32 @@ jobs:
          cargo clippy --all-targets --all-features --workspace -- -D warnings
        working-directory: ./rust/${{ matrix.folder }}

  host-metrics-semconv:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          submodules: true
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          repository: open-telemetry/semantic-conventions
          ref: v1.41.0
          path: semantic-conventions
      - uses: dtolnay/rust-toolchain@3c5f7ea28cd621ae0bf5283f0e981fb97b8a7af9
        with:
          toolchain: stable
      - uses: Swatinem/rust-cache@c19371144df3bb44fab255c43d04cbc2ab54d1c4 # v2.9.1
        with:
          workspaces: ./rust/otap-dataflow
      - name: Run host metrics semconv drift check
        env:
          OTAP_HOST_METRICS_SEMCONV_REGISTRY: ${{ github.workspace }}/semantic-conventions/model
        run: |
          cargo test -p otap-df-core-nodes \
            --features dev-tools,otap-df-otap/crypto-ring \
            emitted_phase1_metric_shapes_match_weaver_semconv --lib -- --ignored
        working-directory: ./rust/otap-dataflow

  # Required matrix combinations for deny: otap-dataflow only
  deny_required:
    runs-on: ubuntu-latest
@@ -804,6 +830,7 @@ jobs:
      - pest-fmt
      - no_default_features_check
      - pipeline_perf_test
      - host-metrics-semconv
    steps:
      - name: Check if all required jobs succeeded
        run: |
@@ -843,4 +870,8 @@ jobs:
            echo "pipeline_perf_test failed or was cancelled"
            exit 1
          fi
          if [[ "${{ needs.host-metrics-semconv.result }}" != "success" ]]; then
            echo "host-metrics-semconv failed or was cancelled"
            exit 1
          fi
          echo "All required checks passed!"
1 change: 1 addition & 0 deletions rust/otap-dataflow/Cargo.toml
@@ -122,6 +122,7 @@ tikv-jemalloc-sys = "0.6.1"
memchr = "2.8.0"
memmap2 = "0.9"
memory-stats = "1"
libc = "0.2"
nix = { version = "0.31.0", features = ["process", "signal", "fs", "mman"] }
notify = "8.0" # Uses platform-native backend: inotify (Linux), kqueue (macOS), ReadDirectoryChanges (Windows)
num_enum = "0.7"
5 changes: 5 additions & 0 deletions rust/otap-dataflow/crates/core-nodes/Cargo.toml
@@ -42,6 +42,7 @@ object_store = {workspace = true, features = ["fs"]}
parquet.workspace = true
prost.workspace = true
rand.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
slotmap.workspace = true
@@ -65,6 +66,10 @@ weaver_resolved_schema = { workspace = true, optional = true }
weaver_resolver = { workspace = true, optional = true }
weaver_semconv = { workspace = true, optional = true }

[target.'cfg(target_os = "linux")'.dependencies]
libc.workspace = true
nix.workspace = true

[features]
dev-tools = ["dep:weaver_common", "dep:weaver_forge", "dep:weaver_resolved_schema", "dep:weaver_resolver", "dep:weaver_semconv"]
bench = []
@@ -0,0 +1,181 @@
# Host Metrics Receiver

<!-- markdownlint-disable MD013 -->

**URN:** `urn:otel:receiver:host_metrics`

Linux host metrics receiver backed by procfs and sysfs. It emits OpenTelemetry
`system.*` metrics for CPU, memory, paging, system uptime, disk, filesystem,
network, and aggregate process counts.

## Configuration

Minimal configuration:

```yaml
groups:
  host:
    pipelines:
      collect:
        policies:
          resources:
            core_allocation:
              type: core_count
              count: 1
        nodes:
          host_metrics:
            type: receiver:host_metrics
            config:
              collection_interval: 10s
          publish:
            type: exporter:topic
            config:
              topic: host_metrics
        connections:
          - from: host_metrics
            to: publish
```

Collect from a host root mounted into a container:

```yaml
groups:
  host:
    pipelines:
      collect:
        policies:
          resources:
            core_allocation:
              type: core_count
              count: 1
        nodes:
          host_metrics:
            type: receiver:host_metrics
            config:
              collection_interval: 10s
              host_view:
                root_path: /host
                validation: fail_selected
          publish:
            type: exporter:topic
            config:
              topic: host_metrics
        connections:
          - from: host_metrics
            to: publish
```
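
The effect of `host_view.root_path` can be pictured as re-rooting every procfs and sysfs path under the mounted host root. The following is a minimal sketch, not the receiver's actual code; the function name `resolve_under_root` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves an absolute host path such as `/proc/stat`
/// under a configured host root such as `/host`.
///
/// `Path::join` discards the base when its argument is absolute, so the
/// leading `/` is stripped before joining.
fn resolve_under_root(root: &Path, host_path: &str) -> PathBuf {
    let relative = host_path.trim_start_matches('/');
    root.join(relative)
}
```

With `root_path: /host`, a read of `/proc/stat` would go to `/host/proc/stat`; with the default root `/`, the path is unchanged.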

Enable selected opt-in metrics:

```yaml
groups:
  host:
    pipelines:
      collect:
        policies:
          resources:
            core_allocation:
              type: core_count
              count: 1
        nodes:
          host_metrics:
            type: receiver:host_metrics
            config:
              families:
                cpu:
                  utilization: true
                memory:
                  limit: true
                  hugepages: true
                disk:
                  limit: true
                filesystem:
                  limit: true
          publish:
            type: exporter:topic
            config:
              topic: host_metrics
        connections:
          - from: host_metrics
            to: publish
```
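
To illustrate what a derived CPU utilization gauge could look like, the sketch below computes per-state fractions from two cumulative tick samples. It rests on assumptions: only three CPU states are modeled, and the names `CpuTicks` and `utilization` are hypothetical rather than taken from the receiver.

```rust
/// Cumulative CPU time per state, in clock ticks, as found on the aggregate
/// `cpu` line of `/proc/stat` (hypothetical type, three states only).
#[derive(Clone, Copy)]
struct CpuTicks {
    user: u64,
    system: u64,
    idle: u64,
}

impl CpuTicks {
    fn total(&self) -> u64 {
        self.user + self.system + self.idle
    }
}

/// Returns the fraction of elapsed ticks spent in each state between two
/// samples, or `None` when no ticks elapsed or the counters went backwards.
fn utilization(prev: CpuTicks, curr: CpuTicks) -> Option<[(&'static str, f64); 3]> {
    let elapsed = curr.total().checked_sub(prev.total())?;
    if elapsed == 0 {
        return None;
    }
    // Saturate so a single counter reset cannot produce a negative share.
    let share = |c: u64, p: u64| c.saturating_sub(p) as f64 / elapsed as f64;
    Some([
        ("user", share(curr.user, prev.user)),
        ("system", share(curr.system, prev.system)),
        ("idle", share(curr.idle, prev.idle)),
    ])
}
```

Because the result is a ratio of tick deltas, the tick rate cancels out; a gauge like this needs two scrapes before it can report a value.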

## Configuration Options

| Field | Type | Default | Description |
| ----- | ---- | ------- | ----------- |
| `collection_interval` | duration | `10s` | Default scrape interval. |
| `initial_delay` | duration | `0s` | Delay before the first scrape. |
| `host_view.root_path` | path | `/` | Host filesystem root to read procfs/sysfs from. |
| `host_view.validation` | enum | `fail_selected` | One of `fail_selected`, `warn_selected`, or `none`. |
| `families.<name>.enabled` | bool | `true` | Enables or disables a metric family. |
| `families.<name>.interval` | duration | unset | Per-family interval; falls back to `collection_interval`. |
| `families.cpu.utilization` | bool | `false` | Emits derived CPU utilization gauges. |
| `families.memory.limit` | bool | `false` | Emits `system.memory.limit`. |
| `families.memory.shared` | bool | `false` | Emits Linux shared memory. |
| `families.memory.hugepages` | bool | `false` | Emits Linux hugepage metrics. |
| `families.disk.limit` | bool | `false` | Emits disk capacity from sysfs. |
| `families.filesystem.limit` | bool | `false` | Emits filesystem capacity. |
| `families.filesystem.include_virtual_filesystems` | bool | `false` | Includes virtual filesystems such as tmpfs. |
| `families.filesystem.include_remote_filesystems` | bool | `false` | Includes remote and userspace filesystems such as NFS, CIFS, 9p, and FUSE. |

Families are `cpu`, `memory`, `paging`, `system`, `disk`, `filesystem`,
`network`, and `processes`.
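
The interaction between `families.<name>.enabled`, `families.<name>.interval`, and `collection_interval` in the table above can be sketched as follows. The `FamilyConfig` struct and `effective_interval` function are hypothetical and only mirror the documented fallback rule.

```rust
use std::time::Duration;

/// Hypothetical per-family settings mirroring `families.<name>` in the table.
struct FamilyConfig {
    enabled: bool,
    interval: Option<Duration>,
}

/// Returns the scrape interval for a family, or `None` when it is disabled.
/// An unset per-family interval falls back to `collection_interval`.
fn effective_interval(family: &FamilyConfig, collection_interval: Duration) -> Option<Duration> {
    if !family.enabled {
        return None;
    }
    Some(family.interval.unwrap_or(collection_interval))
}
```

For example, with `collection_interval: 10s`, a family that sets `interval: 60s` is scraped every 60 seconds while every other enabled family keeps the 10 second default.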

Host-wide collection must run in a source pipeline that is allocated exactly
one core (`core_allocation` of type `core_count` with `count: 1`). Use a topic
exporter to fan out to multicore downstream processing when needed.

## Filters

Disk, filesystem, and network families support include and exclude filters.
Filter `match_type` values are `strict`, `glob`, and `regexp`.

```yaml
groups:
  host:
    pipelines:
      collect:
        policies:
          resources:
            core_allocation:
              type: core_count
              count: 1
        nodes:
          host_metrics:
            type: receiver:host_metrics
            config:
              families:
                disk:
                  exclude:
                    match_type: glob
                    devices: ["loop*", "ram*"]
                network:
                  exclude:
                    match_type: strict
                    interfaces: ["lo"]
                filesystem:
                  exclude_fs_types:
                    match_type: strict
                    fs_types: ["tmpfs", "proc", "sysfs"]
          publish:
            type: exporter:topic
            config:
              topic: host_metrics
        connections:
          - from: host_metrics
            to: publish
```
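
As a rough illustration of how an exclude filter might be evaluated, the sketch below handles the `strict` and `glob` match types using only the standard library. It makes several assumptions: the glob dialect supports only `*`, `regexp` is left out because it would need the `regex` crate, and the names `MatchType`, `glob_match`, and `is_excluded` are hypothetical.

```rust
/// Hypothetical subset of the filter match types (`regexp` omitted).
enum MatchType {
    Strict,
    Glob,
}

/// Matches `text` against `pattern`, where `*` matches any run of characters.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Pattern index of the most recent `*` and the text index it was tried at.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Let the last `*` absorb one more character, then retry.
            backtrack = Some((star_pi, star_ti + 1));
            pi = star_pi + 1;
            ti = star_ti + 1;
        } else {
            return false;
        }
    }
    // Any trailing `*` in the pattern can match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

/// Returns true when `name` matches any of the exclude patterns.
fn is_excluded(match_type: &MatchType, patterns: &[&str], name: &str) -> bool {
    patterns.iter().any(|pattern| match match_type {
        MatchType::Strict => *pattern == name,
        MatchType::Glob => glob_match(pattern, name),
    })
}
```

Under these assumptions, the disk filter in the example above would drop `loop0` and `ram1` but keep `sda`, and the strict network filter would drop `lo` but keep `lo0`.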

## Current Limits

- Linux only.
- Load metrics are not emitted in v1 because Semantic Conventions 1.41.0 does
  not register a system load metric.
- `families.cpu.per_cpu` is rejected in v1.
- `families.network.include_connection_count` is rejected in v1.
- Process metrics are aggregate host summaries, not per-process scrapes.
- `system.process.count` emits the registered `process.state=running` summary.
  Linux `procs_blocked` is parsed but not emitted because `blocked` is not a
  registered `process.state` value.
- Filesystem collection can time out individual `statvfs` calls; avoid enabling
  remote filesystems unless the host environment is known to be healthy.
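
The `system.process.count` limit above can be sketched as follows: both `procs_running` and `procs_blocked` are read from `/proc/stat` content, but only the running count becomes a datapoint. This is an illustrative sketch; `ProcStatSummary`, `parse_proc_stat_summary`, and `emitted_process_counts` are hypothetical names, not the receiver's API.

```rust
/// Hypothetical summary of the process counters found in `/proc/stat`.
#[derive(Debug, Default, PartialEq)]
struct ProcStatSummary {
    running: Option<u64>,
    blocked: Option<u64>,
}

/// Parses the `procs_running` and `procs_blocked` lines, ignoring all others.
fn parse_proc_stat_summary(contents: &str) -> ProcStatSummary {
    let mut summary = ProcStatSummary::default();
    for line in contents.lines() {
        let mut fields = line.split_whitespace();
        match (fields.next(), fields.next()) {
            (Some("procs_running"), Some(value)) => summary.running = value.parse().ok(),
            (Some("procs_blocked"), Some(value)) => summary.blocked = value.parse().ok(),
            _ => {}
        }
    }
    summary
}

/// Returns the `(process.state, count)` rows to emit. The blocked count is
/// parsed but intentionally dropped because `blocked` is not a registered
/// `process.state` value.
fn emitted_process_counts(summary: &ProcStatSummary) -> Vec<(&'static str, u64)> {
    summary.running.map(|n| ("running", n)).into_iter().collect()
}
```

Keeping the parsed `blocked` value in the summary would make it straightforward to emit later if a matching `process.state` value is registered.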