Call metrics API for units related stats#158
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for collecting systemd unit-related statistics via the new varlink metrics API (systemd v260+), alongside a refactor that centralizes unit state constants.
Changes:
- Introduces a varlink metrics client (
src/varlink/metrics.rs) and a new varlink-based unit stats collector (src/varlink_units.rs). - Refactors unit state enums /
is_unit_unhealthyintosrc/unit_constants.rsand re-exports them fromsrc/units.rs. - Adds a
[varlink]config section and switches collectors to choose varlink vs D-Bus based on config.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
src/varlink_units.rs |
New varlink-based unit stats collection and parsing logic. |
src/varlink/mod.rs |
Adds varlink module namespace. |
src/varlink/metrics.rs |
Varlink proxy + output types for Metrics interface. |
src/units.rs |
Removes duplicated enums; re-exports from unit_constants. |
src/unit_constants.rs |
New shared enums + is_unit_unhealthy implementation. |
src/machines.rs |
Switches container unit stats collection to varlink when enabled. |
src/lib.rs |
Switches host unit stats collection to varlink when enabled; exposes modules. |
src/config.rs |
Adds VarlinkConfig and embeds into main config struct. |
monitord.conf |
Documents [varlink] enabled = true. |
Cargo.toml |
Adds futures-util and zlink-related dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Looks a good start.
- Let's add a section on varlink usage and flow in README.md please
- Let's remove all
.unwrap()s and see if we can throw an Error OR at least move to.expect("Helpful message")for now - Consider other suggestions.
Lots of other comments + let's see what copilot says.
| anyhow = "1.0" | ||
| clap = { version = "4.5", features = ["derive", "std", "help"], default-features = false } | ||
| configparser = { version = "3.1.0", features = ["indexmap"] } | ||
| futures-util = "0.3" |
There was a problem hiding this comment.
I have been trying to avoid bringing in futures, but at least here we only bring in utils ...
|
Would love @ikruglov take a look if you have time please ... |
|
Also, @YapingLi04 - If you don't have time to do the follow ups say so, and I can merge and chip away at them |
|
@cooperlees thanks for the review! I addressed all of your comments. Please let know if anything else is needed. |
cooperlees
left a comment
There was a problem hiding this comment.
Awesome. Thanks for the sweet docs too!
Just so you know this will auto sync to https://monitord.xyz/ when we release.
| let object_name = metric.object_name(); | ||
|
|
||
| match metric_name_suffix { | ||
| "unit_active_state" => { |
There was a problem hiding this comment.
@YapingLi04, I recall you rename these metrics to make them camelCase in systemd
There was a problem hiding this comment.
oops, seems I'm a bit late :-)
There was a problem hiding this comment.
We can forward fix things if you find stuff - It's not released to crates.io etc. - Only merged to main
| if config.varlink.enabled { | ||
| join_set.spawn(crate::varlink_units::update_unit_stats( | ||
| Arc::clone(&config), | ||
| sdc.clone(), |
There was a problem hiding this comment.
does varlink implementation needs dbus?
| match get_unit_stats(&config, &socket_path).await { | ||
| Ok(units_stats) => { | ||
| let mut machine_stats = locked_machine_stats.write().await; | ||
| machine_stats.units = units_stats; | ||
| } | ||
| Err(err) => { | ||
| warn!( | ||
| "Varlink units stats failed, falling back to D-Bus: {:?}", | ||
| err | ||
| ); | ||
| crate::units::update_unit_stats(Arc::clone(&config), connection, locked_machine_stats) | ||
| .await?; | ||
| } | ||
| } | ||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
I'd personally prefer the fallback happen in an outer layer to drop any dependency of varlink on dbus. Not sure if it's possible in rust. Not an expert here.
| if should_skip_unit(&object_name, config) { | ||
| return Ok(()); | ||
| } | ||
| let value = metric.value_as_string("" /* default_value */); |
There was a problem hiding this comment.
@YapingLi04, why value has a default_value? I recall that value is a mandatory field in systemd. Plus, any of its value types (int, bool, enum, etc) can be converted to string. So, I'd prefer to fail and go to next metric in case of a error rather then silently ignore the failure.
| } | ||
|
|
||
| /// Returns the int value as u64 or default_value if not present | ||
| pub fn value_as_int(&self, default_value: u64) -> u64 { |
There was a problem hiding this comment.
negative number are completely legit value. Please don't mix gauge vs counter here. Counter is always positive, but gauge can be negative.
There was a problem hiding this comment.
also, I'm not sure I'm happy with silenty ignoring the "wrong" value and converting them into default. I think we should warn/error about it and move forward. But not silently produce wrong value.
| .service_stats | ||
| .entry(object_name.to_string()) | ||
| .or_default() | ||
| .nrestarts = metric.value_as_int(0 /* default_value */) as u32; |
There was a problem hiding this comment.
here in all other similar case. I'm not happy about "default_value". We should warn/error and move forward. But not silently produce 0.
This implements support for the new metrics API in monitord.
The metrics API will be available in systemd v260. The zlink support for ANY type is in zlink release 0.4.0.