Skip to content

[metrics] Add basic metrics List method#1

Open
YapingLi04 wants to merge 6 commits intomainfrom
metrics
Open

[metrics] Add basic metrics List method#1
YapingLi04 wants to merge 6 commits intomainfrom
metrics

Conversation

@YapingLi04
Copy link
Owner

@YapingLi04 YapingLi04 commented Jul 15, 2025

see title
See full design

SD_VARLINK_FIELD_COMMENT("Value of the metric"),
SD_VARLINK_DEFINE_OUTPUT(value, SD_VARLINK_INT, 0),
SD_VARLINK_FIELD_COMMENT("fields"),
SD_VARLINK_DEFINE_OUTPUT(fields, SD_VARLINK_STRUCT, SD_VARLINK_NULLABLE));

This comment was marked as outdated.

This comment was marked as outdated.

@YapingLi04 YapingLi04 force-pushed the metrics branch 3 times, most recently from 4ff630e to 06c0758 Compare July 16, 2025 18:30
if (r < 0)
return r;

(void) mkdir_label("/run/systemd/metrics", 0755);

This comment was marked as outdated.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I searched code base for (void) mkdir_label(. No error checking for all the use cases.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added error checking

Copy link

@ikruglov ikruglov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more nits

@YapingLi04 YapingLi04 force-pushed the metrics branch 5 times, most recently from 17530e2 to dc61d3c Compare July 22, 2025 20:48
@YapingLi04 YapingLi04 force-pushed the metrics branch 2 times, most recently from 885e53b to 7656c5c Compare July 25, 2025 08:31
@YapingLi04 YapingLi04 force-pushed the metrics branch 4 times, most recently from a89dec1 to 288c5e2 Compare July 28, 2025 10:11
@YapingLi04 YapingLi04 force-pushed the metrics branch 2 times, most recently from 0e2d745 to 091607d Compare July 30, 2025 13:46
@YapingLi04 YapingLi04 force-pushed the metrics branch 2 times, most recently from e3ac1ef to 2a46191 Compare September 16, 2025 14:49
Comment on lines 118 to 123
#define JSON_BUILD_METRIC_FIELD5(k1, v1, k2, v2, k3, v3, k4, v4, k5, v5) \
JSON_BUILD_PAIR_STRING_NON_EMPTY(k1, v1), \
JSON_BUILD_PAIR_STRING_NON_EMPTY(k2, v2), \
JSON_BUILD_PAIR_STRING_NON_EMPTY(k3, v3), \
JSON_BUILD_PAIR_STRING_NON_EMPTY(k4, v4), \
JSON_BUILD_PAIR_STRING_NON_EMPTY(k5, v5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a variadic macro #define(, ...). There are plenty of examples in the code base.
for instance,

src/udev/udev-rules.c:409:#define log_line_debug_errno(line, error, ...)   log_line_full_errno(line, LOG_DEBUG, error, __VA_ARGS__)

@YapingLi04 YapingLi04 force-pushed the metrics branch 7 times, most recently from 8676b1c to 265d614 Compare September 18, 2025 14:06
@YapingLi04 YapingLi04 force-pushed the metrics branch 2 times, most recently from c402760 to 3db1702 Compare September 21, 2025 14:58
Comment on lines 152 to 153
if (r < 0)
return r;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation + brackets

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed indentation.
are brackets needed here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you fix two-different metrics in one metric family this question will go away automatically. Same for unit_states_total_build_json() btw

assert(manager);

UnitType *t, *previous_type = NULL;
UnitType type;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't use this variable outside of the loop. Please embed its definion:

UnitType type = (UnitType) i;

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After embedding, got this error

../src/core/varlink-metrics.c:93:16: warning: dangling pointer ‘previous_type’ to ‘type’ may be used [-Wdangling-pointer=]
   93 |         return unit_types_total_build_json_one(link, manager, previous_type, more);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

if (r < 0)
return r;
} else
return sd_varlink_error(link, VARLINK_ERROR_METRICS_NO_SUCH_METRIC, NULL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this if condition even possible? I don't see a case where previous_type would be NULL.
Also, I don't think we should ever return VARLINK_ERROR_METRICS_NO_SUCH_METRIC from a callback.

vl_method_list() can return it though as result of the filtering (which is not implemented yet)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _UNIT_TYPE_MAX is somehow 0, then previous_type can be NULL. However, _UNIT_TYPE_MAX is defined to be greater than 0.

dropped the condition.

return 0;
}

static int unit_states_total_build_json(sd_varlink *link, void *userdata, bool more) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, a metric family should not return metrics with names which do no match the metric family.

Copy link
Owner Author

@YapingLi04 YapingLi04 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix the comment about separating METRIC_IO_SYSTEMD_MANAGER_UNIT_ACTIVE_STATE and METRIC_IO_SYSTEMD_MANAGER_UNIT_LOAD_STATE into different metrics family in next version.

Fixed everything else.

SD_VARLINK_FIELD_COMMENT("Metric name"),
SD_VARLINK_DEFINE_FIELD(name, SD_VARLINK_STRING, 0),
SD_VARLINK_FIELD_COMMENT("Metric description"),
SD_VARLINK_DEFINE_FIELD(value, SD_VARLINK_STRING, 0),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to description

SD_VARLINK_FIELD_COMMENT("Metric description"),
SD_VARLINK_DEFINE_FIELD(value, SD_VARLINK_STRING, 0),
SD_VARLINK_FIELD_COMMENT("Metric type"),
SD_VARLINK_DEFINE_FIELD(type, SD_VARLINK_STRING, 0));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the reference code. defined an enum type MetricType.

Comment on lines 41 to 42
SD_VARLINK_SUPPORTS_MORE,
SD_VARLINK_DEFINE_OUTPUT_BY_TYPE(metrics, Metric, 0));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. dropped Metric type.

static SD_VARLINK_DEFINE_METHOD_FULL(
Describe,
SD_VARLINK_SUPPORTS_MORE,
SD_VARLINK_DEFINE_OUTPUT_BY_TYPE(metadata, MetricMetadata, 0));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped MetricMetadata type.

Comment on lines 152 to 153
if (r < 0)
return r;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed indentation.
are brackets needed here?

&v,
METRIC_IO_SYSTEMD_MANAGER_UNIT_LOAD_STATE,
unit->id,
"load_state", unit_load_state_to_string(unit->load_state),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix METRIC_IO_SYSTEMD_MANAGER_UNIT_ACTIVE_STATE in next upload.

}

if (previous) {
r = previous->cb(link, userdata, false);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


r = METRIC_JSON_BUILD_STRING(
&v,
METRIC_IO_SYSTEMD_MANAGER_UNIT_ACTIVE_STATE,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix this in next version

@YapingLi04 YapingLi04 force-pushed the metrics branch 2 times, most recently from d81bbea to e615614 Compare September 23, 2025 05:41
YapingLi04 pushed a commit that referenced this pull request Oct 21, 2025
This test occasionally fails due to a race where systemd processes
kernel's SIGKILL before the OOM notification, so the test service dies
with Result=signal instead of the expected Result=oom-kill:

[   51.008765] TEST-55-OOMD.sh[906]: + systemd-run --wait --unit oom-kill -p OOMPolicy=kill -p Delegate=yes -p DelegateSubgroup=init.scope /tmp/script.sh
[   51.048747] TEST-55-OOMD.sh[907]: Running as unit: oom-kill.service; invocation ID: 456645347d554ea2878463404b181bd8
[   51.066296] sysrq: Manual OOM execution
[   51.066596] kworker/1:0 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=-1, oom_score_adj=0
[   51.066915] CPU: 1 UID: 0 PID: 27 Comm: kworker/1:0 Not tainted 6.17.1-arch1-1 #1 PREEMPT(full)  d2b229857b2eb4001337041f41d3c4f131433540
[   51.066919] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.17.0-2-2 04/01/2014
[   51.066921] Workqueue: events moom_callback
[   51.066928] Call Trace:
[   51.066931]  <TASK>
[   51.066936]  dump_stack_lvl+0x5d/0x80
[   51.066942]  dump_header+0x43/0x1aa
<...snip...>
[   51.087814] 47583 pages reserved
[   51.087969] 0 pages cma reserved
[   51.088208] 0 pages hwpoisoned
[   51.088519] Out of memory: Killed process 908 (sleep) total-vm:3264kB, anon-rss:256kB, file-rss:1916kB, shmem-rss:0kB, UID:0 pgtables:44kB oom_score_adj:1000
[   51.090263] TEST-55-OOMD.sh[907]:           Finished with result: signal
[   51.094416] TEST-55-OOMD.sh[907]: Main processes terminated with: code=killed, status=9/KILL
[   51.094898] TEST-55-OOMD.sh[907]:                Service runtime: 58ms
[   51.095436] TEST-55-OOMD.sh[907]:              CPU time consumed: 22ms
[   51.095854] TEST-55-OOMD.sh[907]:                    Memory peak: 1.6M (swap: 0B)
[   51.096722] TEST-55-OOMD.sh[912]: ++ systemctl show oom-kill -P Result
[   51.106549] TEST-55-OOMD.sh[879]: + assert_eq signal oom-kill
[   51.107394] TEST-55-OOMD.sh[913]: + set +ex
[   51.108256] TEST-55-OOMD.sh[913]: FAIL: expected: 'oom-kill' actual: 'signal'
[FAILED] Failed to start TEST-55-OOMD.service.

To mitigate this, let's spawn a child process and move it to the
subcgroup to get killed instead of the main process, so systemd has more
time to react to the OOM notification and terminate the service with the
expected oom-kill result.
YapingLi04 pushed a commit that referenced this pull request Jan 13, 2026
Otherwise, if the VM is unexpectedly rebooted, then `importctl --user pull-tar`
may fail as the file may already exist.
```
[  123.351751] TEST-13-NSPAWN.sh[3946]: + run0 -u testuser importctl --user pull-tar file:///var/tmp/image-tar/kurps.tar.gz nurps --verify=checksum -m
[  123.541603] TEST-13-NSPAWN.sh[4311]: Enqueued transfer job 3. Press C-c to continue download in background.
[  123.552456] TEST-13-NSPAWN.sh[4311]: Pulling 'file:///var/tmp/image-tar/kurps.tar.gz', saving as 'nurps'.
[  123.552788] TEST-13-NSPAWN.sh[4311]: Operating on image directory '/home/testuser/.local/state/machines'.
[  123.819942] TEST-13-NSPAWN.sh[4311]: Got 1% of file:///var/tmp/image-tar/kurps.tar.gz.
[  124.156557] TEST-13-NSPAWN.sh[4311]: * shutting down connection #0
[  124.156896] TEST-13-NSPAWN.sh[4311]: * Could not open file /var/tmp/image-tar/kurps.tar.gz.sha256
[  124.157223] TEST-13-NSPAWN.sh[4311]: * closing connection #-1
[  124.159198] TEST-13-NSPAWN.sh[4311]: * Could not open file /var/tmp/image-tar/kurps.nspawn
[  124.159493] TEST-13-NSPAWN.sh[4311]: * closing connection #-1
[  124.159818] TEST-13-NSPAWN.sh[4311]: Acquired 68.5M.
[  124.160395] TEST-13-NSPAWN.sh[4311]: Download of file:///var/tmp/image-tar/kurps.tar.gz complete.
[  124.160664] TEST-13-NSPAWN.sh[4311]: Transfer failed: Could not read a file:// file
[  124.160923] TEST-13-NSPAWN.sh[4311]: Settings file could not be retrieved, proceeding without.
[  124.404733] TEST-13-NSPAWN.sh[4311]: * shutting down connection #1
[  124.405162] TEST-13-NSPAWN.sh[4311]: Acquired 79B.
[  124.406170] TEST-13-NSPAWN.sh[4311]: Download of file:///var/tmp/image-tar/SHA256SUMS complete.
[  124.406734] TEST-13-NSPAWN.sh[4311]: SHA256 checksum of file:///var/tmp/image-tar/kurps.tar.gz is valid.
[  124.455446] TEST-13-NSPAWN.sh[4311]: Failed to rename to final image name to /home/testuser/.local/state/machines/.tar-file:\x2f\x2f\x2fvar\x2ftmp\x2fimage-tar\x2fkurps\x2etar\x2egz: File exists
[  124.457251] TEST-13-NSPAWN.sh[4311]: Exiting.
```
Workaround for issue systemd#38240.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants