Add Linux process cgroup attribute #1364
model/registry/linux.yaml
type: attribute_group
brief: "Describes Linux Process attributes"
attributes:
  - id: linux.process.cgroup
When thinking about this attribute, two questions come to my mind:
- Is the idea to connect this attribute in some way with container.id?
- How should cgroupv1 vs cgroupv2 be represented with this attribute?
Is the idea to connect this attribute in some way with container.id?
This would be an additional feature that could be built on top of this attribute, but I would say it should not be strictly connected, as cgroups can also be used outside containerization environments (e.g. systemd).
But I agree that this attribute would be very useful for extracting container and k8s attributes without additional resource detection. Here is an example collector transformation, built by @ChrsMark, that extracts them:
transform/cgroup:
  error_mode: ignore
  metric_statements:
    - context: metric
      conditions:
        - resource.attributes["process.cgroup"] != nil
      statements:
        - merge_maps(cache, ExtractPatterns(resource.attributes["process.cgroup"], "/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice/cri-containerd-(?P<container_id>.*).scope$"), "upsert")
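To make the extraction step concrete, here is a minimal Python sketch that applies the same regular expression to a cgroup string. It is only an illustration, not the collector's implementation: the function name `extract_k8s_ids` and the sample input are made up, and it assumes the pattern is searched anywhere in the string rather than anchored at the start.

```python
import re

# Same pattern as in the transform statement; Python's `re` module accepts
# the (?P<name>...) named-group syntax unchanged.
CGROUP_PATTERN = re.compile(
    r"/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice"
    r"/kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice"
    r"/cri-containerd-(?P<container_id>.*).scope$"
)


def extract_k8s_ids(cgroup: str) -> dict:
    """Return the named captures, or an empty dict when nothing matches."""
    match = CGROUP_PATTERN.search(cgroup)
    return match.groupdict() if match else {}


# Hypothetical input, invented for illustration only.
sample = (
    "0::/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice"
    "/kubelet-kubepods-besteffort-pod1234_abcd.slice/cri-containerd-deadbeef.scope"
)
print(extract_k8s_ids(sample))
```

Note that the pattern only covers best-effort pods under the containerd runtime, so other QoS classes or runtimes would need their own patterns.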
How should cgroupv1 vs cgroupv2 be represented with this attribute?
The initial idea is to just provide the content of /proc/PID/cgroup as-is (without differentiating between v1 and v2), as the hostmetrics receiver does at the moment: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f2cd3587a6d6d76e0f3515295c6bde3b38ac3eb2/receiver/hostmetricsreceiver/internal/scraper/processscraper/process.go#L125
Additional processing on top of this attribute would be needed to extract specific values. Do you think it would be interesting to unwrap the /proc/PID/cgroup file into more fine-grained attributes (e.g. linux.process.cgroup.memory.slice.name)?
Do you think it would be interesting to unwrap the /proc/PID/cgroup file into more fine-grained attributes?
The differences between v1 and v2 are significant.
Here is a v1 example for /proc/<PID>/cgroup:
11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
10:freezer:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
9:cpuset:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
8:devices:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
7:blkio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
6:perf_event:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
5:net_cls,net_prio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
4:memory:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
3:hugetlb:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
2:cpu,cpuacct:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
1:name=systemd:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
While in this example all controllers (pids, blkio, and others) are within the same scope, this is not guaranteed in general.
Compare this with a v2 example for /proc/<PID>/cgroup:
0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
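The structural difference between the two examples can be sketched with a small parser. Each line of the file has the form `hierarchy-ID:controller-list:cgroup-path`. In v2 the hierarchy ID is 0 and the controller list is empty. The names `CgroupEntry` and `parse_cgroup_file` are hypothetical and not part of any proposal in this thread.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_file(content: str) -> List[CgroupEntry]:
    """Parse the raw content of /proc/<PID>/cgroup into one entry per line."""
    entries = []
    for line in content.splitlines():
        if not line:
            continue
        # Split on the first two colons only, so a path containing ':' survives.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                hierarchy_id=int(hierarchy_id),
                # v2 has an empty controller list; v1 lists one or more names.
                controllers=[c for c in controllers.split(",") if c],
                path=path,
            )
        )
    return entries


container = "ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e"
v1_content = (
    f"5:net_cls,net_prio:/docker/{container}\n"
    f"1:name=systemd:/docker/{container}\n"
)
v2_content = (
    "0::/system.slice/docker-"
    "8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope\n"
)

for entry in parse_cgroup_file(v1_content) + parse_cgroup_file(v2_content):
    print(entry.hierarchy_id, entry.controllers, entry.path)
```

Because every v1 line carries its own path, a consumer cannot assume a single path per process without choosing which controller to trust.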
With the linked processor the following will be returned:
cgroup | result from processor |
---|---|
v1 | 11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e |
v2 | 0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope |
Doesn't the linked processor provide the whole file's content? I think it just trims the trailing newline.
The differences between v1 and v2 are significant.
Maybe it would not be very useful without further processing, but linux.process.cgroup would not differentiate between versions; it would just provide the content of the process's cgroup file. To extend the cgroup attributes, we could try to define some additional fields common to both versions:
- linux.process.cgroup.path:
  - v1: /docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
  - v2: /system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- linux.process.cgroup.scope:
  - v1: ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
  - v2: docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
- linux.process.cgroup.version: enum -> v1, v2
- linux.process.cgroup.slice: Most inner subslice?
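As a rough sketch of how the fields listed above could be derived from the raw file content: the function below is hypothetical, and the attribute names are only the candidates from this comment, not accepted conventions. It rests on two loud assumptions. First, a file consisting of a single `0::` line is treated as v2 and anything else as v1 (hybrid setups are not handled). Second, for v1 the path of the first line is used, even though different controllers may report different paths. The slice field is left out because its definition is still an open question.

```python
from typing import Dict


def derive_cgroup_attributes(content: str) -> Dict[str, str]:
    """Derive the candidate path/scope/version fields from raw file content."""
    lines = [line for line in content.splitlines() if line]
    if not lines:
        return {}
    # Assumption: a lone "0::<path>" line means the unified (v2) hierarchy.
    is_v2 = len(lines) == 1 and lines[0].startswith("0::")
    # Assumption: for v1, take the path of the first entry only.
    path = lines[0].split(":", 2)[2]
    # The scope is taken to be the last component of the path.
    scope = path.rstrip("/").rsplit("/", 1)[-1]
    return {
        "linux.process.cgroup.version": "v2" if is_v2 else "v1",
        "linux.process.cgroup.path": path,
        "linux.process.cgroup.scope": scope,
    }


v1_id = "ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e"
v1_raw = f"11:pids:/docker/{v1_id}\n10:freezer:/docker/{v1_id}\n"
v2_scope = "docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope"
v2_raw = f"0::/system.slice/{v2_scope}\n"

print(derive_cgroup_attributes(v1_raw))
print(derive_cgroup_attributes(v2_raw))
```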
We discussed this topic in today's @open-telemetry/semconv-system-approvers SIG, and we agreed that this attribute is quite generic and might not indicate which cgroup version is in use. Nonetheless, we still think it might be useful to provide the raw content of the cgroup file through this attribute. Similar examples in the semantic conventions repository are the process.cmd_line and os.description attributes (they are set to the content of /proc/<PID>/cmdline and /etc/os-release without further processing).
The idea would be to mark it as opt-in and continue adding more fine-grained cgroup-related attributes (e.g. linux.process.cgroup.version or linux.process.cgroup.scope). What do you think about this approach, @florianl?
Doesn't the linked processor provide the whole file's content? I think it just trims the trailing newline.
Ah - you are right! I mixed up TrimSuffix with Split 🙈
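The difference between the two readings is easy to see on a multi-line v1 file. The sketch below mirrors the two behaviours in Python and does not reproduce the receiver's Go code. The helper names and the shortened sample content are made up for illustration.

```python
def trim_trailing_newline(content: str) -> str:
    """Drop one trailing newline but keep every line (TrimSuffix-like)."""
    return content[:-1] if content.endswith("\n") else content


def first_line_only(content: str) -> str:
    """Keep only the text before the first newline (split and take the first)."""
    return content.split("\n")[0]


# Shortened, hypothetical v1 content with two controllers.
raw = "11:pids:/docker/abc\n10:freezer:/docker/abc\n"

print(repr(trim_trailing_newline(raw)))  # both lines, no trailing newline
print(repr(first_line_only(raw)))        # only the pids line
```

For a v2 file with a single line, both approaches give the same result, which is why the distinction only matters for v1.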
[..] we agreed that this attribute is quite generic and might not give additional detail [..]
My concern is that keeping the attribute as generic as possible makes it harder to implement and process on the receiving side. Consequently, more fine-grained cgroup-related attributes should be discussed and proposed first, before a generic one is introduced. As back- and forward compatibility is important to SemConv, I think just introducing generic attributes might lead to conflicts.
As back- and forward compatibility is important to SemConv, I think just introducing generic attributes might lead to conflicts.
I think the metrics covered by system semantic conventions are a bit of a special case in semconv. This cgroup attribute, for example, is a direct reflection of cat /proc/PID/cgroup, and there are numerous other examples of us providing metrics/attributes that directly map to what procfs/the kernel/system APIs would provide. We want to make sure that is covered in semconv, as users often want metrics/attributes that exactly match what they would look at when manually investigating a system. So I think this attribute and the more specific attributes mentioned above can coexist, and we can even recommend the specific attributes while still ensuring we have guidance in place for people who want to instrument a more direct mapping of what the system provides.
(I don't have any direct arguments about the usefulness of this specific attribute, I don't have direct experience using it, but it does match with the general way we try and define system semconv. It is also what currently exists in hostmetricsreceiver in the Collector, and there are users of it already)
This PR was marked stale due to lack of activity. It will be closed in 7 days.
model/registry/linux.yaml
@@ -16,3 +16,15 @@ groups:
stability: experimental
brief: "The Linux Slab memory state"
examples: ["reclaimable", "unreclaimable"]
# linux.process.* attribute group
- id: registry.linux.process
It seems Windows has a similar concept, Job objects. See https://thomasvanlaere.com/posts/2021/06/exploring-windows-containers/ for some context.
The suggestion here is to add a cross-platform attribute in the process namespace. I don't mind it being called process.cgroup, process.group.name, or something else.
If we think this attribute can be cross-platform, I am also in favor of removing the linux keyword. As explained in https://github.com/open-telemetry/semantic-conventions/issues/1403, and as a general concept, process.cgroup would fit in Bucket 1.
Wdyt @open-telemetry/semconv-system-approvers?
Based on #1403 (comment) (2024-10-10 System semantic conventions WG meeting), this attribute denotes very specialized information that maps directly to the underlying operating system, so it belongs in bucket 2 (OS namespace).
@open-telemetry/semconv-system-approvers Could you take a look at this PR when you have a moment? Thanks!
Fixes #1357