Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Linux process cgroup attribute #1364

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

rogercoll
Copy link
Contributor

Fixes # #1357

Changes

Please provide a brief description of the changes here.

Note: if the PR is touching an area that is not listed in the existing areas, or the area does not have sufficient domain experts coverage, the PR might be tagged as experts needed and move slowly until experts are identified.

Merge requirement checklist

@rogercoll rogercoll requested review from a team August 23, 2024 18:39
type: attribute_group
brief: "Describes Linux Process attributes"
attributes:
- id: linux.process.cgroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When thinking about this attribute, two questions come to my mind:

  1. Is the idea to connect this attribute in some way with container.id?
  2. How should cgroupv1 vs cgroupv2 represented with this attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea to connect this attribute in some way with container.id?

This would be an additional feature that could be build over this attribute, but I would say it should not be strictly connected as cgroups can be used outside containerization environments too (e.g systemd).

But I agree that this attribute would be very useful to extract container and k8s attributes without additional resource detection. This is a collector's transformation example build by @ChrsMark to extract them:

transform/cgroup:
  error_mode: ignore
  metric_statements:
  - context: metric
    conditions:
    - resource.attributes["process.cgroup"] != nil
    statements:
      - merge_maps(cache,ExtractPatterns(resource.attributes["process.cgroup"],"/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-pod(?P<pod_uid>.*).slice/cri-containerd-(?P<container_id>.*).scope$"), "upsert")

How should cgroupv1 vs cgroupv2 represented with this attribute?

The initial idea is to just provide the output of whatever is in /proc/PID/cgroup (does not differentiate between v1 and v2), as the hostmetrics receiver is doing at the moment: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/f2cd3587a6d6d76e0f3515295c6bde3b38ac3eb2/receiver/hostmetricsreceiver/internal/scraper/processscraper/process.go#L125
Additional processing over this attribute should be perform to extract specific values. Do you think it would be interesting unwraping the /proc/PID/cgroup file into more fine-grained attributes (e.g linux.process.cgroup.memory.slice.name )?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be interesting unwraping the /proc/PID/cgroup file into more fine-grained attributes?

The differences between v1 and v2 are significant.
Here is a v1 example for /proc/<PID>/cgroup:

11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
10:freezer:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
9:cpuset:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
8:devices:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
7:blkio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
6:perf_event:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
5:net_cls,net_prio:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
4:memory:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
3:hugetlb:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
2:cpu,cpuacct:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
1:name=systemd:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e

While in this example, all parts, pids, blkio & others, are within the same scope, this assumption is not guaranteed every time.

Compared with a v2 example for /proc/<PID>/cgroup:

0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope

With the linked processor the following will be returned:

cgroup result from processor
v1 11:pids:/docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
v2 0::/system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the linked processor provide all the file's content? I think it just trims the latest new line.

The differences between v1 and v2 are significant.

Maybe it would not be very useful without further processing, but the linux.process.cgroup would not make any differentiation, it would just provide the corresponding cgroup's file content. To extend the cgroup attributes we could try to define some additional common fields between versions:

  • linux.process.cgroup.path:
    • v1: /docker/ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    • v2: /system.slice/docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
  • linux.process.cgroup.scope:
    • v1: ffdd6f676b96f53ce556815731ca2a89d23c800f37d29976155d8c68e384337e
    • v2: docker-8ae5d36793164a2374bd9b4ceb81c6ca57a9152bdc69eafa9ce7919d22efff0d.scope
  • linux.process.cgroup.version: enum -> v1, v2
  • linux.process.cgroup.slice: Most inner subslice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this topic in today's @open-telemetry/semconv-system-approvers SIG, and we agreed that this attribute is quite generic and might not give additional detail of which cgroup version is being used. Nonetheless, we still think it might be useful to provide the raw content of the cgroup file using this attribute. A similar example in the semantic convention repository, are the process.cmd_line or the os.description attributes (they are set to the content of /proc/<PID/cmdline and /etc/os-release without further processing).

The idea would be to mark it as opt-in and continue adding more fine-grained cgroup related attributes (e.g linux.process.cgroup.version or linux.process.cgroup.scope). What do you think about the approach @florianl ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the linked processor provide all the file's content? I think it just trims the latest new line.

Ah - you are right! I mixed TrimSuffix with Split 🙈

[..] we agreed that this attribute is quite generic and might not give additional detail [..]

My concern is, that keeping the attribute as generic as possible makes it harder to implement and process on a receiving side. Consequently, more fine grade cgroup related attributes should be discussed and proposed first, before a generic one is introduced. As back- and forward compatibility is important to SemConv, I think, just introducing generic attributes might lead to conflicts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As back- and forward compatibility is important to SemConv, I think, just introducing generic attributes might lead to conflicts.

I think the metrics covered by system semantic conventions are a bit of a special case in semconv. This cgroup attribute for example is a direct reflection of cat /proc/PID/cgroup, and there are numerous other examples of us providing metrics/attributes that directly map to what procfs/the kernel/system APIs would provide. We want to make sure that is covered in semconv, as users often want metrics/attributes that are exact matches to what they would consider looking at when manually investigating a system. So I think this attribute and other specific attributes like mentioned above can coexist, and we can even say we recommend the specific attributes but still ensure we have guidance in place for people who want to instrument a more direct mapping of what the system provides.

(I don't have any direct arguments about the usefulness of this specific attribute, I don't have direct experience using it, but it does match with the general way we try and define system semconv. It is also what currently exists in hostmetricsreceiver in the Collector, and there are users of it already)

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added Stale and removed Stale labels Sep 15, 2024
model/registry/linux.yaml Outdated Show resolved Hide resolved
@@ -16,3 +16,15 @@ groups:
stability: experimental
brief: "The Linux Slab memory state"
examples: ["reclaimable", "unreclaimable"]
# linux.process.* attribute group
- id: registry.linux.process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems windows has a similar concept of Job objects, see https://thomasvanlaere.com/posts/2021/06/exploring-windows-containers/ for some context.

The suggestion here is to add cross-platform attribute in process namespace. I don't mind it being called process.cgroup, process.group.name or something else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we think this attribute can be cross-platform, I am also in favor of removing the linux keyword. As explained in https://github.com/open-telemetry/semantic-conventions/issues/1403 , and as a general concept, process.cgroup would fit in Bucket 1.

Wdyt @open-telemetry/semconv-system-approvers ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #1403 (comment) (2024-10-10 System semantic conventions WG meeting), this attribute denotes very specialized information that maps directly to the underlying operating system; bucket 2 (OS namespace)

Copy link

github-actions bot commented Oct 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 8, 2024
@rogercoll
Copy link
Contributor Author

@open-telemetry/semconv-system-approvers Could you take a look at this PR when you have a moment? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

6 participants