[processor/k8sattributes] Implement `container.image.id` for k8sattributes processor #32314

evantorrie · 2024-04-11T07:26:23Z

Description:

Implements #32152

Adds container.image.id to list of Kubernetes attributes detected by k8sattributes processor.
Populates this attribute with the value returned from k8s pod status ImageID field.

Link to tracking Issue: #32152

Testing:

Adds unit tests for extracting the k8s ImageID field from container status.
Adds unit tests for including container.image.id in rules for extraction.
Includes container.image.id in the e2e tests for k8sattributes processor.

Documentation: Added container.image.id as an example of attributes that can be extracted from pod/container

This is to be consistent with what k8s does in practice.

processor/k8sattributesprocessor/config.go

processor/k8sattributesprocessor/metadata.yaml

Should address uniformity of versions within k8sattributesprocessor and any semconv version ugprade to a separate PR.

ChrsMark

I left a comment in order to ensure that we are aligned here with the SemConv spec.

Originally, when we dealt with those container.image.* attributes it was a bit of pain :) so now we hit that again from a different angle though. However I believe (and hope :)) that SemConv is quite accurate on that part so we can rely on this to clarify any remaining questions.

cc-ing @lmolkova and @marcsanmi who were involved in these conversations back then.

ref:

ChrsMark · 2024-04-12T13:22:16Z

processor/k8sattributesprocessor/metadata.yaml

@@ -94,6 +94,10 @@ resource_attributes:
    description: Container ID. Usually a UUID, as for example used to identify Docker containers. The UUID might be abbreviated. Requires k8s.container.restart_count.
    type: string
    enabled: false
+  container.image.id:
+    description: Container image digest. Requires container.id or k8s.container.name.


Can we align this description with the SemConv spec https://github.com/open-telemetry/semantic-conventions/blob/768a23c3627e4bd96146d9f713db214bee7aecf3/model/registry/container.yaml#L47?

ChrsMark · 2024-04-12T13:22:39Z

processor/k8sattributesprocessor/processor_test.go

+			wantAttrs: map[string]string{
+				kube.K8sIPLabelName:                   "1.1.1.1",
+				conventions.AttributeK8SContainerName: "app",
+				containerImageID:                      "test/app@sha256:deadbeef02",


I think that based on the SemConv spec we have agreed on at https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/container.yaml, the container.image.id should only be the sha256:deadbeef02 part, which is the container ID that the runtime gives.

If you want to use the full repository digest then you would need to use the container.image.repo_digests attribute.

The CRI protocol definition distinguish these 2 at https://github.com/kubernetes/cri-api/blob/c75ef5b473bbe2d0a4fc92f82235efd665ea8e9f/pkg/apis/runtime/v1/api.proto#L1237-L1238 but as far as I can see on k8s we can see both:

$ kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .status.containerStatuses[*]}{.imageID}{", "}{end}{end}' |\ sort coredns-5d78c9869d-29w9c: sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc, coredns-5d78c9869d-xkcll: sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc, etcd-kind-control-plane: sha256:86b6af7dd652c1b38118be1c338e9354b33469e69a218f7e290a0ca5304ad681, kindnet-885s7: sha256:b0b1fa0f58c6e932b7f20bf208b2841317a1e8c88cc51b18358310bbd8ec95da, kube-apiserver-kind-control-plane: docker.io/library/import-2023-06-15@sha256:0202953c0b15043ca535e81d97f7062240ae66ea044b24378370d6e577782762, kube-controller-manager-kind-control-plane: docker.io/library/import-2023-06-15@sha256:bdbeb95d8a0820cbc385e44f75ed25799ac8961e952ded26aa2a09b3377dfee7, kube-proxy-gr6w4: docker.io/library/import-2023-06-15@sha256:ce2145a147b3f1fc440ba15eaa91b879ba9cbf929c8dd8f3190868f4373f2183, kube-scheduler-kind-control-plane: docker.io/library/import-2023-06-15@sha256:9d6f903c0d4bf3b145c7bbc68727251ca1abf98aed7f8d2acb9f6a10ac81e8c2,

It seems that Kubernetes will either use the repoDigests if exists otherwise it will use the id to populate the .status.containerStatuses[*]}{.imageID}.

On the other hand, from inside a k8s node we can interact directly with the CRI api and inspect the container images directly and verify the differences:

$ crictl inspecti sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62 { "status": { "id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62", "repoTags": [ "registry.k8s.io/metrics-server/metrics-server:v0.7.1" ], "repoDigests": [ "registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a" ], "size": "19478031", ... }

So I think it's more accurate if we either populate both container.image.id and container.image.repo_digests conditionally, or we would need to introduce k8s.container.image.id which would map exactly whatever the k8s API returns.

Originally when I first coded this, I did strip off any prefix and just leave the sha256:<digest> portion, and I think this is what most people do want since they can get the image name by using container.image.name.

But I read the container.image.id "notes" in the semantic conventions which I interpreted to indicate it's acceptable that the value may vary based on the runtime, and it made the code simpler to just propagate the value from the kube api's PodStatus.ContainerStatuses[].ImageID.

K8s defines a link to the container registry repository with digest "imageID": "registry.azurecr.io /namespace/service/dockerfile@sha256:bdeabd40c3a8a492eaf9e8e44d0ebbb84bac7ee25ac0cf8a7159d25f62555625".
The ID is assigned by the container runtime and can vary in different environments. Consider using oci.manifest.digest if it is important to identify the same image in different environments/runtimes.

I wasn't involved in any of the semantic convention discussions when these attributes were included though, so I'm happy to try and implement whichever way the semantic convention team thinks is best in this case.

K8s defines a link to the container registry repository with digest "imageID": "registry.azurecr.io /namespace/service/dockerfile@sha256:bdeabd40c3a8a492eaf9e8e44d0ebbb84bac7ee25ac0cf8a7159d25f62555625".

I think this is not 100% accurate since it's not always the case. It seems that when the repoDigests of the container image is empty then k8s will use the id. So we might need to "fix" the wording there in SemConv.

If we trim the full repoDigest to only keep the sha256:<digest> part then the remaining is nothing meaningful and definitely not the image's id that the runtime provides.

For example I see the following:

$ kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .status.containerStatuses[*]}{.imageID}{", "}{end}{end}' |\ sort metrics-server-6b4d6cb49f-pfzzs: registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a

This means that the imageID is the repoDigest.

Indeed:

$ crictl inspecti -q registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a { "status": { "id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62", "repoTags": [ "registry.k8s.io/metrics-server/metrics-server:v0.7.1" ], "repoDigests": [ "registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a" ], "size": "19478031", "uid": { "value": "65534" }, "username": "", "spec": null, "pinned": false } }

So if I trim the registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a:

$ crictl inspecti -q sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a FATA[0000] no such image "sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a" present

But if I use the actual image id:

$ crictl inspecti -q sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62 { "status": { "id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62", "repoTags": [ "registry.k8s.io/metrics-server/metrics-server:v0.7.1" ], "repoDigests": [ "registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a" ], "size": "19478031", "uid": { "value": "65534" }, "username": "", "spec": null, "pinned": false } }

This tells me that Kubernetes either uses the image.id or the image.repoDigests[0] of the runtime. I haven't checked its implementation though to verify this, maybe it worth checking that.
I believe we should be consistent with the runtime and populate the container.image.repo_digests if the imageID is a repo digest or use the container.image.id if the imageID is not repo digest.

Otherwise we need to use a k8s specific attribute called k8s.container.image.id for the imageID because container.image.* is tied to the container runtime and we should preserve this 🤔 ?

github-actions · 2024-04-30T05:19:26Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2024-05-24T05:19:53Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2024-06-07T05:20:12Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

evantorrie · 2024-07-04T00:02:47Z

I want to reopen this as this is still something we'd like to be able to obtain from the k8sattributes processor in the OpenTelemetry Collector. What we want to do is to be able to uniquely identify the images (that in our case, will always be sourced from an OCI or docker registry) that make up a telemetry emitting pod - i.e. we can't rely just on the image tags, since tags are mutable.

Based on what @ChrsMark said:

This tells me that Kubernetes either uses the image.id or the image.repoDigests[0] of the runtime. I haven't checked its implementation though to verify this, maybe it worth checking that.

I have looked into the k8s code, CRI spec and a couple of container runtimes and believe the following description is accurate for k8s 1.29, although it would be good to have this confirmed by someone more familiar with the k8s API.

The image and imageID fields in the k8s API response from kubectl get pod -o jsonpath '.status.containerStatuses[*].imageID' is sourced by calling the CRI rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse).

The CRI ContainerStatusResponse proto includes two fields image (of type ImageSpec) and image_ref (of type string) which are ultimately used to populate the k8s API image and imageID fields.
a. image = image.Image
b. imageID <= image_ref.

Note: In k8s-1.30 and later, there is an additional image_id field (alongside these image and image_ref fields), but by default it's initialized just to the same as "image_ref".

For containerd's CRI implementation, it appears to be somewhat confused (or at least working based on existing practice from dockershim) about how fields should be populated. See this comment:

// TODO(random-liu): Clean up the following logic in CRI.
// Current assumption:
// * ImageSpec in container config is image ID.
// * ImageSpec in container status is image tag.
// * ImageRef in container status is repo digest.

If we look further down in that same function, it attempts to parse the various references using ParseAnyReference from github.com/distribution/reference in containerd's image store, partitioning the references into "digests" (must be of type reference.Canonical) and "tags" (must be of type reference.Tagged). From what I can ascertain, reference.Canonical will be used if the reference contains some sort of digest and a "named repository", while reference.Tagged will be used if there is no digest but there is a named repository. It will always populate the CRI status response "image.Image" field with the first "tag" it finds from tags, and the "image_ref" field with the first "digest" it finds from digests.

The issue is that if there is no appropriate reference.Canonical type returned, then ImageRef will retain its initially set value = container.ImageRef which is sourced from containerd's own internal Container.Metadata.ImageRef set when CreateContainer() is called, and may be just an sha256 digest used by containerd's local image store without any corresponding repo name.

So yes, this explains why @ChrsMark saw the following:

$ kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .status.containerStatuses[*]}{.imageID}{", "}{end}{end}' |\
sort

coredns-5d78c9869d-29w9c:	sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc, 
coredns-5d78c9869d-xkcll:	sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc,

For the CRI-O runtime (prior to 1.30), the ContainerStatus image and image_ref fields are populated here. image must be a RegistryImageReference in the CRI-O world, while image_ref is initialized to the someRepoDigest parameter passed to NewContainer(), and which in the function docs says:

// someRepoDigest, if set, is some repo@some digest of the image imageID; it
// may have NO RELATIONSHIP to the users’ requested image name (and, which
// should be fixed eventually, may be a repo@digest combination which has never
// existed on a registry).

As @ChrsMark said:

I believe we should be consistent with the runtime and populate the container.image.repo_digests if the imageID is a repo digest or use the container.image.id if the imageID is not repo digest.

I think one way to proceed is to also use the github.com/distribution/reference package to attempt to parse the output of .imageID from the k8s API response. If it is a canonical reference, then populate the semantic attribute container.image.repo_digests with those values (of which it seems there will only be one from the existing CRI implementations (?)). If it is not, then populate container.image.id.

ChrsMark · 2024-07-04T12:54:33Z

Thank's @evantorrie for looking into this! Should we first clarify and fix anything missing (i.e. the inconsistency mentioned at #32314 (comment)) from the Sem Conv side? Then if the implementation is aligned with Sem Conv we should be fine.

evantorrie · 2024-07-11T01:38:17Z

@ChrsMark After thinking about this more (and also writing an implementation), I think I'm going to close this PR and start a new one, leaving this one here for reference.

Secondly, yes, I do think it would be good to clarify the comment around container.image.id in the context of Kubernetes. As noted above, the current wording for that semantic attribute includes this phrasing:

K8s defines a link to the container registry repository with digest "imageID": "registry.azurecr.io /namespace/service/dockerfile@sha256:bdeabd40c3a8a492eaf9e8e44d0ebbb84bac7ee25ac0cf8a7159d25f62555625".
The ID is assigned by the container runtime and can vary in different environments.

As evidenced by the exploration above, the containerd implementation of the k8s CRI interface populates this imageID field with the first repo digest (note: it only ever returns "one" RepoDigest) it can find in its local image registry store.

If it can't find a repo digest entry (e.g. the image was loaded directly into the image store without being pulled from a remote registry), then the imageID field populated in the k8s response will just be the containerd's local Container.Metadata.ImageRef.

My plan is to create a new feature request to populate container.image.repo_digests - ignoring container.image.id completely. For me, I can't see that much value incontainer.image.id (especially as it currently exists in k8s).

ChrsMark · 2024-07-11T07:00:02Z

Makes sense! Thank's @evantorrie for looking into this thoroughly!

I filed open-telemetry/semantic-conventions#1236 to clarify the conventions part.

evantorrie added 7 commits April 10, 2024 23:25

Use t.Run() to give correct name to subtest

2c8a403

Add container.image.id to metadata.yaml

e19d604

After running mdatagen metadata.yaml

517d7e3

Augment data structures with ImageID and add tests

e08ed65

Implement extraction of ImageID from container status

79b570b

Add container.image.id to k8sattributes README

ca3b304

Add container.image.id to e2e tests

d296ae1

github-actions bot added the processor/k8sattributes k8s Attributes processor label Apr 11, 2024

github-actions bot requested review from dmitryax, fatsheep9146, rmfitzpatrick and TylerHelmuth April 11, 2024 07:26

evantorrie added 6 commits April 11, 2024 00:31

Adds changelog entry

0381bfe

WIP: e2e_tests

808fa35

E2E_ClusterRBAC tests for container.image.id

5776119

Add MixedRBAC and NamespaceRBAC e2e tests for container.image.id

41c6104

Make tests use non-runtime prefixed image digest

fc75793

This is to be consistent with what k8s does in practice.

Standardize changelog entry

84001e4

This comment was marked as outdated.

Sign in to view

evantorrie marked this pull request as ready for review April 11, 2024 09:38

evantorrie requested a review from a team April 11, 2024 09:38

github-actions bot assigned evan-bradley Apr 11, 2024

evantorrie changed the title ~~Implement container.image.id for k8sattributes processor~~ [processor/k8sattributes] Implement container.image.id for k8sattributes processor Apr 11, 2024

evantorrie and others added 2 commits April 12, 2024 03:03

Merge branch 'main' into imageid-attribute

fb7dc03

Merge branch 'main' into imageid-attribute

1500fb6

TylerHelmuth reviewed Apr 11, 2024

View reviewed changes

processor/k8sattributesprocessor/config.go Outdated Show resolved Hide resolved

processor/k8sattributesprocessor/metadata.yaml Outdated Show resolved Hide resolved

evantorrie added 2 commits April 11, 2024 23:44

Clarified description for container.image.id

46a0912

Revert to existing semconv version and use hardcoded constants

f08da7c

Should address uniformity of versions within k8sattributesprocessor and any semconv version ugprade to a separate PR.

evantorrie requested a review from TylerHelmuth April 12, 2024 07:17

ChrsMark reviewed Apr 12, 2024

View reviewed changes

github-actions bot added the Stale label Apr 30, 2024

fatsheep9146 removed the Stale label May 9, 2024

github-actions bot added the Stale label May 24, 2024

github-actions bot closed this Jun 7, 2024

evantorrie changed the title ~~[processor/k8sattributes] Implement container.image.id for k8sattributes processor~~ [processor/k8sattributes] Implement container.image.repo_digests for k8sattributes processor Jul 9, 2024

evantorrie changed the title ~~[processor/k8sattributes] Implement container.image.repo_digests for k8sattributes processor~~ [processor/k8sattributes] Implement container.image.id for k8sattributes processor Jul 11, 2024

evantorrie mentioned this pull request Jul 11, 2024

Enable retrieving container.image.repo_digests attribute in k8sattributes processor #34029

Closed

ChrsMark mentioned this pull request Jul 11, 2024

Clarify the brief of container.image.id open-telemetry/semantic-conventions#1236

Open

ChrsMark mentioned this pull request Jul 11, 2024

[processor/k8sattributes] Add container.image.repo_digests #34031

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[processor/k8sattributes] Implement `container.image.id` for k8sattributes processor #32314

[processor/k8sattributes] Implement `container.image.id` for k8sattributes processor #32314

evantorrie commented Apr 11, 2024 •

edited

Loading

This comment was marked as outdated.

ChrsMark left a comment

ChrsMark Apr 12, 2024

ChrsMark Apr 12, 2024

evantorrie Apr 12, 2024

ChrsMark Apr 15, 2024 •

edited

Loading

github-actions bot commented Apr 30, 2024

github-actions bot commented May 24, 2024

github-actions bot commented Jun 7, 2024

evantorrie commented Jul 4, 2024

ChrsMark commented Jul 4, 2024

evantorrie commented Jul 11, 2024

ChrsMark commented Jul 11, 2024

[processor/k8sattributes] Implement container.image.id for k8sattributes processor #32314

[processor/k8sattributes] Implement container.image.id for k8sattributes processor #32314

Conversation

evantorrie commented Apr 11, 2024 • edited Loading

This comment was marked as outdated.

ChrsMark left a comment

Choose a reason for hiding this comment

ChrsMark Apr 12, 2024

Choose a reason for hiding this comment

ChrsMark Apr 12, 2024

Choose a reason for hiding this comment

evantorrie Apr 12, 2024

Choose a reason for hiding this comment

ChrsMark Apr 15, 2024 • edited Loading

Choose a reason for hiding this comment

github-actions bot commented Apr 30, 2024

github-actions bot commented May 24, 2024

github-actions bot commented Jun 7, 2024

evantorrie commented Jul 4, 2024

ChrsMark commented Jul 4, 2024

evantorrie commented Jul 11, 2024

ChrsMark commented Jul 11, 2024

[processor/k8sattributes] Implement `container.image.id` for k8sattributes processor #32314

[processor/k8sattributes] Implement `container.image.id` for k8sattributes processor #32314

evantorrie commented Apr 11, 2024 •

edited

Loading

ChrsMark Apr 15, 2024 •

edited

Loading