-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adding k8 support for container id detection #1962
base: main
Are you sure you want to change the base?
feat: adding k8 support for container id detection #1962
Conversation
…' of https://github.com/abhee11/opentelemetry-js-contrib into enhancement/adding-k8-support-for-containerId-detection
@@ -51,6 +51,7 @@ | |||
"@opentelemetry/api": "^1.0.0" | |||
}, | |||
"dependencies": { | |||
"@kubernetes/client-node": "^0.20.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A concern here is that this is a large dependency, but mostly that it still uses the deprecated request
HTTP client library. This would bring in some CVEs (kubernetes-client/javascript#754 (comment)).
There is an effort (https://github.com/kubernetes-client/javascript/blob/master/FETCH_MIGRATION.md kubernetes-client/javascript#754) to move away from request
to fetch()
. That looks active. There was a recent release-candidate release with that work: '1.0.0-rc4': '2023-12-23T15:06:31.075Z'
. However, I haven't looked deeply and I'm not sure how soon that might come.
Even if that came, an issue with @kubernetes/[email protected]
might be that I believe it dropped compat with Node.js v14 in kubernetes-client/javascript@3a995fb
That commit unfortunately doesn't mention the Node.js version, nor is it tied to a PR with any discussion.
That same commit is dropping an AbortController shim for older Node.js version support. AbortController is avaialble starting in Node.js v14.17.0, so it is possible that @kubernetes/[email protected]
would work with node >=14.17.0
. They don't test with Node v14 though, so that is tenuous support.
I haven't read through this PR yet. Would it be possible to do whatever k8s client calls without using @kubernetes/client-node
? I don't know if it is a very complex API to call manually.
Another option would possibly be to wait for @kubernetes/[email protected]
and the OTel SDK 2.0 milestone (https://github.com/open-telemetry/opentelemetry-js/milestone/17), because that will drop support for Node.js v14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe - This api takes care of adding filepath required for certifcates etc to be present.
I think we can certainly manually do all the operations that this does. It may be bit more code . I can look into it.
I see the migration to fetch and partial compatibility for node 14 as well. I think officially node 14 is not supported now.
Let me take a look at the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at the library itself.
It is officially maintained by Kubernetes SIG API Machinery.
I suggest we use this and create an issue there regarding node 14 and request --> fetch ( which is I believe is already in progress )
@trentm let me know what you think.
@@ -25,6 +25,9 @@ import * as fs from 'fs'; | |||
import * as util from 'util'; | |||
import { diag } from '@opentelemetry/api'; | |||
|
|||
const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node'); | |||
require('dotenv').config(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this use of dotenv
your debugging code that should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - makes sense -
describe('Detect containerId in k8 environment', () => { | ||
it('should return an empty Resource when not in a K8 environment', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could please change all uses of "k8" to "k8s" (upper and lower case). AFAIK "k8s" is the more common abbreviation -- at least "k8s" is used as a prefix in some OpenTelemetry specs / semantic conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely
@@ -100,6 +108,43 @@ export class ContainerDetector implements Detector { | |||
return containerIdStr || ''; | |||
} | |||
|
|||
private _isInKubernetesEnvironment(): boolean { | |||
return process.env.KUBERNETES_SERVICE_HOST !== undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know k8s well enough to have confidence in its standard envvars and its client API usage. I'd be happier if others with k8s experience could weigh in.
It might help to add some comments linking to official k8s docs justifying the envvars and APIs being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is purely based out of my research - let me find other PRs or articles supporting this.
Thanks for the review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trentm I have updated the PR as per comments and added relevant comments - please have a look. |
throw new Error('Container name not specified in environment'); | ||
} | ||
|
||
const response = await api.listNamespacePod(namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this listing all pods in the namespace? What if there are multiple pods in the namespace with the same container name (as would happen with a k8s depoyment)? Will each detect a random container ID?
Listing all pods in a namespace is also much more expensive than getting a single pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @dashpole
I believe this statement is true where it will list all pods.
May be I can provide a condition where it will look for a specific pod defined as environment variable.
@abhee11 I had a few questions
Usually I've seen kind of annotation done with the collector's |
|
The actual k8s service account needs to have permission to call the k8s API. There's some info here. Or are you finding this isn't needed? |
@aabmass Sorry for delayed response. Yes I believe the the client library was handling certifications. Now the manual implementation with calling K8 api without any client lib - I am just asserting that the certificate file exists and is passed tot he https.request parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the manual implementation with calling K8 api without any client lib - I am just asserting that the certificate file exists and is passed tot he https.request parameter.
With the client library or calling the API directly, I believe the certificate is just providing authentication so the k8s API knows who is sending the request. However, the service account still has to have permission to make the API call.
I tried running a snippet of this code in a GKE cluster and it does fail with 403 error:
const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node');
const kubeconfig = new KubeConfig();
kubeconfig.loadFromDefault();
const api = kubeconfig.makeApiClient(CoreV1Api);
api.listNamespacedPod("default")
pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" in the namespace "default"
. You can also check with kubectl can-i
:
kubectl auth can-i list pods --as=system:serviceaccount:default:default
outputs no
in my cluster with no additional config.
@@ -25,6 +25,8 @@ import * as fs from 'fs'; | |||
import * as util from 'util'; | |||
import { diag } from '@opentelemetry/api'; | |||
|
|||
const { KubeConfig, CoreV1Api } = require('@kubernetes/client-node'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to use require instead of import
?
throw new Error('Container name not specified in environment'); | ||
} | ||
|
||
const response = await api.listNamespacePod(namespace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a typo, should be api.listNamespacedPod
. I'm not sure why typescript isn't complaining
@aabmass I am pushing out a change without the client-node library I am not sure - if there is more I need to validate in terms of certificate. Here is my breakdown
Thanks again for the review . I should be able to push the changes today itself. Let me know what you think . |
We are doing similar for C# at open-telemetry/opentelemetry-dotnet-contrib#1699.
We also discussing to make it possible to provide container name / pod name on resource detector constructor making environment variable resolution as a fallback - the analogue here would be to read this values from |
@aabmass Can you please take a look at the implementation and see if it makes sense. I have addressed your comments and put a comment up for further clarification. |
@abhee11 please take another look at my comments. I understand this authenticates as the service account using a local certificate. There are no instructions here for authorizing said service account. The PR that @iskiselev linked for C# does have these instructions https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1699/files#diff-a77c04d30700216dec1c4dbb106e5c0d8490b01b1e9ca20d99fbcc174a4803b5 Reemphasizing my above concern with having every workload call the k8s API:
|
Which problem is this PR solving?
Short description of the changes