Network policy incorrectly applied due to mismatch between pod selector and agent pod labels #1490
Comments
One more note: not only are the pod selectors used to attach the network policies incorrect, the pod selectors used inside the policy rules are incorrect too, for example:
- from:
  - podSelector:
      matchLabels:
        app.kubernetes.io/instance: agent # no agent pod exists with this label
        app.kubernetes.io/part-of: datadog-datadog
@bogatuadrian Thank you for the detailed report 🙇 Indeed we're using different labels for the network policy selectors than we use for the agent pods. We will work on a fix for this
Thanks for the reply, @khewonc. I want to mention one more issue that I found after more investigation. While trying to work around the label issue by deploying our custom
@bogatuadrian Thanks for letting us know. I'll add a card in our backlog to add network policies to the admission controller feature
Closing since this was completed in #1515
Context
We're using the Datadog Operator to deploy a Datadog agent with Kubernetes network policies enabled, but the operator doesn't use the correct pod selectors when creating the NetworkPolicy resources to target the deployed agent and cluster agent pods.
Problem
This might lead to either a false sense of security if your network plugin allows traffic by default, or Datadog simply not working if your network plugin disallows traffic by default. We fall in the second category, having multiple network issues in both the agent and the cluster agent because traffic is denied by default and the network policies set up by the operator do not seem to work. For us, this leads to losing observability data.
Setup
We configure our Datadog agent something like this:
This leads to the creation of agents and cluster agents with the following spec (some labels omitted for brevity):
However the network policies created by the operator use the following pod selectors:
Notice how the pods use datadog-agent and the policies use agent as the value for the app.kubernetes.io/instance label.
Code breadcrumbs
I took a look at the code and may have found where the discrepancy occurs.
On one hand, you have the network policy being created with the hard-coded agent value in:
datadog-operator/internal/controller/datadogagent/component/objects/network.go
Line 140 in 926370e
datadog-operator/api/datadoghq/common/const.go
Line 28 in 010c848
On the other hand, the app.kubernetes.io/instance label seems to be set here:
datadog-operator/internal/controller/datadogagent/object/labels.go
Line 24 in 010c848
with the value from here:
datadog-operator/internal/controller/datadogagent/component/agent/default.go
Lines 90 to 93 in 010c848
It looks like in our case the value would be datadog-agent, while the network policy has the hard-coded agent, hence the mismatch in pod selector, leading to the network policies not targeting the correct pods.
Potential solution
I'm guessing this discrepancy appeared when the Datadog Operator added support for running multiple DD agents at the same time, and probably migrated to using <agent-name>-agent, <agent-name>-cluster-agent etc. as the app.kubernetes.io/instance value, while not updating the network policy accordingly.
The solution would be to correctly set the pod selectors for the created network policies, for each Datadog agent created by the operator.
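To make the mismatch and the proposed direction concrete, here is a minimal Python sketch of matchLabels semantics. It is only an illustration, not the operator's actual code: selector_matches and instance_label are hypothetical helpers, the agent name datadog is an assumption inferred from the datadog-agent label value, and all other labels are omitted.

```python
INSTANCE_LABEL = "app.kubernetes.io/instance"


def selector_matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value must be present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def instance_label(agent_name: str, component: str) -> str:
    """Hypothetical helper: build the instance value as <agent-name>-<component>."""
    return f"{agent_name}-{component}"


# Instance label on the node agent pod, as described in the report.
pod_labels = {INSTANCE_LABEL: "datadog-agent"}

# Selector with the hard-coded value the report found in the network policy.
hard_coded_selector = {INSTANCE_LABEL: "agent"}

# Selector derived from the agent name (assumed here to be "datadog").
derived_selector = {INSTANCE_LABEL: instance_label("datadog", "agent")}

print(selector_matches(hard_coded_selector, pod_labels))  # False: policy targets no pod
print(selector_matches(derived_selector, pod_labels))     # True: policy targets the agent pod
```

The same derivation would apply to the cluster agent, where instance_label("datadog", "cluster-agent") yields datadog-cluster-agent under the naming scheme guessed above.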
Environment
We are using the latest versions for both the operator and the agent, available at the time of writing.
Operator helm chart version: 2.1.0
Operator version: 1.9.0
Agent version: 7.58.1
Kubernetes Distribution: EKS