Add kindnet network plugin #17158

Merged
merged 2 commits into from
Jan 8, 2025

aojea
Member

@aojea aojea commented Dec 29, 2024

Kindnet has been running in Kubernetes CI for a while and some people use it. I've been adding new features like DNS cache, kernel bypass, or admin network policies that are not present in all the other common CNIs.

Integration with kops will help improve its testing and will also benefit users who are looking for a more minimalist CNI plugin.

https://github.com/aojea/kindnet

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 29, 2024
@k8s-ci-robot k8s-ci-robot added area/addons cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/api area/documentation size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 29, 2024
@aojea
Member Author

aojea commented Dec 29, 2024

/assign @justinsb @hakman

@aojea
Member Author

aojea commented Dec 30, 2024

OK, it is working now:

 kops validate cluster --wait 10m
Using cluster from kubectl context: myclustername.kindnet.io

Validating cluster myclustername.kindnet.io

I1230 18:02:56.188347 3334459 gce_cloud.go:307] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-c     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-c             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-c-bw8j        control-plane   True
nodes-us-central1-c-3rkx                node            True

Your cluster myclustername.kindnet.io is ready

@aojea aojea changed the title [WIP] add kindnet network plugin add kindnet network plugin Dec 31, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 31, 2024
@aojea
Member Author

aojea commented Dec 31, 2024

The failed job is unrelated:

Kubernetes e2e suite: [It] [sig-storage] In-tree Volumes [Driver: hostPathSymlink] [Testpattern: Inline-volume (default fs)] subPath should support readOnly directory specified in the volumeMount

This is ready for review

@hakman hakman changed the title add kindnet network plugin Add kindnet network plugin Dec 31, 2024
Member

@hakman hakman left a comment

LGTM, just a few comments. Thanks @aojea!
I will send a PR for a pre-submit for kindnet.

cmd/kops/integration_test.go Outdated
pkg/apis/kops/networking.go
Member

@rifelpet rifelpet left a comment

We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman ?

pkg/model/components/kindnet.go Outdated
@aojea
Member Author

aojea commented Dec 31, 2024

We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman ?

Kindnet is already used in CI for other Kubernetes jobs and projects (https://grep.app/search?q=aojea/kindnet), so I don't expect breaking compatibility within kindnet. However, I need your advice on how to get better integration with kops.

@hakman
Member

hakman commented Dec 31, 2024

We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman?

I think we can skip the flag, as long as we add a warning in the doc file.
Also, it may be a good idea to add a mention here:
https://kops.sigs.k8s.io/networking/#supported-networking-options

@hakman
Member

hakman commented Jan 6, 2025

@aojea Not sure if Kindnet needs the CNI Network Plugin binaries.
If it doesn't require those binaries in /opt/cni/bin/, please add Kindnet to this function:

func (c *Cluster) InstallCNIAssets() bool {
	return c.Spec.Networking.AmazonVPC == nil &&
		c.Spec.Networking.Calico == nil &&
		c.Spec.Networking.Cilium == nil
}

Besides the few remaining nits and the 3 failing tests, I think this is pretty good as it is.
The 3 tests can also be fixed at a later time, as Kindnet is marked as experimental.

@aojea
Member Author

aojea commented Jan 7, 2025

@aojea Not sure if Kindnet needs the CNI Network Plugin binaries. If it doesn't require those binaries in /opt/cni/bin/, please add Kindnet to this function:

func (c *Cluster) InstallCNIAssets() bool {
	return c.Spec.Networking.AmazonVPC == nil &&
		c.Spec.Networking.Calico == nil &&
		c.Spec.Networking.Cilium == nil
}

Besides the few remaining nits and the 3 failing tests, I think this is pretty good as it is. The 3 tests can also be fixed at a later time, as Kindnet is marked as experimental.

Great, I will address the last comments and work on this today.

/test pull-kops-e2e-cni-kindnet

@aojea aojea force-pushed the kindnet branch 2 times, most recently from 2bdbd59 to a36c140 January 7, 2025 15:18
@aojea
Member Author

aojea commented Jan 7, 2025

/test pull-kops-e2e-cni-kindnet

get more networking information useful for troubleshooting network issues.
@aojea aojea force-pushed the kindnet branch 3 times, most recently from 3827c06 to 0aae903 January 8, 2025 00:58
add kindnet as an experimental network addon

containerd adds the requirement to use the loopback CNI plugin;
kindnet provides that capability, and containerd does not require it
since containerd/containerd/pull/10238

Change-Id: I1397a90186885b02e98b5ffa444fe629c1046757
@aojea
Member Author

aojea commented Jan 8, 2025

@aojea Not sure if Kindnet needs the CNI Network Plugin binaries.
If it doesn't require those binaries in /opt/cni/bin/, please add Kindnet to this function:

It needs the loopback plugin, so it does not hurt to have those.
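A minimal, self-contained sketch of the predicate quoted above, using hypothetical stand-in types for the kops cluster spec (only the fields the function reads, plus a Kindnet field for illustration). Following the reply, Kindnet is deliberately left out of the exclusion list, so a Kindnet cluster still gets the CNI binaries (including loopback) while a Cilium cluster does not:

```go
package main

import "fmt"

// Hypothetical stand-ins for the kops API types; only the fields needed
// by InstallCNIAssets are modelled here.
type AmazonVPCNetworkingSpec struct{}
type CalicoNetworkingSpec struct{}
type CiliumNetworkingSpec struct{}
type KindnetNetworkingSpec struct{}

type NetworkingSpec struct {
	AmazonVPC *AmazonVPCNetworkingSpec
	Calico    *CalicoNetworkingSpec
	Cilium    *CiliumNetworkingSpec
	Kindnet   *KindnetNetworkingSpec
}

type ClusterSpec struct {
	Networking NetworkingSpec
}

type Cluster struct {
	Spec ClusterSpec
}

// InstallCNIAssets reports whether the standard CNI plugin binaries
// (installed under /opt/cni/bin/) should be provisioned. Kindnet is not
// in the exclusion list because it still relies on the loopback plugin.
func (c *Cluster) InstallCNIAssets() bool {
	return c.Spec.Networking.AmazonVPC == nil &&
		c.Spec.Networking.Calico == nil &&
		c.Spec.Networking.Cilium == nil
}

func main() {
	kindnet := &Cluster{Spec: ClusterSpec{Networking: NetworkingSpec{Kindnet: &KindnetNetworkingSpec{}}}}
	cilium := &Cluster{Spec: ClusterSpec{Networking: NetworkingSpec{Cilium: &CiliumNetworkingSpec{}}}}
	fmt.Println("kindnet installs CNI assets:", kindnet.InstallCNIAssets())
	fmt.Println("cilium installs CNI assets:", cilium.InstallCNIAssets())
}
```

Leaving Kindnet off the list costs little: the extra binaries are installed, and kindnet can count on loopback being present.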

Besides the few remaining nits and the 3 failing tests, I think this is pretty good as it is. The 3 tests can also be fixed at a later time, as Kindnet is marked as experimental.

I've added some rudimentary network connection logging to kindnet that is very useful.

Checking the failed test [It] [sig-network] Networking Granular Checks: Services should be able to handle large requests: http, it tries to connect to &protocol=http&host=100.70.71.8&port=80&tries=1

The connections are made from:

I0107 18:58:52.119749 49720 resource.go:175] test-container-pod i-09094242e1b8c58f1 Running [{PodReadyToStartContainers True 0001-01-01 00:00:00 +0000 UTC 2025-01-07 18:52:27 +0000 UTC } {Initialized True 0001-01-01 00:00:00 +0000 UTC 2025-01-07 18:52:26 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2025-01-07 18:52:27 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2025-01-07 18:52:27 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2025-01-07 18:52:26 +0000 UTC }]

That is node i-09094242e1b8c58f1. The kindnet pod on that node (it appears in the test output log) has this log: https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/kops/17158/pull-kops-e2e-cni-kindnet/1876696173184552960/artifacts/cluster-info/kube-system/kindnet-kndvd/kindnet-cni.log

We can see the connections go through:

I0107 18:54:02.672517 1 conntrack.go:175] Connection finished: [EventDestroy] Timeout: 0, <tcp, Src: 100.96.4.80:43422, Dst: 100.70.71.8:80>, Zone 0, Acct: [orig: 3 pkts/152 B] [reply: 2 pkts/112 B], , , , ,

but with practically no bytes transferred: the connection is established but closed immediately.
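To make the "practically no bytes" observation concrete, here is a small sketch that extracts the accounting counters from the conntrack line quoted above. The regular expression is inferred from that single sample line and is an assumption about kindnet's log format, not a documented interface:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// acctRE matches accounting sections such as "[orig: 3 pkts/152 B]".
// Inferred from one sample log line; kindnet may format this differently.
var acctRE = regexp.MustCompile(`\[(orig|reply): (\d+) pkts/(\d+) B\]`)

// Counters holds packet and byte totals for one direction of a flow.
type Counters struct {
	Packets int
	Bytes   int
}

// parseAcct returns per-direction counters keyed by "orig" or "reply".
func parseAcct(line string) map[string]Counters {
	out := map[string]Counters{}
	for _, m := range acctRE.FindAllStringSubmatch(line, -1) {
		// The regexp only matches digits, so Atoi errors are not expected.
		pkts, _ := strconv.Atoi(m[2])
		bytes, _ := strconv.Atoi(m[3])
		out[m[1]] = Counters{Packets: pkts, Bytes: bytes}
	}
	return out
}

func main() {
	line := `I0107 18:54:02.672517 1 conntrack.go:175] Connection finished: [EventDestroy] Timeout: 0, <tcp, Src: 100.96.4.80:43422, Dst: 100.70.71.8:80>, Zone 0, Acct: [orig: 3 pkts/152 B] [reply: 2 pkts/112 B], , , , ,`
	acct := parseAcct(line)
	for _, dir := range []string{"orig", "reply"} {
		c := acct[dir]
		if c.Packets == 0 {
			continue
		}
		fmt.Printf("%s: %d pkts, %d B, ~%d B/pkt\n", dir, c.Packets, c.Bytes, c.Bytes/c.Packets)
	}
}
```

For the sample line this yields 152 B over 3 packets (orig) and 112 B over 2 packets (reply), roughly 50 to 56 bytes per packet. That is about the size of bare IPv4 plus TCP headers with options, which is consistent with a handshake and teardown carrying no HTTP payload.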

I will work on those tests after this merges. The failure seems specific to this environment, as I could not reproduce it locally or in other environments, so @hakman this should be ready to go.

@hakman
Member

hakman commented Jan 8, 2025

Thanks @aojea! 🙂
/lgtm
/retest
/assign @justinsb

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 8, 2025
@hakman
Member

hakman commented Jan 8, 2025

/retest

@aojea
Member Author

aojea commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet

@hakman
Member

hakman commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet

All green now!!! 👏

@aojea
Member Author

aojea commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet

All green now!!! 👏

It was an MTU issue :) aojea/kindnet#143

@hakman
Member

hakman commented Jan 8, 2025

/test pull-kops-e2e-cni-kindnet

All green now!!! 👏

It was an MTU issue :) aojea/kindnet#143

but could have been DNS 🤣

--zones $ZONES \
--networking kindnet \
--yes \
--name myclustername.mydns.io
Member

Nit: we normally use example.com; I think that domain is reserved for examples.

@@ -5773,6 +5773,39 @@ spec:
description: GCPNetworkingSpec is the specification of GCP's native
networking mode, using IP aliases.
type: object
kindnet:
description: KindnetNetworkingSpec configures Kindnet settings.
Member

Nit: an annoying thing about Go docs mapping to OpenAPI docs is that we probably don't want to follow normal Go conventions for comments. The reader of the OpenAPI docs doesn't see the name KindnetNetworkingSpec on the struct, because OpenAPI doesn't have structs (or we don't use them).

if v.Kindnet != nil {
	if optionTaken {
		allErrs = append(allErrs, field.Forbidden(fldPath.Child("kindnet"), "only one networking option permitted"))
	}
Member

We probably should be setting optionTaken here, but we're also missing it on LyftVPC and GCP. I'll send a fix separately :-)
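A stdlib-only sketch of the fix described above, where every networking branch both checks and sets optionTaken. The spec type, field names, and plain errors (instead of field.Forbidden from k8s.io/apimachinery) are simplifications for illustration, not the kops validation code:

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkingSpec is a hypothetical, trimmed-down spec: each non-nil
// field selects one networking option.
type NetworkingSpec struct {
	LyftVPC *struct{}
	GCP     *struct{}
	Kindnet *struct{}
}

// validateSingleOption rejects specs that select more than one option.
// Every branch records that an option was taken, which is the step the
// review notes is missing for LyftVPC and GCP.
func validateSingleOption(v NetworkingSpec) []error {
	var errs []error
	optionTaken := false

	if v.LyftVPC != nil {
		if optionTaken {
			errs = append(errs, errors.New("lyftvpc: only one networking option permitted"))
		}
		optionTaken = true
	}
	if v.GCP != nil {
		if optionTaken {
			errs = append(errs, errors.New("gcp: only one networking option permitted"))
		}
		optionTaken = true
	}
	if v.Kindnet != nil {
		if optionTaken {
			errs = append(errs, errors.New("kindnet: only one networking option permitted"))
		}
		optionTaken = true
	}
	return errs
}

func main() {
	both := NetworkingSpec{GCP: &struct{}{}, Kindnet: &struct{}{}}
	fmt.Println(validateSingleOption(both))
}
```

Without the optionTaken = true line in the GCP branch, a spec that sets both GCP and Kindnet would pass validation.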

func validateNetworkingKindnet(cluster *kops.Cluster, v *kops.KindnetNetworkingSpec, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}

	if v.Masquerade != nil && v.Masquerade.Enabled != nil && *v.Masquerade.Enabled {
Member

Nit: we have fi.ValueOf, so v.Masquerade.Enabled != nil && *v.Masquerade.Enabled can be written as fi.ValueOf(v.Masquerade.Enabled).
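A sketch of the equivalence the nit points at (Go 1.18+ for generics). ValueOf below is a local helper written to behave the way fi.ValueOf is assumed to (dereference, or return the zero value for nil), and MasqueradeSpec is a hypothetical stand-in. The outer nil check on the Masquerade struct is still needed, since ValueOf only guards the Enabled pointer:

```go
package main

import "fmt"

// ValueOf returns *v, or the zero value of T when v is nil. Assumed to
// mirror fi.ValueOf; redefined here so the example is self-contained.
func ValueOf[T any](v *T) T {
	if v == nil {
		var zero T
		return zero
	}
	return *v
}

// MasqueradeSpec is a hypothetical stand-in for the Kindnet masquerade settings.
type MasqueradeSpec struct {
	Enabled *bool
}

// explicit is the form used in the diff.
func explicit(m *MasqueradeSpec) bool {
	return m != nil && m.Enabled != nil && *m.Enabled
}

// withValueOf is the suggested shorter form.
func withValueOf(m *MasqueradeSpec) bool {
	return m != nil && ValueOf(m.Enabled)
}

func main() {
	on, off := true, false
	for _, m := range []*MasqueradeSpec{nil, {}, {Enabled: &on}, {Enabled: &off}} {
		fmt.Println(explicit(m) == withValueOf(m))
	}
}
```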

@@ -345,7 +345,19 @@ func (n *logDumperNode) dump(ctx context.Context) []error {
if err := n.shellToFile(ctx, "sudo iptables -t filter --list-rules", filepath.Join(n.dir, "iptables-filter.log")); err != nil {
errors = append(errors, err)
}
if err := n.shellToFile(ctx, "ip route", filepath.Join(n.dir, "ip-routes.log")); err != nil {
if err := n.shellToFile(ctx, "sudo nft list ruleset", filepath.Join(n.dir, "nftables-ruleset.log")); err != nil {
Member

Awesome 👍

c.Masquerade.NonMasqueradeCIDRs = append(c.Masquerade.NonMasqueradeCIDRs, clusterSpec.Networking.PodCIDR)
}
if clusterSpec.Networking.ServiceClusterIPRange != "" {
c.Masquerade.NonMasqueradeCIDRs = append(c.Masquerade.NonMasqueradeCIDRs, clusterSpec.Networking.ServiceClusterIPRange)
Member

I like making this explicit 👍

@@ -777,6 +789,12 @@ func addKubeRouterSrcDstCheckPermissions(p *Policy) {
)
}

func addKindnetSrcDstCheckPermissions(p *Policy) {
p.unconditionalAction.Insert(
"ec2:ModifyInstanceAttribute",
Member

Not a blocker, but this has security/performance implications IIRC. Do we need this permission in all modes? (e.g. if we're running IPv6, do we need it?)

@justinsb
Member

justinsb commented Jan 8, 2025

This LGTM, awesome work @aojea 🎉 My only question is about those src/dest checks: if we can avoid needing to turn them off, that would be awesome. But my 2c is that the easiest way to find out is to merge this, then send a follow-on PR that tries without turning them off and see what breaks, so I'm in favor of merging!

Any objections to merging @hakman / @rifelpet ?

@hakman
Member

hakman commented Jan 8, 2025

Any objections to merging @hakman / @rifelpet ?

@justinsb 👍, though I would like to add an IPv6 pre-submit if we want to not disable src/dest checks for that case. In any case, I think enabling IPv6 support will require some additional work.

@aojea
Member Author

aojea commented Jan 8, 2025

In any case, I think enabling IPv6 support will require some additional work.

For kindnet, it should work out of the box.
Cloud providers may need some changes; these were my tests of IPv6 with kubeadm in AWS https://github.com/aojea/k8s-aws-ipv6 and in GCE https://gist.github.com/aojea/b5c18f99ed048ef6a0c06640e3ab4de7

@justinsb
Member

justinsb commented Jan 8, 2025

would like to add an IPv6 pre-submit if we want to not disable src/dest checks for that case. In any case, I think enabling IPv6 support will require some additional work.

I think that's a great idea, and I think we should do that in follow-on PRs.

@justinsb
Member

justinsb commented Jan 8, 2025

OK I think we can get this in!

/approve
/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2025
@k8s-ci-robot k8s-ci-robot merged commit 2db9dbc into kubernetes:master Jan 8, 2025
33 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Jan 8, 2025
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api area/documentation area/nodeup cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants