Standalone Egress NAT #299
@p-strusiewiczsurmacki-mobica |
Thank you for checking it. It's better if we can run e2e tests for stand-alone egress mode. |
Hi, @p-strusiewiczsurmacki-mobica ! Thank you for waiting. The main changes look good to me. |
66abdf2 to df10c75
I agree this would be great. However, I need to think about how this test should be done. The problem with kindnet is that it requires a different CNI config on each node, so I would need to add a script that gets the node subnet from the kind cluster's nodes after the cluster is created and uses that data to update the CNI config after |
Yes, it isn't easy to configure the cluster to use kindnet and coil. OK, in this PR, it's fine to have no e2e test for egress-only mode. |
@terassyi I've added egress-only e2e tests to |
59227cc to 2eb29ee
2c074c1 to 8958f4b
@terassyi Just one important thing - it turned out that |
@p-strusiewiczsurmacki-mobica
In this PR, it's ok to turn on When I tried to turn out But, I think |
Could you update the |
@terassyi Done. :)
It seems that controller-gen only accepts directories for manifest generation, so I had to add some workaround for that (copying |
When I try to run the e2e test, it fails with the following error.
I ran the following commands.
$ make -C .. manifests
$ WITH_KINDNET=true TEST_IPV6=false make start
$ make install-coil-egress-v4 |
I believe you're missing a file So, you'd have to run
|
Thanks! It seems the procedure for running e2e tests is getting complex, so could you update |
It seems webhook-related tests in small test fail. |
Lastly, I want you to update CI to run e2e tests for all combinations, such as egress-only, egress-only with IPv6, egress+IPAM, etc. When you want to run CI, please mention me. |
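To make "all combinations" concrete, here is a minimal Go sketch that enumerates a possible test matrix. The `combo` type, the job labels, and the inclusion of an IPAM-only mode are assumptions made for illustration; the CI workflow actually added in this PR may cover a different set.

```go
package main

import "fmt"

// combo is one hypothetical CI job: which coil features are enabled and
// which IP family the kind cluster uses. It is not a type from the coil repo.
type combo struct {
	IPAM, Egress, IPv6 bool
}

// name returns a readable job label such as "egress-only-v6".
func (c combo) name() string {
	mode := "ipam+egress"
	switch {
	case c.IPAM && !c.Egress:
		mode = "ipam-only"
	case !c.IPAM && c.Egress:
		mode = "egress-only"
	}
	family := "v4"
	if c.IPv6 {
		family = "v6"
	}
	return fmt.Sprintf("%s-%s", mode, family)
}

// matrix enumerates every feature/IP-family combination, skipping the case
// where both features are disabled because there is nothing to test.
func matrix() []combo {
	var out []combo
	for _, ipam := range []bool{true, false} {
		for _, egress := range []bool{true, false} {
			if !ipam && !egress {
				continue
			}
			for _, v6 := range []bool{false, true} {
				out = append(out, combo{IPAM: ipam, Egress: egress, IPv6: v6})
			}
		}
	}
	return out
}

func main() {
	for _, c := range matrix() {
		fmt.Println(c.name())
	}
}
```

Under these assumptions the matrix has six jobs: three feature modes, each run with IPv4 and IPv6.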
a058344 to 9ed43c5
@terassyi The small test should be fixed, and actions were added to the CI. Should I squash the commits before it's merged? |
Thanks! It passes all CIs.
Yes, please. |
9ed43c5 to 09d9a0e
@terassyi It's squashed now :) |
Thanks! I found that the small test seems to be flaky. |
Signed-off-by: Patryk Strusiewicz-Surmacki <[email protected]>
Update v2/cmd/coil-egress-controller/sub/root.go
Update v2/runners/coild_server.go
Update v2/pkg/nodenet/pod.go
Update v2/pkg/cnirpc/cni.proto
Co-authored-by: Tomoya Terashima <[email protected]>
b065851 to 7483dee
@terassyi I've fixed the warnings and added some fixes for the CI jobs.
But I believe this might be out of our reach here, as it seems to be a somewhat known issue with controller-runtime's cache. I tried disabling the cache altogether, but for some reason it did not work for me. I believe those were also not introduced by my changes, as I can see them in other CI runs, e.g.: https://github.com/cybozu-go/coil/actions/runs/10804727159/job/29970647406 If you see anything else I could fix, just let me know. |
Hi @terassyi, |
Hi, I'm checking this to be able to merge. |
I re-reviewed all changes, and I want to discuss the need for I think it's enough to handle these flags in coild. Currently, we check the features we want to use by values given by the coil CNI plugin. The only concern I have now is the following code. But we can solve this by moving this error handling to coild. If we can keep the CNI configuration simple and unchanged, it's better for all users. |
Signed-off-by: Patryk Strusiewicz-Surmacki <[email protected]>
…ress-settings-fix Standalone egress settings fix
@terassyi |
"missing pod name/namespace", fmt.Sprintf("%+v", args.Args))
isChained, err := getSettings(args)
if err != nil {
	return nil, newInternalError(fmt.Errorf("runtime error"), "failed to get CNi arguments")
Suggested change:
- return nil, newInternalError(fmt.Errorf("runtime error"), "failed to get CNi arguments")
+ return nil, newInternalError(fmt.Errorf("runtime error"), "failed to get CNI arguments")
@@ -88,7 +92,8 @@ func (n natSetup) Hook(l []GWNets, log *zap.Logger) func(ipv4, ipv6 net.IP) erro
 }

 // NewCoildServer returns an implementation of cnirpc.CNIServer for coild.
-func NewCoildServer(l net.Listener, mgr manager.Manager, nodeIPAM ipam.NodeIPAM, podNet nodenet.PodNetwork, setup NATSetup, logger *zap.Logger) manager.Runnable {
+func NewCoildServer(l net.Listener, mgr manager.Manager, nodeIPAM ipam.NodeIPAM, podNet nodenet.PodNetwork, setup NATSetup, cfg *config.Config, logger *zap.Logger,
+	aliasFunc func(interfaces map[string]bool, conf *nodenet.PodNetConf, logger *zap.Logger, pod *corev1.Pod) error) manager.Runnable {
Why did you add the callback function to add the interface alias?
if err != nil {
	logger.Sugar().Errorw("failed to allocate address", "error", err)
	return nil, newInternalError(err, "failed to allocate address")
if !s.cfg.EnableIPAM && !s.cfg.EnableEgress { |
I think it is enough to check this only when starting the coild server.
if err != nil {
	return fmt.Errorf("netlink: failed to look up the host-side veth [%s]: %w", ifName, err)
}
logger.Sugar().Infof("link found: %v", hLink)
We can remove this log.
@p-strusiewiczsurmacki-mobica I added some comments :) |
This PR introduces Standalone Egress NAT as discussed in #274.

- `coil-controller` is now divided into `coil-ipam-controller` and `coil-egress-controller`.
- `coild` now has configuration flags to disable/enable egress and/or IPAM features.
- `coil` now has `capabilities` fields that can be used to disable/enable IPAM/Egress.
- `setup.md`.
- `veth` aliases will now use the pod's UUID instead of the container's ID (I couldn't get the e2e test working using the container's ID; I believe the container ID changes during pod restart, but as IPAM is disabled it is not updated as required).

The PR was tested using the egress-related e2e tests with both Kindnet and Calico, and the tests provided in the repository passed.
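As a rough illustration of the feature switches described above, the sketch below parses two boolean flags with Go's standard `flag` package. The flag names `enable-ipam` and `enable-egress`, their defaults, and the `featureFlags` type are assumptions made for this example; the real coild flag names and flag library may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// featureFlags holds the two hypothetical feature switches.
type featureFlags struct {
	EnableIPAM   bool
	EnableEgress bool
}

// parseFeatureFlags parses the given arguments. Both features default to
// enabled here, which is an assumption of this sketch.
func parseFeatureFlags(args []string) (featureFlags, error) {
	var f featureFlags
	fs := flag.NewFlagSet("coild", flag.ContinueOnError)
	fs.BoolVar(&f.EnableIPAM, "enable-ipam", true, "enable the IPAM feature")
	fs.BoolVar(&f.EnableEgress, "enable-egress", true, "enable the egress NAT feature")
	if err := fs.Parse(args); err != nil {
		return featureFlags{}, fmt.Errorf("parsing flags: %w", err)
	}
	return f, nil
}

func main() {
	// Standalone egress: IPAM is left to another CNI plugin.
	f, err := parseFeatureFlags([]string{"--enable-ipam=false"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("ipam=%v egress=%v\n", f.EnableIPAM, f.EnableEgress)
}
```

In a standalone egress setup, IPAM would be switched off here and handled by the primary CNI plugin, such as Kindnet or Calico in the tests mentioned above.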