fix up startup / informer usage #438

Closed
danwinship opened this issue Jun 7, 2022 · 4 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@danwinship

(We probably won't actually fix this, but I did the analysis and want to dump it somewhere.)

We keep running into problems with the fact that openshift-sdn-node doesn't wait for its informers to be ready at startup.

Here's what sdn-node startup currently looks like:

pkg/cmd/openshift-sdn/node/cmd.go: run()

  sdn.init()
    sdn.buildInformers()
    sdn.initSDN()
      sdnnode.New()
        NewNetworkPolicyPlugin()
          newNodeVNIDMap()
        NewOVSController()
        common.NewEgressDNS()
        newHostSubnetWatcher()
        newPodManager()
        newEgressIPWatcher()
          common.NewEgressIPTracker()
    sdn.initProxy()
      sdnproxy.New()
        common.NewEgressDNS()

  sdn.start()
    sdn.runSDN()
      sdn.osdnNode.Start()
        node.getLocalSubnet()
        newNodeIPTables()
        nodeIPTables.Setup()
        node.SetupSDN()
        hostSubnets.Start()
          watchHostSubnets()
        node.policy.Start()
          vnids.Start()
            populateVNIDs()
              common.ListAllNetNamespaces()
            watchNetNamespaces()
          initNamespaces()
            common.ListAllNamespaces()
            common.ListAllNetworkPolicies()
          watchNamespaces()
          watchPods()
          watchNetworkPolicies()
        node.SetupEgressNetworkPolicy()
          common.ListAllEgressNetworkPolicies()
          plugin.policy.GetVNID()
          plugin.watchEgressNetworkPolicies()
        node.egressIP.Start()
          eip.tracker.Start()
            watchHostSubnets()
            watchNetNamespaces()
            watchNodes()
            go WaitForCacheSync()
        node.watchServices()
        node.podManager.InitRunningPods()
          m.policy.GetVNID()
        node.podManager.Start()
        node.reattachPods()
          node.podManager.handleCNIRequest()
            podManager.setup()
              m.kClient.CoreV1().Pods().Get()
              m.policy.GetVNID()
        node.FinishSetupSDN()
    sdn.runProxy()
      newProxyServer()
      wrapProxy()
        sdn.SetBaseProxies()
          NewHybridProxier()
        sdn.osdnProxy.Start()
          common.GetParsedClusterNetwork()
          common.ListAllEgressNetworkPolicies()
          proxy.watchEgressNetworkPolicies()
          proxy.watchNetNamespaces()
      startProxyServer()
        informers.NewSharedInformerFactoryWithOptions()
        informerFactory.Start()

    sdn.informers.start()

Ideally we would call sdn.informers.start() between sdn.init() and sdn.start(), and then get rid of all of the ListAll* methods and just use the informer listers. But this would require reorganizing things so that all of the "watch" methods get called at init time. Right now the split between init and start exists mostly to allow for unit tests that don't use clients/informers, but that could be fixed by using fake clients/informers in the unit tests.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 6, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 6, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Nov 6, 2022
@openshift-ci

openshift-ci bot commented Nov 6, 2022

@openshift-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
