Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VFs may get reseted after being allocated by other pod #219

Closed
ormergi opened this issue Aug 23, 2022 · 6 comments · Fixed by #220
Closed

VFs may get reseted after being allocated by other pod #219

ormergi opened this issue Aug 23, 2022 · 6 comments · Fixed by #220

Comments

@ormergi
Copy link

ormergi commented Aug 23, 2022

What happened?

In the scenario where pods with SR-IOV interface from the same resource pool are created and deleted a few times,
the underlying VF may end up with a default configuration instead of the desired one (i.e: no MAC address, no VLAN).

What did you expect to happen?

The pod underlying VF is to be configured correctly.

What are the minimal steps needed to reproduce the bug?

Terminal 1: Monitor SR-IOV VFs state on node worker1
Terminal 2:

  • Create a pod test with:
    • nodeSelector to node worker1.
    • SR-IOV interface with MAC address 02:02:02:02:02.
  • Wait for the pod to be ready.
  • Delete the pod test in the background and immediately create a similar pod test2.
  • Wait for pod test2 to be ready.

After a few iterations, we saw that the pod is Running but looking at the node VFs
it shows that it's not configured with the desired MAC address.

Anything else we need to know?

Scripts and manifests I used for reproducing the issue:
https://gist.github.com/ormergi/3ddbf901ddc95baf316b604994285a69

It seems that when the pod (1) is deleted and CNI cmdDEL command been executed, it will reset the VF whether it's been allocated by another pod or not.

Also, it's not guaranteed that as soon as a pod is disposed its underlying VF is reseted.
It takes 1-2 seconds for it to reset, which seem odd because I would expect all its resource to be free when it's gone.

Component Versions

Component Version
SR-IOV CNI Plugin v2.6
Multus v3.8
SR-IOV Network Device Plugin v3.3
Kubernetes v1.22.2
OS RHEL 8.6

Config Files

Config file locations may be config dependent.

CNI config (Try '/etc/cni/net.d/')
cat /etc/cni/net.d/00-multus.conf
{ "cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus", "capabilities": {"portMappings": true}, "logLevel": "debug", "logFile": "/var/log/multus.log", "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ { "cniVersion": "0.3.1", "name": "kindnet", "plugins": [ { "type": "ptp", "ipMasq": false, "ipam": { "type": "host-local", "dataDir": "/run/cni-ipam-state", "routes": [ { "dst": "0.0.0.0/0" } ], "ranges": [ [ { "subnet": "10.244.2.0/24" } ] ] } , "mtu": 1500 }, { "type": "portmap", "capabilities": { "portMappings": true } } ] } ] }
cat /etc/cni/net.d/10-kindnet.conflist
{
        "cniVersion": "0.3.1",
        "name": "kindnet",
        "plugins": [
        {
                "type": "ptp",
                "ipMasq": false,
                "ipam": {
                        "type": "host-local",
                        "dataDir": "/run/cni-ipam-state",
                        "routes": [
                                { "dst": "0.0.0.0/0" }
                        ],
                        "ranges": [
                                [ { "subnet": "10.244.2.0/24" } ]
                        ]
                }
                ,
                "mtu": 1500
        },
        {
                "type": "portmap",
                "capabilities": {
                        "portMappings": true
                }
        }
        ]
}
Device pool config file location (Try '/etc/pcidp/config.json')
{
  "resourceList": [{
    "resourceName": "sriov_net",
    "selectors": {
      "drivers": ["vfio-pci"],
      "pfNames": ["enp4s0f0","enp4s0f1"]
    }
  }]
}
Multus config (Try '/etc/cni/multus/net.d')

see ##### CNI config

Kubernetes deployment type ( Bare Metal, Kubeadm etc.)

Kind deployment

Kubeconfig file
SR-IOV Network Custom Resource Definition
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    k8s.v1.cni.cncf.io/resourceName: kubevirt.io/sriov_net
  name: sriov-network
spec:
  config: "{ \n    \"cniVersion\":\"0.3.1\", \n    \"name\":\"sriov-network\",\n
    \   \"type\":\"sriov\",\n    \"vlan\":0,\n    \"spoofchk\":\"on\",\n    \"trust\":\"off\",\n
    \   \"vlanQoS\":0,\n    \"link_state\":\"enable\",\n    \"ipam\":{}\n}"

Logs

SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)

Journal log including Multus logs from when the issue accord https://pastebin.com/eGhyrevk.

@ormergi
Copy link
Author

ormergi commented Aug 24, 2022

Script and manifests I used in order to reproduce this https://gist.github.com/ormergi/3ddbf901ddc95baf316b604994285a69.
Journal log from one of the iterations when the issue occurred https://pastebin.com/eGhyrevk.

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 24, 2022

Hi @ormergi can this be related to #126 ?

@ormergi
Copy link
Author

ormergi commented Aug 25, 2022

@SchSeba thanks for looking at this.
It seems that #126 is about kubelet won't keep the pod alive so that the plugin will manage to reach its netns and do its thing.
In our case, the plugin executes as we can see the VFs get reset after the pod is allegedly disposed though.

EdDev added a commit to EdDev/kubevirt that referenced this issue Aug 25, 2022
SR-IOV tests execution on CI has shown flakes where the VF configuration
has been modified after assigning the device to the pod.

The primary suspect is a CNI DELETE instruction from a previous pod
which is somehow racing with the re-assignment of the device to the new
pod.

In order to mitigate this suspected issue (and as side effect, prove it)
this change assures the VMI/s used with SR-IOV are deleted and absent
before starting the next test.

The issue is tracked on the sriov-cni project:
k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Edward Haas <[email protected]>
@SchSeba
Copy link
Collaborator

SchSeba commented Aug 25, 2022

@ormergi will you be able to provide the logs from multus? that is the only way we will be able to try and trace the issue I think

@ormergi
Copy link
Author

ormergi commented Aug 25, 2022

Script and manifests I used in order to reproduce this https://gist.github.com/ormergi/3ddbf901ddc95baf316b604994285a69. Journal log from one of the iterations when the issue occurred https://pastebin.com/eGhyrevk.

@SchSeba are these logs good enough https://pastebin.com/eGhyrevk?
This is a journal log of the node since the pod was created + 1 minute before, it also includes Multus logs.

EDIT: FWIW its much easier to review these logs in vscode.

SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 29, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 29, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 29, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 30, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 30, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 30, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Aug 30, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Sep 7, 2022
This commit add a type of lock to allocated pci address

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <[email protected]>
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Sep 12, 2022
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Nov 14, 2022
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Nov 14, 2022
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Nov 14, 2022
SchSeba added a commit to SchSeba/sriov-cni that referenced this issue Nov 14, 2022
@ormergi
Copy link
Author

ormergi commented Nov 17, 2022

@adrianchiris is there a planned release so we could consume the fixed version? 🙂

EdDev added a commit to EdDev/kubevirt that referenced this issue Nov 22, 2022
Place a delay between tests to assure resources (VF/s) are
fully released before reused again on a new VMI.
(results show that waiting for VMI/s to disappear is not enough)
Ref: k8snetworkplumbingwg/sriov-cni#219

This workaround should be temporary until the fix [1] can be consumed.
[1] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Edward Haas <[email protected]>
EdDev added a commit to EdDev/kubevirt that referenced this issue Nov 22, 2022
Place a delay between tests to assure resources (VF/s) are
fully released before reused again on a new VMI.
(results show that waiting for VMI/s to disappear is not enough)
Ref: k8snetworkplumbingwg/sriov-cni#219

This workaround should be temporary until the fix [1] can be consumed.
[1] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Edward Haas <[email protected]>
ormergi added a commit to ormergi/kubevirtci that referenced this issue Nov 29, 2022
Following the flakes we have all over SR-IOV lanes due to [1] and [2],
bump sriov-cni to v2.7.0 in order to consume the fix [3].

[1] kubevirt/kubevirt#6776
[2] k8snetworkplumbingwg/sriov-cni#219
[3] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Or Mergi <[email protected]>
kubevirt-bot pushed a commit to kubevirt/kubevirtci that referenced this issue Dec 1, 2022
Following the flakes we have all over SR-IOV lanes due to [1] and [2],
bump sriov-cni to v2.7.0 in order to consume the fix [3].

[1] kubevirt/kubevirt#6776
[2] k8snetworkplumbingwg/sriov-cni#219
[3] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Or Mergi <[email protected]>

Signed-off-by: Or Mergi <[email protected]>
cgoncalves pushed a commit to cgoncalves/sriov-cni that referenced this issue Dec 6, 2022
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <[email protected]>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Sebastian Sch <[email protected]>

Signed-off-by: Sebastian Sch <[email protected]>
cgoncalves pushed a commit to cgoncalves/sriov-cni that referenced this issue Dec 7, 2022
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <[email protected]>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Sebastian Sch <[email protected]>

Signed-off-by: Sebastian Sch <[email protected]>
cgoncalves pushed a commit to cgoncalves/sriov-cni that referenced this issue Dec 7, 2022
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <[email protected]>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Sebastian Sch <[email protected]>

Signed-off-by: Sebastian Sch <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/sriov-cni that referenced this issue Dec 7, 2022
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <[email protected]>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Sebastian Sch <[email protected]>

Signed-off-by: Sebastian Sch <[email protected]>
cgoncalves pushed a commit to cgoncalves/sriov-cni that referenced this issue Feb 8, 2023
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <[email protected]>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg/sriov-cni#219

Signed-off-by: Sebastian Sch <[email protected]>

Signed-off-by: Sebastian Sch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants