Multiple device plugin sockets are created for a single instance #551

Open
kate-goldenring opened this issue Jan 20, 2023 · 1 comment
Labels: bug, help wanted, keep-alive

kate-goldenring commented Jan 20, 2023

Describe the bug

Sometimes, the agent creates multiple device plugin sockets (under /var/lib/kubelet/device-plugins) for a single discovered device.
Notice how both /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock and /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock are created for instance udev-video-9d8a82.

kagold@kagold-ThinkPad-X1-Carbon-6th:~/projects/akri-notes$ sudo ls /var/lib/kubelet/device-plugins
kubelet_internal_checkpoint  udev-video-9d8a82-1674249624.sock  udev-video-d804b0-1674249623.sock
kubelet.sock                 udev-video-9d8a82-1674249625.sock  udev-video-d804b0-1674249625.sock

As a result, when the Akri Configuration is later deleted, only the most recently created socket for each instance is removed and the earlier one persists:

kagold@kagold-ThinkPad-X1-Carbon-6th:~/projects/akri-notes$ kubectl delete akric udev-video
configuration.akri.sh "udev-video" deleted
kagold@kagold-ThinkPad-X1-Carbon-6th:~/projects/akri-notes$ sudo ls /var/lib/kubelet/device-plugins
kubelet_internal_checkpoint  kubelet.sock  udev-video-9d8a82-1674249624.sock  udev-video-d804b0-1674249623.sock
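
The duplicates in the listings above can be spotted by grouping socket file names by instance. The following is a minimal sketch, assuming socket names follow the `<instance name>-<Unix timestamp>.sock` pattern that the listings appear to use. The helper names `parse_socket_name` and `duplicate_sockets` are hypothetical and are not part of the Akri agent.

```rust
use std::collections::BTreeMap;

// Splits "udev-video-9d8a82-1674249624.sock" into
// ("udev-video-9d8a82", 1674249624). Returns None for entries that do not
// match the assumed pattern, such as "kubelet.sock".
fn parse_socket_name(file_name: &str) -> Option<(String, u64)> {
    let stem = file_name.strip_suffix(".sock")?;
    let (instance, suffix) = stem.rsplit_once('-')?;
    let timestamp = suffix.parse::<u64>().ok()?;
    Some((instance.to_string(), timestamp))
}

// Groups socket timestamps by instance name and keeps only the instances
// that have more than one socket.
fn duplicate_sockets(file_names: &[&str]) -> BTreeMap<String, Vec<u64>> {
    let mut by_instance: BTreeMap<String, Vec<u64>> = BTreeMap::new();
    for name in file_names {
        if let Some((instance, timestamp)) = parse_socket_name(name) {
            by_instance.entry(instance).or_default().push(timestamp);
        }
    }
    by_instance.retain(|_, timestamps| {
        timestamps.sort_unstable();
        timestamps.len() > 1
    });
    by_instance
}

fn main() {
    // File names copied from the first directory listing in this issue.
    let listing = [
        "kubelet_internal_checkpoint",
        "kubelet.sock",
        "udev-video-9d8a82-1674249624.sock",
        "udev-video-9d8a82-1674249625.sock",
        "udev-video-d804b0-1674249623.sock",
        "udev-video-d804b0-1674249625.sock",
    ];
    for (instance, timestamps) in duplicate_sockets(&listing) {
        println!("{} has {} sockets: {:?}", instance, timestamps.len(), timestamps);
    }
}
```

Run against the listing taken after the Configuration was deleted, the same check would report nothing, since each instance then has a single leftover socket. Those leftovers would have to be compared against the set of live instances instead.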

The discovery handler appears to re-send the discovered devices before the DiscoveryOperator has finished creating the device plugins and before the instances have been created. Notice in the agent logs that handle_discovery_results is called twice and reports each instance as having come online twice.

Agent Logs

The agent logs the creation of both /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock and /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock for instance udev-video-9d8a82.

[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] internal_do_discover - got discovery results [Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }, Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] handle_discovery_results - for config udev-video with discovery results [Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }, Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:23Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-d804b0 came online
[2023-01-20T21:20:23Z INFO  agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-d804b0
[2023-01-20T21:20:23Z INFO  agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249623.sock
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] internal_do_discover - got discovery results [Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }, Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] handle_discovery_results - for config udev-video with discovery results [Device { id: "/dev/video2", properties: {"UDEV_DEVNODE": "/dev/video2"}, mounts: [Mount { container_path: "/dev/video2", host_path: "/dev/video2", read_only: true }], device_specs: [] }, Device { id: "/dev/video0", properties: {"UDEV_DEVNODE": "/dev/video0"}, mounts: [Mount { container_path: "/dev/video0", host_path: "/dev/video0", read_only: true }], device_specs: [] }]
[2023-01-20T21:20:24Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-9d8a82 came online
[2023-01-20T21:20:24Z INFO  agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-9d8a82
[2023-01-20T21:20:24Z INFO  agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249624.sock
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-d804b0 and socket_name: udev-video-d804b0-1674249623.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-9d8a82 and socket_name: udev-video-9d8a82-1674249624.sock
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:25Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-9d8a82 came online
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-9d8a82
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-9d8a82-1674249625.sock
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-d804b0
[2023-01-20T21:20:25Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:25Z TRACE agent::util::discovery_operator] handle_discovery_results - new instance udev-video-d804b0 came online
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] build_device_plugin - entered for device udev-video-d804b0
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-9d8a82
[2023-01-20T21:20:25Z INFO  agent::util::device_plugin_builder] serve - creating a device plugin server that will listen at: /var/lib/kubelet/device-plugins/udev-video-d804b0-1674249625.sock
[2023-01-20T21:20:26Z INFO  agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-d804b0 and socket_name: udev-video-d804b0-1674249625.sock
[2023-01-20T21:20:26Z INFO  agent::util::device_plugin_builder] register - entered for Instance akri.sh/udev-video-9d8a82 and socket_name: udev-video-9d8a82-1674249625.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_builder] register - before call to register with the kubelet at socket /var/lib/kubelet/device-plugins/kubelet.sock
[2023-01-20T21:20:26Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:26Z INFO  agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-d804b0
[2023-01-20T21:20:27Z TRACE agent::util::device_plugin_service] get_device_plugin_options - kubelet called get_device_plugin_options
[2023-01-20T21:20:27Z INFO  agent::util::device_plugin_service] list_and_watch - kubelet called list_and_watch for instance udev-video-9d8a82

Potential solution

To avoid the race condition in which a device plugin is recreated while its creation is still in progress, we should add the instance to the instance map before calling build_device_plugin, setting a new status, InstanceConnectivityStatus::Connecting. Then, once the DevicePluginService has been called by kubelet and the instance has been created, the map entry's status can be updated rather than the entry being created at that point. This may require wrapping the list_and_watch_message_sender of InstanceInfo in an Option.
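
The reserve-then-update idea can be sketched as follows. This is a simplified illustration, not the agent's code. The types are cut-down stand-ins that reuse names from this issue. The helpers `try_reserve_instance`, `mark_online`, and `status_of` are hypothetical, and a `std::sync::Mutex` and `std::sync::mpsc::Sender<String>` stand in for whatever lock and channel types the agent actually uses.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};

// Simplified stand-in for the agent's status enum; `Connecting` is the new
// variant proposed in this issue.
#[derive(Debug, Clone, PartialEq)]
enum InstanceConnectivityStatus {
    Connecting,
    Online,
}

// Simplified stand-in for InstanceInfo. The sender is an Option because no
// list_and_watch channel exists while the device plugin is being built.
struct InstanceInfo {
    list_and_watch_message_sender: Option<Sender<String>>,
    connectivity_status: InstanceConnectivityStatus,
}

type InstanceMap = Arc<Mutex<HashMap<String, InstanceInfo>>>;

// Hypothetical helper: returns true only for the first caller that claims
// `instance_name`, so only that caller goes on to build a device plugin.
fn try_reserve_instance(map: &InstanceMap, instance_name: &str) -> bool {
    let mut instances = map.lock().unwrap();
    match instances.entry(instance_name.to_string()) {
        Entry::Occupied(_) => false,
        Entry::Vacant(slot) => {
            slot.insert(InstanceInfo {
                list_and_watch_message_sender: None,
                connectivity_status: InstanceConnectivityStatus::Connecting,
            });
            true
        }
    }
}

// Hypothetical helper: once the device plugin is serving, update the
// reserved entry instead of inserting a new one.
fn mark_online(map: &InstanceMap, instance_name: &str, sender: Sender<String>) -> bool {
    let mut instances = map.lock().unwrap();
    match instances.get_mut(instance_name) {
        Some(info) => {
            info.list_and_watch_message_sender = Some(sender);
            info.connectivity_status = InstanceConnectivityStatus::Online;
            true
        }
        None => false,
    }
}

// Hypothetical helper: reads the current status of an instance, if present.
fn status_of(map: &InstanceMap, instance_name: &str) -> Option<InstanceConnectivityStatus> {
    let instances = map.lock().unwrap();
    instances.get(instance_name).map(|info| info.connectivity_status.clone())
}

fn main() {
    let map: InstanceMap = Arc::new(Mutex::new(HashMap::new()));
    // Two back-to-back discovery results for the same instance.
    let first = try_reserve_instance(&map, "udev-video-9d8a82");
    let second = try_reserve_instance(&map, "udev-video-9d8a82");
    println!("first reserved: {}, second reserved: {}", first, second);

    let (sender, _receiver) = channel();
    mark_online(&map, "udev-video-9d8a82", sender);
    let has_sender = map.lock().unwrap()["udev-video-9d8a82"]
        .list_and_watch_message_sender
        .is_some();
    println!("status: {:?}, sender set: {}", status_of(&map, "udev-video-9d8a82"), has_sender);
}
```

The check and the insert happen under a single lock acquisition, so a second discovery result for the same instance sees the `Connecting` entry and skips the build. If building the device plugin fails, the reserved entry would also need to be removed; the sketch does not cover that path.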

kate-goldenring added the bug and help wanted labels on Jan 20, 2023

github-actions bot commented Jun 6, 2023

Issue has been automatically marked as stale due to inactivity for 90 days. Update the issue to remove label, otherwise it will be automatically closed.
