feat: keep missing dev but not select #207

Merged
merged 1 commit into from
Jan 9, 2025

@sunya-ch sunya-ch commented Jan 9, 2025

This PR fixes #206.

change summary

controller

  • replace function interfaceChanged with UpdateNewInterfaces, which keeps old interface info even when the device no longer exists. New information replaces old information, using the interface name as the primary key to identify updated items.
  • do not delete the HostInterface while the node is still alive, even if the daemon pod is deleted.
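
The merge-by-name behavior described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `InterfaceInfo` type, its fields, and the `mergeByName` function with its signature are hypothetical stand-ins, and the sketch assumes interface names are unique within each list.

```go
package main

import "sort"

// InterfaceInfo is a hypothetical, simplified stand-in for one interface
// entry stored in a HostInterface resource.
type InterfaceInfo struct {
	InterfaceName string
	PciAddress    string
	NetAddress    string
}

// mergeByName keeps every previously known entry, overwrites entries whose
// name appears in discovered, and appends entries with names not seen before.
// Entries missing from discovered are kept, not dropped.
// The boolean reports whether any entry was added or updated.
func mergeByName(known, discovered []InterfaceInfo) ([]InterfaceInfo, bool) {
	byName := make(map[string]InterfaceInfo, len(known)+len(discovered))
	for _, info := range known {
		byName[info.InterfaceName] = info
	}

	changed := false
	for _, info := range discovered {
		if old, ok := byName[info.InterfaceName]; !ok || old != info {
			changed = true
		}
		byName[info.InterfaceName] = info
	}

	merged := make([]InterfaceInfo, 0, len(byName))
	for _, info := range byName {
		merged = append(merged, info)
	}
	// Sort by name so the result is deterministic; map iteration order is not.
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].InterfaceName < merged[j].InterfaceName
	})
	return merged, changed
}
```

With this shape, an interface that disappears from the discovered list stays in the merged result with its last known values, which is the property needed to answer lookups after a daemon restart.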

daemon

  • check that the device exists before adding it to the selected list (for policy-based selection), unless TEST_MODE is set
  • introduce function InitCache, which reads the HostInterface at startup to fetch the existing information
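
A minimal sketch of the existence check on the selection path, under stated assumptions: the helper names `deviceExists`, `testModeEnabled`, and `filterExisting` are hypothetical, the standard library `net.InterfaceByName` is used here as the existence probe (the daemon may use a different mechanism), and TEST_MODE is assumed to count as set whenever the environment variable is present.

```go
package main

import (
	"net"
	"os"
)

// deviceExists reports whether a network interface with the given name is
// currently present on the host.
func deviceExists(name string) bool {
	_, err := net.InterfaceByName(name)
	return err == nil
}

// testModeEnabled reports whether the TEST_MODE environment variable is set.
// Treating any value (even empty) as "set" is an assumption of this sketch.
func testModeEnabled() bool {
	_, ok := os.LookupEnv("TEST_MODE")
	return ok
}

// filterExisting returns the candidates that pass the exists check, in their
// original order. When skipCheck is true, all candidates are returned as-is.
func filterExisting(candidates []string, exists func(string) bool, skipCheck bool) []string {
	if skipCheck {
		return candidates
	}
	selected := make([]string, 0, len(candidates))
	for _, name := range candidates {
		if exists(name) {
			selected = append(selected, name)
		}
	}
	return selected
}
```

A caller would typically pass the real probe and the environment flag, for example `filterExisting(names, deviceExists, testModeEnabled())`. Injecting the probe as a function keeps the filter testable on machines that do not have the devices.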

testing

  • add unit test for UpdateNewInterfaces
  • test the following scenario in a real cluster:
    1. deploy a workload pod
    2. delete the multi-nicd pod and wait until it is recreated
    3. delete the workload pod

The log from the multi-nicd pod shows that it first sets the cache and can then return the corresponding list of interface names on deletion.

2025/01/09 04:23:27 set 32 devices cache from hostinterface CR
...
2025/01/09 04:26:13 return: {[(pciAddress values)] [(interface name values)]}
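
The two log lines correspond to loading a cache at startup and resolving names from it later. A rough sketch of that pattern, assuming a cache keyed by PCI address: the `deviceCache` type and its `load` and `lookup` methods are hypothetical and only illustrate the idea, not the actual InitCache implementation.

```go
package main

import "sync"

// deviceCache is a hypothetical in-memory map from PCI address to interface
// name, filled once from previously stored HostInterface information.
type deviceCache struct {
	mu    sync.RWMutex
	byPci map[string]string
}

// newDeviceCache returns an empty cache.
func newDeviceCache() *deviceCache {
	return &deviceCache{byPci: make(map[string]string)}
}

// load replaces the cache contents with entries (PCI address -> interface
// name) and returns the number of devices stored.
func (c *deviceCache) load(entries map[string]string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byPci = make(map[string]string, len(entries))
	for pci, name := range entries {
		c.byPci[pci] = name
	}
	return len(c.byPci)
}

// lookup returns the known PCI addresses and their interface names as two
// parallel slices, in request order. Unknown addresses are skipped.
func (c *deviceCache) lookup(pciAddresses []string) ([]string, []string) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	found := make([]string, 0, len(pciAddresses))
	names := make([]string, 0, len(pciAddresses))
	for _, pci := range pciAddresses {
		if name, ok := c.byPci[pci]; ok {
			found = append(found, pci)
			names = append(names, name)
		}
	}
	return found, names
}
```

Because the cache is filled from stored information and not from live device discovery, a lookup can still succeed for a device that is currently assigned to a pod and therefore not visible on the host.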

Signed-off-by: Sunyanan Choochotkaew <[email protected]>
@tatsuhirochiba tatsuhirochiba merged commit 8db700f into foundation-model-stack:v1.2.5 Jan 9, 2025
5 checks passed
Development

Successfully merging this pull request may close these issues.

[host-device] missing interface name if daemon pod is restarted
2 participants