Name conflict in HTCondor config object name #3329

Open
lemaitre-aneo opened this issue Dec 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@lemaitre-aneo
Contributor

Describe the bug

When the MIG templates of both the central manager (CM) and the access point (AP) are changed in the same plan, there is a risk that the AP config object is deleted right after being overwritten.

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy HTCondor
  2. Change a setting of both the central manager and the access point in the same plan, such as the disk size, and redeploy

Expected behavior

After deployment, the AP config object should be in the bucket.

Actual behavior

  • The AP config object is marked for deletion because the CM IP is "known after apply" at plan time.
  • The CM is redeployed with the new configuration, but its IP is reused.
  • The new AP config object is deployed under the same name as the previous one, because the config is actually identical (the IP has not changed after all).
  • The AP template is updated.
  • The previous AP config object is deleted, but since it has the same name as the new AP config object, the deletion removes the new AP config object.
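
The sequence above can be modelled with a small Python sketch. It is not the toolkit's code: the bucket is a plain dictionary, and `render_config`, `object_name`, the `ap-config-` prefix, the `CONDOR_HOST` line and the IP value are all hypothetical stand-ins. The only assumptions taken from the report are that the object name is derived from a hash of the config, and that the new object is written before the old one is deleted.

```python
import hashlib


def render_config(cm_ip):
    """Hypothetical AP config that embeds the central manager IP."""
    return f"CONDOR_HOST = {cm_ip}\n"


def object_name(config):
    """Hypothetical naming scheme: a fixed prefix plus a hash of the config."""
    digest = hashlib.sha256(config.encode()).hexdigest()[:8]
    return f"ap-config-{digest}"


# The bucket is modelled as a mapping from object name to object content.
bucket = {}

# Initial deployment: the AP config object is uploaded.
old_config = render_config("10.0.0.5")
old_name = object_name(old_config)
bucket[old_name] = old_config

# A replacement is planned because the CM IP is unknown at plan time.
# At apply time the CM gets the same IP back, so config and name are unchanged.
new_config = render_config("10.0.0.5")
new_name = object_name(new_config)

bucket[new_name] = new_config  # the "new" object overwrites the old one
bucket.pop(old_name, None)     # deleting the "old" object removes the new one

print(old_name == new_name)  # True
print(bucket)                # {}
```

In this model the bucket ends up empty even though the upload succeeded, which matches the behavior described above.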

Version (gcluster --version)

1.36

According to the code on the main branch, this should also affect the latest version (1.42 as of today).

Additional context

I can see multiple ways to solve the issue:

  • Allocate IPs outside of the MIG definition to make them stable upon reconfiguration
  • Remove the IP from the computation of the hash
  • Add a null_resource that would be recreated whenever the config changes, and use its ID instead of the hash of the config in the object name

I think the easiest and most robust way would be the null_resource, but I would be glad to have your opinion on the subject.
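
To illustrate why an ID-based name would avoid the collision, here is a rough Python model comparing the two naming schemes. It is a sketch under assumptions, not Terraform or toolkit code: `hash_name`, `id_name`, `replace_object` and the `ap-config-` prefix are hypothetical, and a simple counter stands in for the ID of a recreated null_resource (the only property relied on is that a recreated resource gets an ID different from the previous one).

```python
import hashlib
import itertools

# Stand-in for the ID of a null_resource: every recreation yields a new value.
_next_id = itertools.count(1)


def hash_name(config):
    """Hypothetical hash-based name: identical config gives an identical name."""
    return "ap-config-" + hashlib.sha256(config.encode()).hexdigest()[:8]


def id_name():
    """Hypothetical ID-based name: a new name on every replacement."""
    return f"ap-config-{next(_next_id)}"


def replace_object(bucket, old_name, new_name, new_config):
    """Write the new object first, then delete the old one."""
    bucket[new_name] = new_config
    bucket.pop(old_name, None)
    return bucket


config = "CONDOR_HOST = 10.0.0.5\n"  # unchanged across the replacement

# Hash-based naming: old and new names are equal, so the object is lost.
name = hash_name(config)
by_hash = replace_object({name: config}, name, hash_name(config), config)

# ID-based naming: old and new names differ, so the new object survives.
old = id_name()
by_id = replace_object({old: config}, old, id_name(), config)

print(len(by_hash))  # 0
print(len(by_id))    # 1
```

The same reasoning applies to any name component that is guaranteed to change on every replacement; the counter is only the simplest way to model that guarantee.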

I can definitely implement the fix as soon as we agree on how to solve it.

@lemaitre-aneo added the bug label Dec 2, 2024
@harshthakkar01
Contributor

Hi, thank you for the detailed context. The root cause and a potential fix are currently being discussed internally. We will have an update soon.
