-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osbuild-worker: fix "crashing" on worker registration issues #4357
Conversation
I do acknowledge (and did before) that your PRs are smaller and cleaner and I'm also mostly interested in getting this released to unblock the ansible PR. The PR #4320 for sure is more extended as it also tries to make the startup cleaner (avoid 401 in the good case) Would you mind taking this test over (just changing the sequence to be "the current" one) so this doesn't get lost and should help in identifying other startup changes. |
although, sure, this can be done in a separate PR |
Thanks, part of why I opened this PR is precisely to avoid mixing this change with other changes. It's totally fine to have a PR about test improvements (which I really like) or changing the way the startup works, my point (and why I opened this PR) is that that those should (IMHO) be their own PRs. The rational is to avoid exactly what happend in #4320 - a lot of discussion/questions about topics that where not part of the (direct) goal that distracted from getting the change required to unblock ansible(?) landed. In my experience when a PR becomes controversial the best course of action is to make it smaller and try to extract the essence/most important changes on their own. Then one-by-one the other changes can be proposed/discussed/merged. So my suggestion would be:
Don't get me wrong, this is not a dig on your work, I'm really just trying to help establishing a flow that makes it quicker/easier to land it :) |
When the osbuild worker cannot register itself with the server on startup the worker will "crash". This is inconsistent with the existing behavior in `workerHeartbeat()` which deals with connectivity or other server issue gracefully and retries periodically. To unify the behavior this commit changes the behavior and only issues a `logrus.Warnf` instead of the previous `Falalf` when the registration fails. Co-authored-by: Florian Schüller <[email protected]>
f831649
to
7d47d5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- let's try to get this (or a variant of it) landed
🥳- open the test sequence changes as their own PR as they are usful on their own
- discuss/open the lazy vs non-lazy token refresh but keep it separate as it was causing the most controversy
Fixing the just the "crash" helps for sure, and I'll continue to roll out the task I referenced from my pr. I really doubt that the other two will happen soonish with this approach, let's see…
Great to hear that this will unblock the ansible worker PR. As for the other two:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
When the osbuild worker cannot register itself with the server on startup the worker will "crash". This is inconsistent with the existing behavior in
workerHeartbeat()
which deals with connectivity or other server issue gracefully and retries periodically.To unify the behavior this commit changes the behavior and only issues a
logrus.Warnf
instead of the previousFalalf
when the registration fails.This is an alternative version of #4320 that contains the minimal change to achieve the goal with a test. I feel quite rude to open this PR and I'm sorry for this. I'm mostly doing this as I feel after 51 discussion comments in the other PR it might be easier/more illustrative to offer an alternative version instead of keeping the discussion in the other PR going.
The key points (to me):
I hope this helps, I sincerely want to help here, this is not asserting domination or playing one-upmanship or anything like this. I'm sure there are things to improve with this PR too.
P.S. This does not contain the unix socket change, mostly because I did not want to spend more time thinking how to test it (it's probably easy but I wanted to timebox it). Also with unix sockets connectivity issues are more rare so it seemed less important (but I could be wrong here).
[edit: typo fix]