Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36856 - add check for katello-rake task #783

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

m-bucher
Copy link
Contributor

No description provided.

@m-bucher m-bucher force-pushed the add_clean_backend_objects_check branch 3 times, most recently from 5a07227 to a1791d9 Compare October 24, 2023 13:16

def clean_backend_objects_dry_run
found_orphans = nil
list_cmd = "export RUBYOPT='-W0'; foreman-rake katello:clean_backend_objects"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command as shown here executes in a dry run mode? How does a user execute it in a non-dry run mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By adding COMMIT=True, which is automatically done during the Katello update procedure (rake upgrade:run), triggered by foreman-installer AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are automatically cleaned up as part of the upgrade? What is the goal of the check in this case? What are the actions a user should take?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that customers wondered why many content hosts were suddenly deleted during upgrade run because candlepin subscriptions of hosts were no longer available. That's why we think that a warning in the pre-upgrade is good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation of customers hitting this and having their upgrade halted? To run the script by hand with commit? To skip this check after the first time they see it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run the "foreman-maintain upgrade check" first - before they acutally run the upgrade. I suggest to extend this "pre-upgrade check" further in the future to make other things visible which hurts you during the upgrade step.

If you run the "upgrade check" and make it visible, that hundreds of hosts will be unregistered/deleted during upgrade run, you have the chance to react accordingly - like to read some knowledge base articles to examine the root cause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide links or more action forward language for the user?

My concern is the user runs upgrade check or upgrade sees this step fail and then is left without any clear understanding of what they should do next. Should they ignore it? Should they read about it? Should they perform some action independently of the tool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did miss some of the potential output of the rake-task.
I am not fully sure, what can be done about all the findings. I assume re-registering to the activation-key on the found hosts can fix this.
I am not aware of additional information, I can link to.
@ehelms do you know of any Discourse-threads or documentation-chapters that provide additional information?

@m-bucher m-bucher force-pushed the add_clean_backend_objects_check branch from a1791d9 to e2e56e2 Compare October 30, 2023 11:31
@m-bucher m-bucher force-pushed the add_clean_backend_objects_check branch from e2e56e2 to 0a03b65 Compare October 30, 2023 14:24
Copy link

@jeremylenz jeremylenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clean_backend_objects rake task clears out, among other things:

  • Hosts whose subscription facet uuid field is nil
  • Hosts whose subscription facet uuid field is not nil, but does not match a uuid found in Candlepin

These are the hosts that are unregistered by clean_backend_objects.

As we know, the subscription facet is Katello's record of Candlepin data about the host.

If we didn't remove them, Katello functionality wouldn't work at all for these hosts anyway, since there's no corresponding Candlepin consumer that Katello can talk to about them.

So my question is what's the situation where a user might want to avoid removing this useless data? (And also, how did we arrive in this state, which may be harder to answer.)

@sbernhard
Copy link
Contributor

I'm ok to remove the data in a later stage but first of all, it should be visible to users that something was wrong. Maybe they want to analyze WHY the association between candlepin and katello broke. For this, they need to have the data in the database; check logs; do the usual steps to analyze a issue. If it would just be cleaned out, there is no way to analyze such an issue. Therefore we wan to introduce one step to check and print the isuse first. Later on, it can be cleaned out.

@jeremylenz
Copy link

Maybe they want to analyze WHY the association between candlepin and katello broke. For this, they need to have the data in the database; check logs; do the usual steps to analyze a issue. If it would just be cleaned out, there is no way to analyze such an issue. Therefore we wan to introduce one step to check and print the isuse first. Later on, it can be cleaned out.

This makes sense, thanks. My question then, is if this is the best way to implement it?

As far as I understand (so far), the hosts cleared out during upgrade are those

  • with a subscription facet, meaning at one point they were registered; and
  • with a subscription facet uuid missing either from Katello or from Candlepin.

That means these hosts are supposed to be registered, but will never work properly as registered hosts because some of their data is gone from Katello or Candlepin.

What is the goal of the check in this case? What are the actions a user should take?

The remedy in this situation is to

a) remove the incomplete data; and
b) re-register the host, if the host still actually exists and you want to manage it.

The clean_backend_objects task that runs during upgrade takes care of a) for you. The question is, will most users want that to happen automatically (which is what happens now), or will they want to be warned beforehand so they can do some sleuthing and figure out how it happened? Personally I'm not sure that most users would want the latter. Maybe we go somewhere in the middle and say "We removed orphaned hosts with the following names; you may want to check on them"? Seems too harsh to me to halt the entire upgrade. If you want to be extra-conscientious, can't you just run clean_backend_objects yourself with COMMIT=false before upgrading?

@ehelms @parthaa thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants