Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node Role - Forbid Scylla upgrade when not explicitly requested #140

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fee-mendes
Copy link
Member

This series introduces an early check to determine whether Scylla is already installed on the target system.
It provides a mechanism to prevent incorrectly (and dangerously) upgrading a system which may be already part of a running cluster.

In such scenarios, we now require upgrade_version: True, which delegates ALL upgrade logic to its relevant upgrade section in the role.

To prevent a "chicken and egg" situation when we fail in the middle of a role execution AND after Scylla has been installed, we introduce the skip_upgrade_check option as a failsafe mechanism, which - obviously - defaults to false.

Tests ran:

  1. Fail role execution after an install of Scylla in a single node cluster (this is - in fact, what led to TLS - Private key (insecure) copy fails when ansible is ran as a regular user #139 ;-)
  2. Try to execute role again, we see a failure complaining that Scylla is already installed.
  3. Add the skip_upgrade_check: True option, which allows us to get through and proceed with the installation/configuration process
  4. Revert the skip_upgrade_check: True option, we see a failure complaining that Scylla is already installed.
  5. Bump Scylla minor version and add the upgrade_version: True && upgrade_major: False options, upgrade completes.

Fixes #132

This commit introduces an early check to determine whether Scylla is already installed on the target system.
It provides a mechanism to prevent incorrectly (and dangerously) upgrading a system which may be already part of a running cluster.

In such scenarios, we now require `upgrade_version: True`, which delegates ALL upgrade logic to its relevant upgrade section in the role.
A funny situation arose when we introduced the option to fail early when Scylla is already installed and upgrade_version == False:

It may happen that the role failed during its runtime, therefore leaving the target systems in an inconsistent state. When this happens, if the failure happened AFTER the Scylla package was installed, then the user would be left in an chicken-and-egg situation:
They can not re-execute the playbook installing the specified version, as the upgrade will abort since we are upgrading to the same release. Similarly, they can not install Scylla again, as it will complain that the package is already installed.

We circumvent that situation by introducing the `skip_upgrade_check` option, which is a failsafe mechanism to allow a role failure to proceed on installation even when `upgrade_version` is False. Needless to say, this option is False by default.
@fee-mendes
Copy link
Member Author

For some reason I can't add a reviewer -- so @vladzcloudius :-)

@vladzcloudius vladzcloudius self-requested a review August 25, 2022 23:22
Copy link
Collaborator

@vladzcloudius vladzcloudius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't merge it because we use a Role's state is not only scylla packages but rather a lot more: swap configuration, systemd configuration, additional packages and lot-lot more.

And sometimes we run a Role on a node/nodes with already installed Scylla in order to apply some changes to that state, e.g. to change io_properties.yml values.

Therefore the new logic this PR suggests is no-go.

What DO need to prevent unintentional upgrades and this IS the case today AFAIK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Scylla Node Role]: Refuse to upgrade unless "upgrade_version" is set to true
2 participants