Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cluster_aws): moved var-lib-scylla.mount away after restart #2839

Merged
merged 1 commit into from
Nov 8, 2020

Conversation

ShlomiBalalis
Copy link
Contributor

Since scylla_create_devices fails if var-lib-scylla.mount already exists during its execution,
I moved var-lib-scylla.mount from its regular location

PR pre-checks (self review)

  • I followed KISS principle and best practices
  • I didn't leave commented-out/debugging code
  • I added the relevant backport labels
  • New configuration option are added and documented (in sdcm/sct_config.py)
  • I have added tests to cover my changes (Infrastructure only - under unit-test/ folder)
  • All new and existing unit tests passed (CI)
  • I have updated the Readme/doc folder accordingly (if needed)

Since scylla_create_devices fails if var-lib-scylla.mount already exists during its execution,
I moved var-lib-scylla.mount from its regular location
@@ -675,6 +675,12 @@ def restart(self):
try:
self.stop_scylla_server(verify_down=False)

# Moving var-lib-scylla.mount away, since scylla_create_devices fails if it already exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var-lib-scylla.mount file is created by scylla_setup.
what's the error if var-lib-scylla.mount exists?

If the var-lib-scylla.mount is moved, will the RAID be mounted correctly after reboot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t know, isn’t it created by create_devices?

We saw that create devices is failing when this file exist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem like @syuu1228 did this change few months ago:
scylladb/scylladb@c0b2933

I think we should remove it file inside create_devices script, if that's the case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t know, isn’t it created by create_devices?

We saw that create devices is failing when this file exist

In restart() we call scylla_create_devices script to setup RAID, the existing var-lib-scylla.mount won't effect it.
it only effects scylla_setup script, but we didn't call it in restart().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem like @syuu1228 did this change few months ago:
scylladb/scylla@c0b2933

The commit changed scylla_setup, but we won't call it in restart().

I think we should remove it file inside create_devices script, if that's the case

@ShlomiBalalis Need to check the source of the error someone touched, is it from scylla_setup, why it's called.

Currently we only directly call scylla_setup script for GCE backend in init stage.
A flag file /etc/scylla/machine_image_configured will be created after scylla_setup -> scylla_image_setup succeeds, then scylla-image-setup won't be called again.

@amoskong
Copy link
Contributor

amoskong commented Nov 5, 2020

There are two errors in executing scylla_setup or scylla_raid_setup:

I guess @ShlomiBalalis touched Error 2, I'm ok with this PR.

@@ -675,6 +675,12 @@ def restart(self):
try:
self.stop_scylla_server(verify_down=False)

# Moving var-lib-scylla.mount away, since scylla_create_devices fails if it already exists
mount_path = "/etc/systemd/system/var-lib-scylla.mount"
existance_check = self.remoter.run(f'sudo test -e {mount_path}', ignore_status=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to directly remove it without checking if it exists or not.

sudo rm -f /etc/systemd/system/var-lib-scylla.mount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roy wanted me to save the file for debugging purposes

@fruch fruch merged commit 5d7771b into scylladb:master Nov 8, 2020
@fruch
Copy link
Contributor

fruch commented Nov 8, 2020

@ShlomiBalalis I'm merging this, but please open a bug, so we'll fix it in the correct place

@ShlomiBalalis
Copy link
Contributor Author

@roydahan
Copy link
Contributor

roydahan commented Nov 8, 2020

@fruch Why do we need a bug for it in scylla-machine-image?

@fruch
Copy link
Contributor

fruch commented Nov 8, 2020

@fruch Why do we need a bug for it in scylla-machine-image?

We need to handle this inside create_devices scripts.
otherwise it would be break any user that would be trying to follow the procedure:
https://docs.scylladb.com/operating-scylla/procedures/cluster-management/rebuild-node/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants