CA-379329: check for missing iSCSI sessions and reconnect #627

Merged on Aug 1, 2023 (1 commit)

MarkSymsCtx
Contributor

If some paths are unreachable when an iSCSI SR is attached, they are not connected later when connectivity is restored unless the SR is detached or the host is rebooted. Neither option gives users a good experience. Add a systemd timer which periodically (default 10 minutes) asks each SR to run SR-specific "health" checks. Currently only a checker for iSCSI sessions is present, but more can be added to this pattern as needs arise.
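To illustrate the kind of check described above, here is a minimal sketch that compares the paths an SR expects against the sessions currently logged in. It is not the code from this PR: the function names are hypothetical, and the parser assumes the usual one-line-per-session layout printed by `iscsiadm -m session` (`tcp: [1] 10.0.0.1:3260,1 iqn... (non-flash)`).

```python
import re

# Assumed layout of one `iscsiadm -m session` line:
#   tcp: [1] 10.0.0.1:3260,1 iqn.2023-01.com.example:t1 (non-flash)
SESSION_RE = re.compile(r"^\w+: \[\d+\] (?P<portal>\S+?),\d+ (?P<iqn>\S+)")


def parse_sessions(output):
    """Return the set of (portal, iqn) pairs found in the session listing."""
    sessions = set()
    for line in output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            sessions.add((match.group("portal"), match.group("iqn")))
    return sessions


def missing_sessions(expected, output):
    """Return the expected (portal, iqn) pairs that have no active session."""
    return sorted(set(expected) - parse_sessions(output))
```

A periodic job could call `missing_sessions()` with the SR's configured paths and attempt a login for each pair returned; how the reconnect is actually performed is outside the scope of this sketch.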

import util


def check_iscsi_sessions():
Contributor

This feels a bit weird to me. The function is about checking iSCSI sessions but then it calls a generic check_sr() method. Would it be better to just call check_sr() on every SR and then let the SR decide what to check (if anything)?
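The suggestion can be sketched roughly as follows: the caller invokes `check_sr()` on every SR, and each SR type decides what, if anything, to check. The class names and constructor arguments here are hypothetical stand-ins rather than the classes in this repository; only the `check_sr(sr_uuid)` call shape is taken from the diff.

```python
class BaseSR:
    """Hypothetical base class: by default there is nothing to check."""

    def __init__(self, uuid):
        self.uuid = uuid

    def check_sr(self, sr_uuid):
        # SR types without health checks inherit this no-op.
        return []


class IscsiBackedSR(BaseSR):
    """Hypothetical iSCSI-backed SR that reports paths without a session."""

    def __init__(self, uuid, expected_paths, active_paths):
        super().__init__(uuid)
        self.expected_paths = set(expected_paths)
        self.active_paths = set(active_paths)

    def check_sr(self, sr_uuid):
        # The SR itself knows that its health check is about iSCSI paths.
        return sorted(self.expected_paths - self.active_paths)


def check_all(srs):
    """Ask every SR to run its own checks; the caller stays generic."""
    return {sr.uuid: sr.check_sr(sr.uuid) for sr in srs}
```

With this shape the timer-driven entry point never needs to know which SR types are iSCSI-based, and new checks can be added by overriding `check_sr()` in other SR types.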

rosslagerwall previously approved these changes Jul 20, 2023
@MarkSymsCtx
Contributor Author

I've just realised that we already have test_LVHDoISCSISR.py, so I'll move the new tests into that file.

rosslagerwall previously approved these changes Jul 25, 2023

sr_uuid = srs[sr]['uuid']
sr_obj = SR.SR.from_uuid(session, sr_uuid)
sr_obj.check_sr(sr_uuid)
Contributor

What if you have a SMAPIv3 SR here, will that work, or do you need to filter them out?

Contributor

You can find a list of SMAPIv1 SRs with something like this (can be translated into python API calls instead of xe):
xe sm-list required-api-version=1.0 params=type --minimal|tr ',' '\n'|xargs -n1 -I% xe sr-list type=% --minimal|tr ',' '\n'|xargs echo
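As a rough Python counterpart to the pipeline above, the sketch below filters SR records by the types whose SM plugin requires API version 1.0. It works on plain dictionaries so it can be run without a host. The function name is hypothetical, and the record keys (`type`, `required_api_version`, `uuid`) are assumed to match what the XenAPI record calls return.

```python
def smapiv1_sr_uuids(sm_records, sr_records):
    """Return the UUIDs of SRs whose type is served by a SMAPIv1 plugin.

    Both arguments are dicts mapping an opaque reference to a record dict,
    which is the assumed shape of the XenAPI get_all_records() results.
    """
    # Collect the SR types whose plugin declares API version 1.0,
    # mirroring `xe sm-list required-api-version=1.0 params=type`.
    v1_types = {
        rec["type"]
        for rec in sm_records.values()
        if rec.get("required_api_version") == "1.0"
    }
    # Keep only SRs of those types, mirroring `xe sr-list type=%`.
    return sorted(
        rec["uuid"] for rec in sr_records.values() if rec["type"] in v1_types
    )
```

On a host, the inputs would presumably come from `session.xenapi.SM.get_all_records()` and `session.xenapi.SR.get_all_records()`; that wiring is an assumption and is not shown in this conversation.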

Contributor Author

That's a very good point. Originally it only looked for lvmoiscsi SRs; the change made to address #627 (comment) exposed that problem.

Contributor Author

Now fixed, and checked with concurrent LVMoISCSI and GFS2 (on iSCSI) SRs.

If some paths are unreachable when an iSCSI SR is attached, they are
not connected later when connectivity is restored unless the SR is
detached or the host is rebooted. Neither option gives users a good
experience. Add a systemd timer which periodically (default 10
minutes) asks each SR to run SR-specific "health" checks. Currently
only a checker for iSCSI sessions is present, but more can be added
to this pattern as needs arise.

Signed-off-by: Mark Syms <[email protected]>
@MarkSymsCtx MarkSymsCtx merged commit d28dcc1 into xapi-project:master Aug 1, 2023
2 checks passed
@MarkSymsCtx MarkSymsCtx deleted the CA-379329 branch August 1, 2023 10:19