Conversation

@Neilhamza Neilhamza commented Sep 16, 2025

  • What I did
    Added a new MachineConfig template file under templates/master/00-master/two-node-with-fencing/files/ that installs the fencing_validator.sh script to /usr/local/bin/ on control-plane nodes for Two-Node Fencing clusters.

  • How to verify it

Deploy a Two-Node Fencing cluster.

Verify the MachineConfig for masters includes the new file.

On a master node, run:
oc debug node/ -- chroot /host ls -l /usr/local/bin/fencing_validator
oc debug node/ -- chroot /host /usr/local/bin/fencing_validator --help

Copy it to the hypervisor:
oc debug node/ -- chroot /host cat /usr/local/bin/fencing_validator > fencing_validator

chmod +x fencing_validator
The script should be present, executable (0755), and runnable.

  • Description for the changelog
    Ship /usr/local/bin/fencing_validator.sh via MCO for Two-Node Fencing clusters.
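
The final check in "How to verify it" (present, executable, mode 0755) could be scripted roughly as follows. This is a minimal sketch and not part of the PR: check_script_installed is a hypothetical helper name, and it assumes GNU stat (stat -c '%a') is available wherever it runs.

```bash
# Hypothetical helper (not part of the PR): confirm that a script exists,
# is executable, and has the expected permission bits.
# Assumes GNU coreutils `stat`; BSD/macOS `stat` uses different flags.
check_script_installed() {
  local script_path="$1" expected_mode="${2:-755}" actual_mode

  [ -f "$script_path" ] || { echo "missing: $script_path" >&2; return 1; }
  [ -x "$script_path" ] || { echo "not executable: $script_path" >&2; return 1; }

  actual_mode="$(stat -c '%a' "$script_path")" || return 1
  if [ "$actual_mode" != "$expected_mode" ]; then
    echo "unexpected mode $actual_mode (wanted $expected_mode): $script_path" >&2
    return 1
  fi
  echo "ok: $script_path ($actual_mode)"
}
```

On a node this would be called with the installed path; for the copy made on the hypervisor, it only makes sense after the chmod step, because redirecting cat into a new file does not preserve the mode.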

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Sep 16, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 16, 2025

@Neilhamza: This pull request references OCPEDGE-2188 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

  • What I did
    Added a new MachineConfig template file under templates/master/00-master/two-node-with-fencing/files/ that installs the fencing_validator.sh script to /usr/local/bin/ on control-plane nodes for Two-Node Fencing clusters.

  • How to verify it

Deploy a Two-Node Fencing cluster.

Verify the MachineConfig for masters includes the new file.

On a master node, run:

oc debug node/ -- chroot /host ls -l /usr/local/bin/fencing_validator.sh
oc debug node/ -- chroot /host /usr/local/bin/fencing_validator.sh --help

The script should be present, executable (0755), and runnable.

  • Description for the changelog
    Ship /usr/local/bin/fencing_validator.sh via MCO for Two-Node Fencing clusters.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Sep 16, 2025
Contributor

openshift-ci bot commented Sep 16, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Neilhamza
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Neilhamza Neilhamza changed the title from "[WIP] OCPEDGE-2188: embed fencing validator into TNF MCO" to "OCPEDGE-2188: embed fencing validator into TNF MCO" Sep 16, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Sep 16, 2025
Contributor

@eggfoobar eggfoobar left a comment

Looking good. I had some suggestions and questions. This is my initial pass; I'll give it another review once I deploy and test it on a cluster.

fi

for ip in "$IP_A" "$IP_B"; do
awk -F'|' -v ip="$ip" '
Contributor

If we have the output as JSON, we should be able to check that both IPs exist with jq here. wdyt?

Author

Same concern as above. What are your thoughts?

@Neilhamza Neilhamza requested a review from eggfoobar September 21, 2025 07:06
Contributor

@eggfoobar eggfoobar left a comment

Just had some more minor suggestions

@Neilhamza Neilhamza requested a review from eggfoobar September 29, 2025 13:33
Contributor

openshift-ci bot commented Sep 30, 2025

@Neilhamza: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-mco-disruptive | 7f5c253 | link | false | /test e2e-aws-mco-disruptive |
| ci/prow/e2e-gcp-mco-disruptive | 7f5c253 | link | false | /test e2e-gcp-mco-disruptive |
| ci/prow/e2e-aws-ovn-upgrade | 7f5c253 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn | 7f5c253 | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-gcp-op-2of2 | 7f5c253 | link | true | /test e2e-gcp-op-2of2 |
| ci/prow/bootstrap-unit | 7f5c253 | link | false | /test bootstrap-unit |
| ci/prow/e2e-gcp-op-ocl | 7f5c253 | link | false | /test e2e-gcp-op-ocl |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | 7f5c253 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/okd-scos-e2e-aws-ovn | 7f5c253 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-hypershift | 7f5c253 | link | true | /test e2e-hypershift |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Contributor

@eggfoobar eggfoobar left a comment

Looking good, had some small suggestions

get_internal_ip() {
local node="$1"
oc_run get node "$node" -o json |
jq -r '
Contributor

We can simplify this a bit, since we just want the first internal IP of the node; if there is none, either return empty or add ? and rely on the exit status.

jq -re '[.status.addresses[] | select(.type == "InternalIP")][0].address // empty'
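
To see how the suggested filter behaves, here is a small sketch that runs it against a made-up Node object. The wrapper name first_internal_ip and the sample addresses are assumptions for illustration; only the jq filter itself comes from the suggestion above.

```bash
# Hypothetical wrapper around the suggested filter: reads a Node object as
# JSON on stdin and prints its first InternalIP address.
# With -e, jq exits non-zero when the filter produces no output, so callers
# can branch on the exit status instead of testing for an empty string.
first_internal_ip() {
  jq -re '[.status.addresses[] | select(.type == "InternalIP")][0].address // empty'
}

# Sample input; the addresses are made up for illustration.
sample_node_json='{"status":{"addresses":[
  {"type":"Hostname","address":"master-0"},
  {"type":"InternalIP","address":"192.0.2.10"},
  {"type":"InternalIP","address":"192.0.2.11"}]}}'

printf '%s' "$sample_node_json" | first_internal_ip
# prints 192.0.2.10
```

A node with no InternalIP entry yields no output and a non-zero exit status, which is what lets the caller skip a separate emptiness check.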

EXIT_FENCING_SECRETS_MISMATCH=26
EXIT_DAEMONS_BAD=22
EXIT_ETCD_NOT_READY=23
EXIT_ETCD_FATAL=24
Contributor

This seems to be unused

}

pcmk_online() {
local want="$1" s="${1%%.*}" names
Contributor

Let's avoid single-letter variables; change s to a word that describes its purpose.
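
As an illustration of the rename, the sketch below gives the stripped name a descriptive variable. Only the locals are modelled: the real body of pcmk_online is not shown in this thread, so describe_node_names and the variable names are hypothetical, and short_hostname is an assumed implementation of the helper of that name used elsewhere in the script.

```bash
# Assumed implementation: strip the domain part from a fully qualified name.
short_hostname() {
  local fqdn="$1"
  printf '%s\n' "${fqdn%%.*}"
}

# Sketch of the renamed locals only (hypothetical function name).
describe_node_names() {
  local wanted_node="$1" short_node_name
  short_node_name="$(short_hostname "$wanted_node")"
  printf 'wanted=%s short=%s\n' "$wanted_node" "$short_node_name"
}

describe_node_names master-0.example.com
# prints: wanted=master-0.example.com short=master-0
```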

}

wait_not_ready() {
local n="$1" deadline=$((SECONDS + TIMEOUT))
Contributor

Same here: rename n to something descriptive.

out="$(host_run "$tgt" \
"podman exec etcd sh -lc 'ETCDCTL_API=3 etcdctl -w json member list'")" &&
jq -e --arg ipa "$IP_A" --arg ipb "$IP_B" '
.members as $m
Contributor

Since we're explicit about the etcd v3 API, we should be able to simplify this a bit to just:

.members | map(select(.isLearner | not)) | any(.clientURLs[] | contains($ipa)) and any(.clientURLs[] | contains($ipb))
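
The following sketch exercises that filter against a made-up member list. The wrapper name both_voting_members_present and the sample JSON are assumptions; the sample omits isLearner for the voting member, in which case `.isLearner | not` evaluates to true.

```bash
# Hypothetical wrapper: reads `etcdctl -w json member list` output on stdin and
# succeeds only when both IPs appear in the client URLs of voting members.
both_voting_members_present() {
  local ip_a="$1" ip_b="$2"
  jq -e --arg ipa "$ip_a" --arg ipb "$ip_b" '
    .members
    | map(select(.isLearner | not))
    | any(.clientURLs[] | contains($ipa)) and any(.clientURLs[] | contains($ipb))
  ' >/dev/null
}

# Made-up member list; the second member is still a learner.
sample_members_json='{"members":[
  {"name":"master-0","clientURLs":["https://192.0.2.10:2379"]},
  {"name":"master-1","clientURLs":["https://192.0.2.11:2379"],"isLearner":true}]}'

if printf '%s' "$sample_members_json" | both_voting_members_present 192.0.2.10 192.0.2.11; then
  echo "both members are voting"
else
  echo "at least one member is missing or still a learner"
fi
```

One thing to keep in mind: contains does substring matching on strings, so an address such as 192.0.2.1 would also match a URL for 192.0.2.10. Whether that matters depends on the addressing in the cluster.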


wait_ready() {
local n="$1" deadline=$((SECONDS + TIMEOUT))
log "Waiting for '$n' Ready (API)…"
Contributor

Same here: not only the variable but also the log messages should be more descriptive about what we are waiting on to become ready.
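
One possible shape for this, as a sketch only: the function name wait_for_node_ready, the POLL_INTERVAL knob, the TIMEOUT default, and the node_is_ready probe are all assumptions (the script under review goes through its own oc_run wrapper, which is not shown here). It relies on bash's SECONDS counter, as the original does.

```bash
TIMEOUT="${TIMEOUT:-300}"            # seconds; this default is an assumption
POLL_INTERVAL="${POLL_INTERVAL:-5}"  # hypothetical knob, not in the original

log() { printf '[INFO] %s\n' "$*"; }

# Assumed readiness probe: asks the API for the node's Ready condition.
node_is_ready() {
  local node_name="$1" ready_status
  ready_status="$(oc get node "$node_name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
  [ "$ready_status" = "True" ]
}

wait_for_node_ready() {
  # SECONDS is bash's built-in counter of seconds since the shell started.
  local node_name="$1" deadline=$((SECONDS + TIMEOUT))
  log "Waiting up to ${TIMEOUT}s for node '$node_name' to report Ready=True via the API"
  while [ "$SECONDS" -lt "$deadline" ]; do
    if node_is_ready "$node_name"; then
      log "Node '$node_name' is Ready"
      return 0
    fi
    sleep "$POLL_INTERVAL"
  done
  log "Timed out after ${TIMEOUT}s waiting for node '$node_name' to become Ready"
  return 1
}
```

Each message names the node, the condition being waited on, and the time budget, so a log reader does not have to infer them from context.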

log_ok() {
printf '\033[32m[OK]\033[0m %s\n' "$*"
}

Contributor

lol, small nit: this spacing is bothering me. It's the only cluster of functions that isn't separated by a single blank line; let's keep it uniform and put one blank line between all these helper functions.

[[ "$1" == *:* ]]
}
fmt_host() {
local h="$1"
Contributor

Let's clarify this a bit: give fmt_host a more descriptive name, and change h to something more descriptive as well. Since this formats the host IP or URL so that IPv6 addresses are safely wrapped, something like fmt_host_ip should be clear enough.
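
A minimal sketch of what the renamed helper might look like, assuming "safe wrap" means enclosing IPv6 literals in square brackets. The name is_ipv6_literal is hypothetical (the diff only shows the body of that check), and the case pattern is equivalent to the `[[ "$1" == *:* ]]` test shown in the diff.

```bash
# Hypothetical name for the existing check: any colon marks an IPv6 literal.
is_ipv6_literal() {
  case "$1" in
    *:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Wrap IPv6 literals in brackets so they can be embedded in host:port strings
# and URLs; IPv4 addresses and hostnames pass through unchanged.
fmt_host_ip() {
  local host_ip="$1"
  if is_ipv6_literal "$host_ip"; then
    printf '[%s]\n' "$host_ip"
  else
    printf '%s\n' "$host_ip"
  fi
}

fmt_host_ip 192.0.2.10    # prints 192.0.2.10
fmt_host_ip 2001:db8::10  # prints [2001:db8::10]
```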

local node="$1" ns="openshift-etcd" short_node
short_node="$(short_hostname "$node")"
oc_run -n "$ns" get secret -o json 2>/dev/null |
jq -e --arg node "$node" --arg short "$short_node" '
Contributor

Let's simplify this a bit. If we know that fencing-credentials will be a prefix, we can make this more legible with the query below. Since the short node name will always follow the prefix, we can just look for fencing-credentials-$short.*. This avoids mistakenly picking up secrets that are named after just the short hostname.

'.items[] | select(.metadata.name | test("fencing-credentials-\($short).*")) | .metadata.name?'
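
A runnable sketch of the prefix lookup follows. Note that jq expands a variable inside a string literal only through the \(...) interpolation syntax; a bare $short inside the quotes is read by the regex engine as an end-of-line anchor followed by the text "short". The wrapper name fencing_secret_names and the sample secret list are assumptions, and the sketch anchors the pattern with ^ so that the prefix must start the name.

```bash
# Hypothetical wrapper: reads a Secret list as JSON on stdin and prints the
# names that start with "fencing-credentials-<short node name>".
# -e makes jq exit non-zero when nothing matches.
fencing_secret_names() {
  local short_node="$1"
  jq -re --arg short "$short_node" '
    .items[]
    | .metadata.name
    | select(test("^fencing-credentials-\($short)"))
  '
}

# Made-up secret list, including one secret named after just the short hostname.
sample_secrets_json='{"items":[
  {"metadata":{"name":"fencing-credentials-master-0"}},
  {"metadata":{"name":"master-0"}},
  {"metadata":{"name":"fencing-credentials-master-1"}}]}'

printf '%s' "$sample_secrets_json" | fencing_secret_names master-0
# prints fencing-credentials-master-0
```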
