Skipping Execution based on Cluster Service #1219

Merged
1 commit merged into opensearch-project:main on Aug 21, 2024

Contributor

@sarthakaggarwal97 sarthakaggarwal97 commented Jul 31, 2024

Description

Currently, we make an expensive node-info call to verify whether all nodes are on the same version before we execute index-management actions.

With this change, we avoid that broadcast request and instead rely on the cluster service to provide the minimum and maximum node versions across the cluster.
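
A minimal sketch of the decision this change enables, assuming the cluster service can report the minimum and maximum node versions (the "Current min" and "Current max" values in the logs under Testing). The names `ClusterVersions` and `should_skip_execution` are hypothetical and chosen only for illustration; they are not the plugin's actual API, and the plugin itself is not written in Python.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.15.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class ClusterVersions:
    """Hypothetical stand-in for the min/max versions reported by the cluster service."""
    min_version: str
    max_version: str


def should_skip_execution(versions: ClusterVersions) -> bool:
    """Skip index-management actions while the cluster runs mixed versions."""
    return parse_version(versions.min_version) != parse_version(versions.max_version)


# Values taken from the test cluster described in this PR.
mixed = ClusterVersions(min_version="2.12.0", max_version="2.15.0")
uniform = ClusterVersions(min_version="2.15.0", max_version="2.15.0")

print(should_skip_execution(mixed))    # True
print(should_skip_execution(uniform))  # False
```

The point of the sketch is that a single comparison of two values already held locally replaces a request fanned out to every node.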

Related Issues

Resolves #1075

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Testing

Cluster Configuration:

  1. Data Node: OS v2.12
  2. Data Node: OS v2.15

Case 1: Mixed Cluster

Both nodes, each running a different version, are present in the cluster.

[2024-08-17T10:51:04,896][INFO ][o.o.i.i.SkipExecution ] Current max 2.15.0
[2024-08-17T10:51:04,896][INFO ][o.o.i.i.SkipExecution    ] Current min 2.12.0
[2024-08-17T10:51:04,897][INFO ][o.o.i.i.SkipExecution    ] There are multiple versions of Index Management plugins in the cluster: [2.15.0, 2.12.0]
[2024-08-17T10:51:04,900][INFO ][o.o.d.PeerFinder         ] setting findPeersInterval to [1s] as node commission status = [true] for local node [{bcd0743e920a.ant.amazon.com}{mPc9ej1FQ4y9Z4w_yUOqTQ}{hc4kfB-LTr-GdPmU7VMrQQ}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}]
[2024-08-17T10:55:23,670][INFO ][o.o.j.s.JobSweeper       ] Running full sweep
[2024-08-17T10:56:04,921][INFO ][o.o.i.i.SkipExecution    ] Current max 2.15.0
[2024-08-17T10:56:04,922][INFO ][o.o.i.i.SkipExecution    ]  Current min 2.12.0
[2024-08-17T10:56:04,922][INFO ][o.o.i.i.SkipExecution    ] There are multiple versions of Index Management plugins in the cluster: [2.15.0, 2.12.0]

The logs show that the skip-execution scheduler rechecks the cluster state periodically (here at 10:51 and again at 10:56) and each time detects that nodes with different versions are present.

Created a rollover policy while the cluster was mixed.
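
As a rough way to confirm the recheck cadence from the log excerpt above, the sketch below parses the SkipExecution lines and measures the gap between two consecutive "Current max" entries. The regular expression and helper name are assumptions written for this excerpt only, not part of the plugin.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt above (irregular spacing preserved).
LOG_LINES = [
    "[2024-08-17T10:51:04,896][INFO ][o.o.i.i.SkipExecution ] Current max 2.15.0",
    "[2024-08-17T10:51:04,896][INFO ][o.o.i.i.SkipExecution    ] Current min 2.12.0",
    "[2024-08-17T10:56:04,921][INFO ][o.o.i.i.SkipExecution    ] Current max 2.15.0",
    "[2024-08-17T10:56:04,922][INFO ][o.o.i.i.SkipExecution    ]  Current min 2.12.0",
]

# Hypothetical pattern: timestamp, logger name, then "Current max|min <version>".
PATTERN = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[INFO\s*\]\[o\.o\.i\.i\.SkipExecution\s*\]\s*"
    r"Current (?P<kind>max|min) (?P<version>\S+)"
)


def parse_checks(lines):
    """Return (timestamp, kind, version) for every matching log line."""
    checks = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S,%f")
            checks.append((ts, match["kind"], match["version"]))
    return checks


checks = parse_checks(LOG_LINES)
max_times = [ts for ts, kind, _ in checks if kind == "max"]
interval = max_times[1] - max_times[0]
print(interval.total_seconds())
```

For this excerpt the two checks are roughly five minutes apart.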

[2024-08-17T11:06:04,912][INFO ][o.o.i.i.SkipExecution ] Current max 2.15.0
[2024-08-17T11:06:04,914][INFO ][o.o.i.i.SkipExecution ] Current min 2.12.0
[2024-08-17T11:06:04,914][INFO ][o.o.i.i.SkipExecution ] There are multiple versions of Index Management plugins in the cluster: [2.15.0, 2.12.0]
[2024-08-17T11:07:23,924][INFO ][o.o.j.s.JobScheduler ] Will delay 30531 miliseconds for next execution of job log-000001
[2024-08-17T11:07:23,929][INFO ][o.o.i.i.ManagedIndexRunner] Cluster still has nodes running old version ISM plugin, skip execution on new nodes until all nodes upgraded

The plugin delays execution of the rollover while the cluster is mixed.

Case 2: Single-Version Cluster

Stopped the OS v2.12 data node.

The rollover succeeded once only one node remained, that is, once all nodes in the cluster were on the same OS version.
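
The interplay visible in these logs (a periodic version check sets a flag, and the job runner consults it before acting) can be sketched as follows. This is an illustrative model only: `SkipFlag` and `run_managed_index_job` are hypothetical names, and the real plugin's classes and control flow may differ.

```python
class SkipFlag:
    """Hypothetical shared flag, refreshed by a periodic version check."""

    def __init__(self) -> None:
        self.skip = False

    def refresh(self, min_version: str, max_version: str) -> None:
        # The cluster counts as mixed whenever min and max versions differ.
        self.skip = min_version != max_version


def run_managed_index_job(job_name: str, flag: SkipFlag) -> str:
    """Run the job only when the flag says the cluster is on one version."""
    if flag.skip:
        return f"skipped {job_name}: cluster still has nodes on an older version"
    return f"executed {job_name}"


flag = SkipFlag()

flag.refresh("2.12.0", "2.15.0")  # mixed cluster, as in Case 1
print(run_managed_index_job("log-000001", flag))

flag.refresh("2.15.0", "2.15.0")  # only the 2.15 node remains
print(run_managed_index_job("log-000001", flag))
```

Keeping the check separate from the runner means the runner only reads a boolean; it never has to query the cluster itself.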

curl --location --request GET 'http://localhost:9201/_plugins/_ism/explain/log-000001?pretty' \
  --data-raw ''
{
  "log-000001" : {
    "index.plugins.index_state_management.policy_id" : "rollover_policy",
    "index.opendistro.index_state_management.policy_id" : "rollover_policy",
    "index" : "log-000001",
    "index_uuid" : "YxSf5gHiRhuUuv5J9oCP_g",
    "policy_id" : "rollover_policy",
    "policy_seq_no" : 0,
    "policy_primary_term" : 1,
    "index_creation_date" : 1723872442738,
    "state" : {
      "name" : "rollover",
      "start_time" : 1723873754028
    },
    "retry_info" : {
      "failed" : false,
      "consumed_retries" : 0
    },
    "info" : {
      "message" : "Successfully initialized policy: rollover_policy"
    },
    "enabled" : true
  },
  "total_managed_indices" : 1
}
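
For anyone repeating this check, a small sketch of how the explain response above could be inspected programmatically. The embedded JSON is an abbreviated copy of the output shown in this comment; the `summarize` helper is a hypothetical name, not part of any OpenSearch client.

```python
import json

# Abbreviated copy of the explain response shown above.
explain_response = json.loads("""
{
  "log-000001": {
    "policy_id": "rollover_policy",
    "state": {"name": "rollover", "start_time": 1723873754028},
    "retry_info": {"failed": false, "consumed_retries": 0},
    "info": {"message": "Successfully initialized policy: rollover_policy"},
    "enabled": true
  },
  "total_managed_indices": 1
}
""")


def summarize(response: dict, index: str) -> dict:
    """Pull out the fields that show whether the managed index is progressing."""
    entry = response[index]
    return {
        "policy_id": entry["policy_id"],
        "state": entry["state"]["name"],
        "failed": entry["retry_info"]["failed"],
        "message": entry["info"]["message"],
    }


summary = summarize(explain_response, "log-000001")
print(summary)
```

A `failed` value of false together with a populated `state` indicates the job is no longer being skipped for this index.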

Collaborator

@vikasvb90 vikasvb90 left a comment


Change LGTM! Please check if failing tests in build are related. Make sure build is green.

@sarthakaggarwal97
Contributor Author

Please check if failing tests in build are related. Make sure build is green.

@vikasvb90 The failures don't look related to this PR; the BWC workflow has been failing for other PRs (open and merged) as well.

@sarthakaggarwal97
Contributor Author

@vikasvb90 @bowenlan-amzn please let me know if there are more comments. If it looks good, we can merge it!

@vikasvb90 vikasvb90 merged commit d6da55c into opensearch-project:main Aug 21, 2024
28 of 29 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/index-management/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/index-management/backport-2.x
# Create a new branch
git switch --create backport/backport-1219-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d6da55cdcb9ac96163644f0bfa295d56db5fd915
# Push it to GitHub
git push --set-upstream origin backport/backport-1219-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/index-management/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1219-to-2.x.

sarthakaggarwal97 added a commit to sarthakaggarwal97/index-management that referenced this pull request Aug 21, 2024
Signed-off-by: Sarthak Aggarwal <[email protected]>
(cherry picked from commit d6da55c)
sarthakaggarwal97 added a commit to sarthakaggarwal97/index-management that referenced this pull request Aug 22, 2024
Signed-off-by: Sarthak Aggarwal <[email protected]>
(cherry picked from commit d6da55c)
vikasvb90 pushed a commit that referenced this pull request Aug 23, 2024
* skipping execution based on cluster service (#1219)

Signed-off-by: Sarthak Aggarwal <[email protected]>
(cherry picked from commit d6da55c)

* compile time fixes

Signed-off-by: Sarthak Aggarwal <[email protected]>

---------

Signed-off-by: Sarthak Aggarwal <[email protected]>