Only delay the restart of fdbserver if the process exited with an exit code other than 0 #11802

Open
wants to merge 1 commit into
base: main

Conversation

johscheuer
Contributor

Fixes: #11775
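
As an illustration of the behavior named in the title, here is a minimal Python sketch. It is not the fdbmonitor code (which is written in C++); `restart_delay_seconds` and its parameters are hypothetical names, and the 60-second value is only an example, not a claim about any default.

```python
def restart_delay_seconds(exit_code: int, configured_delay: float) -> float:
    """Return how long to wait before relaunching a process.

    Hypothetical helper: a clean exit (code 0) restarts immediately,
    any other exit code waits for the configured delay.
    """
    if exit_code == 0:
        return 0.0
    return configured_delay


if __name__ == "__main__":
    # Clean exit: no delay. Non-zero exit: the configured delay applies.
    print(restart_delay_seconds(0, 60.0))
    print(restart_delay_seconds(1, 60.0))
```

The sketch only captures the exit-code check; any backoff or delay-reset handling is left out on purpose.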

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux CentOS 7

  • Commit ID: ea6991d
  • Duration 0:21:31
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@saintstack saintstack left a comment

LGTM

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: ea6991d
  • Duration 0:49:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: ea6991d
  • Duration 0:51:37
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: ea6991d
  • Duration 0:52:29
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: ea6991d
  • Duration 0:54:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: ea6991d
  • Duration 0:57:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: ea6991d
  • Duration 0:58:30
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Collaborator

@spraza spraza left a comment

Change looks good. But can you add how you tested this change in the PR description?

Contributor Author

@johscheuer johscheuer left a comment

> Change looks good. But can you add how you tested this change in the PR description?

I haven't tested the changes yet. I'll run the tests manually and provide the logs afterwards.

@spraza
Collaborator

spraza commented Nov 22, 2024

> > Change looks good. But can you add how you tested this change in the PR description?
>
> I haven't tested the changes yet. I'll run the tests manually and provide the logs afterwards.

Sounds good. Let me know once your test is done; we can merge then.

Contributor Author

@johscheuer johscheuer left a comment

Tested the code with a small FDB cluster. Here are the logs from fdbmonitor:

Time="1732523616.910170" Severity="20" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Process 9 exited 0, restarting in 0 seconds
Time="1732523616.910459" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Launching /usr/bin/fdbserver (207) for fdbserver.1
Time="1732523617.000728" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": FDBD joined cluster.
Time="1732523622.951968" Severity="20" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Process 207 exited 0, restarting in 0 seconds
Time="1732523622.952255" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Launching /usr/bin/fdbserver (317) for fdbserver.1
Time="1732523623.043013" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": FDBD joined cluster.
Time="1732523630.953343" Severity="20" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Process 317 exited 0, restarting in 0 seconds
Time="1732523630.953650" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Launching /usr/bin/fdbserver (427) for fdbserver.1
Time="1732523631.044001" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": FDBD joined cluster.

and for testing I ran the following command in fdbcli:

kill; kill <address:port> ; sleep 3; kill <address:port> ; sleep 5; kill <address:port>
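
The restart messages in the fdbmonitor output above can also be checked mechanically. The following is a rough Python sketch, assuming the message keeps the exact wording shown in these logs; `parse_restarts` and `clean_exits_restart_immediately` are hypothetical helper names, not part of FoundationDB.

```python
import re

# Matches the restart message as it appears in the fdbmonitor output above.
RESTART_RE = re.compile(r"Process (\d+) exited (\d+), restarting in (\d+) seconds")


def parse_restarts(lines):
    """Return (pid, exit_code, delay_seconds) for every restart message."""
    restarts = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            restarts.append(tuple(int(group) for group in match.groups()))
    return restarts


def clean_exits_restart_immediately(lines):
    """True if every exit with code 0 was followed by a 0-second delay."""
    return all(delay == 0 for _, code, delay in parse_restarts(lines) if code == 0)


# Two restart lines and one launch line copied from the logs above.
SAMPLE = [
    'Time="1732523616.910170" Severity="20" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Process 9 exited 0, restarting in 0 seconds',
    'Time="1732523616.910459" Severity="10" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Launching /usr/bin/fdbserver (207) for fdbserver.1',
    'Time="1732523622.951968" Severity="20" LogGroup="jscheuermann-jdev" Process="fdbserver.1": Process 207 exited 0, restarting in 0 seconds',
]

if __name__ == "__main__":
    print(parse_restarts(SAMPLE))
    print(clean_exits_restart_immediately(SAMPLE))
```

A check like this only covers the clean-exit path exercised by the `kill` commands; it says nothing about how non-zero exit codes are delayed.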

Development

Successfully merging this pull request may close these issues.

fdbmonitor restart-delay unclear documentation
4 participants