Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

safe message passing sample #681

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

Quinn-With-Two-Ns
Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns changed the title Sdk 2751 safe message passing sample Oct 4, 2024
Copy link
Contributor

@dandavison dandavison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, all looks good, except I have a concern that we're doing start/stop wrong in all our samples (except Go?).

try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you educate me on why we convert the one exception into the other?

// to make it easier to pass between runs
class ClusterManagerState {
public boolean clusterStarted;
public boolean clusterShutdown;
Copy link
Contributor

@dandavison dandavison Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are mutually exclusive -- can we use an enum for them? (Also as booleans would it be more idiomatic in java for them to start with is or does that only apply to methods?)


@Override
public void stopCluster() {
Workflow.await(() -> state.clusterStarted);
Copy link
Contributor

@dandavison dandavison Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if someone sends the stop signal by mistake, it will just hang, and then the next time they start the cluster it will suddenly stop. That seems undesirable. That suggests this should be an update, so we can return success/fail.

(It's the same in the other non-Go languages)

}

@Override
public void startCluster() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check that the cluster's in an appropriate state to proceed to the next lines. That suggests this should be an update, so we can return success/fail.

(If so it's a bug in the other non-Go languages)

Comment on lines +118 to +120
// exception from there, or raise an ApplicationFailure. Other exceptions in the main
// handler
// will cause the workflow to keep retrying and get it stuck.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// exception from there, or raise an ApplicationFailure. Other exceptions in the main
// handler
// will cause the workflow to keep retrying and get it stuck.
// exception from there, or raise an ApplicationFailure. Other exceptions in the main
// handler will cause the workflow to keep retrying and get it stuck.

Comment on lines +154 to +156
// This call would be dangerous without nodesLock because it yields control and allows
// interleaving
// with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This call would be dangerous without nodesLock because it yields control and allows
// interleaving
// with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.
// This call would be dangerous without nodesLock because it yields control and allows
// interleaving with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.

Comment on lines +63 to +65
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
// its state and
// continue-as-new.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
// its state and
// continue-as-new.
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
// its state and continue-as-new.

Comment on lines +37 to +39
// In workflows that continue-as-new, it's convenient to store all your state in one serializable
// structure
// to make it easier to pass between runs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I don't get your IDE's word-wrapping decisions! I've left a bunch of changes like these.

Suggested change
// In workflows that continue-as-new, it's convenient to store all your state in one serializable
// structure
// to make it easier to pass between runs
// In workflows that continue-as-new, it's convenient to store all your state in one serializable
// structure to make it easier to pass between runs

Comment on lines +157 to +160
// This is an update as opposed to a signal because the client may want to wait for nodes to be
// allocated
// before sending work to those nodes.
// Returns the list of node names that were allocated to the job.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This is an update as opposed to a signal because the client may want to wait for nodes to be
// allocated
// before sending work to those nodes.
// Returns the list of node names that were allocated to the job.
// This is an update as opposed to a signal because the client may want to wait for nodes to be
// allocated before sending work to those nodes.
// Returns the list of node names that were allocated to the job.

Comment on lines +164 to +166
// Even though it returns nothing, this is an update because the client may want to track it, for
// example
// to wait for nodes to be unassigned before reassigning them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Even though it returns nothing, this is an update because the client may want to track it, for
// example
// to wait for nodes to be unassigned before reassigning them.
// Even though it returns nothing, this is an update because the client may want to track it, for
// example to wait for nodes to be unassigned before reassigning them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants