-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
safe message passing sample #681
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, all looks good, except I have a concern that we're doing start/stop wrong in all our samples (except Go?).
try { | ||
Thread.sleep(100); | ||
} catch (InterruptedException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you educate me on why we convert the one exception into the other?
// to make it easier to pass between runs | ||
class ClusterManagerState { | ||
public boolean clusterStarted; | ||
public boolean clusterShutdown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mutually exclusive -- can we use an enum for them? (Also as booleans would it be more idiomatic in java for them to start with is
or does that only apply to methods?)
|
||
@Override | ||
public void stopCluster() { | ||
Workflow.await(() -> state.clusterStarted); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if someone sends the stop signal by mistake, it will just hang, and then the next time they start the cluster it will suddenly stop. That seems undesirable. That suggests this should be an update, so we can return success/fail.
(It's the same in the other non-Go languages)
} | ||
|
||
@Override | ||
public void startCluster() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to check that the cluster's in an appropriate state to proceed to the next lines. That suggests this should be an update, so we can return success/fail.
(If so it's a bug in the other non-Go languages)
// exception from there, or raise an ApplicationFailure. Other exceptions in the main | ||
// handler | ||
// will cause the workflow to keep retrying and get it stuck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// exception from there, or raise an ApplicationFailure. Other exceptions in the main | |
// handler | |
// will cause the workflow to keep retrying and get it stuck. | |
// exception from there, or raise an ApplicationFailure. Other exceptions in the main | |
// handler will cause the workflow to keep retrying and get it stuck. |
// This call would be dangerous without nodesLock because it yields control and allows | ||
// interleaving | ||
// with assignNodesToJob and performHealthChecks, which all touch this.state.nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This call would be dangerous without nodesLock because it yields control and allows | |
// interleaving | |
// with assignNodesToJob and performHealthChecks, which all touch this.state.nodes. | |
// This call would be dangerous without nodesLock because it yields control and allows | |
// interleaving with assignNodesToJob and performHealthChecks, which all touch this.state.nodes. |
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint | ||
// its state and | ||
// continue-as-new. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint | |
// its state and | |
// continue-as-new. | |
// The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint | |
// its state and continue-as-new. |
// In workflows that continue-as-new, it's convenient to store all your state in one serializable | ||
// structure | ||
// to make it easier to pass between runs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha I don't get your IDE's word-wrapping decisions! I've left a bunch of changes like these.
// In workflows that continue-as-new, it's convenient to store all your state in one serializable | |
// structure | |
// to make it easier to pass between runs | |
// In workflows that continue-as-new, it's convenient to store all your state in one serializable | |
// structure to make it easier to pass between runs |
// This is an update as opposed to a signal because the client may want to wait for nodes to be | ||
// allocated | ||
// before sending work to those nodes. | ||
// Returns the list of node names that were allocated to the job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This is an update as opposed to a signal because the client may want to wait for nodes to be | |
// allocated | |
// before sending work to those nodes. | |
// Returns the list of node names that were allocated to the job. | |
// This is an update as opposed to a signal because the client may want to wait for nodes to be | |
// allocated before sending work to those nodes. | |
// Returns the list of node names that were allocated to the job. |
// Even though it returns nothing, this is an update because the client may want to track it, for | ||
// example | ||
// to wait for nodes to be unassigned before reassigning them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Even though it returns nothing, this is an update because the client may want to track it, for | |
// example | |
// to wait for nodes to be unassigned before reassigning them. | |
// Even though it returns nothing, this is an update because the client may want to track it, for | |
// example to wait for nodes to be unassigned before reassigning them. |
Add safe message passing sample.
Code based on https://github.com/temporalio/samples-python/tree/main/message_passing/safe_message_handlers