
safe message passing sample #681

Open

Quinn-With-Two-Ns wants to merge 5 commits into temporalio:main from Quinn-With-Two-Ns:SDK-2751

Contributor

Quinn-With-Two-Ns commented Oct 4, 2024

Add safe message passing sample.

Code based on https://github.com/temporalio/samples-python/tree/main/message_passing/safe_message_handlers

Quinn-With-Two-Ns added 3 commits

October 4, 2024 08:44


Add safe message passing sample

14d674e

fix

17575da


Add unit tests

45a0aa6

Quinn-With-Two-Ns requested review from tsurdilo, antmendoza and a team as code owners

October 4, 2024 16:07


Fix license headers

71315e5

Quinn-With-Two-Ns changed the title ~~Sdk 2751~~ safe message passing sample


run spotless

dandavison reviewed


Contributor

dandavison left a comment

Nice, all looks good, except I have a concern that we're doing start/stop wrong in all our samples (except Go?).

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerActivitiesImpl.java

+                  try {
+                    Thread.sleep(100);
+                  } catch (InterruptedException e) {
+                    throw new RuntimeException(e);

Contributor

dandavison Oct 4, 2024

Can you educate me on why we convert the one exception into the other?
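For context on the question above: `Thread.sleep` throws the checked `InterruptedException`, so a method that does not declare it has to catch it, and wrapping it in an unchecked exception is one common way to propagate it. Whether that is the motivation in this PR is an assumption, since the thread does not say. A minimal sketch in plain Java (the `SleepHelper` class and `sleepUnchecked` method are hypothetical names, not part of the sample), which also restores the interrupt flag before rethrowing:

```java
// Hypothetical helper for illustration only; not taken from the PR.
final class SleepHelper {
  private SleepHelper() {}

  // Sleeps for the given number of milliseconds. If the thread is interrupted,
  // the interrupt flag is restored and the checked exception is rethrown
  // wrapped in an unchecked RuntimeException.
  static void sleepUnchecked(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      // Thread.sleep clears the flag when it throws; set it again so that
      // callers further up the stack can still observe the interruption.
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
  }
}
```

Restoring the flag is a general Java convention rather than something this PR does; the diff above only shows the wrap-and-rethrow step.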

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflow.java

+                // to make it easier to pass between runs
+                class ClusterManagerState {
+                  public boolean clusterStarted;
+                  public boolean clusterShutdown;

Contributor

dandavison Oct 4, 2024 (edited)

These are mutually exclusive; can we use an enum for them? (Also, as booleans, would it be more idiomatic in Java for them to start with "is", or does that only apply to methods?)
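One way the enum suggestion could look, as a sketch only: a single field replaces the two booleans, so the state cannot be "started" and "shut down" at once. The names `ClusterState`, `ClusterManagerStateSketch`, and the accessor methods are hypothetical and not taken from the PR. On the naming question, the JavaBeans `is` prefix conventionally applies to boolean accessor methods rather than to fields.

```java
// Hypothetical names for illustration; the PR uses two public boolean fields.
enum ClusterState {
  NOT_STARTED,
  STARTED,
  SHUT_DOWN
}

final class ClusterManagerStateSketch {
  // A single field holds the lifecycle phase, so contradictory combinations
  // such as "started and shut down" cannot be represented.
  ClusterState clusterState = ClusterState.NOT_STARTED;

  // Boolean accessors follow the JavaBeans "is" prefix convention.
  boolean isStarted() {
    return clusterState == ClusterState.STARTED;
  }

  boolean isShutDown() {
    return clusterState == ClusterState.SHUT_DOWN;
  }
}
```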

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflowImpl.java

+                @Override
+                public void stopCluster() {
+                  Workflow.await(() -> state.clusterStarted);

Contributor

dandavison Oct 4, 2024 (edited)

So if someone sends the stop signal by mistake, it will just hang, and then the next time they start the cluster it will suddenly stop. That seems undesirable. That suggests this should be an update, so we can return success/fail.

(It's the same in the other non-Go languages)

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflowImpl.java

+                }
+                @Override
+                public void startCluster() {

Contributor

dandavison Oct 4, 2024

I think we need to check that the cluster's in an appropriate state to proceed to the next lines. That suggests this should be an update, so we can return success/fail.

(If so it's a bug in the other non-Go languages)
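The state check requested in the two comments above could be sketched as follows. This is plain Java with no Temporal SDK calls, and every name in it (`Phase`, `ClusterLifecycleSketch`, `start`, `stop`) is hypothetical. It only illustrates the idea of rejecting a request made in the wrong state and reporting success or failure to the caller, rather than waiting for the state to become valid. How this would be wired into an update handler in the actual sample is left out.

```java
// Hypothetical lifecycle model for illustration; not the PR's implementation.
enum Phase {
  NOT_STARTED,
  STARTED,
  SHUT_DOWN
}

final class ClusterLifecycleSketch {
  private Phase phase = Phase.NOT_STARTED;

  // Returns true if the cluster was started, false if the request was
  // rejected because the cluster is not in the NOT_STARTED phase.
  boolean start() {
    if (phase != Phase.NOT_STARTED) {
      return false;
    }
    phase = Phase.STARTED;
    return true;
  }

  // Returns true if the cluster was stopped, false if it was never started
  // or is already shut down. A mistaken stop request fails immediately
  // instead of waiting and taking effect after a later start.
  boolean stop() {
    if (phase != Phase.STARTED) {
      return false;
    }
    phase = Phase.SHUT_DOWN;
    return true;
  }

  Phase phase() {
    return phase;
  }
}
```

With this shape, a stop sent before any start returns `false` and leaves the phase unchanged, which avoids the "hang, then stop unexpectedly later" behavior described in the review.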

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflowImpl.java

Comment on lines +118 to +120

+                      // exception from there, or raise an ApplicationFailure. Other exceptions in the main
+                      // handler
+                      // will cause the workflow to keep retrying and get it stuck.

Contributor

dandavison Oct 4, 2024

Suggested change

      
-                      // exception from there, or raise an ApplicationFailure. Other exceptions in the main
-                      // handler
-                      // will cause the workflow to keep retrying and get it stuck.
+                      // exception from there, or raise an ApplicationFailure. Other exceptions in the main
+                      // handler will cause the workflow to keep retrying and get it stuck.

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflowImpl.java

Comment on lines +154 to +156

+                    // This call would be dangerous without nodesLock because it yields control and allows
+                    // interleaving
+                    // with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.

Contributor

dandavison Oct 4, 2024

Suggested change

      
-                    // This call would be dangerous without nodesLock because it yields control and allows
-                    // interleaving
-                    // with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.
+                    // This call would be dangerous without nodesLock because it yields control and allows
+                    // interleaving with assignNodesToJob and performHealthChecks, which all touch this.state.nodes.

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflowImpl.java

Comment on lines +63 to +65

+                  // The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
+                  // its state and
+                  // continue-as-new.

Contributor

dandavison Oct 4, 2024

Suggested change

      
-                  // The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
-                  // its state and
-                  // continue-as-new.
+                  // The cluster manager is a long-running "entity" workflow so we need to periodically checkpoint
+                  // its state and continue-as-new.

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflow.java

Comment on lines +37 to +39

+                // In workflows that continue-as-new, it's convenient to store all your state in one serializable
+                // structure
+                // to make it easier to pass between runs

Contributor

dandavison Oct 4, 2024

Haha I don't get your IDE's word-wrapping decisions! I've left a bunch of changes like these.

Suggested change

      
-                // In workflows that continue-as-new, it's convenient to store all your state in one serializable
-                // structure
-                // to make it easier to pass between runs
+                // In workflows that continue-as-new, it's convenient to store all your state in one serializable
+                // structure to make it easier to pass between runs

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflow.java

Comment on lines +157 to +160

+                // This is an update as opposed to a signal because the client may want to wait for nodes to be
+                // allocated
+                // before sending work to those nodes.
+                // Returns the list of node names that were allocated to the job.

Contributor

dandavison Oct 4, 2024

Suggested change

      
-                // This is an update as opposed to a signal because the client may want to wait for nodes to be
-                // allocated
-                // before sending work to those nodes.
-                // Returns the list of node names that were allocated to the job.
+                // This is an update as opposed to a signal because the client may want to wait for nodes to be
+                // allocated before sending work to those nodes.
+                // Returns the list of node names that were allocated to the job.

core/src/main/java/io/temporal/samples/safemessagepassing/ClusterManagerWorkflow.java

Comment on lines +164 to +166

+                // Even though it returns nothing, this is an update because the client may want to track it, for
+                // example
+                // to wait for nodes to be unassigned before reassigning them.

Contributor

dandavison Oct 4, 2024

Suggested change

      
-                // Even though it returns nothing, this is an update because the client may want to track it, for
-                // example
-                // to wait for nodes to be unassigned before reassigning them.
+                // Even though it returns nothing, this is an update because the client may want to track it, for
+                // example to wait for nodes to be unassigned before reassigning them.
