Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sub orchestrator retries #84

Merged
merged 4 commits into from
Nov 27, 2024
Merged

Conversation

famarting
Copy link
Contributor

second part for dapr/go-sdk#541

adds support for call suborchestrator with retries

based on the existing implementation for activities

@famarting
Copy link
Contributor Author

ping @cgillum

Signed-off-by: Fabian Martinez <[email protected]>
@famarting
Copy link
Contributor Author

ping

@cgillum
Copy link
Member

cgillum commented Nov 20, 2024

@famarting looks like your new tests are not passing. The CI is reporting timeouts.

@famarting
Copy link
Contributor Author

@cgillum tests are failing because of this error

ERROR: 2024/11/20 19:09:57 orchestration-processor: failed to complete work item: orchestration instance already exists

It makes me think that the strategy I followed to implement sub orchestrator retries is not a valid strategy? do you have any idea or suggestion how to address this?

First I implemented activity retries and I implemented a recursive algorithm that using timers scheduled the activity introducing delays between tries... however this doesnt seem to work OOB for sub orchestrations due to the "already exists" error. Additionally, looking at the python sdk, seems like retries are not implemented using a recursive algorithm, instead extra logic has been added into process event to account for delays and retries.... am I forced to follow the same strategy as python? or what options do we have?

@cgillum
Copy link
Member

cgillum commented Nov 22, 2024

My first guess is that the sqlite backend implementation doesn't allow reusing the existing sub-orchestration ID, so it's returning ErrDuplicateInstance from the call to CompleteOrchestrationWorkItem.

Your test might pass if you forgo providing a specific instance ID and allow the sub-orchestration to use a random instance ID. However, if you want the scenario of reusing a user-specific instance ID to work, then we'll need to change sqlite to handle the case where an existing instance already exists such that we delete the old (failed) instance and create the new instance. Note that the Dapr Actors backend might require similar changes.

@famarting famarting force-pushed the sub-orchestrator-retries branch from 404799e to 7572160 Compare November 26, 2024 09:42
@famarting
Copy link
Contributor Author

My first guess is that the sqlite backend implementation doesn't allow reusing the existing sub-orchestration ID, so it's returning ErrDuplicateInstance from the call to CompleteOrchestrationWorkItem.

Your test might pass if you forgo providing a specific instance ID and allow the sub-orchestration to use a random instance ID. However, if you want the scenario of reusing a user-specific instance ID to work, then we'll need to change sqlite to handle the case where an existing instance already exists such that we delete the old (failed) instance and create the new instance. Note that the Dapr Actors backend might require similar changes.

ok, it was much simpler than I thought, thank you

@cgillum
Copy link
Member

cgillum commented Nov 27, 2024

@famarting looks like the new test is still failing due to a mismatch in expected OTel span outputs:

 Error Trace:	/home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:172
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:144
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:125
        	        /home/runner/work/durabletask-go/durabletask-go/tests/tracing_test.go:30
        	        /home/runner/work/durabletask-go/durabletask-go/tests/orchestrations_test.go:466
 Error:      	[]attribute.KeyValue{attribute.KeyValue{Key:"durabletask.type", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"orchestration", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.task.name", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"Child", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.task.instance_id", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"8e5d728a-cce1-4b74-8928-d5df2a648de3_child", slice:interface {}(nil)}}, attribute.KeyValue{Key:"durabletask.runtime_status", Value:attribute.Value{vtype:4, numeric:0x0, stringly:"FAILED", slice:interface {}(nil)}}} does not contain attribute.KeyValue{Key:"durabletask.task.task_id", Value:attribute.Value{vtype:2, numeric:0x0, stringly:"", slice:interface {}(nil)}}
 Test:       	Test_SingleSubOrchestrator_Failed_Retries

It looks like you're attempting to assert a particular task ID for the sub-orchestrations, but there aren't any such task ID attributes getting generated for sub-orchestrations. I don't see similar asserts for task IDs in the other sub-orchestration test (suggesting that they're not expected to exist), so I wonder if you should just remove the task ID asserts you added for the sub-orchestrations.

Signed-off-by: Fabian Martinez <[email protected]>
@famarting
Copy link
Contributor Author

followed your feedback, thank you

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Everything's passing now.

@cgillum cgillum merged commit 6ab94e1 into microsoft:main Nov 27, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants