maciejdudko
Contributor

@maciejdudko maciejdudko commented Sep 30, 2025

What was changed

💥 BREAKING CHANGE: Redesigned the code flow around the custom slot supplier in the C bridge to ensure memory safety when a reservation is cancelled. From the lang SDK's point of view it works mostly the same as before, with the following differences:

  • All callbacks take an extra user_data argument (not used by the .NET implementation, but very useful for other languages).
  • In reserve, the ctx argument is short-lived, and completion_ctx replaces sender.
  • token_source is removed from SlotReserveCtx, and the function temporal_core_set_reserve_cancel_target is removed.
  • cancel_reserve takes a completion_ctx argument instead of token_source.
  • temporal_core_complete_async_reserve is safe to call after cancellation, which makes it possible to avoid a race condition. This function now returns a bool indicating whether the reservation was completed (true) or cancelled (false).
  • Added a separate temporal_core_complete_async_cancel_reserve function that must be called after cancellation to clean up resources.
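
The completion/cancellation race described in the list above can be modelled in a few lines of safe Rust. This is only a sketch to illustrate the contract, not the bridge's actual code: `State`, `CompletionCtx`, `complete_reserve`, and `cancel_reserve` are hypothetical names, and panicking on a double completion is an assumption of the sketch.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical lifecycle of a single slot reservation (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Pending,
    Completed,
    Cancelled,
}

/// Stand-in for the completion context shared by both sides of the bridge.
struct CompletionCtx {
    state: Mutex<State>,
}

/// Models the "complete" call: returns true if the reservation was completed,
/// false if cancellation had already won the race.
fn complete_reserve(ctx: &Arc<CompletionCtx>) -> bool {
    let mut state = ctx.state.lock().unwrap();
    match *state {
        State::Pending => {
            *state = State::Completed;
            true
        }
        State::Cancelled => false,
        // Assumption of this sketch: completing twice is a caller bug.
        State::Completed => panic!("reservation completed twice"),
    }
}

/// Models the cancellation request: returns true if it took effect,
/// false if the reservation had already been completed or cancelled.
fn cancel_reserve(ctx: &Arc<CompletionCtx>) -> bool {
    let mut state = ctx.state.lock().unwrap();
    if *state == State::Pending {
        *state = State::Cancelled;
        true
    } else {
        false
    }
}
```

Because both transitions happen under the same lock, exactly one of them succeeds, and the caller of the "complete" function learns from the return value whether the slot it produced was actually handed over.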

Additionally, implemented available_slots functionality, and renamed the callback type aliases to make their relation to CustomSlotSupplierCallbacks more obvious.

There is a matching PR in the .NET SDK that implements the new API: temporalio/sdk-dotnet#532

Why?

The previous implementation had memory corruption bugs; see temporalio/sdk-dotnet#458

Checklist

  1. Part of [Bug] Issues with ReserveCtxFromBridge sdk-dotnet#458

  2. How was this tested:
    There is no test for the original bug. Due to the nature of this bug, it's impossible to write a test that reliably triggers the problematic conditions. The fix can be verified over time by the absence of the transient CI failures that this bug caused in the .NET SDK.

General tests of the custom slot supplier that exercise these APIs live in the .NET SDK; see temporalio/sdk-dotnet#532

@maciejdudko maciejdudko marked this pull request as ready for review October 1, 2025 22:59
@maciejdudko maciejdudko requested a review from a team as a code owner October 1, 2025 22:59
Member

@Sushisource Sushisource left a comment

Nice! Appreciate the comments a lot. Only one important comment here, about whether we should panic in an extra spot.

drop(unsafe { Arc::from_raw(completion_ctx) });
true
}
SlotReserveOperationState::Pending => false,
Member

According to the docstring, it seems like this should also be a panic? It does seem wrong to call this if the cancel didn't happen.

Contributor Author

While it could panic, there is no reason to panic as no invariants have been broken. Safely returning false here may make it easier to make the implementation safe. The other branch panics because it potentially reads from freed memory.
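
To illustrate the distinction, here is a minimal sketch of a cleanup function with the same shape as the snippet under review. It is not the bridge's code: `OpState`, `Ctx`, and `complete_cancel` are hypothetical names, and the sketch assumes the caller holds one strong reference that was leaked with `Arc::into_raw`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical operation state (illustrative, not the bridge's enum).
enum OpState {
    Pending,
    Cancelled,
    Completed,
}

struct Ctx {
    state: Mutex<OpState>,
}

/// Releases the caller's reference once cancellation has happened.
/// Returns false, touching nothing, if the operation is still pending.
///
/// Safety: `ptr` must come from `Arc::into_raw` and must still own one
/// strong reference that has not been released yet.
unsafe fn complete_cancel(ptr: *const Ctx) -> bool {
    // Read the state in its own scope so the lock guard is gone before any drop.
    let cancelled = {
        let ctx = unsafe { &*ptr };
        let state = ctx.state.lock().unwrap();
        match *state {
            OpState::Cancelled => true,
            // No invariant is broken: nothing was freed, so report and return.
            OpState::Pending => false,
            // Assumption of this sketch: in the real bridge this path may mean
            // the context was already released, hence the loud failure.
            OpState::Completed => panic!("cleanup called on a completed reservation"),
        }
    };
    if cancelled {
        // Reconstitute the Arc so its reference count drops by one.
        drop(unsafe { Arc::from_raw(ptr) });
    }
    cancelled
}
```

In this model a premature call is harmless because the pointer is still valid and the reference is retained, so the caller can simply try again after the cancellation has been delivered.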

Contributor Author

I revised the docstring to make the intention clearer.

Member

@Sushisource Sushisource left a comment

Sweet!

@maciejdudko maciejdudko merged commit ee88c04 into master Oct 3, 2025
30 of 32 checks passed
@maciejdudko maciejdudko deleted the c-bridge-custom-slot-supplier-rework branch October 3, 2025 14:45