Feat/child context #100

Merged
zhongkechen merged 23 commits into main from feat/child-context
Feb 20, 2026

@maschnetwork (Contributor) commented Feb 16, 2026

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Issue Link, if available

#36

Description

Adds runInChildContext and runInChildContextAsync to the Java Durable Execution SDK. Each child context gets its own operation counter and checkpoint log, enabling concurrent branches of work with per-context determinism.

Implementation details:

  • ChildContextOperation extends BaseDurableOperation<T> and manages the child context lifecycle: START (fire-and-forget), execute user function in a separate thread, SUCCEED/FAIL (blocking checkpoint).
  • parentId is propagated through BaseDurableOperation to all operation subclasses, replacing the hardcoded null.
  • Per-context replay state tracked via isReplaying on each DurableContext instance, since a child may be replaying while the parent is already executing.
  • Operation IDs are prefixed with the parent context's ID using - as separator (e.g., "3-1", "3-2" for operations inside parent "3"). This matches the JS SDK's stepPrefix convention and ensures global uniqueness — the backend validates type consistency by operation ID alone. Nested contexts chain naturally (e.g., "3-2-1").
  • Large results (≥256KB) trigger the ReplayChildren flow: the SUCCEED checkpoint is written with an empty payload and ContextOptions { replayChildren: true }, and the result is reconstructed by re-executing the child context on replay.
  • ChildContextFailedException follows the same pattern as StepFailedException.
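
The ID prefixing scheme described above can be sketched roughly as follows. This is only an illustration, not the SDK code: `SketchContext`, `nextOperationId`, and `newChild` are hypothetical names, and plain counters ("1", "2", ...) for root-level IDs are an assumption. Only the `-` separator and the "3-1" / "3-2-1" shapes come from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a durable context; not the SDK's DurableContext.
final class SketchContext {
    private final String contextId;            // null for the root context (assumption)
    private final AtomicInteger counter = new AtomicInteger();

    SketchContext(String contextId) {
        this.contextId = contextId;
    }

    // Next operation ID in this context: "1", "2", ... at the root,
    // "<contextId>-1", "<contextId>-2", ... inside a child context.
    String nextOperationId() {
        int n = counter.incrementAndGet();
        return contextId == null ? String.valueOf(n) : contextId + "-" + n;
    }

    // A child context is itself an operation of its parent; its ID becomes
    // the prefix for every operation that runs inside it.
    SketchContext newChild() {
        return new SketchContext(nextOperationId());
    }
}

final class OperationIdDemo {
    public static void main(String[] args) {
        SketchContext root = new SketchContext(null);
        root.nextOperationId();                        // "1"
        root.nextOperationId();                        // "2"
        SketchContext child = root.newChild();         // child context is operation "3"
        System.out.println(child.nextOperationId());   // 3-1
        SketchContext nested = child.newChild();       // nested context is operation "3-2"
        System.out.println(nested.nextOperationId());  // 3-2-1
    }
}
```

Because each context owns its counter, the IDs produced inside one child do not depend on how sibling contexts interleave, which is what makes per-context determinism possible.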

Deferred:

  • Orphan detection in CheckpointBatcher (preventing stale checkpoints from in-flight child operations after parent completes)
  • summaryGenerator for large-result observability

See docs/design-run-in-child-context.md for the full design.

Demo/Screenshots

(two images, not preserved)

Checklist

  • I have filled out every section of the PR template
  • I have thoroughly tested this change

Testing

Unit Tests

  • ChildContextOperationTest — covers first execution, replay SUCCEEDED, replay FAILED, replay STARTED, replayChildren path, non-deterministic detection, and error handling.
  • DurableContextTest, ReplayValidationTest — updated for createRootContext factory method.

Integration Tests

  • ChildContextIntegrationTest — 15 different cases including waits and nesting.

Examples

  • ChildContextExample — demonstrates three concurrent child contexts (two with step+wait, one with a nested child context), collected via DurableFuture.allOf.
  • Cloud test added to CloudBasedIntegrationTest.

…ite key lookups

- Create OperationKey record in execution/ package with parentId/operationId
  fields and static factory methods of() and fromOperation()
- Refactor ExecutionManager.operations map from Map<String, Operation> to
  Map<OperationKey, Operation> for scoped operation lookups
- Update fetchAllPages collector to use OperationKey.fromOperation(op)
- Change getOperationAndUpdateReplayState to accept parentId parameter
- Refactor openPhasers map to Map<OperationKey, Phaser>
- Update startPhaser to accept parentId parameter
- Update BaseDurableOperation to pass null as parentId (temporary until
  parentId propagation is wired in task 3)
- Update all test files to match new method signatures
- Add parentId field to DurableContext to track parent-child relationships
- Pass parentId through all operation constructors (StepOperation, WaitOperation, InvokeOperation, CallbackOperation)
- Update BaseDurableOperation to accept and store parentId parameter
- Add convenience constructor in BaseDurableOperation for root-context operations where parentId is null
- Update ExecutionManager calls to use parentId for scoped operation lookups instead of null
- Add getContextId() method to DurableContext to retrieve parentId
- Add getParentId() protected method to BaseDurableOperation for accessing parent context ID
- Update operation update builder to include parentId instead of hardcoded null value
- Enables proper operation tracking and isolation within child execution contexts
- Add isReplaying field to track per-context replay mode state
- Initialize replay state based on cached operations in ExecutionManager
- Add setExecutionMode() to transition context from replay to execution
- Refactor DurableContext constructor to private shared initialization
- Add createRootContext() static factory methods for root context creation
- Add createChildContext() static factory method for child context creation
- Add hasOperationsForContext() to ExecutionManager for replay state detection
- Update DurableExecutor to use createRootContext() factory method
- Update test fixtures to use new factory methods
- Enables per-context replay tracking for improved operation handling
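
A minimal sketch of the scoped lookup these commits describe, assuming a simplified operation model. `SketchOperation` is a hypothetical stand-in for the real Operation type, and the method signatures (including `hasOperationsForContext`) are guesses; only the names OperationKey, of(), fromOperation(), and the parentId/operationId fields come from the commit messages.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified operation model; the SDK's Operation type is richer.
record SketchOperation(String id, String parentId, String type) {}

// Composite key. Records get value-based equals/hashCode (null-safe for
// components), so a key with a null parentId works as a map key.
record OperationKey(String parentId, String operationId) {
    static OperationKey of(String parentId, String operationId) {
        return new OperationKey(parentId, operationId);
    }

    static OperationKey fromOperation(SketchOperation op) {
        return new OperationKey(op.parentId(), op.id());
    }
}

final class OperationKeyDemo {
    // Guess at the replay check: a context starts in replay mode only if the
    // cached operations contain at least one entry scoped to that context.
    static boolean hasOperationsForContext(
            Map<OperationKey, SketchOperation> operations, String contextId) {
        return operations.keySet().stream()
                .anyMatch(key -> Objects.equals(key.parentId(), contextId));
    }

    public static void main(String[] args) {
        Map<OperationKey, SketchOperation> operations = new HashMap<>();
        SketchOperation rootStep = new SketchOperation("1", null, "STEP");
        SketchOperation childStep = new SketchOperation("3-1", "3", "STEP");
        operations.put(OperationKey.fromOperation(rootStep), rootStep);
        operations.put(OperationKey.fromOperation(childStep), childStep);

        System.out.println(operations.get(OperationKey.of("3", "3-1")).type()); // STEP
        System.out.println(operations.get(OperationKey.of(null, "3-1")));       // null
        System.out.println(hasOperationsForContext(operations, "3"));           // true
        System.out.println(hasOperationsForContext(operations, "4"));           // false
    }
}
```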
…nd unit tests

- Add ChildContextOperation extending BaseDurableOperation with execute/get lifecycle
- Handle first execution (START fire-and-forget, then run child context)
- Handle replay: SUCCEEDED (cached result), FAILED (re-throw), STARTED (re-execute)
- Implement large result handling (>=256KB) via ReplayChildren flow
- Add ChildContextFailedException for non-reconstructable exception fallback
- Add DurableContext.createChildContext() factory (no thread registration)
- Add per-context replay state (isReplaying field) to DurableContext
- Unit tests covering replay scenarios, failure preservation, and non-determinism detection
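
The replay cases and the large-result path listed above might fit together roughly like this. Everything here is a hedged sketch: `SketchStatus`, `SketchCheckpoint`, `succeed`, and `resolve` are hypothetical names, 256KB is assumed to mean 256 * 1024 bytes of UTF-8 serialized output, and `IllegalStateException` stands in for whatever exception the SDK actually re-throws.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

// Hypothetical types for illustration; none of these are SDK classes.
enum SketchStatus { STARTED, SUCCEEDED, FAILED }

// What a stored checkpoint might carry for a child context operation.
record SketchCheckpoint(SketchStatus status, String payload, boolean replayChildren, String error) {}

final class ChildContextSketch {
    // Threshold from the PR description; the exact byte count is an assumption.
    static final int LARGE_RESULT_BYTES = 256 * 1024;

    // Write side: decide what the SUCCEED checkpoint carries.
    static SketchCheckpoint succeed(String serializedResult) {
        int size = serializedResult.getBytes(StandardCharsets.UTF_8).length;
        if (size >= LARGE_RESULT_BYTES) {
            // Too large to store: empty payload, rebuilt by re-running the child on replay.
            return new SketchCheckpoint(SketchStatus.SUCCEEDED, "", true, null);
        }
        return new SketchCheckpoint(SketchStatus.SUCCEEDED, serializedResult, false, null);
    }

    // Read side: resolve the child context result from an existing checkpoint.
    // A null checkpoint means this is the first execution.
    static String resolve(SketchCheckpoint existing, Supplier<String> childFunction) {
        if (existing == null) {
            return childFunction.get();                 // first execution
        }
        switch (existing.status()) {
            case SUCCEEDED:
                return existing.replayChildren()
                        ? childFunction.get()           // reconstruct the large result
                        : existing.payload();           // cached result
            case FAILED:
                // Stand-in for re-throwing the stored failure.
                throw new IllegalStateException(existing.error());
            default:
                return childFunction.get();             // STARTED: re-execute
        }
    }
}
```

On the replayChildren path the child function runs again, but its inner operations would themselves replay from their own checkpoints, so the re-execution reconstructs the value rather than repeating side effects.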
zhongkechen previously approved these changes Feb 16, 2026
zhongkechen previously approved these changes Feb 19, 2026
var contextId = getOperationId();

// Register child context thread before executor runs (prevents suspension)
registerActiveThread(contextId, ThreadType.CONTEXT);
Contributor

I think it's registered twice and also setCurrentContext is called twice here

Contributor Author

This is the same pattern as StepOperation.executeStepLogic. registerActiveThread runs on the parent thread to ensure the child is tracked before the parent can deregister (preventing a false "no active threads" suspension), while setCurrentContext sets the ThreadLocal on the actual child execution thread.

Contributor Author

I will add a more detailed comment for this
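
For readers following the thread, the two-step pattern explained above can be sketched with plain java.util.concurrent types. This is not the SDK's implementation: `ActiveThreadSketch`, `register`, `deregister`, and `runChild` are hypothetical names standing in for registerActiveThread and setCurrentContext.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ActiveThreadSketch {
    // IDs of logical threads still doing work; empty would mean "safe to suspend".
    private final Set<String> active = ConcurrentHashMap.newKeySet();
    // Which context the current thread is running; set on the worker thread itself.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    void register(String id) { active.add(id); }
    void deregister(String id) { active.remove(id); }
    boolean hasActiveThreads() { return !active.isEmpty(); }

    Future<String> runChild(String contextId, ExecutorService executor) {
        // Parent thread: register before submitting, so the child counts as
        // active even if the executor has not scheduled it yet.
        register(contextId);
        return executor.submit(() -> {
            // Child thread: a ThreadLocal can only be set from the thread it belongs to.
            CURRENT.set(contextId);
            try {
                return "ran in " + CURRENT.get();
            } finally {
                CURRENT.remove();
                deregister(contextId);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ActiveThreadSketch sketch = new ActiveThreadSketch();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = sketch.runChild("3", executor);
            System.out.println(result.get());               // ran in 3
            System.out.println(sketch.hasActiveThreads());  // false
        } finally {
            executor.shutdown();
        }
    }
}
```

If registration happened inside the submitted task instead, there would be a window between submit and the task starting in which the parent could deregister and the active set would look empty.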


// Register root context thread as active
/** Creates a root context with the given contextId and registers the current thread. */
static DurableContext createRootContext(
Contributor

Is this still a Root context if a contextId is specified?

Contributor Author

I can actually merge the two constructors to make it more concrete. Good call.

@maschnetwork maschnetwork marked this pull request as ready for review February 20, 2026 14:48
zhongkechen previously approved these changes Feb 20, 2026
@zhongkechen zhongkechen merged commit f40431b into main Feb 20, 2026
7 of 10 checks passed
@zhongkechen zhongkechen deleted the feat/child-context branch February 20, 2026 20:56