Summary
I’d like to discuss adding a "non-expiring" / "manual cleanup" mode for sandbox creation, so that sandboxes do not require TTL-based auto-expiration and can instead be cleaned up explicitly by the application layer.
This should ideally work consistently across both Docker and Kubernetes backends.
Why this is needed
Currently, the sandbox lifecycle is strongly TTL-driven:
- create requires timeout
- server computes expiresAt
- backend auto-expires the sandbox
- caller can renew expiration, but cannot disable expiration
This works well for short-lived workloads, but it is limiting for application-integrated scenarios where the upper layer already owns lifecycle and cleanup.
Use Cases
1. Session/workspace sandboxes
An application creates one sandbox per user session or workspace and wants to delete it only when the session ends.
Examples:
- web IDE
- notebook workspace
- coding interview environment
2. External workflow/orchestrator-managed cleanup
A higher-level system already decides when a sandbox should be deleted.
Examples:
- cleanup after workflow completion
- cleanup tied to external job state
- business-driven retry/recovery flows
3. Manual debugging / review
A sandbox should remain available until a human explicitly cleans it up.
Examples:
- failure investigation
- QA reproduction environment
- post-run inspection
4. Stateful application integration
A sandbox may need to stay alive while the application coordinates volume export, snapshot, or handoff.
Proposal
Introduce an explicit expiration mode instead of relying only on timeout.
For example:
- ttl: current behavior, sandbox auto-expires
- manual: no server-side TTL expiration; sandbox is deleted only by explicit API call or external control-plane cleanup
I think this is better than using magic values like timeout=0, -1, or a far-future timestamp.
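To make the shape of this concrete, here is a minimal sketch of how an explicit mode could be modeled and validated. The names `ExpirationMode`, `CreateSandboxRequest`, `timeout_seconds`, and `compute_expires_at` are hypothetical and not taken from the existing API; the point is only that the mode is explicit and that manual mode yields no expiration timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class ExpirationMode(str, Enum):
    TTL = "ttl"        # current behavior: sandbox auto-expires
    MANUAL = "manual"  # no server-side expiration; caller deletes explicitly


@dataclass(frozen=True)
class CreateSandboxRequest:
    # Hypothetical request shape; field names are illustrative only.
    expiration_mode: ExpirationMode = ExpirationMode.TTL
    timeout_seconds: Optional[int] = None

    def __post_init__(self) -> None:
        if self.expiration_mode is ExpirationMode.TTL:
            # TTL mode keeps today's contract: a timeout is mandatory.
            if self.timeout_seconds is None or self.timeout_seconds <= 0:
                raise ValueError("ttl mode requires a positive timeout")
        elif self.timeout_seconds is not None:
            # Reject ambiguous input instead of silently ignoring it.
            raise ValueError("manual mode does not accept a timeout")


def compute_expires_at(req: CreateSandboxRequest, now: datetime) -> Optional[datetime]:
    """Return the expiration time, or None when the sandbox never auto-expires."""
    if req.expiration_mode is ExpirationMode.MANUAL:
        return None
    return now + timedelta(seconds=req.timeout_seconds)
```

Keeping TTL as the default mode means existing callers that only send a timeout would be unaffected, and no magic timeout value is needed to express "never expire".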
Expected behavior for manual mode
- no auto-expiration in Docker
- no auto-expiration in Kubernetes
- expiresAt may need to be nullable / omitted for this mode
- cleanup responsibility belongs to the caller/application
- unsupported providers should fail clearly
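As one possible reading of "fail clearly", a backend could check the requested mode against what the provider supports before creating anything. This is a sketch only: the capability table, the provider name `example-ttl-only`, and the error type are assumptions made for illustration, not existing behavior.

```python
class UnsupportedExpirationModeError(Exception):
    """Raised when a provider cannot honor the requested expiration mode."""


# Hypothetical capability table; the real set of providers and their
# supported modes would come from the server's backend registry.
SUPPORTED_MODES = {
    "docker": {"ttl", "manual"},
    "kubernetes": {"ttl", "manual"},
    "example-ttl-only": {"ttl"},
}


def ensure_mode_supported(provider: str, mode: str) -> None:
    """Reject the request up front rather than silently falling back to TTL."""
    supported = SUPPORTED_MODES.get(provider, set())
    if mode not in supported:
        raise UnsupportedExpirationModeError(
            f"provider {provider!r} does not support expiration mode {mode!r}; "
            f"supported modes: {sorted(supported)}"
        )
```

Failing at creation time avoids the worse outcome where a caller asks for manual mode and the sandbox is still expired by a backend that ignored the setting.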
Non-Goals
This proposal is not asking to remove TTL support.
TTL should remain the default and recommended mode for most short-lived workloads. The request is to add an opt-in manual cleanup mode for integrations that need it.