.NET: Add TTLs to durable agent sessions #2679
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| # Time-To-Live (TTL) for durable agent sessions | ||
|
|
||
| ## Overview | ||
|
|
||
| Durable agents automatically maintain conversation history and state for each session. Without cleanup, this state accumulates indefinitely, consuming storage and increasing costs. The Time-To-Live (TTL) feature deletes idle agent sessions automatically after a configurable period of inactivity. | |
|
|
||
| ## What is TTL? | ||
|
|
||
| Time-To-Live (TTL) is a configurable duration that determines how long an agent session's state is retained after its last interaction. When a session is idle (no messages sent to it) for longer than the TTL, its state is automatically deleted. Each new interaction resets the TTL timer, extending the session's lifetime. | |
|
|
||
| ## Benefits | ||
|
|
||
| - **Automatic cleanup**: No manual intervention required to clean up idle agent sessions | ||
| - **Cost optimization**: Reduces storage costs by automatically removing unused session state | ||
| - **Resource management**: Prevents unbounded growth of agent session state in storage | ||
| - **Configurable**: Set TTL globally or per-agent type to match your application's needs | ||
|
|
||
| ## Configuration | ||
|
|
||
| TTL can be configured at two levels: | ||
|
|
||
| 1. **Global default TTL**: Applies to all agent sessions unless overridden | ||
| 2. **Per-agent type TTL**: Overrides the global default for specific agent types | ||
|
|
||
| Additionally, you can configure a **minimum deletion delay** that controls how frequently deletion operations are scheduled. This prevents excessive deletion operations for agent sessions with short TTLs. | ||
|
|
||
| > [!NOTE] | ||
| > Setting the deletion delay is an advanced feature. Change it only if the default results in excessive deletion operations (in which case, increase the value) or if you need deletion operations to run more promptly (in which case, decrease the value). | |
| ### Default values | ||
|
|
||
| - **Default TTL**: 30 days | ||
| - **Minimum TTL deletion delay**: 5 minutes (subject to change in future releases) | ||
|
|
||
| ### Configuration examples | ||
|
|
||
| #### .NET | ||
|
|
||
| ```csharp | ||
| // Configure global default TTL and minimum signal delay | ||
| services.ConfigureDurableAgents( | ||
| options => | ||
| { | ||
| // Set global default TTL to 7 days | ||
| options.DefaultTimeToLive = TimeSpan.FromDays(7); | ||
|
|
||
| // Set minimum signal delay to 10 minutes | ||
| options.MinimumTimeToLiveSignalDelay = TimeSpan.FromMinutes(10); | ||
|
|
||
| // Add agents (will use global default TTL) | ||
| options.AddAIAgent(myAgent); | ||
| }); | ||
|
|
||
| // Configure per-agent TTL | ||
| services.ConfigureDurableAgents( | ||
| options => | ||
| { | ||
| options.DefaultTimeToLive = TimeSpan.FromDays(30); // Global default | ||
| // Agent with custom TTL of 1 day | ||
| options.AddAIAgent(shortLivedAgent, timeToLive: TimeSpan.FromDays(1)); | ||
|
|
||
| // Agent with custom TTL of 90 days | ||
| options.AddAIAgent(longLivedAgent, timeToLive: TimeSpan.FromDays(90)); | ||
|
|
||
| // Agent using global default (30 days) | ||
| options.AddAIAgent(defaultAgent); | ||
| }); | ||
|
|
||
| // Disable TTL for specific agents by setting TTL to null | ||
| services.ConfigureDurableAgents( | ||
| options => | ||
| { | ||
| options.DefaultTimeToLive = TimeSpan.FromDays(30); | ||
|
|
||
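| // Intended: agent with no TTL (never expires). | |
| // Caution: in the AddAIAgent implementation shown in this diff, a null timeToLive is not stored, | |
| // so this agent falls back to DefaultTimeToLive (30 days here). | |
| options.AddAIAgent(permanentAgent, timeToLive: null); | |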
| }); | ||
| ``` | ||
|
|
||
| ## How TTL works | ||
|
|
||
| The following sections describe how TTL works in detail. | ||
|
|
||
| ### Expiration tracking | ||
|
|
||
| Each agent session maintains an expiration timestamp in its internally managed state that is updated whenever the session processes a message: | ||
|
|
||
| 1. When an agent session finishes processing a message, the expiration time is set to `current time + TTL` | |
| 2. On the session's first interaction, the runtime schedules a deletion check for the expiration time (subject to minimum delay constraints). Later interactions only update the expiration time. | |
| 3. When the deletion check runs, if the current time is past the expiration time, the session state is deleted. Otherwise, the check is rescheduled for the updated expiration time. | |
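The three steps above can be sketched as a small in-memory model. This is only an illustration: `SessionTtlModel` and its members are hypothetical names, not part of the library, and the sketch ignores the minimum deletion delay and the durable scheduling that the real entity relies on.

```csharp
using System;

// Hypothetical in-memory model of one agent session's TTL bookkeeping.
// It is not the library's AgentEntity; it only mirrors the documented steps.
public sealed class SessionTtlModel
{
    private readonly TimeSpan _timeToLive;

    public SessionTtlModel(TimeSpan timeToLive) => _timeToLive = timeToLive;

    // Null means no session state exists (never created, or already deleted).
    public DateTime? ExpirationTime { get; private set; }

    // Time of the pending deletion check, if one is scheduled.
    public DateTime? NextCheck { get; private set; }

    // Steps 1 and 2: processing a message moves the expiration forward.
    // A check is scheduled only when none is pending (the first interaction).
    public void OnMessageProcessed(DateTime now)
    {
        ExpirationTime = now + _timeToLive;
        NextCheck ??= ExpirationTime;
    }

    // Step 3: delete the state if it has expired; otherwise reschedule the
    // check for the current expiration time. Returns true when state was deleted.
    public bool OnDeletionCheck(DateTime now)
    {
        if (ExpirationTime is null)
        {
            NextCheck = null;
            return false;
        }

        if (now >= ExpirationTime.Value)
        {
            ExpirationTime = null;
            NextCheck = null;
            return true;
        }

        NextCheck = ExpirationTime;
        return false;
    }
}
```

With a 30-day TTL, messages on Day 0 and Day 15 leave the original check scheduled for Day 30. In this model that check finds an expiration of Day 45, deletes nothing, and reschedules itself for Day 45.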
|
|
||
| ### State deletion | ||
|
|
||
| When an agent session expires, its entire state is deleted, including: | ||
|
|
||
| - Conversation history | ||
| - Any custom state data | ||
| - Expiration timestamps | ||
|
|
||
| After deletion, if a message is sent to the same agent session, a new session is created with a fresh conversation history. | ||
|
|
||
| ## Behavior examples | ||
|
|
||
| The following examples illustrate how TTL works in different scenarios. | ||
|
|
||
| ### Example 1: Agent session expires after TTL | ||
|
|
||
| 1. Agent configured with 30-day TTL | ||
| 2. User sends message at Day 0 → agent session created, expiration set to Day 30 | ||
| 3. No further messages sent | ||
| 4. At Day 30 → Agent session is deleted | ||
| 5. User sends message at Day 31 → New agent session created with fresh conversation history | ||
|
|
||
| ### Example 2: TTL reset on interaction | ||
|
|
||
| 1. Agent configured with 30-day TTL | ||
| 2. User sends message at Day 0 → agent session created, expiration set to Day 30 | ||
| 3. User sends message at Day 15 → Expiration reset to Day 45 | ||
| 4. User sends message at Day 40 → Expiration reset to Day 70 | ||
| 5. Agent session remains active as long as there are regular interactions | ||
|
|
||
| ## Logging | ||
|
|
||
| The TTL feature includes comprehensive logging to track state changes: | ||
|
|
||
| - **Expiration time updated**: Logged when TTL expiration time is set or updated | ||
| - **Deletion scheduled**: Logged when a deletion check signal is scheduled | ||
| - **Deletion check**: Logged when a deletion check operation runs | ||
| - **Session expired**: Logged when an agent session is deleted due to expiration | ||
| - **TTL rescheduled**: Logged when a deletion signal is rescheduled | ||
|
|
||
| These logs help monitor TTL behavior and troubleshoot any issues. | ||
|
|
||
| ## Best practices | ||
|
|
||
| 1. **Choose appropriate TTL values**: Balance storage costs against user experience. A TTL that is too short may delete sessions that users still expect to resume, while a TTL that is too long retains unnecessary state. | |
|
|
||
| 2. **Use per-agent TTLs**: Different agents may have different usage patterns. Configure TTLs per-agent based on expected session lifetimes. | ||
|
|
||
| 3. **Monitor expiration logs**: Review logs to understand TTL behavior and adjust configuration as needed. | ||
|
|
||
| 4. **Test with short TTLs**: During development, use short TTLs (e.g., minutes) to verify TTL behavior without waiting for long periods. | ||
|
|
||
| ## Limitations | ||
|
|
||
| - TTL is based on wall-clock time, not activity time. The expiration timer restarts each time the session finishes processing a message. | |
| - Deletion checks are durably scheduled operations and may have slight delays depending on system load. | ||
| - Once an agent session is deleted, its conversation history cannot be recovered. | ||
| - TTL deletion requires at least one worker to be available to process the deletion operation message. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,25 +16,19 @@ internal class AgentEntity(IServiceProvider services, CancellationToken cancella | |
| private readonly DurableTaskClient _client = services.GetRequiredService<DurableTaskClient>(); | ||
| private readonly ILoggerFactory _loggerFactory = services.GetRequiredService<ILoggerFactory>(); | ||
| private readonly IAgentResponseHandler? _messageHandler = services.GetService<IAgentResponseHandler>(); | ||
| private readonly DurableAgentsOptions _options = services.GetRequiredService<DurableAgentsOptions>(); | ||
| private readonly CancellationToken _cancellationToken = cancellationToken != default | ||
| ? cancellationToken | ||
| : services.GetService<IHostApplicationLifetime>()?.ApplicationStopping ?? CancellationToken.None; | ||
|
|
||
| public async Task<AgentRunResponse> RunAgentAsync(RunRequest request) | ||
| { | ||
| AgentSessionId sessionId = this.Context.Id; | ||
| - IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> agents = | |
| - this._services.GetRequiredService<IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>>>(); | |
| - if (!agents.TryGetValue(sessionId.Name, out Func<IServiceProvider, AIAgent>? agentFactory)) | |
| - { | |
| - throw new InvalidOperationException($"Agent '{sessionId.Name}' not found"); | |
| - } | |
|
|
||
| - AIAgent agent = agentFactory(this._services); | |
| + AIAgent agent = this.GetAgent(sessionId); | |
| EntityAgentWrapper agentWrapper = new(agent, this.Context, request, this._services); | ||
|
|
||
| // Logger category is Microsoft.DurableTask.Agents.{agentName}.{sessionId} | ||
| - ILogger logger = this._loggerFactory.CreateLogger($"Microsoft.DurableTask.Agents.{agent.Name}.{sessionId.Key}"); | |
| + ILogger logger = this.GetLogger(agent.Name!, sessionId.Key); | |
|
|
||
| if (request.Messages.Count == 0) | ||
|
||
| { | ||
|
|
@@ -113,6 +107,27 @@ async IAsyncEnumerable<AgentRunResponseUpdate> StreamResultsAsync() | |
| response.Usage?.TotalTokenCount); | ||
| } | ||
|
|
||
| // Update TTL expiration time. Only schedule deletion check on first interaction. | ||
| // Subsequent interactions just update the expiration time; CheckAndDeleteIfExpired | |
| // will reschedule the deletion check when it runs. | ||
| TimeSpan? timeToLive = this._options.GetTimeToLive(sessionId.Name); | ||
| if (timeToLive.HasValue) | ||
|
||
| { | ||
| DateTime newExpirationTime = DateTime.UtcNow.Add(timeToLive.Value); | ||
|
Contributor
Should this be set before we get a response? Right now the expiration time is based on when the run finishes, not when the request is received. If the response takes an hour because of long-running tools and the TTL is 1 hour, the session lives for up to about 2 hours from the user's message.
Member
Author
I'm fine with the current behavior. Regarding your example, I think it would be worse if the TTL started when the request is received. If the TTL is 1 hour and the response takes 1 hour to generate, then the session would be deleted immediately, before the user can attempt to run the agent again.
||
| bool isFirstInteraction = this.State.Data.ExpirationTime is null; | ||
|
|
||
| this.State.Data.ExpirationTime = newExpirationTime; | ||
| logger.LogTTLExpirationTimeUpdated(sessionId, newExpirationTime); | ||
|
|
||
| // Only schedule deletion check on the first interaction when entity is created. | ||
| // On subsequent interactions, we just update the expiration time. The scheduled | ||
| // CheckAndDeleteIfExpired will reschedule itself if the entity hasn't expired. | |
| if (isFirstInteraction) | ||
| { | ||
| this.ScheduleDeletionCheck(sessionId, logger, timeToLive.Value); | ||
| } | ||
| } | ||
|
|
||
| return response; | ||
| } | ||
| finally | ||
|
|
@@ -121,4 +136,78 @@ async IAsyncEnumerable<AgentRunResponseUpdate> StreamResultsAsync() | |
| DurableAgentContext.ClearCurrent(); | ||
| } | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Checks if the entity has expired and deletes it if so, otherwise reschedules the deletion check. | ||
| /// </summary> | ||
| /// <remarks> | ||
| /// This method is called by the durable task runtime when a <c>CheckAndDeleteIfExpired</c> signal is received. | ||
| /// </remarks> | ||
| public void CheckAndDeleteIfExpired() | ||
| { | ||
| AgentSessionId sessionId = this.Context.Id; | ||
| AIAgent agent = this.GetAgent(sessionId); | ||
| ILogger logger = this.GetLogger(agent.Name!, sessionId.Key); | ||
|
|
||
| DateTime currentTime = DateTime.UtcNow; | ||
| DateTime? expirationTime = this.State.Data.ExpirationTime; | ||
|
|
||
| logger.LogTTLDeletionCheck(sessionId, expirationTime, currentTime); | ||
|
|
||
| if (expirationTime.HasValue) | ||
| { | ||
| if (currentTime >= expirationTime.Value) | ||
| { | ||
| // Entity has expired, delete it | ||
| logger.LogTTLEntityExpired(sessionId, expirationTime.Value); | ||
| this.State = null!; | ||
| } | ||
| else | ||
| { | ||
| // Entity hasn't expired yet, reschedule the deletion check | ||
| TimeSpan? timeToLive = this._options.GetTimeToLive(sessionId.Name); | ||
| if (timeToLive.HasValue) | ||
| { | ||
| this.ScheduleDeletionCheck(sessionId, logger, timeToLive.Value); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| private void ScheduleDeletionCheck(AgentSessionId sessionId, ILogger logger, TimeSpan timeToLive) | ||
| { | ||
| DateTime currentTime = DateTime.UtcNow; | ||
| DateTime expirationTime = this.State.Data.ExpirationTime ?? currentTime.Add(timeToLive); | ||
| TimeSpan minimumDelay = this._options.MinimumTimeToLiveSignalDelay; | ||
|
|
||
| // To avoid excessive scheduling, we schedule the deletion check for no less than the minimum delay. | ||
| DateTime scheduledTime = expirationTime > currentTime.Add(minimumDelay) | ||
| ? expirationTime | ||
| : currentTime.Add(minimumDelay); | ||
|
|
||
| logger.LogTTLDeletionScheduled(sessionId, scheduledTime); | ||
|
|
||
| // Schedule a signal to self to check for expiration | ||
| this.Context.SignalEntity( | ||
| this.Context.Id, | ||
| nameof(CheckAndDeleteIfExpired), // self-signal | ||
| options: new SignalEntityOptions { SignalTime = scheduledTime }); | ||
| } | ||
|
|
||
| private AIAgent GetAgent(AgentSessionId sessionId) | ||
| { | ||
| IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> agents = | ||
| this._services.GetRequiredService<IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>>>(); | ||
| if (!agents.TryGetValue(sessionId.Name, out Func<IServiceProvider, AIAgent>? agentFactory)) | ||
| { | ||
| throw new InvalidOperationException($"Agent '{sessionId.Name}' not found"); | ||
| } | ||
|
|
||
| return agentFactory(this._services); | ||
| } | ||
|
|
||
| private ILogger GetLogger(string agentName, string sessionKey) | ||
| { | ||
| return this._loggerFactory.CreateLogger($"Microsoft.DurableTask.Agents.{agentName}.{sessionKey}"); | ||
| } | ||
| } | ||
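The scheduling rule in `ScheduleDeletionCheck` amounts to taking the later of the expiration time and `now + minimumDelay`. The sketch below isolates that calculation so it can be checked on its own. `DeletionScheduling` and `ComputeScheduledTime` are hypothetical names introduced for illustration and do not appear in the PR.

```csharp
using System;

public static class DeletionScheduling
{
    // Hypothetical helper mirroring the clamp in ScheduleDeletionCheck:
    // a deletion check is never scheduled sooner than now + minimumDelay.
    public static DateTime ComputeScheduledTime(
        DateTime now,
        DateTime expirationTime,
        TimeSpan minimumDelay)
    {
        DateTime earliestAllowed = now + minimumDelay;
        return expirationTime > earliestAllowed ? expirationTime : earliestAllowed;
    }
}
```

One consequence, assuming the logic shown in the diff: with a TTL shorter than the minimum delay (for example, a 1-minute TTL and the default 5-minute delay), the check runs at the minimum delay, so a session can outlive its TTL by up to that delay.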
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,6 +9,25 @@ public sealed class DurableAgentsOptions | |
| { | ||
| // Agent names are case-insensitive | ||
| private readonly Dictionary<string, Func<IServiceProvider, AIAgent>> _agentFactories = new(StringComparer.OrdinalIgnoreCase); | ||
| private readonly Dictionary<string, TimeSpan?> _agentTimeToLive = new(StringComparer.OrdinalIgnoreCase); | ||
|
|
||
| /// <summary> | ||
| /// Gets or sets the default time-to-live (TTL) for agent entities. | ||
| /// </summary> | ||
| /// <remarks> | ||
| /// If an agent entity is idle for this duration, it will be automatically deleted. | ||
| /// Defaults to 30 days. Set to <see langword="null"/> to disable TTL for agents without explicit TTL configuration. | ||
| /// </remarks> | ||
| public TimeSpan? DefaultTimeToLive { get; set; } = TimeSpan.FromDays(30); | ||
|
Contributor
Does setting this default value cause any breaking behavior for customers who have already started an agent session and have existing entities when they redeploy with this version of the package? I see the `GetTimeToLive` method falls back to this value.
Member
Author
Technically yes. The next time existing agent entities are touched after upgrading to this version of the library, we'll set a TTL of 30 days on them.
||
|
|
||
| /// <summary> | ||
| /// Gets or sets the minimum delay for scheduling TTL deletion signals. | ||
| /// </summary> | ||
| /// <remarks> | ||
| /// This ensures that deletion signals are not scheduled too frequently. | ||
| /// Defaults to 5 minutes. | ||
| /// </remarks> | ||
| public TimeSpan MinimumTimeToLiveSignalDelay { get; set; } = TimeSpan.FromMinutes(5); | ||
|
|
||
| internal DurableAgentsOptions() | ||
| { | ||
|
|
@@ -19,13 +38,19 @@ internal DurableAgentsOptions() | |
| /// </summary> | ||
| /// <param name="name">The name of the agent.</param> | ||
| /// <param name="factory">The factory function to create the agent.</param> | ||
| /// <param name="timeToLive">Optional time-to-live for this agent's entities. If not specified, uses <see cref="DefaultTimeToLive"/>.</param> | ||
| /// <returns>The options instance.</returns> | ||
| /// <exception cref="ArgumentNullException">Thrown when <paramref name="name"/> or <paramref name="factory"/> is null.</exception> | ||
| - public DurableAgentsOptions AddAIAgentFactory(string name, Func<IServiceProvider, AIAgent> factory) | |
| + public DurableAgentsOptions AddAIAgentFactory(string name, Func<IServiceProvider, AIAgent> factory, TimeSpan? timeToLive = null) | |
| { | ||
| ArgumentNullException.ThrowIfNull(name); | ||
| ArgumentNullException.ThrowIfNull(factory); | ||
| this._agentFactories.Add(name, factory); | ||
| if (timeToLive.HasValue) | ||
| { | ||
| this._agentTimeToLive[name] = timeToLive; | ||
| } | ||
|
|
||
| return this; | ||
| } | ||
|
|
||
|
|
@@ -50,12 +75,13 @@ public DurableAgentsOptions AddAIAgents(params IEnumerable<AIAgent> agents) | |
| /// Adds an AI agent to the options. | ||
| /// </summary> | ||
| /// <param name="agent">The agent to add.</param> | ||
| /// <param name="timeToLive">Optional time-to-live for this agent's entities. If not specified, uses <see cref="DefaultTimeToLive"/>.</param> | ||
| /// <returns>The options instance.</returns> | ||
| /// <exception cref="ArgumentNullException">Thrown when <paramref name="agent"/> is null.</exception> | ||
| /// <exception cref="ArgumentException"> | ||
| /// Thrown when <paramref name="agent.Name"/> is null or whitespace or when an agent with the same name has already been registered. | ||
| /// </exception> | ||
| - public DurableAgentsOptions AddAIAgent(AIAgent agent) | |
| + public DurableAgentsOptions AddAIAgent(AIAgent agent, TimeSpan? timeToLive = null) | |
| { | ||
| ArgumentNullException.ThrowIfNull(agent); | ||
|
|
||
|
|
@@ -70,6 +96,11 @@ public DurableAgentsOptions AddAIAgent(AIAgent agent) | |
| } | ||
|
|
||
| this._agentFactories.Add(agent.Name, sp => agent); | ||
| if (timeToLive.HasValue) | ||
| { | ||
| this._agentTimeToLive[agent.Name] = timeToLive; | ||
| } | ||
|
|
||
| return this; | ||
| } | ||
|
|
||
|
|
@@ -81,4 +112,14 @@ internal IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> GetAgentFa | |
| { | ||
| return this._agentFactories.AsReadOnly(); | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Gets the time-to-live for a specific agent, or the default TTL if not specified. | ||
| /// </summary> | ||
| /// <param name="agentName">The name of the agent.</param> | ||
| /// <returns>The time-to-live for the agent, or the default TTL if not specified.</returns> | ||
| internal TimeSpan? GetTimeToLive(string agentName) | ||
| { | ||
| return this._agentTimeToLive.TryGetValue(agentName, out TimeSpan? ttl) ? ttl : this.DefaultTimeToLive; | ||
| } | ||
| } | ||
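The TTL lookup above can be exercised in isolation to see how per-agent values and the global default interact. The following is a minimal sketch, not the library's class: `TtlResolver`, `Register`, and `Resolve` are hypothetical stand-ins that copy only the storage and fallback logic visible in this diff.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the TTL bookkeeping in DurableAgentsOptions.
public sealed class TtlResolver
{
    // Agent names are case-insensitive, as in the diff.
    private readonly Dictionary<string, TimeSpan?> _perAgent =
        new(StringComparer.OrdinalIgnoreCase);

    public TimeSpan? DefaultTimeToLive { get; set; } = TimeSpan.FromDays(30);

    // Mirrors the registration logic in the diff: a null TTL is not stored.
    public void Register(string agentName, TimeSpan? timeToLive = null)
    {
        if (timeToLive.HasValue)
        {
            _perAgent[agentName] = timeToLive;
        }
    }

    // Mirrors GetTimeToLive: per-agent value if present, else the default.
    public TimeSpan? Resolve(string agentName) =>
        _perAgent.TryGetValue(agentName, out TimeSpan? ttl) ? ttl : DefaultTimeToLive;
}
```

Under this logic, registering an agent with a `null` TTL is indistinguishable from omitting the argument, so the agent resolves to `DefaultTimeToLive`. In this sketch, the only way to get a `null` result (no expiration) is to set `DefaultTimeToLive` itself to `null`.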