150 changes: 150 additions & 0 deletions docs/features/durable-agents/durable-agents-ttl.md
@@ -0,0 +1,150 @@
# Time-To-Live (TTL) for durable agent sessions

## Overview

Durable agents automatically maintain conversation history and state for each session. Without cleanup, this state accumulates indefinitely, consuming storage and increasing costs. The Time-To-Live (TTL) feature deletes idle agent sessions automatically after a configurable period of inactivity.

## What is TTL?

Time-To-Live (TTL) is a configurable duration that determines how long an agent session's state is retained after its last interaction. When a session is idle (no messages sent to it) for longer than the TTL, its state is automatically deleted. Each new interaction resets the TTL timer, extending the session's lifetime.

## Benefits

- **Automatic cleanup**: No manual intervention required to clean up idle agent sessions
- **Cost optimization**: Reduces storage costs by automatically removing unused session state
- **Resource management**: Prevents unbounded growth of agent session state in storage
- **Configurable**: Set TTL globally or per-agent type to match your application's needs

## Configuration

TTL can be configured at two levels:

1. **Global default TTL**: Applies to all agent sessions unless overridden
2. **Per-agent type TTL**: Overrides the global default for specific agent types

Additionally, you can configure a **minimum deletion delay** that controls how frequently deletion operations are scheduled. This prevents excessive deletion operations for agent sessions with short TTLs.

> [!NOTE]
> Setting the minimum deletion delay is an advanced option. Increase it only if the default value results in excessive deletion operations, or decrease it if you need deletion operations to run more promptly.

### Default values

- **Default TTL**: 30 days
- **Minimum TTL deletion delay**: 5 minutes (subject to change in future releases)

### Configuration examples

#### .NET

```csharp
// Configure global default TTL and minimum signal delay
services.ConfigureDurableAgents(
    options =>
    {
        // Set global default TTL to 7 days
        options.DefaultTimeToLive = TimeSpan.FromDays(7);

        // Set minimum signal delay to 10 minutes
        options.MinimumTimeToLiveSignalDelay = TimeSpan.FromMinutes(10);

        // Add agents (will use global default TTL)
        options.AddAIAgent(myAgent);
    });

// Configure per-agent TTL
services.ConfigureDurableAgents(
    options =>
    {
        options.DefaultTimeToLive = TimeSpan.FromDays(30); // Global default

        // Agent with custom TTL of 1 day
        options.AddAIAgent(shortLivedAgent, timeToLive: TimeSpan.FromDays(1));

        // Agent with custom TTL of 90 days
        options.AddAIAgent(longLivedAgent, timeToLive: TimeSpan.FromDays(90));

        // Agent using global default (30 days)
        options.AddAIAgent(defaultAgent);
    });

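// Disable TTL by setting the global default to null.
// Note: in the DurableAgentsOptions code in this change, passing timeToLive: null
// to AddAIAgent is treated as "not specified" and falls back to DefaultTimeToLive,
// so a null per-agent value alone does not disable TTL.
services.ConfigureDurableAgents(
    options =>
    {
        options.DefaultTimeToLive = null;

        // Agent with no TTL (never expires) because the global default is null
        options.AddAIAgent(permanentAgent);
    });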
```
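
The way a per-agent TTL overrides the global default can be sketched in isolation. This is a minimal illustration, not the library's implementation: `perAgentTtl`, `defaultTtl`, and `ResolveTtl` are hypothetical names, and the case-insensitive lookup is an assumption based on the `StringComparer.OrdinalIgnoreCase` dictionaries in `DurableAgentsOptions`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the per-agent TTL table; not the library's actual type.
var perAgentTtl = new Dictionary<string, TimeSpan?>(StringComparer.OrdinalIgnoreCase)
{
    ["shortLivedAgent"] = TimeSpan.FromDays(1),
    ["longLivedAgent"] = TimeSpan.FromDays(90),
};
TimeSpan? defaultTtl = TimeSpan.FromDays(30);

// A per-agent value wins when present; otherwise fall back to the global default.
TimeSpan? ResolveTtl(string agentName) =>
    perAgentTtl.TryGetValue(agentName, out TimeSpan? ttl) ? ttl : defaultTtl;

Console.WriteLine(ResolveTtl("shortLivedAgent")); // 1.00:00:00
Console.WriteLine(ResolveTtl("SHORTLIVEDAGENT")); // 1.00:00:00 (lookup ignores case)
Console.WriteLine(ResolveTtl("defaultAgent"));    // 30.00:00:00
```

A `null` result from a lookup like this would mean "no TTL", which is why a `null` global default is the value that turns expiration off for agents without an explicit TTL.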

## How TTL works

The following sections describe how TTL works in detail.

### Expiration tracking

Each agent session stores an expiration timestamp in its internally managed state. The timestamp is updated whenever the session processes a message:

1. When a session finishes processing a message, the expiration time is set to `current time + TTL`.
2. On the session's first interaction, the runtime schedules a deletion check for the expiration time (subject to the minimum delay constraint). Later interactions only update the expiration time.
3. When the deletion check runs, the session state is deleted if the current time is at or past the expiration time. Otherwise, the deletion check is rescheduled for the updated expiration time.
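
The scheduling rule in these steps can be sketched as two small pure functions. This is an illustration under assumptions, not the runtime's code: `NextCheckTime` and `IsExpired` are hypothetical names, and the "never sooner than now plus the minimum delay" rule follows the shape of `ScheduleDeletionCheck` in this change.

```csharp
using System;

// Returns when the next deletion check should run: the expiration time,
// but never sooner than now + minimumDelay. (Hypothetical helper.)
static DateTime NextCheckTime(DateTime now, DateTime expiration, TimeSpan minimumDelay)
{
    DateTime earliest = now + minimumDelay;
    return expiration > earliest ? expiration : earliest;
}

// Returns true when the session should be deleted at time `now`. (Hypothetical helper.)
static bool IsExpired(DateTime now, DateTime expiration) => now >= expiration;

DateTime start = new DateTime(2025, 1, 1, 12, 0, 0, DateTimeKind.Utc);
TimeSpan minDelay = TimeSpan.FromMinutes(5);

// TTL of 1 hour: the check lands exactly on the expiration time.
Console.WriteLine(NextCheckTime(start, start.AddHours(1), minDelay).TimeOfDay);   // 13:00:00

// TTL of 1 minute: the check is pushed out to the 5-minute minimum delay.
Console.WriteLine(NextCheckTime(start, start.AddMinutes(1), minDelay).TimeOfDay); // 12:05:00

// A check that runs exactly at the expiration time deletes the session.
Console.WriteLine(IsExpired(start.AddHours(1), start.AddHours(1)));               // True
```

One consequence worth noting: with a TTL shorter than the minimum delay, a session can outlive its nominal TTL until the delayed check runs.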

### State deletion

When an agent session expires, its entire state is deleted, including:

- Conversation history
- Any custom state data
- Expiration timestamps

After deletion, if a message is sent to the same agent session, a new session is created with a fresh conversation history.

## Behavior examples

The following examples illustrate how TTL works in different scenarios.

### Example 1: Agent session expires after TTL

1. Agent configured with 30-day TTL
2. User sends message at Day 0 → Agent session created, expiration set to Day 30
3. No further messages sent
4. At Day 30 → Agent session is deleted
5. User sends message at Day 31 → New agent session created with fresh conversation history

### Example 2: TTL reset on interaction

1. Agent configured with 30-day TTL
2. User sends message at Day 0 → Agent session created, expiration set to Day 30
3. User sends message at Day 15 → Expiration reset to Day 45
4. User sends message at Day 40 → Expiration reset to Day 70
5. Agent session remains active as long as each new interaction arrives before the current expiration time
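
The arithmetic in Example 2 can be checked with a few lines of code. This is a sketch only: `ExpirationDay` is a hypothetical helper that works in whole days, and it assumes each interaction completes on the day it is sent.

```csharp
using System;

// Hypothetical helper: the day on which a session expires if its most recent
// interaction completed on `interactionDay`.
static double ExpirationDay(double interactionDay, TimeSpan ttl) =>
    interactionDay + ttl.TotalDays;

TimeSpan thirtyDays = TimeSpan.FromDays(30);

// Each interaction replaces the previous expiration with "interaction day + TTL".
foreach (double day in new double[] { 0, 15, 40 })
{
    Console.WriteLine($"Message on day {day}: expires on day {ExpirationDay(day, thirtyDays)}");
}
// Message on day 0: expires on day 30
// Message on day 15: expires on day 45
// Message on day 40: expires on day 70
```

The Day 40 message arrives before the Day 45 expiration set by the previous interaction, which is why the session is still alive to receive it.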

## Logging

The TTL feature includes comprehensive logging to track state changes:

- **Expiration time updated**: Logged when TTL expiration time is set or updated
- **Deletion scheduled**: Logged when a deletion check signal is scheduled
- **Deletion check**: Logged when a deletion check operation runs
- **Session expired**: Logged when an agent session is deleted due to expiration
- **TTL rescheduled**: Logged when a deletion signal is rescheduled

These logs help monitor TTL behavior and troubleshoot any issues.

## Best practices

1. **Choose appropriate TTL values**: Balance storage costs against user experience. TTLs that are too short may delete sessions that users still expect to resume, while TTLs that are too long let unnecessary state accumulate.

2. **Use per-agent TTLs**: Different agents may have different usage patterns. Configure TTLs per-agent based on expected session lifetimes.

3. **Monitor expiration logs**: Review logs to understand TTL behavior and adjust configuration as needed.

4. **Test with short TTLs**: During development, use short TTLs (e.g., minutes) to verify TTL behavior without waiting for long periods.

## Limitations

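- TTL is measured in wall-clock time from the session's most recent interaction. In the current implementation, the expiration time is computed when a run completes rather than when the request is received, so a long-running response extends the session's lifetime accordingly.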
- Deletion checks are durably scheduled operations and may have slight delays depending on system load.
- Once an agent session is deleted, its conversation history cannot be recovered.
- TTL deletion requires at least one worker to be available to process the deletion operation message.
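
The first limitation can be made concrete with a worked example. In the `AgentEntity` change in this PR, the expiration time is computed from `DateTime.UtcNow` after the agent produces its response, so the run duration adds to the session's lifetime. The sketch below assumes that behavior; `ExpirationAfterRun` is a hypothetical helper, not a library API.

```csharp
using System;

// Hypothetical helper: the expiration is measured from when the run completes,
// not from when the request was received.
static DateTime ExpirationAfterRun(DateTime requestReceived, TimeSpan runDuration, TimeSpan ttl) =>
    requestReceived + runDuration + ttl;

DateTime received = new DateTime(2025, 1, 1, 9, 0, 0, DateTimeKind.Utc);

// A 1-hour run with a 1-hour TTL: the session lives about 2 hours from the user's message.
DateTime expiration = ExpirationAfterRun(received, TimeSpan.FromHours(1), TimeSpan.FromHours(1));
Console.WriteLine(expiration.TimeOfDay); // 11:00:00
```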
2 changes: 1 addition & 1 deletion dotnet/samples/AzureFunctions/01_SingleAgent/Program.cs
@@ -32,6 +32,6 @@
using IHost app = FunctionsApplication
.CreateBuilder(args)
.ConfigureFunctionsWebApplication()
.ConfigureDurableAgents(options => options.AddAIAgent(agent))
.ConfigureDurableAgents(options => options.AddAIAgent(agent, timeToLive: TimeSpan.FromHours(1)))
.Build();
app.Run();
107 changes: 98 additions & 9 deletions dotnet/src/Microsoft.Agents.AI.DurableTask/AgentEntity.cs
@@ -16,25 +16,19 @@ internal class AgentEntity(IServiceProvider services, CancellationToken cancella
private readonly DurableTaskClient _client = services.GetRequiredService<DurableTaskClient>();
private readonly ILoggerFactory _loggerFactory = services.GetRequiredService<ILoggerFactory>();
private readonly IAgentResponseHandler? _messageHandler = services.GetService<IAgentResponseHandler>();
private readonly DurableAgentsOptions _options = services.GetRequiredService<DurableAgentsOptions>();
private readonly CancellationToken _cancellationToken = cancellationToken != default
? cancellationToken
: services.GetService<IHostApplicationLifetime>()?.ApplicationStopping ?? CancellationToken.None;

public async Task<AgentRunResponse> RunAgentAsync(RunRequest request)
{
AgentSessionId sessionId = this.Context.Id;
IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> agents =
this._services.GetRequiredService<IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>>>();
if (!agents.TryGetValue(sessionId.Name, out Func<IServiceProvider, AIAgent>? agentFactory))
{
throw new InvalidOperationException($"Agent '{sessionId.Name}' not found");
}

AIAgent agent = agentFactory(this._services);
AIAgent agent = this.GetAgent(sessionId);
EntityAgentWrapper agentWrapper = new(agent, this.Context, request, this._services);

// Logger category is Microsoft.DurableTask.Agents.{agentName}.{sessionId}
ILogger logger = this._loggerFactory.CreateLogger($"Microsoft.DurableTask.Agents.{agent.Name}.{sessionId.Key}");
ILogger logger = this.GetLogger(agent.Name!, sessionId.Key);

if (request.Messages.Count == 0)
{
@@ -113,6 +107,27 @@ async IAsyncEnumerable<AgentRunResponseUpdate> StreamResultsAsync()
response.Usage?.TotalTokenCount);
}

// Update TTL expiration time. Only schedule deletion check on first interaction.
// Subsequent interactions just update the expiration time; CheckAndDeleteIfExpired
// will reschedule the deletion check when it runs.
TimeSpan? timeToLive = this._options.GetTimeToLive(sessionId.Name);
if (timeToLive.HasValue)
{
DateTime newExpirationTime = DateTime.UtcNow.Add(timeToLive.Value);
Contributor commented:

Should this be set before we get a response? Right now the expiration time is based on when the run finishes, not when the request is received. If the response takes an hour because of long-running tools and the TTL is 1 hour, the session lives up to about 2 hours from the user's message.

Member Author replied:

I'm fine with the current behavior. Regarding your example, I think it would be worse if the TTL started when the request is received. If the TTL is 1 hour and the response takes 1 hour to generate, then the session would be deleted immediately, before the user can attempt to run the agent again.

bool isFirstInteraction = this.State.Data.ExpirationTime is null;

this.State.Data.ExpirationTime = newExpirationTime;
logger.LogTTLExpirationTimeUpdated(sessionId, newExpirationTime);

// Only schedule deletion check on the first interaction when entity is created.
// On subsequent interactions, we just update the expiration time. The scheduled
// CheckAndDeleteIfExpired will reschedule itself if the entity hasn't expired.
if (isFirstInteraction)
{
this.ScheduleDeletionCheck(sessionId, logger, timeToLive.Value);
}
}

return response;
}
finally
@@ -121,4 +136,78 @@ async IAsyncEnumerable<AgentRunResponseUpdate> StreamResultsAsync()
DurableAgentContext.ClearCurrent();
}
}

/// <summary>
/// Checks if the entity has expired and deletes it if so, otherwise reschedules the deletion check.
/// </summary>
/// <remarks>
/// This method is called by the durable task runtime when a <c>CheckAndDeleteIfExpired</c> signal is received.
/// </remarks>
public void CheckAndDeleteIfExpired()
{
AgentSessionId sessionId = this.Context.Id;
AIAgent agent = this.GetAgent(sessionId);
ILogger logger = this.GetLogger(agent.Name!, sessionId.Key);

DateTime currentTime = DateTime.UtcNow;
DateTime? expirationTime = this.State.Data.ExpirationTime;

logger.LogTTLDeletionCheck(sessionId, expirationTime, currentTime);

if (expirationTime.HasValue)
{
if (currentTime >= expirationTime.Value)
{
// Entity has expired, delete it
logger.LogTTLEntityExpired(sessionId, expirationTime.Value);
this.State = null!;
}
else
{
// Entity hasn't expired yet, reschedule the deletion check
TimeSpan? timeToLive = this._options.GetTimeToLive(sessionId.Name);
if (timeToLive.HasValue)
{
this.ScheduleDeletionCheck(sessionId, logger, timeToLive.Value);
}
}
}
}

private void ScheduleDeletionCheck(AgentSessionId sessionId, ILogger logger, TimeSpan timeToLive)
{
DateTime currentTime = DateTime.UtcNow;
DateTime expirationTime = this.State.Data.ExpirationTime ?? currentTime.Add(timeToLive);
TimeSpan minimumDelay = this._options.MinimumTimeToLiveSignalDelay;

// To avoid excessive scheduling, we schedule the deletion check for no less than the minimum delay.
DateTime scheduledTime = expirationTime > currentTime.Add(minimumDelay)
? expirationTime
: currentTime.Add(minimumDelay);

logger.LogTTLDeletionScheduled(sessionId, scheduledTime);

// Schedule a signal to self to check for expiration
this.Context.SignalEntity(
this.Context.Id,
nameof(CheckAndDeleteIfExpired), // self-signal
options: new SignalEntityOptions { SignalTime = scheduledTime });
}

private AIAgent GetAgent(AgentSessionId sessionId)
{
IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> agents =
this._services.GetRequiredService<IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>>>();
if (!agents.TryGetValue(sessionId.Name, out Func<IServiceProvider, AIAgent>? agentFactory))
{
throw new InvalidOperationException($"Agent '{sessionId.Name}' not found");
}

return agentFactory(this._services);
}

private ILogger GetLogger(string agentName, string sessionKey)
{
return this._loggerFactory.CreateLogger($"Microsoft.DurableTask.Agents.{agentName}.{sessionKey}");
}
}
45 changes: 43 additions & 2 deletions dotnet/src/Microsoft.Agents.AI.DurableTask/DurableAgentsOptions.cs
@@ -9,6 +9,25 @@ public sealed class DurableAgentsOptions
{
// Agent names are case-insensitive
private readonly Dictionary<string, Func<IServiceProvider, AIAgent>> _agentFactories = new(StringComparer.OrdinalIgnoreCase);
private readonly Dictionary<string, TimeSpan?> _agentTimeToLive = new(StringComparer.OrdinalIgnoreCase);

/// <summary>
/// Gets or sets the default time-to-live (TTL) for agent entities.
/// </summary>
/// <remarks>
/// If an agent entity is idle for this duration, it will be automatically deleted.
/// Defaults to 30 days. Set to <see langword="null"/> to disable TTL for agents without explicit TTL configuration.
/// </remarks>
public TimeSpan? DefaultTimeToLive { get; set; } = TimeSpan.FromDays(30);
Contributor @kshyju commented on Dec 8, 2025:

Does setting this default value cause any breaking behavior for customers who already started agent sessions and have existing entities when they redeploy with this version of the package? I see that the `GetTimeToLive` method falls back to this value.

Member Author replied:

Technically yes. After they upgrade to this version of the library, the next time an existing agent entity is touched, we'll set a 30-day TTL on it.


/// <summary>
/// Gets or sets the minimum delay for scheduling TTL deletion signals.
/// </summary>
/// <remarks>
/// This ensures that deletion signals are not scheduled too frequently.
/// Defaults to 5 minutes.
/// </remarks>
public TimeSpan MinimumTimeToLiveSignalDelay { get; set; } = TimeSpan.FromMinutes(5);

internal DurableAgentsOptions()
{
@@ -19,13 +38,19 @@ internal DurableAgentsOptions()
/// </summary>
/// <param name="name">The name of the agent.</param>
/// <param name="factory">The factory function to create the agent.</param>
/// <param name="timeToLive">Optional time-to-live for this agent's entities. If not specified, uses <see cref="DefaultTimeToLive"/>.</param>
/// <returns>The options instance.</returns>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="name"/> or <paramref name="factory"/> is null.</exception>
public DurableAgentsOptions AddAIAgentFactory(string name, Func<IServiceProvider, AIAgent> factory)
public DurableAgentsOptions AddAIAgentFactory(string name, Func<IServiceProvider, AIAgent> factory, TimeSpan? timeToLive = null)
{
ArgumentNullException.ThrowIfNull(name);
ArgumentNullException.ThrowIfNull(factory);
this._agentFactories.Add(name, factory);
if (timeToLive.HasValue)
{
this._agentTimeToLive[name] = timeToLive;
}

return this;
}

@@ -50,12 +75,13 @@ public DurableAgentsOptions AddAIAgents(params IEnumerable<AIAgent> agents)
/// Adds an AI agent to the options.
/// </summary>
/// <param name="agent">The agent to add.</param>
/// <param name="timeToLive">Optional time-to-live for this agent's entities. If not specified, uses <see cref="DefaultTimeToLive"/>.</param>
/// <returns>The options instance.</returns>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="agent"/> is null.</exception>
/// <exception cref="ArgumentException">
/// Thrown when <paramref name="agent.Name"/> is null or whitespace or when an agent with the same name has already been registered.
/// </exception>
public DurableAgentsOptions AddAIAgent(AIAgent agent)
public DurableAgentsOptions AddAIAgent(AIAgent agent, TimeSpan? timeToLive = null)
{
ArgumentNullException.ThrowIfNull(agent);

@@ -70,6 +96,11 @@ public DurableAgentsOptions AddAIAgent(AIAgent agent)
}

this._agentFactories.Add(agent.Name, sp => agent);
if (timeToLive.HasValue)
{
this._agentTimeToLive[agent.Name] = timeToLive;
}

return this;
}

@@ -81,4 +112,14 @@ internal IReadOnlyDictionary<string, Func<IServiceProvider, AIAgent>> GetAgentFa
{
return this._agentFactories.AsReadOnly();
}

/// <summary>
/// Gets the time-to-live for a specific agent, or the default TTL if not specified.
/// </summary>
/// <param name="agentName">The name of the agent.</param>
/// <returns>The time-to-live for the agent, or the default TTL if not specified.</returns>
internal TimeSpan? GetTimeToLive(string agentName)
{
return this._agentTimeToLive.TryGetValue(agentName, out TimeSpan? ttl) ? ttl : this.DefaultTimeToLive;
}
}