[Bug] Temporalio.Exceptions.RpcException:operation was canceled #395

@pauldotknopf

What are you really trying to do?

Get info about a namespace, using 1.4.0 of the .NET SDK.

Describe the bug

public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken)
{
    var stopwatch = new Stopwatch();

    try
    {
        stopwatch.Start();
        
        var client = await _rcmClientConnectionProvider.OpenClient();
        
        // Error gets thrown here.
        var namespaceInfo = await client.WorkflowService.DescribeNamespaceAsync(
            new Temporalio.Api.WorkflowService.V1.DescribeNamespaceRequest { Namespace = "default" },
            new RpcOptions
            {
                CancellationToken = cancellationToken
            }
        );

        stopwatch.Stop();

        if (namespaceInfo == null)
        {
            return HealthCheckResult.Unhealthy(
                $"Temporal namespace is unreachable: elapsed: {stopwatch.Elapsed}");
        }

        return HealthCheckResult.Healthy();
    }
    catch (Exception ex)
    {
        stopwatch.Stop();
        return HealthCheckResult.Unhealthy($"Temporal client is unreachable: elapsed: {stopwatch.Elapsed}", ex);
    }
}

The _rcmClientConnectionProvider variable is a singleton service that maintains a single ITemporalClient (through OpenClient), which is used throughout the application (creating workflows and subscribing to task queues). Its code looks like this:

public async Task<ITemporalClient> OpenClient()
{
    if (_client != null)
    {
        return _client;
    }

    await _semaphoreSlim.WaitAsync();
    
    try
    {
        if (_client == null)
        {
            var options = new TemporalClientConnectOptions
            {
                TargetHost = $"{_options.HostName}:{_options.Port}",
                Namespace = _options.Namespace
            };
            if (serviceProvider.GetService(typeof(ILoggerFactory)) is ILoggerFactory loggerFactory)
            {
                options.LoggerFactory = loggerFactory;
            }

            try
            {
                _client = await TemporalClient.ConnectAsync(options);
            }
            catch(InvalidOperationException e)
            {
                if (e.Message.StartsWith("Connection failed: Server connection error"))
                {
                    var message = $"Failed to connect to Temporal server at {_options.HostName}:{_options.Port}";
                    if (_options.HostName == "localhost")
                    {
                        message += "\nA local instance of temporal can be ran by running 'temporal server start-dev'";
                    }
                    throw new InvalidOperationException(message, e);
                }

                throw;
            }
        }

        return _client;
    }
    finally
    {
        _semaphoreSlim.Release();
    }
}

Minimal Reproduction

It only happens in one environment, so I fear a minimal repro may be hard to produce.

Environment/Versions

Temporal helm chart v0.54.0.
Temporal .NET SDK 1.4.0 (nuget)
Azure App Service (for Linux)

Additional context

This is the complete error, as reported by App Insights.

[
  {
    "severityLevel": "Error",
    "outerId": "0",
    "message": "operation was canceled",
    "type": "Temporalio.Exceptions.RpcException",
    "id": "11222409",
    "parsedStack": [
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 0
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 1
      },
      {
        "assembly": "Temporalio, Version=1.4.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Temporalio.Bridge.Client+<CallAsync>d__14`1.MoveNext",
        "level": 2
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 3
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 4
      },
      {
        "assembly": "Temporalio, Version=1.4.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Temporalio.Client.TemporalConnection+<InvokeRpcAsync>d__42`1.MoveNext",
        "level": 5
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 6
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 7
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter`1.GetResult",
        "level": 8
      },
      {
        "assembly": "AutomatedActions.Services, Version=0.2.86.0, Culture=neutral, PublicKeyToken=null",
        "method": "AutomatedActions.Services.HealthChecks.TemporalConnectionHealthCheck+<CheckHealthAsync>d__2.MoveNext",
        "level": 9,
        "line": 26,
        "fileName": "/agent/_work/1/s/src/Workflow/src/AutomatedActions.Services/HealthChecks/TemporalConnectionHealthCheck.cs"
      }
    ]
  }
]

One thing worth mentioning is that I don't think this is actually a timeout issue, because the exception is thrown almost immediately: Temporal client is unreachable: elapsed: 00:00:00.0022905.
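
A failure after roughly 2 ms is consistent with (though not proof of) the CancellationToken passed into the health check already being canceled before the RPC starts, for instance if the caller of the health check had already given up. Nothing in this report confirms that, so the sketch below is only a way to test the hypothesis. CancellationProbe and DescribeAsync are hypothetical names, not part of the Temporal SDK, and the helper uses only the .NET base class library so it can wrap any awaitable call.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic helper (not part of the Temporal SDK).
public static class CancellationProbe
{
    // Runs the given call and returns a one-line summary recording whether
    // the token was canceled before the call started, whether it is canceled
    // afterwards, and how long the call ran before failing.
    public static async Task<string> DescribeAsync(
        Func<CancellationToken, Task> call, CancellationToken token)
    {
        // Capture the token state before any work is done.
        var canceledBeforeCall = token.IsCancellationRequested;
        var stopwatch = Stopwatch.StartNew();

        try
        {
            await call(token);
            return $"completed; canceledBeforeCall={canceledBeforeCall}";
        }
        catch (Exception ex)
        {
            stopwatch.Stop();
            return $"{ex.GetType().Name}; canceledBeforeCall={canceledBeforeCall}; " +
                   $"canceledNow={token.IsCancellationRequested}; elapsed={stopwatch.Elapsed}";
        }
    }
}
```

In the health check, the call argument would be a lambda wrapping the DescribeNamespaceAsync call shown above, and the returned summary could be included in the Unhealthy description. If canceledBeforeCall comes back True in the failing environment, the cancellation originates from the health check's caller rather than from the Temporal server or a timeout.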
