When a host startup fails (e.g., due to storage connectivity issues), the host is set to Error and a new host startup is initiated. This new startup acquires _hostStartSemaphore and calls BuildHost(). During that process, WorkerFunctionMetadataProvider.GetFunctionMetadataAsync() detects no active worker channels and calls RestartHostAsync(). However, RestartHostAsync() attempts to cancel the same in-progress startup, and because that startup never reaches a ThrowIfCancellationRequested check, it is not aborted and _hostStartSemaphore is never released. The restart remains blocked, leaving the host in Error state until it is manually restarted.
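The role of ThrowIfCancellationRequested in this description can be illustrated with a minimal sketch. It assumes nothing about the host's real code: RunSteps and its parameters are hypothetical names, and the loop only stands in for a sequence of build steps. It shows that calling Cancel() merely sets a flag, and that the work stops only where it observes the token.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of cooperative cancellation; not host code.
public static class CooperativeCancellationSketch
{
    // Runs totalSteps "build steps" and returns how many completed.
    // Cancellation is requested from inside the same flow at cancelAtStep.
    public static int RunSteps(int totalSteps, int cancelAtStep, bool observeToken)
    {
        using var cts = new CancellationTokenSource();
        int completed = 0;
        try
        {
            for (int step = 0; step < totalSteps; step++)
            {
                if (step == cancelAtStep)
                {
                    cts.Cancel(); // only marks the token as cancelled
                }

                if (observeToken)
                {
                    // The only point at which this loop can actually abort.
                    cts.Token.ThrowIfCancellationRequested();
                }

                completed++;
            }
        }
        catch (OperationCanceledException)
        {
            // The work was aborted at an observation point.
        }

        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(RunSteps(5, 2, observeToken: false)); // 5: Cancel() alone stops nothing
        Console.WriteLine(RunSteps(5, 2, observeToken: true));  // 2: aborted at the check
    }
}
```

Without the check, all five steps run even though cancellation was requested at step 2. That matches the behavior the issue attributes to the in-progress startup.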
Repro steps
First Host Start
The host begins to initialize (loads metadata, starts worker channels, etc.).
Failure Connecting to Storage
A transient error (e.g., DNS or storage connection issue) leads to an aborted startup and moves the host state to Error.
Host Startup Canceled
Because of the error, the system transitions the existing startup to a canceled state (shutting down worker channels).
New Host Scheduled
The system schedules a new host startup after a short delay.
Second Host Startup
This new host startup acquires the _hostStartSemaphore and begins another BuildHost() process.
Metadata Provider Finds No Channels
Inside WorkerFunctionMetadataProvider.GetFunctionMetadataAsync(), the code detects that no channels exist (they were previously shut down).
// During the restart flow, GetFunctionMetadataAsync gets invoked
// again through a new script host initialization flow.
_logger.LogDebug("Host is running without any initialized channels, restarting the JobHost.");
await _scriptHostManager.RestartHostAsync();
}
RestartHostAsync() Called
Since no channels are active, RestartHostAsync() is invoked to re-initialize the host.
Cancellation Attempt
RestartHostAsync() attempts to cancel the active startup operation by calling Cancel() on its CancellationTokenSource. However, because this cancellation call happens within the same call stack/async flow as the current BuildHost(), there is no ThrowIfCancellationRequested check or natural yield point to abort the build operation.
Semaphore Deadlock
With the second startup still holding _hostStartSemaphore (and never releasing it, because the cancellation is ineffective), the new restart attempt blocks indefinitely when trying to reacquire that semaphore.
At this point, the host remains in an Error state until it is manually restarted, since the restart logic is effectively deadlocked.
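The sequence in the repro steps can be modeled with a small, self-contained sketch. This is a simplified model under stated assumptions, not the host's implementation. HostServiceModel, its properties, and the restartWaitLimit parameter are hypothetical; only the names _hostStartSemaphore, BuildHost, and RestartHostAsync echo the issue. The real restart is described as waiting indefinitely, so the sketch uses a bounded wait to stay finite.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified, hypothetical model of the reported flow. None of these members
// are the actual azure-functions-host implementation.
public sealed class HostServiceModel
{
    private readonly SemaphoreSlim _hostStartSemaphore = new SemaphoreSlim(1, 1);
    private readonly CancellationTokenSource _startupCts = new CancellationTokenSource();

    public bool StartupCancellationRequested => _startupCts.IsCancellationRequested;
    public bool RestartAcquiredSemaphore { get; private set; }

    // Models the second host startup: take the semaphore, then build the host.
    public async Task StartHostAsync(TimeSpan restartWaitLimit)
    {
        await _hostStartSemaphore.WaitAsync();
        try
        {
            await BuildHostAsync(restartWaitLimit);
        }
        finally
        {
            _hostStartSemaphore.Release();
        }
    }

    // Models the metadata provider finding no worker channels during the build
    // and requesting a restart from inside the startup's own async flow.
    private async Task BuildHostAsync(TimeSpan restartWaitLimit)
    {
        await RestartHostAsync(restartWaitLimit);
    }

    // Models the restart: cancel the in-progress startup, then wait for the
    // semaphore that the same startup is still holding.
    private async Task RestartHostAsync(TimeSpan restartWaitLimit)
    {
        _startupCts.Cancel();

        // Bounded wait so the sketch terminates; an unbounded wait would hang here.
        RestartAcquiredSemaphore = await _hostStartSemaphore.WaitAsync(restartWaitLimit);
        if (RestartAcquiredSemaphore)
        {
            _hostStartSemaphore.Release();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var model = new HostServiceModel();
        await model.StartHostAsync(TimeSpan.FromMilliseconds(200));
        Console.WriteLine($"Cancellation requested: {model.StartupCancellationRequested}"); // True
        Console.WriteLine($"Restart acquired semaphore: {model.RestartAcquiredSemaphore}"); // False
    }
}
```

In this model the cancellation request succeeds, but the restart never obtains the semaphore, because its only holder is the startup that is waiting on the restart. SemaphoreSlim is not reentrant, so removing the time limit turns the failed wait into the indefinite block described above.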
Referenced code
azure-functions-host/src/WebJobs.Script/Host/WorkerFunctionMetadataProvider.cs, lines 81 to 97 in dae16f9 (referenced by the Metadata Provider Finds No Channels step)
azure-functions-host/src/WebJobs.Script.WebHost/WebJobsScriptHostService.cs, lines 562 to 577 in dae16f9 (referenced by the Cancellation Attempt step)
Example Call Stack