Random Media Platform Overloaded status #793
Comments
@rvleeuwen256
Depending on what we find in the logs you shared, we may increase the logging verbosity to debug further, at which point you may have to rerun your tests. Hope that's ok.
Hi @rvleeuwen256
We are indeed careful with providing the AppId of the Bot. An example of two overlapping calls that triggered the Media overload is the following:
Teams Call Id: 34003b80-ddc7-47df-8231-adc02eda3399
Teams Call Id: 64003a80-bda7-490d-9629-c9acb1346ce5
At 2024-12-12 16:37:21.210: Media platform health changed from 'Normal' to 'HeavilyLoaded'.
Interestingly, the overload happened when there was no longer any active call. The Media platform logs are far too big (even at just Information level) to share here on GitHub. The relevant warning is:
4800121 2024-12-12 16:37:41.039 Warning [MediaPlatform] [AvMP][AppId:][RD28187857DFEF] TL_WARN(TF_COMPONENT) [RD28187857DFEF]3972.239::12/12/2024-16:37:41.039.0019ABD0 (MPAZAPPHOST,LogWarn:AzAppMPHostLogger.cs(96)) TL_WARN(TF_COMPONENT) [RD28187857DFEF]3972.239::12/12/2024-16:37:41.039.00110097 (AVMP,HeartBeatCallback:workitemqueue.cs(162)) [MP] ThreadPool: thread 33 considered stuck or inactive
There are no errors or warnings in the logging that contain the media session id. If you need more information, please let us know.
We have built a multi-tenant solution using the Media platform that handles roughly 10000 calls per day with audio and H.264 video. We are very pleased with the stability of the platform. However, since the platform moved from WCF to HTTP (version 1.28), it reports the Overloaded status multiple times per day, seemingly at random. This is quite problematic because in this status the platform cannot handle new calls.
Investigation showed that this overload is not related to high CPU or network traffic: we have seen cases where the platform was overloaded while there was only a single active call. When we run a load test on our solution, we cannot trigger the overload.
We have added additional logging in all callbacks from the MediaPlatform to ensure we handle the callbacks from the MediaPlatform very quickly (<10 ms) and we always return from the callback.
The issue happens both when running on .NET Framework 4.7.2 and on .NET 8.
We have enabled the logging of the Media platform and we do see some disturbing error messages. There seems to be a relation between this random overload and the message 'thread considered stuck or inactive'.
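For anyone who wants to check this relation on their own logs, here is a minimal Python sketch that pairs health changes with nearby stuck-thread warnings. It makes several assumptions that are not confirmed anywhere in this thread: that each log line carries a `YYYY-MM-DD HH:MM:SS.mmm` timestamp like the warning quoted above, that the health change appears in the same log with the wording "health changed from '...' to '...'", and that a 60-second window is a sensible default. The function names `parse_events` and `correlate` are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed formats, based on the single warning line and the health message quoted in this issue.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
STUCK = re.compile(r"thread (\d+) considered stuck or inactive")
HEALTH = re.compile(r"health changed from '(\w+)' to '(\w+)'")


def parse_events(lines):
    """Collect stuck-thread warnings and health changes, each with its timestamp."""
    stuck, health = [], []
    for line in lines:
        ts_match = TIMESTAMP.search(line)
        if ts_match is None:
            continue  # no recognizable timestamp on this line
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        stuck_match = STUCK.search(line)
        if stuck_match:
            stuck.append((ts, int(stuck_match.group(1))))
        health_match = HEALTH.search(line)
        if health_match:
            health.append((ts, health_match.group(1), health_match.group(2)))
    return stuck, health


def correlate(stuck, health, window=timedelta(seconds=60)):
    """For each health change, list the stuck-thread warnings within +/- window."""
    pairs = []
    for ts, old, new in health:
        nearby = [(s_ts, tid) for s_ts, tid in stuck if abs(s_ts - ts) <= window]
        pairs.append(((ts, old, new), nearby))
    return pairs


if __name__ == "__main__":
    import sys

    # Usage: python correlate_overload.py mediaplatform.log (hypothetical file name)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        stuck_events, health_events = parse_events(log_file)
    for change, nearby in correlate(stuck_events, health_events):
        print(change, "->", len(nearby), "stuck-thread warnings nearby")
```

A health change with no stuck-thread warning in the window would count against the suspected relation, so those cases are worth listing as well.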
Could the development team of the Media platform have a look at these logs?
Microsoft.Skype.Bots.Media Version 1.29.0.75-preview
Microsoft.Graph.Communications.Calls.Media 1.2.0.10563
Microsoft.Graph 5.54.0
Microsoft.Graph.Core 3.1.11