Can’t handle more than 1Gbps traffic on Edge #1515
Replies: 27 comments 3 replies
-
Hi, I've recently done some high-load testing with OME. There are some easy things to check here:
Good luck and keep us posted!
-
Hi, thanks for your reply! Our edge server has an Intel Xeon E-2386G CPU @ 3.50GHz with 6 physical / 12 logical cores, and 64GB RAM. What thread configuration do you recommend for 30+ live streams where only a few of them carry high traffic? Following your suggestion, we set the threads like this: Are these thread settings roughly correct?

If we stream 720p @ 3000kbps and the Edge's network speed is 10Gbps, should we expect at least 3000 concurrent viewers, or is the calculation not so linear? And should the configuration be the same whether all traffic comes from one live stream or is evenly distributed across all streams (e.g. 3000 viewers on one stream vs. 100 viewers on each of the 30 live streams)?

We've also changed the congestion control method to BBR and disabled TCP segmentation as you mentioned. We're going to be testing with real traffic in a couple of hours and will let you know how it goes. Thanks again for your fast reply!
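For anyone following along, the BBR and TCP-segmentation changes mentioned here can be applied roughly like this (a sketch, not the poster's exact commands; `eth0` is a placeholder interface name, and offload settings are per-NIC):

```
# Enable BBR congestion control (requires kernel >= 4.9)
sudo sysctl -w net.core.default_qdisc=fq
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# Disable TCP segmentation offload on the NIC ("eth0" is a placeholder)
sudo ethtool -K eth0 tso off

# Verify the congestion control in effect
sysctl net.ipv4.tcp_congestion_control
```

To make the sysctl values survive a reboot, put them in a file under `/etc/sysctl.d/`.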
-
I think your CPU is the bottleneck; 6C/12T is not much to work with at that scale. To serve similar throughput, I was pretty much lighting up a Threadripper 7965WX with 24C/48T.

I don't work a lot with Origin/Edge, but I believe AppWorkerCount is more relevant to your Origin server in this case, since it is the one actually receiving the 30 incoming streams. On the Edge server, run that `top -H -p {OME PID}` check to see whether OVTWorker is overwhelming its single thread, and increase that worker count if so. The same goes for StreamWorker, which I still believe could be the root issue. It will also reveal if any other processes are maxing out threads. You need to keep the processes balanced, since they will begin to split CPU time if there are not enough threads available.

As far as your bandwidth calculation goes, I usually reserve 30% overhead, i.e. treat your 10Gb connection as if it were a 7Gb connection. This accounts for spikes in bitrate and traffic, plus the extra bits that repackaging the RTP stream adds (you'll notice your incoming and outgoing bitrates are slightly different). But I don't think this is your immediate bottleneck.

Related to this, I am organizing a test on an ML research system with a 192C/384T CPU 🤓 Will report back with the results once I can get time reserved on the server.
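The back-of-envelope capacity math with that 30% headroom, using the numbers from this thread, works out like this (a sketch assuming bandwidth is the only limit, which the CPU discussion above suggests it is not):

```python
link_gbps = 10      # edge NIC speed from the question
headroom = 0.30     # reserve ~30% for bitrate spikes and repackaging overhead
stream_kbps = 3000  # 720p stream bitrate from the question

usable_mbps = link_gbps * 1000 * (1 - headroom)   # 7000 Mbps usable
viewers = int(usable_mbps * 1000 // stream_kbps)  # total kbps / kbps per viewer
print(viewers)  # prints 2333
```

So the "3000 viewers on 10Gbps" estimate drops to roughly 2300 once headroom is accounted for, before any CPU ceiling kicks in.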
-
It sounds great! We'd definitely like to see those reports as soon as they're published. Regarding our server, we are on a tight budget right now; maybe in a month or so we can upgrade to a more powerful one. Does LLHLS require less CPU power than WebRTC for streaming at the same bitrate with the same number of viewers?
-
Yes, LLHLS uses less CPU than WebRTC for playback, since it just focuses on serving out the chunk files and managing the sessions. I haven't done tests at scale with LLHLS, but anecdotally others have observed it uses noticeably fewer resources.
-
You'll really want to put OME's LLHLS endpoint behind an nginx cache or something similar; CPU usage when connecting directly to OME is insanely high. Latency will just be a bit higher on LLHLS than WebRTC: ~0s vs ~2-4s.
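A minimal sketch of what that nginx front-cache could look like, assuming OME's LLHLS endpoint is on its default port 3333 (the cache path, sizing, and validity times here are illustrative, not tuned values):

```
proxy_cache_path /var/cache/nginx/llhls levels=1:2 keys_zone=llhls:10m
                 max_size=1g inactive=10s;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:3333;
        proxy_cache llhls;
        # Playlists change constantly, so keep cache validity very short;
        # even 1s absorbs the thundering herd of identical requests.
        proxy_cache_valid 200 1s;
        # Collapse concurrent requests for the same object into one upstream fetch.
        proxy_cache_lock on;
    }
}
```

The idea is that thousands of players hit nginx, while OME only serves each chunk once per cache window.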
-
Trust the wisdom of the lizard! They have done a lot more production testing with LLHLS than I have 👍
-
We are running OME through Docker. Is any other configuration needed to enable multithreading? Also, how can we get OME's PID from inside the Docker container to see each thread's usage?
-
More anecdotal advice here, but I believe there is a network performance impact when you use Docker networking (forwarding particular ports) rather than passthrough host networking (`--network host`). There are some security considerations to this, but it's worth a test.
You need to enter the Docker container first:
And here's a one liner to get the PID then check it (which assumes OME is running under a 'www-data' user):
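Since the original commands didn't survive in this thread, here is a rough equivalent (a sketch: `ome` is a placeholder container name, and it assumes OME runs as the `www-data` user inside the container, per the comment above):

```
# Enter the Docker container
docker exec -it ome /bin/bash

# Inside the container: grab the OME PID and watch per-thread CPU usage
top -H -p "$(pgrep -u www-data -f OvenMediaEngine | head -n 1)"
```

With `--network host` you could also run the `top -H` part directly on the host, since the process is visible there.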
-
@bchah thanks for all this help. I replaced Docker networking with host networking and ran `top -H` to see thread usage. With one total viewer and the Edge conf I showed above, this is the thread usage: I will post thread usage with real traffic in a couple of hours when we test it.
-
Cool, that's the readout! If you are using LLHLS, you may need to adjust the LLHLS WorkerCount instead. But this readout will tell you which processes are bottlenecking.
-
After adding a few viewers, I only see the AW-WebRTC0 thread's usage increasing. Even though StreamWorkerCount is set to 6, no StreamWorker thread appears, unlike what https://airensoft.gitbook.io/ovenmediaengine/performance-tuning#monitoring-the-usage-of-threads shows.
-
Here's my readout for the same command: AW-WebRTC is the AppWorker for WebRTC, i.e. the incoming stream(s). How many viewers do you have connected? EDIT: To clarify, Dech264 is the thumbnail encoder.
-
Interesting! I use another means of filling in the Server XML on system boot, so I haven't hit that issue before. Good luck on the big test 👍
-
Hello again! After testing with real traffic, we found these limits with a single Edge Server as I mentioned above:
With Publishers AppWorkerCount = 1, Publishers StreamWorkerCount = 12, WebRTC Signalling WorkerCount = 6, and WebRTC TcpRelayWorkerCount = 6, we found the most appropriate configuration. We noticed that bitrate has a great impact not only on network usage but on CPU usage too. We will be scaling our edge servers as our budget increases. Thanks @bchah for all your help!
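For anyone landing here later, those values map onto Server.xml roughly like this (a sketch based on the numbers above; exact element placement should be double-checked against the OME performance-tuning docs for your version):

```
<!-- Under <Bind><Publishers> -->
<WebRTC>
    <Signalling>
        <Port>3333</Port>
        <WorkerCount>6</WorkerCount>
    </Signalling>
    <IceCandidates>
        <TcpRelayWorkerCount>6</TcpRelayWorkerCount>
    </IceCandidates>
</WebRTC>

<!-- Under <VirtualHosts> ... <Application> -->
<Publishers>
    <AppWorkerCount>1</AppWorkerCount>
    <StreamWorkerCount>12</StreamWorkerCount>
    <!-- individual publisher blocks (WebRTC, LLHLS, ...) go here -->
</Publishers>
```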
-
Do you control the encoding of the streams, as in the original source? 4000kbps is quite a lot depending on what you're doing. If you need help with encoder settings, I might be able to optimize them some.
-
Those are great numbers for the amount of data you're pulling. Good tuning, especially for a 6-core CPU! My use case is similar, where users generally stream at a minimum of 4Mbps. They use 10-bit 4:2:2, HDR, and all the other fun stuff too. I'm orchestrating a load test soon on a 192-core machine; that'll be fun 😵‍💫
-
Our Origin server is only doing audio re-encoding (since WebRTC requires Opus); video is bypassed. We use OBS to encode.
-
We figured out 4Mbps is about the minimum for streaming at 1080p, unless the video has many static segments, such as cartoons, where the bitrate can be lowered without compromising quality. Are you doing transcoding on your server?
-
Generally not. It works OK if you have basic inputs, but it gets very complex to handle things like different frame rates, colour spaces, etc. If you are implementing transcodes, you'll almost certainly want to use the TranscodeWebhook feature to decide how and when things are transcoded; otherwise it sends all streams to all defined encoders. On the topic of video quality, try a hardware encoder! OBS can use the encoders found in Apple computers and NVIDIA GPUs, or you can go for something like a Magewell Ultra Encode. These can give you better images at the same bitrate versus a software encoder.
-
This is the exact opposite of what we've found: NVENC and AMD's encoder give worse quality at the same bitrate. Saves CPU, though, of course. I'll look at your OBS settings later; I'm out now.
-
This is computers: everything is exact and nothing is exact 😝 GPU and software encoders both perform differently under different conditions. I've found that GPUs handle motion better, and encoders with a dedicated FPGA like Magewell, Osprey, Haivision, etc. work best. Here's a completely unscientific look at x264 software vs Apple H264 hardware, both at 2Mbps: I think NVENC has improved over time, and the 6th-gen cards onward look pretty good. My clients use these to run encodes to HEVC 10-bit @ 12Mbps and they look superb. Although you're certainly not wrong about the AMD encoders; those have always lagged behind. It's funny that they also make the Xilinx platform but can't ship a decent encoder on their own SoCs.
-
Ahh, yeah, I will admit most of my experience is with relatively low bitrates, typically 1080p at about 2.5Mbps or less, and lots of "slow" content for sure in comparison. We also have all sorts of input encoders, from brand new to super old GPUs, so maybe that's coloring my experience too.
-
Notably, with those settings your keyframe interval is super short. I like 4 seconds as a nice middle ground, though I think it makes WebRTC playback start more slowly. But bandwidth requirements (or quality at the same bandwidth) go way up with lower keyframe intervals. Try backing off on that and the bitrate to see how it looks.
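To illustrate the trade-off being described (an ffmpeg sketch with hypothetical file and URL names, not the poster's actual settings): at 30 fps, a 4-second keyframe interval means a GOP of 120 frames.

```
# 1080p30 with a 4-second keyframe interval (GOP = 30 fps * 4 s = 120 frames).
# -sc_threshold 0 stops x264 from inserting extra keyframes on scene cuts,
# keeping keyframe spacing (and thus segment boundaries) predictable.
ffmpeg -i input.mp4 \
       -c:v libx264 -preset medium -b:v 3000k \
       -g 120 -keyint_min 120 -sc_threshold 0 \
       -c:a aac \
       -f flv rtmp://origin.example.com/app/stream
```

In OBS the equivalent setting is "Keyframe Interval" under the advanced output settings.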
-
Can confirm: x264 with ffmpeg and preset medium or better looks a lot better than AMD's hardware encoder, and a bit better than (or on par with) NVENC, at typical 1080p bitrates with 0 B-frames. AMD's encoder looks especially garbage with fast bright-dark transitions, for example when live streaming a DJ rave party. Apple's H264 hardware implementation seems to be good on some devices and garbage on others; I assume it adjusts depending on temperature / TDP availability.
-
I happened to reread this discussion; I think I was on vacation last year when it was going on. This is a really good thread that could help a lot of people. I think I should pin this.
-
Hi, we've got an Origin-Edge architecture using OriginMapStore. Our edge server has a dedicated 10Gbps network; however, when streaming @ 3500kbps, no more than 300 viewers can watch our stream.
`BigSalaEdgeMediaEngine edge * false`
Running nload shows that we are at 1.1 Gbps network usage. We have applied the Linux kernel settings explained on OME's troubleshooting page. We have also tested the server's network connection speed with speedtest and obtained 3Gbps+ upload speeds.
We are using WebRTC for streaming. Here is our Edge Conf:
`
`
Thanks for any advice, and please let us know if more information is required.