Bandwidth aware routing #670

phillebaba · 2024-12-17T23:50:24Z

Describe the problem to be solved

When running large clusters, situations can arise where many nodes pull the same image from the same node. This happens especially during rollouts of new deployments, when initially only a few nodes will have pulled the image. In small clusters this is generally not a problem, as the pressure on individual nodes is fairly limited. In large clusters, however, we can have hundreds of nodes pulling from the same node. As the underlying VM has limited network bandwidth, every image pull becomes slower as more requests arrive, which could cause all image pulls to fail. It would be preferable to allow a few nodes to pull the image faster so that they can also start distributing the image.

Proposed solution to the problem

The easy solution would be to limit the number of in-flight requests to a node. This would, however, not account for the fact that layers differ in size. Another option would be to limit the total number of bytes that can be served, and deny any further requests. The third option would be to cap the bandwidth when serving layers so that new requests do not slow down in-flight requests.
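To make the three options easier to compare, here is a rough sketch of each as a small admission check. This is not how Spegel does or would implement it. All class and method names are hypothetical, the byte limit is interpreted as "bytes currently being served" (an assumption, since it could also mean a limit over a time window), and the bandwidth cap uses a token bucket, which is just one common way to enforce a rate.

```python
import threading
import time


class InFlightLimiter:
    """Option 1 (hypothetical): admit at most max_requests concurrent requests."""

    def __init__(self, max_requests):
        self._max = max_requests
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self._active >= self._max:
                return False  # too many requests in flight, deny
            self._active += 1
            return True

    def release(self):
        with self._lock:
            self._active -= 1


class ByteBudget:
    """Option 2 (hypothetical): deny requests once the bytes being served reach max_bytes."""

    def __init__(self, max_bytes):
        self._max = max_bytes
        self._reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, size):
        with self._lock:
            if self._reserved + size > self._max:
                return False  # this layer would exceed the budget, deny
            self._reserved += size
            return True

    def release(self, size):
        with self._lock:
            self._reserved -= size


class TokenBucket:
    """Option 3 (hypothetical): cap served bandwidth at `rate` bytes per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate          # refill rate in bytes per second
        self._capacity = capacity  # maximum burst size in bytes
        self._tokens = capacity
        self._clock = clock        # injectable so the bucket can be tested
        self._last = clock()
        self._lock = threading.Lock()

    def try_consume(self, nbytes):
        with self._lock:
            now = self._clock()
            # Refill for the elapsed time, never exceeding the burst capacity.
            elapsed = now - self._last
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            self._last = now
            if nbytes > self._tokens:
                return False  # caller should wait before sending this chunk
            self._tokens -= nbytes
            return True
```

The first two checks would run once per request, with the byte budget taking the layer size into account, while the token bucket would be consulted per chunk written to the response. What happens to a denied request, for example whether the client retries against another node, is left open here.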

Relates to #551 and #530

phillebaba added the enhancement New feature or request label Dec 17, 2024

phillebaba moved this to Todo in Roadmap Dec 17, 2024

phillebaba added this to Roadmap Dec 17, 2024
