Severely degraded performance of macos-14-xlarge runners #10098
Comments
Hey @shagedorn! Please provide some examples (build links would be fine).
Our repos are private, I have no such links (or I assume they are not useful – are they?). An example job definition, just stripped of some comments:
This alone already shows the regression described above, but if desired, I can also share a simple Ruby setup step in full, which shows the regression very clearly for us.
@shagedorn links to private repositories are fine. We don't have access to what's going on in the private repository, but we can still get some useful technical information from them.
I see 🙂 So I picked 3 random runs from the last days before the issue occurred. All of them use runner image https://github.com/biowink/clue-ios-rebirth/actions/runs/9565398661/job/26368249268
I deem all of them healthy: "Set up job" took around 1s, and our Ruby setup was in the ~25s range. They finished in ~13 minutes.
Here are 3 recent runs that timed out after 20 minutes: https://github.com/biowink/clue-ios-rebirth/actions/runs/9582869833/job/26422805988
The "Set up job" is 2-3s here, and the Ruby setup is 44-50s. And then things get worse in the more substantial steps within Xcode.
We have not made any workflow changes in between, and we pin all our tools (incl. Xcode) to specific versions and have not made any upgrades. Thank you for looking into this.
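For reference, the slowdown implied by the step timings quoted in this comment can be worked out with a few lines of Python. This is only a minimal sketch: the figures are the approximate values given above, ranges are collapsed to their midpoints (an assumption made for illustration), and the `slowdown` helper is a hypothetical name, not part of any tool mentioned in this thread.

```python
def slowdown(before_s: float, after_s: float) -> float:
    """Return how many times longer the 'after' duration is than the baseline."""
    if before_s <= 0:
        raise ValueError("baseline duration must be positive")
    return after_s / before_s


# Approximate timings in seconds, taken from the comment above.
# Ranges are collapsed to midpoints (an assumption for illustration).
timings = {
    "Set up job": (1.0, 2.5),    # ~1s before, 2-3s after
    "Ruby setup": (25.0, 47.0),  # ~25s before, 44-50s after
}

ratios = {
    step: slowdown(before, after)
    for step, (before, after) in timings.items()
}

for step, ratio in ratios.items():
    print(f"{step}: {ratio:.2f}x slower")
```

Under these assumptions, both steps land close to the roughly 2x degradation described in the issue.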
We are in the exact same position here. The performance of all pipelines running on the latest versions of macOS is completely messed up. This is the run time of an old pipeline while things were actually working: Compared to the current situation: On top of the poor performance I tried to revert all the latest commits to Any idea if it's possible to select a runner image version to run the workflow on? In their current state, the macOS runners are not usable on our side. Every pro tip is welcome! Old runs: Since yesterday:
Hey @AleBorini!
Does this mean that you see performance degradation on Standard Runners too? Since the same time? After the image update, or just from some point in time?
I just built this version of the same pipeline to verify what is what => https://github.com/LEGO/fabuland/actions/runs/9596143061/job/26462381554 It will run the workflow against every available macOS image listed in the matrix. From
To follow up on the issue: as mentioned above, I built a pipeline that runs the same steps on different macOS versions, and the results are all over the place. The macOS 14 runner looks like the only usable one at the moment. Even simple tasks like I can confirm that this has been happening since yesterday morning, UK time. This is the link to the run; I will keep the PR alive for a while => https://github.com/LEGO/fabuland/actions/runs/9596143061
@erik-bershel I shared details in support ticket 2849698.
I have observed similar gradual degradation of runner performance.
I noticed it yesterday, when different repos started timing out.
We're experiencing the same issue. During the last two weeks our workflow time climbed from ~24 minutes to ~56 minutes.
@erik-bershel can you please leave a status update? I received a comment update email Friday night, but it seems the comment has since been deleted. I started seeing normal runtimes again this morning, but the lack of any updates, either here or in the form of an incident on https://www.githubstatus.com, doesn't exactly create confidence. This was a pretty large outage which you (= GitHub), according to your since-deleted comment, saw in your own monitoring. Why does it take so long to be acknowledged? It seems that all affected users either had unusable CI systems or essentially paid double (sometimes more) for a degraded service, for 1-2 days.
Hey @shagedorn! I understand your concern - this is a very important point for users. 💯💯💯
During the past week, certain customers using macOS runners may have noticed decreased performance when running Actions workflows. This issue arose because a macOS update and a network driver update were rolled out simultaneously. Upon identifying the affected configurations, we reverted the network driver to its previous version, and we will make sure to avoid such conflicts in future updates.
Description
Starting yesterday (June 19th) at ~2pm CEST, we see severely degraded performance in all our macos-14-xlarge-runner-based workflows, taking roughly twice the time for each step, leading to timeouts and increased cost (especially since these are the most expensive runners).
We see these ~2x degradations across every step, from checkouts (standard actions/checkout@v4 step) to Ruby setup to Xcode builds and test executions. The pure "Set up job" step in one case took 21s, when it's usually a 1s step.
We had some runs since the issue started that performed normally, but they are rare… most are extremely slow, which means our CI is essentially down. We are irritated that all status reports are green, because these degradations show up in completely standard actions (like checkout), so we are fairly sure this is unrelated to our own setup or code.
Platforms affected
Runner images affected
Image version and build link
20240611.1
Is it regression?
Yes, but the same image version seemed to perform ok until it did not…
Expected behavior
Predictable, stable performance across runs.
Actual behavior
Unpredictable and severely degraded performance since yesterday in most runs.
Repro steps
See description.