
Severely degraded performance of macos-14-xlarge runners #10098

Open
2 of 14 tasks
shagedorn opened this issue Jun 20, 2024 · 17 comments
Labels: Area: Image administration, bug report, investigate, OS: macOS

@shagedorn

Description

Starting yesterday (June 19th) around 2pm CEST, we have seen severely degraded performance in all of our macos-14-xlarge-runner-based workflows: each step takes roughly twice as long, leading to timeouts and increased cost (especially since these are the most expensive runners).

We see this ~2x degradation across every step, from checkouts (the standard actions/checkout@v4 step) to Ruby setup to Xcode builds and test executions. The plain "Set up job" step in one case took 21s, when it is usually a 1s step.

We have had some runs since the issue started that performed normally, but they are rare… most are extremely slow, which means our CI is essentially down. We are puzzled that all status reports are green, because the degradation shows up in completely standard actions (like checkout), so we are fairly sure this is unrelated to our own setup or code.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 11
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

20240611.1

Is it regression?

Yes, but the same image version seemed to perform ok until it did not…

Expected behavior

Predictable, stable performance across runs.

Actual behavior

Unpredictable and severely degraded performance since yesterday in most runs.

Repro steps

See description.

erik-bershel added the OS: macOS, Area: Image administration, and investigate labels and removed the needs triage label on Jun 20, 2024
@erik-bershel (Contributor)

Hey @shagedorn!

Please provide some examples (build links would be okay).

@shagedorn (Author)

Our repos are private, so I have no such links (or I assume they are not useful – are they?).

An example job definition, just stripped of some comments:

name: Deploy Internal

on:
  workflow_dispatch:
  workflow_call:
    inputs:
      bundle-version:
        required: true
        type: string

jobs:

  build_and_upload_internal_build:
    runs-on: macos-14-xlarge
    timeout-minutes: 20

    steps:
    
    - name: Checkout
      uses: actions/checkout@v4

    … other steps

This alone already shows the regression described above, but if desired, I can also include/share a simple Ruby setup step in full which shows the regression very clearly for us.
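For illustration, a Ruby setup step of the kind mentioned here typically looks something like the following (a minimal sketch using the ruby/setup-ruby action; the Ruby version shown is hypothetical, not necessarily what our workflow pins):

    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: '3.2'   # illustrative pin; the real workflow uses its own version
        bundler-cache: true   # runs 'bundle install' and caches gems between runs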

@erik-bershel (Contributor)

@shagedorn, private links are fine to share. We don't have access to what's going on in the private repository, but we can still extract some useful technical information from them.
To understand the situation, we would like to collect as much information as possible: not only how it works now, but also how it worked previously, for comparison. For that, we need examples of past runs of the same workflow.

@shagedorn (Author)

I see 🙂 So I picked 3 random runs from the last few days before the issue occurred. All of them use runner image 20240611.1.

https://github.com/biowink/clue-ios-rebirth/actions/runs/9565398661/job/26368249268
https://github.com/biowink/clue-ios-rebirth/actions/runs/9543717870/job/26300856571
https://github.com/biowink/clue-ios-rebirth/actions/runs/9515963801/job/26231278292

I consider all of them healthy: "Set up job" took around 1s, and our Ruby setup was in the ~25s range. They finished in ~13 minutes.

Here are 3 recent runs that timed out after 20 minutes:

https://github.com/biowink/clue-ios-rebirth/actions/runs/9582869833/job/26422805988
https://github.com/biowink/clue-ios-rebirth/actions/runs/9583761389/job/26425770660
https://github.com/biowink/clue-ios-rebirth/actions/runs/9584509128/job/26428290330

The "Set up job" is 2-3s here, and the Ruby setup is 44-50s. And then things get worse in the more substantial steps within Xcode.

We have not made any workflow changes in between, and we pin all our tools (incl. Xcode) to specific versions and have not made any upgrades.
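For context, pinning the Xcode version on these runners is typically done with a step along these lines (an illustrative sketch; the version and path are examples, not necessarily the ones we use):

    - name: Select Xcode
      run: sudo xcode-select -s /Applications/Xcode_15.4.app   # example version; preinstalled Xcodes live under /Applications on the runner image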

Thank you for looking into this.

@icecoffin commented Jun 20, 2024

We're experiencing the same issue, with almost a 2x increase in runtime. This is similar to what we saw back when we were using standard runners, so I wonder whether the jobs are actually running on standard runners instead of larger ones (although the "Set up job" sections of the runs look the same).

One thing I noticed: if I go to the Runners page in the repository (https://github.com/<organization>/<repository>/actions/runners), it shows "1 available runner" and "Unprovisioned" next to "Larger GitHub-hosted runners." I don't know if this is how it's supposed to be, and I don't know how this page looked before. I was told that there were no organization-wide settings changes that could've affected this.

Screenshot 2024-06-20 at 10 28 04

@AleBorini commented Jun 20, 2024

We are in exactly the same position here. The performance of all pipelines running on the latest versions of macOS is completely messed up.

This is the run time of an old pipeline run from when things were still working:

Screenshot 2024-06-20 at 10 19 52

Compared to the current situation:
Screenshot 2024-06-20 at 10 24 49

On top of the poor performance, I tried reverting all the latest commits to main back to the last successful run and stripped all caching from the pipeline to rule out false negatives.

Any idea whether it's possible to select a specific runner image version to run the workflow on? In its current state, the macOS runners are not usable on our side.

Any pro tips are welcome!

Old runs:
https://github.com/LEGO/fabuland/actions/runs/9548243121

Since yesterday:
https://github.com/LEGO/fabuland/actions/runs/9578611013

@erik-bershel (Contributor)

Hey @AleBorini!

You wrote: "The performance of all pipelines running on the latest versions of macOS is completely messed up."

Does that mean you see performance degradation on Standard Runners too? Since the same time? After the image update, or just from some point in time?

@AleBorini commented Jun 20, 2024

Replying to @erik-bershel's question above ("Does that mean you see performance degradation on Standard Runners too? Since the same time? After the image update, or just from some point in time?"):

I just built this version of the same pipeline to verify what is what => https://github.com/LEGO/fabuland/actions/runs/9596143061/job/26462381554

It will run the workflow against every available macOS image stored in the matrix, from macos-12 to macos-latest-xlarge (a sketch of the matrix follows below). I expect 12 and 13 to fail because of the Ruby setup step, but other than that I should get a clearer view of which images are not running as they used to.
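Roughly, such a matrix can be expressed like this (job name and runner labels are illustrative; the actual workflow may list a different set of images and steps):

jobs:
  compare_runners:
    strategy:
      fail-fast: false   # let every image finish so timings can be compared
      matrix:
        runner: [macos-12, macos-13, macos-13-xlarge, macos-14, macos-14-xlarge, macos-latest-xlarge]
    runs-on: ${{ matrix.runner }}
    timeout-minutes: 30
    steps:
    - name: Checkout
      uses: actions/checkout@v4
    … yarn install, pod install, and the remaining steps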

@AleBorini commented Jun 20, 2024

Replying again to @erik-bershel's question above:

To follow up on the issue: as mentioned above, I built a pipeline that runs the same steps on different macOS versions, and the results are all over the place.

The macOS 14 runner looks like the only usable one at the moment. Even simple tasks like yarn install or pod install take double the time on the macOS extra-large runners.

I can confirm this has been happening since yesterday morning, UK time.

Screenshot 2024-06-20 at 15 12 28

This is the link to the run; I will keep the PR alive for a while => https://github.com/LEGO/fabuland/actions/runs/9596143061

erik-bershel self-assigned this on Jun 20, 2024
@mr-v commented Jun 21, 2024

@erik-bershel I shared details in support ticket 2849698.

@TomaszLizer

I have observed a similar gradual degradation of runner performance.
Runner used in workflow: macos-14-xlarge

I noticed it yesterday, when different repos started timing out.
Screenshot 2024-06-21 at 14 02 24

@Vyazovoy

We're experiencing the same issue. Over the last two weeks, our workflow time has climbed from ~24 minutes to ~56 minutes.

@AleBorini commented Jun 21, 2024

I'm still investigating on our side, but our release pipelines are affected as well. The run time has doubled on macOS 14 extra large.

Screenshot 2024-06-21 at 15 58 08

The inconsistency is the real issue: simple commands like yarn install or pod install take forever, with wildly varying results.

@shagedorn (Author)

@erik-bershel can you please leave a status update? I received a comment update email Friday night, but it seems that comment has since been deleted. I started seeing normal runtimes again this morning, but the lack of any updates, either here or in the form of an incident on https://www.githubstatus.com, doesn't exactly create confidence.

This was a pretty large outage, which you (= GitHub), according to your since-deleted comment, saw in your own monitoring. Why does it take so long to acknowledge? It seems that all affected users either had unusable CI systems or essentially paid double (sometimes more) for a degraded service, for 1-2 days.

@erik-bershel (Contributor)

Hey @shagedorn!

I understand your concern - this is a very important point for users. 💯💯💯
I am currently awaiting information from colleagues; as soon as any details appear, we will share them with you immediately.

@AleBorini

I can confirm that run times seem to have improved since Friday. I'm running more tests on my side to check whether all our macOS pipelines are back to normal.

This is an example of our QA release build:
Screenshot 2024-06-24 at 10 22 21

@sarahbarili

During the past week, certain customers using macOS runners may have noticed decreased performance when running Actions workflows. The issue arose from the simultaneous rollout of a macOS update and a network driver update. Upon identifying the affected configurations, we reverted the network driver to its previous version, and we will take care to avoid such conflicts in future updates.
