
Commit 1e4fe9b

Simplify caching mechanisms for CI and PROD images
For a long time we used a sophisticated mechanism to speed up our CI jobs: building the images in a "pull_request_target" workflow and pushing them to the GitHub registry. That approach, however, had several drawbacks:

* The CI image had a complex layer setup (we had to pre-cache installed dependencies by installing them from the branch tip).
* "pull_request_target" is a very dangerous workflow; we had a number of security problems with it, and it is difficult to debug.
* Caching of `pip` and `uv` was not used because it increased the size of the image significantly.

This PR significantly improves the caching mechanisms for building the images, thanks to several advancements that were not possible before:

* The upload-artifact@v4 action and the improved stash action developed by @assignUser and published in "apache/infrastructure-actions" allow us to store all images (8GB per run) in artifacts rather than in the registry, so we can build the images once and share them with all the jobs.
* uv is fast "enough" to make occasional local installation of Airflow acceptable. This allows us to use a cache mount and a locally built uv cache, rather than relying on a "remote" cache, when building local images for breeze. The first time you build the local breeze image it will take 2-5 minutes longer (depending on your network speed), but because we can use cache mounts, every subsequent build should be very fast, even if all dependencies change. Using uv also allows us to "always" reinstall Airflow when building the image, even if only a single source file changed, because with the cache it takes under a second to reinstall Airflow and all its dependencies.
* Cache mounts are not included in the image size, and since in CI we can export and import images as artifacts and do not need to rebuild them, the images shared as compressed artifacts are relatively small (2GB). The `uv` cache adds around 4GB on top of that, so sharing the image built in the "build image" job with other jobs in the same workflow is fast.
* We still use the registry cache for the "non-python" parts of the image: both CI and breeze image builds benefit from the image cache for system dependencies, database clients, etc.

Fixes: #42999
Fixes: #43268
1 parent a22faa5 commit 1e4fe9b

127 files changed, +2299 -2827 lines changed

.github/actions/checkout_target_commit/action.yml

-81
This file was deleted.
@@ -0,0 +1,71 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+name: 'Prepare all CI images'
+description: 'Recreates current python CI images from artifacts for all python versions'
+inputs:
+  python-versions-list-as-string:
+    description: 'Stringified array of all Python versions to test - separated by spaces.'
+    required: true
+  platform:
+    description: 'Platform for the build - linux/amd64 or linux/arm64'
+    required: true
+outputs:
+  host-python-version:
+    description: Python version used in host
+    value: ${{ steps.breeze.outputs.host-python-version }}
+runs:
+  using: "composite"
+  steps:
+    - name: "Cleanup docker"
+      run: ./scripts/ci/cleanup_docker.sh
+      shell: bash
+    # TODO: Currently we cannot loop through the list of python versions and have dynamic list of
+    # tasks. Instead we hardcode all possible python versions and they - but
+    # this should be implemented in stash action as list of keys to download.
+    # That includes 3.8 - 3.12 as we are backporting it to v2-10-test branch
+    - name: "Restore CI docker image ${{ inputs.platform }}:3.8"
+      uses: ./.github/actions/prepare_single_ci_image
+      with:
+        platform: ${{ inputs.platform }}
+        python: "3.8"
+        python-versions-list-as-string: ${{ inputs.python-versions-list-as-string }}
+    - name: "Restore CI docker image ${{ inputs.platform }}:3.9"
+      uses: ./.github/actions/prepare_single_ci_image
+      with:
+        platform: ${{ inputs.platform }}
+        python: "3.9"
+        python-versions-list-as-string: ${{ inputs.python-versions-list-as-string }}
+    - name: "Restore CI docker image ${{ inputs.platform }}:3.10"
+      uses: ./.github/actions/prepare_single_ci_image
+      with:
+        platform: ${{ inputs.platform }}
+        python: "3.10"
+        python-versions-list-as-string: ${{ inputs.python-versions-list-as-string }}
+    - name: "Restore CI docker image ${{ inputs.platform }}:3.11"
+      uses: ./.github/actions/prepare_single_ci_image
+      with:
+        platform: ${{ inputs.platform }}
+        python: "3.11"
+        python-versions-list-as-string: ${{ inputs.python-versions-list-as-string }}
+    - name: "Restore CI docker image ${{ inputs.platform }}:3.12"
+      uses: ./.github/actions/prepare_single_ci_image
+      with:
+        platform: ${{ inputs.platform }}
+        python: "3.12"
+        python-versions-list-as-string: ${{ inputs.python-versions-list-as-string }}
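The TODO in the action above explains that the steps cannot be generated in a loop, so one step per Python version is hardcoded. As a minimal sketch of what such a loop would produce, the hypothetical Python helper below (`generate_restore_steps`, not part of Airflow or of the stash action) builds the same five step definitions as plain dictionaries. It only illustrates the repetition; it does not assume any GitHub Actions API.

```python
from typing import Any, Dict, List

# Python versions hardcoded in the composite action above (3.8 - 3.12).
ALL_PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11", "3.12"]


def generate_restore_steps(versions: List[str]) -> List[Dict[str, Any]]:
    """Build one 'uses: prepare_single_ci_image' step per Python version.

    Hypothetical helper: it reproduces the hardcoded steps as data so the
    repetition is easy to see; it is not part of the Airflow code base.
    """
    steps = []
    for python in versions:
        steps.append(
            {
                # `{{{{` / `}}}}` in an f-string emit literal `{{` / `}}`,
                # so this yields "...${{ inputs.platform }}:3.8" etc.
                "name": f"Restore CI docker image ${{{{ inputs.platform }}}}:{python}",
                "uses": "./.github/actions/prepare_single_ci_image",
                "with": {
                    "platform": "${{ inputs.platform }}",
                    "python": python,
                    "python-versions-list-as-string": "${{ inputs.python-versions-list-as-string }}",
                },
            }
        )
    return steps
```

Calling `generate_restore_steps(ALL_PYTHON_VERSIONS)` returns five dictionaries whose only varying field is the Python version, which is exactly the duplication the TODO hopes the stash action will eventually absorb as a list of keys.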

.github/actions/prepare_breeze_and_image/action.yml

+32-14
@@ -16,30 +16,48 @@
 # under the License.
 #
 ---
-name: 'Prepare breeze && current python image'
-description: 'Installs breeze and pulls current python image'
+name: 'Prepare breeze && current image (CI or PROD)'
+description: 'Installs breeze and recreates current python image from artifact'
 inputs:
-  pull-image-type:
-    description: 'Which image to pull'
-    default: CI
+  python:
+    description: 'Python version for image to prepare'
+    required: true
+  image-type:
+    description: 'Which image type to prepare (CI/PROD)'
+    default: "CI"
+  platform:
+    description: 'Platform for the build - linux/amd64 or linux/arm64'
+    required: true
 outputs:
   host-python-version:
     description: Python version used in host
     value: ${{ steps.breeze.outputs.host-python-version }}
 runs:
   using: "composite"
   steps:
+    - name: "Cleanup docker"
+      run: ./scripts/ci/cleanup_docker.sh
+      shell: bash
     - name: "Install Breeze"
       uses: ./.github/actions/breeze
       id: breeze
-    - name: Login to ghcr.io
-      shell: bash
-      run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
-    - name: Pull CI image ${{ env.PYTHON_MAJOR_MINOR_VERSION }}:${{ env.IMAGE_TAG }}
+    - name: "Restore CI docker image ${{ inputs.platform }}:${{ inputs.python }}"
+      uses: apache/infrastructure-actions/stash/restore@c94b890bbedc2fc61466d28e6bd9966bc6c6643c
+      with:
+        key: "ci-image-save-${{ inputs.platform }}-${{ inputs.python }}"
+        path: "/tmp/"
+      if: inputs.image-type == 'CI'
+    - name: "Load CI image ${{ inputs.platform }}:${{ inputs.python }}"
+      run: breeze ci-image load --platform ${{ inputs.platform }} --python ${{ inputs.python }}
       shell: bash
-      run: breeze ci-image pull --tag-as-latest
-      if: inputs.pull-image-type == 'CI'
-    - name: Pull PROD image ${{ env.PYTHON_MAJOR_MINOR_VERSION }}:${{ env.IMAGE_TAG }}
+      if: inputs.image-type == 'CI'
+    - name: "Restore PROD docker image ${{ inputs.platform }}:${{ inputs.python }}"
+      uses: apache/infrastructure-actions/stash/restore@c94b890bbedc2fc61466d28e6bd9966bc6c6643c
+      with:
+        key: "prod-image-save-${{ inputs.platform }}-${{ inputs.python }}"
+        path: "/tmp/"
+      if: inputs.image-type == 'PROD'
+    - name: "Load PROD image ${{ inputs.platform }}:${{ inputs.python }}"
+      run: breeze prod-image load --platform ${{ inputs.platform }} --python ${{ inputs.python }}
       shell: bash
-      run: breeze prod-image pull --tag-as-latest
-      if: inputs.pull-image-type == 'PROD'
+      if: inputs.image-type == 'PROD'
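In the reworked `prepare_breeze_and_image` action, `image-type` selects both the stash key prefix and the breeze `load` sub-command, while `platform` and `python` are interpolated into each. The Python sketch below mirrors that mapping under two stated assumptions: the helper name `restore_and_load_plan` is hypothetical, and the comparison is case-insensitive because GitHub Actions documents string comparison with `==` as ignoring case. An unrecognized type returns `None`, matching the action, where neither the CI nor the PROD pair of steps would run.

```python
from typing import Optional, Tuple


def restore_and_load_plan(
    image_type: str, platform: str, python: str
) -> Optional[Tuple[str, str]]:
    """Return (stash key, breeze load command) for one image, or None.

    Hypothetical helper mirroring the `if: inputs.image-type == ...` branches.
    """
    # GitHub Actions compares strings case-insensitively, so normalise first.
    normalized = image_type.upper()
    if normalized not in ("CI", "PROD"):
        # Neither the CI nor the PROD steps would run for any other value.
        return None
    prefix = normalized.lower()  # "ci" or "prod"
    # Same layout as `key: "<prefix>-image-save-<platform>-<python>"` above.
    key = f"{prefix}-image-save-{platform}-{python}"
    # Same layout as `breeze <prefix>-image load --platform ... --python ...`.
    command = f"breeze {prefix}-image load --platform {platform} --python {python}"
    return key, command
```

With the default `image-type` of "CI", `restore_and_load_plan("CI", "linux/amd64", "3.9")` yields the key `ci-image-save-linux/amd64-3.9` and the command `breeze ci-image load --platform linux/amd64 --python 3.9`. Keeping both values derived from the same three inputs is what lets the save side and the restore side agree on the artifact name.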
@@ -0,0 +1,47 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+---
+name: 'Prepare single images'
+description: 'Recreates current python image from artifacts'
+inputs:
+  python:
+    description: 'Python version for image to prepare'
+    required: true
+  python-versions-list-as-string:
+    description: 'Stringified array of all Python versions to prepare - separated by spaces.'
+    required: true
+  platform:
+    description: 'Platform for the build - linux/amd64 or linux/arm64'
+    required: true
+outputs:
+  host-python-version:
+    description: Python version used in host
+    value: ${{ steps.breeze.outputs.host-python-version }}
+runs:
+  using: "composite"
+  steps:
+    - name: "Restore CI docker images ${{ inputs.platform }}:${{ inputs.python }}"
+      uses: apache/infrastructure-actions/stash/restore@c94b890bbedc2fc61466d28e6bd9966bc6c6643c
+      with:
+        key: "ci-image-save-${{ inputs.platform }}-${{ inputs.python }}"
+        path: "/tmp/"
+      if: contains(inputs.python-versions-list-as-string, inputs.python)
+    - name: "Load CI image ${{ inputs.platform }}:${{ inputs.python }}"
+      run: breeze ci-image load --platform "${{ inputs.platform }}" --python "${{ inputs.python }}"
+      shell: bash
+      if: contains(inputs.python-versions-list-as-string, inputs.python)
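Both steps in this action are gated on `contains(inputs.python-versions-list-as-string, inputs.python)`. GitHub Actions documents `contains()` as case-insensitive and, when its first argument is a string, as a substring test, so the check is not an exact match on space-separated tokens. The Python sketch below approximates that behaviour (`gha_contains`) and contrasts it with a stricter, hypothetical `token_match`. For the versions hardcoded in this commit (3.8 to 3.12) no version string is a substring of another, so the two agree; the difference would only surface for an input such as "3.1".

```python
def gha_contains(search: str, item: str) -> bool:
    """Approximate GitHub Actions' contains() when `search` is a string.

    GitHub documents contains() as case-insensitive and, for string input,
    as a substring test.
    """
    return item.lower() in search.lower()


def token_match(search: str, item: str) -> bool:
    """Hypothetical stricter check: compare whole space-separated tokens."""
    return item in search.split()
```

For example, with `"3.9 3.10 3.11 3.12"` as the list, both functions accept "3.10" and reject "3.8", but only `gha_contains` would also accept "3.1". The substring form is simpler to express in a workflow expression, which is presumably why it is used here.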

.github/workflows/additional-ci-image-checks.yml

+1-8
@@ -32,10 +32,6 @@ on: # yamllint disable-line rule:truthy
         description: "The array of labels (in json form) determining self-hosted runners."
         required: true
         type: string
-      image-tag:
-        description: "Tag to set for the image"
-        required: true
-        type: string
       python-versions:
         description: "The list of python versions (stringified JSON array) to run the tests on."
         required: true
@@ -103,8 +99,6 @@ jobs:
       contents: read
       # This write is only given here for `push` events from "apache/airflow" repo. It is not given for PRs
       # from forks. This is to prevent malicious PRs from creating images in the "apache/airflow" repo.
-      # For regular build for PRS this "build-prod-images" workflow will be skipped anyway by the
-      # "in-workflow-build" condition
       packages: write
     secrets: inherit
     with:
@@ -159,7 +153,7 @@ jobs:
 # # There is no point in running this one in "canary" run, because the above step is doing the
 # # same build anyway.
 # build-ci-arm-images:
-# name: Build CI ARM images (in-workflow)
+# name: Build CI ARM images
 # uses: ./.github/workflows/ci-image-build.yml
 # permissions:
 # contents: read
@@ -169,7 +163,6 @@ jobs:
 # push-image: "false"
 # runs-on-as-json-public: ${{ inputs.runs-on-as-json-public }}
 # runs-on-as-json-self-hosted: ${{ inputs.runs-on-as-json-self-hosted }}
-# image-tag: ${{ inputs.image-tag }}
 # python-versions: ${{ inputs.python-versions }}
 # platform: "linux/arm64"
 # branch: ${{ inputs.branch }}

.github/workflows/additional-prod-image-tests.yml

+14-33
@@ -32,10 +32,6 @@ on: # yamllint disable-line rule:truthy
         description: "Branch used to construct constraints URL from."
         required: true
         type: string
-      image-tag:
-        description: "Tag to set for the image"
-        required: true
-        type: string
       upgrade-to-newer-dependencies:
         description: "Whether to upgrade to newer dependencies (true/false)"
         required: true
@@ -70,7 +66,6 @@ jobs:
       default-python-version: ${{ inputs.default-python-version }}
       branch: ${{ inputs.default-branch }}
       use-uv: "false"
-      image-tag: ${{ inputs.image-tag }}
       build-provider-packages: ${{ inputs.default-branch == 'main' }}
       upgrade-to-newer-dependencies: ${{ inputs.upgrade-to-newer-dependencies }}
       chicken-egg-providers: ${{ inputs.chicken-egg-providers }}
@@ -88,7 +83,6 @@ jobs:
       default-python-version: ${{ inputs.default-python-version }}
       branch: ${{ inputs.default-branch }}
       use-uv: "false"
-      image-tag: ${{ inputs.image-tag }}
       build-provider-packages: ${{ inputs.default-branch == 'main' }}
       upgrade-to-newer-dependencies: ${{ inputs.upgrade-to-newer-dependencies }}
       chicken-egg-providers: ${{ inputs.chicken-egg-providers }}
@@ -117,36 +111,25 @@ jobs:
           persist-credentials: false
       - name: "Cleanup docker"
         run: ./scripts/ci/cleanup_docker.sh
-      - name: "Install Breeze"
-        uses: ./.github/actions/breeze
-      - name: Login to ghcr.io
-        shell: bash
-        run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
-      - name: Pull PROD image ${{ inputs.default-python-version}}:${{ inputs.image-tag }}
-        run: breeze prod-image pull --tag-as-latest
-        env:
-          PYTHON_MAJOR_MINOR_VERSION: "${{ inputs.default-python-version }}"
-          IMAGE_TAG: "${{ inputs.image-tag }}"
-      - name: "Setup python"
-        uses: actions/setup-python@v5
+      - name: "Prepare breeze & PROD image: ${{ inputs.default-python-version }}"
+        uses: ./.github/actions/prepare_breeze_and_image
         with:
-          python-version: ${{ inputs.default-python-version }}
-          cache: 'pip'
-          cache-dependency-path: ./dev/requirements.txt
+          platform: "linux/amd64"
+          image-type: "PROD"
+          python: ${{ inputs.default-python-version }}
       - name: "Test examples of PROD image building"
         run: "
           cd ./docker_tests && \
           python -m pip install -r requirements.txt && \
           TEST_IMAGE=\"ghcr.io/${{ github.repository }}/${{ inputs.default-branch }}\
-          /prod/python${{ inputs.default-python-version }}:${{ inputs.image-tag }}\" \
+          /prod/python${{ inputs.default-python-version }}\" \
           python -m pytest test_examples_of_prod_image_building.py -n auto --color=yes"
 
   test-docker-compose-quick-start:
     timeout-minutes: 60
-    name: "Docker-compose quick start with PROD image verifying"
+    name: "Docker Compose quick start with PROD image verifying"
     runs-on: ${{ fromJSON(inputs.runs-on-as-json-public) }}
     env:
-      IMAGE_TAG: "${{ inputs.image-tag }}"
       PYTHON_MAJOR_MINOR_VERSION: "${{ inputs.default-python-version }}"
       GITHUB_REPOSITORY: ${{ github.repository }}
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -161,14 +144,12 @@ jobs:
         with:
           fetch-depth: 2
           persist-credentials: false
-      - name: "Cleanup docker"
-        run: ./scripts/ci/cleanup_docker.sh
-      - name: "Install Breeze"
-        uses: ./.github/actions/breeze
-      - name: Login to ghcr.io
-        shell: bash
-        run: echo "${{ env.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
-      - name: "Pull image ${{ inputs.default-python-version}}:${{ inputs.image-tag }}"
-        run: breeze prod-image pull --tag-as-latest
+      - name: "Prepare breeze & PROD image: ${{ env.PYTHON_MAJOR_MINOR_VERSION }}"
+        uses: ./.github/actions/prepare_breeze_and_image
+        with:
+          platform: "linux/amd64"
+          image-type: "PROD"
+          python: ${{ env.PYTHON_MAJOR_MINOR_VERSION }}
+        id: breeze
       - name: "Test docker-compose quick start"
         run: breeze testing docker-compose-tests
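In the "Test examples of PROD image building" step, `TEST_IMAGE` is split over two lines inside a YAML double-quoted `run:` string; the escaped line break joins the halves without a space, giving `ghcr.io/<repository>/<branch>/prod/python<version>`. After this change the `:${{ inputs.image-tag }}` suffix is gone, so the reference carries no tag, and Docker resolves an untagged reference as `:latest`. The Python sketch below illustrates both points; the helper names are hypothetical, and the assumption that `breeze prod-image load` leaves the image available under that untagged name is not confirmed by this diff.

```python
from typing import Optional


def prod_test_image(repository: str, branch: str, python: str, tag: Optional[str] = None) -> str:
    """Build the TEST_IMAGE reference used by the docker_tests step.

    Before this commit a ":<image-tag>" suffix was appended; pass `tag`
    to reproduce that older form.
    """
    ref = f"ghcr.io/{repository}/{branch}/prod/python{python}"
    return f"{ref}:{tag}" if tag else ref


def effective_reference(ref: str) -> str:
    """Return the reference Docker resolves when no tag is given (":latest").

    Only the last path component is checked, so a registry port such as
    "localhost:5000/..." is not mistaken for a tag. Digest references
    ("@sha256:...") are deliberately not handled in this sketch.
    """
    last_component = ref.rsplit("/", 1)[-1]
    return ref if ":" in last_component else f"{ref}:latest"
```

For `apache/airflow` on `main` with Python 3.9, the new reference is `ghcr.io/apache/airflow/main/prod/python3.9`, which Docker treats as `ghcr.io/apache/airflow/main/prod/python3.9:latest`; the old form would have ended in `:<image-tag>` instead.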
