Merged
84 changes: 84 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,84 @@
This directory provides details about our Continuous Integration using GitHub Actions.

# Overview

We enable multiple kinds of continuous integration to aid the review of Cholla Pull Requests. At a high level, this machinery includes:

1. Jenkins (run on the CRC @ Pitt)
   - actually builds and tests the code (with Nvidia GPUs)
   - also performs linting

2. pre-commit.ci
   - performs code formatting (e.g. clang-format)
   - can be instructed to push a commit to the branch in a PR that fixes formatting issues.

3. GitHub Actions
   - primarily performs a compilation check to verify that Cholla can be built with AMD GPUs (but we can't actually test the code)
   - also provides a workflow to generate the container image to use in the compile-test


# More about GitHub Actions

## Compilation Checks

In more detail, GitHub Actions uses the logic within ``compilation_checks.yml`` to actually run the compilation checks. This logic is executed every time a new PR is issued (or a new commit is added to a PR).

Importantly, all of this logic is executed within a custom docker image. This docker image contains all of the dependencies that we need for building Cholla.

Historically, we would pull the custom docker image from DockerHub, but (at the time of writing) we're in the process of transitioning to using the GitHub Container Registry. We provide more context about building images below.

## Building Images

Periodically, we need to bump the versions of the compilers/libraries within the docker image that we use for the compilation check. Doing so requires an understanding of how an image gets built. Consequently,
- This section provides a basic overview of how new images are built. The purpose is to give the reader a basic understanding of the various concepts.
- The next section outlines the actual procedure that needs to be followed in order to update the image used by the Compilation Checks.


> [!IMPORTANT]
> We will provide a procedure in the next section for updating the Docker Image used for the Compilation Checks. Be advised, simply updating a Dockerfile is **NOT** enough to do that.

### I. The recipe for an image: **Dockerfile**

The contents of an image are dictated by the **Dockerfile**s found within the **docker/** directory. You should think of each Dockerfile as the recipe for an image (i.e. it provides the instructions to create that image).
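
To make the "recipe" idea concrete, here is a minimal sketch of building the ROCm image on your own machine before opening a PR. It mirrors the `file: docker/rocm/Dockerfile` and `context: .` settings used by the workflow, so it assumes you run it from the root of the repository with Docker installed. The helper name `build_rocm_image_locally` and the default tag `cholla-rocm:local` are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: build the ROCm image locally from the repository root.
# By default (DRY_RUN=1) the command is printed rather than executed.
build_rocm_image_locally() {
  tag="${1:-cholla-rocm:local}"   # hypothetical local tag, not used by CI
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "docker build --file docker/rocm/Dockerfile --tag $tag ."
  else
    docker build --file docker/rocm/Dockerfile --tag "$tag" .
  fi
}

# Show the command that would be run.
build_rocm_image_locally
```

Setting `DRY_RUN=0` actually runs the build. A local build never uploads anything; it is only a quick way to catch Dockerfile errors.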

### II. Building the Recipe
This is where the workflow in **build_image.yml** becomes relevant. The workflow is used to build the image from the Dockerfile.

In more detail, this workflow is automatically triggered by PRs targeting (and pushes to) the main or dev branches that update **build_image.yml** itself or anything within the **docker/** directory. It can also be triggered manually.
- whenever the workflow runs, it builds a new Docker image from the ROCm Dockerfile
- when you manually trigger the workflow, the new docker image is also uploaded to the GitHub Container Registry (ghcr.io)
- **IMPORTANTLY:** a new image will **NOT** be uploaded when the workflow is triggered by a Pull Request or a push (to avoid the creation of many, unnecessary images)
- **How to manually trigger:** this is a simple procedure (be aware, GitHub might change how exactly you do this in the future)
  1. Navigate to the landing page for the Cholla repository
  2. Click on the "Actions" tab
  3. On the left sidebar, you will see a list of actions. Click on the ``Build-Image`` action.
  4. Now, in the table at the center of the page, the top row should state that "This workflow has a `workflow_dispatch` event trigger." All the way on the right side of this row, there is a drop-down button that says "Run workflow". You should run the workflow from the desired branch (usually the dev branch).
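
As an alternative to clicking through the web interface, the manual trigger can probably also be issued from a terminal with the GitHub CLI. The sketch below assumes that `gh` is installed and authenticated, and that you run it from inside a clone of the Cholla repository; the helper name `trigger_build_image` is hypothetical.

```shell
#!/bin/sh
# Sketch only: dispatch the Build-Image workflow with the GitHub CLI.
# By default (DRY_RUN=1) the command is printed rather than executed.
trigger_build_image() {
  branch="${1:-dev}"   # branch to run the workflow from (usually dev)
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gh workflow run build_image.yml --ref $branch"
  else
    gh workflow run build_image.yml --ref "$branch"
  fi
}

# Show the command that would be run for the dev branch.
trigger_build_image dev
```

Because this fires a `workflow_dispatch` event, a run started with `DRY_RUN=0` should upload the resulting image, exactly like the "Run workflow" button.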


## How do I update the image that we use for Compilation Checks?

This is a multi-step procedure.

1. Open a new PR (to the dev branch), where you update the ROCm **Dockerfile**. The Dockerfile is the **only** thing that should change in the PR.

2. Somebody needs to review and merge the PR.
   - When the PR is made, the ``Build-Image`` workflow will automatically run to try to build the image (but it won't upload the resulting image).
   - If the workflow fails to build an image, that almost certainly means that the **Dockerfile** has an error.

3. Once the Dockerfile-Update PR has been merged, it's time to manually trigger the ``Build-Image`` workflow. As noted above, when the ``Build-Image`` workflow is manually triggered, it will upload the image to the GitHub Container Registry. (The procedure for manually triggering the workflow is described above.)

4. Wait a few minutes after step 3 is done to confirm that the image was successfully uploaded.
   - You can see a list of all images built in this way by clicking on the packages button on the right sidebar of the main GitHub webpage for the Cholla repository.
   - At the time of writing, each version of the image is named based on the Git Commit that the image was built from.

5. To actually use the new image in the Compilation Checks workflow, you need to modify the path of the container listed in `jobs.Build.strategy.matrix.container.link` (within ``compilation_checks.yml``). You should do this in a separate PR.
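
For steps 4 and 5, it can help to know what the uploaded image's path should look like. The sketch below composes that path under a few assumptions: that the metadata step's `type=sha` tag takes its default form (`sha-` followed by the first 7 characters of the commit hash), and that the image name (the repository name plus the `-ROCm` suffix) is lowercased when the tag is generated. The helper name `image_ref` and the `example-org` owner are hypothetical; always confirm the real path on the repository's packages page.

```shell
#!/bin/sh
# Sketch only: guess the ghcr.io reference of an image built by Build-Image.
# Assumes the default "sha-<7 chars>" tag and a lowercased image name.
image_ref() {
  repo="$1"     # "<owner>/<repository>", as in github.repository
  commit="$2"   # hash of the commit that the image was built from
  repo_lc="$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]')"
  short="$(printf '%s' "$commit" | cut -c1-7)"
  printf 'ghcr.io/%s-rocm:sha-%s\n' "$repo_lc" "$short"
}

# Example with a made-up owner and commit hash.
image_ref "example-org/Cholla" "0123456789abcdef0123456789abcdef01234567"
# prints: ghcr.io/example-org/cholla-rocm:sha-0123456
```

The printed string is the kind of value that would go in the container link of the Compilation Checks workflow.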

## Other Thoughts

As an experiment, I tried to see if we could get away with running the Compilation Tests without constructing a custom image. In more detail, I tried to make the compilation tests run using AMD's ROCm image and then manually install our dependencies. Unfortunately, I would get cryptic error messages while running ``sudo apt-get ...``. It's still a little unclear to me whether:

- this was an image-specific issue. In other words: were the problems arising because the ROCm image was overwriting the repositories to download packages from? (seems fairly unlikely).
- this was an issue related to the docker image's permissions?
- this is a docker "feature" (e.g. for security? related to docker layers?)

In any case, I don't know enough about it to fix it.
107 changes: 107 additions & 0 deletions .github/workflows/build_image.yml
@@ -0,0 +1,107 @@
name: Build-Image

# this drew a lot of inspiration from lots of guides on the internet...
# - ultimately, I was able to find
#   https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#publishing-a-package-using-an-action
#

on:
  # we limit the frequency of when this workflow is run

  pull_request: # only run on PRs when relevant changes have been made
    branches:
      - main
      - dev
    paths:
      - ".github/workflows/build_image.yml"
      - "docker/**"

  push:
    branches:
      - main
      - dev
    paths:
      - ".github/workflows/build_image.yml"
      - "docker/**"

  workflow_dispatch: # run this when we manually trigger the workflow

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  IS_WORKFLOW_DISPATCH: ${{ github.event_name == 'workflow_dispatch' }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    # Sets the permissions granted to the `GITHUB_TOKEN` for the actions in
    # this job.
    permissions:
      contents: read
      packages: write
      attestations: write
      id-token: write
    steps:
      # other guides seem to recommend using Docker Buildx (over the normal
      # checkout step), but I have a sneaking suspicion, that it was "messing
      # up" the `context` argument within the
      - name: Checkout repository
        uses: actions/checkout@v4

      # Use the `docker/login-action` action to log in to the Container
      # registry using the account and password that will publish the
      # packages.
      # - Once published, the packages are scoped to the account defined here.
      - name: Log in to the Container registry
        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # Use the docker/metadata-action action
      #   https://github.com/docker/metadata-action#about
      # to extract tags and labels (from the GitHub repository, itself) that
      # will be applied to the specified image.
      - name: Extract metadata (tags, labels) for Docker
        # set `id` to "meta", to make it possible for subsequent steps of this
        # job to access the outputs of this current step
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          # the `images` argument sets the base name that we use for the image
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-ROCm
          # we primarily name the image based on the (shortened) hash of the
          # git-commit that we used to generate the image.
          tags: type=sha

      # This step uses the `docker/build-push-action` action to build the
      # image, based on our repository's `Dockerfile`. If the build succeeds
      # and the workflow was manually triggered, it pushes the image to
      # GitHub Packages.
      - name: Build and export
        uses: docker/build-push-action@v6
        with:
          push: ${{ env.IS_WORKFLOW_DISPATCH }}
          context: .
          # obviously specifies the path to the docker file
          file: docker/rocm/Dockerfile
          # use the tags collected by the "Extract metadata" step
          tags: ${{ steps.meta.outputs.tags }}
          # use the labels collected by the "Extract metadata" step
          labels: ${{ steps.meta.outputs.labels }}

      # the online guide provided by GitHub really wants us to perform an
      # "attestation" step, but I can't get it to work. So, we just skip it
      # - Honestly, that step is somewhat irrelevant for our purposes (i.e.
      #   creating an image to use as a build environment).
      # - Attestation is mostly useful when the image is the primary "product"
      #   a project wants to ship


      - name: Report
        run: |
          if [[ "${{ env.IS_WORKFLOW_DISPATCH }}" == 'true' ]]; then
            echo "Successfully Built And Uploaded Image"
          else
            echo "Successfully Built Image"
          fi
3 changes: 3 additions & 0 deletions docker/README.md
@@ -0,0 +1,3 @@
This directory contains Dockerfiles used to create docker images for compile tests that we use in GitHub Actions.

See the README.md within the GitHub Actions workflows directory for more details ([here](../.github/workflows/README.md)).
2 changes: 1 addition & 1 deletion docker/rocm/Dockerfile
@@ -1,4 +1,4 @@
- FROM rocm/dev-ubuntu-20.04:5.2.3
+ FROM rocm/dev-ubuntu-20.04:5.5.1

# Avoid annoying cmake -> tzdata install prompt
ENV DEBIAN_FRONTEND=noninteractive