Closed

34 commits
4b1e0c1
Introduce evaluator Docker-in-Docker setup, add SSL certificates, and…
JersyJ Jan 25, 2026
6b80e5b
Refactor Evaluator Docker Images, Introduces simple CI
JersyJ Feb 7, 2026
61fa33e
Remove the evaluator service from docker-compose.yml
JersyJ Feb 7, 2026
bf8ec57
Rename workflow
JersyJ Feb 7, 2026
a0b337e
Formatting, pre-commit
JersyJ Feb 7, 2026
1009628
Build context
JersyJ Feb 7, 2026
25233f1
Update name, and build context
JersyJ Feb 7, 2026
c323f44
Try to use context use default
JersyJ Feb 7, 2026
fe00a11
feat: Inject build contexts for local 'kelvin/' dependencies during i…
JersyJ Feb 7, 2026
a0d9156
Another try
JersyJ Feb 7, 2026
527eaaf
feat: explicitly tag Docker images with `:latest`
JersyJ Feb 7, 2026
a619225
build: Explicitly specify 'latest' tag for base image in Dockerfile.
JersyJ Feb 7, 2026
dfc016e
chore: Remove explicit docker context usage from workflow and strip `…
JersyJ Feb 7, 2026
76b3825
Try again
JersyJ Feb 7, 2026
dd0e1d8
Go back to the "docker" driver
JersyJ Feb 7, 2026
7537da6
Introduce dedicated Docker Compose services for evaluator scheduler, C…
JersyJ Feb 8, 2026
bb9fec4
Migrate to prek
JersyJ Feb 8, 2026
e21081f
Try the default directory
JersyJ Feb 8, 2026
f523239
Improve internal API communication, add debug-mode SSL handling
JersyJ Feb 8, 2026
8faa012
Documentation for env.example
JersyJ Feb 8, 2026
1fe0b1f
Update UV version in CI and Dockerfiles, improve installation documen…
JersyJ Feb 8, 2026
df15c62
Refactor Docker configuration: remove unused network aliases and upda…
JersyJ Feb 8, 2026
8424e37
Implement evaluator image building via an entrypoint script
JersyJ Feb 8, 2026
0e60ace
Healthcheck Docker Status mode for Deployment Service
JersyJ Feb 8, 2026
95a0109
Move evaluator Dockerfile into a multi-stage build, Evaluator build a…
JersyJ Feb 8, 2026
a8c126d
Fix Mypy issue
JersyJ Feb 8, 2026
254f02a
Simplify DooD socket access via socat and remove DOCKER_GROUP_ID
JersyJ Feb 8, 2026
68eea93
Add fail-fast and unhealthy state to health_check
JersyJ Feb 8, 2026
52f0e86
Add health_check_timeout to settings and deployment request model
JersyJ Feb 9, 2026
c63b6b7
Add health check timeout option to deployment and documentation
JersyJ Feb 9, 2026
0199ceb
Add documentation for evaluator images and update tests and pipeline …
JersyJ Feb 9, 2026
9ef32d5
Merge branch 'evaluator-deployment' into evaluator-refactor-images
JersyJ Feb 9, 2026
7424d46
Rename conclusion job to conclusion-images for clarity in workflow
JersyJ Feb 9, 2026
16f8252
Remove change detection and simplify image build process
JersyJ Feb 9, 2026
28 changes: 28 additions & 0 deletions .dockerignore
@@ -3,3 +3,31 @@ submits/
submit_results/
.venv/
node_modules/

# Python
__pycache__/
*.py[cod]
*.pyd
*.pyo
*.so
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
htmlcov/

# Node
**/dist/
**/.vite/

# VCS / tooling
.git/

# Local data (avoid baking into images)
kelvin_data/
**/*.log

# Editor
.vscode/
.idea/
.DS_Store
18 changes: 18 additions & 0 deletions .env.example
@@ -1,4 +1,5 @@
### Kelvin
# ------------------------------------------------------------------------------

# !!! IMPORTANT: For production deployments using the Deployment Service, all file paths must be absolute due to the use of DooD (Docker out of Docker)

@@ -12,6 +13,9 @@ KELVIN__TASKS_PATH=./tasks
KELVIN__SUBMITS_PATH=./submits
# Path where submit results will be stored
KELVIN__SUBMIT_RESULTS_PATH=./submit_results
# (Optional) Internal URL used by workers. Defaults to https://nginx when running locally with Docker;
# otherwise defaults to the request URL in production or non-Docker local environments.
# API_INTERNAL_BASEURL=https://custom-internal-url

### Postgres
DATABASE__HOST=127.0.0.1
@@ -40,7 +44,21 @@ OPENAI__API_KEY=your_openai_api_key_here
OPENAI__API_URL=http://localhost:8080/v1
OPENAI__MODEL=openai/gpt-oss-120b

### Evaluator Workers
# ------------------------------------------------------------------------------
# Number of worker processes
EVALUATOR_CPU_REPLICAS=32
EVALUATOR_CUDA_REPLICAS=32

# Redis Connection for Evaluators
# - If running LOCALLY (same machine as app): Leave these commented out or set to 'redis' and '6379'.
# - If running DISTRIBUTED (on a different machine): Set these to the IP/Host and Port of the main server's Redis.
# EVALUATOR_REDIS__HOST=redis
# EVALUATOR_REDIS__PORT=6379


### Deployment Service
# ------------------------------------------------------------------------------
# ID of the docker group on the host machine (get it via `getent group docker | cut -d: -f3`)
DOCKER_GROUP_ID=999
SECURITY__WEBHOOK_SECRET=yoursecretvalue
57 changes: 57 additions & 0 deletions .github/workflows/build-evaluator-images.yml
@@ -0,0 +1,57 @@
name: Evaluator Docker Images

on:
pull_request:
merge_group:
workflow_dispatch:
Collaborator:

The main issue that we have with the images is not that they break after we change them, but that they break when something external changes, most often apt repositories. So it would be great to run CI periodically to detect that sooner.

One way of doing that is running them always in CI, without file change detection. That has the annoying property that it can break CI for unrelated PRs. Another possibility is to set up a cron to run this e.g. once a week. I'd go with the cron for now (in addition to the existing triggers).
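The suggested weekly cron could be added alongside the existing triggers; the schedule below is illustrative, not part of this PR:

```yaml
on:
  pull_request:
  merge_group:
  workflow_dispatch:
  schedule:
    # Illustrative: build weekly (Mondays, 04:00 UTC) to catch external
    # breakage such as changed apt repositories, even with no file changes.
    - cron: "0 4 * * 1"
```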


concurrency:
group: ${{ github.workflow }}-${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'build' || github.sha }}
cancel-in-progress: ${{ github.event_name != 'merge_group' }}

jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v6

- name: Detect changes
id: changed-files
uses: tj-actions/changed-files@v47.0.1
with:
files: |
evaluator/images/**
.github/workflows/build-evaluator-images.yml

- name: Set up Docker Buildx
if: steps.changed-files.outputs.any_changed == 'true'
uses: docker/setup-buildx-action@v3
with:
driver: docker

- name: Build images
if: steps.changed-files.outputs.any_changed == 'true'
run: |
python3 evaluator/images/build.py

# Summary job to enable easier handling of required status checks.
# On PRs, we need everything to be green, while deploy jobs are skipped.
# On master, we need everything to be green.
# ALL THE PREVIOUS JOBS NEED TO BE ADDED TO THE `needs` SECTION OF THIS JOB!
conclusion:
Collaborator:

The conclusion job won't work like this; I think I wrote this to you before. We either have to merge the two workflows, or use a different name for the conclusion job here (e.g. conclusion-images), so that we can configure CI to wait for both jobs to be green.

needs: [ build ]
# We need to ensure this job does *not* get skipped if its dependencies fail,
# because a skipped job is considered a success by GitHub. So we have to
# overwrite `if:`. We use `!cancelled()` to ensure the job does still not get run
# when the workflow is canceled manually.
if: ${{ !cancelled() }}
runs-on: ubuntu-latest
steps:
- name: Conclusion Images
run: |
# Print the dependent jobs to see them in the CI log
jq -C <<< '${{ toJson(needs) }}'
# Check if all jobs that we depend on (in the needs array)
# were either successful or skipped.
jq --exit-status 'all(.result == "success" or .result == "skipped")' <<< '${{ toJson(needs) }}'
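The jq gate above can be mirrored in plain Python for local experimentation; the job names in the sample payload are illustrative, not taken from this workflow:

```python
import json


def jobs_green(needs_json: str) -> bool:
    """Return True when every dependent job either succeeded or was skipped,
    matching the jq expression used in the conclusion job."""
    needs = json.loads(needs_json)
    return all(job["result"] in ("success", "skipped") for job in needs.values())


# A skipped dependency still counts as green; a failure does not.
print(jobs_green('{"build": {"result": "success"}, "docs": {"result": "skipped"}}'))  # → True
print(jobs_green('{"build": {"result": "failure"}}'))  # → False
```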
19 changes: 8 additions & 11 deletions .github/workflows/ci.yml
@@ -11,7 +11,7 @@ concurrency:
env:
# Configure a constant location for the uv cache
UV_CACHE_DIR: /tmp/.uv-cache
UV_VERSION: "0.9.20"
UV_VERSION: "0.10.0"


jobs:
@@ -95,6 +95,9 @@ jobs:

test-deployment-service:
runs-on: ubuntu-latest
defaults:
run:
working-directory: deployment_service/

steps:
- name: Checkout sources
@@ -114,32 +117,26 @@
working-directory: "deployment_service"

- name: Install dependencies
working-directory: deployment_service/
run: |
uv sync --frozen

- name: Ruff Linter
working-directory: deployment_service/
run: uv run ruff check --output-format=github

- name: Ruff Formatter
if: success() || failure()
working-directory: deployment_service/
run: uv run ruff format --check

- name: Check lockfile
if: success() || failure()
working-directory: deployment_service/
run: uv lock --locked

- name: MyPy
if: success() || failure()
working-directory: deployment_service/
run: |
uv run mypy --check .

- name: Run tests
working-directory: deployment_service/
run: uv run pytest
env:
SECURITY__WEBHOOK_SECRET: "yoursecretvalue"
@@ -166,21 +163,21 @@
- name: Build Kelvin Docker image
uses: docker/build-push-action@v6
with:
cache-from: type=registry,ref=ghcr.io/mrlvsb/kelvin-ci-cache
cache-from: type=gha
Collaborator:

Why was the switch made?

Contributor (Author):

Due to maintenance of the registry storage (an LRU policy is applied there automatically) and also for the readability/visibility of that registry. Also, this is the official approach recommended by GitHub and Docker.

# Only write the cache in the master branch or workflow_dispatch builds
# https://github.com/docker/build-push-action/issues/845#issuecomment-1512619265
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=registry,ref=ghcr.io/mrlvsb/kelvin-ci-cache,compression=zstd' || '' }}
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=gha,mode=max' || '' }}
tags: ghcr.io/mrlvsb/kelvin:latest,ghcr.io/mrlvsb/kelvin:${{ github.sha }}
outputs: type=docker,dest=${{ runner.temp }}/kelvin.tar

- name: Build Deployment_service Docker image
uses: docker/build-push-action@v6
with:
context: "{{defaultContext}}:deployment_service"
cache-from: type=registry,ref=ghcr.io/mrlvsb/deployment-ci-cache
cache-from: type=gha
# Only write the cache in the master branch or workflow_dispatch builds
# https://github.com/docker/build-push-action/issues/845#issuecomment-1512619265
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=registry,ref=ghcr.io/mrlvsb/deployment-ci-cache,compression=zstd' || '' }}
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=gha,mode=max' || '' }}
tags: ghcr.io/mrlvsb/deployment:latest,ghcr.io/mrlvsb/deployment:${{ github.sha }}
outputs: type=docker,dest=${{ runner.temp }}/deployment.tar

9 changes: 3 additions & 6 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
rev: v6.0.0
hooks:
- id: check-yaml
args: [--allow-multiple-documents]
@@ -18,11 +18,8 @@ repos:
- id: mixed-line-ending
args: [ --fix=lf ]
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.7
rev: v0.15.0
hooks:
- id: ruff-format
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.7
hooks:
- id: ruff
- id: ruff-check
args: [ --fix ]
12 changes: 9 additions & 3 deletions Dockerfile
@@ -1,10 +1,13 @@
FROM ghcr.io/astral-sh/uv:python3.12-bookworm AS build-backend
FROM python:3.12-slim-bookworm AS build-backend

COPY --from=ghcr.io/astral-sh/uv:0.10.0 /uv /usr/local/bin/uv

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
build-essential \
libsasl2-dev \
libgraphviz-dev

@@ -26,14 +29,15 @@ RUN npm ci

RUN npm run build

FROM python:3.12-bookworm AS runtime
FROM python:3.12-slim-bookworm AS runtime

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
graphviz && \
graphviz \
libmagic1 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

@@ -43,6 +47,8 @@ WORKDIR /app
# We want to use ID 1000, to have the same ID as the default outside user
# And we also want group 101, to provide shared access to the Unix uWSGI
# socket with the nginx image.
RUN getent group 101 >/dev/null || groupadd -g 101 webserver

RUN useradd --uid 1000 --gid 101 --shell /bin/false --system webserver

RUN chown -R webserver .
2 changes: 1 addition & 1 deletion api/views/default.py
@@ -706,7 +706,7 @@ def set_subject(task):
else:
return JsonResponse(
{
"errors": [f'Invalid task type {data.get("type")}'],
"errors": [f"Invalid task type {data.get('type')}"],
},
status=400,
)
9 changes: 8 additions & 1 deletion common/evaluate.py
@@ -8,6 +8,7 @@
import django_rq
import requests
import yaml
from django.conf import settings
from django.core import signing
from django.urls import reverse
from django.utils import timezone
@@ -101,10 +102,16 @@ def get_meta(login):
def evaluate_job(submit_url, task_url, token, meta):
logging.basicConfig(level=logging.DEBUG)
s = requests.Session()
if settings.DEBUG:
s.verify = False

logging.info(f"Evaluating {submit_url}")

with tempfile.TemporaryDirectory() as workdir:
# Create kelvin subdirectory in system temp (cross-platform)
kelvin_temp = os.path.join(tempfile.gettempdir(), "kelvin")
os.makedirs(kelvin_temp, exist_ok=True)

with tempfile.TemporaryDirectory(dir=kelvin_temp) as workdir:
os.chdir(workdir)

def untar(url, dest):
2 changes: 1 addition & 1 deletion common/event_log.py
@@ -81,7 +81,7 @@ class Action(models.TextChoices):
created_at = models.DateTimeField(auto_now_add=True)

def __str__(self):
return f"{self.action} ({self.user.username}) at {self.created_at.strftime("%d. %m. %y %H:%M:%S")} from {self.ip_address}"
return f"{self.action} ({self.user.username}) at {self.created_at.strftime('%d. %m. %y %H:%M:%S')} from {self.ip_address}"

def deserialize(self) -> UserEventBase | None:
shared = dict(
8 changes: 8 additions & 0 deletions common/utils.py
@@ -8,6 +8,7 @@

import django.contrib.auth.models
import requests
from django.conf import settings
from django.http import HttpRequest
from ipware import get_client_ip

@@ -109,6 +110,13 @@ def download_source_to_path(source_url: str, destination_path: str) -> None:

def build_absolute_uri(request, location):
base_uri = os.getenv("API_INTERNAL_BASEURL", None)

# If the URL is the default Docker-internal one, only use it in DEBUG mode.
# This prevents Production from accidentally using the internal container hostname
# instead of the public domain, unless explicitly forced.
if base_uri == "https://nginx" and not settings.DEBUG:
base_uri = None

if base_uri:
return "".join([base_uri, location])
return request.build_absolute_uri(location)