Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,34 @@
# Kelvin
tasks/
submits/
submit_results/
kelvin_data/

# Python
.venv/
__pycache__/
*.py[cod]
*.pyd
*.pyo
*.so
.pytest_cache/
.mypy_cache/
.ruff_cache/
.coverage
htmlcov/

# Node
node_modules/
**/dist/
**/.vite/

# VCS / tooling
.git/

# Logs
**/*.log

# Editor
.vscode/
.idea/
.DS_Store
20 changes: 18 additions & 2 deletions .env.example
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
### Kelvin
# ------------------------------------------------------------------------------

# !!! IMPORTANT: For Production deployments using Deployment Service, all file paths must be specified as absolute due to use of DooD (Docker out of Docker)

Expand All @@ -12,6 +13,9 @@ KELVIN__TASKS_PATH=./tasks
KELVIN__SUBMITS_PATH=./submits
# Path where submit results will be stored
KELVIN__SUBMIT_RESULTS_PATH=./submit_results
# (Optional) Internal URL used by workers. Defaults to https://nginx when running locally with Docker;
# otherwise defaults to the request URL in production or non-Docker local environments.
# API_INTERNAL_BASEURL=https://custom-internal-url

### Postgres
DATABASE__HOST=127.0.0.1
Expand Down Expand Up @@ -40,9 +44,21 @@ OPENAI__API_KEY=your_openai_api_key_here
OPENAI__API_URL=http://localhost:8080/v1
OPENAI__MODEL=openai/gpt-oss-120b

### Evaluator Workers
# ------------------------------------------------------------------------------
# Number of worker processes
EVALUATOR_CPU_REPLICAS=32
EVALUATOR_CUDA_REPLICAS=32

# Redis Connection for Evaluators
# - If running LOCALLY (same machine as app): Leave these commented out or set to 'redis' and '6379'.
# - If running DISTRIBUTED (on a different machine): Set these to the IP/Host and Port of the main server's Redis.
# EVALUATOR_REDIS__HOST=redis
# EVALUATOR_REDIS__PORT=6379


### Deployment Service
# ID of the docker group on the host machine (get it via `getent group docker | cut -d: -f3`)
DOCKER_GROUP_ID=999
# ------------------------------------------------------------------------------
SECURITY__WEBHOOK_SECRET=yoursecretvalue
SECURITY__ALLOWED_HOSTS=["localhost", "127.0.0.1", "nginx", "kelvin.cs.vsb.cz"]

Expand Down
50 changes: 46 additions & 4 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -175,14 +175,25 @@ jobs:
uses: docker/build-push-action@v6
with:
context: "{{defaultContext}}:deployment_service"
cache-from: type=registry,ref=ghcr.io/mrlvsb/deployment-ci-cache
# Only write the cache in the master branch or workflow_dispatch builds
cache-from: type=gha
# Only write the cache in the merge_group or workflow_dispatch builds
# https://github.com/docker/build-push-action/issues/845#issuecomment-1512619265
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=registry,ref=ghcr.io/mrlvsb/deployment-ci-cache,compression=zstd' || '' }}
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=gha,mode=max' || '' }}
tags: ghcr.io/mrlvsb/deployment:latest,ghcr.io/mrlvsb/deployment:${{ github.sha }}
outputs: type=docker,dest=${{ runner.temp }}/deployment.tar

- name: Share Kelvin built image
- name: Build Kelvin-Evaluator Docker image
uses: docker/build-push-action@v6
with:
target: evaluator
cache-from: type=gha
# Only write the cache in the merge_group or workflow_dispatch builds
# https://github.com/docker/build-push-action/issues/845#issuecomment-1512619265
cache-to: ${{ (github.event_name == 'merge_group' || github.event_name == 'workflow_dispatch') && 'type=gha,mode=max' || '' }}
tags: ghcr.io/mrlvsb/kelvin-evaluator:latest,ghcr.io/mrlvsb/kelvin-evaluator:${{ github.sha }}
outputs: type=docker,dest=${{ runner.temp }}/kelvin-evaluator.tar

- name: Share built image
uses: actions/upload-artifact@v6
with:
name: kelvin
Expand All @@ -196,6 +207,13 @@ jobs:
path: ${{ runner.temp }}/deployment.tar
retention-days: 1

- name: Share Kelvin-Evaluator image
uses: actions/upload-artifact@v6
with:
name: kelvin-evaluator
path: ${{ runner.temp }}/kelvin-evaluator.tar
retention-days: 1

build-docs:
runs-on: ubuntu-latest
steps:
Expand Down Expand Up @@ -266,6 +284,12 @@ jobs:
name: deployment
path: ${{ runner.temp }}

- name: Download Kelvin-Evaluator image
uses: actions/download-artifact@v6
with:
name: kelvin-evaluator
path: ${{ runner.temp }}

- name: Load image
id: load_image
run: |
Expand All @@ -276,6 +300,12 @@ jobs:
echo "$LOADED"
SHA_TAG=$(echo "$LOADED" | grep -v ':latest' | awk '{print $3}')
echo "app_image_tag=$SHA_TAG" >> $GITHUB_OUTPUT

LOADED_EVAL=$(docker load --input ${{ runner.temp }}/kelvin-evaluator.tar)
echo "$LOADED_EVAL"
SHA_TAG_EVAL=$(echo "$LOADED_EVAL" | grep -v ':latest' | awk '{print $3}')
echo "evaluator_image_tag=$SHA_TAG_EVAL" >> $GITHUB_OUTPUT

if [ "${{ steps.changed-files-deployment.outputs.any_changed }}" = "true" ]; then
docker load --input ${{ runner.temp }}/deployment.tar
fi
Expand All @@ -291,6 +321,9 @@ jobs:
- name: Push Docker image with SHA tag
run: docker push ${{ steps.load_image.outputs.app_image_tag }}

- name: Push Kelvin-Evaluator Docker image with SHA tag
run: docker push ${{ steps.load_image.outputs.evaluator_image_tag }}

- name: Trigger on-prem deployment
run: |
python3 deployment_service/deploy.py \
Expand All @@ -306,6 +339,9 @@ jobs:
- name: Push Kelvin Docker image with latest tag
run: docker push ghcr.io/mrlvsb/kelvin:latest

- name: Push Kelvin Evaluator Docker image with latest tag
run: docker push ghcr.io/mrlvsb/kelvin-evaluator:latest

- name: Push Deployment_service Docker image with all tags
if: steps.changed-files-deployment.outputs.any_changed == 'true'
run: docker push --all-tags ghcr.io/mrlvsb/deployment
Expand All @@ -323,6 +359,12 @@ jobs:
package-type: 'container'
min-versions-to-keep: 15

- uses: actions/delete-package-versions@v5
with:
package-name: 'kelvin-evaluator'
package-type: 'container'
min-versions-to-keep: 15

deploy-docs:
runs-on: ubuntu-latest
needs: [ build-docs ]
Expand Down
55 changes: 52 additions & 3 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,10 +1,13 @@
FROM ghcr.io/astral-sh/uv:python3.12-bookworm AS build-backend
FROM python:3.12-slim-bookworm AS build-backend

COPY --from=ghcr.io/astral-sh/uv:0.10.0 /uv /usr/local/bin/uv

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
build-essential \
libsasl2-dev \
libgraphviz-dev

Expand All @@ -26,14 +29,15 @@ RUN npm ci

RUN npm run build

FROM python:3.12-bookworm AS runtime
FROM python:3.12-slim-bookworm AS runtime

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
graphviz && \
graphviz \
libmagic1 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Expand All @@ -43,6 +47,8 @@ WORKDIR /app
# We want to use ID 1000, to have the same ID as the default outside user
# And we also want group 101, to provide share access to the Unix uWSGI
# socket with the nginx image.
RUN getent group 101 >/dev/null || groupadd -g 101 webserver

RUN useradd --uid 1000 --gid 101 --shell /bin/false --system webserver

RUN chown -R webserver .
Expand Down Expand Up @@ -72,3 +78,46 @@ COPY --chown=webserver deploy/entrypoint.sh ./
STOPSIGNAL SIGINT

ENTRYPOINT ["/app/entrypoint.sh"]

FROM runtime AS evaluator

# Switch temporary to root user to install Docker CLI and other system dependencies
USER root

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
ca-certificates \
curl \
procps && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN mkdir -p /etc/apt/keyrings && \
curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
RUN chmod a+r /etc/apt/keyrings/docker.asc

RUN echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
tee /etc/apt/sources.list.d/docker.list > /dev/null

RUN export DEBIAN_FRONTEND=noninteractive && \
apt-get update && \
apt-get install -y \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
docker-ce docker-ce-cli containerd.io docker-compose-plugin && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

USER webserver

COPY --chown=webserver evaluator/evaluator-entrypoint.sh /app/evaluator-entrypoint.sh

ENTRYPOINT ["/app/evaluator-entrypoint.sh"]
CMD ["rqworker", "default", "evaluator", "--with-scheduler"]
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD pgrep -f "rqworker" || exit 1
13 changes: 13 additions & 0 deletions common/evaluate.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
import django_rq
import requests
import yaml
from django.conf import settings
from django.core import signing
from django.urls import reverse
from django.utils import timezone
Expand Down Expand Up @@ -102,6 +103,18 @@ def evaluate_job(submit_url, task_url, token, meta):
logging.basicConfig(level=logging.DEBUG)
s = requests.Session()

# Disable SSL verification in DEBUG mode (local Docker development environment).
#
# EXPLANATION:
# In the local Docker development environment (DEBUG=True), the services communicate
# via internal Docker network names (e.g. 'https://nginx').
# The Nginx service uses self-signed certificates for HTTPS.
# Since these certificates are not issued by a trusted Certificate Authority (CA),
# requests would fail with an SSL error. Disabling verification allows
# the evaluator to download submissions and upload results in this dev environment.
if settings.DEBUG:
s.verify = False

logging.info(f"Evaluating {submit_url}")

with tempfile.TemporaryDirectory() as workdir:
Expand Down
21 changes: 21 additions & 0 deletions common/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

import django.contrib.auth.models
import requests
from django.conf import settings
from django.http import HttpRequest
from ipware import get_client_ip

Expand Down Expand Up @@ -109,6 +110,26 @@ def download_source_to_path(source_url: str, destination_path: str) -> None:

def build_absolute_uri(request, location):
base_uri = os.getenv("API_INTERNAL_BASEURL", None)

# If the URL is the default Docker-internal one ('https://nginx'), only use it in DEBUG mode which means it is local development.
#
# EXPLANATION:
# 1. In Docker, 'localhost' inside a container refers to the container itself, not the host machine.
# Therefore, to reach the Nginx container from the App container, we must use the service name 'nginx',
# which Docker's internal DNS resolves to the Nginx container's IP address.
# 2. However, this internal URL (https://nginx/...) is only accessible within the Docker network.
# It MUST NOT be used in Production for generating links sent to users (e.g. emails) or redirects,
# as users cannot resolve 'nginx' or access the private Docker network.
#
# If we are in Production (DEBUG=False) and the env var is still set to the default 'https://nginx',
# we explicitly unset it (set to None).
#
# This ensures that we do not mistakenly use the internal Docker URL 'https://nginx' in production.
# Instead, the code will fall back to using 'request.build_absolute_uri(location)', which constructs
# the URL using the public hostname from the incoming HTTP request.
if base_uri == "https://nginx" and not settings.DEBUG:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more explanation somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think that I understand the issue now. There are a few things that are not ideal here:

  1. We require HTTPS in nginx in local deployment (but this is more related to the s.verify = False thing, not this, though it is also related to this
  2. We set the default value of the internal API to nginx; it is not clear if that is indeed a good default, especially given that we don't really expect to run the evaluators in the same Docker network as the web in production.
  3. This function looks like it should be used for any general URL generation, but in reality it is only used for evaluators.

So I would suggest this:

  1. Put API_INTERNAL_BASEURL=https://nginx to .env.example, so that it is used by default in local Docker deployment, without it being specified in docker-compose.yml
  2. Remove the :-https://nginx default in docker-compose.yml; make the variable required
  3. Move the variable loading to settings.py, an environment variable shouldn't be accessed randomly in one of Kelvin's functions, but rather be centralized in the website configuration
  4. Rename the variable e.g. to EVALUATION_LINK_BASEURL
  5. Only use the variable when generating an URL for evaluators (including LLM evaluation), not for "normal" public URL links, such as for e-mails. I think that this actually already happens, as the non-evaluator part of Kelvin uses request.build_absolute_uri directly. So it is enough to rename build_absolute_uri to e.g. build_evaluation_download_uri, or something like that, to make it clear for what it should (and shouldn't!) be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed mostly based on your suggestions, please review it, if it is acceptable

base_uri = None

if base_uri:
return "".join([base_uri, location])
return request.build_absolute_uri(location)
Loading
Loading