server-cuda images: include sm_120 (RTX 50-series) — needs CUDA_VERSION 12.8+ in the CI build #85

Description

@noonghunna

Follow-up to the Docker CI from #48 — club-3090 pins your official server-cuda-* images for its beellama presets, and RTX 50-series reports are starting to arrive.

Problem: the CI builds with the Dockerfile default ARG CUDA_VERSION=12.4.1, and the CUDA 12.4 toolchain cannot target sm_120 at all (Blackwell needs nvcc from 12.8+). So the published images carry no sm_120 cubin and no compute_120 PTX — a 50-series card has nothing to run or JIT, and users hit kernel-image errors. Verified on the current preview digests (858e7cf… and f229f79…): both cap at compute_90.
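A check like the one described above can be approximated with `cuobjdump` from the CUDA toolkit. This is a rough sketch, not the exact procedure used for the digests: the helper name `list_arches` is made up for illustration, the library path is a placeholder, and it assumes `cuobjdump` is available wherever the library is inspected.

```shell
#!/bin/sh
# Hypothetical helper: reduce cuobjdump listing output (read from stdin)
# to a sorted, de-duplicated list of architecture tokens such as sm_86.
list_arches() {
    grep -oE '(sm|compute)_[0-9]+[a-z]?' | sort -u
}

# Example usage (LIB is a placeholder, not a path taken from the images):
#   LIB=/path/to/cuda-library.so
#   cuobjdump --list-elf "$LIB" | list_arches   # architectures of embedded cubins
#   cuobjdump --list-ptx "$LIB" | list_arches   # architectures of embedded PTX
```

If neither list contains a 120 entry, the build was not compiled for sm_120.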

Ask: in the GHCR workflow, pass --build-arg CUDA_VERSION=12.8.1 and make the arch list explicit with 120 included, e.g. --build-arg CUDA_DOCKER_ARCH="80;86;89;90;120" (or your current default set plus 120). If the higher minimum driver version that comes with the CUDA 12.8 runtime is a concern for older rigs, a parallel server-cuda128-* tag variant would work just as well, since 50-series rigs run R570+ drivers by definition.
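For local testing, the two build args could be passed roughly like this. It is a sketch under assumptions: the image tag and the `.` build context are placeholders, the `build_image` helper is made up for illustration, and the real workflow may select a Dockerfile or target that is not shown here. By default it only prints the command.

```shell
#!/bin/sh
# Values from the ask above; override through the environment if needed.
CUDA_VERSION="${CUDA_VERSION:-12.8.1}"
CUDA_DOCKER_ARCH="${CUDA_DOCKER_ARCH:-80;86;89;90;120}"
IMAGE_TAG="${IMAGE_TAG:-example/server-cuda:sm120-test}"   # hypothetical tag

# Hypothetical helper: assemble the docker build command.
# Prints it by default; set RUN=1 to actually execute it.
build_image() {
    set -- docker build \
        --build-arg "CUDA_VERSION=${CUDA_VERSION}" \
        --build-arg "CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}" \
        -t "${IMAGE_TAG}" .
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        # Printed for inspection only: quoting is dropped, so the
        # semicolons would need quotes again if pasted into a shell.
        printf '%s\n' "$*"
    fi
}

build_image
```

Keeping the arch list in a quoted variable avoids the shell treating the semicolons as command separators.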

Why it matters on our side: we currently have to redirect 50-series users to a stale self-built v0.3.0 snapshot image, which cuts them off from your current feature set (KVarN, the v0.3.1 MTP/KV-quant fixes). With sm_120 in your images (ideally in both preview and stable tags, so that server-cuda-v0.3.1+ inherit it), we can retire that snapshot and pin upstream for every arch, including in our planned v0.3.1 stable re-pin.

Happy to validate a test build same-day on 3090s (sm_86), and we have 50-series users among club-3090 reporters who can confirm Blackwell quickly.
