Follow-up to the Docker CI from #48 — club-3090 pins your official server-cuda-* images for its beellama presets, and RTX 50-series reports are starting to arrive.
Problem: the CI builds with the Dockerfile default ARG CUDA_VERSION=12.4.1, and the CUDA 12.4 toolchain cannot target sm_120 at all (Blackwell needs nvcc from 12.8+). So the published images carry no sm_120 cubin and no compute_120 PTX — a 50-series card has nothing to run or JIT, and users hit kernel-image errors. Verified on the current preview digests (858e7cf… and f229f79…): both cap at compute_90.
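How the digests were inspected is not stated here. One possible way to reproduce such a check, assuming `cuobjdump` from the CUDA toolkit is installed and the CUDA backend library has been copied out of the image, is to summarize its listings. The helper name `list_archs` and the library path are hypothetical, and the filter assumes `cuobjdump` prints entry names ending in `sm_NN.cubin` or `sm_NN.ptx`.

```shell
# Summarize which GPU targets a CUDA binary embeds.
# Reads cuobjdump listing text on stdin and prints each unique
# "sm_NN.cubin" (native code) or "sm_NN.ptx" (JIT-able PTX) entry once.
list_archs() {
  grep -oE 'sm_[0-9]+[a-z]?\.(cubin|ptx)' | sort -u
}

# Hypothetical usage (the path is a placeholder, not the real image layout):
#   LIB=/path/to/cuda-backend.so
#   { cuobjdump --list-elf "$LIB"; cuobjdump --list-ptx "$LIB"; } | list_archs
```

If no line mentions `sm_120`, the binary carries neither native code nor PTX built for Blackwell.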
Ask: in the GHCR workflow, pass --build-arg CUDA_VERSION=12.8.1 and make the arch list explicit with 120 included — e.g. --build-arg CUDA_DOCKER_ARCH="80;86;89;90;120" (or your current default set + 120). If the runtime bump's driver floor is a concern for older rigs, a parallel server-cuda128-* tag variant would work just as well — 50-series rigs have R570+ drivers by definition.
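For a local test build, the two build args could be passed roughly as follows. This is a sketch only: the image tag and the build context (`.`) are hypothetical, and it assumes the Dockerfile consumes `CUDA_VERSION` and `CUDA_DOCKER_ARCH` as described above.

```shell
# Values taken from the request above; the tag is a hypothetical local name.
CUDA_VERSION="12.8.1"
CUDA_DOCKER_ARCH="80;86;89;90;120"
IMAGE_TAG="local/server-cuda:sm120-test"

# Stop early if the arch list does not contain 120 (Blackwell).
case ";${CUDA_DOCKER_ARCH};" in
  *";120;"*) ;;
  *) echo "CUDA_DOCKER_ARCH lacks 120" >&2; exit 1 ;;
esac

# The arch list is quoted because ';' would otherwise end the shell command.
# The command is only printed here; run it with: eval "$BUILD_CMD"
BUILD_CMD="docker build --build-arg CUDA_VERSION=${CUDA_VERSION} --build-arg CUDA_DOCKER_ARCH='${CUDA_DOCKER_ARCH}' -t ${IMAGE_TAG} ."
echo "$BUILD_CMD"
```

The same two `--build-arg` values would carry over to whatever build step the GHCR workflow uses.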
Why it matters on our side: we currently have to redirect 50-series users to a stale self-built v0.3.0 snapshot image, which cuts them off from your current feature set (KVarN, the v0.3.1 MTP/KV-quant fixes). With sm_120 in your images, ideally in both preview and stable tags so that server-cuda-v0.3.1 and later inherit it, we can retire that snapshot and pin upstream for every arch, including our planned v0.3.1 stable re-pin.
Happy to validate a test build same-day on 3090s (sm_86), and we have 50-series users among club-3090 reporters who can confirm Blackwell quickly.