diff --git a/.github/workflows/create-challenge.yml b/.github/workflows/create-challenge.yml
index add4231c..dd08c5cd 100644
--- a/.github/workflows/create-challenge.yml
+++ b/.github/workflows/create-challenge.yml
@@ -73,6 +73,15 @@ jobs:
 
             Pick the next available challenge number that is NOT already taken by a merged challenge or an open PR. Also avoid creating a challenge on the same topic as any pending PR, even if the number differs.
 
+            THEME — REAL-WORLD INFERENCE KERNELS:
+            Focus on challenges inspired by real-world ML inference workloads. Think about the building blocks of modern models (transformers, diffusion models, LLMs, vision models) and the GPU kernels that make them fast. Good examples:
+            - Fused inference kernels: fused SwiGLU/GeGLU MLP blocks, flash attention, paged attention, speculative decoding verification, quantized matmul (INT8/INT4), fused QKV projection, KV-cache updates
+            - Sequence/token operations: top-k/top-p sampling, beam search step, KV-cache rotation, causal masking
+            - Model architecture blocks: full transformer decoder block (like the existing GPT-2 challenge), mixture-of-experts routing, LoRA forward pass
+            - Online/streaming algorithms: online softmax, streaming attention (process new queries without materializing full rows of the attention matrix), continuous batching, prefix caching
+
+            Look at `challenges/medium/74_gpt2_block/` as the gold standard for this style of challenge. The solver should implement a meaningful, self-contained inference building block — not a toy operation.
+
             HARD RULES:
             - Do NOT create trivial element-wise challenges. We have way too many (sigmoid, relu, silu, clipping, etc). If your idea is just "apply f(x) to every element", pick something else.
             - Do NOT duplicate existing challenges — check both the merged challenges in the repo AND the open PRs listed above.