
feat: kernel hub introduction draft #2777

Draft · wants to merge 3 commits into `main`

Conversation

@drbh drbh commented Mar 28, 2025

This PR is an early draft for an introduction to the kernel hub

TODO

  • review post
  • edit/improve/expand topics?
  • add a disclaimer about pre-stable version changes
  • separately draft a post about the kernel-builder to showcase kernel creation/publishing to the Hub

@pcuenca pcuenca (Member) left a comment

Nice, looking great! I did a quick early pass, feel free to ping again when you want!

Member:

Nice! But it's too wide, I think; it will be cropped at the sides, possibly hiding part of the title. The recommended aspect ratio is 2:1.

Author:

thanks! updated to be 2:1 in the latest commits

Member:

Reminder that we also have to add an entry to _blog.yml when you are ready to submit.

Author:

oh thanks for the tip, added an entry in the latest commit (and will make sure to bump when the article is ready)

thumbnail: /blog/assets/hello-hf-kernels/kernel-hub-five-mins-short.png
authors:
- user: drbh
date: 2025-03-28
Member:

Date goes in _blog.yml using a format like "March 28, 2025"

Author:

thanks! updated in the latest commits


# 🏎️ Learn the Hugging Face Kernel Hub in 5 Minutes

**Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**
Member:

Suggested change
**Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**
**Boost your model performance with pre-optimized kernels, easily loaded from the Hub.**

Maybe, for simplification?

Author:

thanks! updated in the latest commits


**Unlock performance boosts for your models with pre-optimized compute kernels, easily loaded from the Hub.**

Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub aims to simplify this dramatically.
Member:

Suggested change
Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub aims to simplify this dramatically.
Today, we'll explore an exciting development from Hugging Face: the **Kernel Hub**! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically!

Author:

oh this is better, updated in latest commit

Comment on lines 83 to 92
expected = torch.tensor(
    [
        [0.1100, 2.1309, -0.0700, 0.6802],
        [-0.0500, 0.4800, -0.1700, -0.1700],
        [0.3701, -0.1300, -0.0800, -0.1200],
        [-0.0400, 0.1200, -0.1500, 1.7998],
    ],
    dtype=torch.float16,
    device=DEVICE,
)
Member:

Perhaps an alternative could be to retrieve the reference results from PyTorch's gelu?

Author:

yea agreed that is a better example, updated in latest commit
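For reference, the exact GELU that `torch.nn.functional.gelu` computes is `x * Φ(x)`, with `Φ` the standard normal CDF. A stdlib-only sketch (illustrative, not from the post) showing how such a reference value can be derived and compared against the tanh approximation some fast kernels use:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation commonly used by fast GELU kernels.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

# The two agree to roughly 1e-3 over typical activation ranges.
for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(v) - gelu_tanh(v)) < 1e-3
```

Comparing a kernel's output against a formula like this (or directly against PyTorch's own `gelu`) avoids hardcoding expected tensors.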


## 2. How to Use the Kernel Hub (Basic Example)

Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example loading an optimized GELU activation function kernel (we'll use a different kernel for the main example later).
Member:

Suggested change
Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example loading an optimized GELU activation function kernel (we'll use a different kernel for the main example later).
Using the Kernel Hub is designed to be straightforward. The `kernels` library provides the main interface. Here's a quick example that loads an optimized GELU activation function kernel. (Later on, we'll see another example about how to integrate a kernel in our model).

Author:

thanks this reads better, updated in latest
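The quick GELU-loading example that section describes might look like the sketch below. The repo id comes from the post; the `gelu_fast` entry point is an assumption that may change pre-stable, so a plain PyTorch fallback is included and the snippet runs even without a GPU or the `kernels` package:

```python
import torch

x = torch.randn(16, 64)
try:
    from kernels import get_kernel  # pip install kernels

    if not torch.cuda.is_available():
        raise RuntimeError("Hub kernel path needs a GPU")
    # Fetch the kernel code from the Hub (cached after the first call).
    activation = get_kernel("kernels-community/activation")
    x = x.to("cuda", dtype=torch.float16)
    out = torch.empty_like(x)
    activation.gelu_fast(out, x)  # entry-point name assumed from the kernel repo
except Exception:
    out = torch.nn.functional.gelu(x)  # reference fallback

print(out.shape)
```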


**Important Notes on the `KernelModel`:**
* **Kernel Inheritance:** The `KernelRMSNorm` class inherits from `layer_norm_kernel_module.layers.LlamaRMSNorm`, which is the RMSNorm implementation in the kernel. This allows us to use the optimized kernel directly.
* **Accessing the Function:** The exact way to access the RMSNorm function (`layer_norm_kernel_module.layers.LlamaRMSNorm.forward`, `layer_norm_kernel_module.rms_norm_forward`, or something else) **depends entirely on how the kernel creator structured the repository on the Hub.** You may need to inspect the loaded `layer_norm_kernel_module` object (e.g., using `dir()`) or check the kernel's documentation on the Hub to find the correct function/method and its signature. I've used `rms_norm_forward` as a plausible placeholder and added error handling.
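Since the object `get_kernel()` returns is an ordinary Python module, the `dir()` inspection mentioned above is plain introspection. An illustrative sketch, using Python's own `math` module as a stand-in for a loaded kernel module:

```python
import math

def public_api(mod):
    """List the non-underscore names a module exposes."""
    return sorted(n for n in dir(mod) if not n.startswith("_"))

# Stand-in for: public_api(get_kernel("kernels-community/triton-layer-norm"))
names = public_api(math)
print(names[:5])
```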
Member:

Would be nice if we can point to some kernel documentation (in the kernel's model card in the Hub) by the time this is published :) This could encourage others to adopt some common structure for kernel description / docs.

Author:

agreed! Currently there is an effort to generate some useful docs, started in huggingface/kernel-builder#89; however, this is still a work in progress and should be updated before publishing

TODO

  • improve docs across all existing examples (probably autogen)

Comment on lines 322 to 323
from snippet2 import BaselineModel
from snippet3 import KernelModel
Member:

We should introduce the script name before each snippet, I think.

Author:

good point, updated to have meaningful names and use them in the scripts in latest


# Download optimized activation kernels from the Hub
# This fetches the kernel code if not already cached
activation_kernels = get_kernel("kernels-community/activation")
Member:

Super cool! Would something like this (different kernel) be automatically resolved? Do we want to talk (in a later section) about what happens if there's no match?


### Benefits of the Kernel Hub:

* **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware (like NVIDIA GPUs) without local compilation hassles.
Contributor:

Suggested change
* **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware (like NVIDIA GPUs) without local compilation hassles.
* **Instant Access to Optimized Kernels**: Load and run kernels optimized for various hardware starting with NVIDIA and AMD GPUs, without local compilation hassles.

Author:

thanks! updated in the latest commits

~~~bash
pip install kernels torch numpy
~~~
Ensure you have a compatible PyTorch version and CUDA installed if using GPU kernels.
Contributor:

Can we make this hardware agnostic for AMD?

Author:

good catch, i've updated the phrasing to avoid "CUDA" in the latest commit


## 1. What is the Kernel Hub?

The [Kernel Hub](https://huggingface.co/kernels) (👈 Check it out!) allows Python libraries and applications to **load optimized compute kernels directly from the Hugging Face Hub**. Think of it like the Model Hub, but for low-level, high-performance code snippets (kernels) that accelerate specific operations, often on GPUs. Examples include optimized attention mechanisms (like FlashAttention), activation functions, and normalization layers (like LayerNorm or RMSNorm).
Member:

I think it would be better to mention some challenging kernels here. I think activation and normalization kernels are usually pretty good in frameworks. Maybe, attention mechanisms, quantizers, and Mixture of Expert layers?

Author:

good point, updated to include some more impactful/useful examples. thanks!

Comment on lines 59 to 61
# Ensure you have a CUDA-enabled device
if not torch.cuda.is_available():
    raise RuntimeError("This example requires a CUDA-enabled GPU")
Member:

Let me upload the activation kernel for ROCm as well. I think the example is stronger if we can show something that works with both CUDA and ROCm.

Member:

All tests pass.

Author:

wooo amazing, thank you!

Comment on lines 132 to 133
if not torch.cuda.is_available():
    raise RuntimeError("This example requires a CUDA-enabled GPU")
Member:

I think the Triton kernel should also work with ROCm? Worth trying.

Author:

awesome, thanks for building/testing! removed the `torch.cuda` check in the latest commit

Comment on lines 220 to 227
layer_norm_kernel_module = get_kernel("kernels-community/triton-layer-norm")


class KernelRMSNorm(layer_norm_kernel_module.layers.LlamaRMSNorm):
    def __init__(self, hidden_size, variance_epsilon=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = variance_epsilon
@danieldk danieldk (Member) commented Apr 2, 2025

We want people to use @use_kernel_forward_from_hub to annotate the Torch class and then register LlamaRMSNorm using a mapping. See: https://github.com/huggingface/kernels/blob/main/docs/layers.md

Using @use_kernel_forward_from_hub enables people to make layers that are (dynamically) extensible with kernels, people can replace kernels, etc.
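Conceptually, the decorator registers a layer class under a name so its `forward` can later be swapped for a Hub kernel via an external mapping, with no changes to the model itself. A toy, pure-Python sketch of that dispatch pattern (not the real `kernels` API; see docs/layers.md for the actual one, and all names below are illustrative):

```python
# Toy sketch of name-based forward dispatch behind a decorator like
# @use_kernel_forward_from_hub. A registry maps layer names to replacement
# forwards; decorated classes consult the mapping at call time.
_KERNEL_MAPPING = {}

def use_kernel_forward_from_hub_sketch(name):
    def decorator(cls):
        reference_forward = cls.forward

        def forward(self, *args, **kwargs):
            # Use a mapped "kernel" forward when one is registered,
            # otherwise fall back to the reference implementation.
            kernel_forward = _KERNEL_MAPPING.get(name)
            if kernel_forward is not None:
                return kernel_forward(self, *args, **kwargs)
            return reference_forward(self, *args, **kwargs)

        cls.forward = forward
        return cls
    return decorator

@use_kernel_forward_from_hub_sketch("LlamaRMSNorm")
class RMSNorm:
    def forward(self, x):
        return ("reference", x)

layer = RMSNorm()
before = layer.forward(1.0)  # no mapping registered: reference path
_KERNEL_MAPPING["LlamaRMSNorm"] = lambda self, x: ("kernel", x)
after = layer.forward(1.0)   # mapping registered: "kernel" path
```

The point of the pattern is the one danieldk makes: the model class stays untouched, and kernels are mapped in externally and can be replaced at any time.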

Author:

ah yea great point! I've updated the code to prefer adding @use_kernel_forward_from_hub("LlamaRMSNorm") to the RMSNorm defined in the reference example (and added some descriptive comments).

):
    super().__init__()
    self.linear1 = nn.Linear(input_size, hidden_size)
    self.norm = KernelRMSNorm(hidden_size, variance_epsilon=eps)
Member:

With @use_kernel_forward_from_hub, you don't need this. The model doesn't need any change to use kernels, the model writer or the user can map kernels externally.

Author:

this has been updated in the latest commit along with the larger change to prefer using the use_kernel_forward_from_hub decorator in the example. thanks!

4 participants