@@ -2,6 +2,7 @@
title: "Building a RAG-based AI Recommender (2/2)"
author: Shiyan Xu
category: blog
subCategory: indexing
image: /assets/images/blog/2025-08-29-building-a-rag-based-ai-recommender-2.jpg
tags:
- blog
17 changes: 9 additions & 8 deletions website/blog/2025-09-17-hudi-auto-gen-keys.mdx
@@ -3,6 +3,7 @@ title: "Automatic Record Key Generation in Apache Hudi"
excerpt: ""
author: Shiyan Xu
category: blog
subCategory: security
image: /assets/images/blog/2025-09-17-hudi-auto-gen-keys/2025-09-17-hudi-auto-gen-keys.fig2.jpg
tags:
- hudi
@@ -19,8 +20,8 @@ By using a primary key that is stable across record movement, a system can effic

Apache Hudi was the first lakehouse storage project to introduce the notion of record keys. For mutable workloads, this addressed a significant architectural challenge. In a typical data lake table, updating records usually required rewriting entire partitions—a process that is slow and expensive. By supporting the record key as the stable identifier for every record, Hudi offered unique and advanced capabilities among lakehouse frameworks:

* Hudi supports [record-level indexing](https://hudi.apache.org/blog/2023/11/01/record-level-index/) for directly locating records in [file groups](https://hudi.apache.org/docs/storage_layouts) for highly efficient upserts and queries, and [secondary indexes](https://hudi.apache.org/blog/2025/04/02/secondary-index/) that enable performant lookups for predicates on non-record key fields.
* Hudi implements [merge modes](https://hudi.apache.org/blog/2025/03/03/record-mergers-in-hudi/), standardizing record-merging semantics to handle requirements such as unordered events, duplicate records, and custom merge logic.
* By materializing record keys along with other [record-level meta-fields](https://www.onehouse.ai/blog/hudi-metafields-demystified), Hudi unlocks features such as efficient [change data capture (CDC)](https://hudi.apache.org/blog/2024/07/30/data-lake-cdc/) that serves record-level change streams, near-infinite history for time-travel queries, and the [clustering table service](https://hudi.apache.org/docs/clustering) that can significantly optimize file sizes.

<figure>
@@ -61,10 +62,10 @@ In this example, you’re creating a Copy-on-Write table partitioned by `city`.

Designing a key generation mechanism that operates efficiently at petabyte scale requires careful thought. We established five core requirements for the auto-generated keys:

1. **Global Uniqueness:** Keys must be unique across the entire table to maintain the integrity of a primary key.
2. **Low Storage Footprint:** The keys should be highly compressible to add minimal storage overhead.
3. **Computational Efficiency:** The encoding and decoding process must be lightweight so as not to slow down the write process.
4. **Idempotency:** The generation process must be resilient to task retries, producing the same key for the same record every time.
5. **Engine Agnostic:** The logic must be reusable and implemented consistently across different execution engines like Spark and Flink.

These principles guided the technical design. To align with primary key semantics, global uniqueness was non-negotiable. To minimize storage footprint, the generated keys needed to be compact and highly compressible, especially for tables with billions of records. The computational cost was also critical; any expensive operation would be amplified by the number of records, creating a significant performance overhead. Furthermore, in distributed systems where task failures and retries are common, the key generation process had to be idempotent—ensuring the same input record always produces the exact same key. Finally, the solution needed to be engine-agnostic to provide consistent behavior, whether data is written via Spark, Flink, or another supported engine.
@@ -79,8 +80,8 @@ Based on the requirements mentioned previously, we eliminated several common ID

Each component serves a specific purpose:

* **Write Action Start Time:** The timestamp from the Hudi timeline that marks the beginning of a write transaction.
* **Workload Partition ID:** An internal identifier that execution engines use to track the specific data split being processed by a given distributed write task.
* **Record Sequence ID:** A counter that uniquely identifies each record within that data split.

Together, these three components—all readily accessible during the write process—form a record identifier that satisfies the requirements of global uniqueness, idempotency, and being engine-agnostic.
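The three-component scheme above can be sketched in a few lines of plain Python. This is an illustrative toy, not Hudi's actual encoding: the delimiter, field formats, and function name are assumptions chosen only to show how the components combine into globally unique, idempotent keys.

```python
def auto_gen_key(write_start_time: str, partition_id: int, seq_id: int) -> str:
    """Toy composite record key: concatenate the write action start
    time, the workload partition ID, and the record sequence ID.
    Because all three inputs are deterministic for a given record in
    a given write, retrying a failed task regenerates identical keys
    (idempotency), and the (partition_id, seq_id) pair makes every
    key in the write globally unique."""
    return f"{write_start_time}_{partition_id}_{seq_id}"

# All records written by one task share a long common prefix, so the
# keys compress well with prefix or dictionary encoding in columnar files.
keys = [auto_gen_key("20250917103045123", 7, i) for i in range(3)]
```

Note how the shared timestamp prefix keeps the storage footprint low while the trailing counter preserves uniqueness.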
@@ -3,6 +3,7 @@ title: "Real-Time Cloud Security Graphs with Apache Hudi and PuppyGraph"
excerpt: "Hudi tables support fast upserts and incremental processing. PuppyGraph queries relationships in place using openCypher or Gremlin. In this blog, we explore how to get started with real-time security graph analytics at scale using the data already stored in your Hudi lakehouse tables."
author: Jaz Samantha Ku, in collaboration with Shiyan Xu
category: blog
subCategory: use case
image: /assets/images/blog/2025-10-02-Real-Time-Cloud-Security-Graphs-Hudi+PuppyGraph/fig-4-Sample-Architecture-of-PuppyGraph-Hudi.png
tags:
- Apache Hudi
@@ -16,8 +17,8 @@ Security tools such as SIEM, CSPM, and cloud workload protection need relationsh

To keep up, the data pipeline must support:

* Continuous upserts with low lag so detections run on the latest state
* Incremental consumption so analytics read only “what changed since T”
* A rewindable timeline so responders can review state during investigations

With Apache Hudi and PuppyGraph, this becomes straightforward. Hudi tables support fast upserts and incremental processing. PuppyGraph queries relationships in place using openCypher or Gremlin. In this blog, we explore how to get started with real-time security graph analytics at scale using the data already stored in your Hudi lakehouse tables.
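The incremental-consumption requirement can be illustrated with a toy model in plain Python (this is a conceptual sketch, not Hudi's API): each record carries the commit time of the write that last touched it, and a reader asks only for what changed since some instant T.

```python
# Hypothetical security records; ids and states are made up for illustration.
records = [
    {"id": "sg-1", "state": "open-ingress", "commit_time": "20251002T100000"},
    {"id": "sg-2", "state": "locked-down",  "commit_time": "20251002T101500"},
    {"id": "ip-9", "state": "public",       "commit_time": "20251002T103000"},
]

def incremental_read(table, since):
    """Return only records committed strictly after `since`,
    mimicking an incremental query against a table's timeline so
    downstream detections process only the delta, not a full scan."""
    return [r for r in table if r["commit_time"] > since]

changed = incremental_read(records, "20251002T101500")
```

In a real deployment this filtering is done by the storage layer from commit metadata, so the reader never scans unchanged files.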
@@ -81,19 +82,19 @@ Getting started is straightforward. You will deploy the stack, load security dat

The components of this demo project include:

* Storage: MinIO/S3 – Object store for Hudi data
* Data Lakehouse: Apache Hudi – Brings database functionality to your data lakes
* Catalog: Hive Metastore – Backed by Postgres
* Compute engines:
  * Spark – Initial table writes
  * PuppyGraph – Graph query engine for complex, multi-hop graph queries

### Prerequisites

This tutorial assumes that you have the following:

1. **Docker** and **Docker Compose** (for setting up the Docker container)
2. **Python 3** (for managing dependencies)
3. [PuppyGraph-Hudi Demo Repository](https://github.com/puppygraph/puppygraph-getting-started/tree/main/integration-demos/hudi-demo)

#### Data Preparation
@@ -137,7 +138,7 @@ docker compose exec spark /opt/spark/bin/spark-sql -f /init.sql

#### Modeling the Graph

Now that our data is loaded in, we can log into the PuppyGraph Web UI at [http://localhost:8081](http://localhost:8081) with the default credentials (username: `puppygraph`, password: `puppygraph123`).

<figure>
![](/assets/images/blog/2025-10-02-Real-Time-Cloud-Security-Graphs-Hudi+PuppyGraph/fig-5-PuppyGraph-Login-Page.png)
@@ -162,9 +163,9 @@ Once you see your graph schema loaded in, you’re ready to start querying your

By modeling the network infrastructure as a graph, users can identify potential security risks, such as:

* Public IP addresses exposed to the internet
* Network interfaces not protected by any security group
* Roles granted excessive access permissions
* Security groups with overly permissive ingress rules
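To make the first two checks concrete, here is a minimal, self-contained sketch in plain Python (not PuppyGraph; in the demo the equivalent logic runs as openCypher or Gremlin queries). The node labels, properties, and edge relations (`IpAddress`, `NetworkInterface`, `PROTECTS`, and so on) are illustrative assumptions, not the demo repository's actual schema.

```python
# Toy infrastructure graph: nodes keyed by id, edges as (src, relation, dst).
nodes = {
    "ip-1": {"label": "IpAddress", "public": True},
    "ip-2": {"label": "IpAddress", "public": False},
    "ni-1": {"label": "NetworkInterface"},
    "ni-2": {"label": "NetworkInterface"},
    "sg-1": {"label": "SecurityGroup"},
}
edges = [
    ("ip-1", "ATTACHED_TO", "ni-1"),
    ("ip-2", "ATTACHED_TO", "ni-2"),
    ("sg-1", "PROTECTS", "ni-1"),
]

# Risk 1: public IP addresses exposed to the internet.
public_ips = [n for n, props in nodes.items()
              if props["label"] == "IpAddress" and props.get("public")]

# Risk 2: network interfaces not protected by any security group.
protected = {dst for src, rel, dst in edges
             if rel == "PROTECTS" and nodes[src]["label"] == "SecurityGroup"}
unprotected = [n for n, props in nodes.items()
               if props["label"] == "NetworkInterface" and n not in protected]
```

A graph engine expresses these as declarative pattern matches and can chain them into multi-hop traversals that would require several joins in SQL.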

Listed below are some sample queries you can try running to explore the data:
@@ -3,6 +3,7 @@ title: "Modernizing Upstox's Data Platform with Apache Hudi, dbt, and EMR Server
excerpt: ""
author: The Hudi Community
category: blog
subCategory: data lake
image: /assets/images/blog/2025-10-16-Modernizing-Upstox-Data-Platform-with-Apache-Hudi-DBT-and-EMR-Serverless/fig1.png
tags:
- hudi
@@ -3,6 +3,7 @@ title: "Partition Stats: Enhancing Column Stats in Hudi 1.0"
excerpt: ""
author: Aditya Goenka and Shiyan Xu
category: blog
subCategory: lakehouse
image: /assets/images/blog/2025-10-22-Partition_Stats_Enhancing_Column_Stats_in_Hudi_1.0/fig1.jpg
tags:
- hudi
@@ -3,6 +3,7 @@ title: "Deep Dive Into Hudi’s Indexing Subsystem (Part 1 of 2)"
excerpt: ""
author: Shiyan Xu
category: blog
subCategory: upserts
image: /assets/images/blog/2025-10-29-deep-dive-into-hudis-indexing-subsystem-part-1-of-2/fig1.png
tags:
- hudi
@@ -3,6 +3,7 @@ title: "How FreeWheel Uses Apache Hudi to Power Its Data Lakehouse"
excerpt: "How FreeWheel unified batch and streaming with an Apache Hudi–powered lakehouse to improve freshness, simplify operations, and scale analytics."
author: The Hudi Community
category: blog
subCategory: security
image: /assets/images/blog/2025-11-07-how-freewheel-uses-apache-hudi-to-power-its-data-lakehouse/image1.png
tags:
- hudi
@@ -3,6 +3,7 @@ title: Deep Dive Into Hudi's Indexing Subsystem (Part 2 of 2)
excerpt: 'Explore advanced indexing in Apache Hudi: record and secondary indexes for fast point lookups, expression indexes for transformed predicates, and async indexing for building indexes without blocking writes.'
author: Shiyan Xu
category: blog
subCategory: data lake
image: /assets/images/blog/2025-11-12-deep-dive-into-hudis-indexing-subsystem-part-2-of-2/fig1.png
tags:
- hudi
5 changes: 1 addition & 4 deletions website/learn/blog.md
@@ -1,11 +1,8 @@
---
title: "Blogs"
hide_title: true
---

import BlogList from '@site/src/components/BlogList';

# Blogs

Welcome to Apache Hudi blogs! Here you'll find the latest articles, tutorials, and updates from the Hudi community.

<BlogList />
3 changes: 3 additions & 0 deletions website/src/components/BlogList/Icon/search.svg