
Conversation

@JacobZheng0927 (Contributor):

What changes were proposed in this pull request?

This PR optimizes BlockManager remove operations by introducing cached mappings to eliminate O(n) linear scans. The main changes are:

  1. Introduced three concurrent hash maps to track block ID associations (see the sketch after this list):

    • rddToBlockIds: Maps RDD ID to its block IDs
    • broadcastToBlockIds: Maps broadcast ID to its block IDs
    • sessionToBlockIds: Maps session UUID to its cache block IDs
  2. Added cache maintenance methods:

    • addToCache(blockId): Updates caches when blocks are stored
    • removeFromCache(blockId): Updates caches when blocks are deleted
  3. Reworked remove operations to use cached lookups:

    • removeRdd(), removeBroadcast(), and removeCache() now perform O(1) lookups instead of scanning all entries
  4. Integrated with block lifecycle:

    • doPutIterator() calls addToCache() after successful block storage
    • removeBlock() calls removeFromCache() when blocks are removed
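
To make this concrete, here is a minimal sketch of the new index structures and maintenance hooks as described above (names follow the PR description; the exact types in the patch may differ, session/cache blocks are elided, and the review below discusses a race in these read-modify-write updates):

import java.util.{Set => JSet}
import java.util.concurrent.ConcurrentHashMap

private val rddToBlockIds = new ConcurrentHashMap[Int, JSet[BlockId]]()
private val broadcastToBlockIds = new ConcurrentHashMap[Long, JSet[BlockId]]()
private val sessionToBlockIds = new ConcurrentHashMap[String, JSet[BlockId]]()

private def addToCache(blockId: BlockId): Unit = blockId match {
  case RDDBlockId(rddId, _) =>
    rddToBlockIds
      .computeIfAbsent(rddId, _ => ConcurrentHashMap.newKeySet[BlockId]())
      .add(blockId)
  case BroadcastBlockId(broadcastId, _) =>
    broadcastToBlockIds
      .computeIfAbsent(broadcastId, _ => ConcurrentHashMap.newKeySet[BlockId]())
      .add(blockId)
  case _ => // cache (session) blocks are handled analogously; other block types are not indexed
}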

Why are the changes needed?

Previously, removeRdd(), removeBroadcast(), and removeCache() required scanning all blocks in blockInfoManager.entries to find matches. This approach becomes a serious bottleneck when:

  1. Large block counts: In production deployments with millions or even tens of millions of cached blocks, linear scans can be prohibitively slow
  2. High cleanup frequency: Workloads that repeatedly create and discard RDDs or broadcast variables accumulate overhead quickly

The original removeRdd() method already contained a TODO noting that an additional mapping would be needed to avoid linear scans. This PR implements that improvement.
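
For reference, the pre-existing scan looks roughly like this (a paraphrase of the old removeRdd, not the exact code):

// Old O(n) path: every entry in blockInfoManager is visited just to
// find the blocks belonging to one RDD.
def removeRdd(rddId: Int): Int = {
  val blocksToRemove =
    blockInfoManager.entries.flatMap(_._1.asRDDId).filter(_.rddId == rddId).toSeq
  blocksToRemove.foreach { blockId => removeBlock(blockId, tellMaster = false) }
  blocksToRemove.size
}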

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Unit tests: Verified the correctness of removeRdd(), removeBroadcast(), and removeCache(), including edge cases.
  • Stress tests: Ran multiple simple tasks using broadcast joins under sustained high concurrency to validate performance and stability of the optimized remove operations.

Before optimization

[screenshot]

After optimization

[screenshot]

The optimization delivers significant performance improvements for block cleanup under large data volumes, reducing the overhead caused by frequent GC when blocks accumulate.

Was this patch authored or co-authored using generative AI tooling?

No.

@JacobZheng0927 changed the title to [SPARK-53446][CORE] Optimize BlockManager remove operations with cached block mappings (Sep 4, 2025)
Option(rddToBlockIds.get(rddBlockId.rddId)).foreach { blockSet =>
  blockSet.remove(blockId)
  if (blockSet.isEmpty) {
    rddToBlockIds.remove(rddBlockId.rddId)
@cloud-fan (Contributor):

I think there is a race condition here:

  1. thread 1 calls addToCache, gets the set for its RDD id
  2. thread 2 calls removeFromCache, gets the set for the same RDD id, removes the last block id, and then removes this RDD id from the cache
  3. thread 1 adds the block id, but it's a no-op since that set is now dangling.

@mridulm (Contributor):
Agree with @cloud-fan, use compute instead.
For example, something like this:

// Use import java.util.{Set => JSet} and change the type of the map values to JSet[BlockId]

def removeFromCache(
<snip>

    def doRemove[K](map: ConcurrentHashMap[K, JSet[BlockId]], key: K, block: BlockId): Unit = {
      map.compute(key,
        (_, set) => {
          if (null != set) {
            set.remove(block)
            if (set.isEmpty) null else set
          } else {
            // missing
            null
          }
        }
      )
    }

<snip>

case rddBlockId: RDDBlockId =>
  doRemove(rddToBlockIds, rddBlockId.rddId, blockId)
case broadcastBlockId: BroadcastBlockId =>
  doRemove(broadcastToBlockIds, broadcastBlockId.broadcastId, blockId)

// and so on
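
The add side would need the same treatment: a computeIfAbsent followed by a separate add can still interleave with the compute-based removal above. A minimal sketch of an atomic counterpart (doAdd is a hypothetical helper mirroring doRemove):

def doAdd[K](map: ConcurrentHashMap[K, JSet[BlockId]], key: K, block: BlockId): Unit = {
  map.compute(key,
    (_, set) => {
      // create the set if absent and mutate it while the bin lock is held,
      // so a concurrent doRemove cannot drop the entry in between
      val s = if (set != null) set else ConcurrentHashMap.newKeySet[BlockId]()
      s.add(block)
      s
    }
  )
}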

exceptionWasThrown = false
if (res.isEmpty) {
  // the block was successfully stored
  addToCache(blockId)
Contributor:

is doPut the only entry point that can add blocks?

@mridulm (Contributor):

For adding a new block, yes; it goes through doPut

@cloud-fan (Contributor):

can we put this extra mapping in the lower level BlockInfoManager? Then it's easier to guarantee the consistency between the extra mapping and the original block id map.
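
One shape that suggestion could take, as a minimal sketch (registerBlock and the field names here are hypothetical; the real BlockInfoManager keeps more state and its own locking):

private[storage] class BlockInfoManager {
  private val infos = new ConcurrentHashMap[BlockId, BlockInfo]()
  // Secondary index maintained next to the primary map, so both are
  // updated in the same place and cannot drift apart.
  private val rddToBlockIds = new ConcurrentHashMap[Int, JSet[BlockId]]()

  def registerBlock(blockId: BlockId, info: BlockInfo): Unit = {
    infos.put(blockId, info)
    blockId.asRDDId.foreach { rddBlockId =>
      rddToBlockIds.compute(rddBlockId.rddId, (_, set) => {
        val s = if (set != null) set else ConcurrentHashMap.newKeySet[BlockId]()
        s.add(blockId)
        s
      })
    }
  }
}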

@mridulm (Contributor) left a comment:

Took a quick pass, thanks for working on this!


/**
 * Add a block ID to the appropriate cache mapping based on its type.
 */
private def addToCache(blockId: BlockId): Unit = {
@mridulm (Contributor):
nit: Move these helper methods towards the bottom.

Comment on lines +2092 to +2097
val blocksToRemove = Option(rddToBlockIds.get(rddId)) match {
  case Some(blockSet) =>
    blockSet.asScala.toSeq
  case None =>
    Seq.empty
}
@mridulm (Contributor):
Remove it proactively from the map. This also makes the size returned consistent with what is actually removed (in case of race conditions). Same for the other cases as well.

Suggested change

val blocksToRemove = Option(rddToBlockIds.remove(rddId))
  .map(_.asScala.toSeq).getOrElse(Seq.empty)

blockInfoManager.removeBlock(blockId)
removeFromCache(blockId)
hasRemoveBlock = true
@mridulm (Contributor):
Do this in the finally block as well.
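
That is, something like this sketch, assuming the try/finally already wrapping blockInfoManager.removeBlock in the PR:

try {
  <snip>
  blockInfoManager.removeBlock(blockId)
  removeFromCache(blockId)
  hasRemoveBlock = true
} finally {
  if (!hasRemoveBlock) {
    // error path: the block is still removed here, so the index
    // must be updated here as well to stay consistent
    blockInfoManager.removeBlock(blockId)
    removeFromCache(blockId)
  }
}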


@JacobZheng0927 (Contributor, Author):

Sorry, I don’t have time to continue this PR. @zml1206 has taken over in #52646.
