BlockDataManager - Block Storage Management API

BlockDataManager is a pluggable interface for managing block storage (the block storage management API). Each block is identified by a BlockId and stored as a ManagedBuffer.

Note: BlockManager is currently the only available implementation of BlockDataManager.

Note: org.apache.spark.network.BlockDataManager is a private[spark] Scala trait in Spark.

BlockDataManager Contract

Every BlockDataManager offers the following services:

  • getBlockData fetches the data of a local block given its blockId.

    getBlockData(blockId: BlockId): ManagedBuffer
  • putBlockData stores block data locally under the given blockId. The return value indicates whether the operation succeeded (true) or failed (false).

    putBlockData(
      blockId: BlockId,
      data: ManagedBuffer,
      level: StorageLevel,
      classTag: ClassTag[_]): Boolean
  • releaseLock releases the lock on a block that was acquired by a getBlockData or putBlockData operation.

    releaseLock(blockId: BlockId): Unit
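
The contract above can be illustrated with a minimal, self-contained sketch. Because BlockDataManager is private[spark], the code below does not implement the real trait. SketchBlockId, SketchBuffer, SketchBlockDataManager, InMemoryBlockDataManager and isLocked are hypothetical stand-ins invented for this example, and the StorageLevel and ClassTag parameters are left out for brevity. The lock bookkeeping is a toy and does not reflect how BlockManager handles locks internally.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's BlockId and ManagedBuffer,
// so that the sketch compiles without a Spark dependency.
final case class SketchBlockId(name: String)
final case class SketchBuffer(bytes: Array[Byte])

// Mirrors the three operations of the contract
// (StorageLevel and ClassTag are omitted; this is an assumption of the sketch).
trait SketchBlockDataManager {
  def getBlockData(blockId: SketchBlockId): SketchBuffer
  def putBlockData(blockId: SketchBlockId, data: SketchBuffer): Boolean
  def releaseLock(blockId: SketchBlockId): Unit
}

// Toy in-memory implementation; not how BlockManager works.
final class InMemoryBlockDataManager extends SketchBlockDataManager {
  private val blocks = mutable.Map.empty[SketchBlockId, SketchBuffer]
  private val locked = mutable.Set.empty[SketchBlockId]

  def getBlockData(blockId: SketchBlockId): SketchBuffer = {
    val data = blocks.getOrElse(
      blockId,
      throw new NoSuchElementException(s"Block ${blockId.name} not found"))
    locked += blockId // the caller is expected to call releaseLock when done
    data
  }

  def putBlockData(blockId: SketchBlockId, data: SketchBuffer): Boolean =
    if (blocks.contains(blockId)) {
      false // refuse to overwrite an existing block
    } else {
      blocks(blockId) = data
      true
    }

  def releaseLock(blockId: SketchBlockId): Unit = {
    locked -= blockId
    ()
  }

  // Helper for inspection only; not part of the contract.
  def isLocked(blockId: SketchBlockId): Boolean = locked.contains(blockId)
}
```

In this sketch a caller would put a block, read it back with getBlockData, and then call releaseLock once it has finished with the returned buffer.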

BlockId

BlockId identifies a block of data. Each BlockId has a name that serves as its globally unique identifier.

The following types of BlockId are available:

  • RDDBlockId - described by rddId and splitIndex

  • ShuffleBlockId - described by shuffleId, mapId and reduceId

  • ShuffleDataBlockId - described by shuffleId, mapId and reduceId

  • ShuffleIndexBlockId - described by shuffleId, mapId and reduceId

  • BroadcastBlockId - described by broadcastId and an optional field that names a piece of the broadcast value

  • TaskResultBlockId - described by taskId

  • StreamBlockId - described by streamId and uniqueId
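
To make the idea of a globally unique name concrete, here is a small sketch of how a name could be derived from the describing fields. NamedBlockId, RddBlock, ShuffleBlock, BroadcastBlock and TaskResultBlock are simplified stand-ins written for this example, not Spark's own classes. The name prefixes and field types follow the scheme I understand Spark to use and should be treated as assumptions to check against the Spark source.

```scala
// Simplified stand-ins for some BlockId types (hypothetical names).
sealed trait NamedBlockId {
  // Globally unique identifier built from the describing fields.
  def name: String
}

final case class RddBlock(rddId: Int, splitIndex: Int) extends NamedBlockId {
  def name: String = s"rdd_${rddId}_$splitIndex"
}

final case class ShuffleBlock(shuffleId: Int, mapId: Int, reduceId: Int)
    extends NamedBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

// The optional field is appended only when it is non-empty.
final case class BroadcastBlock(broadcastId: Long, field: String = "")
    extends NamedBlockId {
  def name: String =
    s"broadcast_$broadcastId" + (if (field.isEmpty) "" else s"_$field")
}

final case class TaskResultBlock(taskId: Long) extends NamedBlockId {
  def name: String = s"taskresult_$taskId"
}
```

Because each type uses a distinct prefix and includes all of its describing fields, two different blocks in this sketch never share a name.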

ManagedBuffer