RocksDB Bloom Filter
For any arbitrary set of keys, an algorithm can be applied to create a bit array called a Bloom filter. Given an arbitrary key, this bit array can be used to determine whether the key may exist in the key set or definitely does not exist in it.
In RocksDB, every SST file contains a Bloom filter, which is used to determine if the file may contain the key we're looking for.
For a more detailed explanation of how Bloom filters work, see this Wikipedia article.
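To make this concrete, here is a minimal, self-contained toy Bloom filter (a sketch only; the array size, probe count, and hash mixing are arbitrary choices, not RocksDB's):

```cpp
#include <bitset>
#include <functional>
#include <iostream>
#include <string>

constexpr size_t kBits = 1024;  // size of the bit array
constexpr int kProbes = 6;      // number of bits set/tested per key

std::bitset<kBits> filter;

// Derive the i-th probe position from two base hashes (double hashing).
size_t Probe(const std::string& key, int i) {
  size_t h1 = std::hash<std::string>()(key);
  size_t h2 = h1 * 0x9e3779b97f4a7c15ULL + 1;  // cheap derived second hash
  return (h1 + i * h2) % kBits;
}

void Add(const std::string& key) {
  for (int i = 0; i < kProbes; i++) filter.set(Probe(key, i));
}

// Returns false only when the key is definitely absent; true means
// "may exist" (false positives are possible, false negatives are not).
bool MayContain(const std::string& key) {
  for (int i = 0; i < kProbes; i++) {
    if (!filter.test(Probe(key, i))) return false;
  }
  return true;
}

int main() {
  Add("foo");
  Add("bar");
  std::cout << MayContain("foo") << " "   // 1: was added
            << MayContain("baz") << "\n"; // almost surely 0
}
```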
In RocksDB, each SST file has a corresponding Bloom filter. It is created when the SST file is written to storage, and is stored as part of the associated SST file. Bloom filters are constructed for files in all levels in the same way.
Bloom filters may only be created from a set of keys - there is no operation to combine two Bloom filters. When two SST files are combined (e.g., during compaction), a new Bloom filter is created from the keys of the resulting file.
When we open an SST file, the corresponding Bloom filter is also opened and loaded into memory. When the SST file is closed, the Bloom filter is removed from memory.
To cache Bloom filters in the block cache, set BlockBasedTableOptions::cache_index_and_filter_blocks = true; otherwise they are always pinned in memory.
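A minimal sketch of wiring this option into the table factory (the surrounding options are illustrative):

```cpp
#include "rocksdb/options.h"
#include "rocksdb/table.h"

rocksdb::Options MakeOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  // Let filter (and index) blocks compete for space in the block cache
  // instead of being pinned in heap memory for the file's lifetime.
  table_options.cache_index_and_filter_blocks = true;

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```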
A Bloom filter can only be built if all of the keys fit in memory. On the other hand, sharding a Bloom filter does not affect its false positive rate. Therefore, to alleviate memory pressure when creating the SST file, the old format creates a separate Bloom filter for each 2KB block of key/value data. Details for the format can be found here. At the end, an array of the offsets of the individual Bloom blocks is stored in the SST file.
At read time, the offset of the block that might contain the key/value is obtained from the SST index. Based on that offset, the corresponding Bloom filter is loaded. If the filter suggests that the key may exist, the actual data block is searched for the key.
The individual filter blocks in the old format are not cache-aligned, which can result in many CPU cache misses during a lookup. Moreover, although a negative response from the filter (i.e., the key does not exist) saves the search in the data block, the index has already been loaded and consulted by that point. The new format, the full filter, addresses these issues by creating one filter for the entire SST file. The tradeoff is that more memory is required while building it, since the hashes of all the keys in the file must be held in memory (versus only the keys of a single 2KB block in the old format).
The full filter limits the bits modified by a key's hash functions to all lie within the same CPU cache line. This ensures fast lookups by limiting the CPU cache misses to one per key. Note that this essentially shards the Bloom space and so does not affect the filter's false positive rate.
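To make the cache-line trick concrete, here is a self-contained sketch in the spirit of that probing scheme; the constants and hash mixing below are illustrative, not RocksDB's exact code:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

constexpr uint32_t kBitsPerLine = 512;  // one 64-byte CPU cache line

struct CacheLocalBloom {
  std::vector<uint8_t> bits;  // num_lines * 64 bytes, zero-initialized
  uint32_t num_lines;

  explicit CacheLocalBloom(uint32_t lines)
      : bits(lines * 64, 0), num_lines(lines) {}

  void Add(uint32_t h, int k) {
    uint32_t line = h % num_lines;           // pick the cache line once
    uint32_t delta = (h >> 17) | (h << 15);  // rotate to vary the probes
    for (int i = 0; i < k; i++) {
      uint32_t bit = h % kBitsPerLine;       // every probe stays in the line
      bits[line * 64 + bit / 8] |= static_cast<uint8_t>(1u << (bit % 8));
      h += delta;
    }
  }

  bool MayContain(uint32_t h, int k) const {
    uint32_t line = h % num_lines;
    uint32_t delta = (h >> 17) | (h << 15);
    for (int i = 0; i < k; i++) {
      uint32_t bit = h % kBitsPerLine;
      if (!(bits[line * 64 + bit / 8] & (1u << (bit % 8)))) {
        return false;  // definitely not present
      }
      h += delta;
    }
    return true;  // may be present
  }
};

int main() {
  CacheLocalBloom f(128);  // 128 lines = an 8KB filter
  f.Add(0x12345678u, 6);
  std::cout << f.MayContain(0x12345678u, 6) << " "    // 1
            << f.MayContain(0xdeadbeefu, 6) << "\n";  // very likely 0
}
```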
Users can specify which kind of Bloom filter to use when calling NewBloomFilterPolicy. The default is the old block-based format.
There are two major changes in the full filter relative to the original format:
(1) It stores hashes of the keys (rather than the keys themselves) in memory while building the filter, which reduces memory consumption; users should still think twice before using it for very large SST files.
(2) It writes/reads a key's multiple probes (the bits generated by different hash functions) within one CPU cache line, so writing and reading the filter is faster.
By default, the Bloom filter remains in the original format. To enable the new filter format, pass a second parameter when creating the FilterPolicy, e.g. NewBloomFilterPolicy(10, false). The second parameter set to false means "do not use the original filter format"; a complete example follows.
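Putting this together, a minimal sketch of enabling the full filter format when opening a database (the path and bits-per-key value are illustrative):

```cpp
#include <cassert>
#include "rocksdb/db.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

int main() {
  rocksdb::BlockBasedTableOptions table_options;
  // 10 bits per key; 'false' selects the full (non-block-based) filter format.
  table_options.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10, false));

  rocksdb::Options options;
  options.create_if_missing = true;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/bloom_example", &db);
  assert(s.ok());
  delete db;
}
```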
When reading a filter block, RocksDB can tell which filter format it is in and create the appropriate reader.
The original filter policy interface is too rigid and not suitable for the new filter format and its optimizations. We define two more interfaces in the filter policy (include/rocksdb/filter_policy.h):
FilterBitsBuilder* GetFilterBitsBuilder()
FilterBitsReader* GetFilterBitsReader(const Slice& contents)
In this way, the new filter policy functions as a factory for FilterBitsBuilder and FilterBitsReader (also defined in include/rocksdb/filter_policy.h). FilterBitsBuilder provides the interface for key storage and filter generation, and FilterBitsReader provides the interface for checking whether a key may exist in the filter.
Note: these two new interfaces only work for the new filter format; the original filter format is still customized through the original methods. A minimal sketch of a custom policy built on these interfaces is shown below.
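Here is such a sketch: a hypothetical ToyFilterPolicy whose one-byte-hash "filter" exists only to show the shape of the factory interfaces, not to be a usable Bloom filter:

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <string>
#include <vector>
#include "rocksdb/filter_policy.h"
#include "rocksdb/slice.h"

using namespace rocksdb;

// Builder: collects one byte of hash per key, then serializes them.
class ToyBitsBuilder : public FilterBitsBuilder {
 public:
  void AddKey(const Slice& key) override {
    hashes_.push_back(
        static_cast<char>(std::hash<std::string>()(key.ToString()) & 0xff));
  }
  // Hand the serialized filter back to RocksDB, which stores it in the SST.
  Slice Finish(std::unique_ptr<const char[]>* buf) override {
    char* data = new char[hashes_.size()];
    std::copy(hashes_.begin(), hashes_.end(), data);
    buf->reset(data);
    return Slice(data, hashes_.size());
  }
 private:
  std::vector<char> hashes_;
};

// Reader: reports "may match" if the key's hash byte appears in the filter.
class ToyBitsReader : public FilterBitsReader {
 public:
  explicit ToyBitsReader(const Slice& contents) : contents_(contents) {}
  bool MayMatch(const Slice& entry) override {
    char h =
        static_cast<char>(std::hash<std::string>()(entry.ToString()) & 0xff);
    for (size_t i = 0; i < contents_.size(); i++) {
      if (contents_[i] == h) return true;  // possibly present
    }
    return false;  // definitely absent
  }
 private:
  Slice contents_;  // filter block data, kept alive by RocksDB
};

class ToyFilterPolicy : public FilterPolicy {
 public:
  const char* Name() const override { return "ToyFilterPolicy"; }
  FilterBitsBuilder* GetFilterBitsBuilder() const override {
    return new ToyBitsBuilder();
  }
  FilterBitsReader* GetFilterBitsReader(const Slice& contents) const override {
    return new ToyBitsReader(contents);
  }
  // Old-format hooks; not used once the bits builder/reader are provided.
  void CreateFilter(const Slice*, int, std::string*) const override {}
  bool KeyMayMatch(const Slice&, const Slice&) const override { return true; }
};
```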
For partitioned Bloom filters, read the Partitioned Index/Filters page.