IO Tracer and Parser
Akanksha Mahajan edited this page Dec 18, 2020 · 9 revisions
[In Progress]
On POSIX systems, existing tracing tools such as strace can be used. For storage systems where those tools are not available, RocksDB adds its own mechanism to trace IO operations, which helps in understanding the IO behavior of RocksDB while it accesses data on the storage.
An IO trace record contains the following information:
Column Name | Values | Comment |
---|---|---|
Access timestamp in microseconds | unsigned long | |
Block ID | unsigned long | A unique block ID. |
Block type | 7: Index block, 8: Filter block, 9: Data block, 10: Uncompressed dictionary block, 11: Range deletion block | |
Block size | unsigned long | Block size may be 0 when 1) compaction observes cache misses and does not insert the missing blocks into the cache. 2) IO error when fetching a block. 3) prefetching filter blocks but the SST file does not have filter blocks. |
Column family ID | unsigned long | A unique column family ID. |
Column family name | string | |
Level | unsigned long | The LSM tree level of this block. |
SST file number | unsigned long | The SST file this block belongs to. |
Caller | See Caller | The caller that accesses this block, e.g., Get, Iterator, Compaction, etc. |
No insert | 0: do not insert the block upon a miss, 1: insert the block upon a cache miss | |
Get ID | unsigned long | A unique ID associated with the Get request. |
Get key ID | unsigned long | The referenced key of the Get request. |
Get referenced data size | unsigned long | The referenced data (key+value) size of the Get request. |
Is a cache hit | 0: A cache hit, 1: A cache miss | The running RocksDB instance observes a cache hit/miss on this block. |
Get Does get referenced key exist in this block | 0: Does not exist, 1: Exist | Data block only: Whether the referenced key is found in this block. |
Get Approximate number of keys in this block | unsigned long | Data block only. |
Get table ID | unsigned long | The table ID of the Get request. We treat the first four bytes of the Get request as table ID. |
Get sequence number | unsigned long | The sequence number associated with the Get request. |
Block key size | unsigned long | |
Get referenced key size | unsigned long | |
Block offset in the SST file | unsigned long | |
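
To make the record layout above concrete, here is a minimal sketch of a reader for one record. It assumes, purely for illustration, that a record has been exported as a single comma-separated text line with the 21 columns in the order listed in the table; this page does not specify that format. `TraceRecord`, `ParseRecord`, and the helper names are hypothetical and are not part of the RocksDB API. Only the first eight columns are decoded.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Block type codes as listed in the table above.
enum class BlockType : std::uint64_t {
  kIndex = 7,
  kFilter = 8,
  kData = 9,
  kUncompressedDictionary = 10,
  kRangeDeletion = 11,
};

// Hypothetical in-memory form of the first eight columns of a record.
struct TraceRecord {
  std::uint64_t access_timestamp_us = 0;
  std::uint64_t block_id = 0;
  BlockType block_type = BlockType::kData;
  std::uint64_t block_size = 0;
  std::uint64_t column_family_id = 0;
  std::string column_family_name;
  std::uint64_t level = 0;
  std::uint64_t sst_file_number = 0;
};

// Number of columns in the table above.
constexpr std::size_t kNumColumns = 21;

// Splits a line on commas. Assumption: no field contains a comma.
std::vector<std::string> SplitFields(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  std::istringstream in(line);
  while (std::getline(in, field, ',')) fields.push_back(field);
  return fields;
}

// Parses a non-negative decimal integer; rejects empty or non-numeric
// input and values that do not fit in 64 bits.
std::optional<std::uint64_t> ParseU64(const std::string& s) {
  std::uint64_t value = 0;
  const char* end = s.data() + s.size();
  const auto result = std::from_chars(s.data(), end, value);
  if (s.empty() || result.ec != std::errc() || result.ptr != end) {
    return std::nullopt;
  }
  return value;
}

// Returns the decoded record, or std::nullopt if the line does not have
// exactly kNumColumns fields or a decoded field is malformed.
std::optional<TraceRecord> ParseRecord(const std::string& line) {
  const std::vector<std::string> f = SplitFields(line);
  if (f.size() != kNumColumns) return std::nullopt;

  const auto ts = ParseU64(f[0]);
  const auto id = ParseU64(f[1]);
  const auto type = ParseU64(f[2]);
  const auto size = ParseU64(f[3]);
  const auto cf_id = ParseU64(f[4]);
  const auto level = ParseU64(f[6]);
  const auto file = ParseU64(f[7]);
  if (!ts || !id || !type || !size || !cf_id || !level || !file) {
    return std::nullopt;
  }
  // Only the block type codes 7..11 from the table are accepted.
  if (*type < 7 || *type > 11) return std::nullopt;

  TraceRecord r;
  r.access_timestamp_us = *ts;
  r.block_id = *id;
  r.block_type = static_cast<BlockType>(*type);
  r.block_size = *size;
  r.column_family_id = *cf_id;
  r.column_family_name = f[5];
  r.level = *level;
  r.sst_file_number = *file;
  return r;
}
```

Returning `std::optional` keeps malformed lines from being mistaken for valid records with zeroed fields, which matters here because a block size of 0 is itself a legitimate value.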
An example to start IO tracing:

```cpp
#include <memory>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/trace_reader_writer.h"

Env* env = rocksdb::Env::Default();
EnvOptions env_options;
std::string trace_path = "/tmp/binary_trace_test_example";
std::unique_ptr<TraceWriter> trace_writer;
DB* db = nullptr;
std::string db_name = "/tmp/rocksdb";
Options options;         // DB options; adjust as needed
TraceOptions trace_opt;  // default trace options
/* Create the trace file writer */
NewFileTraceWriter(env, env_options, trace_path, &trace_writer);
DB::Open(options, db_name, &db);
/* Start IO tracing */
db->StartIOTrace(env, trace_opt, std::move(trace_writer));
/* Your calls to RocksDB APIs */
/* End IO tracing */
db->EndIOTrace();
```
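
The example above pairs a start call with an end call. One way to keep that pairing intact when the traced code returns early or throws is a small scope guard. The sketch below is a generic, standard-library-only illustration; `ScopedTrace` is a hypothetical helper and not part of RocksDB.

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: runs `start` on construction and, only if
// `start` reported success, runs `end` when the guard goes out of scope.
class ScopedTrace {
 public:
  ScopedTrace(const std::function<bool()>& start, std::function<void()> end)
      : end_(std::move(end)), active_(start()) {}

  ~ScopedTrace() {
    if (active_) end_();
  }

  // Non-copyable so that `end` runs at most once.
  ScopedTrace(const ScopedTrace&) = delete;
  ScopedTrace& operator=(const ScopedTrace&) = delete;

  // True if `start` succeeded and `end` will run on destruction.
  bool active() const { return active_; }

 private:
  std::function<void()> end_;
  bool active_;
};
```

With the variables from the example above, the guard could be constructed as `ScopedTrace guard([&] { return db->StartIOTrace(env, trace_opt, std::move(trace_writer)).ok(); }, [&] { db->EndIOTrace(); });`, assuming the `StartIOTrace` signature shown on this page. Tracing then ends when `guard` leaves scope.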