Update README.md
Removed typos and made docs more comprehensible
arrowler authored Dec 26, 2024
1 parent 2bed39c commit 06999d5
Showing 1 changed file with 8 additions and 12 deletions.
20 changes: 8 additions & 12 deletions README.md
@@ -27,7 +27,7 @@ VelarixDB is an ongoing project (*not production ready*) designed to optimize da

## Problem

During compaction in LevelDB or RocksDB, in the worst case, up to 10 SSTable files needs to be read, sorted and re-written since keys are not allowed to overlapp across all the sstables from Level 1 downwards. Suppose after merging SSTables in one level, the next level exceeds its threshold, compaction can cascade from Level 0 all the way to Level 6 meaning the overall write amplification can be up to 50 (ignoring the first compaction level).[ Reference -> [Official LevelDB Compaction Process Docs](https://github.com/facebook/rocksdb/wiki/Leveled-Compaction) ].
During compaction in LevelDB or RocksDB, in the worst case, up to 10 SSTable files need to be read, sorted, and rewritten, since keys are not allowed to overlap across SSTables from Level 1 downwards. If, after merging SSTables in one level, the next level exceeds its threshold, compaction can cascade from Level 0 all the way to Level 6, meaning the overall write amplification can be up to 50 (ignoring the first compaction level). [Reference: [RocksDB Leveled Compaction wiki](https://github.com/facebook/rocksdb/wiki/Leveled-Compaction)]
This repetitive data movement can cause significant wear on SSDs, reducing their lifespan due to the high number of write cycles. The goal is to minimize the amount of data moved during compaction, thereby reducing the amount of data re-written and extending the device's lifetime.
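
One way to read the figure of 50 is as a per-gap product. This is an interpretation, not a measurement: it assumes each of the five gaps between Level 1 and Level 6 rewrites up to 10 SSTables' worth of data, and it leaves out the Level 0 to Level 1 step, as the paragraph above does.

$$
\text{WA}_{\max} \approx \underbrace{10}_{\text{rewrite factor per gap}} \times \underbrace{5}_{L_1 \to L_2,\ \dots,\ L_5 \to L_6} = 50
$$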

## Solution
@@ -44,22 +44,19 @@ According to the benchmarks presented in the WiscKey paper, implementations can
- **1.6x to 14x** for random lookups

## Addressing major concerns
- **Range Query**: Since keys are separate from values, won't that affect range queries performance. Well, we now have internal parallelism in SSDs, as we fetch the keys from the LSM tree we can fetch the values in parallel from the vlog file. This [benchmark](https://github.com/Gifted-s/velarixdb/blob/main/bench.png) from the Wisckey Paper shows how for request size ≥ 64KB, the aggregate throughput of random reads with 32 threads matches the sequential read throughput.
- **More Disk IO for Reads**: Since keys are now seperate from values, we have to make extra disk IO to fetch values? Yes, but since the key density now increases for each level (since we are only storing keys and value offsets in the sstable), we will most likely search fewer levels compared to LevelDB or RocksDB for thesame query. A significant portion of the LSM tree can also be cached in memory.
- **Range Query**: Since keys are separate from values, won't that affect range query performance? SSDs now offer internal parallelism, so as we fetch the keys from the LSM tree we can fetch the values in parallel from the vlog file. This [benchmark](https://github.com/Gifted-s/velarixdb/blob/main/bench.png) from the WiscKey paper shows that, for request sizes ≥ 64 KB, the aggregate throughput of random reads with 32 threads matches the sequential read throughput.
- **More Disk IO for Reads**: Since keys are now separate from values, do we have to make extra disk IO to fetch values? Yes, but because each sstable now stores only keys and value offsets, the key density of every level increases, so we will most likely search fewer levels than LevelDB or RocksDB would for the same query. A significant portion of the LSM tree can also be cached in memory.
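
The range-query answer above can be sketched with a toy model. This is a minimal illustration of key-value separation, not VelarixDB's implementation: `ToyStore`, `ValuePointer`, and their methods are hypothetical names, a `BTreeMap` stands in for the LSM tree, an in-memory `Vec<u8>` stands in for the vlog file, and plain scoped threads stand in for whatever parallel fetch mechanism the store actually uses.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Hypothetical pointer stored beside each key: where its value sits in the value log.
#[derive(Clone, Copy)]
struct ValuePointer {
    offset: usize,
    len: usize,
}

/// Toy model only: the BTreeMap stands in for the LSM tree (keys + value offsets)
/// and the Vec<u8> stands in for the vlog file.
struct ToyStore {
    index: BTreeMap<Vec<u8>, ValuePointer>,
    vlog: Vec<u8>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { index: BTreeMap::new(), vlog: Vec::new() }
    }

    /// Append the value to the log and index the key with the value's location.
    fn put(&mut self, key: &[u8], value: &[u8]) {
        let ptr = ValuePointer { offset: self.vlog.len(), len: value.len() };
        self.vlog.extend_from_slice(value);
        self.index.insert(key.to_vec(), ptr);
    }

    /// Scan keys in [start, end) in sorted order, then fetch the values in parallel.
    fn range(&self, start: &[u8], end: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Step 1: the sorted key scan touches only keys and pointers.
        let hits: Vec<(Vec<u8>, ValuePointer)> = self
            .index
            .range(start.to_vec()..end.to_vec())
            .map(|(k, p)| (k.clone(), *p))
            .collect();
        let vlog = &self.vlog;
        // Step 2: value reads are independent of each other, so they can run concurrently.
        thread::scope(|s| {
            // One thread per value keeps the sketch short; a real store would bound this.
            let handles: Vec<_> = hits
                .into_iter()
                .map(|(k, p)| s.spawn(move || (k, vlog[p.offset..p.offset + p.len].to_vec())))
                .collect();
            // Joining in spawn order preserves the sorted key order.
            handles
                .into_iter()
                .map(|h| h.join().unwrap())
                .collect::<Vec<_>>()
        })
    }
}

fn main() {
    let mut store = ToyStore::new();
    store.put(b"key3", b"three");
    store.put(b"key1", b"one");
    store.put(b"key2", b"two");
    for (k, v) in store.range(b"key1", b"key3") {
        println!("{} => {}", String::from_utf8_lossy(&k), String::from_utf8_lossy(&v));
    }
}
```

The key scan never reads a value, which is why the index stays small, and the value reads are independent of one another, which is what lets them be issued in parallel.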

## Designed for asynchronous runtime (unstable)
Based on the introduction and efficiency of asynchronous IO at the OS kernel level e.g **io_uring** for the Linux kernel, VelarixDB is designed for asynchronous runtime. In this case Tokio runtime.
Tokio allows for efficient and scalable asynchronous operations, making the most of modern multi-core processors. Frankly, most OS File System does not provide async API currently but Tokio uses a thread pool to offload blocking file system operations.
This means that even though the file system operations themselves are blocking at the OS level, Tokio can handle them without blocking the main async task executor. Tokio might adopt [io_uring](https://docs.rs/tokio/latest/tokio/fs/index.html#:~:text=Currently%2C%20Tokio%20will%20always%20use%20spawn_blocking%20on%20all%20platforms%2C%20but%20it%20may%20be%20changed%20to%20use%20asynchronous%20file%20system%20APIs%20such%20as%20io_uring%20in%20the%20future.) in the future. (We haven't benchmarked the async version therefore this is unstable and might be removed in a future version)
Based on the introduction and efficiency of asynchronous IO at the OS kernel level (e.g., **io_uring** for the Linux kernel), VelarixDB is designed for an asynchronous runtime, in this case the Tokio runtime.
Tokio allows for efficient and scalable asynchronous operations, making the most of modern multi-core processors. Frankly, most OS file systems do not currently provide an async API, but Tokio uses a thread pool to offload blocking file system operations.
This means that even though the file system operations themselves are blocking at the OS level, Tokio can handle them without blocking the main async task executor. Tokio might adopt [io_uring](https://docs.rs/tokio/latest/tokio/fs/index.html#:~:text=Currently%2C%20Tokio%20will%20always%20use%20spawn_blocking%20on%20all%20platforms%2C%20but%20it%20may%20be%20changed%20to%20use%20asynchronous%20file%20system%20APIs%20such%20as%20io_uring%20in%20the%20future.) in the future. (We haven't benchmarked the async version; therefore, this is unstable and might be removed in a future version.)


## Disclaimer

Please note that velarixdb is still under development and is not yet production-ready.

### NOTE
v2 is the most recent version (not experimental) and under active development, the src modules are for the experimental version

### Basic Features
- [x] Atomic `Put()`, `Get()`, `Delete()`, and `Update()` operations
- [x] 100% safe & stable Rust
@@ -92,8 +89,7 @@ v2 is the most recent version (not experimental) and under active development, t

### Constraint
- Keys are limited to 65,536 bytes, and values are limited to 2^32 bytes. Larger keys and values have a bigger performance impact.
- Like any typical key-value store, keys are stored in lexicographic order. If you are storing integer keys (e.g., timeseries data), use the big-endian form to adhere to locality.
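
The big-endian advice can be checked with a small standalone sketch. It uses only the Rust standard library and does not touch the VelarixDB API; `encode_key` and `decode_key` are hypothetical helper names chosen for illustration.

```rust
/// Encode an integer key so that byte-wise (lexicographic) order matches numeric order.
fn encode_key(n: u64) -> [u8; 8] {
    n.to_be_bytes()
}

/// Recover the integer from a big-endian encoded key.
fn decode_key(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

fn main() {
    // Already in ascending numeric order, e.g. timestamps in a time series.
    let timestamps: [u64; 4] = [1, 255, 256, 65_536];

    // Big-endian: sorting the raw bytes keeps the numeric order.
    let mut be_keys: Vec<[u8; 8]> = timestamps.iter().map(|&t| encode_key(t)).collect();
    be_keys.sort();
    let be_order: Vec<u64> = be_keys.iter().map(|&k| decode_key(k)).collect();
    assert_eq!(be_order, timestamps);

    // Little-endian: the least significant byte is compared first,
    // so the byte-wise order no longer follows the numeric order.
    let mut le_keys: Vec<[u8; 8]> = timestamps.iter().map(|t| t.to_le_bytes()).collect();
    le_keys.sort();
    let le_order: Vec<u64> = le_keys.iter().map(|&k| u64::from_le_bytes(k)).collect();
    assert_ne!(le_order, timestamps);
}
```

With big-endian keys, numerically adjacent integers stay adjacent in the store's sort order, so a scan over a time window reads one contiguous key range.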

-
# Basic usage

```sh
@@ -102,7 +98,7 @@ cargo add velarixdb

```rust
use velarixdb::db::DataStore;
# use tempfile::tempdir;
use tempfile::tempdir;

#[tokio::main]
async fn main() {