A project implementing the fundamental storage engine architectures used in modern databases, exploring how persistent key-value stores work under the hood by building three designs from scratch in Go.
The three storage engines are:
- Hash Indexing - Simple, fast, append-only log with in-memory hash index
- SSTables + LSM Trees - Write-optimized Log-Structured Merge Trees
- B-Trees - Traditional balanced tree structure with page-based storage
Each implementation demonstrates different trade-offs in performance, complexity, and use cases, providing hands-on experience with the data structures that power systems like PostgreSQL, Cassandra, LevelDB, and RocksDB.
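Since all three engines expose the same key-value operations, they can sit behind one shared interface (the `common/` package listed in the project layout is a natural home for it). A minimal sketch follows; the method set is illustrative, not the project's actual API:

```go
package common

// Store is one possible shared interface for the three engines; the exact
// method set here is an assumption, not the project's actual API.
type Store interface {
	Put(key, value []byte) error
	Get(key []byte) (value []byte, found bool, err error)
	Delete(key []byte) error
	Close() error
}
```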
- Understand the fundamental trade-offs in storage engine design
- Learn how databases achieve persistence and durability
- Explore different indexing strategies and their performance characteristics
- Gain practical experience implementing complex data structures
- Build robust, reusable storage primitives in Go
The simplest storage engine using an append-only log file with an in-memory hash index.
Features:
- O(1) read and write performance
- Append-only log for fast writes
- In-memory hash map for instant lookups
- Background compaction to reclaim space
- Crash recovery from log replay
Use Cases: High write throughput, working set fits in memory, simple key-value operations
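A minimal sketch of the core idea, assuming an illustrative `Store` type. Record framing, compaction, and startup recovery are omitted here; the log format and recovery replay are sketched in the design notes further down.

```go
package hashidx

import (
	"os"
	"sync"
)

// entry remembers where a key's latest value starts in the append-only log.
type entry struct {
	off  int64
	vlen int
}

// Store pairs an append-only log file with an in-memory hash index, which is
// the core idea of this engine.
type Store struct {
	mu    sync.RWMutex
	log   *os.File
	index map[string]entry
	size  int64 // current end-of-log offset (assumes a fresh, empty log)
}

// Open creates or opens the log file; replaying an existing log to rebuild
// the index is left out of this sketch.
func Open(path string) (*Store, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	return &Store{log: f, index: make(map[string]entry)}, nil
}

// Put appends the key and value to the end of the log and points the index
// at the value's location.
func (s *Store) Put(key, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec := append(append([]byte{}, key...), value...)
	if _, err := s.log.WriteAt(rec, s.size); err != nil {
		return err
	}
	s.index[string(key)] = entry{off: s.size + int64(len(key)), vlen: len(value)}
	s.size += int64(len(rec))
	return nil
}

// Get is one map lookup plus one positioned read: O(1) regardless of log size.
func (s *Store) Get(key []byte) ([]byte, bool, error) {
	s.mu.RLock()
	e, ok := s.index[string(key)]
	s.mu.RUnlock()
	if !ok {
		return nil, false, nil
	}
	val := make([]byte, e.vlen)
	if _, err := s.log.ReadAt(val, e.off); err != nil {
		return nil, false, err
	}
	return val, true, nil
}

// Close releases the underlying log file.
func (s *Store) Close() error { return s.log.Close() }
```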
A write-optimized structure using sorted string tables and multi-level compaction.
Features:
- Memtable (in-memory sorted structure)
- Multiple levels of immutable SSTables on disk
- Background compaction and merging
- Bloom filters for fast negative lookups
- Write-ahead log (WAL) for durability
Use Cases: Write-heavy workloads, time-series data, append-mostly patterns
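A sketch of the write path with illustrative types. The memtable here is a plain map for brevity; the implementation details below call for a skip list or red-black tree so keys stay sorted for flushing.

```go
package lsm

// memtable is a toy in-memory buffer; a real one would keep keys sorted.
type memtable struct {
	data map[string][]byte
	size int // approximate bytes held, used as the flush trigger
}

func newMemtable() *memtable { return &memtable{data: make(map[string][]byte)} }

func (m *memtable) put(key, value []byte) {
	m.data[string(key)] = value
	m.size += len(key) + len(value)
}

// wal abstracts the write-ahead log; a length-prefixed append-only file
// (like the hash engine's log) would satisfy it.
type wal interface {
	append(key, value []byte) error
}

// DB sketches the LSM write path: log first for durability, then the
// memtable, then a flush to an immutable SSTable once a threshold is hit.
type DB struct {
	log       wal
	mem       *memtable
	threshold int
	flush     func(*memtable) error // writes the memtable out as a new SSTable
}

func (d *DB) Put(key, value []byte) error {
	// 1. Append to the WAL so the update survives a crash before it is flushed.
	if err := d.log.append(key, value); err != nil {
		return err
	}
	// 2. Apply the update to the in-memory memtable.
	d.mem.put(key, value)
	// 3. When the memtable grows past the threshold, flush it as an SSTable
	//    and start a fresh one; background compaction later merges SSTables.
	if d.mem.size >= d.threshold {
		if err := d.flush(d.mem); err != nil {
			return err
		}
		d.mem = newMemtable()
	}
	return nil
}
```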
A traditional balanced tree structure with fixed-size pages stored on disk.
Features:
- Self-balancing tree with guaranteed O(log n) operations
- Page-based I/O for efficient disk access
- In-place updates
- Range query support
- Write-ahead logging for crash recovery
Use Cases: Read-heavy workloads, range queries, ACID transactions
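A sketch of the lookup path over in-memory nodes; the types are illustrative, and page serialization and locking are covered in the design notes below.

```go
package btree

import "sort"

// node is a simplified in-memory view of one page. A disk-backed version
// would (de)serialize this to a fixed-size page.
type node struct {
	keys     []string
	values   [][]byte // set only on leaves
	children []*node  // nil on leaves
}

func (n *node) leaf() bool { return len(n.children) == 0 }

// search walks from the root to a leaf, doing a binary search inside each
// node; the tree's small height is what gives the O(log n) lookups above.
func search(n *node, key string) ([]byte, bool) {
	for {
		i := sort.SearchStrings(n.keys, key) // first index with keys[i] >= key
		if n.leaf() {
			if i < len(n.keys) && n.keys[i] == key {
				return n.values[i], true
			}
			return nil, false
		}
		if i < len(n.keys) && n.keys[i] == key {
			i++ // in this sketch, separator keys route equal lookups to the right child
		}
		n = n.children[i]
	}
}
```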
- Log Format: Length-prefixed key-value pairs
- Compaction: Merges segments, keeping only the latest value for each key
- Recovery: Rebuilds index from log on startup
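A sketch of what the length-prefixed record format and startup recovery might look like; the 4-byte field widths and big-endian encoding are assumptions, not the project's actual format.

```go
package hashidx

import (
	"encoding/binary"
	"errors"
	"io"
)

// encodeRecord frames a pair as: 4-byte key length, 4-byte value length,
// key bytes, value bytes (big-endian lengths).
func encodeRecord(key, value []byte) []byte {
	buf := make([]byte, 8+len(key)+len(value))
	binary.BigEndian.PutUint32(buf[0:4], uint32(len(key)))
	binary.BigEndian.PutUint32(buf[4:8], uint32(len(value)))
	copy(buf[8:], key)
	copy(buf[8+len(key):], value)
	return buf
}

// rebuildIndex replays the log from the start and records the offset of the
// latest record for every key, which is how the index is recovered on startup.
func rebuildIndex(r io.Reader) (map[string]int64, error) {
	offsets := make(map[string]int64)
	var off int64
	hdr := make([]byte, 8)
	for {
		if _, err := io.ReadFull(r, hdr); err != nil {
			if errors.Is(err, io.EOF) {
				return offsets, nil // clean end of log
			}
			return nil, err // truncated or corrupt record
		}
		klen := binary.BigEndian.Uint32(hdr[0:4])
		vlen := binary.BigEndian.Uint32(hdr[4:8])
		key := make([]byte, klen)
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		if _, err := io.CopyN(io.Discard, r, int64(vlen)); err != nil {
			return nil, err
		}
		offsets[string(key)] = off
		off += int64(8 + klen + vlen)
	}
}
```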
- Memtable: Skip list or red-black tree
- SSTable Format: Sorted blocks with sparse index
- Compaction Strategy: Leveled or size-tiered
- Optimization: Bloom filters reduce disk I/O
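A sketch of how a sparse index narrows a point lookup to a single SSTable block (a Bloom filter check would come first, skipping tables that definitely do not contain the key); the types here are illustrative.

```go
package lsm

import "sort"

// sparseIndex holds the first key of every block in an SSTable together with
// that block's file offset; one entry per block keeps it small enough to
// stay in memory.
type sparseIndex struct {
	startKeys []string // sorted: startKeys[i] is the first key of block i
	offsets   []int64
}

// blockFor returns the offset of the single block that could contain key,
// so a point lookup reads at most one block from disk.
func (ix *sparseIndex) blockFor(key string) (int64, bool) {
	i := sort.SearchStrings(ix.startKeys, key) // first start key >= key
	if i < len(ix.startKeys) && ix.startKeys[i] == key {
		return ix.offsets[i], true
	}
	if i == 0 {
		return 0, false // key sorts before the table's first key
	}
	return ix.offsets[i-1], true // candidate is the preceding block
}
```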
- Node Size: 4KB pages (configurable)
- Fanout: Typically 100-200 children per node
- Concurrency: Page-level locking
- Durability: WAL with checkpointing
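A sketch of page-based access, assuming a 4 KB page and an illustrative header layout; the field choices and widths are assumptions, not the project's actual format.

```go
package btree

import (
	"encoding/binary"
	"io"
)

const pageSize = 4096 // matches the 4 KB default above

// pageHeader is one possible fixed-size header for an on-disk page.
type pageHeader struct {
	id       uint32 // page number; the page lives at byte offset id * pageSize
	isLeaf   uint8
	numCells uint16
}

// writeHeader serializes the header into the first bytes of a page buffer.
func writeHeader(page []byte, h pageHeader) {
	binary.LittleEndian.PutUint32(page[0:4], h.id)
	page[4] = h.isLeaf
	binary.LittleEndian.PutUint16(page[5:7], h.numCells)
}

// readPage fetches a whole node in a single aligned read, which is the point
// of page-based I/O: one node, one disk access.
func readPage(r io.ReaderAt, id uint32) ([]byte, error) {
	buf := make([]byte, pageSize)
	_, err := r.ReadAt(buf, int64(id)*pageSize)
	return buf, err
}
```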
| Operation | Hash Index | LSM Tree | B-Tree |
|---|---|---|---|
| Write | ~100k/s | ~80k/s | ~50k/s |
| Read | ~150k/s | ~60k/s | ~70k/s |
| Range Scan | N/A | ~40k/s | ~80k/s |
Benchmarks were run on consumer hardware with default configurations.
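Throughput numbers like these vary widely with hardware and configuration. A sketch of how a write benchmark could be expressed with Go's testing package, reusing the illustrative `hashidx.Open` from the hash-index sketch above (the import path is hypothetical):

```go
package benchmarks

import (
	"strconv"
	"testing"

	"example.com/storage/hashidx" // illustrative import path for the sketch above
)

// BenchmarkHashPut measures sequential 128-byte writes against the hash engine.
func BenchmarkHashPut(b *testing.B) {
	s, err := hashidx.Open(b.TempDir() + "/bench.log")
	if err != nil {
		b.Fatal(err)
	}
	defer s.Close()

	value := make([]byte, 128)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		key := []byte("key-" + strconv.Itoa(i))
		if err := s.Put(key, value); err != nil {
			b.Fatal(err)
		}
	}
}
```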
Project layout:

```
.
├── hash/        # Hash indexing implementation
├── lsm/         # LSM Tree implementation
├── btree/       # B-Tree implementation
├── common/      # Shared utilities and interfaces
├── benchmarks/  # Performance tests
└── examples/    # Usage examples
```
- Hash indexing with compaction
- LSM Tree with memtable
- SSTable format and merging
- B-Tree with page management
- Write-ahead logging
- Concurrent access support
- Snapshot isolation
- Compression support
- Bloom filters
- Performance optimizations
- Designing Data-Intensive Applications by Martin Kleppmann
- Database Internals by Alex Petrov
- The Log-Structured Merge-Tree (LSM-Tree) paper by O'Neil et al.
- B-Trees vs LSM Trees
Inspired by storage engines in LevelDB, RocksDB, PostgreSQL, and SQLite. Built for learning and understanding database internals.
Note: This is a personal project. For production use cases, consider mature solutions like BadgerDB, BoltDB, or Pebble.