Skip to content

zsxh/build-a-simple-database-from-scratch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Building a clone of sqlite from scratch in order to understand how does a database work.

Read

https://cstack.github.io/db_tutorial/

How Does a Database Work?

  • What format is data saved in? (in memory and on disk)
  • When does it move from memory to disk?
  • Why can there only be one primary key per table?
  • How does rolling back a transaction work?
  • How are indexes formatted?
  • When and how does a full table scan happen?
  • What format is a prepared statement saved in?

static/arch2.gif

sqlite architecture

Note

Sqlite Architecture

Tokenizer -> Parser -> Code Generator -> Virtual Machine -> B-Tree -> Pager -> OS Interface

A query goes through a chain of components in order to retrieve or modify data. The front-end consists of the:

  • tokenizer
  • parser
  • code generator

The input to the front-end is a SQL query. the output is sqlite virtual machine bytecode (essentially a compiled program that can operate on the database).

The back-end consists of the:

  • virtual machine
  • B-tree
  • pager
  • os interface

The virtual machine takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.

Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.

The pager receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.

The os interface is the layer that differs depending on which operating system sqlite was compiled for. In this tutorial, I’m not going to support multiple platforms.

SQL compiler parses a string and output an internal representation called bytecode.
This bytecode is passed to the virtual machine, which executes it.\ Breaking things into two steps like this has a couple advantages: \

  • Reduces the complexity of each part (e.g. virtual machine does not worry about syntax errors)
  • Allows compiling common queries once and caching the bytecode for improved performance

B-Tree Introdution

The B-Tree is the data structure SQLite uses to represent both tables and indexes.

Why is a tree a good data structure for a database?

  • Searching for a particular value is fast (logarithmic time)
  • Inserting / deleting a value you’ve already found is fast (constant-ish time to rebalance)
  • Traversing a range of values is fast (unlike a hash map)

B-Tree / B+ Tree

A B-Tree is different from a binary tree (the “B” probably stands for the inventor’s name, but could also stand for “balanced”).
Each node can have up to m children, where m is called the tree’s “order”. To keep the tree mostly balanced, we also say nodes have to have at least m/2 children (rounded up). \ Exception:

  • Leaf nodes have 0 children
  • The root node can have fewer than m children but must have at least 2
  • If the root node is a leaf node (the only node), it still has 0 children

SQLite uses B-Tree to store indexes. To store tables, SQLites uses a variation called B+ Tree.

B-treeB+ tree
Pronounced“Bee Tree”“Bee Plus Tree”
Used to storeIndexesTables
Internal nodes store keysYesYes
Internal nodes store valuesYesNo
Number of children per nodeLessMore
Internal nodes vs. leaf nodesSame structureDifferent structure

B+ Tree

For an order-m tree…Internal NodeLeaf Node
Storeskeys and pointers to childrenkeys and values
Number of keysup to m-1as many as will fit
Number of pointersnumber of keys + 1none
Number of valuesnonenumber of keys
Key purposeused for routingpaired with value
Stores values?NoYes

https://cstack.github.io/db_tutorial/parts/part7.html

Tests

Install Ruby

Install bunlder

gem install bunlder

Install rspec

Specify dependencies in a Gemfile in project’s root.

source 'http://rubygems.org'
gem 'rspec'
bundle install --path vendor/bundle

Run tests

tests file “spec/main_spec.rb”

bundle exec rspec

Issues

Part 4
if you failed “prints error message when table is full”, comment “free_table(table);” in do_meta_command method, uncomment it until fixing all bugs in the end.

Links

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published