-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
279 lines (206 loc) · 8.13 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
To build the software:
$ make
To install it:
$ sudo make install
This will put the executables in /usr/local/bin
To initialize or re-create the database files:
$ sudo /usr/local/bin/dbr_ctl initdb [force]
To start the db:
$ sudo /usr/local/bin/dbr_ctl start
Look at the top of the dbr_ctl script for some initialization variables you
might want to change.
--------------
Precis
--------------
Roxanne is a very simple database server that allows a client to store and
retrieve values by key. Keys are stored in a hash map of 64K buckets. Hash
collisions are resolved by separate chaining onto linked lists at the end
of the index file. The default location for the index file is in
/var/roxanne/idx
In addition to value lookups by key, the database provides a way to group
keys in a hierarchical directory structure. See Composite Keys below.
The values for the given keys are stored in contiguous 4KB blocks in the
database file (/var/roxanne/db). A file called block_bitmap tracks the
free/busy blocks in the db file. The dbr processes memory-map this file for
very fast access. The db file starts out with zero blocks. Blocks are only
added to the db file as needed to accomodate new records. As typically
built, the database can accomodate about a billion blocks.
--------------
Composite Keys
--------------
The database supports the notion of a composite key. That is, a key
that is subdivided into a hierarchy that all keys participate in. The key-
space is similar to a typical Unix filesystem organized into directories.
To work with hierarchical keys, divide them with spaces ' '. The last
element becomes the value. The 'value' does not participate in the
key-space.
The commands
create finance accounting payroll employees 50
create finance accounting receivables employees 70
Creates a 4-level hierarchy:
1 2 3 4
finance
accounting
payroll
employees
receivables
employees
The key-space then becomes a kind of database on its own. A client
can query the database for all the subkeys of a path. This gives clients
building blocks for range queries and ordered (sorted) results.
Nodes in the key-space are reference-counted so that repeating groups
in the hierarchy don't waste space.
All lookups of values are still done via hashmap of the entire key. In
other words, values can only be fetched by providing THE ENTIRE KEY.
In this case, the hash-lookup key is 'finance accounting payroll employees'.
It points to a starting block in the db that contains '50'.
This means that point-lookups of records will always be very fast, but that
the operator can use the key-space to organize and sort results.
Results from the keys command return in ascending sort order (strcmp).
--------------
Usage Example
--------------
madison:Roxanne rothrock$ sudo dbr_ctl start
Started listening.
carp:Roxanne rothrock$ telnet localhost 4080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
create alpha record_1
STATUS: OK
SIZE: 9
Write OK.
create alpha beta record_2
STATUS: OK
SIZE: 9
Write OK.
read alpha
STATUS: OK
SIZE: 8
record_1
read alpha beta
STATUS: OK
SIZE: 8
record_2
...Now a demonstration of the 'keys' command:
create alpha gamma record_3
STATUS: OK
SIZE: 9
Write OK.
create alpha delta record_4
STATUS: OK
SIZE: 9
Write OK.
keys alpha
STATUS: OK
SIZE: 16
beta delta gamma
--------------
Protocol
--------------
Data manipulation commands:
create key [key]...value<newline>
read key [key]...<newline>
delete key [key]...<newline>
keys [key] [key]....<newline>
Other commands:
quit
Response:
"STATUS: <status string>\nSIZE: <integer>\n<integer> long string of bytes\n\n"
--------------
On Disk
--------------
The initial and maximum sizes are hard-coded in the application along
with the units of growth and storage. Eventually, these will become
configurable options. For now, it's just simpler this way.
Roxanne's persistent storage lives in /var/roxanne by default.
$ du -skh /var/roxanne/*
128M /var/roxanne/block_bitmap
64K /var/roxanne/db
64M /var/roxanne/idx
24K /var/roxanne/keydb
0B /var/roxanne/keydb_freelist
### The Block Bitmap (/var/roxanne/block_bitmap)
Each _bit_ in this file corresponds to a 4 KB block in the 'db' file.
Roxanne processes memory-map this file and use it keep track of used
and free blocks. Only one process may update the block bitmap at any
time. The block bitmap file provides a way to very quickly find and
reserve a contiguous set of blocks to store values.The block_bitmap
file is regularly flushed to disk with msync() after each reservation
request. The size of the block bitmap file is fixed at 128MB which is
enough space to keep track of 1,073,741,824 blocks.
### The database (/var/roxanne/db)
All values are stored in the db file. The unit of storage in the 'db'
file is the block, and all blocks are 4KB.
The db file starts out small and grows as more values are added to the
the database. All values inserted into the database are stored in
contiguous blocks. This simplifies the code, and provides for fast access,
but some fragmentation will occur over time if the database serves lots
of reads and writes of varying size.
#### The hash index (/var/roxanne/idx)
Initially sized at 64 megabytes, the hash index stores the full, composite
keys for values in the db file. An entry in the index is comprised of a
key stored as a string, an integer that represents a byte-offset in the
db file, an integer representing the length in bytes of the value, and
a third integer that represents a byte-offset in the idx file for
chaining additional keys.
Hash collisions are resolved by the separate chaining method. Each
index entry is 1024 bytes, so the hash table has 65,536 slots. When a
collision occurs, the colliding key is appended to the end if the index
file and the key in the slot where the collision occurred has its
'next' pointer updated to be the byte-offest in the index file where
the append occurred.
#### The keydb (/var/roxanne/keydb)
Initially sized at 0 bytes, the keydb stores the composite key hierarchy.
Each level of the hierarchy is stored as a binary tree.
Each node in the hierarchy has: a key-part, a left pointer (less-than),
a right pointer (greater-than), and a next pointer that points to the
next level in the hierarchy.
Nodes have a reference count so that duplicate key-parts don't require
additional space. Unfortunately, the database does not yet support
reclamation of keydb nodes with a reference count of 0.
##### Example
Consider the following two composite keys (values left off):
foo bar toast
foo bar jam
This set of two composite keys comprises 4 nodes in the keydb, stored
like so:
key refcount next-pointer left-pointer right-pointer
------------------------------------------------------------
foo 2 bar NULL NULL
bar 2 toast NULL NULL
toast 1 NULL jam NULL
jam 1 NULL NULL NULL
Next, add these composite keys (again, values left off).
foo bar whiskey
zen
zoo
egg
The table now looks like this:
key refcount next-pointer left-pointer right-pointer
------------------------------------------------------------
foo 3 bar egg zen
bar 3 toast NULL NULL
toast 1 NULL jam whiskey
jam 1 NULL NULL NULL
whiskey 1 NULL NULL NULL
zen 1 NULL NULL zoo
zoo 1 NULL NULL NULL
egg 1 NULL NULL NULL
Here is the composite-key space represented graphically:
+---+foo+----+
| + |
| | |
| | |
v v v
egg bar zen+--+
+ |
| v
| v
v zoo
+--+toast+--+
| |
| |
| |
v v
jam whiskey