<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="keywords" content="Erlang, ETS, Erlang Term Storage, database, table, match specification, dets, set, bag, fun2ms" />
<meta name="description" content="Learning to use ETS tables, an in-memory database for Erlang. Side concepts such as match specifications are also seen, and we refactor a name registry server to use ETS." />
<meta name="google-site-verification" content="mi1UCmFD_2pMLt2jsYHzi_0b6Go9xja8TGllOSoQPVU" />
<link rel="stylesheet" type="text/css" href="static/css/screen.css" media="screen" />
<link rel="stylesheet" type="text/css" href="static/css/sh/shCore.css" media="screen" />
<link rel="stylesheet" type="text/css" href="static/css/sh/shThemeLYSE2.css" media="screen" />
<link rel="stylesheet" type="text/css" href="static/css/print.css" media="print" />
<link href="rss" type="application/rss+xml" rel="alternate" title="LYSE news" />
<link rel="icon" type="image/png" href="favicon.ico" />
<link rel="apple-touch-icon" href="static/img/touch-icon-iphone.png" />
<link rel="apple-touch-icon" sizes="72x72" href="static/img/touch-icon-ipad.png" />
<link rel="apple-touch-icon" sizes="114x114" href="static/img/touch-icon-iphone4.png" />
<title>Bears, ETS, Beets | Learn You Some Erlang for Great Good!</title>
</head>
<body>
<div id="wrapper">
<div id="header">
<h1>Learn you some Erlang</h1>
<span>for great good!</span>
</div> <!-- header -->
<div id="menu">
<ul>
<li><a href="content.html" title="Home">Home</a></li>
<li><a href="faq.html" title="Frequently Asked Questions">FAQ</a></li>
<li><a href="rss" title="Latest News">RSS</a></li>
<li><a href="static/erlang/learn-you-some-erlang.zip" title="Source Code">Code</a></li>
</ul>
</div><!-- menu -->
<div id="content">
<div class="noscript"><noscript>Hey there, it appears your Javascript is disabled. That's fine, the site works without it. However, you might prefer reading it with syntax highlighting, which requires Javascript!</noscript></div>
<h2>Bears, ETS, Beets</h2>
<img class="right" src="static/img/beets.png" width="333" height="171" alt="a beet" title="Taken from the Schrute farm" />
<p>Something we've been doing time and time again has been to implement some kind of storage device as a process. We've done fridges to store things, built <code>regis</code> to register processes, seen key/value stores, etc. If we were programmers doing object-oriented design, we would be having a bunch of singletons floating around, and special storage classes and whatnot. In fact, wrapping data structures like <code>dict</code>s and <code>gb_trees</code> in processes is a bit like that.</p>
<p>Holding data structures in a process is actually fine for a lot of cases — whenever we actually need that data to do some task within the process, as internal state, and so on. We've had plenty of valid uses and we shouldn't change that. However, there is one case where it is possibly not the best choice: when the process holds a data structure for the sake of sharing it with other processes and little more.</p>
<p>One of the applications we've written is guilty of that. Can you guess which? Of course you can; I've mentioned it at the end of last chapter: regis needs to be rewritten. It's not that it doesn't work or can't do its job well, but because it acts as a gateway to share data with potentially <em>a lot</em> of other processes, there is an architectural problem with it.</p>
<p>See, regis is this central application to do messaging in Process Quest (and anything else that would use it), and pretty much every message going to a named process has to go through it. This means that even though we took great care to make our applications very concurrent with independent actors and made sure our supervision structure was right to scale up, all of our operations will depend on a central regis process that will need to answer messages one by one:</p>
<img class="center explanation" src="static/img/central-regis.png" width="393" height="274" alt="Shows 6 clients (green bubbles) with arrows pointing to a central regis server (blue rectangle)" />
<p>If we have a lot of message passing going on, regis risks getting busier and busier, and if the demand is high enough our whole system will become sequential and slow. That's pretty bad.</p>
<div class="note">
<p><strong>Note:</strong> we have no direct proof that regis is a bottleneck within Process Quest. In fact, Process Quest does very little messaging compared to many other applications in the wild. If we were using regis for something that requires a lot more messaging and lookups, then the problems would be more apparent.</p>
</div>
<p>The few ways we'd have to get around that would be to either split regis into subprocesses to make lookups faster by sharding the data, or find a way to store the data in some database that will allow for parallel and concurrent access of the data. While the first way to do it would be very interesting to explore, we'll go through an easier path by doing the latter.</p>
<p>Erlang has something called ETS (Erlang Term Storage) tables. ETS tables are an efficient in-memory database included with the Erlang virtual machine. They sit in a part of the virtual machine where destructive updates are allowed and where garbage collection dares not approach. They're generally fast, and a pretty easy way for Erlang programmers to optimize some of their code when parts of it get too slow.</p>
<p>ETS tables allow limited concurrency in reads and writes (much better than none at all for a process' mailbox) in a way that could let us optimize away a lot of the pain.</p>
<div class="note koolaid">
<p><strong>Don't Drink Too Much Kool-Aid:</strong><br />
While ETS tables are a nice way to optimize, they should still be used with some care. By default, the VM is limited to 1400 ETS tables. While it is possible to change that number (<code>erl -env ERL_MAX_ETS_TABLES Number</code>), this low default is a good sign that you should try to avoid having one table per process in general.</p>
</div>
<p>But before we rewrite regis to use ETS, we should try to understand a bit of ETS' principles beforehand.</p>
<h3><a class="section" name="the-concepts-of-ets">The Concepts of ETS</a></h3>
<p>ETS tables are implemented as BIFs in the <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html">ets</a></code> module. The main design objectives of ETS were to provide a way to store large amounts of data in Erlang with constant access time (functional data structures usually tend to flirt with logarithmic access time) and to have such storage look as if it were implemented as processes in order to keep their use simple and idiomatic.</p>
<div class="note">
<p><strong>Note:</strong> having tables look like processes doesn't mean that you can spawn them or link to them, but that they can respect semantics of nothing-shared, wrapping calls behind functional interfaces, having them handle any native data type for Erlang, and having the possibility to give them names (in a separate registry), etc.</p>
</div>
<p>All ETS tables natively store Erlang tuples containing whatever you want, where one of the tuple elements will act as a primary key that you use to sort things. That is to say, having tuples of people of the form <code>{Name, Age, PhoneNumber, Email}</code> will let you have a table that looks like:</p>
<pre class="brush:erl">
{Name, Age, PhoneNumber, Email},
{Name, Age, PhoneNumber, Email},
{Name, Age, PhoneNumber, Email},
{Name, Age, PhoneNumber, Email},
{Name, Age, PhoneNumber, Email},
...
</pre>
<p>So if we say that we want to have the table's index be the e-mail addresses, we can do this by telling ETS to set the key position to 4 (we'll see how to do this in a bit, when we get to actual ETS function calls). Once you've decided on a key, you can choose different ways to store data into tables:</p>
<dl>
<dt>set</dt>
<dd>A set table will tell you that each key instance must be unique. There can be no duplicate e-mail in the database above. Sets are great when you need to use a standard key/value store with constant time access.</dd>
<dt>ordered_set</dt>
<dd>There can still only be one key instance per table, but <code>ordered_set</code> adds a few other interesting properties. The first is that elements in an <code>ordered_set</code> table will be ordered (who would have thought?!). The first element of the table is the smallest one, and the last element is the largest one. If you traverse a table iteratively (jumping to the next element over and over again), the values should be increasing, which is not necessarily true of <code>set</code> tables. Ordered set tables are great when you frequently need to operate on ranges (I want entries 12 to 50!). They will, however, have the downside of being slower in their access time (<code>O(log N)</code> where <var>N</var> is the number of objects stored).</dd>
<dt>bag</dt>
<dd>A bag table can have multiple entries with the same key, as long as the tuples themselves are different. This means that the table can have <code>{key, some, values}</code> and <code>{key, other, values}</code> inside of it without a problem, which would be impossible with sets (they have the same key). However, you couldn't have <code>{key, some, values}</code> twice in the table as they would be entirely identical.</dd>
<dt>duplicate_bag</dt>
<dd>The tables of this type work like <code>bag</code> tables, with the exception that they do allow entirely identical tuples to be held multiple times within the same table.</dd>
</dl>
<div class="note">
<p><strong>Note:</strong> ordered_set tables will see the values 1 and 1.0 as identical for all operations. Other tables will see them as different.</p>
</div>
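<p>To make the four table types a bit more concrete, here is a minimal sketch that inserts the same tuples into one table of each type and reports how many objects are kept. The module name <code>ets_types_demo</code> and its functions are made up for illustration; the <code>ets</code> calls it uses (<code>ets:new/2</code>, <code>ets:insert/2</code>, <code>ets:info/2</code>, <code>ets:delete/1</code>) are covered or referenced in the following sections.</p>
<pre class="brush:erl">
-module(ets_types_demo).   %% hypothetical module, for illustration only
-export([run/0]).

%% Creates a table of the given type, inserts every object in order,
%% and returns the number of objects the table ends up holding.
count(Type, Objects) ->
    Tab = ets:new(demo, [Type]),
    lists:foreach(fun(Object) -> ets:insert(Tab, Object) end, Objects),
    Size = ets:info(Tab, size),
    ets:delete(Tab),
    Size.

run() ->
    Types = [set, ordered_set, bag, duplicate_bag],
    %% Same key three times; the last two tuples are entirely identical.
    SameKey = [{key, some, values}, {key, other, values}, {key, other, values}],
    %% The keys 1 and 1.0 only count as the same key in ordered_set tables.
    Numbers = [{1, integer}, {1.0, float}],
    lists:map(fun(Type) ->
                  {Type, count(Type, SameKey), count(Type, Numbers)}
              end, Types).
</pre>
<p>Calling <code>ets_types_demo:run()</code> returns <code>[{set,1,2},{ordered_set,1,1},{bag,2,2},{duplicate_bag,3,2}]</code>: sets keep one object per key, bags drop only the fully identical tuple, duplicate bags keep everything, and only the <code>ordered_set</code> table treats 1 and 1.0 as the same key.</p>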
<p>The last general concept to learn about is that ETS tables will have the concept of a controlling process, much like sockets do. When a process calls a function that starts a new ETS table, that process is the owner of the table.</p>
<p>By default, only the owner of the table can write to it, but everyone can read from it. This is known as the <em>protected</em> level of permissions. You can also choose to set the permissions to <em>public</em>, where everyone can read and write, or <em>private</em>, where only the owner can read or write.</p>
<img class="right" src="static/img/et.png" width="320" height="306" />
<p>The concept of table ownership goes a bit further. The ETS table is intimately linked to the process. If the process dies, the table disappears (and so does all of its content). However, the table can be given away, much like we did with sockets and their controlling processes, or an heir can be determined so that if the owner process dies, the table is automatically given away to the heir process.</p>
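<p>The ownership rules can be sketched with a short-lived owner process. This is only an illustration under a few assumptions: the module <code>ets_owner_demo</code> and the helper <code>start_owner/1</code> are hypothetical names, and the heir is configured through the <code>{heir, Pid, Data}</code> option of <code>ets:new/2</code>, described in the next section.</p>
<pre class="brush:erl">
-module(ets_owner_demo).   %% hypothetical module, for illustration only
-export([run/0]).

%% Spawns a process that creates a table with the given options, tells
%% the caller about it, and exits. Returns the table identifier once the
%% owner process is known to be dead.
start_owner(Options) ->
    Parent = self(),
    {Pid, Ref} = spawn_monitor(fun() ->
        T = ets:new(demo, Options),
        ets:insert(T, {key, value}),
        Parent ! {table, self(), T}
    end),
    Tab = receive {table, Pid, T0} -> T0 end,
    receive {'DOWN', Ref, process, Pid, _Reason} -> ok end,
    Tab.

run() ->
    %% No heir: the table goes away together with its owner.
    Orphan = start_owner([set]),
    Gone = ets:info(Orphan),
    %% With the calling process as heir: the table is handed over to us,
    %% and we are told about it through an 'ETS-TRANSFER' message.
    Kept = start_owner([set, {heir, self(), handed_over}]),
    receive {'ETS-TRANSFER', Kept, _FromPid, handed_over} -> ok end,
    Rows = ets:lookup(Kept, key),
    ets:delete(Kept),
    {Gone, Rows}.
</pre>
<p>In the returned tuple, <var>Gone</var> is what <code>ets:info/1</code> reports for the table whose owner died without an heir (the atom <code>undefined</code> when the table no longer exists), while <var>Rows</var> holds the data that survived in the inherited table.</p>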
<h3><a class="section" name="ets-phone-home">ETS Phone Home</a></h3>
<p>To start an ETS table, the function <code>ets:new/2</code> has to be called. The function takes the argument <var>Name</var> and then a list of options. In return, what you get is a unique identifier necessary to use the table, comparable to a Pid for processes. The options can be any of these:</p>
<dl>
<dt><code>Type = set | ordered_set | bag | duplicate_bag</code></dt>
<dd>Sets the type of table you want to have, as described in the previous section. The default value is <code>set</code>.</dd>
<dt><code>Access = private | protected | public</code></dt>
<dd>Lets us set the permissions on the table as described earlier. The default option is <code>protected</code>.</dd>
<dt><code>named_table</code></dt>
<dd>Funnily enough, if you call <code>ets:new(some_name, [])</code>, you'll be starting a protected set table, without a name. For the name to be used as a way to contact a table (and to be made unique), the option <code>named_table</code> has to be passed to the function. Otherwise, the name of the table will purely be for documentation purposes and will appear in functions such as <code>ets:i()</code>, which print information about all ETS tables in the system.</dd>
<dt><code>{keypos, Position}</code></dt>
<dd>As you may (and should) recall, ETS tables work by storing tuples. The <var>Position</var> parameter holds an integer from 1 to <var>N</var> telling which element of each tuple shall act as the primary key of the database table. The default key position is set to 1. This means you have to be careful if you're using records as each record's first element is always going to be the record's name (remember what they look like in their tuple form). If you want to use any field as the key, use <code>{keypos, #RecordName.FieldName}</code>, as it will return the position of <var>FieldName</var> within the record's tuple representation.</dd>
<dt><code>{heir, Pid, Data} | {heir, none}</code></dt>
<dd>As mentioned in the previous section, ETS tables have a process that acts as their parent. If the process dies, the table disappears. If the data attached to a table is something you might want to keep alive, then defining an heir can be useful. If the process attached to a table dies, the heir receives a message saying <code>{'ETS-TRANSFER', TableId, FromPid, Data}</code>, where <var>Data</var> is the element passed when the option was first defined. The table is automatically inherited by the heir. By default, no heir is defined. It is possible to define or change an heir at a later point in time by calling <code>ets:setopts(Table, {heir, Pid, Data})</code> or <code>ets:setopts(Table, {heir, none})</code>. If you simply want to give the table away, call <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#give_away/3">ets:give_away/3</a></code>.</dd>
<dt><code>{read_concurrency, true | false}</code></dt>
<dd>This is an option to optimize the table for read concurrency. Setting this option to true means that reads become way cheaper to do, but then make switching to writes a lot more expensive. Basically, this option should be enabled when you do a lot of reading and little writing and need an extra kick of performance. If you do some reading, some writing and they are interleaved, using this option might even hurt performance.</dd>
<dt><code>{write_concurrency, true | false}</code></dt>
<dd>Usually, writing to a table will lock the whole thing and nobody else can access it, either for reading or writing to it, until the write is done. Setting this option to 'true' lets both reads and writes be done concurrently, without affecting the <a class="external" href="http://en.wikipedia.org/wiki/ACID">ACID</a> properties of ETS. Doing this, however, will reduce the performance of sequential writes by a single process and also the capacity of concurrent reads. You can combine this option with 'read_concurrency' when both writes and reads come in large bursts.</dd>
<dt><code>compressed</code></dt>
<dd>Using this option will allow the data in the table to be compressed for most fields, but not the primary key. This comes at the cost of performance when it comes to inspecting entire elements of the table, as we will see with the next functions.</dd>
</dl>
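<p>One way to check what a given list of options actually does is to create a table and ask <code>ets:info/2</code> about it. The sketch below does this for the defaults and for a customized table. The module name <code>ets_options_demo</code> and the table names are arbitrary choices made for the example.</p>
<pre class="brush:erl">
-module(ets_options_demo).   %% hypothetical module, for illustration only
-export([run/0]).

%% Summarizes a few of the settings of an existing table.
describe(Tab) ->
    {ets:info(Tab, type),
     ets:info(Tab, protection),
     ets:info(Tab, keypos),
     ets:info(Tab, named_table)}.

run() ->
    %% No options at all: every default applies, and the name is only
    %% there for documentation purposes.
    Plain = ets:new(plain, []),
    Defaults = describe(Plain),
    ets:delete(Plain),
    %% With named_table, ets:new/2 returns the name itself, which is then
    %% used to refer to the table.
    tuned = ets:new(tuned, [bag, public, named_table, {keypos, 2},
                            {read_concurrency, true}]),
    Tuned = describe(tuned),
    ets:delete(tuned),
    {Defaults, Tuned}.
</pre>
<p>Running <code>ets_options_demo:run()</code> returns <code>{{set,protected,1,false},{bag,public,2,true}}</code>, which matches the defaults listed above.</p>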
<p>Then, the opposite of table creation is table destruction. For that one, all that's needed is to call <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#delete/1">ets:delete(Table)</a></code> where <var>Table</var> is either a table id or the name of a named table. If you want to delete a single entry from the table, a very similar function call is required: <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#delete/2">ets:delete(Table, Key)</a></code>.</p>
<p>Two more functions are required for very basic table handling: <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#insert/2">insert(Table, ObjectOrObjects)</a></code> and <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#lookup/2">lookup(Table, Key)</a></code>. In the case of <code>insert/2</code>, <var>ObjectOrObjects</var> can be either a single tuple or a list of tuples to insert:</p>
<pre class="brush:eshell">
1> ets:new(ingredients, [set, named_table]).
ingredients
2> ets:insert(ingredients, {bacon, great}).
true
3> ets:lookup(ingredients, bacon).
[{bacon,great}]
4> ets:insert(ingredients, [{bacon, awesome}, {cabbage, alright}]).
true
5> ets:lookup(ingredients, bacon).
[{bacon,awesome}]
6> ets:lookup(ingredients, cabbage).
[{cabbage,alright}]
7> ets:delete(ingredients, cabbage).
true
8> ets:lookup(ingredients, cabbage).
[]
</pre>
<p>You'll notice that the <code>lookup</code> function returns a list. It will do that for all types of tables, even though set-based tables will always return at most one item. It just means that you should be able to use the <code>lookup</code> function in a generic way even when you use bags or duplicate bags (which may return many values for a single key).</p>
<p>Another thing that takes place in the snippet above is that inserting the same key twice overwrites it. This will always happen in sets and ordered sets, but not in bags or duplicate bags. If you want to avoid this, the function <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#insert_new/2">ets:insert_new/2</a></code> might be what you want, as it will only insert elements if they are not in the table already.</p>
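<p>As a quick illustration of the difference, the following sketch calls <code>ets:insert_new/2</code> twice with the same key on a <code>set</code> table. The module name <code>ets_insert_new_demo</code> is hypothetical.</p>
<pre class="brush:erl">
-module(ets_insert_new_demo).   %% hypothetical module, for illustration only
-export([run/0]).

run() ->
    Tab = ets:new(ingredients, [set]),
    %% The key is not in the table yet, so the object is inserted.
    First = ets:insert_new(Tab, {bacon, great}),
    %% The key already exists: nothing is overwritten this time.
    Second = ets:insert_new(Tab, {bacon, awesome}),
    Stored = ets:lookup(Tab, bacon),
    ets:delete(Tab),
    {First, Second, Stored}.
</pre>
<p>The result is <code>{true,false,[{bacon,great}]}</code>: the return value tells you whether the insertion took place, and the original entry is left untouched.</p>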
<div class="note">
<p><strong>Note:</strong> The tuples in an ETS table do not all have to be of the same size, although keeping them the same size should be seen as good practice. It is however necessary that each tuple has at least as many elements as the key position.</p>
</div>
<p>There's another lookup function available if you need to only fetch part of a tuple. The function is <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#lookup_element/3">lookup_element(TableID, Key, PositionToReturn)</a></code> and it will return the element at that position in the entry that matched (or a list of such elements if there is more than one match in a bag or duplicate bag table). If no entry has that key, the function errors out with <code>badarg</code> as a reason.</p>
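<p>The sketch below ties together the key position, records and <code>lookup_element/3</code>. It assumes a made-up <code>person</code> record keyed on its e-mail field, and the module name <code>ets_record_demo</code>, the helper <code>reason/1</code> and the sample data are invented for the example.</p>
<pre class="brush:erl">
-module(ets_record_demo).   %% hypothetical module, for illustration only
-export([run/0]).

-record(person, {name, age, phone, email}).

%% Runs Fun and returns the error reason if it fails with an error.
reason(Fun) ->
    try Fun() catch error:Reason -> Reason end.

run() ->
    %% #person.email evaluates to 5: the record name sits in position 1.
    Tab = ets:new(people, [set, {keypos, #person.email}]),
    Ann = #person{name = "Ann", age = 30, phone = "555-0100",
                  email = "ann@example.org"},
    true = ets:insert(Tab, Ann),
    %% Fetch a single field rather than copying the whole record.
    Age = ets:lookup_element(Tab, "ann@example.org", #person.age),
    %% A tuple with fewer elements than the key position is rejected.
    TooShort = reason(fun() -> ets:insert(Tab, {only, three, elements}) end),
    %% Looking up an element for a key that is not in the table fails too.
    Missing = reason(fun() ->
                  ets:lookup_element(Tab, "nobody@example.org", #person.age)
              end),
    ets:delete(Tab),
    {Age, TooShort, Missing}.
</pre>
<p>Here <code>ets_record_demo:run()</code> returns <code>{30,badarg,badarg}</code>. Wrapping the failing calls in a <code>try</code> is only done to show the error reason; in real code you would usually let such errors crash the caller.</p>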
<p>In any case let's try again with a bag:</p>
<pre class="brush:eshell">
9> TabId = ets:new(ingredients, [bag]).
16401
10> ets:insert(TabId, {bacon, delicious}).
true
11> ets:insert(TabId, {bacon, fat}).
true
12> ets:insert(TabId, {bacon, fat}).
true
13> ets:lookup(TabId, bacon).
[{bacon,delicious},{bacon,fat}]
</pre>
<p>As this is a bag, <code>{bacon, fat}</code> is only there once even though we inserted it twice, but you can see that we can still have more than one 'bacon' entry. The other thing to look at here is that without passing in the <code>named_table</code> option, we have to use the table identifier returned by <code>ets:new/2</code> (bound to <var>TabId</var> here) to use the table.</p>
<div class="note">
<p><strong>Note:</strong> if at any point while copying these examples your shell crashes, the tables are going to disappear as their parent process (the shell) has disappeared.</p>
</div>
<p>The last basic operations we can make use of will be about traversing tables one by one. If you're paying attention, <code>ordered_set</code> tables are the best fit for this:</p>
<pre class="brush:eshell">
14> ets:new(ingredients, [ordered_set, named_table]).
ingredients
15> ets:insert(ingredients, [{ketchup, "not much"}, {mustard, "a lot"}, {cheese, "yes", "goat"}, {patty, "moose"}, {onions, "a lot", "caramelized"}]).
true
16> Res1 = ets:first(ingredients).
cheese
17> Res2 = ets:next(ingredients, Res1).
ketchup
18> Res3 = ets:next(ingredients, Res2).
mustard
19> ets:last(ingredients).
patty
20> ets:prev(ingredients, ets:last(ingredients)).
onions
</pre>
<p>As you can see, elements are now in sorted order, and they can be accessed one after the other, both forwards and backwards. Oh yeah, and then we need to see what happens in boundary conditions:</p>
<pre class="brush:eshell">
21> ets:next(ingredients, ets:last(ingredients)).
'$end_of_table'
22> ets:prev(ingredients, ets:first(ingredients)).
'$end_of_table'
</pre>
<p>When you see atoms starting with a <code>$</code>, you should know that they're some special value (chosen by convention) by the OTP team telling you about something. Whenever you're trying to iterate outside of the table, you'll see these <code>'$end_of_table'</code> atoms.</p>
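<p>Put together, <code>ets:first/1</code>, <code>ets:next/2</code> and the <code>'$end_of_table'</code> marker are enough to write a full traversal. Here is one possible sketch that collects every key of a table in order. The module <code>ets_walk_demo</code> and its functions are hypothetical names, and the data is the burger example from above.</p>
<pre class="brush:erl">
-module(ets_walk_demo).   %% hypothetical module, for illustration only
-export([keys/1, run/0]).

%% Returns all the keys of the table, in traversal order.
keys(Tab) ->
    walk(Tab, ets:first(Tab), []).

%% Stops when the traversal runs past the last key.
walk(_Tab, '$end_of_table', Acc) ->
    lists:reverse(Acc);
walk(Tab, Key, Acc) ->
    walk(Tab, ets:next(Tab, Key), [Key | Acc]).

run() ->
    Tab = ets:new(ingredients, [ordered_set]),
    ets:insert(Tab, [{ketchup, "not much"}, {mustard, "a lot"},
                     {cheese, "yes", "goat"}, {patty, "moose"},
                     {onions, "a lot", "caramelized"}]),
    Keys = keys(Tab),
    ets:delete(Tab),
    Keys.
</pre>
<p>With an <code>ordered_set</code> table, <code>ets_walk_demo:run()</code> returns <code>[cheese,ketchup,mustard,onions,patty]</code>. The same <code>keys/1</code> function works on the other table types, but the order of the keys is then not something you should rely on.</p>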
<p>So we know how to use ETS as a very basic key-value store. There are more advanced uses now, when we need more than just matching on keys.</p>
<h3><a class="section" name="meeting-your-match">Meeting Your Match</a></h3>
<img class="left" src="static/img/match.png" width="163" height="175" alt="a match falling in a puddle of gas" />
<p>There are plenty of functions to be used with ETS when it comes to finding records through more specialized mechanisms.</p>
<p>When we think about it, the best way to select things would be with pattern matching, right? The ideal scenario would be to be able to somehow store a pattern to match on within a variable (or as a data structure), pass that to some ETS function and let the said function do its thing.</p>
<p>This is called <em>higher order pattern matching</em> and sadly, it is not available in Erlang. In fact, very few languages have it. Instead, Erlang has some kind of sublanguage, agreed upon by Erlang programmers, that is used to describe pattern matching as a bunch of regular data structures.</p>
<p>This notation is based on tuples to fit nicely with ETS. It simply lets you specify variables (regular and "don't care" variables), that can be mixed with the tuples to do pattern matching. Variables are written as <code>'$0'</code>, <code>'$1'</code>, <code>'$2'</code>, and so on (the number has no importance except in how you'll get the results) for regular variables. The "don't care" variable can be written as <code>'_'</code>. All these atoms can take form in a tuple like:</p>
<pre class="brush:erl">
{items, '$3', '$1', '_', '$3'}
</pre>
<p>This is roughly equivalent to saying <code>{items, C, A, _, C}</code> with regular pattern matching. As such, you can guess that the first element needs to be the atom <code>items</code>, that the second and fifth slots of the tuple need to be identical, etc.</p>
<p>To make use of this notation in a more practical setting, two functions are available: <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#match/2">match/2</a></code> and <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#match_object/2">match_object/2</a></code> (there are <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#match/3">match/3</a></code> and <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#match_object/3">match_object/3</a></code> available as well, but their use is outside the scope of this chapter and readers are encouraged to check the docs for details.) The former will return the variables of the pattern, while the latter will return the whole entry that matched the pattern.</p>
<pre class="brush:eshell">
1> ets:new(table, [named_table, bag]).
table
2> ets:insert(table, [{items, a, b, c, d}, {items, a, b, c, a}, {cat, brown, soft, loveable, selfish}, {friends, [jenn,jeff,etc]}, {items, 1, 2, 3, 1}]).
true
3> ets:match(table, {items, '$1', '$2', '_', '$1'}).
[[a,b],[1,2]]
4> ets:match(table, {items, '$114', '$212', '_', '$6'}).
[[d,a,b],[a,a,b],[1,1,2]]
5> ets:match_object(table, {items, '$1', '$2', '_', '$1'}).
[{items,a,b,c,a},{items,1,2,3,1}]
6> ets:delete(table).
true
</pre>
<p>The nice thing about <code>match/2-3</code> as a function is that it only returns what is strictly necessary to be returned. This is useful because as mentioned earlier, ETS tables are following the nothing-shared ideals. If you have very large records, only copying the necessary fields might be a good thing to do. Anyway, you'll also notice that while the numbers in variables have no explicit meaning, their order is important. In the final list of values returned, the value bound to <code>'$114'</code> will always come after the value bound to <code>'$6'</code> by the pattern. If nothing matches, an empty list is returned.</p>
<p>It is also possible you might want to delete entries based on such a pattern match. In these cases, the function <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#match_delete/2">ets:match_delete(Table, Pattern)</a></code> is what you want.</p>
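<p>As a small sketch of how this works, the code below removes every <code>items</code> entry whose second and fifth elements are identical, using the same kind of pattern as with <code>match/2</code>. The module name <code>ets_match_delete_demo</code> is hypothetical and the data is a subset of the table used above.</p>
<pre class="brush:erl">
-module(ets_match_delete_demo).   %% hypothetical module, for illustration only
-export([run/0]).

run() ->
    Tab = ets:new(table, [bag]),
    ets:insert(Tab, [{items, a, b, c, d},
                     {items, a, b, c, a},
                     {items, 1, 2, 3, 1},
                     {cat, brown, soft, loveable, selfish}]),
    %% Deletes the entries where slots 2 and 5 hold the same value.
    true = ets:match_delete(Tab, {items, '$1', '_', '_', '$1'}),
    %% Sorting makes the result independent of the table's internal order.
    Left = lists:sort(ets:tab2list(Tab)),
    ets:delete(Tab),
    Left.
</pre>
<p>What is left is <code>[{cat,brown,soft,loveable,selfish},{items,a,b,c,d}]</code>. Note that <code>match_delete/2</code> returns <code>true</code> whether or not anything was deleted.</p>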
<img class="right" src="static/img/claw-game.png" width="168" height="241" alt="A claw game thing" />
<p>This is all fine and lets us put any kind of value to do basic pattern matching in a weird way. It would be pretty neat if it were possible to have things like comparisons and ranges, explicit ways to format the output (maybe lists isn't what we want), and so on. Oh wait, you can!</p>
<h3><a class="section" name="you-have-been-selected">You Have Been Selected</a></h3>
<p>This is when we get something more equivalent to true function heads-level pattern matching, including very simple guards. If you've ever used a SQL database before, you might have seen ways to do queries where you compare elements that are greater, equal, smaller, etc. than other elements. This is the kind of good stuff we want here.</p>
<p>The people behind Erlang thus took the syntax we've seen for matches and augmented it in crazy ways until it was powerful enough. Sadly, they also made it unreadable. Here's what it can look like:</p>
<pre class="brush:erl">
[{{'$1','$2',<<1>>,'$3','$4'},
[{'andalso',{'>','$4',150},{'<','$4',500}},
{'orelse',{'==','$2',meat},{'==','$2',dairy}}],
['$1']},
{{'$1','$2',<<1>>,'$3','$4'},
[{'<','$3',4.0},{is_float,'$3'}],
['$1']}]
</pre>
<p>This is pretty ugly, not the data structure you would want your children to look like. Believe it or not, we'll learn how to write these things called <em>match specifications</em>. Not under that form, no, that would be a bit too hard for no reason. We'll still learn how to read them though! Here's what it looks like a bit from a higher level view:</p>
<pre class="brush:erl">
[{InitialPattern1, Guards1, ReturnedValue1},
{InitialPattern2, Guards2, ReturnedValue2}].
</pre>
<p>Or from a yet higher view:</p>
<pre class="brush:erl">
[Clause1,
Clause2]
</pre>
<p>So yeah, things like that represent, roughly, the pattern in a function head, then the guards, then the body of a function. The format is still limited to <code>'$N'</code> variables for the initial pattern, exactly the same as it was for match functions. The new sections are the guard patterns, which allow us to do something quite similar to regular guards. If we look closely at the guard <code>[{'<','$3',4.0},{is_float,'$3'}]</code>, we can see that it is quite similar to <code>... when Var < 4.0, is_float(Var) -> ...</code> as a guard.</p>
<p>The next guard, more complex this time, is:</p>
<pre class="brush:erl">
[{'andalso',{'>','$4',150},{'<','$4',500}},
{'orelse',{'==','$2',meat},{'==','$2',dairy}}]
</pre>
<p>Translating it gives us a guard that looks like <code>... when Var4 > 150 andalso Var4 < 500, Var2 == meat orelse Var2 == dairy -> ...</code>. Got it?</p>
<p>Each operator or guard function works with a prefix syntax, meaning that we use the order <code>{FunctionOrOperator, Arg1, ..., ArgN}</code>. So <code>is_list(X)</code> becomes <code>{is_list, '$1'}</code>, <code>X andalso Y</code> becomes <code>{'andalso', X, Y}</code>, and so on. Reserved keywords such as <code>andalso</code>, <code>orelse</code> and operators like <code>==</code> need to be put into atoms so the Erlang parser won't choke on them.</p>
<p>The last section of the pattern is what you want to return. Just put the variables you need in there. If you want to return the full input of the match specification, use the variable <code>'$_'</code> to do so. A <a class="docs" href="http://www.erlang.org/doc/apps/erts/match_spec.html">full specification of match specifications</a> can be found in the Erlang Documentation.</p>
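<p>For reading practice, here is a sketch of a small hand-written match specification being run against a table with <code>ets:select/2</code>, the <code>ets</code> function that takes a match specification and returns the results. The module name <code>ets_select_demo</code>, the food table and its <code>{Name, Type, Calories}</code> layout are made up for the example.</p>
<pre class="brush:erl">
-module(ets_select_demo).   %% hypothetical module, for illustration only
-export([run/0]).

run() ->
    Tab = ets:new(food, [set]),
    ets:insert(Tab, [{bacon, meat, 540},
                     {cheese, dairy, 400},
                     {celery, vegetable, 15},
                     {ham, meat, 200}]),
    %% Roughly: fun({Name, Type, Calories})
    %%            when Calories > 150, Type == meat orelse Type == dairy ->
    %%              Name
    %%          end
    Spec = [{{'$1', '$2', '$3'},
             [{'>', '$3', 150},
              {'orelse', {'==', '$2', meat}, {'==', '$2', dairy}}],
             ['$1']}],
    Names = lists:sort(ets:select(Tab, Spec)),
    ets:delete(Tab),
    Names.
</pre>
<p>This returns <code>[bacon,cheese,ham]</code>: celery fails both guards, and everything else is either meat or dairy with more than 150 calories. The result is sorted only because a <code>set</code> table gives no ordering guarantee.</p>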
<p>As I said before, we won't learn how to write patterns that way, there's something nicer to do it. ETS comes with what is called a <em>parse transform</em>. Parse transforms are an undocumented (thus not supported by the OTP team) way of accessing the Erlang parse tree halfway through the compiling phase. They let ballsy Erlang programmers transform the code in a module to a new alternative form. Parse transforms can be pretty much anything and change existing Erlang code to almost anything else, as long as it doesn't change the language's syntax or its tokens.</p>
<p>The parse transform coming with ETS needs to be enabled manually for each module that needs it. The way to do it in a module is as follows:</p>
<pre class="brush:erl">
-module(SomeModule).
-include_lib("stdlib/include/ms_transform.hrl").
...
some_function() ->
ets:fun2ms(fun(X) when X > 4 -> X end).
</pre>
<p>The line <code>-include_lib("stdlib/include/ms_transform.hrl").</code> contains some special code that will override the meaning of <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#fun2ms/1">ets:fun2ms(SomeLiteralFun)</a></code> whenever it's being used in a module. Rather than being a higher order function, the parse transform will analyse what is in the fun (the pattern, the guards and the return value), remove the function call to <code>ets:fun2ms/1</code>, and replace it all with an actual match specification. Weird, huh? The best is that because this happens at compile time, there is no overhead to using this way of doing things.</p>
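<p>To see the whole thing work from end to end, here is a minimal sketch of a complete module that builds a match specification with <code>ets:fun2ms/1</code> and hands it to <code>ets:select/2</code>. The module name <code>ets_fun2ms_demo</code> and the table of numbers and their squares are assumptions made for the example.</p>
<pre class="brush:erl">
-module(ets_fun2ms_demo).   %% hypothetical module, for illustration only
-export([spec/0, run/0]).

%% Enables the parse transform that rewrites calls to ets:fun2ms/1.
-include_lib("stdlib/include/ms_transform.hrl").

%% The fun is turned into a match specification at compile time.
spec() ->
    ets:fun2ms(fun({N, Square}) when N > 4 -> Square end).

run() ->
    Tab = ets:new(numbers, [set]),
    ets:insert(Tab, [{1, 1}, {2, 4}, {3, 9}, {4, 16}, {5, 25}, {6, 36}]),
    Squares = lists:sort(ets:select(Tab, spec())),
    ets:delete(Tab),
    Squares.
</pre>
<p>Once compiled, <code>ets_fun2ms_demo:spec()</code> returns <code>[{{'$1','$2'},[{'>','$1',4}],['$2']}]</code> and <code>ets_fun2ms_demo:run()</code> returns <code>[25,36]</code>.</p>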
<p>We can try it in the shell, without the include file this time:</p>
<pre class="brush:eshell">
1> ets:fun2ms(fun(X) -> X end).
[{'$1',[],['$1']}]
2> ets:fun2ms(fun({X,Y}) -> X+Y end).
[{{'$1','$2'},[],[{'+','$1','$2'}]}]
3> ets:fun2ms(fun({X,Y}) when X < Y -> X+Y end).
[{{'$1','$2'},[{'<','$1','$2'}],[{'+','$1','$2'}]}]
4> ets:fun2ms(fun({X,Y}) when X < Y, X rem 2 == 0 -> X+Y end).
[{{'$1','$2'},
[{'<','$1','$2'},{'==',{'rem','$1',2},0}],
[{'+','$1','$2'}]}]
5> ets:fun2ms(fun({X,Y}) when X < Y, X rem 2 == 0; Y == 0 -> X end).
[{{'$1','$2'},
[{'<','$1','$2'},{'==',{'rem','$1',2},0}],
['$1']},
{{'$1','$2'},[{'==','$2',0}],['$1']}]
</pre>
<p>All of these! They are written so easily now! And of course the funs are much simpler to read. How about that complex example from the beginning of the section? Here's what it would be like as a fun:</p>
<pre class="brush:eshell">
6> ets:fun2ms(fun({Food, Type, <<1>>, Price, Calories}) when Calories > 150 andalso Calories < 500, Type == meat orelse Type == dairy; Price < 4.00, is_float(Price) -> Food end).
[{{'$1','$2',<<1>>,'$3','$4'},
[{'andalso',{'>','$4',150},{'<','$4',500}},
{'orelse',{'==','$2',meat},{'==','$2',dairy}}],
['$1']},
{{'$1','$2',<<1>>,'$3','$4'},
[{'<','$3',4.0},{is_float,'$3'}],
['$1']}]
</pre>
<p>It doesn't exactly make sense at first glance, but at least it's much simpler to figure out what it means when variables can actually have a name rather than a number. One thing to be careful about is that not all funs are valid match specifications:</p>
<pre class="brush:eshell">
7> ets:fun2ms(fun(X) -> my_own_function(X) end).
Error: fun containing the local function call 'my_own_function/1' (called in body) cannot be translated into match_spec
{error,transform_error}
8> ets:fun2ms(fun(X,Y) -> ok end).
Error: ets:fun2ms requires fun with single variable or tuple parameter
{error,transform_error}
9> ets:fun2ms(fun([X,Y]) -> ok end).
Error: ets:fun2ms requires fun with single variable or tuple parameter
{error,transform_error}
10> ets:fun2ms(fun({<<X/binary>>}) -> ok end).
Error: fun head contains bit syntax matching of variable 'X', which cannot be translated into match_spec
{error,transform_error}
</pre>
<p>The function head needs to match on a single variable or a tuple, no non-guard functions can be called as part of the return value, assigning values from within binaries is not allowed, etc. Try stuff in the shell, see what you can do.</p>
<div class="note koolaid">
<p><strong>Don't Drink Too Much Kool-Aid:</strong><br />
A function like <code>ets:fun2ms</code> sounds totally awesome, right? You have to be careful with it. The problem is that while <code>ets:fun2ms</code> can handle dynamic funs in the shell (you can pass funs around and it will just eat them up), this isn't possible in compiled modules.</p>
<p>This is due to the fact that Erlang has two kinds of funs: shell funs and module funs. Module funs are compiled down to some compact format understood by the virtual machine. They're opaque and cannot be inspected to know how they are on the inside.</p>
<p>On the other hand, shell funs are abstract terms not yet evaluated. They're made in a way that the shell can call the evaluator on them. The function <code>fun2ms</code> will thus have two versions of itself: one for when you're getting compiled code, and one for when you're in the shell.</p>
<p>This is fine, except that the funs aren't interchangeable with different types of funs. This means that you can't take a compiled fun and try to call <code>ets:fun2ms</code> on it while in the shell, and you can't take a dynamic fun and send it over to a compiled bit of code that's calling <code>fun2ms</code> in there. Too bad!</p>
</div>
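<p>In a compiled module, the usual way to live with this limitation is to keep the fun literal inside the call to <code>ets:fun2ms/1</code> and pass the changing parts as regular variables used in its guards. Here's a small sketch of that idea; the module name, the <code>{Name, Score}</code> table layout and the function names are hypothetical.</p>
<pre class="brush:erl">
-module(ms_compiled).
-include_lib("stdlib/include/ms_transform.hrl").
-export([over/1, demo/0]).

%% The fun is a literal, so the parse transform can rewrite it at
%% compile time. Min is bound outside of the fun and ends up in the
%% match spec as a constant.
over(Min) ->
    ets:fun2ms(fun({Name, Score}) when Score > Min -> Name end).

demo() ->
    Tab = ets:new(scores, [ordered_set]),
    ets:insert(Tab, [{alice, 12}, {bob, 7}, {carol, 30}]),
    Names = ets:select(Tab, over(10)),
    ets:delete(Tab),
    Names.
</pre>
<p>With the data above, <code>ms_compiled:demo()</code> returns <code>[alice,carol]</code>. The value of <var>Min</var> can change from call to call, but the shape of the fun is fixed when the module is compiled.</p>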
<p>To make match specifications useful, it would make sense to use them. This can be done by using the functions <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#select/2">ets:select/2</a></code> to fetch results, <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#select_reverse/2">ets:select_reverse/2</a></code> to get results in reverse in <code>ordered_set</code> tables (for other types, it's the same as <code>select/2</code>), <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#select_count/2">ets:select_count/2</a></code> to know how many results match the specification, and <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#select_delete/2">ets:select_delete(Table, MatchSpec)</a></code> to delete records matching a match specification.</p>
<p>Let's try it, first defining a record for our tables, and then populating them with various goods:</p>
<pre class="brush:eshell">
11> rd(food, {name, calories, price, group}).
food
12> ets:new(food, [ordered_set, {keypos,#food.name}, named_table]).
food
13> ets:insert(food, [#food{name=salmon, calories=88, price=4.00, group=meat},
13> #food{name=cereals, calories=178, price=2.79, group=bread},
13> #food{name=milk, calories=150, price=3.23, group=dairy},
13> #food{name=cake, calories=650, price=7.21, group=delicious},
13> #food{name=bacon, calories=800, price=6.32, group=meat},
13> #food{name=sandwich, calories=550, price=5.78, group=whatever}]).
true
</pre>
<p>We can then try to select food items under a given number of calories:</p>
<pre class="brush:eshell">
14> ets:select(food, ets:fun2ms(fun(N = #food{calories=C}) when C < 600 -> N end)).
[#food{name = cereals,calories = 178,price = 2.79,group = bread},
#food{name = milk,calories = 150,price = 3.23,group = dairy},
#food{name = salmon,calories = 88,price = 4.0,group = meat},
#food{name = sandwich,calories = 550,price = 5.78,group = whatever}]
15> ets:select_reverse(food, ets:fun2ms(fun(N = #food{calories=C}) when C < 600 -> N end)).
[#food{name = sandwich,calories = 550,price = 5.78,group = whatever},
#food{name = salmon,calories = 88,price = 4.0,group = meat},
#food{name = milk,calories = 150,price = 3.23,group = dairy},
#food{name = cereals,calories = 178,price = 2.79,group = bread}]
</pre>
<p>Or maybe what we want is just delicious food:</p>
<pre class="brush:eshell">
16> ets:select(food, ets:fun2ms(fun(N = #food{group=delicious}) -> N end)).
[#food{name = cake,calories = 650,price = 7.21,group = delicious}]
</pre>
<p>Deleting has a little special twist to it. You have to return <code>true</code> in the pattern instead of any kind of value:</p>
<pre class="brush:eshell">
17> ets:select_delete(food, ets:fun2ms(fun(#food{price=P}) when P > 5 -> true end)).
3
18> ets:select_reverse(food, ets:fun2ms(fun(N = #food{calories=C}) when C < 600 -> N end)).
[#food{name = salmon,calories = 88,price = 4.0,group = meat},
#food{name = milk,calories = 150,price = 3.23,group = dairy},
#food{name = cereals,calories = 178,price = 2.79,group = bread}]
</pre>
<p>And as the last selection shows, items over $5.00 were removed from the table.</p>
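<p>Counting works a lot like deleting: <code>ets:select_count/2</code> only counts the objects for which the match specification returns <code>true</code>. Here's a minimal sketch as a compiled module, using a shorter version of the food table; the module name <code>food_count</code> is made up for the example.</p>
<pre class="brush:erl">
-module(food_count).
-include_lib("stdlib/include/ms_transform.hrl").
-export([demo/0]).

-record(food, {name, calories, price, group}).

demo() ->
    Tab = ets:new(food, [ordered_set, {keypos, #food.name}]),
    ets:insert(Tab, [#food{name=salmon, calories=88, price=4.00, group=meat},
                     #food{name=milk, calories=150, price=3.23, group=dairy},
                     #food{name=bacon, calories=800, price=6.32, group=meat}]),
    %% Both match specs return true for the objects to be counted.
    Meat = ets:select_count(Tab,
             ets:fun2ms(fun(#food{group=meat}) -> true end)),
    Pricey = ets:select_count(Tab,
               ets:fun2ms(fun(#food{price=P}) when P > 5 -> true end)),
    ets:delete(Tab),
    {Meat, Pricey}.
</pre>
<p>Here <code>food_count:demo()</code> returns <code>{2,1}</code>: two kinds of meat, and only the bacon costs more than $5.00.</p>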
<p>There are way more functions inside ETS, such as ways to convert the table to lists or files (<code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#tab2list/1">ets:tab2list/1</a></code>, <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#tab2file/2">ets:tab2file/2</a></code>, <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#file2tab/1">ets:file2tab/1</a></code>), or to get information about all tables (<code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#i/0">ets:i/0</a></code>, <code><a class="docs" href="http://erldocs.com/17.3/stdlib/ets.html#info/1">ets:info(Table)</a></code>). Heading over to the official documentation is strongly recommended in this case.</p>
<p>There's also a module called <code><a class="docs" href="http://www.erlang.org/doc/apps/tv/table_visualizer.html">tv</a></code> (Table Viewer) that can be used to visually manage the ETS tables on a given Erlang VM. Just call <code>tv:start()</code> and a window will be opened, showing you your tables. Note that <code>tv</code> was removed in Erlang/OTP 18; on newer releases, <code>observer:start()</code> has a table viewer tab that does the same job.</p>
<h3><a class="section" name="dets">DETS</a></h3>
<p>DETS is a disk-based version of ETS, with a few key differences.</p>
<p>There are no longer <code>ordered_set</code> tables, there is a disk-size limit of 2GB for DETS files, and traversal operations such as <code>first/1</code> and <code>next/2</code> are not nearly as safe or fast.</p>
<p>Starting and stopping tables has changed a bit. A new database table is created by calling <code><a class="docs" href="http://erldocs.com/17.3/stdlib/dets.html#open_file/2">dets:open_file/2</a></code>, and is closed by doing <code><a class="docs" href="http://erldocs.com/17.3/stdlib/dets.html#close/1">dets:close/1</a></code>. The table can later be re-opened by calling <code><a class="docs" href="http://erldocs.com/17.3/stdlib/dets.html#open_file/1">dets:open_file/1</a></code>.</p>
<p>Otherwise, the API is nearly the same, and it is thus possible to have a very simple way to handle writing and looking for data inside of files.</p>
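<p>As a rough illustration of that open, write, close and re-open cycle, here's a minimal sketch. The module name <code>dets_demo</code>, the table name <code>food_disk</code> and the file path you pass in are all made up for the example.</p>
<pre class="brush:erl">
-module(dets_demo).
-export([run/1]).

%% File is the path of the DETS file to use, for example
%% "/tmp/food.dets" (a hypothetical location).
run(File) ->
    %% Create the table, or open it if the file already exists.
    {ok, Tab} = dets:open_file(food_disk, [{file, File}, {type, set}]),
    ok = dets:insert(Tab, [{salmon, 88}, {milk, 150}]),
    ok = dets:close(Tab),
    %% Re-open the existing file. open_file/1 returns a reference
    %% to use in place of the table name.
    {ok, Ref} = dets:open_file(File),
    Found = dets:lookup(Ref, milk),
    ok = dets:close(Ref),
    Found.
</pre>
<p>Calling <code>dets_demo:run("/tmp/food.dets")</code> returns <code>[{milk,150}]</code>, read back from disk after the table was closed once.</p>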
<div class="note koolaid">
<p><strong>Don't Drink Too Much Kool-Aid:</strong><br />
DETS risks being slow as it is a disk-only database. It is possible you might feel like coupling ETS and DETS tables into a somewhat efficient database that stores both in RAM and on disk.</p>
<p>If you feel like doing so, it might be a good idea to look into <em><a class="docs" href="http://www.erlang.org/doc/apps/mnesia/Mnesia_chap1.html">Mnesia</a></em> as a database, which does exactly the same thing, while adding support for sharding, transactions, and distribution.</p>
</div>
<h3><a class="section" name="a-little-less-conversation">A Little Less Conversation, A Little More Action Please</a></h3>
<img class="left" src="static/img/elvis.png" width="199" height="250" alt="A very bad drawing of Elvis" />
<p>Following this rather long section title (and long previous sections), we'll turn to the practical problem that brought us here in the first place: updating regis so that it uses ETS and gets rid of a few potential bottlenecks.</p>
<p>Before we get started, we have to think of how we're going to handle operations, and what is safe and unsafe. Things that should be safe are those that modify nothing and are limited to one query (not 3-4 over time). They can be done by anyone at any time. Everything else that has to do with writing to a table, updating records, deleting them, or reading in a way that requires consistency over many requests is to be considered unsafe.</p>
<p>Because ETS has no transactions whatsoever, all unsafe operations should be performed by the process that owns the table. The safe ones should be allowed to be public, done outside of the owner process. We'll keep this in mind as we update regis.</p>
<p>The first step will be to make a copy of <code>regis-1.0.0</code> as <code><a class="source" href="static/erlang/regis-1.1.0.zip">regis-1.1.0</a></code>. I'm bumping the second number and not the third one here because our changes shouldn't break the existing interface, are technically not bugfixes, and so we're only going to consider it to be a feature upgrade.</p>
<p>In that new directory, we'll need to operate only on <a class="source" href="static/erlang/processquest/apps/regis-1.0.0/src/regis_server.erl">regis_server.erl</a> at first: we'll keep the interface intact so all the rest, in terms of structure, should not need to change too much:</p>
<pre class="brush:erl">
%%% The core of the app: the server in charge of tracking processes.
-module(regis_server).
-behaviour(gen_server).
-include_lib("stdlib/include/ms_transform.hrl").
-export([start_link/0, stop/0, register/2, unregister/1, whereis/1,
get_names/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
code_change/3, terminate/2]).
%%%%%%%%%%%%%%%%%
%%% INTERFACE %%%
%%%%%%%%%%%%%%%%%
start_link() ->
gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
stop() ->
gen_server:call(?MODULE, stop).
%% Give a name to a process
register(Name, Pid) when is_pid(Pid) ->
gen_server:call(?MODULE, {register, Name, Pid}).
%% Remove the name from a process
unregister(Name) ->
gen_server:call(?MODULE, {unregister, Name}).
%% Find the pid associated with a process
whereis(Name) -> ok.
%% Find all the names currently registered.
get_names() -> ok.
</pre>
<p>For the public interface, only <code>whereis/1</code> and <code>get_names/0</code> will change and be rewritten. That's because, as mentioned earlier, those are single-read safe operations. The rest will need to be serialized in the process owning the table. That's it for the API so far. Let's head for the inside of the module.</p>
<p>We're going to use an ETS table to store stuff, so it makes sense to put that table into the <code>init</code> function. Moreover, because our <code>whereis/1</code> and <code>get_names/0</code> functions will give public access to the table (for speed reasons), naming the table will be necessary for it to be accessible to the outside world. By naming the table, much like what happens when we name processes, we can hardcode the name in the functions, compared to needing to pass an id around.</p>
<pre class="brush:erl">
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% GEN_SERVER CALLBACKS %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%
init([]) ->
?MODULE = ets:new(?MODULE, [set, named_table, protected]),
{ok, ?MODULE}.
</pre>
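<p>To see what <code>protected</code> and <code>named_table</code> buy us, here's a small sketch, separate from regis, where one process owns a table and another one pokes at it. The module and table names (<code>protected_demo</code>, <code>demo_tab</code>) are made up for the example.</p>
<pre class="brush:erl">
-module(protected_demo).
-export([run/0]).

run() ->
    Parent = self(),
    %% The calling process owns the table, like regis_server would.
    demo_tab = ets:new(demo_tab, [set, named_table, protected]),
    true = ets:insert(demo_tab, {some_name, self()}),
    spawn(fun() ->
        %% Reading from another process works...
        Read = ets:lookup(demo_tab, some_name),
        %% ...but writing raises badarg, because the table is protected.
        Write = try ets:insert(demo_tab, {other, self()})
                catch error:badarg -> not_allowed
                end,
        Parent ! {result, Read, Write}
    end),
    Result = receive {result, R, W} -> {R, W} end,
    ets:delete(demo_tab),
    Result.
</pre>
<p>The second process gets the entry back from its lookup, but its insert fails and <code>not_allowed</code> is reported instead. Only the owner can write, which is exactly the split between safe and unsafe operations we're after.</p>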
<p>The next function will be <code>handle_call/3</code>, handling the message <code>{register, Name, Pid}</code> as defined in <code>register/2</code>.</p>
<pre class="brush:erl">
handle_call({register, Name, Pid}, _From, Tid) ->
%% Neither the name nor the pid can already be in the table
%% so we match for both of them in a table-long scan using this.
MatchSpec = ets:fun2ms(fun({N,P,_Ref}) when N==Name; P==Pid -> {N,P} end),
case ets:select(Tid, MatchSpec) of
[] -> % free to insert
Ref = erlang:monitor(process, Pid),
ets:insert(Tid, {Name, Pid, Ref}),
{reply, ok, Tid};
[{Name,_}|_] -> % maybe more than one result, but name matches
{reply, {error, name_taken}, Tid};
[{_,Pid}|_] -> % maybe more than one result, but Pid matches
{reply, {error, already_named}, Tid}
end;
</pre>
<p>This is by far the most complex function in the module. There are three basic rules to respect:</p>
<ol>
<li>A process cannot be registered twice</li>
<li>A name cannot be taken twice</li>
<li>A process can be registered if it doesn't break rules 1 and 2</li>
</ol>
<p>This is what the code above does. The match specification derived from <code>fun({N,P,_Ref}) when N==Name; P==Pid -> {N,P} end</code> will look through the whole table for entries that match either the name or the pid that we're trying to register. If there's a match, we return both the name and pids that were found. This may be weird, but it makes sense to want both when we look at the patterns for the <code>case ... of</code> after that.</p>
<p>The first pattern means nothing was found, and so insertions are good. We monitor the process we have registered (to unregister it in case of failure) and then add the entry to the table. In case the name we are trying to register was already in the table, the pattern <code>[{Name,_}|_]</code> will take care of it. If it was the Pid that matched, then the pattern <code>[{_,Pid}|_]</code> will take care of it. That's why both values are returned: it makes it simpler to match on the whole tuple later on, not caring which of them matched in the match spec. Why is the pattern of the form <code>[Tuple|_]</code> rather than just <code>[Tuple]</code>? The explanation is simple enough. If we're traversing the table looking for either Pids or names that are similar, it is possible the list returned will be <code>[{NameYouWant, SomePid},{SomeName,PidYouWant}]</code>. If that happens, then a pattern match of the form <code>[Tuple]</code> will crash the process in charge of the table and ruin your day.</p>
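<p>The two-results case is easy to reproduce outside of the server. Here's a minimal sketch, assuming the same <code>{Name, Pid, Ref}</code> layout as above; the module name <code>two_hits</code> and the entries are made up for the example.</p>
<pre class="brush:erl">
-module(two_hits).
-include_lib("stdlib/include/ms_transform.hrl").
-export([run/0]).

run() ->
    Tab = ets:new(names, [set]),
    Pid1 = self(),
    Pid2 = spawn(fun() -> ok end),
    ets:insert(Tab, [{cat, Pid1, make_ref()}, {dog, Pid2, make_ref()}]),
    %% Try to register the name of one entry with the pid of the other.
    Name = cat,
    Pid = Pid2,
    MatchSpec = ets:fun2ms(fun({N,P,_Ref}) when N==Name; P==Pid -> {N,P} end),
    Hits = ets:select(Tab, MatchSpec),
    ets:delete(Tab),
    length(Hits).
</pre>
<p>This returns <code>2</code>: one entry matched on the name, the other on the pid. A <code>[Tuple]</code> pattern would not match that list, while <code>[Tuple|_]</code> does.</p>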
<p>Oh yeah, don't forget to add the <code>-include_lib("stdlib/include/ms_transform.hrl").</code> in the module, otherwise, <code>fun2ms</code> will die with a weird error message:</p>
<pre class="brush:erl">
** {badarg,{ets,fun2ms,
[function,called,with,real,'fun',should,be,transformed,with,
parse_transform,'or',called,with,a,'fun',generated,in,the,
shell]}}
</pre>
<p>That's what happens when you forget the include file. Consider yourself warned. Look before crossing the streets, don't cross the streams, and don't forget your include files.</p>
<p>The next bit to do is when we ask to manually unregister a process:</p>
<pre class="brush:erl">
handle_call({unregister, Name}, _From, Tid) ->
case ets:lookup(Tid, Name) of
[{Name,_Pid,Ref}] ->
erlang:demonitor(Ref, [flush]),
ets:delete(Tid, Name),
{reply, ok, Tid};
[] ->
{reply, ok, Tid}
end;
</pre>
<p>If you looked at the old version of the code, this is still similar. The idea is simple: find the monitor reference (with a lookup on the name), cancel the monitor, then delete the entry and keep going. If the entry's not there, we pretend we deleted it anyway and everybody's going to be happy. Oh, how dishonest we are.</p>
<p>Next bit is about stopping the server:</p>
<pre class="brush:erl">
handle_call(stop, _From, Tid) ->
%% For the sake of being synchronous and because emptying ETS
%% tables might take a bit longer than dropping data structures
%% held in memory, dropping the table here will be safer for
%% tricky race conditions, especially in tests where we start/stop
%% servers a lot. In regular code, this doesn't matter.
ets:delete(Tid),
{stop, normal, ok, Tid};
handle_call(_Event, _From, State) ->
{noreply, State}.
</pre>
<p>As the comments in the code say, we could have been fine just ignoring the table and letting it be garbage collected. However, because the test suite we wrote in the last chapter starts and stops the server all the time, delays can be a bit dangerous. See, this is what the timeline of the process looks like with the old version:</p>
<img class="center explanation" src="static/img/shell-server-1.png" width="376" height="209" alt="A sequential graph showing what happens between the shell and the regis server. The shell sends 'stop' to the server, with the server replying to the client, then dying. Then the shell starts a new server replacing the old one." />
<p>And here's what sometimes happens with the new one:</p>
<img class="center explanation" src="static/img/shell-server-2.png" width="423" height="225" alt="Similar to the previous graph, except that this time dying happens later due to ETS tables getting cleaned. Now the process dies at the same time as the shell tries to create a new server and there's a conflict" />
<p>By using the scheme above, we're making errors a lot less likely by doing more work in the synchronous part of the code:</p>
<img class="center explanation" src="static/img/shell-server-3.png" width="407" height="213" alt="Like the previous graph, except the table is removed before the server sends the reply to the shell. This leaves less time for the race condition between the shell and the server to happen and there is no conflict." />
<p>If you don't plan on running the test suite very often, you can just ignore the whole thing. I've decided to show it to avoid nasty surprises, although in a non-test system, this kind of edge case should very rarely occur.</p>
<p>Here's the rest of the OTP callbacks:</p>
<pre class="brush:erl">
handle_cast(_Event, State) ->
{noreply, State}.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, Tid) ->
ets:match_delete(Tid, {'_', '_', Ref}),
{noreply, Tid};
handle_info(_Event, State) ->
{noreply, State}.
code_change(_OldVsn, State, _Extra) ->
{ok, State}.
terminate(_Reason, _State) ->
ok.
</pre>
<p>We don't care about any of them, except receiving a <code>DOWN</code> message, meaning one of the processes we were monitoring died. When that happens, we delete the entry based on the reference we have in the message, then move on.</p>
<p>You'll notice that <code>code_change/3</code> could actually work as a transition between the old <code>regis_server</code> and the new <code>regis_server</code>. Implementing this function is left as an exercise to the reader. I always hate books that give exercises to the reader without solutions, so here's at least a little pointer so I'm not just being a jerk like all the other writers out there: you have to take either of the two <code>gb_trees</code> from the older version, and use <code><a class="docs" href="http://erldocs.com/17.3/stdlib/gb_trees.html#map/2">gb_trees:map/2</a></code> or the <code>gb_trees</code> iterators to populate a new table before moving on. The downgrade function can be written by doing the opposite.</p>
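<p>For those who want a slightly bigger pointer, here's a rough sketch of the conversion step only. It assumes the old name tree maps each name to a <code>{Pid, Ref}</code> pair, which you should double-check against your copy of <code>regis-1.0.0</code>, and it uses <code>gb_trees:to_list/1</code> rather than <code>map/2</code> or iterators, just to keep things short. The module and function names are made up.</p>
<pre class="brush:erl">
-module(regis_upgrade_sketch).
-export([names_to_table/2]).

%% Assumption: NameTree is a gb_tree mapping Name to {Pid, Ref}.
%% Table is an existing ETS table holding {Name, Pid, Ref} tuples.
names_to_table(NameTree, Table) ->
    lists:foreach(fun({Name, {Pid, Ref}}) ->
                      true = ets:insert(Table, {Name, Pid, Ref})
                  end,
                  gb_trees:to_list(NameTree)),
    Table.
</pre>
<p>A real <code>code_change/3</code> would still have to create the table from within the server process (so that it owns it) and return the new state.</p>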
<p>All that's left to do is fix the two public functions we have left unimplemented before. Of course, we could write a <code>%% TODO</code> comment, call it a day and go drink until we forget we're programmers, but that would be a tiny bit irresponsible. Let's fix stuff:</p>
<pre class="brush:erl">
%% Find the pid associated with a process
whereis(Name) ->
case ets:lookup(?MODULE, Name) of
[{Name, Pid, _Ref}] -> Pid;
[] -> undefined
end.
</pre>
<p>This one looks for a name, returns the <var>Pid</var> or <code>undefined</code> depending on whether the entry has been found or not. Note that we do use <code>regis_server</code> (<code>?MODULE</code>) as the table name there; that's why we made it protected and named in the first place. For the next one:</p>
<pre class="brush:erl">
%% Find all the names currently registered.
get_names() ->
MatchSpec = ets:fun2ms(fun({Name, _, _}) -> Name end),
ets:select(?MODULE, MatchSpec).
</pre>
<p>We use <code>fun2ms</code> again to match on the <var>Name</var> and keep only that. Selecting from the table will return a list and do what we need.</p>
<p>That's it! You can run the test suite in <code>test/</code> to make things go:</p>
<pre class="brush:eshell">
$ erl -make
...
Recompile: src/regis_server
$ erl -pa ebin
...
1> eunit:test(regis_server).
All 13 tests passed.
ok
</pre>
<p>Hell yes. I think we can consider ourselves pretty good at ETS'ing now.</p>
<p>You know what would be really nice to do next? Actually exploring the distributed aspects of Erlang. Maybe we can bend our minds in a few more twisted ways before being done with the Erlang beast. Let's see.</p>
<!-- image idea: it's man - monty python, ETS -->
<ul class="navigation">
<li><a href="eunit.html" title="Previous chapter">< Previous</a></li>
<li><a href="contents.html" title="Index">Index</a></li>
<li><a href="distribunomicon.html" title="Next chapter">Next ></a></li>
</ul>
</div><!-- content -->
<div id="footer">
<a href="http://creativecommons.org/licenses/by-nc-nd/3.0/" title="Creative Commons License Details"><img src="static/img/cc.png" width="88" height="31" alt="Creative Commons Attribution Non-Commercial No Derivative License" /></a>
<p>Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution Non-Commercial No Derivative License</p>
</div> <!-- footer -->
</div> <!-- wrapper -->
<div id="grass" />
<script type="text/javascript" src="static/js/shCore.js"></script>
<script type="text/javascript" src="static/js/shBrushErlang2.js%3F11"></script>
<script type="text/javascript">
SyntaxHighlighter.defaults.gutter = false;
SyntaxHighlighter.all();
</script>
</body>
</html>