-
Notifications
You must be signed in to change notification settings - Fork 19
/
Copy pathREADME
573 lines (421 loc) · 21.9 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
# GPR Manual
###### \[in package MGL-GPR\]
## MGL-GPR ASDF System
- Version: 0.0.1
- Description: MGL-GPR is a library of evolutionary algorithms such
as Genetic Programming (evolving typed expressions from a set of
operators and constants) and Differential Evolution.
- Licence: MIT, see COPYING.
- Author: Gábor Melis <[email protected]>
- Mailto: [[email protected]](mailto:[email protected])
- Homepage: [http://melisgl.github.io/mgl-gpr](http://melisgl.github.io/mgl-gpr)
- Bug tracker: [https://github.com/melisgl/mgl-gpr/issues](https://github.com/melisgl/mgl-gpr/issues)
- Source control: [GIT](https://github.com/melisgl/mgl-gpr.git)
## Links
Here is the [official
repository](https://github.com/melisgl/mgl-gpr) and the [HTML
documentation](http://melisgl.github.io/mgl-gpr/gpr-manual.html)
for the latest version.
## Background
Evolutionary algorithms are optimization tools that assume little
of the task at hand. Often they are population based, that is, there
is a set of individuals that each represent a candidate solution.
Individuals are combined and changed with crossover and mutationlike
operators to produce the next generation. Individuals with lower
fitness have a lower probability to survive than those with higher
fitness. In this way, the fitness function defines the optimization
task.
Typically, EAs are quick to get up and running, can produce
reasonable results across a wild variety of domains, but they may
need a bit of fiddling to perform well and domain specific
approaches will almost always have better results. All in all, EA
can be very useful to cut down on the tedium of human trial and
error. However, they have serious problems scaling to higher number
of variables.
This library grew from the Genetic Programming implementation I
wrote while working for Ravenpack who agreed to release it under an
MIT licence. Several years later I cleaned it up, and documented it.
Enjoy.
## Evolutionary Algorithms
Evolutionary algorithm is an umbrella term. In this section we
first discuss the concepts common to conrete evolutionary algorithms
@GPR-GP and @GPR-DE.
- [class] EVOLUTIONARY-ALGORITHM
The EVOLUTIONARY-ALGORITHM is an abstract base
class for generational, population based optimization algorithms.
### Populations
The currenly implemented EAs are generational. That is, they
maintain a population of candidate solutions (also known as
individuals) which they replace with the next generation of
individuals.
- [accessor] POPULATION-SIZE EVOLUTIONARY-ALGORITHM (:POPULATION-SIZE)
The number of individuals in a generation. This is
a very important parameter. Too low and there won't be enough
diversity in the population, too high and convergence will be
slow.
- [accessor] POPULATION EVOLUTIONARY-ALGORITHM (= (MAKE-ARRAY 0 :ADJUSTABLE 0 :FILL-POINTER T))
An adjustable array with a fill-pointer that holds
the individuals that make up the population.
- [reader] GENERATION-COUNTER EVOLUTIONARY-ALGORITHM (= 0)
A counter that starts from 0 and is incremented by
ADVANCE. All accessors of EVOLUTIONARY-ALGORITHM are allowed to be
specialized on a subclass which allows them to be functions of
GENERATION-COUNTER.
- [function] ADD-INDIVIDUAL EA INDIVIDUAL
Adds INDIVIDUAL to POPULATION of EA. Usually called when
initializing the EA.
### Evaluation
- [reader] EVALUATOR EVOLUTIONARY-ALGORITHM (:EVALUATOR)
A function of two arguments: the
EVOLUTIONARY-ALGORITHM object and an individual. It must return
the fitness of the individual. For @GPR-GP, the evaluator often
simply calls EVAL, or COMPILE + FUNCALL, and compares the result
to some gold standard. It is also typical to slightly penalize
solutions with too many nodes to control complexity and evaluation
cost (see COUNT-NODES). For @GPR-DE, individuals are
conceptually (and often implemented as) vectors of numbers so the
fitness function may include an L1 or L2 penalty term.
Alternatively, one can specify MASS-EVALUATOR instead.
- [reader] MASS-EVALUATOR EVOLUTIONARY-ALGORITHM (:MASS-EVALUATOR = NIL)
NIL or a function of three arguments: the
EVOLUTIONARY-ALGORITHM object, the population vector and the
fitness vector into which the fitnesses of the individuals in the
population vector shall be written. By specifying MASS-EVALUATOR
instead of an EVALUATOR, one can, for example, distribute costly
evaluations over multiple threads. MASS-EVALUATOR has precedence
over EVALUATOR.
- [reader] FITNESS-KEY EVOLUTIONARY-ALGORITHM (:FITNESS-KEY = #'IDENTITY)
A function that returns a real number for an
object returned by EVALUATOR. It is called when two fitness are to
be compared. The default value is #'IDENTITY which is sufficient
when EVALUATOR returns real numbers. However, sometimes the
evaluator returns more information about the solution (such as
fitness in various situations) and FITNESS-KEY key be used to
select the fitness value.
### Training
Training is easy: one creates an object of a subclass of
EVOLUTIONARY-ALGORITHM such as GENETIC-PROGRAMMING or
DIFFERENTIAL-EVOLUTION, creates the initial population by adding
individuals to it (see ADD-INDIVIDUAL) and calls ADVANCE in a loop
to move on to the next generation until a certain number of
generations or until the FITTEST individual is good enough.
- [generic-function] ADVANCE EA
Create the next generation and place it in
POPULATION of EA.
- [reader] FITTEST EVOLUTIONARY-ALGORITHM (= NIL)
The fittest individual ever to be seen and its
fittness as a cons cell.
- [accessor] FITTEST-CHANGED-FN EVOLUTIONARY-ALGORITHM (:FITTEST-CHANGED-FN = NIL)
If non-NIL, a function that's called when FITTEST
is updated with three arguments: the EVOLUTIONARY-ALGORITHM
object, the fittest individual and its fitness. Useful for
tracking progress.
## Genetic Programming
### Background
What is Genetic Programming? This is what Wikipedia has to say:
In artificial intelligence, genetic programming (GP) is an
evolutionary algorithm-based methodology inspired by biological
evolution to find computer programs that perform a user-defined
task. Essentially GP is a set of instructions and a fitness
function to measure how well a computer has performed a task. It
is a specialization of genetic algorithms (GA) where each
individual is a computer program. It is a machine learning
technique used to optimize a population of computer programs
according to a fitness landscape determined by a program's ability
to perform a given computational task.
Lisp has a long history of Genetic Programming because GP involves
manipulation of expressions which is of course particularly easy
with sexps.
### Tutorial
GPR works with typed expressions. Mutation and crossover never
produce expressions that fail with a type error. Let's define a
couple of operators that work with real numbers and also return a
real:
(defparameter *operators* (list (operator (+ real real) real)
(operator (- real real) real)
(operator (* real real) real)
(operator (sin real) real)))
One cannot build an expression out of these operators because they
all have at least one argument. Let's define some literal classes
too. The first is produces random numbers, the second always returns
the symbol `*X*`:
(defparameter *literals* (list (literal (real)
(- (random 32.0) 16.0))
(literal (real)
'*x*)))
Armed with `*OPERATORS*` and `*LITERALS*`, one can already build
random expressions with RANDOM-EXPRESSION, but we also need to
define how good a certain expression is which is called *fitness*.
In this example, we are going to perform symbolic regression, that
is, try to find an expression that approximates some target
expression well:
(defparameter *target-expr* '(+ 7 (sin (expt (* *x* 2 pi) 2))))
Think of `*TARGET-EXPR*` as a function of `*X*`. The evaluator
function will bind the special `*X*` to the input and simply EVAL
the expression to be evaluated.
(defvar *x*)
The evaluator function calculates the average difference between
`EXPR` and `TARGET-EXPR`, penalizes large expressions and returns
the fitness of `EXPR`. Expressions with higher fitness have higher
chance to produce offsprings.
(defun evaluate (gp expr target-expr)
(declare (ignore gp))
(/ 1
(1+
;; Calculate average difference from target.
(/ (loop for x from 0d0 to 10d0 by 0.5d0
summing (let ((*x* x))
(abs (- (eval expr)
(eval target-expr)))))
21))
;; Penalize large expressions.
(let ((min-penalized-size 40)
(size (count-nodes expr)))
(if (< size min-penalized-size)
1
(exp (min 120 (/ (- size min-penalized-size) 10d0)))))))
When an expression is to undergo mutation, a randomizer function is
called. Here we change literal numbers slightly, or produce an
entirely new random expression that will be substituted for `EXPR`:
(defun randomize (gp type expr)
(if (and (numberp expr)
(< (random 1.0) 0.5))
(+ expr (random 1.0) -0.5)
(random-gp-expression gp (lambda (level)
(<= 3 level))
:type type)))
That's about it. Now we create a GP instance hooking everything up,
set up the initial population and just call ADVANCE a couple of
times to create new generations of expressions.
(defun run ()
(let ((*print-length* nil)
(*print-level* nil)
(gp (make-instance
'gp
:toplevel-type 'real
:operators *operators*
:literals *literals*
:population-size 1000
:copy-chance 0.0
:mutation-chance 0.5
:evaluator (lambda (gp expr)
(evaluate gp expr *target-expr*))
:randomizer 'randomize
:selector (lambda (gp fitnesses)
(declare (ignore gp))
(hold-tournament fitnesses :n-contestants 2))
:fittest-changed-fn
(lambda (gp fittest fitness)
(format t "Best fitness until generation ~S: ~S for~% ~S~%"
(generation-counter gp) fitness fittest)))))
(loop repeat (population-size gp) do
(add-individual gp (random-gp-expression gp (lambda (level)
(<= 5 level)))))
(loop repeat 1000 do
(when (zerop (mod (generation-counter gp) 20))
(format t "Generation ~S~%" (generation-counter gp)))
(advance gp))
(destructuring-bind (fittest . fitness) (fittest gp)
(format t "Best fitness: ~S for~% ~S~%" fitness fittest))))
Note that this example can be found in
example/symbolic-regression.lisp.
### Expressions
Genetic programming works with a population of individuals. The
individuals are sexps that may be evaluated directly by EVAL or by
other means. The internal nodes and the leafs of the sexp as a tree
represent the application of operators and literal objects,
respectively. Note that currently there is no way to represent
literal lists.
- [class] EXPRESSION-CLASS
An object of EXPRESSION-CLASS defines two things:
how to build a random expression that belongs to that expression
class and what lisp type those expressions evaluate to.
- [reader] RESULT-TYPE EXPRESSION-CLASS (:RESULT-TYPE)
Expressions belonging to this expression class
must evaluate to a value of this lisp type.
- [reader] WEIGHT EXPRESSION-CLASS (:WEIGHT = 1)
The probability of an expression class to be
selected from a set of candidates is proportional to its
weight.
- [class] OPERATOR EXPRESSION-CLASS
Defines how the symbol NAME in the function
position of a list can be combined arguments: how many and of what
types. The following defines `+` as an operator that
adds two `FLOAT`s:
(make-instance 'operator
:name '+
:result-type float
:argument-types '(float float))
See the macro [OPERATOR][macro] for a shorthand for the above.
Currently no lambda list keywords are supported and there is no way
to define how an expression with a particular operator is to be
built. See RANDOM-EXPRESSION.
- [reader] NAME OPERATOR (:NAME)
A symbol that's the name of the operator.
- [reader] ARGUMENT-TYPES OPERATOR (:ARGUMENT-TYPES)
A list of lisp types. One for each argument of
this operator.
- [macro] OPERATOR (NAME &REST ARG-TYPES) RESULT-TYPE &KEY (WEIGHT 1)
Syntactic sugar for instantiating operators. The example given for
[OPERATOR][class] could be written as:
(operator (+ float float) float)
See [WEIGHT][(reader expression-class)] for what WEIGHT means.
- [class] LITERAL EXPRESSION-CLASS
This is slightly misnamed. An object belonging to
the LITERAL class is not a literal itself, it's a factory for
literals via its BUILDER function. For example, the following
literal builds bytes:
(make-instance 'literal
:result-type '(unsigned-byte 8)
:builder (lambda () (random 256)))
In practice, one rarely writes it out like that, because the LITERAL
macro provides a more convenient shorthand.
- [reader] BUILDER LITERAL (:BUILDER)
A function of no arguments that returns a random
literal that belongs to its literal class.
- [macro] LITERAL (RESULT-TYPE &KEY (WEIGHT 1)) &BODY BODY
Syntactic sugar for defining literal classes. The example given for
[LITERAL][class] could be written as:
(literal ((unsigned-byte 8))
(random 256))
See [WEIGHT][(reader expression-class)] for what WEIGHT means.
- [function] RANDOM-EXPRESSION OPERATORS LITERALS TYPE TERMINATE-FN
Return an expression built from OPERATORS and LITERALS that
evaluates to values of TYPE. TERMINATE-FN is a function of one
argument: the level of the root of the subexpression to be generated
in the context of the entire expression. If it returns T then a
[LITERAL][class] will be inserted (by calling its BUILDER function),
else an [OPERATOR][class] with all its necessary arguments.
The algorithm recursively generates the expression starting from
level 0 where only operators and literals with a RESULT-TYPE that's
a subtype of TYPE are considered and one is selected with the
unnormalized probability given by its WEIGHT. On lower levels, the
ARGUMENT-TYPES specification of operators is similarly satisfied and
the resulting expression should evaluate without without a type
error.
The building of expressions cannot backtrack. If it finds itself in
a situation where no literals or operators of the right type are
available then it will fail with an error.
### Basics
To start the evolutionary process one creates a GP object,
adds to it the individuals (see ADD-INDIVIDUAL) that make up the
initial population and calls ADVANCE in a loop to move on to the
next generation.
- [class] GENETIC-PROGRAMMING EVOLUTIONARY-ALGORITHM
The GENETIC-PROGRAMMING class defines the search
space, how mutation and recombination occur, and hold various
parameters of the evolutionary process and the individuals
themselves.
- [function] RANDOM-GP-EXPRESSION GP TERMINATE-FN &KEY (TYPE (TOPLEVEL-TYPE GP))
Creating the initial population by hand is tedious. This
convenience function calls RANDOM-EXPRESSION to create a random
individual that produces GP's TOPLEVEL-TYPE. By passing in another
TYPE one can create expressions that fit somewhere else in a larger
expression which is useful in a RANDOMIZER function.
### Search Space
The search space of the GP is defined by the available operators,
literals and the type of the final result produced. The evaluator
function acts as the guiding light.
- [reader] OPERATORS GENETIC-PROGRAMMING (:OPERATORS)
The set of [OPERATOR][class]s from which (together
with [LITERAL][class]s) individuals are built.
- [reader] LITERALS GENETIC-PROGRAMMING (:LITERALS)
The set of [LITERAL][class]s from which (together
with [OPERATOR][class]s) individuals are built.
- [reader] TOPLEVEL-TYPE GENETIC-PROGRAMMING (:TOPLEVEL-TYPE = T)
The type of the results produced by individuals.
If the problem is to find the minimum a 1d real function then this
may be the symbol REAL. If the problem is to find the shortest
route, then this may be a vector. It all depends on the
representation of the problem, the operators and the literals.
- [function] COUNT-NODES TREE &KEY INTERNAL
Count the nodes in the sexp TREE. If INTERNAL then don't count the
leaves.
### Reproduction
The RANDOMIZER and SELECTOR functions define how mutation and
recombination occur.
- [reader] RANDOMIZER GENETIC-PROGRAMMING (:RANDOMIZER)
Used for mutations, this is a function of three
arguments: the GP object, the type the expression must produce and
current expression to be replaced with the returned value. It is
called with subexpressions of individuals.
- [reader] SELECTOR GENETIC-PROGRAMMING (:SELECTOR)
A function of two arguments: the GP object and a
vector of fitnesses. It must return the and index into the fitness
vector. The individual whose fitness was thus selected will be
selected for reproduction be it copying, mutation or crossover.
Typically, this defers to HOLD-TOURNAMENT.
- [function] HOLD-TOURNAMENT FITNESSES &KEY SELECT-CONTESTANT-FN N-CONTESTANTS KEY
Select N-CONTESTANTS (all different) for the tournament randomly,
represented by indices into FITNESSES and return the one with the
highest fitness. If SELECT-CONTESTANT-FN is NIL then contestants are
selected randomly with uniform probability. If SELECT-CONTESTANT-FN
is a function, then it's called with FITNESSES to return an
index (that may or may not be already selected for the tournament).
Specifying SELECT-CONTESTANT-FN allows one to conduct 'local'
tournaments biased towards a particular region of the index range.
KEY is NIL or a function that select the real fitness value from
elements of FITNESSES.
### Environment
The new generation is created by applying a reproduction operator
until POPULATION-SIZE is reached in the new generation. At each
step, a reproduction operator is randomly chosen.
- [accessor] COPY-CHANCE GENETIC-PROGRAMMING (:COPY-CHANCE = 0)
The probability of the copying reproduction
operator being chosen. Copying simply creates an exact copy of a
single individual.
- [accessor] MUTATION-CHANCE GENETIC-PROGRAMMING (:MUTATION-CHANCE = 0)
The probability of the mutation reproduction
operator being chosen. Mutation creates a randomly altered copy of
an individual. See RANDOMIZER.
If neither copying nor mutation were chosen, then a crossover will
take place.
- [accessor] KEEP-FITTEST-P GENETIC-PROGRAMMING (:KEEP-FITTEST-P = T)
If true, then the fittest individual is always
copied without mutation to the next generation. Of course, it may
also have other offsprings.
## Differential Evolution
The concepts in this section are covered by [Differential
Evolution: A Survey of the State-of-the-Art][1].
[1]: http://107.167.189.191/~piak/teaching/ec/ec2012/das-de-sota-2011.pdf
- [class] DIFFERENTIAL-EVOLUTION EVOLUTIONARY-ALGORITHM
Differential evolution (DE) is an evolutionary
algorithm in which individuals are represented by vectors of
numbers. New individuals are created by taking linear combinations
or by randomly swapping some of these numbers between two
individuals.
- [reader] MAP-WEIGHTS-INTO-FN DIFFERENTIAL-EVOLUTION (:MAP-WEIGHTS-INTO-FN = #'MAP-INTO)
The vector of numbers (the 'weights') are most
often stored in some kind of array. All individuals must have the
same number of weights, but the actual representation can be
anything as long as the function in this slot mimics the semantics
of MAP-INTO that's the default.
- [reader] CREATE-INDIVIDUAL-FN DIFFERENTIAL-EVOLUTION (:CREATE-INDIVIDUAL-FN)
Holds a function of one argument, the DE, that
returns a new individual that needs not be initialized in any way.
Typically this just calls MAKE-ARRAY.
- [reader] MUTATE-FN DIFFERENTIAL-EVOLUTION (:MUTATE-FN)
One of the supplied mutation functions:
MUTATE/RAND/1 MUTATE/BEST/1 MUTATE/CURRENT-TO-BEST/2.
- [reader] CROSSOVER-FN DIFFERENTIAL-EVOLUTION (:CROSSOVER-FN = #'CROSSOVER/BINARY)
A function of three arguments, the DE and two
individuals, that destructively modifies the second individual by
using some parts of the first one. Currently, the implemented
crossover function is CROSSOVER/BINARY.
- [function] MUTATE/RAND/1 DE CURRENT BEST POPULATION NURSERY &KEY (F 0.5)
- [function] MUTATE/BEST/1 DE CURRENT BEST POPULATION NURSERY &KEY (F 0.5)
- [function] MUTATE/CURRENT-TO-BEST/2 DE CURRENT BEST POPULATION NURSERY &KEY (F 0.5)
- [function] CROSSOVER/BINARY DE INDIVIDUAL-1 INDIVIDUAL-2 &KEY (CROSSOVER-RATE 0.5)
Destructively modify INDIVIDUAL-2 by replacement each element with
a probability of 1 - CROSSOVER-RATE with the corresponding element
in INDIVIDUAL-1. At least one, element is changed. Return
INDIVIDUAL-2.
- [function] SELECT-DISTINCT-RANDOM-NUMBERS TABOOS N LIMIT
### SANSDE
- [class] SANSDE DIFFERENTIAL-EVOLUTION
SaNSDE is a special DE that dynamically adjust the
crossover and mutation are performed. The only parameters are the
generic EA ones: POPULATION-SIZE, EVALUATOR, etc. One also has to
specify MAP-WEIGHTS-INTO-FN and CREATE-INDIVIDUAL-FN.
* * *
###### \[generated by [MGL-PAX](https://github.com/melisgl/mgl-pax)\]