#Jordan Breffle
#https://github.com/jtbreffle
#Theoretical Neuroscience Exercises, Dayan and Abbott, 2001
#Chapter 10: Representational learning
import scipy.io #use scipy.io.loadmat() for MAT files
import numpy as np #if loadmat() fails, use numpy.loadtxt(), because it is a plain text file
# ------------------Helper Functions--------------------------------------
#----------------------------Exercises------------------------------------
def Ex1():
'''
Data file c10p1.mat contains 100 data points drawn from the same
two-dimensional distribution as those in figure 10.1. Fit a mixture
of two circular Gaussian distributions to these data using EM, as in
equation 10.4. Do not allow the variance of either of the Gaussians
to become smaller than a minimal value of 0.0001.
'''
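    # A minimal EM sketch for a mixture of two circular Gaussians (eq. 10.4),
    # with the variance floor the exercise asks for. The data path and the
    # 'c10p1' key inside the MAT file are assumptions; adjust for your copy.
    data = scipy.io.loadmat(r'exercises\c10\data\c10p1.mat')['c10p1']
    n_pts, n_dim = data.shape
    min_var = 1e-4
    rng = np.random.default_rng(0)
    means = data[rng.choice(n_pts, 2, replace=False)]  # init at two data points
    variances = np.ones(2)
    weights = np.full(2, 0.5)
    for _ in range(100):
        # E step: responsibility of each Gaussian for each point
        sq = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = (np.log(weights) - sq / (2 * variances)
                 - n_dim / 2 * np.log(2 * np.pi * variances))
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: update mixing proportions, means, and floored variances
        n_k = resp.sum(axis=0)
        weights = n_k / n_pts
        means = resp.T @ data / n_k[:, None]
        sq = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = np.maximum((resp * sq).sum(axis=0) / (n_dim * n_k), min_var)
    print('means:\n', means, '\nvariances:', variances, '\nweights:', weights)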
def Ex2():
'''
Explore what happens to the fit of the mixture of Gaussians model
from exercise 1 as the number of data points from each Gaussian
is reduced and the number of potential Gaussians is increased. If
you set the minimal variance given in exercise 1 to 0, a Gaussian
distribution can settle around a single sample point and then have
its variance shrink to 0. Why does this pathological behavior occur?
'''
def Ex3():
'''
    Modify your code from exercise 1 to calculate the function F of
    equation 10.14 during each E and M step of EM. Check that F changes
    monotonically. Explicitly calculate the true log likelihood of the
    data from equation 10.7 at the end of each M phase. Is it equal to F?
'''
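    # A sketch of the requested bookkeeping: helpers that take the EM
    # quantities of the Ex1 sketch (data, means, variances, weights, resp).
    def mixture_log_joint(data, means, variances, weights):
        # log( P(k) * N(x | mean_k, var_k * I) ) for every point and component
        sq = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        return (np.log(weights) - sq / (2 * variances)
                - data.shape[1] / 2 * np.log(2 * np.pi * variances))
    def free_energy(data, means, variances, weights, resp):
        # F of equation 10.14: expected log joint plus the entropy of resp
        lj = mixture_log_joint(data, means, variances, weights)
        return (resp * (lj - np.log(np.maximum(resp, 1e-300)))).sum()
    def log_likelihood(data, means, variances, weights):
        # true log likelihood of equation 10.7, via a stable log-sum-exp
        lj = mixture_log_joint(data, means, variances, weights)
        m = lj.max(axis=1, keepdims=True)
        return (m.ravel() + np.log(np.exp(lj - m).sum(axis=1))).sum()
    # Evaluate both after each E and each M step: F never decreases, and it
    # equals the log likelihood immediately after each E step.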
def Ex4():
'''
Modify the code in exercise 1 to fit a K-means model rather than a
mixture of Gaussians. Can you see any practical differences in the
solutions that arise?
'''
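    # A minimal K-means sketch: Ex1 with hard assignments in place of the
    # soft responsibilities. Data loading mirrors the (assumed) Ex1 path.
    data = scipy.io.loadmat(r'exercises\c10\data\c10p1.mat')['c10p1']
    rng = np.random.default_rng(0)
    means = data[rng.choice(len(data), 2, replace=False)]
    for _ in range(100):
        # assign each point to its nearest mean, then recompute the means
        labels = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        means = np.array([data[labels == k].mean(axis=0) if np.any(labels == k)
                          else means[k] for k in range(2)])
    print('cluster means:\n', means)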
def Ex5():
'''
Consider the factor analysis model of figure 10.3 (discussed in more
generality later in the chapter). Using the joint probability over v and
    u given in equation 10.15, derive an expression for F, and thus the
learning rules of equation 10.5.
'''
def Ex6():
'''
Using the EM version of factor analysis (see the appendix of chapter
10), reproduce figure 10.4. MATLAB file c10p6.m shows how to
generate data u1 for figure 10.4A and B, and u2 for C and D. First
perform factor analysis on these data and reproduce figures 10.4A
and C. Next, use the eig function to perform principal components
analysis on u1 and u2, and thereby produce the rest of the figures. For
    some initial conditions, the cloud of points in the figures might slope
downwards instead of upwards. Why? Calculate the expression for
F derived in exercise 5 as factor analysis progresses and show that
it changes monotonically.
'''
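    # A minimal EM sketch for one-factor factor analysis (chapter 10
    # appendix). c10p6.m is not reproduced here, so the synthetic data below
    # is an assumed stand-in for u1/u2; substitute the real generator.
    rng = np.random.default_rng(0)
    n = 500
    true_g = np.array([1.0, 2.0])
    u = rng.standard_normal(n)[:, None] * true_g + 0.5 * rng.standard_normal((n, 2))
    u -= u.mean(axis=0)
    g = rng.standard_normal(2)
    Sigma = np.ones(2)                       # diagonal noise variances
    for _ in range(200):
        # E step: posterior mean/variance of the factor v given each input u
        C = np.outer(g, g) + np.diag(Sigma)  # model covariance of u
        beta = np.linalg.solve(C, g)         # so E[v|u] = u @ beta
        Ev = u @ beta
        Evv = 1.0 - g @ beta + Ev ** 2       # E[v^2|u]
        # M step: update generative weights and noise variances
        g = (u * Ev[:, None]).sum(axis=0) / Evv.sum()
        Sigma = (u ** 2).mean(axis=0) - g * (u * Ev[:, None]).mean(axis=0)
    # PCA on the same data, via eig as the exercise suggests
    evals, evecs = np.linalg.eig(np.cov(u.T))
    print('FA weights:', g, '\nPCA axes:\n', evecs, '\nPCA variances:', evals)
    # Note: g and -g give identical likelihoods, which is why the recovered
    # axis (and the plotted cloud) can slope downwards for some initializations.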
def Ex7():
'''
    Apply a rotation matrix to the data set u2 from exercise 6 (an example
    rotation matrix is given as rot in MATLAB file c10p6.m). Perform factor
    analysis and principal components analysis on the rotated data.
How do the results compare with those for the unrotated data (remember
to rotate your results back, if necessary, so that appropriate
comparisons can be made)?
'''
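    # A brief sketch: rotate the data, then rerun the Ex6 analyses. This
    # rotation matrix is an arbitrary stand-in for `rot` in c10p6.m.
    theta = np.pi / 6
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # u_rot = u @ rot.T  # repeat the factor analysis / PCA of Ex6 on u_rot,
    # then apply rot.T to the recovered axes before comparing with Ex6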
def Ex8():
'''
Construct a data set u from a set of independent, heavy-tailed,
“sources” v through the relation u = G * v. Both u and v should
be four-dimensional vectors. Choose the components of v independently
and randomly from a double exponential distribution,
for which the probability of getting the value v is proportional to
exp(-|v|) (note that a one-sided exponentially distributed random
variable can be generated using either exprnd(*) or -log(rand(*))).
    Choose a random matrix G and generate the corresponding u values
as u = G * v. Use 2000 randomly chosen v's and their corresponding
u's. Then, use independent components analysis, as in equation
10.40, to learn generative sources from the inputs u. How well
do the values of the extracted sources match those of the original
sources?
'''
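    # A minimal ICA sketch. I read equation 10.40 as the natural-gradient
    # maximum-likelihood rule dW ~ (I - sign(v) v^T) W, using sign(v) because
    # the prior is double exponential; treat that reading, and the learning
    # rate and pass count, as assumptions.
    rng = np.random.default_rng(0)
    n, dim = 2000, 4
    v_true = rng.exponential(size=(n, dim)) * rng.choice([-1.0, 1.0], size=(n, dim))
    G = rng.standard_normal((dim, dim))      # random mixing matrix
    u = v_true @ G.T                         # u = G v, one sample per row
    W = np.eye(dim)                          # unmixing matrix, v_est = W u
    eps = 1e-4
    for _ in range(50):                      # passes over the data
        for ui in u:
            vi = W @ ui
            W += eps * (np.eye(dim) - np.outer(np.sign(vi), vi)) @ W
    v_est = u @ W.T
    # recovery is only up to permutation and scale, so compare correlations:
    # each row of this block should have one entry near +/-1
    print(np.corrcoef(v_est.T, v_true.T)[:dim, dim:].round(2))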
def Ex9():
'''
Compare the actual G you used to generate the data in problem 8
with the G that is recovered by independent components analysis.
Plot the six two-dimensional projections of the input data (u1 versus
u2; u1 versus u3; etc) together with the projections of the mixing axes
    coming from G (it is good to use more data points for this, say 10000).
The mixing axes are lines parallel to vectors with components G1i,
G2i, G3i, and G4i, for i = 1, 2, 3, 4. What relationship exists between
    these mixing axes and the envelope of the data points, and why? Plot
u generated in the same way when the components of v are chosen
independently from identical Gaussian distributions, together with
the mixing axes coming from G. What differences do you see?
'''
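    # A plotting sketch; it assumes the G, W, u of the Ex8 sketch are in
    # scope and that matplotlib is available.
    import matplotlib.pyplot as plt
    G_est = np.linalg.inv(W)                 # recovered mixing matrix
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    fig, axes = plt.subplots(2, 3, figsize=(12, 8))
    for ax, (i, j) in zip(axes.ravel(), pairs):
        ax.plot(u[:, i], u[:, j], '.', markersize=1)
        for k in range(4):                   # mixing axes along (G[i,k], G[j,k])
            ax.axline((0, 0), (G[i, k], G[j, k]), color='r')
            ax.axline((0, 0), (G_est[i, k], G_est[j, k]), color='g', ls='--')
        ax.set_xlabel(f'u{i + 1}')
        ax.set_ylabel(f'u{j + 1}')
    plt.show()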
def Ex10():
'''
Implement wake-sleep learning for the Helmholtz machine with binary
units when the input data is derived from a square “retina” of
size ndim×ndim. The ndim columns of the input array are independently
turned “on” with probability pbar. Each unit in a column that
is “on” takes the value 1 with probability 1-pout and 0 with probability
pout, and each unit in a column that is “off” takes the value 1
with probability pout and 0 with probability 1-pout. MATLAB program
c10p10.m is an example. In what way does the activity of the
v units in the model capture the way that each input u was actually
generated? What happens if there are not enough hidden units to
represent each column separately?
'''
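    # A minimal wake-sleep sketch for a one-hidden-layer binary Helmholtz
    # machine. The retina-style generator follows the exercise text; c10p10.m
    # itself is not reproduced here, so ndim, pbar, pout, the unit count, and
    # the learning rate are assumptions.
    rng = np.random.default_rng(0)
    ndim, pbar, pout, n_hid, eps = 5, 0.3, 0.05, 5, 0.05
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    def make_input():
        on = rng.random(ndim) < pbar                 # which columns are "on"
        p_one = np.where(on, 1 - pout, pout)         # per-column P(unit = 1)
        return (rng.random((ndim, ndim)) < p_one).astype(float).ravel()
    n_in = ndim * ndim
    W_gen = np.zeros((n_in, n_hid))                  # generative weights v -> u
    b_gen = np.zeros(n_in)
    h_gen = np.zeros(n_hid)                          # generative bias on v
    W_rec = np.zeros((n_hid, n_in))                  # recognition weights u -> v
    b_rec = np.zeros(n_hid)
    for _ in range(20000):
        # wake phase: recognize a real input, train the generative weights
        u = make_input()
        v = (rng.random(n_hid) < sigmoid(W_rec @ u + b_rec)).astype(float)
        p_u = sigmoid(W_gen @ v + b_gen)
        W_gen += eps * np.outer(u - p_u, v)
        b_gen += eps * (u - p_u)
        h_gen += eps * (v - sigmoid(h_gen))
        # sleep phase: dream from the generative model, train recognition
        v_d = (rng.random(n_hid) < sigmoid(h_gen)).astype(float)
        u_d = (rng.random(n_in) < sigmoid(W_gen @ v_d + b_gen)).astype(float)
        p_v = sigmoid(W_rec @ u_d + b_rec)
        W_rec += eps * np.outer(v_d - p_v, u_d)
        b_rec += eps * (v_d - p_v)
    # after training, inspect W_gen: each v unit should come to generate
    # (roughly) one retinal column, mirroring how the inputs were built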
def Ex11():
'''
    Implement wake-sleep learning for the binary Helmholtz machine as
    in problem 10, except now make a correlational structure between the
    columns - so that for half the input patterns, only columns 1 ... ndim/2
    are eligible to be turned on (with probability pbar), and for the other
    half, only the other columns ndim/2 + 1 ... ndim are eligible. Program
    c10p11.m shows one way to generate such inputs. Train a Helmholtz
machine with two representational layers (v and z), the top layer (z)
having just one unit, the middle layer (v) with ndim+1 units. Does
this build a generative model that captures the hierarchical way in
which each input pattern is generated?
'''
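    # A sketch of the correlated input generation (an assumed stand-in for
    # c10p11.m). Training then proceeds as in the Ex10 sketch, with an extra
    # (v, z) layer trained by the same wake/sleep delta rules; the single z
    # unit should come to code which half of the retina was eligible.
    rng = np.random.default_rng(0)
    ndim, pbar, pout = 6, 0.3, 0.05
    def make_input():
        half = rng.integers(2)                       # which half is eligible
        eligible = (np.arange(ndim) // (ndim // 2)) == half
        on = eligible & (rng.random(ndim) < pbar)
        p_one = np.where(on, 1 - pout, pout)         # per-column P(unit = 1)
        return (rng.random((ndim, ndim)) < p_one).astype(float).ravel()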
def main():
    # Load data. The path to c10p1.mat is an assumption; point it at your
    # local copy of this chapter's data file.
    matc10p1 = scipy.io.loadmat(r'exercises\c10\data\c10p1.mat')
    # Visualize data parameters
    #for key in matc10p1:
    #    print(key, ": ", len(matc10p1[key]))
    #print(len(matc10p1))
#execute exercises here
#Ex1()
#Ex2()
#Ex3()
#Ex4()
#Ex5()
#Ex6()
#Ex7()
#Ex8()
#Ex9()
#Ex10()
#Ex11()
if __name__ == "__main__":
main()