L1-regularized logistic regression using stochastic gradient descent
--------------------------------------------------------------------
(c) Tim Nugent
Compile by running 'make'. The build uses -std=c++11; on older compilers you may need to change this to -std=c++0x in the Makefile.
Run all tests with 'make test'.
Run the train/classify tool as follows:
./lr_sgd -o weights.out -p predict.out -t test.dat train.dat
This trains using train.dat, writes the weights to weights.out, then classifies test.dat and writes predictions to predict.out.
To train only and write the weights:
./lr_sgd -o weights.out train.dat
To classify only, using a saved weights file and a test file:
./lr_sgd -m weights.out -p predict.out -t test.dat
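For context, classification here is plain logistic regression: the predicted probability is a sigmoid of the weighted feature sum, thresholded at 0.5. A minimal sketch (illustrative only; the function and names are assumptions, not this tool's internals):

// Illustrative logistic-regression prediction, not lr_sgd's exact code.
// An example is a list of (feature index, value) pairs, mirroring the
// sparse input format; indices are assumed to map directly into w.
#include <cmath>
#include <utility>
#include <vector>

double predict(const std::vector<double>& w,
               const std::vector<std::pair<int, double>>& x) {
    double z = 0.0;
    for (const auto& f : x) z += w[f.first] * f.second;
    return 1.0 / (1.0 + std::exp(-z));  // P(y = +1 | x)
}

A probability of 0.5 or above is read as class +1, otherwise -1.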
Training and test data should be in svm-light/libsvm format, e.g.
+1 1:4.12069 2:11.3896 3:18.5742 4:2.85764 5:53.4406
-1 1:3.14565 2:17.4338 3:19.3353 4:2.63431 5:56.4233
Sparse data is OK. Stochastic gradient descent is sensitive to feature scaling, so it is highly recommended that you scale your data, e.g. by standardising to Z-scores or scaling to the range [0,1] (a standardisation sketch follows the options list below). Only two classes are permitted (e.g. 0 and 1, or -1 and 1). Full options are as follows:
Usage:
./lr_sgd [options] [training_data]
Options:
-s <int> Shuffle dataset after each iteration. default 1
-i <int> Maximum iterations. default 50000
-e <float> Convergence rate. default 0.005
-a <float> Learning rate. default 0.001
-l <float> L1 regularization weight. default 0.0001
-m <file> Read weights from file
-o <file> Write weights to file
-t <file> Test file to classify
-p <file> Write predictions to file
-r Randomise weights between -1 and 1, otherwise 0
-v Verbose.
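Since SGD is sensitive to scaling, here is a minimal Z-score standardisation sketch (illustrative only, assuming features have already been parsed into a dense matrix; it is not part of lr_sgd):

// Hypothetical Z-score standardisation, not part of lr_sgd itself.
#include <cmath>
#include <vector>

void standardise(std::vector<std::vector<double>>& x) {
    if (x.empty()) return;
    const size_t n = x.size(), d = x[0].size();
    for (size_t j = 0; j < d; ++j) {
        double mean = 0.0, var = 0.0;
        for (size_t i = 0; i < n; ++i) mean += x[i][j];
        mean /= n;
        for (size_t i = 0; i < n; ++i) {
            const double dif = x[i][j] - mean;
            var += dif * dif;
        }
        const double sd = std::sqrt(var / n);
        if (sd == 0.0) continue;  // skip constant features
        for (size_t i = 0; i < n; ++i) x[i][j] = (x[i][j] - mean) / sd;
    }
}

Note that the test set should be transformed with the training set's means and standard deviations, not its own.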
L1-regularization is via the cumulative approach described in:
Tsuruoka, Y., Tsujii, J., and Ananiadou, S., 2009
http://aclweb.org/anthology/P/P09/P09-1054.pdf
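A minimal sketch of that cumulative penalty update (illustrative; the names u, q, alpha and lambda are assumptions for exposition, not this tool's internals):

// Illustrative cumulative L1 penalty (Tsuruoka et al., 2009); a sketch,
// not lr_sgd's actual implementation.
#include <algorithm>
#include <vector>

struct CumulativeL1 {
    double u = 0.0;         // total L1 penalty each weight could have received
    std::vector<double> q;  // penalty actually applied to each weight so far

    explicit CumulativeL1(size_t dim) : q(dim, 0.0) {}

    // Accumulate the penalty budget once per example.
    void step(double alpha, double lambda) { u += alpha * lambda; }

    // After the plain gradient step, clip weight i towards zero by at most
    // the penalty it has not yet received.
    void apply(std::vector<double>& w, size_t i) {
        const double z = w[i];
        if (w[i] > 0.0)
            w[i] = std::max(0.0, w[i] - (u + q[i]));
        else if (w[i] < 0.0)
            w[i] = std::min(0.0, w[i] + (u - q[i]));
        q[i] += w[i] - z;
    }
};

Clipping against the accumulated penalty rather than subtracting a small amount at every step is what drives many weights exactly to zero; compare the sparsity lines in the two runs below.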
Test example
------------
This uses training and test data from Reuters articles about corporate acquisitions, taken from http://svmlight.joachims.org/. Use the -v flag to see the prediction results.
With L1-regularization:
./lr_sgd -o weights.out -p predict.out -t test.dat train.dat
# called with: ./lr_sgd -o weights.out -p predict.out -t test.dat train.dat
# learning rate: 0.001
# convergence rate: 0.005
# l1 penalty weight: 0.0001
# max. iterations: 50000
# training data: train.dat
# model output: weights.out
# test data: test.dat
# predictions: predict.out
# training examples: 2000
# features: 7034
# stochastic gradient descent
# convergence: 0.0488 l1-norm: 1.4560e+02 iterations: 100
# convergence: 0.0356 l1-norm: 2.4081e+02 iterations: 200
# convergence: 0.0281 l1-norm: 3.1019e+02 iterations: 300
# convergence: 0.0233 l1-norm: 3.6375e+02 iterations: 400
# convergence: 0.0198 l1-norm: 4.0679e+02 iterations: 500
# convergence: 0.0173 l1-norm: 4.4234e+02 iterations: 600
# convergence: 0.0154 l1-norm: 4.7222e+02 iterations: 700
# convergence: 0.0139 l1-norm: 4.9784e+02 iterations: 800
# convergence: 0.0127 l1-norm: 5.1999e+02 iterations: 900
# convergence: 0.0117 l1-norm: 5.3936e+02 iterations: 1000
# convergence: 0.0108 l1-norm: 5.5645e+02 iterations: 1100
# convergence: 0.0101 l1-norm: 5.7178e+02 iterations: 1200
# convergence: 0.0094 l1-norm: 5.8565e+02 iterations: 1300
# convergence: 0.0089 l1-norm: 5.9820e+02 iterations: 1400
# convergence: 0.0084 l1-norm: 6.0956e+02 iterations: 1500
# convergence: 0.0079 l1-norm: 6.2004e+02 iterations: 1600
# convergence: 0.0076 l1-norm: 6.2980e+02 iterations: 1700
# convergence: 0.0072 l1-norm: 6.3883e+02 iterations: 1800
# convergence: 0.0069 l1-norm: 6.4721e+02 iterations: 1900
# convergence: 0.0066 l1-norm: 6.5499e+02 iterations: 2000
# convergence: 0.0064 l1-norm: 6.6228e+02 iterations: 2100
# convergence: 0.0061 l1-norm: 6.6908e+02 iterations: 2200
# convergence: 0.0059 l1-norm: 6.7543e+02 iterations: 2300
# convergence: 0.0057 l1-norm: 6.8134e+02 iterations: 2400
# convergence: 0.0055 l1-norm: 6.8684e+02 iterations: 2500
# convergence: 0.0053 l1-norm: 6.9202e+02 iterations: 2600
# convergence: 0.0052 l1-norm: 6.9694e+02 iterations: 2700
# convergence: 0.0050 l1-norm: 7.0160e+02 iterations: 2800
# sparsity: 0.1483 (1043/7034)
# written weights to file weights.out
# classifying
# accuracy: 0.9733 (584/600)
# precision: 0.9581
# recall: 0.9900
# mcc: 0.9472
# tp: 297
# tn: 287
# fp: 13
# fn: 3
# written predictions to file predict.out
Without L1-regularization:
./lr_sgd -l 0.0 -o weights.out -p predict.out -t test.dat train.dat
# called with: ./lr_sgd -l 0.0 -o weights.out -p predict.out -t test.dat train.dat
# learning rate: 0.001
# convergence rate: 0.005
# l1 penalty weight: 0
# max. iterations: 50000
# training data: train.dat
# model output: weights.out
# test data: test.dat
# predictions: predict.out
# training examples: 2000
# features: 7034
# stochastic gradient descent
# convergence: 0.0529 l1-norm: 2.3400e+02 iterations: 100
# convergence: 0.0390 l1-norm: 4.0495e+02 iterations: 200
# convergence: 0.0311 l1-norm: 5.4213e+02 iterations: 300
# convergence: 0.0260 l1-norm: 6.5733e+02 iterations: 400
# convergence: 0.0224 l1-norm: 7.5714e+02 iterations: 500
# convergence: 0.0197 l1-norm: 8.4550e+02 iterations: 600
# convergence: 0.0177 l1-norm: 9.2504e+02 iterations: 700
# convergence: 0.0161 l1-norm: 9.9751e+02 iterations: 800
# convergence: 0.0147 l1-norm: 1.0642e+03 iterations: 900
# convergence: 0.0136 l1-norm: 1.1260e+03 iterations: 1000
# convergence: 0.0127 l1-norm: 1.1838e+03 iterations: 1100
# convergence: 0.0119 l1-norm: 1.2380e+03 iterations: 1200
# convergence: 0.0112 l1-norm: 1.2892e+03 iterations: 1300
# convergence: 0.0106 l1-norm: 1.3376e+03 iterations: 1400
# convergence: 0.0100 l1-norm: 1.3836e+03 iterations: 1500
# convergence: 0.0096 l1-norm: 1.4276e+03 iterations: 1600
# convergence: 0.0091 l1-norm: 1.4695e+03 iterations: 1700
# convergence: 0.0087 l1-norm: 1.5097e+03 iterations: 1800
# convergence: 0.0084 l1-norm: 1.5483e+03 iterations: 1900
# convergence: 0.0080 l1-norm: 1.5854e+03 iterations: 2000
# convergence: 0.0077 l1-norm: 1.6211e+03 iterations: 2100
# convergence: 0.0075 l1-norm: 1.6556e+03 iterations: 2200
# convergence: 0.0072 l1-norm: 1.6890e+03 iterations: 2300
# convergence: 0.0070 l1-norm: 1.7213e+03 iterations: 2400
# convergence: 0.0068 l1-norm: 1.7525e+03 iterations: 2500
# convergence: 0.0066 l1-norm: 1.7829e+03 iterations: 2600
# convergence: 0.0064 l1-norm: 1.8124e+03 iterations: 2700
# convergence: 0.0062 l1-norm: 1.8410e+03 iterations: 2800
# convergence: 0.0060 l1-norm: 1.8689e+03 iterations: 2900
# convergence: 0.0059 l1-norm: 1.8960e+03 iterations: 3000
# convergence: 0.0057 l1-norm: 1.9225e+03 iterations: 3100
# convergence: 0.0056 l1-norm: 1.9482e+03 iterations: 3200
# convergence: 0.0054 l1-norm: 1.9734e+03 iterations: 3300
# convergence: 0.0053 l1-norm: 1.9979e+03 iterations: 3400
# convergence: 0.0052 l1-norm: 2.0219e+03 iterations: 3500
# convergence: 0.0051 l1-norm: 2.0453e+03 iterations: 3600
# sparsity: 1.0000 (7034/7034)
# written weights to file weights.out
# classifying
# accuracy: 0.9783 (587/600)
# precision: 0.9674
# recall: 0.9900
# mcc: 0.9569
# tp: 297
# tn: 290
# fp: 10
# fn: 3
# written predictions to file predict.out
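The summary statistics in both runs follow the standard confusion-matrix definitions. As a sanity check, this sketch (illustrative, not part of the tool) reproduces the L1-regularized run's figures from its tp/tn/fp/fn counts:

// Recompute the reported metrics from the confusion matrix counts.
#include <cmath>
#include <cstdio>

int main() {
    const double tp = 297, tn = 287, fp = 13, fn = 3;
    const double accuracy  = (tp + tn) / (tp + tn + fp + fn);
    const double precision = tp / (tp + fp);
    const double recall    = tp / (tp + fn);
    const double mcc = (tp * tn - fp * fn) /
        std::sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
    // Prints: accuracy: 0.9733 precision: 0.9581 recall: 0.9900 mcc: 0.9472
    std::printf("accuracy: %.4f precision: %.4f recall: %.4f mcc: %.4f\n",
                accuracy, precision, recall, mcc);
    return 0;
}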