Prep:
-Removed redundant columns and renamed the remaining ones
-Converted iron units to micrograms
-Converted the Excel file to CSV
-Split the data into an approximately 80%-20% train-test split
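The split step can be sketched with scikit-learn. This is a minimal illustration, not the actual prep script: the filename is not given in the log, so synthetic stand-in data is used, and the column names (`copper_ug`, `iron_ug`, `lead_ug`) are assumptions.

```python
# Sketch of the ~80/20 train-test split from the prep step.
# Column names and the DataFrame contents are hypothetical stand-ins.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(df, test_size=0.2, seed=0):
    """Split a readings DataFrame into train/test partitions."""
    return train_test_split(df, test_size=test_size, random_state=seed)

# Synthetic stand-in for the real CSV of home water readings
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "copper_ug": rng.random(100),
    "iron_ug": rng.random(100),
    "lead_ug": rng.random(100),
})
train, test = split_data(df)
print(len(train), len(test))  # 80 20
```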
First Pass:
We treat each (Copper, Iron, Lead) 3-tuple as a single data point and disregard time. We try to predict the amount of lead from copper and iron.
-We try linear regressions on these:
-Regression on copper alone to predict lead is inconclusive, as expected; the same holds for iron alone
-Regression using both copper and iron to predict lead was equally inconclusive
-Of these, iron alone performed best in terms of R^2 score; copper alone scored below zero, i.e. worse than just predicting the mean
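The three regressions above can be sketched as below. The data here is synthetic (the real readings are not in the log), so the printed scores are illustrative only; the point is the comparison loop, and that a negative held-out R^2 means the model does worse than predicting the mean.

```python
# Sketch of the first-pass linear regressions: lead ~ copper,
# lead ~ iron, and lead ~ both, compared by held-out R^2.
# Synthetic data; column indices are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                          # columns: copper, iron
y = 0.3 * X[:, 1] + rng.normal(scale=1.0, size=200)    # weak signal via iron

for name, cols in [("copper", [0]), ("iron", [1]), ("both", [0, 1])]:
    model = LinearRegression().fit(X[:160][:, cols], y[:160])
    r2 = model.score(X[160:][:, cols], y[160:])
    print(f"{name}: R^2 = {r2:.3f}")
```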
-We next try kernel SVM models:
-Linear kernel SVM: R^2 (.02 train, .2 test)
-RBF kernel SVM: R^2 (-.01 train, -.1 test)
-Polynomial (degree 3) kernel SVM did not converge; the cause is unclear (unscaled features are a common culprit)
-Sigmoid kernel SVM: R^2 (-.018 train, -.15 test)
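A sketch of the kernel comparison with scikit-learn's SVR, on synthetic stand-in data. Scaling inside a pipeline and capping `max_iter` are additions of mine, not necessarily what the original runs did; the cap is one way to keep a slow-converging polynomial kernel from running forever.

```python
# Sketch of the kernel-SVM comparison (linear, rbf, poly deg-3, sigmoid).
# StandardScaler and max_iter are my additions; data is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 0.3 * X[:, 1] + rng.normal(scale=1.0, size=200)

for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    model = make_pipeline(StandardScaler(),
                          SVR(kernel=kernel, degree=3, max_iter=10_000))
    model.fit(X[:160], y[:160])
    print(kernel, round(model.score(X[160:], y[160:]), 3))
```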
We next tried a standard MLP neural network regression with several hidden layer widths/depths:
-Results were variable, so each figure below is an average over 5 train-test R^2 runs
-2 hidden neurons: R^2 (.03 train, .047 test)
-4 hidden neurons: R^2 (.06 train, .06 test)
-8 hidden neurons: R^2 (.06 train, .065 test)
-16 hidden neurons: R^2 (.068 train, .15 test)
-16 neurons in the first layer, 8 in the second: R^2 (.06 train, .069 test)
-32 neurons in the first layer, 16 in the second: R^2 (.07 train, .05 test)
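The averaging procedure above can be sketched as follows, assuming scikit-learn's MLPRegressor was the model used (the log does not name the library). The architectures mirror the widths in the list; data is synthetic, so the printed averages will not match the log's.

```python
# Sketch of averaging R^2 over 5 random train-test splits for a
# given MLP architecture. Synthetic data; MLPRegressor is assumed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def mean_r2(X, y, hidden, n_runs=5):
    """Average held-out R^2 over n_runs random 80/20 splits."""
    scores = []
    for seed in range(n_runs):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        mlp = MLPRegressor(hidden_layer_sizes=hidden,
                           max_iter=500, random_state=seed)
        mlp.fit(Xtr, ytr)
        scores.append(mlp.score(Xte, yte))
    return float(np.mean(scores))

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = 0.3 * X[:, 1] + rng.normal(scale=0.5, size=200)

r16 = mean_r2(X, y, (16,))      # one hidden layer of 16 units
r16_8 = mean_r2(X, y, (16, 8))  # two layers: 16 then 8
print(r16, r16_8)
```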
We next try decision tree regressors:
-They overfit dramatically; results not worth reporting
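A sketch of what that overfitting looks like: an unconstrained DecisionTreeRegressor memorizes the training set (train R^2 near 1) while generalizing poorly, and capping `max_depth` is one standard mitigation. Synthetic data; the depth cap is my illustration, not something the log reports trying.

```python
# Sketch of decision-tree overfitting: full-depth tree vs. a capped one.
# Data is synthetic; max_depth=3 is an arbitrary illustrative cap.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = 0.3 * X[:, 1] + rng.normal(scale=1.0, size=200)
Xtr, Xte, ytr, yte = X[:160], X[160:], y[:160], y[160:]

full = DecisionTreeRegressor(random_state=0).fit(Xtr, ytr)
capped = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Xtr, ytr)
print("full:   train", full.score(Xtr, ytr), "test", full.score(Xte, yte))
print("capped: train", capped.score(Xtr, ytr), "test", capped.score(Xte, yte))
```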
Second Pass:
We now treat each (Copper, Iron, Chloride, Lead) 4-tuple as a data point, again disregarding the time-series component. We try to predict lead content from the other three features, using cross-validation where appropriate.
-We try a linear regression:
-R^2 (.07 train, .06 test)
-No cross-validation was used here
-We try kernel SVMs:
-Linear kernel: R^2 (.035 train, .45 test)
-RBF kernel: R^2 (-.014 train, -.135 test)
-Sigmoid kernel: R^2 (-.019 train, -.155 test)
-Polynomial kernel again did not halt
-We try a regular MLP neural network regression with several hidden layer depths/widths:
-50 hidden units, 1 layer: R^2 (.085 train, .167 test)
-Cross-validated hyperparameters: relu activation, learning rate = 3*10^-5, solver = lbfgs
-100 hidden units, 1 layer: R^2 (.084 train, .163 test)
-Cross-validated hyperparameters were the same
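The hyperparameter cross-validation above can be sketched with GridSearchCV, assuming scikit-learn's MLPRegressor. The search grid here is a guess built around the values the log reports winning (relu, learning rate 3e-5, lbfgs); the alternatives in each list are my additions, and the data is synthetic.

```python
# Sketch of cross-validated hyperparameter selection for the MLP.
# Grid values other than the reported winners are assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))                          # copper, iron, chloride
y = 0.3 * X[:, 2] + rng.normal(scale=0.5, size=150)

grid = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(50,), max_iter=500, random_state=0),
    param_grid={
        "activation": ["relu", "tanh"],
        "learning_rate_init": [3e-5, 1e-3],  # only used by sgd/adam
        "solver": ["lbfgs", "adam"],
    },
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```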
-We try decision trees:
-Aggressive overfitting again; not worth reporting
-We try ridge regression:
-No improvement over the previous models
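For completeness, a ridge regression sketch (L2-penalized linear regression), with the penalty strength chosen by RidgeCV's built-in cross-validation. The alpha grid and the synthetic data are assumptions; as the log notes, there is little reason to expect a win here when the plain linear fit was already weak.

```python
# Sketch of ridge regression with cross-validated alpha.
# Alphas and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))                          # copper, iron, chloride
y = 0.3 * X[:, 2] + rng.normal(scale=1.0, size=200)

ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X[:160], y[:160])
print("chosen alpha:", ridge.alpha_)
print("test R^2:", ridge.score(X[160:], y[160:]))
```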
Third Pass:
We now treat the data as a time series. We use the Copper, Iron, and Chloride readings from each home to predict the lead at the corresponding time step.
-We try RNNs of varying complexity:
-
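The RNN entry is unfinished in the log, so below is only a minimal NumPy sketch of the setup it describes: an Elman-style recurrent cell consuming one home's sequence of (copper, iron, chloride) readings and emitting a lead prediction at each time step. Weights here are random and untrained; an actual run would train them with a framework such as PyTorch or Keras.

```python
# Minimal Elman-RNN forward pass over one home's reading sequence.
# Untrained random weights; shapes and hidden size are illustrative.
import numpy as np

def rnn_forward(seq, Wx, Wh, Wo, h0=None):
    """seq: (T, 3) readings; returns (T,) per-step lead predictions."""
    h = np.zeros(Wh.shape[0]) if h0 is None else h0
    out = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)   # recurrent state update
        out.append(Wo @ h)             # per-step lead prediction
    return np.array(out)

rng = np.random.default_rng(4)
hidden = 8
Wx = 0.1 * rng.normal(size=(hidden, 3))
Wh = 0.1 * rng.normal(size=(hidden, hidden))
Wo = 0.1 * rng.normal(size=hidden)

preds = rnn_forward(rng.normal(size=(12, 3)), Wx, Wh, Wo)
print(preds.shape)  # (12,)
```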