<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="UTF-8">
<title>Pml by JagPadala</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="stylesheets/normalize.css" media="screen">
<link href='http://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen">
<link rel="stylesheet" type="text/css" href="stylesheets/github-light.css" media="screen">
</head>
<body>
<section class="page-header">
<h1 class="project-name">Pml</h1>
<h2 class="project-tagline">Practical Machine Learning - Predictive assignment</h2>
<a href="https://github.com/JagPadala/pml" class="btn">View on GitHub</a>
<a href="https://github.com/JagPadala/pml/zipball/master" class="btn">Download .zip</a>
<a href="https://github.com/JagPadala/pml/tarball/master" class="btn">Download .tar.gz</a>
</section>
<section class="main-content">
<div>
<div id="header">
<h1>
<a id="practical-machine-learning---prediction-assignment-writeup" class="anchor" href="#practical-machine-learning---prediction-assignment-writeup" aria-hidden="true"><span class="octicon octicon-link"></span></a>Practical Machine Learning - Prediction Assignment Writeup</h1>
<h4>
<a id="jag-padala" class="anchor" href="#jag-padala" aria-hidden="true"><span class="octicon octicon-link"></span></a><em>Jag Padala</em>
</h4>
<h4>
<a id="jun-18-2015" class="anchor" href="#jun-18-2015" aria-hidden="true"><span class="octicon octicon-link"></span></a><em>Jun 18, 2015</em>
</h4>
</div>
<p>In this project we use several machine learning techniques on data from users of fitness tracker equipment to predict how they performed an exercise, as recorded in the classe variable.</p>
<div id="exploratory-analysis">
<h2>
<a id="exploratory-analysis" class="anchor" href="#exploratory-analysis" aria-hidden="true"><span class="octicon octicon-link"></span></a>Exploratory Analysis</h2>
<p>We start with some exploratory analysis to get a basic understanding of the data set.</p>
<pre><code>suppressWarnings(library(ggplot2))
suppressWarnings(library(caret))</code></pre>
<pre><code>## Loading required package: lattice</code></pre>
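<pre><code># adjust to the directory that holds the two CSV files
setwd("/Users/apadala/Coursera/pml")
# stringsAsFactors = TRUE keeps the text columns (including classe) as factors,
# as in the summary below; this was the default before R 4.0.0
traindata <- read.csv("pml-training.csv", stringsAsFactors = TRUE)
testdata <- read.csv("pml-testing.csv", stringsAsFactors = TRUE)
set.seed(1234)
# 60% of the labelled rows for training, 40% held out for testing (stratified by classe)
inTrainPml <- createDataPartition(y=traindata$classe,p=0.6,list=FALSE)
trainingPml <- traindata[inTrainPml,]
testingPml <- traindata[-inTrainPml,]
summary(trainingPml)</code></pre>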
<pre><code>## X user_name raw_timestamp_part_1 raw_timestamp_part_2
## Min. : 2 adelmo :2327 Min. :1.322e+09 Min. : 294
## 1st Qu.: 4936 carlitos:1881 1st Qu.:1.323e+09 1st Qu.:258646
## Median : 9794 charles :2147 Median :1.323e+09 Median :496316
## Mean : 9808 eurico :1878 Mean :1.323e+09 Mean :501632
## 3rd Qu.:14708 jeremy :1985 3rd Qu.:1.323e+09 3rd Qu.:752362
## Max. :19621 pedro :1558 Max. :1.323e+09 Max. :998750
##
## cvtd_timestamp new_window num_window roll_belt
## 5/12/11 11:24 : 904 no :11540 Min. : 1 Min. :-28.90
## 28/11/2011 14:14: 898 yes: 236 1st Qu.:221 1st Qu.: 1.11
## 5/12/11 11:25 : 860 Median :421 Median :114.00
## 30/11/2011 17:11: 857 Mean :430 Mean : 64.50
## 2/12/11 14:57 : 846 3rd Qu.:644 3rd Qu.:123.00
## 5/12/11 14:23 : 828 Max. :864 Max. :162.00
## (Other) :6583
## pitch_belt yaw_belt total_accel_belt kurtosis_roll_belt
## Min. :-55.8000 Min. :-179.00 Min. : 0.00 :11540
## 1st Qu.: 1.8275 1st Qu.: -88.30 1st Qu.: 3.00 #DIV/0! : 8
## Median : 5.2900 Median : -12.60 Median :17.00 -0.01685 : 1
## Mean : 0.3029 Mean : -11.15 Mean :11.32 -0.025513: 1
## 3rd Qu.: 15.0000 3rd Qu.: 13.10 3rd Qu.:18.00 -0.033935: 1
## Max. : 60.3000 Max. : 179.00 Max. :28.00 -0.06016 : 1
## (Other) : 224
## kurtosis_picth_belt kurtosis_yaw_belt skewness_roll_belt
## :11540 :11540 :11540
## #DIV/0! : 23 #DIV/0!: 236 #DIV/0! : 7
## -0.15095 : 3 -0.003095: 1
## -2.060105: 3 -0.010002: 1
## 11.094417: 3 -0.01402 : 1
## -0.684748: 2 -0.015465: 1
## (Other) : 202 (Other) : 225
## skewness_roll_belt.1 skewness_yaw_belt max_roll_belt max_picth_belt
## :11540 :11540 Min. :-94.300 Min. : 3.00
## #DIV/0! : 23 #DIV/0!: 236 1st Qu.:-87.900 1st Qu.: 5.00
## 0 : 4 Median : -4.900 Median :18.00
## -0.189082: 2 Mean : -4.565 Mean :13.34
## -0.587156: 2 3rd Qu.: 19.175 3rd Qu.:19.25
## -1.159179: 2 Max. :177.000 Max. :30.00
## (Other) : 203 NA's :11540 NA's :11540
## max_yaw_belt min_roll_belt min_pitch_belt min_yaw_belt
## :11540 Min. :-94.400 Min. : 0.0 :11540
## -1.2 : 18 1st Qu.:-88.200 1st Qu.: 3.0 -1.2 : 18
## -1.4 : 17 Median : -6.900 Median :17.0 -1.4 : 17
## -1.1 : 15 Mean : -6.504 Mean :11.2 -1.1 : 15
## -1.5 : 15 3rd Qu.: 11.425 3rd Qu.:17.0 -1.5 : 15
## -0.7 : 11 Max. :173.000 Max. :22.0 -0.7 : 11
## (Other): 160 NA's :11540 NA's :11540 (Other): 160
## amplitude_roll_belt amplitude_pitch_belt amplitude_yaw_belt
## Min. : 0.000 Min. : 0.000 :11540
## 1st Qu.: 0.300 1st Qu.: 1.000 #DIV/0!: 8
## Median : 1.000 Median : 1.000 0 : 228
## Mean : 1.939 Mean : 2.144
## 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :27.860 Max. :10.000
## NA's :11540 NA's :11540
## var_total_accel_belt avg_roll_belt stddev_roll_belt var_roll_belt
## Min. : 0.000 Min. :-18.80 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.100 1st Qu.: 1.20 1st Qu.: 0.189 1st Qu.: 0.000
## Median : 0.200 Median :116.90 Median : 0.400 Median : 0.100
## Mean : 0.929 Mean : 71.22 Mean : 1.291 Mean : 7.705
## 3rd Qu.: 0.300 3rd Qu.:124.00 3rd Qu.: 0.616 3rd Qu.: 0.410
## Max. :11.000 Max. :157.40 Max. :14.200 Max. :200.700
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## avg_pitch_belt stddev_pitch_belt var_pitch_belt avg_yaw_belt
## Min. :-45.300 Min. :0.000 Min. :0.00 Min. :-94.400
## 1st Qu.: 1.725 1st Qu.:0.200 1st Qu.:0.00 1st Qu.:-88.000
## Median : 5.600 Median :0.300 Median :0.10 Median : -5.900
## Mean : 0.919 Mean :0.581 Mean :0.73 Mean : -5.531
## 3rd Qu.: 16.200 3rd Qu.:0.700 3rd Qu.:0.50 3rd Qu.: 16.900
## Max. : 41.000 Max. :3.100 Max. :9.30 Max. :173.400
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## stddev_yaw_belt var_yaw_belt gyros_belt_x gyros_belt_y
## Min. :0.000 Min. : 0.000 Min. :-1.040000 Min. :-0.64000
## 1st Qu.:0.100 1st Qu.: 0.010 1st Qu.:-0.030000 1st Qu.: 0.00000
## Median :0.300 Median : 0.090 Median : 0.030000 Median : 0.02000
## Mean :0.624 Mean : 1.349 Mean :-0.004883 Mean : 0.03994
## 3rd Qu.:0.600 3rd Qu.: 0.382 3rd Qu.: 0.110000 3rd Qu.: 0.11000
## Max. :8.700 Max. :75.920 Max. : 2.220000 Max. : 0.64000
## NA's :11540 NA's :11540
## gyros_belt_z accel_belt_x accel_belt_y accel_belt_z
## Min. :-1.4600 Min. :-82.000 Min. :-69.00 Min. :-269.00
## 1st Qu.:-0.2000 1st Qu.:-21.000 1st Qu.: 3.00 1st Qu.:-162.00
## Median :-0.1000 Median :-15.000 Median : 35.50 Median :-152.50
## Mean :-0.1313 Mean : -5.609 Mean : 30.21 Mean : -72.79
## 3rd Qu.:-0.0200 3rd Qu.: -5.000 3rd Qu.: 61.00 3rd Qu.: 27.00
## Max. : 1.6200 Max. : 85.000 Max. :109.00 Max. : 105.00
##
## magnet_belt_x magnet_belt_y magnet_belt_z roll_arm
## Min. :-49.00 Min. :360.0 Min. :-621.0 Min. :-180.00
## 1st Qu.: 9.00 1st Qu.:581.0 1st Qu.:-376.0 1st Qu.: -31.12
## Median : 34.00 Median :601.0 Median :-320.0 Median : 0.00
## Mean : 55.34 Mean :593.5 Mean :-346.2 Mean : 18.43
## 3rd Qu.: 59.00 3rd Qu.:610.0 3rd Qu.:-306.0 3rd Qu.: 77.70
## Max. :485.00 Max. :669.0 Max. : 293.0 Max. : 180.00
##
## pitch_arm yaw_arm total_accel_arm var_accel_arm
## Min. :-88.800 Min. :-180.0000 Min. : 1.00 Min. : 0.000
## 1st Qu.:-26.000 1st Qu.: -43.5000 1st Qu.:17.00 1st Qu.: 7.843
## Median : 0.000 Median : 0.0000 Median :26.50 Median : 38.767
## Mean : -4.616 Mean : -0.6392 Mean :25.37 Mean : 51.562
## 3rd Qu.: 11.300 3rd Qu.: 46.0250 3rd Qu.:32.00 3rd Qu.: 77.616
## Max. : 88.500 Max. : 180.0000 Max. :65.00 Max. :253.010
## NA's :11540
## avg_roll_arm stddev_roll_arm var_roll_arm avg_pitch_arm
## Min. :-166.67 Min. : 0.000 Min. : 0.000 Min. :-81.773
## 1st Qu.: -35.54 1st Qu.: 1.210 1st Qu.: 1.464 1st Qu.:-22.433
## Median : 0.00 Median : 5.345 Median : 28.573 Median : 0.000
## Mean : 14.20 Mean : 11.788 Mean : 521.297 Mean : -2.883
## 3rd Qu.: 76.27 3rd Qu.: 14.917 3rd Qu.: 222.528 3rd Qu.: 10.679
## Max. : 160.78 Max. :161.964 Max. :26232.208 Max. : 75.659
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## stddev_pitch_arm var_pitch_arm avg_yaw_arm stddev_yaw_arm
## Min. : 0.000 Min. : 0.000 Min. :-173.283 Min. : 0.000
## 1st Qu.: 1.675 1st Qu.: 2.806 1st Qu.: -30.917 1st Qu.: 2.734
## Median : 7.945 Median : 63.119 Median : 0.000 Median : 16.256
## Mean :10.376 Mean : 197.892 Mean : 2.264 Mean : 21.366
## 3rd Qu.:16.606 3rd Qu.: 275.762 3rd Qu.: 38.937 3rd Qu.: 33.036
## Max. :43.412 Max. :1884.565 Max. : 152.000 Max. :133.562
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## var_yaw_arm gyros_arm_x gyros_arm_y gyros_arm_z
## Min. : 0.000 Min. :-6.37000 Min. :-3.440 Min. :-2.3300
## 1st Qu.: 7.487 1st Qu.:-1.35000 1st Qu.:-0.790 1st Qu.:-0.0700
## Median : 264.248 Median : 0.06000 Median :-0.240 Median : 0.2300
## Mean : 891.480 Mean : 0.03313 Mean :-0.252 Mean : 0.2696
## 3rd Qu.: 1091.373 3rd Qu.: 1.57000 3rd Qu.: 0.160 3rd Qu.: 0.7200
## Max. :17838.878 Max. : 4.87000 Max. : 2.840 Max. : 2.2000
## NA's :11540
## accel_arm_x accel_arm_y accel_arm_z magnet_arm_x
## Min. :-404.00 Min. :-318.00 Min. :-630.00 Min. :-584.0
## 1st Qu.:-241.00 1st Qu.: -54.00 1st Qu.:-142.00 1st Qu.:-297.0
## Median : -41.00 Median : 14.00 Median : -47.00 Median : 290.0
## Mean : -59.79 Mean : 32.44 Mean : -70.44 Mean : 194.1
## 3rd Qu.: 84.00 3rd Qu.: 138.00 3rd Qu.: 24.25 3rd Qu.: 640.0
## Max. : 435.00 Max. : 308.00 Max. : 292.00 Max. : 782.0
##
## magnet_arm_y magnet_arm_z kurtosis_roll_arm kurtosis_picth_arm
## Min. :-392.0 Min. :-596.0 :11540 :11540
## 1st Qu.: -12.0 1st Qu.: 133.8 #DIV/0! : 42 #DIV/0! : 44
## Median : 200.0 Median : 446.0 -0.05051: 1 -0.00484: 1
## Mean : 155.4 Mean : 307.3 -0.05695: 1 -0.02967: 1
## 3rd Qu.: 322.0 3rd Qu.: 545.0 -0.09698: 1 -0.07394: 1
## Max. : 583.0 Max. : 694.0 -0.14153: 1 -0.10385: 1
## (Other) : 190 (Other) : 188
## kurtosis_yaw_arm skewness_roll_arm skewness_pitch_arm skewness_yaw_arm
## :11540 :11540 :11540 :11540
## #DIV/0! : 9 #DIV/0! : 41 #DIV/0! : 44 #DIV/0! : 9
## 0.55844 : 2 -0.00696: 1 -0.00184: 1 -0.00562: 1
## -0.01548: 1 -0.03359: 1 -0.01185: 1 -0.04866: 1
## -0.02101: 1 -0.03484: 1 -0.01247: 1 -0.05413: 1
## -0.04059: 1 -0.04254: 1 -0.02063: 1 -0.06077: 1
## (Other) : 222 (Other) : 191 (Other) : 188 (Other) : 223
## max_roll_arm max_picth_arm max_yaw_arm min_roll_arm
## Min. :-73.10 Min. :-173.00 Min. : 4.00 Min. :-88.80
## 1st Qu.: 0.00 1st Qu.: -10.47 1st Qu.:29.00 1st Qu.:-41.75
## Median : 8.15 Median : 19.95 Median :34.00 Median :-21.35
## Mean : 12.92 Mean : 35.96 Mean :35.09 Mean :-19.29
## 3rd Qu.: 29.88 3rd Qu.: 101.00 3rd Qu.:40.00 3rd Qu.: 0.00
## Max. : 85.50 Max. : 180.00 Max. :65.00 Max. : 66.40
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## min_pitch_arm min_yaw_arm amplitude_roll_arm amplitude_pitch_arm
## Min. :-179.00 Min. : 1.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: -76.72 1st Qu.: 7.75 1st Qu.: 5.475 1st Qu.: 10.28
## Median : -32.95 Median :13.00 Median : 28.150 Median : 53.65
## Mean : -32.18 Mean :14.86 Mean : 32.212 Mean : 68.14
## 3rd Qu.: 0.00 3rd Qu.:19.25 3rd Qu.: 51.080 3rd Qu.:113.83
## Max. : 152.00 Max. :38.00 Max. :118.000 Max. :359.00
## NA's :11540 NA's :11540 NA's :11540 NA's :11540
## amplitude_yaw_arm roll_dumbbell pitch_dumbbell yaw_dumbbell
## Min. : 0.00 Min. :-153.51 Min. :-149.59 Min. :-150.871
## 1st Qu.:10.75 1st Qu.: -18.60 1st Qu.: -39.94 1st Qu.: -77.470
## Median :21.00 Median : 48.17 Median : -20.46 Median : 0.000
## Mean :20.23 Mean : 23.69 Mean : -10.50 Mean : 2.151
## 3rd Qu.:28.00 3rd Qu.: 67.78 3rd Qu.: 17.59 3rd Qu.: 80.414
## Max. :52.00 Max. : 153.55 Max. : 149.40 Max. : 154.952
## NA's :11540
## kurtosis_roll_dumbbell kurtosis_picth_dumbbell kurtosis_yaw_dumbbell
## :11540 :11540 :11540
## #DIV/0!: 4 -0.9334: 2 #DIV/0!: 236
## -2.0851: 2 -2.0851: 2
## -0.0115: 1 -0.0163: 1
## -0.0262: 1 -0.0233: 1
## -0.0334: 1 -0.0322: 1
## (Other): 227 (Other): 229
## skewness_roll_dumbbell skewness_pitch_dumbbell skewness_yaw_dumbbell
## :11540 :11540 :11540
## #DIV/0!: 3 0.109 : 2 #DIV/0!: 236
## 0.111 : 2 -0.0053: 1
## 1.0312 : 2 -0.0084: 1
## -0.0096: 1 -0.0166: 1
## -0.0234: 1 -0.0452: 1
## (Other): 227 (Other): 230
## max_roll_dumbbell max_picth_dumbbell max_yaw_dumbbell min_roll_dumbbell
## Min. :-70.10 Min. :-112.90 :11540 Min. :-123.30
## 1st Qu.:-27.52 1st Qu.: -68.72 -0.6 : 14 1st Qu.: -59.02
## Median : 16.20 Median : 43.30 0.2 : 13 Median : -41.30
## Mean : 12.50 Mean : 33.34 -0.8 : 10 Mean : -38.39
## 3rd Qu.: 48.52 3rd Qu.: 133.35 0 : 10 3rd Qu.: -23.93
## Max. :129.80 Max. : 154.50 -0.3 : 9 Max. : 41.50
## NA's :11540 NA's :11540 (Other): 180 NA's :11540
## min_pitch_dumbbell min_yaw_dumbbell amplitude_roll_dumbbell
## Min. :-146.20 :11540 Min. : 0.00
## 1st Qu.: -92.17 -0.6 : 14 1st Qu.: 13.27
## Median : -60.00 0.2 : 13 Median : 33.19
## Mean : -28.92 -0.8 : 10 Mean : 50.89
## 3rd Qu.: 31.45 0 : 10 3rd Qu.: 74.16
## Max. : 116.60 -0.3 : 9 Max. :232.79
## NA's :11540 (Other): 180 NA's :11540
## amplitude_pitch_dumbbell amplitude_yaw_dumbbell total_accel_dumbbell
## Min. : 0.00 :11540 Min. : 0.00
## 1st Qu.: 16.54 #DIV/0!: 4 1st Qu.: 4.00
## Median : 41.10 0 : 232 Median :10.00
## Mean : 62.26 Mean :13.57
## 3rd Qu.: 93.47 3rd Qu.:19.00
## Max. :263.60 Max. :58.00
## NA's :11540
## var_accel_dumbbell avg_roll_dumbbell stddev_roll_dumbbell
## Min. : 0.000 Min. :-117.02 Min. : 0.00
## 1st Qu.: 0.342 1st Qu.: -9.24 1st Qu.: 4.25
## Median : 0.988 Median : 48.63 Median : 11.72
## Mean : 4.508 Mean : 23.98 Mean : 18.58
## 3rd Qu.: 3.314 3rd Qu.: 63.35 3rd Qu.: 23.62
## Max. :230.428 Max. : 125.99 Max. :123.78
## NA's :11540 NA's :11540 NA's :11540
## var_roll_dumbbell avg_pitch_dumbbell stddev_pitch_dumbbell
## Min. : 0.00 Min. :-70.73 Min. : 0.000
## 1st Qu.: 18.07 1st Qu.:-37.91 1st Qu.: 3.023
## Median : 137.44 Median :-17.05 Median : 7.806
## Mean : 827.05 Mean :-11.91 Mean :12.262
## 3rd Qu.: 557.74 3rd Qu.: 12.97 3rd Qu.:18.209
## Max. :15321.01 Max. : 82.21 Max. :62.881
## NA's :11540 NA's :11540 NA's :11540
## var_pitch_dumbbell avg_yaw_dumbbell stddev_yaw_dumbbell
## Min. : 0.00 Min. :-117.950 Min. : 0.000
## 1st Qu.: 9.14 1st Qu.: -77.008 1st Qu.: 3.461
## Median : 60.94 Median : 11.368 Median : 9.467
## Mean : 299.03 Mean : 2.387 Mean : 15.981
## 3rd Qu.: 331.57 3rd Qu.: 75.471 3rd Qu.: 22.805
## Max. :3953.97 Max. : 134.905 Max. :107.088
## NA's :11540 NA's :11540 NA's :11540
## var_yaw_dumbbell gyros_dumbbell_x gyros_dumbbell_y
## Min. : 0.00 Min. :-204.0000 Min. :-2.10000
## 1st Qu.: 11.98 1st Qu.: -0.0300 1st Qu.:-0.14000
## Median : 89.62 Median : 0.1300 Median : 0.03000
## Mean : 573.49 Mean : 0.1536 Mean : 0.04977
## 3rd Qu.: 520.13 3rd Qu.: 0.3500 3rd Qu.: 0.21000
## Max. :11467.91 Max. : 2.2200 Max. :52.00000
## NA's :11540
## gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y accel_dumbbell_z
## Min. : -2.3000 Min. :-419.00 Min. :-189.00 Min. :-319.00
## 1st Qu.: -0.3100 1st Qu.: -50.00 1st Qu.: -8.00 1st Qu.:-141.00
## Median : -0.1300 Median : -8.00 Median : 39.00 Median : 0.00
## Mean : -0.1194 Mean : -27.83 Mean : 51.94 Mean : -37.18
## 3rd Qu.: 0.0300 3rd Qu.: 11.00 3rd Qu.: 109.00 3rd Qu.: 38.00
## Max. :317.0000 Max. : 235.00 Max. : 310.00 Max. : 318.00
##
## magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z roll_forearm
## Min. :-639.0 Min. :-3600.0 Min. :-249.00 Min. :-180.000
## 1st Qu.:-535.0 1st Qu.: 231.0 1st Qu.: -45.00 1st Qu.: -0.905
## Median :-478.0 Median : 311.0 Median : 13.00 Median : 21.500
## Mean :-328.5 Mean : 221.6 Mean : 46.41 Mean : 34.167
## 3rd Qu.:-304.0 3rd Qu.: 391.0 3rd Qu.: 96.00 3rd Qu.: 140.000
## Max. : 592.0 Max. : 632.0 Max. : 447.00 Max. : 180.000
##
## pitch_forearm yaw_forearm kurtosis_roll_forearm
## Min. :-72.50 Min. :-180.00 :11540
## 1st Qu.: 0.00 1st Qu.: -69.30 #DIV/0!: 51
## Median : 9.48 Median : 0.00 -0.8079: 2
## Mean : 10.95 Mean : 18.45 -0.0359: 1
## 3rd Qu.: 28.80 3rd Qu.: 110.00 -0.0781: 1
## Max. : 88.70 Max. : 180.00 -0.1363: 1
## (Other): 180
## kurtosis_picth_forearm kurtosis_yaw_forearm skewness_roll_forearm
## :11540 :11540 :11540
## #DIV/0!: 52 #DIV/0!: 236 #DIV/0!: 50
## -0.0442: 1 -0.1912: 2
## -0.0523: 1 -0.0063: 1
## -0.092 : 1 -0.011 : 1
## -0.1002: 1 -0.0237: 1
## (Other): 180 (Other): 181
## skewness_pitch_forearm skewness_yaw_forearm max_roll_forearm
## :11540 :11540 Min. :-66.60
## #DIV/0!: 52 #DIV/0!: 236 1st Qu.: 0.00
## 0 : 2 Median : 26.15
## -0.0131: 1 Mean : 24.12
## -0.0405: 1 3rd Qu.: 45.35
## -0.0599: 1 Max. : 89.80
## (Other): 179 NA's :11540
## max_picth_forearm max_yaw_forearm min_roll_forearm min_pitch_forearm
## Min. :-151.00 :11540 Min. :-72.500 Min. :-180.00
## 1st Qu.: 0.00 #DIV/0!: 51 1st Qu.: -3.700 1st Qu.:-175.00
## Median : 112.50 -1.3 : 19 Median : 0.000 Median : -48.40
## Mean : 80.25 -1.2 : 18 Mean : -0.312 Mean : -56.95
## 3rd Qu.: 175.00 -1.4 : 14 3rd Qu.: 12.650 3rd Qu.: 0.00
## Max. : 180.00 -1.6 : 13 Max. : 62.100 Max. : 164.00
## NA's :11540 (Other): 121 NA's :11540 NA's :11540
## min_yaw_forearm amplitude_roll_forearm amplitude_pitch_forearm
## :11540 Min. : 0.000 Min. : 0.0
## #DIV/0!: 51 1st Qu.: 1.028 1st Qu.: 1.0
## -1.3 : 19 Median : 15.970 Median : 83.7
## -1.2 : 18 Mean : 24.429 Mean :137.2
## -1.4 : 14 3rd Qu.: 39.403 3rd Qu.:349.2
## -1.6 : 13 Max. :120.300 Max. :360.0
## (Other): 121 NA's :11540 NA's :11540
## amplitude_yaw_forearm total_accel_forearm var_accel_forearm
## :11540 Min. : 0.00 Min. : 0.000
## #DIV/0!: 51 1st Qu.: 29.00 1st Qu.: 6.974
## 0 : 185 Median : 36.00 Median : 18.898
## Mean : 34.72 Mean : 30.676
## 3rd Qu.: 41.00 3rd Qu.: 47.601
## Max. :108.00 Max. :158.816
## NA's :11540
## avg_roll_forearm stddev_roll_forearm var_roll_forearm
## Min. :-177.13 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.339 1st Qu.: 0.11
## Median : 14.36 Median : 6.972 Median : 48.66
## Mean : 36.54 Mean : 42.380 Mean : 5366.19
## 3rd Qu.: 106.12 3rd Qu.: 93.055 3rd Qu.: 8660.85
## Max. : 177.26 Max. :179.171 Max. :32102.24
## NA's :11540 NA's :11540 NA's :11540
## avg_pitch_forearm stddev_pitch_forearm var_pitch_forearm
## Min. :-68.17 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.281 1st Qu.: 0.079
## Median : 12.10 Median : 4.770 Median : 22.756
## Mean : 11.75 Mean : 7.918 Mean : 141.664
## 3rd Qu.: 30.48 3rd Qu.:12.877 3rd Qu.: 165.822
## Max. : 72.09 Max. :39.561 Max. :1565.055
## NA's :11540 NA's :11540 NA's :11540
## avg_yaw_forearm stddev_yaw_forearm var_yaw_forearm
## Min. :-153.08 Min. : 0.00 Min. : 0.00
## 1st Qu.: -14.24 1st Qu.: 0.49 1st Qu.: 0.24
## Median : 0.00 Median : 25.40 Median : 645.57
## Mean : 18.98 Mean : 46.00 Mean : 4957.98
## 3rd Qu.: 85.54 3rd Qu.: 89.03 3rd Qu.: 7929.03
## Max. : 167.33 Max. :197.51 Max. :39009.33
## NA's :11540 NA's :11540 NA's :11540
## gyros_forearm_x gyros_forearm_y gyros_forearm_z
## Min. :-22.0000 Min. : -7.02000 Min. : -8.0900
## 1st Qu.: -0.2200 1st Qu.: -1.49000 1st Qu.: -0.1800
## Median : 0.0500 Median : 0.03000 Median : 0.0800
## Mean : 0.1522 Mean : 0.08676 Mean : 0.1613
## 3rd Qu.: 0.5600 3rd Qu.: 1.64000 3rd Qu.: 0.4900
## Max. : 3.5200 Max. :311.00000 Max. :231.0000
##
## accel_forearm_x accel_forearm_y accel_forearm_z magnet_forearm_x
## Min. :-498.00 Min. :-595.0 Min. :-446.00 Min. :-1280.0
## 1st Qu.:-179.00 1st Qu.: 57.0 1st Qu.:-181.00 1st Qu.: -618.0
## Median : -57.00 Median : 201.0 Median : -36.00 Median : -387.5
## Mean : -63.01 Mean : 163.8 Mean : -53.56 Mean : -313.9
## 3rd Qu.: 76.00 3rd Qu.: 312.0 3rd Qu.: 27.00 3rd Qu.: -75.0
## Max. : 359.00 Max. : 923.0 Max. : 285.00 Max. : 672.0
##
## magnet_forearm_y magnet_forearm_z classe
## Min. :-896.0 Min. :-973 A:3348
## 1st Qu.: -1.0 1st Qu.: 185 B:2279
## Median : 587.0 Median : 507 C:2054
## Mean : 378.7 Mean : 391 D:1930
## 3rd Qu.: 736.0 3rd Qu.: 652 E:2165
## Max. :1450.0 Max. :1090
## </code></pre>
<p>We loaded the data and split the labelled file into a training set (60%) and a test set (40%). The summary shows 159 candidate predictors for the outcome classe, including the row index X, user_name and the timestamp and window columns. classe is a categorical variable with 5 possible values, A to E.</p>
<p>We can also see that the movements of 6 users were measured over a period of time, and that the timestamp and window of each measurement were recorded. Since the users were instructed to deliberately perform some repetitions correctly and some incorrectly, columns such as the row index X, user_name, the timestamps and the window numbers are likely to track the recording session, and hence the class, rather than the movement itself. We need to exclude them to avoid target leakage, which would overfit the model and lead to incorrect predictions on new data.</p>
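<p>The exclusion can be sketched as a filter on column names. The helper dropLeakageCols is hypothetical (the original analysis simply lists the columns to keep in the clean-up step) and assumes the column names shown in the summary above.</p>
<pre><code># Hypothetical helper: drop the row index, user name, timestamp and window
# columns so that a model cannot key on them instead of the sensor readings.
dropLeakageCols <- function(df) {
  leaky <- grepl("^(X|user_name|raw_timestamp_part_[12]|cvtd_timestamp|new_window|num_window)$",
                 names(df))
  df[, !leaky, drop = FALSE]
}

# Toy example using a few of the real column names
toy <- data.frame(X = 1:3, user_name = c("adelmo", "pedro", "jeremy"),
                  num_window = c(11, 11, 12), roll_belt = c(1.4, 1.4, 1.5),
                  classe = c("A", "A", "B"))
names(dropLeakageCols(toy))
## [1] "roll_belt" "classe"</code></pre>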
<p>Next we look at the other measurements that were recorded.</p>
<div id="data-clean-up">
<h3>
<a id="data-clean-up" class="anchor" href="#data-clean-up" aria-hidden="true"><span class="octicon octicon-link"></span></a>Data Clean Up</h3>
<p>On closer inspection, the first thing that stands out in the summary is the number of columns that are almost entirely empty or NA. For example, kurtosis_roll_belt is empty in 11540 of the 11776 training rows, so it cannot support a meaningful prediction; the same holds for kurtosis_picth_belt, kurtosis_yaw_belt, skewness_roll_belt and skewness_yaw_belt. Numeric columns such as max_roll_belt, min_roll_belt, amplitude_pitch_belt and the avg_, stddev_ and var_ columns follow the same pattern, except that they hold NA instead of empty values in those 11540 rows. This count matches the 11540 rows where new_window is "no", so these columns appear to be per-window summary statistics that are filled in only on the 236 rows where new_window is "yes", rather than readings from unreliable sensors. All of them need to be removed. Together with the identifier, timestamp and window columns discussed above, this leaves the 52 raw sensor readings plus classe.</p>
<pre><code># keep the 52 raw sensor readings (belt, arm, dumbbell, forearm) plus the outcome classe
keepCols <- c("roll_belt","pitch_belt","yaw_belt","total_accel_belt","gyros_belt_x","gyros_belt_y","gyros_belt_z","accel_belt_x","accel_belt_y","accel_belt_z","magnet_belt_x","magnet_belt_y","magnet_belt_z",
  "roll_arm","pitch_arm","yaw_arm","total_accel_arm","gyros_arm_x","gyros_arm_y","gyros_arm_z","accel_arm_x","accel_arm_y","accel_arm_z","magnet_arm_x","magnet_arm_y","magnet_arm_z",
  "roll_dumbbell","pitch_dumbbell","yaw_dumbbell","total_accel_dumbbell","gyros_dumbbell_x","gyros_dumbbell_y","gyros_dumbbell_z","accel_dumbbell_x","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","magnet_dumbbell_y","magnet_dumbbell_z",
  "roll_forearm","pitch_forearm","yaw_forearm","total_accel_forearm","gyros_forearm_x","gyros_forearm_y","gyros_forearm_z","accel_forearm_x","accel_forearm_y","accel_forearm_z","magnet_forearm_x","magnet_forearm_y","magnet_forearm_z",
  "classe")
trainingPmlNonNACols <- trainingPml[keepCols]
testingPmlNonNACols <- testingPml[keepCols]</code></pre>
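<p>The column list above is written out by hand. As a sketch of a programmatic alternative, the hypothetical helper sparseCols flags every column in which more than a given share of the values is NA, an empty string or the spreadsheet artefact "#DIV/0!".</p>
<pre><code># Hypothetical helper: names of the columns in which more than maxShare of the
# values are NA, empty strings or "#DIV/0!" (works for factor, character and
# numeric columns alike).
sparseCols <- function(df, maxShare = 0.9) {
  missingShare <- vapply(df, function(col) {
    v <- as.character(col)
    mean(is.na(v) | v %in% c("", "#DIV/0!"))
  }, numeric(1))
  names(missingShare)[missingShare > maxShare]
}

# Toy example: one blank-style summary column, one NA-style summary column
toy <- data.frame(roll_belt = c(1.4, 1.5, 1.6, 1.7),
                  kurtosis_roll_belt = c("", "", "", "#DIV/0!"),
                  max_roll_belt = c(NA, NA, NA, -94.3),
                  classe = c("A", "A", "B", "B"))
sparseCols(toy, maxShare = 0.5)
## [1] "kurtosis_roll_belt" "max_roll_belt"</code></pre>
<p>Applied to trainingPml with the default threshold of 0.9, it should flag the per-window summary columns, which are empty or NA in about 98% of the rows (11540 of 11776); the identifier, timestamp and window columns still have to be removed separately.</p>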
<p>With the data cleaned up, we can start fitting models and comparing how they perform.</p>
</div>
<div id="model-training-1">
<h3>
<a id="model-training-1" class="anchor" href="#model-training-1" aria-hidden="true"><span class="octicon octicon-link"></span></a>Model Training 1</h3>
<p>Trees: we first fit a classification tree with caret, using method = "rpart", and examine the accuracy of its predictions.</p>
<pre><code> modfitRpart <- train(classe ~.,method="rpart",data=trainingPmlNonNACols)</code></pre>
<pre><code>## Loading required package: rpart</code></pre>
<pre><code> modfitRpart$finalModel</code></pre>
<pre><code>## n= 11776
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 11776 8428 A (0.28 0.19 0.17 0.16 0.18)
## 2) roll_belt< 130.5 10774 7436 A (0.31 0.21 0.19 0.18 0.11)
## 4) pitch_forearm< -34.55 919 2 A (1 0.0022 0 0 0) *
## 5) pitch_forearm>=-34.55 9855 7434 A (0.25 0.23 0.21 0.2 0.12)
## 10) magnet_dumbbell_y< 436.5 8314 5944 A (0.29 0.18 0.24 0.19 0.11)
## 20) roll_forearm< 122.5 5137 3022 A (0.41 0.18 0.18 0.17 0.061) *
## 21) roll_forearm>=122.5 3177 2124 C (0.08 0.18 0.33 0.23 0.18) *
## 11) magnet_dumbbell_y>=436.5 1541 743 B (0.033 0.52 0.039 0.23 0.18) *
## 3) roll_belt>=130.5 1002 10 E (0.01 0 0 0 0.99) *</code></pre>
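<p>Each line of the printed tree reads node), split, n, loss, yval, (yprob): the number of rows reaching the node, how many of them would be misclassified if the node predicted its majority class yval, and the class proportions. The root line can be reproduced from the class counts in the summary of trainingPml. rpart's default splitting criterion for classification is the Gini index, so the root's Gini impurity is included as well; each split is chosen to reduce it.</p>
<pre><code># Class counts in the 60% training split, from the summary of classe above
counts <- c(A = 3348, B = 2279, C = 2054, D = 1930, E = 2165)
n <- sum(counts)                 # 11776 rows reach the root node
loss <- n - max(counts)          # 8428 rows are not in the majority class A
yprob <- round(counts / n, 2)    # 0.28 0.19 0.17 0.16 0.18, as printed
gini <- 1 - sum((counts / n)^2)  # Gini impurity at the root, about 0.79</code></pre>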
<pre><code> library(rattle)</code></pre>
<pre><code>## Rattle: A free graphical interface for data mining with R.
## Version 3.4.1 Copyright (c) 2006-2014 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.</code></pre>
<pre><code> fancyRpartPlot(modfitRpart$finalModel)</code></pre>
<pre><code> modfitRpart</code></pre>
<pre><code>## CART
##
## 11776 samples
## 52 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
##
## Summary of sample sizes: 11776, 11776, 11776, 11776, 11776, 11776, ...
##
## Resampling results across tuning parameters:
##
## cp Accuracy Kappa Accuracy SD Kappa SD
## 0.03618889 0.5232124 0.38448962 0.02470143 0.03890642
## 0.06110584 0.4018777 0.18592797 0.06028161 0.09852178
## 0.11651637 0.3320531 0.07413223 0.04188912 0.06186040
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was cp = 0.03618889.</code></pre>
<p>The accuracy is poor. However, the tree does give some insight into the variables that matter most: it splits on roll_belt, pitch_forearm, magnet_dumbbell_y and roll_forearm, which seem to be more relevant for predicting classe. Note also that none of its leaves predicts class D.</p>
<p>With a resampled accuracy of only about 52%, the expected out-of-sample error rate of roughly 48% is unacceptable.</p>
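<p>The error estimate is one minus the resampled accuracy. A short check, with the accuracy of always predicting the most common class added as a reference point (this baseline is not part of the original analysis):</p>
<pre><code># Resampled accuracy of the selected tree (cp = 0.03618889), from the output above
accRpart <- 0.5232124
errRpart <- 1 - accRpart      # expected out-of-sample error, about 0.477

# Baseline: always predict the majority class A of the training split
baselineAcc <- 3348 / 11776   # about 0.284</code></pre>
<p>The tree beats that trivial baseline but still misclassifies nearly half of the cases.</p>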
</div>
<div id="model-training-2">
<h3>
<a id="model-training-2" class="anchor" href="#model-training-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Model Training 2</h3>
<p>The next model we try is a random forest. It may suit this data better, since random forests cope well with a large number of predictors: each tree is grown on a bootstrap sample of the rows and considers only a random subset of mtry predictors at each split, and the trees then vote on the class. Random forests also estimate variable importance, which gives a fairly clear idea of the predictors that drive the result.</p>
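<p>To make the two sources of randomness concrete, here is a minimal base R sketch of the draws behind a single tree. It is an illustration only, not how the randomForest package is implemented: it draws a bootstrap sample of the 11776 training rows and, for one split, a random subset of mtry = 2 of the 52 predictors. Rows never drawn are out of bag for that tree, which is what the out-of-bag (OOB) error estimate is built on.</p>
<pre><code>set.seed(1234)
n <- 11776     # rows in the training split
p <- 52        # sensor predictors
mtry <- 2      # predictors tried per split (the value caret selected)

# Each tree is grown on a bootstrap sample: n row indices drawn with replacement
bootRows <- sample.int(n, n, replace = TRUE)

# Share of rows never drawn, i.e. out of bag for this tree; about exp(-1) = 0.368
oobShare <- 1 - length(unique(bootRows)) / n

# At each split only a random subset of mtry predictors is considered
splitCandidates <- sample.int(p, mtry)</code></pre>
<p>Since roughly 63.2% of the rows land in each bootstrap sample, every row is out of bag for about a third of the trees.</p>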
<pre><code>modfitRforest <- train(classe ~.,method="rf",data=trainingPmlNonNACols)</code></pre>
<pre><code>## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.</code></pre>
<pre><code>print(modfitRforest)</code></pre>
<pre><code>## Random Forest
##
## 11776 samples
## 52 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
##
## Summary of sample sizes: 11776, 11776, 11776, 11776, 11776, 11776, ...
##
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa Accuracy SD Kappa SD
## 2 0.9870336 0.9835895 0.001655951 0.002103040
## 27 0.9868318 0.9833367 0.001919397 0.002427099
## 52 0.9787137 0.9730624 0.005580349 0.007067277
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.</code></pre>
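<p>The three mtry values in the resampling table (2, 27 and 52) are evenly spaced between 2 and the 52 predictors. A quick check, alongside randomForest's own default for classification, floor(sqrt(p)), which this grid does not include:</p>
<pre><code>p <- 52
# Evenly spaced grid from 2 to p, matching the three mtry values above
floor(seq(2, p, length.out = 3))
## [1]  2 27 52

# randomForest's default mtry for classification, for comparison
floor(sqrt(p))
## [1] 7</code></pre>
<p>mtry = 2 and mtry = 27 give almost the same accuracy; only mtry = 52, which considers every predictor at every split and so removes the random feature selection, is noticeably worse.</p>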
<pre><code>print(modfitRforest$finalModel)</code></pre>
<pre><code>##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 0.87%
## Confusion matrix:
## A B C D E class.error
## A 3343 2 0 2 1 0.001493429
## B 21 2252 6 0 0 0.011847301
## C 0 20 2028 6 0 0.012658228
## D 0 0 34 1894 2 0.018652850
## E 0 1 2 6 2156 0.004157044</code></pre>
<pre><code>varImp(modfitRforest)</code></pre>
<pre><code>## rf variable importance
##
## only 20 most important variables shown (out of 52)
##
## Overall
## roll_belt 100.00
## yaw_belt 83.40
## magnet_dumbbell_z 69.36
## magnet_dumbbell_y 66.25
## pitch_belt 61.15
## pitch_forearm 59.75
## magnet_dumbbell_x 55.72
## roll_forearm 51.97
## magnet_belt_z 48.42
## accel_belt_z 48.28
## roll_dumbbell 43.96
## accel_dumbbell_y 42.59
## magnet_belt_y 42.27
## accel_dumbbell_z 39.35
## roll_arm 35.14
## accel_forearm_x 31.29
## yaw_dumbbell 30.36
## total_accel_dumbbell 29.57
## accel_dumbbell_x 28.94
## magnet_forearm_z 28.08</code></pre>
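<p>The <code>Overall</code> column is a relative score, which is why the top variable shows exactly 100. caret appears to rescale the random forest's raw importance values so that the smallest becomes 0 and the largest 100. A minimal sketch of that min-max rescaling, using invented raw scores for a few predictors (the numbers are hypothetical, not taken from this model):</p>
<pre><code># Hypothetical raw importance scores (e.g. mean decrease in Gini);
# these values are made up for illustration only
raw_imp <- c(roll_belt = 820, yaw_belt = 690, magnet_dumbbell_z = 580,
             gyros_arm_z = 40, gyros_forearm_y = 25)

# Assumed caret-style scaling: shift so the minimum is 0, then stretch so the
# maximum is 100. Rankings and relative gaps survive; the original units do not.
scaled_imp <- (raw_imp - min(raw_imp)) / (max(raw_imp) - min(raw_imp)) * 100
round(sort(scaled_imp, decreasing = TRUE), 2)   # roll_belt is 100, gyros_forearm_y is 0</code></pre>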
</div>
<div id="description-of-the-sample-error-and-estimation-of-the-error-with-cross-validation">
<h3>
<a id="description-of-the-sample-error-and-estimation-of-the-error-with-cross-validation" class="anchor" href="#description-of-the-sample-error-and-estimation-of-the-error-with-cross-validation" aria-hidden="true"><span class="octicon octicon-link"></span></a>Description of the sample error and estimation of the error with cross validation</h3>
<p>The model fit is highly accurate: caret's bootstrap resampling (25 repetitions on the 60% training partition) estimates an accuracy of 98.7% for the selected model. The random forest's own out-of-bag (OOB) error rate is estimated at 0.87%. The final model uses 500 trees with an optimal <code>mtry</code> of 2.</p>
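<p>The OOB estimate works because randomForest grows each tree on a bootstrap sample of the training rows, drawn with replacement, and the rows left out of that sample act as a small test set for that tree. Each row's OOB prediction is a vote over only the trees that never saw it. A quick base-R sketch of how large that left-out share is for a single tree, using n = 11776 rows as in this training partition:</p>
<pre><code>set.seed(2015)
n <- 11776                                    # rows in the 60% training partition
in_bag <- sample.int(n, n, replace = TRUE)    # rows drawn for one tree
oob_share <- mean(!(seq_len(n) %in% in_bag))  # rows never drawn are out-of-bag
oob_share                                     # close to 0.368

# Analytical value: (1 - 1/n)^n, which tends to exp(-1) as n grows
(1 - 1/n)^n
exp(-1)</code></pre>
<p>Roughly 37% of the rows are out-of-bag for any given tree, so with 500 trees each row is scored by about 180 trees that did not train on it.</p>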
<p>We now test the model against the 40% of the original data that we reserved as test data.</p>
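<pre><code># Note: confusionMatrix() expects (predictions, reference); the arguments here are reversed, so the rows labelled "Prediction" below are the true classes
confusionMatrix(testingPmlNonNACols$classe,predict(modfitRforest,testingPmlNonNACols))</code></pre>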
<pre><code>## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 2231 1 0 0 0
## B 10 1506 2 0 0
## C 0 12 1352 4 0
## D 0 0 30 1255 1
## E 0 0 2 3 1437
##
## Overall Statistics
##
## Accuracy : 0.9917
## 95% CI : (0.9895, 0.9936)
## No Information Rate : 0.2856
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9895
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9955 0.9914 0.9755 0.9945 0.9993
## Specificity 0.9998 0.9981 0.9975 0.9953 0.9992
## Pos Pred Value 0.9996 0.9921 0.9883 0.9759 0.9965
## Neg Pred Value 0.9982 0.9979 0.9948 0.9989 0.9998
## Prevalence 0.2856 0.1936 0.1767 0.1608 0.1833
## Detection Rate 0.2843 0.1919 0.1723 0.1600 0.1832
## Detection Prevalence 0.2845 0.1935 0.1744 0.1639 0.1838
## Balanced Accuracy 0.9977 0.9948 0.9865 0.9949 0.9993</code></pre>
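<p>A note on reading this table: caret's <code>confusionMatrix(data, reference)</code> expects the predicted classes as its first argument and the true classes as its second. The call above passes them the other way round, so the rows labelled Prediction actually hold the true classes, and the per-class Sensitivity and Pos Pred Value rows (and likewise Specificity and Neg Pred Value) are swapped. Overall accuracy and Kappa are unaffected. The base-R sketch below recomputes both per-class measures from the printed counts:</p>
<pre><code># Hold-out counts as printed above: rows are the true classes,
# columns are the predictions
tab <- matrix(c(2231,    1,    0,    0,    0,
                  10, 1506,    2,    0,    0,
                   0,   12, 1352,    4,    0,
                   0,    0,   30, 1255,    1,
                   0,    0,    2,    3, 1437),
              nrow = 5, byrow = TRUE,
              dimnames = list(true = LETTERS[1:5], predicted = LETTERS[1:5]))

# Sensitivity (recall): correct predictions over all cases of the true class
sensitivity <- diag(tab) / rowSums(tab)
# Positive predictive value (precision): correct predictions over all cases
# predicted as that class
ppv <- diag(tab) / colSums(tab)

# The sensitivity row matches the printed "Pos Pred Value" row and vice versa
round(rbind(sensitivity, ppv), 4)

# Overall accuracy does not depend on the orientation
sum(diag(tab)) / sum(tab)</code></pre>
<p>In a live session, the conventional call would be <code>confusionMatrix(predict(modfitRforest, testingPmlNonNACols), testingPmlNonNACols$classe)</code>.</p>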
<p>The test shows an accuracy of 99.17%, with a 95% confidence interval of 98.95% to 99.36%. The estimated out-of-sample error rate is therefore 0.83%, which is consistent with (and slightly below) the 0.87% OOB error estimated from the training data set.</p>
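<p>These figures can be reproduced from the diagonal of the confusion matrix. caret appears to compute the accuracy interval with an exact binomial test, so a base-R sketch using <code>binom.test</code> on the 7,781 correct predictions out of 7,846 hold-out cases should recover the printed interval:</p>
<pre><code>correct <- 2231 + 1506 + 1352 + 1255 + 1437   # diagonal of the hold-out table
total   <- 7846                              # size of the 40% hold-out set

accuracy <- correct / total
ci <- binom.test(correct, total)$conf.int    # exact (Clopper-Pearson) 95% CI

round(100 * accuracy, 2)         # 99.17 (%)
round(100 * (1 - accuracy), 2)   # 0.83 (%), the estimated out-of-sample error
round(ci, 4)                     # compare with the printed 95% CI</code></pre>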
<p>Predictions for the Coursera test data</p>
<p>With the model complete, we run it against the 20-case test set provided by Coursera, after reducing that set to the same 52 predictor columns used for training.</p>
<pre><code>testPmlNonNACols <- testdata[c("roll_belt","pitch_belt","yaw_belt","total_accel_belt","gyros_belt_x","gyros_belt_y","gyros_belt_z","accel_belt_x","accel_belt_y","accel_belt_z","magnet_belt_x","magnet_belt_y","magnet_belt_z","roll_arm","pitch_arm","yaw_arm","total_accel_arm","gyros_arm_x","gyros_arm_y","gyros_arm_z","accel_arm_x","accel_arm_y","accel_arm_z","magnet_arm_x","magnet_arm_y","magnet_arm_z","roll_dumbbell","pitch_dumbbell","yaw_dumbbell","total_accel_dumbbell","gyros_dumbbell_x","gyros_dumbbell_y","gyros_dumbbell_z","accel_dumbbell_x","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","magnet_dumbbell_y","magnet_dumbbell_z","roll_forearm","pitch_forearm","yaw_forearm","total_accel_forearm","gyros_forearm_x","gyros_forearm_y","gyros_forearm_z","accel_forearm_x","accel_forearm_y","accel_forearm_z","magnet_forearm_x","magnet_forearm_y","magnet_forearm_z")]
courseraTestResults <- predict(modfitRforest, testPmlNonNACols)
courseraTestResults</code></pre>
<pre><code>## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E</code></pre>
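<p>Rather than retyping the 52 column names, the list could be derived from the training frame. A minimal sketch with two small hypothetical data frames standing in for <code>trainingPmlNonNACols</code> and <code>testdata</code>; in the real analysis the equivalent would be <code>setdiff(names(trainingPmlNonNACols), "classe")</code>:</p>
<pre><code># Toy stand-ins for the real data frames (hypothetical values)
training <- data.frame(roll_belt = c(1.41, 1.42), yaw_belt = c(-94.4, -94.4),
                       classe = factor(c("A", "A")))
testing  <- data.frame(X = 1:2, roll_belt = c(0.9, 1.1), yaw_belt = c(-88, -90),
                       problem_id = 1:2)

# Predictor names = every training column except the outcome
predictor_cols <- setdiff(names(training), "classe")

# Keep exactly those columns in the test set; drop = FALSE keeps a data frame
testing_sel <- testing[, predictor_cols, drop = FALSE]
names(testing_sel)   # "roll_belt" "yaw_belt"</code></pre>
<p>Deriving the list this way keeps the training and test columns in sync if the set of predictors ever changes.</p>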
</div>
<div id="final-results-for-the-random-forest-model">
<h3>
<a id="final-results-for-the-random-forest-model" class="anchor" href="#final-results-for-the-random-forest-model" aria-hidden="true"><span class="octicon octicon-link"></span></a>Final results for the random forest model</h3>
<p>With an estimated out-of-sample accuracy of about 99.2%, we expect all or nearly all of the 20 Coursera test cases to be classified correctly. In the actual submission the model scored 100%, predicting every Coursera test case correctly.</p>
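<p>How likely is a perfect score? Assuming the 20 Coursera cases behave like independent draws from the same population as the hold-out set, the number of correct predictions is binomial with success probability equal to the estimated accuracy. A base-R sketch:</p>
<pre><code>accuracy <- 0.9917   # estimated out-of-sample accuracy from the 40% hold-out
n_cases  <- 20       # number of Coursera test cases

# Assumes the 20 cases are independent draws like the hold-out data
expected_errors <- n_cases * (1 - accuracy)
p_all_correct   <- accuracy ^ n_cases   # same as dbinom(20, 20, accuracy)

round(expected_errors, 3)   # 0.166
round(p_all_correct, 3)     # 0.846</code></pre>
<p>Under this assumption a perfect 20 out of 20 is the most likely outcome, but at least one miss would still be expected about 15% of the time.</p>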
</div>
<div id="fine-tuning-the-random-forest">
<h3>
<a id="fine-tuning-the-random-forest" class="anchor" href="#fine-tuning-the-random-forest" aria-hidden="true"><span class="octicon octicon-link"></span></a>Fine tuning the random forest</h3>
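<p>Because each tree is grown on a bootstrap sample, the random forest's out-of-bag error already provides an internal estimate of the test error, so additional cross-validation is not strictly required. It is, however, possible to fine-tune the model with k-fold cross-validation instead of caret's default bootstrap resampling. This setup is more computationally intensive: on the available hardware, training on the full 60% partition did not finish within a reasonable amount of time (60 minutes). I therefore created a smaller sample, 30% of the original training data, to pass to the trainer.</p>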
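<p>One likely contributor to the run time is the <code>prox=TRUE</code> (<code>proximity = TRUE</code>) option in the training call below, which asks randomForest to build a proximity matrix with one entry per pair of training rows and to update it for every tree. A back-of-the-envelope sketch, assuming a dense n × n matrix of 8-byte doubles:</p>
<pre><code># Memory for one dense n x n matrix of doubles (8 bytes each), in gigabytes
proximity_gb <- function(n) n^2 * 8 / 1e9

proximity_gb(11776)   # about 1.11 GB for the full 60% partition
proximity_gb(5889)    # about 0.28 GB for the 30% partition</code></pre>
<p>Since caret refits the model once per fold and candidate <code>mtry</code> (30 fits here, plus the final fit), this work is repeated many times. The proximity matrix is not needed for prediction, so dropping the option would likely shorten the run considerably.</p>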
<pre><code>set.seed(1234)
# Use 30% of the original training data to keep the cross-validated run manageable
inTrainPml10 <- createDataPartition(y=traindata$classe,p=0.3,list=FALSE)
trainingPml10 <- traindata[inTrainPml10,]
testingPml10 <- traindata[-inTrainPml10,]
# The 52 predictor columns without missing values, defined once and reused
predictorCols <- c("roll_belt","pitch_belt","yaw_belt","total_accel_belt","gyros_belt_x","gyros_belt_y","gyros_belt_z","accel_belt_x","accel_belt_y","accel_belt_z","magnet_belt_x","magnet_belt_y","magnet_belt_z",
                   "roll_arm","pitch_arm","yaw_arm","total_accel_arm","gyros_arm_x","gyros_arm_y","gyros_arm_z","accel_arm_x","accel_arm_y","accel_arm_z","magnet_arm_x","magnet_arm_y","magnet_arm_z",
                   "roll_dumbbell","pitch_dumbbell","yaw_dumbbell","total_accel_dumbbell","gyros_dumbbell_x","gyros_dumbbell_y","gyros_dumbbell_z","accel_dumbbell_x","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","magnet_dumbbell_y","magnet_dumbbell_z",
                   "roll_forearm","pitch_forearm","yaw_forearm","total_accel_forearm","gyros_forearm_x","gyros_forearm_y","gyros_forearm_z","accel_forearm_x","accel_forearm_y","accel_forearm_z","magnet_forearm_x","magnet_forearm_y","magnet_forearm_z")
trainingPmlNonNACols10 <- trainingPml10[c(predictorCols,"classe")]
testingPmlNonNACols10 <- testingPml10[c(predictorCols,"classe")]
testPmlNonNACols <- testdata[predictorCols]
# 10-fold CV; trainControl's repeats argument only applies to method="repeatedcv", so it is omitted
# prox=TRUE also makes randomForest compute a proximity matrix, which is costly
modfitRforest10 <- train(classe ~.,method="rf",data=trainingPmlNonNACols10,
                         trControl=trainControl(method="cv",number=10, verboseIter = TRUE), prox=TRUE)</code></pre>
<pre><code>## + Fold01: mtry= 2
## - Fold01: mtry= 2
## + Fold01: mtry=27
## - Fold01: mtry=27
## + Fold01: mtry=52
## - Fold01: mtry=52
## + Fold02: mtry= 2
## - Fold02: mtry= 2
## + Fold02: mtry=27
## - Fold02: mtry=27
## + Fold02: mtry=52
## - Fold02: mtry=52
## + Fold03: mtry= 2
## - Fold03: mtry= 2
## + Fold03: mtry=27
## - Fold03: mtry=27
## + Fold03: mtry=52
## - Fold03: mtry=52
## + Fold04: mtry= 2
## - Fold04: mtry= 2
## + Fold04: mtry=27
## - Fold04: mtry=27
## + Fold04: mtry=52
## - Fold04: mtry=52
## + Fold05: mtry= 2
## - Fold05: mtry= 2
## + Fold05: mtry=27
## - Fold05: mtry=27
## + Fold05: mtry=52
## - Fold05: mtry=52
## + Fold06: mtry= 2
## - Fold06: mtry= 2
## + Fold06: mtry=27
## - Fold06: mtry=27
## + Fold06: mtry=52
## - Fold06: mtry=52
## + Fold07: mtry= 2
## - Fold07: mtry= 2
## + Fold07: mtry=27
## - Fold07: mtry=27
## + Fold07: mtry=52
## - Fold07: mtry=52
## + Fold08: mtry= 2
## - Fold08: mtry= 2
## + Fold08: mtry=27
## - Fold08: mtry=27
## + Fold08: mtry=52
## - Fold08: mtry=52
## + Fold09: mtry= 2
## - Fold09: mtry= 2
## + Fold09: mtry=27
## - Fold09: mtry=27
## + Fold09: mtry=52
## - Fold09: mtry=52
## + Fold10: mtry= 2
## - Fold10: mtry= 2
## + Fold10: mtry=27
## - Fold10: mtry=27
## + Fold10: mtry=52
## - Fold10: mtry=52
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 27 on full training set</code></pre>
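<p>Each <code>+</code>/<code>-</code> FoldNN pair in the log above marks the start and end of one fit on roughly nine tenths of the 5,889 rows, evaluated on the remaining tenth, repeated for every candidate <code>mtry</code>. A minimal base-R sketch of how such folds can be built, stratified by class as caret's fold creation (<code>createFolds</code>) does; the class labels here are simulated, not the real data:</p>
<pre><code>set.seed(1234)
# Simulated class labels with the same row count as the 30% partition
y <- factor(sample(LETTERS[1:5], 5889, replace = TRUE,
                   prob = c(0.28, 0.19, 0.17, 0.16, 0.20)))
k <- 10

fold <- integer(length(y))
for (cls in levels(y)) {
  idx <- which(y == cls)
  # Deal this class's rows out to folds 1..k in random order, so every fold
  # receives roughly the same share of each class
  fold[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

fold_sizes  <- tabulate(fold, nbins = k)   # rows held out in each fold
train_sizes <- length(y) - fold_sizes      # rows used to fit each fold's model
train_sizes</code></pre>
<p>The training sizes come out close to the 5,298 to 5,301 rows that caret reports for its folds.</p>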
<pre><code>print(modfitRforest10)</code></pre>
<pre><code>## Random Forest
##
## 5889 samples
## 52 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
##
## Summary of sample sizes: 5301, 5301, 5300, 5301, 5298, 5300, ...
##
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa Accuracy SD Kappa SD
## 2 0.9779269 0.9720659 0.008125489 0.010289454
## 27 0.9787749 0.9731428 0.006700520 0.008476131
## 52 0.9706244 0.9628343 0.008157182 0.010318391
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 27.</code></pre>
<pre><code>print(modfitRforest10$finalModel)</code></pre>
<pre><code>##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry, proximity = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 27
##
## OOB estimate of error rate: 1.83%
## Confusion matrix:
## A B C D E class.error
## A 1668 4 0 1 1 0.003584229
## B 21 1106 9 3 1 0.029824561
## C 0 15 1002 10 0 0.024342746
## D 0 1 21 940 3 0.025906736
## E 0 6 5 7 1065 0.016620499</code></pre>
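<pre><code># Note: confusionMatrix() expects (predictions, reference); the arguments here are reversed, so the rows labelled "Prediction" below are the true classes
confusionMatrix(testingPmlNonNACols10$classe,predict(modfitRforest10,testingPmlNonNACols10))</code></pre>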
<pre><code>## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 3888 14 1 2 1
## B 44 2563 48 1 1
## C 0 39 2340 16 0
## D 0 2 64 2184 1
## E 0 8 13 24 2479
##
## Overall Statistics
##
## Accuracy : 0.9797
## 95% CI : (0.9772, 0.982)
## No Information Rate : 0.2863
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9743
## Mcnemar's Test P-Value : 7.765e-15
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9888 0.9760 0.9489 0.9807 0.9988
## Specificity 0.9982 0.9915 0.9951 0.9942 0.9960
## Pos Pred Value 0.9954 0.9646 0.9770 0.9702 0.9822
## Neg Pred Value 0.9955 0.9943 0.9889 0.9963 0.9997
## Prevalence 0.2863 0.1912 0.1796 0.1622 0.1807
## Detection Rate 0.2831 0.1866 0.1704 0.1590 0.1805
## Detection Prevalence 0.2844 0.1935 0.1744 0.1639 0.1838
## Balanced Accuracy 0.9935 0.9838 0.9720 0.9874 0.9974</code></pre>
<pre><code>courseraTestResults <- predict(modfitRforest10, testPmlNonNACols)
courseraTestResults</code></pre>
<pre><code>## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E</code></pre>
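<p>The two models' Coursera predictions can be compared directly. In a live session the check would simply be <code>identical(predict(modfitRforest, testPmlNonNACols), predict(modfitRforest10, testPmlNonNACols))</code>; the self-contained sketch below rebuilds both vectors from the printed outputs instead:</p>
<pre><code>levs <- LETTERS[1:5]
# Predictions copied from the two printed outputs
pred_rf60 <- factor(strsplit("B A B A A E D B A A B C B A E E A B B B", " ")[[1]], levels = levs)
pred_rf30 <- factor(strsplit("B A B A A E D B A A B C B A E E A B B B", " ")[[1]], levels = levs)

identical(pred_rf60, pred_rf30)   # TRUE: every case gets the same class
mean(pred_rf60 == pred_rf30)      # 1, i.e. 100% agreement
table(pred_rf60)                  # class counts among the 20 test cases</code></pre>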
</div>
<div id="final-results-for-the-tweaked-model">
<h3>
<a id="final-results-for-the-tweaked-model" class="anchor" href="#final-results-for-the-tweaked-model" aria-hidden="true"><span class="octicon octicon-link"></span></a>Final results for the tweaked model</h3>
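<p>With 30% of the original training data, the k-fold cross-validation completes in a reasonable amount of time. The accuracy on the remaining 70% of the data is 97.97% (95% CI 97.72% to 98.20%), and the OOB error estimate is 1.83%. The drop from the first model is most likely due to training on half as much data. Even so, the model still predicts all 20 Coursera test cases correctly, giving exactly the same predictions as the first model.</p>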
<p>With more processing power, more of the dataset could be used for k-fold cross-validation, which may yield higher accuracy. For this problem, though, the first model, tuned with caret's default bootstrap resampling and checked against the random forest's built-in OOB estimate, already does a fine job.</p>
</div>
</div>
</div>
<footer class="site-footer">
<span class="site-footer-owner"><a href="https://github.com/JagPadala/pml">Pml</a> is maintained by <a href="https://github.com/JagPadala">JagPadala</a>.</span>
<span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a> using the <a href="https://github.com/jasonlong/cayman-theme">Cayman theme</a> by <a href="https://twitter.com/jasonlong">Jason Long</a>.</span>
</footer>
</section>
</body>
</html>