---
title: "Functions frequently used"
output:
  html_document:
    toc: true
    toc_float: true
---
This is an R Markdown document explaining all the functions used by the Language and Learning Lab.
The functions are stored in the [tools](https://github.com/n400peanuts/languagelearninglab/tree/master/tools) folder.
## addBf_ranges
This function takes as its input a dataframe which is the output of the Bf_set function. This will have a set of betas, standard errors and sd-theory values. The function adds an additional column to this table which shows the ranges of sdtheory values over which the BF meets each criterion: substantial evidence for the null (BF <= 1/3); ambiguous (1/3 < BF < 3); substantial evidence for H1 (BF >= 3).
```{r}
addBf_ranges <-function(Bf_df, sdtheoryrange)
{
BFranges = vector()
for (b in 1:dim(Bf_df)[1]){
range = Bf_range(sd=as.numeric(as.character(Bf_df$SE[b])), obtained=as.numeric(as.character(Bf_df$'Mean difference'[b])), meanoftheory=0, sdtheoryrange=sdtheoryrange)
from_table = vector()
to_table = vector()
cat = vector()
category_table = vector()
for(i in 1:dim(range)[1]) { # go through each value in the range and categorize it
#categorize current BF
if (range[i,2] <= (1/3)) {
cat[i] = "H0" ## below or equal to 1/3
} else if (range[i,2] < 3) { ## NOT below or equal to 1/3, IS below 3
cat[i] = "ambig"
} else { ## NOT below or equal to 1/3, NOT below 3
cat[i] = "substH1"
}
# adjust the table
j = length(category_table)
if (i==1){ # first one
category_table[j+1] = cat[i]
from_table[j+1] = range[i,1]
} else if (cat[i] != cat[i-1]) { # NOT the first one, IS one where need to start new range
to_table[j] = range[i-1,1]
category_table[j+1] = cat[i]
from_table[j+1] = range[i,1]
}
if (i==dim(range)[1]){ # if its the last one, finish off the table
to_table[j] = range[i,1]
}
}
# go through the little table and turn it into a string of ranges
string = ""
for(i in 1: length(category_table)){
string = paste(string, category_table[i],":", round(from_table[i],3),"-", round(to_table[i],3))
}
BFranges[b] = string
}
out = cbind(Bf_df, BFranges)
return(out)
}
```
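A minimal usage sketch (the numbers are illustrative, not from real data): build a BF table with the Bf_set function below, then annotate it with the ranges.
```{r, eval=FALSE}
# Illustrative values only: one contrast with mean difference 0.4, SE 0.2, H1 SD 0.5
bf_table <- Bf_set(names_list = c("trained vs untrained"),
                   meandiff_list = c(0.4), sd_list = c(0.2),
                   h1_list = c(0.5), tail_list = c(1))
addBf_ranges(bf_table, sdtheoryrange = seq(0, 2, 0.1))
```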
## addBf_ranges3
Author: Elizabeth Wonnacott
Similar to addBf_ranges2, except that it works out the ranges by stepping down/up from the given sdtheory value in increments given by stepsizes, up to the maximum given by maxs. stepsizes/maxs can either be a single value used across all the coefficients or a vector of values (one per row; the length must match the number of rows).
```{r}
addBf_ranges3 = function(Bf_df, stepsizes, maxs, method = "old") {
if(method=="old"){
RRminV = vector()
RRmaxV = vector()
for (b in 1:dim(Bf_df)[1]){
if (length(stepsizes)==1){stepsize = stepsizes} else {stepsize = stepsizes[b]}
if (length(maxs)==1){max = maxs} else {max = maxs[b]}
Ndp = num.decimals(stepsize)
BF = as.numeric(as.character(Bf_df$Bf[b]))
BType= BfClassify(BF)
sdtheory = as.numeric(as.character(Bf_df$sdtheory[b]))
sd = as.numeric(as.character(Bf_df$std.Error[b]))
obtained = as.numeric(as.character(Bf_df$estimate[b]))
tail = as.numeric(as.character(Bf_df$BFtail[b]))
# get the max RRmax
RRmax = ""
lastsdtheory = sdtheory
newsdtheory = 0 # start at 0 so the loop body runs at least once
while(newsdtheory <= max){
newsdtheory = lastsdtheory + stepsize
newBF = Bf(sd=sd, obtained=obtained, uniform = NULL, meanoftheory=0,sdtheory=newsdtheory, tail=tail)$BayesFactor
if (BfClassify(newBF)!= BType) {
RRmax = format(round(lastsdtheory, Ndp))
break }
lastsdtheory = newsdtheory
}
if(RRmax == "" & BType == "h0") {RRmax = "inf"}
if(RRmax == "" & BType == "h1") {RRmax = paste(">", max, sep="")}
if(RRmax == "" & BType == "ambig") {RRmax = paste(">", max, sep="")}
# get the min RRmin
RRmin = ""
lastsdtheory = sdtheory
newsdtheory = lastsdtheory - stepsize
while(newsdtheory > 0){
newBF = Bf(sd=sd, obtained=obtained, uniform = NULL, meanoftheory=0,sdtheory=newsdtheory, tail=tail)$BayesFactor
if (BfClassify(newBF)!= BType) {
RRmin = format(round(lastsdtheory, Ndp))
break }
lastsdtheory = newsdtheory
newsdtheory = lastsdtheory-stepsize
}
if(RRmin== "" & BType == "ambig") {RRmin = 0}
if(RRmin== "" & BType == "h1") {RRmin = paste(" 0< & <", format(round(lastsdtheory, Ndp), nsmall=Ndp), sep="")}
if(RRmin== "" & BType == "h0") {RRmin = paste(" 0< & <", format(round(lastsdtheory, Ndp), nsmall=Ndp), sep="")}
RRminV[b] = RRmin
RRmaxV[b] = RRmax
}
out = cbind(Bf_df, RRminV, RRmaxV)
colnames(out)[10:11]= c("RRmin","RRmax")
return(out)
}
if(method=="new"){
RRminV = vector()
RRmaxV = vector()
for (b in 1:dim(Bf_df)[1]){
if (length(stepsizes)==1){stepsize = stepsizes} else {stepsize = stepsizes[b]}
if (length(maxs)==1){max = maxs} else {max = maxs[b]}
Ndp = num.decimals(stepsize)
BF = as.numeric(as.character(Bf_df$Bf[b]))
BType= BfClassify(BF)
sdtheory = as.numeric(as.character(Bf_df$sdtheory[b]))
sd = as.numeric(as.character(Bf_df$std.Error[b]))
obtained = as.numeric(as.character(Bf_df$estimate[b]))
tail = as.numeric(as.character(Bf_df$BFtail[b]))
# get the max RRmax
RRmax = ""
lastsdtheory = sdtheory
newsdtheory = 0 # start at 0 so the loop body runs at least once
while(newsdtheory <= max){
newsdtheory = lastsdtheory + stepsize
if(colnames(Bf_df)[4]=="t"){
dfdata=as.numeric(as.character(Bf_df$df[b]))
newBF = Bf(sd=sd, obtained=obtained, dfdata=dfdata, likelihood = c("t"), modeloftheory= c("normal"),modeoftheory=0,scaleoftheory=newsdtheory, tail=tail, method="new")
}
if(colnames(Bf_df)[4]=="z"){
dfdata=as.numeric(as.character(Bf_df$df[b]))
newBF = Bf(sd=sd, obtained=obtained, likelihood = c("normal"), modeloftheory= c("normal"),modeoftheory=0,scaleoftheory=newsdtheory, tail=tail, method="new")
}
if (BfClassify(newBF)!= BType) {
RRmax = format(round(lastsdtheory, Ndp))
break }
lastsdtheory = newsdtheory
}
if(RRmax == "" & BType == "h0") {RRmax = "inf"}
if(RRmax == "" & BType == "h1") {RRmax = paste(">", max, sep="")}
if(RRmax == "" & BType == "ambig") {RRmax = paste(">", max, sep="")}
# get the min RRmin
RRmin = ""
lastsdtheory = sdtheory
newsdtheory = lastsdtheory - stepsize
while(newsdtheory > 0){
if(colnames(Bf_df)[4]=="t"){
dfdata=as.numeric(as.character(Bf_df$df[b]))
newBF = Bf(sd=sd, obtained=obtained, dfdata=dfdata, likelihood = c("t"), modeloftheory= c("normal"),modeoftheory=0,scaleoftheory=newsdtheory, tail=tail, method="new")
}
if(colnames(Bf_df)[4]=="z"){
dfdata=as.numeric(as.character(Bf_df$df[b]))
newBF = Bf(sd=sd, obtained=obtained, likelihood = c("normal"), modeloftheory= c("normal"),modeoftheory=0,scaleoftheory=newsdtheory, tail=tail, method="new")
}
if (is.nan(newBF)) {
RRmin = format(round(lastsdtheory, Ndp))
break }
if (BfClassify(newBF)!= BType) {
RRmin = format(round(lastsdtheory, Ndp))
break }
lastsdtheory = newsdtheory
newsdtheory = lastsdtheory-stepsize
}
if(RRmin== "" & BType == "ambig") {RRmin = 0}
if(RRmin== "" & BType == "h1") {RRmin = paste(" 0< & <", format(round(lastsdtheory, Ndp), nsmall=Ndp), sep="")}
if(RRmin== "" & BType == "h0") {RRmin = paste(" 0< & <", format(round(lastsdtheory, Ndp), nsmall=Ndp), sep="")}
RRminV[b] = RRmin
RRmaxV[b] = RRmax
}
out = cbind(Bf_df, RRminV, RRmaxV)
colnames(out)[10:11]= c("RRmin", "RRmax")
return(out)
}
}
```
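A hedged sketch of how this might be called (here `bf_table` stands for the output of the Bf_model function below, which has the columns — Bf, sdtheory, std.Error, estimate, BFtail — that addBf_ranges3 reads):
```{r, eval=FALSE}
# bf_table: output of Bf_model; step in units of 0.01 up to an H1 SD of 2
addBf_ranges3(bf_table, stepsizes = 0.01, maxs = 2)
```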
## add_p_tails_match_Bf
This function takes a table output by the Bfs_ranges function (or the BF_model function) and, for any rows where the tail is set to 1, adjusts the p-values to be one-tailed, so that the tails for BFs and p-values are consistent.
```{r}
add_p_tails_match_Bf <- function(Bfs.table) {
Bfs.table$p_tails_match_Bf = Bfs.table$p
Bfs.table$p_tails_match_Bf[Bfs.table$BFtail == 1 &
Bfs.table$z > 0] = Bfs.table$p[Bfs.table$BFtail == 1 &
Bfs.table$z > 0] / 2
Bfs.table$p_tails_match_Bf[Bfs.table$BFtail == 1 &
Bfs.table$z < 0] = 1 - Bfs.table$p[Bfs.table$BFtail == 1 &
Bfs.table$z < 0] / 2
Bfs.table$p_tails_match_Bf[Bfs.table$BFtail == 1 &
Bfs.table$t > 0] = Bfs.table$p[Bfs.table$BFtail == 1 &
Bfs.table$t > 0] / 2
Bfs.table$p_tails_match_Bf[Bfs.table$BFtail == 1 &
Bfs.table$t < 0] = 1 - Bfs.table$p[Bfs.table$BFtail == 1 &
Bfs.table$t < 0] / 2
Bfs.table$p_tails_match_Bf = round(Bfs.table$p_tails_match_Bf, 3)
Bfs.table$p = Bfs.table$p_tails_match_Bf
Bfs.table$p_tails_match_Bf = NULL
return(Bfs.table)
}
```
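A small worked example with toy values: for one-tailed rows (BFtail = 1), a positive z halves the p value and a negative z converts it to 1 - p/2; rows with BFtail = 2 are left unchanged.
```{r}
toy <- data.frame(p = c(0.05, 0.05, 0.05), BFtail = c(1, 1, 2), z = c(2, -2, 2))
add_p_tails_match_Bf(toy)$p  # 0.025 0.975 0.050
```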
## adjust_intercept_model
This function adjusts the intercept of a model so that it is tested against a chance level other than zero: chance is subtracted from the intercept estimate, and the z and p values are recomputed accordingly.
```{r}
adjust_intercept_model<- function(model, chance, intercept_list = c("(Intercept)"))
{
summary = summary(model)$coefficients
for (i in 1:length(intercept_list)){
summary[intercept_list[i], "Estimate"] = summary[intercept_list[i], "Estimate"]- chance
summary[intercept_list[i], "z value"] = summary[intercept_list[i], "Estimate"]/summary[intercept_list[i], "Std. Error"]
summary[intercept_list[i], "Pr(>|z|)"] = 2*pnorm(-abs(summary[intercept_list[i], "z value"])) }
return(summary)
}
```
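A hedged sketch of how this might be used (the model, data and variable names `m`, `d`, `correct`, `condition`, `subject` are hypothetical): for a binomial model of a 4-alternative task, chance on the logit scale is qlogis(0.25).
```{r, eval=FALSE}
# Hypothetical glmer fit; names are illustrative
m <- lme4::glmer(correct ~ condition + (1 | subject), data = d, family = binomial)
adjust_intercept_model(m, chance = qlogis(0.25))  # test intercept against 25% chance
```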
## BayesFactor *updated*
Author of the new version: [Bence Palfi](http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bence%20Bayes%20factor%20calculator.html).
Author of the old version: [Baguley and Kaye (2010)](http://www.academia.edu/427288/Review_of_Understanding_psychology_as_a_science_An_introduction_to_scientific_and_statistical_inference).
This function computes the Bayes Factor and is equivalent to the [Dienes (2008) calculator](http://www.lifesci.sussex.ac.uk/home/Zoltan_Dienes/inference/Bayes.htm).
Integration of the new and old versions was made by Catriona Silvey on 16.04.2021.
The Bf function has recently been updated. The new version has options for a likelihood that is either normal- or t-distributed, and a model of H1 that is either uniform, normal, t- or Cauchy-distributed, with the normal/t/Cauchy models being 1- or 2-tailed. In addition, the 1-tailed models are compatible with any mode (unlike the Dienes, 2008, calculator, which assumed that 1-tailed models had a mode of zero).
The Bf function incorporates both the old and new methods (Baguley & Kaye, 2010, and Bence Palfi's more recent version). By default it uses the B&K version (for backward compatibility); to use the Palfi version, specify method = "new".
```{r}
Bf<- function(sd, obtained, dfdata = 1, likelihood = c("normal", "t"),
modeloftheory = c("normal", "t", "cauchy", "uniform"), uniform,
lower=0, upper=1, meanoftheory=0, sdtheory=1, modeoftheory =0, scaleoftheory = 1,
dftheory = 1, tail=1, method = "old"){
if (method == "old") {
area <- 0
if(identical(uniform, 1)){
theta <- lower
range <- upper - lower
incr <- range / 2000
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- 1 / range
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}else
{theta <- meanoftheory - 5 * sdtheory
incr <- sdtheory / 200
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if(identical(tail, 1)){
if (theta <= 0){
dist_theta <- 0
} else {
dist_theta <- dist_theta * 2
}
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- area + height * incr
}
}
LikelihoodTheory <- area
Likelihoodnull <- dnorm(obtained, 0, sd)
BayesFactor <- LikelihoodTheory / Likelihoodnull
ret <- list("LikelihoodTheory" = LikelihoodTheory,"Likelihoodnull" = Likelihoodnull, "BayesFactor" = BayesFactor)
ret
} else if (method == "new") {
if(likelihood=="normal"){
dfdata=10^10
}
if(modeloftheory=="normal"){
dftheory = 10^10
} else if(modeloftheory=="cauchy"){
dftheory = 1
}
area <- 0
normarea <- 0
if(modeloftheory=="uniform"){
theta <- lower
range <- upper - lower
incr <- range / 2000
for (A in -1000:1000){
theta <- theta + incr
dist_theta <- 1 / range
height <- dist_theta * dt((obtained-theta)/sd, df=dfdata)
area <- area + height * incr
}
LikelihoodTheory <- area
}else{
theta <- modeoftheory - 10 * scaleoftheory
incr <- scaleoftheory/200
for (A in -2000:2000){
theta <- theta + incr
dist_theta <- dt((theta-modeoftheory)/scaleoftheory, df=dftheory)
if(identical(tail, 1)){
if (theta <= modeoftheory){
dist_theta <- 0
} else {
dist_theta <- dist_theta * 2
}
}
height <- dist_theta * dt((obtained-theta)/sd, df = dfdata)
area <- area + height * incr
normarea <- normarea + dist_theta*incr
}
LikelihoodTheory <- area/normarea
}
Likelihoodnull <- dt(obtained/sd, df = dfdata)
BayesFactor <- LikelihoodTheory/Likelihoodnull
BayesFactor
}
}
```
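A minimal sketch with made-up numbers: a mean difference of 0.6 with SE 0.3, against a half-normal H1 with SD 0.5, using the default (old) method.
```{r, eval=FALSE}
Bf(sd = 0.3, obtained = 0.6, uniform = 0, meanoftheory = 0,
   sdtheory = 0.5, tail = 1)$BayesFactor
```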
## BfClassify
Author: Elizabeth Wonnacott
Classifies a Bayes factor as substantial evidence for h1/h0 or as ambiguous. It is used by addBf_ranges3 and by the Bf_addranges function below.
```{r}
BfClassify<-function(B)
{
if(B<=(1/3)){ "h0"}
else if (B>=3) {"h1"}
else {"ambig"}
}
```
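For example:
```{r}
BfClassify(10)   # "h1"
BfClassify(0.2)  # "h0"
BfClassify(1.5)  # "ambig"
```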
## Bf_model
Author: Elizabeth Wonnacott
This function calls the Bf function described above. coeff_summary should be the coefficients summary from an lmer/glmer model (obtained via summary(model)$coefficients). If it is an lmer model, the function expects columns "df" and "p" (i.e. the summary was produced with the lmerTest package, which applies the Kenward-Roger approximation for denominator degrees of freedom and computes p values).
For each coefficient, the function uses the estimate as the obtained mean and the Std. Error as the measure of standard error.
The function allows for positive and negative h1 values. If negative, since the Bf function requires a positive value for sdtheory, both sdtheory and obtained are multiplied by -1.
It doesn't round unless the argument "digits" is given, which says how many decimal places to round to; in this case the round_df function below is called.
```{r}
Bf_model = function(coeff_summary, coeff_list, h1_list,
h1_motivation, tail_list, digits = "", method = "old") {
if(method == "old") {
Bfs = vector('double')
estimates = vector('double')
sterrors = vector('double')
sdtheorys = vector('double')
pvalues = vector('double')
tzvalues = vector('double')
for (i in 1:length(coeff_list)) {
sd_error = coeff_summary[coeff_list[i], "Std. Error"]
obtained = coeff_summary[coeff_list[i], "Estimate"]
stdtheory = h1_list[i]
if(h1_list[i] < 0) {
stdtheory = h1_list[i] * -1
obtained = (coeff_summary[coeff_list[i], "Estimate"] * -1)
}
Bfs[i] = Bf(sd_error, obtained, uniform = 0, meanoftheory = 0,
sdtheory = stdtheory, tail = tail_list[i])$BayesFactor
estimates[i] = obtained
sdtheorys[i] = stdtheory
sterrors[i] = sd_error
if(colnames(coeff_summary)[3] == "z value") {
tzlabel = "z"
tzvalues[i] = abs(coeff_summary[coeff_list[i], 3]) * sign(obtained)
pvalues[i] = coeff_summary[coeff_list[i], 4]
} else if(colnames(coeff_summary)[3] == "df") {
tzlabel = "t"
tzvalues[i] = abs(coeff_summary[coeff_list[i], 4]) * sign(obtained)
pvalues[i] = coeff_summary[coeff_list[i], 5]
}
}
df = data.frame(cbind(coeff_list, estimates, sterrors, tzvalues,
pvalues, sdtheorys, tail_list, Bfs,h1_motivation ))
colnames(df) = c("coefficient", "estimate", "std.Error", tzlabel, "p",
"sdtheory", "BFtail", "Bf", "h1 motivation")
df$estimate = as.numeric(as.character(df$estimate))
df$std.Error = as.numeric(as.character(df$std.Error))
df$sdtheory = as.numeric(as.character(df$sdtheory))
df$Bf = as.numeric(as.character(df$Bf))
df$p = as.numeric(as.character(df$p))
if (tzlabel == "z") {df$z = as.numeric(as.character(df$z))}
if (tzlabel == "t") {df$t = as.numeric(as.character(df$t))}
if (digits != "") {df = round_df(df, digits)}
return(df)
}
if(method == "new") {
Bfs = vector('double')
estimates = vector('double')
sterrors = vector('double')
sdtheorys = vector('double')
pvalues = vector('double')
tzvalues = vector('double')
dfvalues = vector('double')
if(colnames(coeff_summary)[3] == "df"){tzlabel = "t"}
if(colnames(coeff_summary)[3] == "z value"){tzlabel = "z"}
for (i in 1:length(coeff_list)) {
sd = coeff_summary[coeff_list[i], "Std. Error"]
obtained = coeff_summary[coeff_list[i], "Estimate"]
stdtheory = h1_list[i]
if(h1_list[i] < 0) {
stdtheory = h1_list[i] * -1
obtained = (coeff_summary[coeff_list[i], "Estimate"] * -1)}
if(tzlabel == "t") {
dfdata = coeff_summary[coeff_list[i], "df"]
Bfs[i] = Bf(sd, obtained, dfdata, likelihood = c("t"),
modeloftheory = c("normal"), modeoftheory = 0,
scaleoftheory = stdtheory, tail = tail_list[i],
method = "new")
}
if(tzlabel == "z") {
Bfs[i] = Bf(sd, obtained, likelihood = c("normal"),
modeloftheory = c("normal"), modeoftheory = 0,
scaleoftheory = stdtheory, tail = tail_list[i],
method = "new")
}
estimates[i] = obtained
sdtheorys[i] = stdtheory
sterrors[i] = sd
if(tzlabel == "z") {
tzvalues[i] = abs(coeff_summary[coeff_list[i], 3]) * sign(obtained)
pvalues[i] = coeff_summary[coeff_list[i], 4]
} else if(tzlabel == "t") {
dfvalues[i] = dfdata
tzvalues[i] = abs(coeff_summary[coeff_list[i], 4]) * sign(obtained)
pvalues[i] = coeff_summary[coeff_list[i], 5]
}
}
if(tzlabel == "z") {
df = data.frame(cbind(coeff_list, estimates, sterrors, tzvalues, pvalues, sdtheorys, tail_list, Bfs, h1_motivation))
colnames(df) = c("coefficient", "estimate", "std.Error", tzlabel, "p", "sdtheory", "BFtail", "Bf", "h1 motivation")
}
if(tzlabel == "t") {df = data.frame(cbind(coeff_list, estimates, sterrors, tzvalues, dfvalues, pvalues, sdtheorys, tail_list, Bfs, h1_motivation))
colnames(df) = c("coefficient", "estimate", "std.Error", tzlabel, "df", "p", "sdtheory", "BFtail", "Bf", "h1 motivation")
}
df$estimate = as.numeric(as.character(df$estimate))
df$std.Error = as.numeric(as.character(df$std.Error))
df$sdtheory = as.numeric(as.character(df$sdtheory))
df$Bf = as.numeric(as.character(df$Bf))
df$p = as.numeric(as.character(df$p))
if (tzlabel == "z") {df$z = as.numeric(as.character(df$z))}
if (tzlabel == "t") {df$t = as.numeric(as.character(df$t))}
if (digits != "") {df = round_df(df, digits)}
return(df)
}
}
```
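A hedged usage sketch (the model, data and H1 values — `m`, `d`, `rt`, `c_cond`, 0.3 — are hypothetical; with an lmerTest fit the coefficient summary has the df and p columns the function expects):
```{r, eval=FALSE}
m <- lmerTest::lmer(rt ~ c_cond + (1 | subject), data = d)  # hypothetical model
Bf_model(summary(m)$coefficients,
         coeff_list = c("c_cond"),
         h1_list = c(0.3),                        # illustrative H1 estimate
         h1_motivation = c("pilot effect size"),
         tail_list = c(1), digits = 3)
```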
## Bf_powercalc
Author: Elizabeth Wonnacott.
This works with the Bf function above. It requires the same values as that function (i.e. the obtained mean and SE for the current sample, a value for the predicted mean, which is set to be sdtheory (with meanoftheory = 0), and the current number of participants N). However, rather than returning the BF for the current sample, it works out what the BF would be for a range of different subject numbers (assuming that the SE scales with sqrt(N)).
```{r Bf_powercalc, message=FALSE, warning=FALSE}
Bf_powercalc<-function(sd, obtained, uniform, lower=0, upper=1, meanoftheory=0, sdtheory=1, tail=1, N, min, max)
{
x = c(0)
y = c(0)
# note: working out what the difference between N and df is (for the contrast between two groups, this is 2; for constraints where there is 4 groups this will be 3, etc.)
for(newN in min : max)
{
B = as.numeric(Bf(sd = sd*sqrt(N/newN), obtained, uniform, lower, upper, meanoftheory, sdtheory, tail)[3])
x= append(x,newN)
y= append(y,B)
output = cbind(x,y)
}
output = output[-1,]
return(output)
}
```
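An illustrative sketch (numbers made up): with a current sample of N = 30, SE = 0.3 and mean difference 0.6, project the BF for samples of 20 to 100.
```{r, eval=FALSE}
power_table <- Bf_powercalc(sd = 0.3, obtained = 0.6, uniform = 0, meanoftheory = 0,
                            sdtheory = 0.5, tail = 1, N = 30, min = 20, max = 100)
plot(power_table, type = "l", xlab = "N", ylab = "BF")
```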
## Bf_range
Author: Elizabeth Wonnacott
Updated by: Catriona Silvey on 16.04.2021
This works with the Bf function above. It requires the obtained mean and SE for the current sample and works out what the BF would be for a range of predicted means (which are set to be sdtheoryrange, with meanoftheory = 0).
The Bf_range function can be used with both the old and new versions of the Bf function. By default it uses the old version (for backward compatibility); to use the new version, specify method = "new". The new version is an update of Bf_range that uses the updated Bf function: it works with normal, t and Cauchy H1s, and normal or t likelihoods.
```{r}
Bf_range <- function(sd, obtained, dfdata = 1, likelihood = c("normal", "t"),
modeloftheory= c("normal","t","cauchy") , meanoftheory = 0,
modeoftheory = 0, sdtheoryrange, dftheory = 1, tail = 1, method = "old")
{
if (method == "old") {
x = c(0)
y = c(0)
for(sdi in sdtheoryrange)
{
#sdi = sdtheoryrange[1]
# uses old Bf method
B = as.numeric(Bf(sd, obtained, meanoftheory=0, uniform = 0, sdtheory=sdi, tail=tail)[3])
#the following line corrects for the fact that the calculator does not correctly compute the BF when sdtheory==0; this code ensures that if sdtheory==0, BF=1
if (sdi ==0 ) {B=1}
x= append(x,sdi)
y= append(y,B)
output = cbind(x,y)
}
}
else if (method == "new") {
x = c(0)
y = c(0)
for(sdi in sdtheoryrange)
{
#sdi = sdtheoryrange[1]
# uses new Bf method
B = as.numeric(Bf(sd = sd, obtained = obtained, dfdata = dfdata, likelihood = likelihood,
modeloftheory = modeloftheory, modeoftheory=modeoftheory, scaleoftheory=sdi,
dftheory = dftheory, tail = tail, method="new"))
#the following line corrects for the fact that the calculator does not correctly compute the BF when sdtheory==0; this code ensures that if sdtheory==0, BF=1
if (sdi ==0 ) {B=1}
x= append(x,sdi)
y= append(y,B)
output = cbind(x,y)
}
}
output = output[-1,]
colnames(output) = c("sdtheory", "BF")
return(output)
}
```
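An illustrative call using the default (old) method, with made-up values: BFs for H1 SDs from 0 to 1 in steps of 0.1.
```{r, eval=FALSE}
Bf_range(sd = 0.3, obtained = 0.6, sdtheoryrange = seq(0, 1, 0.1), tail = 1)
```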
## Bf_set
Bf_set is a wrapper around Bf which allows us to calculate a set of BFs and put them in a table. It can only be used with the non-uniform method, and each h1 must be positive (as for the original calculator).
```{r}
Bf_set <-function(names_list, meandiff_list, sd_list, h1_list, tail_list)
{
Bfs = vector('double')
for (i in 1:length(meandiff_list)){
Bfs[i] = Bf(sd_list[i], meandiff_list[i], uniform = 0, meanoftheory = 0, sdtheory=h1_list[i] , tail = tail_list[i])$BayesFactor
}
df = data.frame(names_list, cbind(round(meandiff_list,3), round(sd_list,3), h1_list, round(Bfs,3)))
colnames(df) = c("Contrast", "Mean difference", "SE", "H1 estimate", "BF" )
return(df)
}
```
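An illustrative call with two made-up contrasts:
```{r, eval=FALSE}
Bf_set(names_list = c("trained vs untrained", "high vs low frequency"),
       meandiff_list = c(0.40, 0.15), sd_list = c(0.18, 0.12),
       h1_list = c(0.5, 0.5), tail_list = c(1, 1))
```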
## confidence interval calculators (95% CI)
lower_ci.R and upper_ci.R are functions made publicly available in this [RStudio Community post](https://community.rstudio.com/t/computing-confidence-intervals-with-dplyr/31868)
Example of how to use it:
```{r, eval=FALSE}
library(tidyverse)
test <- data.frame(
n = c(298, 298, 298, 298, 298, 298, 298, 298, 298, 298, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
run = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
mAP = c(0.8112, 0.8006, 0.8076, 0.7999, 0.8067, 0.8046, 0.8004, 0.799,
0.8052, 0.8002, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.8333, 0.8333,
0.8333, 1, 0.8333, 1, 1, 0.8333, 1, 1)
)
lower_ci <- function(mean, se, n, conf_level = 0.95){
lower_ci <- mean - qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
upper_ci <- function(mean, se, n, conf_level = 0.95){
upper_ci <- mean + qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
foobar <- test %>%
group_by(n) %>%
summarise(smean = mean(mAP, na.rm = TRUE),
ssd = sd(mAP, na.rm = TRUE)) %>%
mutate(se = ssd / sqrt(n),
lower_ci = lower_ci(smean, se, n),
upper_ci = upper_ci(smean, se, n))
foobar
```
Functions below:
```{r}
lower_ci <- function(mean, se, n, conf_level = 0.95){
lower_ci <- mean - qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
upper_ci <- function(mean, se, n, conf_level = 0.95){
upper_ci <- mean + qt(1 - ((1 - conf_level) / 2), n - 1) * se
}
```
## deleteRandomRows
This takes a dataframe (df) and an integer (n) and returns a df which has n rows randomly deleted
```{r}
deleteRandomRows = function(df,n){
indices <- sample(1:nrow(df), n)
df <- df[-indices,]
return(df)
}
```
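For example, with the built-in mtcars data (32 rows):
```{r}
set.seed(1)  # for reproducibility
smaller <- deleteRandomRows(mtcars, 5)
nrow(smaller)  # 27
```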
## download_from_Gorilla
Author: Eva Viviani
This function is a wrapper around [Ferenc Igali's script.](https://github.com/University-of-Sheffield-EduTech/data-endpoint-gorilla.io/blob/master/Preview_Downloader_Script.R?fbclid=IwAR2zI3ZHgFKqZchCI1yEghEaGXgoDtJ_Fr3OWOGG-Iz_ki6jk30clYG3aR8)
It downloads the data from Gorilla automatically, provided that you set your Gorilla login credentials into R.
**TO SET YOUR GORILLA LOGIN CREDENTIALS:**
Install the "keyring" package and run the following line in R:
- keyring::key_set(service = "Gorilla", username = "[email protected]")
Change "username" to the email you use to log into Gorilla. The keyring package will ask for and save your password within R. This information is passed into the function below to log into Gorilla and access your data. **You can run the "download_from_Gorilla" function below only if you have set your login credentials.**
Arguments:
- output: a character string with the name of the folder where the downloaded file is saved
- url: a character string, or a vector of character strings, with the links (starting with "https") to download
- myGorillaemail: the email used to log into Gorilla
Example of how to use the function:
```{r download_from_Gorilla example, eval=F}
output <- c("C:/Users/...")
previewDownloadFile <- read.csv("data.csv") # a preview file of the eyetracker zone; its "Response" column contains one link per trial
url <- previewDownloadFile[grepl("https", previewDownloadFile$Response),]$Response
download_from_Gorilla(output, url, "[email protected]")
```
Function below:
```{r download_from_Gorilla}
download_from_Gorilla <-function(output, url, myGorillaemail){
if (!require(downloader)) {
stop("downloader not installed")
} else if (!require(keyring)){
stop("keyring not installed")
} else if (!require(httr)){
stop("httr not installed")
} else {
print("------ start download -------")
};
#---------------------- get credentials and login ----------------------------#
login <- list(
email = myGorillaemail,
password = keyring::key_get("Gorilla", myGorillaemail))
login_res <- POST("https://gorilla.sc/api/login", body = login, encode = "form")
#---------------------- cycle over the list of urls --------------------------#
for (i in 1:length(url)){
file_end <- ""
experiment_download_url <- GET(url[i])
content_type_you_need_to_save <- experiment_download_url$headers$`content-type`
#------------------------- check content type -------------------------------#
if (content_type_you_need_to_save == "text/csv") {
file_end <- ".csv"
} else if (content_type_you_need_to_save == "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") {
file_end <- ".xlsx"
} else if (content_type_you_need_to_save == "audio/wav") {
file_end <- ".weba"
} else if (content_type_you_need_to_save == "video/webm;codecs=vp8") {
file_end <- ".webm"
}
file_download_url <- experiment_download_url$url
file_name_saver <- paste0(substring(as.character(url[i]),54,nchar(as.character(url[i]))))
#download the file
downloader::download(file_download_url, paste0(output, file_name_saver), mode = "wb", quiet = T)
print(paste0("file ",i))
}
print("------ download done -------")
}
```
## filter2
This is a function which filters a column of data removing values which are some number of standard deviations above/below the mean for that participant, possibly in some condition/subcondition.
- im: the input matrix (a data frame)
- svn: a list of the names of factors to be group by (subject name + one or more conditions)
- fn: the name of the column containing the data to be filtered
- lim: how many standard deviations above/below the mean to filter
The function returns a data frame identical to the input matrix but with additional columns giving the group means and the "filtered" data.
```{r}
filter2 = function(im, svn, fn, lim)
{
## work out means for each subject for each group
x = list()
y = ""
for(n in svn) x=append(im[n],x)
for(n in svn) y=paste(y,n,sep="_")
means = aggregate(im[fn], by = x, mean, na.rm=T)
head(means)
nocols = dim(means)[2]
colnames(means)[nocols] = "means"
sds = aggregate(im[fn], by = x, sd, na.rm=T)
head(sds)
nocols = dim(sds)[2]
colnames(sds)[nocols] = "sds"
gs = merge(means,sds)
## if there is just one value it doesn't have a standard deviation, and we don't want to disregard all of these
gs$sds[is.na(gs$sds)] = 0
gs$max = gs$means + lim*gs$sds
gs$min = gs$means- lim*gs$sds
im2 = merge(im,gs, sort=F)
im2[paste(fn,"filt",sep="")] = im2[fn]
cn= dim(im2)[2] ## get colnumber (last one added)
im2[,cn][im2[,fn] > im2$max] = NA
im2[,cn][im2[,fn] < im2$min] = NA
im2[,cn] = as.numeric(im2[,cn])
names(im2)[names(im2)=="means"] = paste("mean", y, sep="_")
names(im2)[names(im2)=="sds"] = paste("sd", y, sep="_")
names(im2)[names(im2)=="max"] = paste("max", y, sep="_")
names(im2)[names(im2)=="min"] = paste("min", y, sep="_")
return(im2)
}
```
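A hedged sketch (the data frame `d` and its columns `subject`, `condition` and `rt` are hypothetical names): remove RTs more than 2.5 SDs from each subject's mean in each condition.
```{r, eval=FALSE}
# the filtered values land in a new column "rtfilt"; out-of-range values are NA
d_filtered <- filter2(d, svn = c("subject", "condition"), fn = "rt", lim = 2.5)
```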
## generate_bin
Author: Catriona Silvey
This function generates some multilevel binary data.
Arguments:
- n_subj = number of subjects
- n_obs = total number of trials per subject
- alpha = veridical grand mean log odds performance
- beta1 = veridical effect of cond1
- beta2 = veridical effect of cond2
- beta3 = veridical cond1 * cond2 interaction effect
- subj_corrs = vector of the six correlations between the random effects (intercept, cond1 slope, cond2 slope, interaction slope)
- subj_tau = vector of the four SDs of the random effects
```{r}
library(MASS)   # for mvrnorm
library(dplyr)  # for %>% and mutate
library(boot)   # for inv.logit

generate_bin <- function(n_subj, n_obs, alpha, beta1, beta2, beta3, subj_corrs, subj_tau) {
  # make data frame where number of rows = number of subjects * number of trials per subject
  data <- data.frame(matrix(0, ncol = 0, nrow = n_subj * n_obs))
  # make subject vector and add to this data frame
  data$subject <- as.factor(rep(1:n_subj, each = n_obs))
  # make condition 1 vector - within subjects
  # half trials one value, half trials the other
  data$cond1 <- as.factor(rep(c(0, 1), each = n_obs/2))
  # make centred version
  data$c_cond1 <- as.numeric(data$cond1) - mean(as.numeric(data$cond1))
  # make condition 2 vector - also within subjects
  # 1/4 trials one value, 1/4 trials the other, then repeat
  # this ensures cond1 and cond2 are not identical
  data$cond2 <- as.factor(rep(c(0, 1), each = n_obs/4))
  # make centred version
  data$c_cond2 <- as.numeric(data$cond2) - mean(as.numeric(data$cond2))
  # for subject effects
  # first, we put the correlations between the random effects in a matrix
  # * if changing to simulate fewer random effects & hence fewer correlations,
  #   this will need to be adjusted
  corr_matrix <- matrix(c(1, subj_corrs[1], subj_corrs[2], subj_corrs[3],
                          subj_corrs[1], 1, subj_corrs[4], subj_corrs[5],
                          subj_corrs[2], subj_corrs[4], 1, subj_corrs[6],
                          subj_corrs[3], subj_corrs[5], subj_corrs[6], 1), nrow = 4)
  # next, construct the variance-covariance matrix for subject effects:
  # we multiply the subject effect sds (in matrix form) by the correlation matrix
  # and then again by the subject effect sds,
  # so we end up with the sds squared (on the diagonal) = variances,
  # and covariances between each pair of subject effects on the off-diagonal
  # * if changing to simulate fewer random effects, this should still work fine,
  #   assuming corr_matrix has been adjusted appropriately
  subj_v_cov <- diag(subj_tau) %*% corr_matrix %*% diag(subj_tau)
  # create the correlated subject effects, using mvrnorm to sample from a
  # multivariate normal distribution;
  # means of subject intercepts and slopes are 0
  u <- mvrnorm(n = n_subj, c(0, 0, 0, 0), subj_v_cov)
  # check the correlations and SDs - these should be fairly accurate in large samples
  # print(cor(u)); print(apply(u, 2, sd))
  # finally, generate data on the basis of these parameters
  data <- data %>%
    mutate(
      # we first calculate the linear predictor eta for each row in the data frame
      # = overall intercept + subject intercept +
      eta = alpha + u[data$subject, 1] +
        # cond1 value * (cond1 fixed effect + cond1 random slope) +
        data$c_cond1 * (beta1 + u[data$subject, 2]) +
        # cond2 value * (cond2 fixed effect + cond2 random slope) +
        data$c_cond2 * (beta2 + u[data$subject, 3]) +
        # cond1 * cond2 value * (interaction fixed effect + interaction random slope)
        (data$c_cond1 * data$c_cond2) * (beta3 + u[data$subject, 4]),
      # then transform by inverse logit to a probability (for this combo of subject/condition)
      mu = inv.logit(eta),
      # finally, generate a 0 or 1 for this row based on the probability for this subject/condition
      y = rbinom(nrow(data), 1, mu))
  return(data)
}
```
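A usage sketch, assuming the `generate_bin` function defined above has been run and the MASS, dplyr and boot packages are loaded; the parameter values here are invented for illustration:

```{r}
set.seed(42)
sim <- generate_bin(n_subj = 20, n_obs = 16,
                    alpha = 0.5, beta1 = 1, beta2 = 0.5, beta3 = 0,
                    subj_corrs = c(0, 0, 0, 0, 0, 0),
                    subj_tau = c(1, 0.5, 0.5, 0.5))
nrow(sim)     # 20 subjects * 16 trials = 320 rows
table(sim$y)  # overall proportion of 1s roughly tracks inv.logit(alpha)
```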
## getmode
Author: [Tutorials point](https://www.tutorialspoint.com/r/r_mean_median_mode.htm).
```{r}
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
```
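For example (with ties, the value that appears first in the vector wins):

```{r}
getmode(c(1, 2, 2, 3, 3, 3))  # 3
getmode(c("a", "b", "b"))     # "b"
getmode(c("x", "y"))          # "x" - tie, first value wins
```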
## get_coeffs
This function allows us to inspect particular coefficients from the output of an LME model by putting them in a table.
- x: the output returned when running lmer or glmer (i.e. an object of class lmerMod or glmerMod)
- list: a list of the names of the coefficients to be extracted (e.g. c("variable1", "variable1:variable2"))
```{r}
get_coeffs <- function(x,list){(as.data.frame(summary(x)$coefficients)[list,])}
```
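The same extraction works for any model whose summary has a coefficients table; here is a sketch using `lm` on the built-in `mtcars` data rather than an lmer model:

```{r}
m <- lm(mpg ~ wt + hp, data = mtcars)
# returns a data frame with one row per requested coefficient
get_coeffs(m, c("wt", "hp"))
```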
## get_coeffs2
Outputs a set of specific coefficients for an lmer model.
- x: a coefficient table of the form summary(lmermodel)$coefficients, as applied to the output of an lmer model
- list: a list of the names of the coefficients to be reported
```{r}
get_coeffs2 <- function(x,list){(x[list,])}
```
## inverse_log_odd
Takes a log-odds value and converts it to a proportion (the inverse logit).
```{r}
inverse_log_odd<-function(lo) {exp(lo)/(1 + exp(lo))}
```
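For example, log odds of 0 correspond to a proportion of 0.5, and log odds of log(3) (i.e. odds of 3:1) to 0.75:

```{r}
inverse_log_odd(0)       # 0.5
inverse_log_odd(log(3))  # 0.75
inverse_log_odd(-2)      # ~0.119
```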
## lizCenter