This repository has been archived by the owner on Jul 28, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
analysis.Rmd
192 lines (130 loc) · 5.34 KB
/
analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
---
title: "Tooth Growth Analysis"
output: github_document
---
In this project, we are going to explore the ToothGrowth R dataset and do inferential data analysis aiming to answer the following questions:
1 - Is there a relation between tooth length and supplement used?
2 - Does changing the supplement dosage interfere with the tooth length?
The project is set in literate programming, allowing full research reproducibility.
## Tools Used
- R language compiler
- R base graphic devices
- Tidyverse library packages
- RMarkdown library package
- Knitr library
package
## Files
- **[CODEBOOK](https://github.com/vcwild/tooth-growth/blob/master/analysis.pdf)**:
step-by-step book explaining the code
processing.
- **[Figures](https://github.com/vcwild/tooth-growth/tree/master/analysis_files/figure-gfm)**:
the plotted
images
- **[Infsim.Rmd](https://github.com/vcwild/tooth-growth/blob/master/analysis.Rmd)**:
the script to compile the project from source
# CODEBOOK
## Setup
```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, warning = FALSE)
```
### Provide a basic summary of the data
Loading required packages and data set
```{r libs}
require(tidyverse)
data <- ToothGrowth
```
## Exploratory Data Analysis
### Summary of the data
```{r summary}
means <- data %>%
group_by(supp, dose) %>%
summarize(tooth.length = mean(len))
means
# Classes
nclass = round(sqrt(length(data$len)))
hist(data$len, col = "slateblue1", main = "Histogram of Tooth Length", xlab = "Tooth Length", nclass = nclass)
# OJ - orange juice
# VC - vitamin C
```
## Inferential Analysis
#### Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
### Hypotesis 1 - Relation between tooth length and the supplement used
```{r powerone}
# Separate samples for analysis
supp = data$supp
len_by_OJ = data$len[supp == "OJ"] #sample_b
len_by_VC = data$len[supp == "VC"] #sample_a
delta = abs(mean(len_by_OJ) - mean(len_by_VC))
pooledsd = (sd(len_by_OJ) + sd(len_by_VC))/2
t1_error = 0.05
# Statistical Power
power.t.test(n = length(len_by_VC), sig.level = t1_error, delta = delta, sd = pooledsd, alternative = "one.sided", type = "two.sample")$power
```
The statistical power for the t-test is too low!! (below .80)
The outcome of the t-test can result in a false positive!
```{r hypone}
# The Null Hypothesis
# H0: µa >= µb
# The Alternative Hypothesis
# Ha: µa < µb
# T-test: non-paired, variance based on approx of df, µa < H0
t.test(len_by_VC, len_by_OJ, conf.level = .95, var.equal = TRUE, paired = FALSE, alternative = "less")$p.value
```
The p-value shows strong evidence to reject the null hypothesis (pval < 0.05), indicating high significance for the alternative hypothesis between the tooth length and the supplement used.
### Hypothesis Conclusion
We can imply that the p-value for this observation is too high in relation to the threshold, and the statistical power of the analysis is too low, meaning that the alternative hypothesis may be a false positive.
### Hypothesis 2 - Comparing tooth growth by dose
```{r hyptwopower}
# Separate samples for analysis
half = data$len[data$dose == .5] # sample_a
one = data$len[data$dose == 1] # sample_b
two = data$len[data$dose == 2] # sample_c
length(half) == length(one)
delta = abs(mean(half) - mean(one))
pooledsd = (sd(half) + sd(one))/2
t1_error = 0.05
# Statistical Power
power.t.test(n = length(half), sig.level = t1_error, delta = delta, sd = pooledsd, alternative = "one.sided", type = "two.sample")$power
```
Very high Statistical Power, meaning the outcome of the t-test is very accurate!!
```{r hyptwoless}
# The Null Hypothesis
# H0: µa >= µb
# The Alternative Hypothesis
# Ha: µa < µb
# T-test, non-paired, var by approx of df, µa < µb
t.test(half, one, alternative = "less", paired = FALSE, var.equal = TRUE, conf.level = .95)$p.value
```
The p-value shows strong evidence to reject the null hypothesis (pval < 0.05), indicating proof for the alternative hypothesis between lower supplement doses and tooth length.
```{r hyptwopower2}
length(two) == length(one)
delta = abs(mean(two) - mean(one))
pooledsd = (sd(two) + sd(one))/2
t1_error = 0.05
# Statistical Power
power.t.test(n = length(two), sig.level = t1_error, delta = delta, sd = pooledsd, alternative = "one.sided", type = "two.sample")$power
```
Very high Statistical Power, meaning the outcome of the t-test is very accurate!!
```{r hyptwogreater}
# The Null Hypothesis
# H0: µc <= µb
# The Alternative Hypothesis
# Ha: µc > µb
# T-test, non-paired, var by approx of df, µc > µb
t.test(two, one, alternative = "greater", paired = FALSE, var.equal = TRUE, conf.level = .95)$p.value
```
The p-value shows strong evidence to reject the null hypothesis (pval < 0.05), indicating proof for the alternative hypothesis between higher supplement doses and tooth length.
### Hypothesis Conclusion
We can confirm with a 95% confidence interval that, for the true population distribution, the supplement dosage level interfere with tooth growth.
We can visualize this statement in the following plot:
```{r concplot}
ggplot(data, aes(dose, len, group = supp, fill = supp)) +
geom_bar(stat = "identity", position = "dodge", color = "white") +
labs(
x = "Dose",
y = "Tooth Length",
fill = "Supplement"
) +
coord_flip() +
theme_minimal()
```