diff --git a/conditionals.qmd b/conditionals.qmd
index 265570d..823b0a7 100644
--- a/conditionals.qmd
+++ b/conditionals.qmd
@@ -3,6 +3,7 @@
 ```{r}
 #| echo: false
 #| message: false
+
 source("_common.R")
 ```
 
@@ -48,6 +49,7 @@ f <- function(x) {
     cat("D")
 }
 ```
+
 ```{r}
 f(0)
 f(1)
@@ -65,6 +67,7 @@ absValue <- function(x) {
     return(x)
 }
 ```
+
 ```{r}
 absValue(7) # Returns 7
 absValue(-7) # Also returns 7
@@ -112,6 +115,7 @@ f <- function(x) {
     cat("G")
 }
 ```
+
 ```{r}
 f(0)
 f(1)
@@ -138,6 +142,7 @@ getLetterGrade <- function(score) {
     return(grade)
 }
 ```
+
 ```{r}
 cat("103 -->", getLetterGrade(103))
 cat(" 88 -->", getLetterGrade(88))
diff --git a/creating-functions.qmd b/creating-functions.qmd
index b233223..f1a69e8 100644
--- a/creating-functions.qmd
+++ b/creating-functions.qmd
@@ -3,6 +3,7 @@
 ```{r}
 #| echo: false
 #| message: false
+
 source("_common.R")
 ```
 
@@ -55,7 +56,10 @@ For example, here's the function `mySqrt(n)`, which returns the square root of `
 |`mySqrt` | `<-` | `function` | `(n)` | `{ return(n^0.5) }` |
 
 And here's `mySqrt(n)` written in the typical format:
-```{r, eval=FALSE}
+
+```{r}
+#| eval: false
+
 mySqrt <- function(n) {
     return(n^0.5)
 }
@@ -71,6 +75,7 @@ square <- function(x) {
     return(y)
 }
 ```
+
 ```{r}
 square(2)
 square(8)
@@ -84,6 +89,7 @@ sumTwoValues <- function(x, y) {
     return(value)
 }
 ```
+
 ```{r}
 sumTwoValues(2, 3)
 sumTwoValues(3, 4)
@@ -96,6 +102,7 @@ doSomething <- function() {
     cat("Carpe diem!") # The cat() function prints whatever's inside it to the console
 }
 ```
+
 ```{r}
 doSomething()
 ```
@@ -109,6 +116,7 @@ f <- function(x, y=10) {
     return(x + y)
 }
 ```
+
 ```{r}
 f(5) # 15
 f(5, 1) # 6
@@ -123,6 +131,7 @@ isPositive <- function(x) {
     return (x > 0)
 }
 ```
+
 ```{r}
 isPositive(5) # TRUE
 isPositive(-5) # FALSE
@@ -138,6 +147,7 @@ isPositive <- function(x) {
     cat("Goodbye!") # Does not run ("dead code")
 }
 ```
+
 ```{r}
 x <- isPositive(5) # Prints Hello, then assigns TRUE to x
 x
@@ -152,15 +162,18 @@ f <- function(x) {
     x + 42
 }
 ```
+
 ```{r}
 f(5)
 ```
+
 ```{r}
 f <- function(x) {
     x + 42
     x + 7
 }
 ```
+
 ```{r}
 f(5)
 ```
@@ -174,6 +187,7 @@ printX <- function(x) {
     cat("The value of x provided is", x)
 }
 ```
+
 ```{r}
 printX(7)
 printX(42)
@@ -186,6 +200,7 @@ cubed <- function(x) {
     cat(x^3)
 }
 ```
+
 ```{r}
 cubed(2) # Seems to work
 2*cubed(2) # Expected 16...didn't work
@@ -198,6 +213,7 @@ cubed <- function(x) {
     return(x^3) # That's better!
 }
 ```
+
 ```{r}
 cubed(2) # Works!
 2*cubed(2) # Works!
@@ -235,6 +251,7 @@ minSquared <- function(x, y) {
     return(smaller^2)
 }
 ```
+
 ```{r}
 minSquared(3, 4)
 minSquared(4, 3)
@@ -242,7 +259,9 @@ minSquared(4, 3)
 
 If you try to call a local variable in the global environment, you'll get an error:
 
-```{r error=TRUE}
+```{r}
+#| error: true
+
 square <- function(x) {
     y <- x^2
     return(y)
@@ -252,15 +271,21 @@ y
 
 _"Global"_ variables are those in the global environment. These will show up in the "Environment" pane in RStudio. You can call these inside functions, but this is **BAD** practice. Here's an example (**Don't do this!**):
 
-```{r include=FALSE}
+```{r}
+#| include: false
+
 n <- NULL
 ```
-```{r error=TRUE}
+
+```{r}
+#| error: true
+
 printN <- function() {
     cat(n) # n is not local -- so it is global (bad idea!!!)
 }
 printN() # Nothing happens because n isn't defined
 ```
+
 ```{r}
 n = 5 # Define n in the global environment
 printN()
diff --git a/data-analysis.qmd b/data-analysis.qmd
index 001a1a6..ded1f48 100644
--- a/data-analysis.qmd
+++ b/data-analysis.qmd
@@ -3,6 +3,7 @@
 ```{r}
 #| echo: false
 #| message: false
+
 source("_common.R")
 ```
 
@@ -35,10 +36,10 @@ head(orings)
 We can see that the dataset contains observations about the temperatures of launches and O-ring damage, but we don't yet have _information_. One step forward towards _information_ is to simply plot the data to _see_ if there might be a relationship between temperature and O-ring damage:
 
 ```{r}
+#| label: "challenger-temps"
 #| message: false
 #| fig.width: 8
 #| fig.height: 3
-#| label: "challenger-temps"
 
 library(ggplot2)
 
diff --git a/data-frames.qmd b/data-frames.qmd
index 5238379..1fbd68e 100644
--- a/data-frames.qmd
+++ b/data-frames.qmd
@@ -3,6 +3,7 @@
 ```{r}
 #| echo: false
 #| message: false
+
 source("_common.R")
 ```
 
@@ -53,6 +54,7 @@ The [tibble](https://r4ds.had.co.nz/tibbles.html) is an improved version of the
 
 ```{r}
 #| message: false
+
 library(dplyr)
 ```
 
@@ -192,6 +194,7 @@ beatles$deceased == FALSE
 ```
 
 Then, you could insert this logical vector in the row position of the `[]` brackets to filter only the rows that are `TRUE`:
+
 ```{r}
 beatles[beatles$deceased == FALSE,]
 ```
@@ -250,21 +253,33 @@ There are generally two ways to load external data.
 
 Many R packages come with pre-loaded datasets. For example, the **ggplot2** library (which we'll [use soon](data-visualization.html) to make plots in R) comes with the `msleep` dataset already loaded. To see this, install **ggplot2** and load the library:
 
-```{r, eval=FALSE, message=FALSE}
-install.packages("ggplot2")
+```{r}
+#| eval: false
+#| message: false
+
+# install.packages("ggplot2") # Do this only once!
 library(ggplot2)
+
 head(msleep) # Preview just the first 6 rows of the data frame
 ```
-```{r, echo=FALSE, message=FALSE}
+
+```{r}
+#| echo: false
+#| message: false
+
 library(ggplot2)
+
 head(msleep)
 ```
 
 If you want to see all of the different datasets that any particular package contains, you can call the `data()` function after loading a library. For example, here are all the dataset that are contained in the **ggplot2** library:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
+
 data(package = "ggplot2")
 ```
+
 ```
 Data sets in package 'ggplot2':
 
@@ -307,8 +322,11 @@ pathToData <- here('data', 'data.csv')
 
 2. Import the data
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
+
 library(readr)
+
 df <- read_csv(pathToData)
 ```
 
@@ -361,6 +379,7 @@ Now load the data:
 
 ```{r}
 library(readr)
+
 msleep <- read_csv(here('data', 'msleep.csv'))
 ```
 
@@ -431,50 +450,62 @@ But just to give you an idea of where we're going, here are a few pieces of info
 1) It appears that mammalian brain and body weight are logarithmically correlated - cool!
 
 ```{r}
+#| label: 'msleep-scatter1'
 #| fig.height: 4
 #| fig.width: 6
 #| message: false
 #| warning: false
 
 library(ggplot2)
-ggplot(msleep, aes(x=brainwt, y=bodywt)) +
+
+ggplot(msleep, aes(x = brainwt, y = bodywt)) +
     geom_point(alpha=0.6) +
-    stat_smooth(method='lm', col='red', se=F, size=0.7) +
+    stat_smooth(method = 'lm', col = 'red', se = FALSE, size = 0.7) +
     scale_x_log10() +
     scale_y_log10() +
-    labs(x='log(brain weight) in g', y='log(body weight) in kg') +
+    labs(
+        x = 'log(brain weight) in g',
+        y = 'log(body weight) in kg'
+    ) +
     theme_minimal()
 ```
 
 2) It appears there may also be a negative, logarithmic relationship (albeit weaker) between the size of mammalian brains and how much they sleep - cool!
 
 ```{r}
+#| label: 'msleep-scatter2'
 #| fig.height: 4
 #| fig.width: 6
 #| message: false
 #| warning: false
 
-ggplot(msleep, aes(x=brainwt, y=sleep_total)) +
-    geom_point(alpha=0.6) +
+ggplot(msleep, aes(x = brainwt, y = sleep_total)) +
+    geom_point(alpha = 0.6) +
     scale_x_log10() +
     scale_y_log10() +
-    stat_smooth(method='lm', col='red', se=F, size=0.7) +
-    labs(x='log(brain weight) in g', y='log(total sleep time) in hours') +
+    stat_smooth(method = 'lm', col = 'red', se = FALSE, size = 0.7) +
+    labs(
+        x = 'log(brain weight) in g',
+        y = 'log(total sleep time) in hours'
+    ) +
     theme_minimal()
 ```
 
 3) Wow, there's a lot of variation in how much different mammals sleep - cool!
 
 ```{r}
+#| label: 'msleep-bars'
 #| fig.height: 4
 #| fig.width: 6
 #| message: false
 #| warning: false
 
-ggplot(msleep, aes(x=sleep_total)) +
+ggplot(msleep, aes(x = sleep_total)) +
     geom_histogram() +
-    labs(x = 'Total sleep time in hours',
-         title = 'Histogram of total sleep time') +
+    labs(
+        x = 'Total sleep time in hours',
+        title = 'Histogram of total sleep time'
+    ) +
     theme_minimal()
 ```
 
diff --git a/data-visualization.qmd b/data-visualization.qmd
index 6887ec8..c9a8cc7 100644
--- a/data-visualization.qmd
+++ b/data-visualization.qmd
@@ -3,6 +3,7 @@
 ```{r}
 #| echo: false
 #| message: false
+
 source("_common.R")
 ```
 
@@ -25,7 +26,9 @@ source("_common.R")
 ...and one of the best ways to develop insights from data is to _visualize_ the data. If you're completely new to data visualization, I recommend watching [this 40-minute video](https://www.youtube.com/watch?v=fSgEeI2Xpdc) on how humans see data, by John Rauser. This is one of the best overviews I've ever seen of how we can exploit our understanding of human psychology to design effective charts:
-
+