Much of the time biologists spend on data analysis goes to manipulating, reformatting, and transforming data. In this lecture, we'll use tidyverse functions from the `dplyr` package to perform these "data munging" tasks. We'll cover a few of the functions available for these tasks and learn a common programming structure (piping). Together these will equip you to keep developing powerful data manipulation code, which is foundational for visualization and statistical analysis.
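As a rough illustration of what piping looks like, here is a minimal sketch. The `expression` data frame and its column names are made up for this example and are not part of the class materials. It assumes the `dplyr` package is installed.

```r
library(dplyr)

# Hypothetical example data (invented for illustration only)
expression <- data.frame(
  gene   = c("geneA", "geneB", "geneC", "geneD"),
  tissue = c("liver", "liver", "brain", "brain"),
  count  = c(120, 15, 300, 45)
)

# Without pipes: nested calls have to be read from the inside out
nested <- arrange(
  mutate(filter(expression, count > 20), log_count = log2(count)),
  desc(log_count)
)

# With pipes: the output of each step becomes the first argument of the next
result <- expression %>%
  filter(count > 20) %>%               # keep rows with count above 20
  mutate(log_count = log2(count)) %>%  # add a log2-transformed column
  arrange(desc(log_count))             # sort from highest to lowest

print(result)
```

Both versions produce the same table, but the piped version reads top to bottom in the order the steps are applied, which tends to be easier to follow and to modify.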
- Select and apply functions from tidyverse to manipulate tabular data
- Apply pipes as a programming structure to connect input and output from multiple functions
- To view slides in presentation mode, open `lecture.html`.
- For the in-class exercise, use class_exercise.md as a template to create a `.Rmd` file with your own title block and code cells.
- RStudio has a data manipulation cheatsheet that should help you identify which functions are useful for certain tasks.
- We've had some questions about the difference between R Markdown documents and R Notebooks. The short answer is that an R Notebook is a specialized type of R Markdown document that updates as you run the code, while a regular R Markdown document must be knit after you finish writing code for the output to appear in the final document. More information can be found here.