
Reproducible science

Steve Harris edited this page Apr 13, 2016 · 2 revisions

Reproducible research

Introduce the concept

Given the same starting point, you always get the same output. If someone else is given the same data, they will get the same results as you.

This involves:

  • Writing scripts (Source)
  • Commenting within the script about what you're doing

In R:

  • library(ggplot2) # We load this library
  • Show a screenshot of RStudio's text editor
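
As a rough illustration of the two points above (writing a script and commenting what it does), here is a minimal sketch in base R. The data frame `observations` and its values are made up for illustration and are not the lesson dataset; in the lesson the data would be read in from a file instead.

```r
# Summarise a small set of observations.
# NOTE: the data below are made-up illustration values, not the lesson dataset.

# Step 1: create (or, in practice, read in) the data
observations <- data.frame(
  id     = 1:5,
  weight = c(61.2, 58.4, 70.1, 66.3, 64.0)
)

# Step 2: compute the summary we want to report
mean_weight <- mean(observations$weight)

# Step 3: print the result so it appears in the console
print(mean_weight)
```

Each comment says why a step is there, so someone opening the script later can follow it without having seen the live session.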

Introduce GitHub

  • To save the files and have version control
  • Introduce Git within RStudio
  • Live demo of Git within RStudio / Screenshots

Show an end result

  • Pull in a commit to show the next step

Improvers

  • R Markdown
  • Branches

Issues

  • General comments (find the example to highlight the issue on our dataset), e.g. "In this lesson, we will get the Excel data into R", etc.

  • Emphasis on what reproducible science practically means: front-loading the work to minimise headaches later (e.g. adding one more row of observations means simply re-running the script rather than having to recreate the graphs from scratch).

  • More emphasis on using GitHub as a fancy version of saving stuff in RStudio.

  • What does names(x[, 1:3]) add to the overall presentation? Matrix subsetting can look scary.

  • Make the normal workflow look worse, as it actually is (e.g. graphs overlaying the text...)

  • Less emphasis on R Markdown, more on commenting the script and saving the script to GitHub.

  • In the 40 minutes of practical work, save the script to GitHub. Then we can insert data into the data sheet and demonstrate that you can just re-run the script to get a different mean.
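
The "add a row, re-run the script" demonstration mentioned in the notes above could look something like the sketch below. The names `data_sheet` and `run_analysis` and all values are hypothetical; in the practical, the data would come from the shared data sheet rather than being typed into the script.

```r
# Hypothetical data sheet; in the practical this would be read from the shared file.
data_sheet <- data.frame(
  patient = c("a", "b", "c", "d"),
  score   = c(10, 12, 14, 16)
)

# The analysis lives in one function so that re-running it is a single call.
run_analysis <- function(sheet) {
  mean(sheet$score)
}

mean_before <- run_analysis(data_sheet)

# Someone adds one more row of observations to the data sheet...
data_sheet <- rbind(data_sheet, data.frame(patient = "e", score = 28))

# ...and the same, unchanged analysis gives the updated mean.
mean_after <- run_analysis(data_sheet)

print(c(before = mean_before, after = mean_after))
```

The point for the audience is that nothing in the analysis had to be edited: only the data changed, and the result followed.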
