diff --git a/.renvignore b/.renvignore new file mode 100644 index 0000000..04a4248 --- /dev/null +++ b/.renvignore @@ -0,0 +1,10 @@ +.Rproj.user +.Rhistory +.Rdata +.httr-oauth +.DS_Store + +/.quarto/ + +transcriptomics/week-3/data-raw/ + diff --git a/core/week-2/study_before_workshop.qmd b/core/week-2/study_before_workshop.qmd index 8bdf889..6d10214 100644 --- a/core/week-2/study_before_workshop.qmd +++ b/core/week-2/study_before_workshop.qmd @@ -9,7 +9,7 @@ toc-location: right 2. 📖 Read [Workflow in RStudio](https://3mmarand.github.io/comp4biosci/workflow_rstudio.html). You may find it helpful to remind yourself about RStudio Projects. In previous years, you have submitted an "RStudio Project" as part of your BABS work. In this module, you will submit "Supporting Information" for your Project Report. The Supporting Information is a documented and organised collection of all the digital parts of your research project. This includes data (or instructions for accessing data), code and/or non-coded processing, instructions for use, computational requirements and outputs. The Supporting Information could be a single RStudio Project (like you have done previously but with better documentation) or a folder that includes an RStudio Project and other material/scripts. -3.💻 Set up the Virtual Desktop. I very strongly recommend working on +3. Set up the Virtual Desktop. I very strongly recommend working on the University computers for this work. You will be using more specialised R packages than you might be used to. 
This is especially important if you often have difficulty updating and or installing software on your own machine, diff --git a/core/week-6/Y12345678.zip b/core/week-6/Y12345678.zip deleted file mode 100644 index 8593ba2..0000000 Binary files a/core/week-6/Y12345678.zip and /dev/null differ diff --git a/core/week-6/images/mentimeter_qr_code.png b/core/week-6/images/mentimeter_qr_code.png new file mode 100644 index 0000000..cb8123e Binary files /dev/null and b/core/week-6/images/mentimeter_qr_code.png differ diff --git a/core/week-6/overview.qmd b/core/week-6/overview.qmd index 272e405..632e43e 100644 --- a/core/week-6/overview.qmd +++ b/core/week-6/overview.qmd @@ -5,25 +5,34 @@ toc: true toc-location: right --- -This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data. +We considered how to organise reproducible data analyses in +[Core: Supporting Information 1](../week-2/overview.qmd). This week we will +consider how to document and curate reproducible data analyses. You will +add a README to your project and discover all the software you are using in R. +The workshop will also include a questions and answers section. 
### Learning objectives The successful student will be able to: -- explain the organisation of files and directories in a file systems including root, home and working directories -- explain absolute and relative file paths -- explain why working reproducibly is important -- know how to use a project-oriented workflow to organise work -- be able to give files human- and machine-readable names -- outline some common biological data file formats +- Describe the purpose of a README file + +- List the key components of a README file + +- Use `sessioninfo` to document the software used in an R project + +- Write a README file for a project + + + ### Instructions 1. [Prepare](study_before_workshop.qmd) - i. 📖 Read Understanding file systems + i. Revise [Core: Supporting Information 1](../week-2/overview.qmd) + and make a note of queries you have 2. [Workshop](workshop.qmd) diff --git a/core/week-6/study_after_workshop.qmd b/core/week-6/study_after_workshop.qmd index c6956d1..4beb2b9 100644 --- a/core/week-6/study_after_workshop.qmd +++ b/core/week-6/study_after_workshop.qmd @@ -1,6 +1,6 @@ --- title: "Independent Study to consolidate this week" -subtitle: "Core 1" +subtitle: "Core: Supporting Information 2" toc: true toc-location: right format: @@ -9,12 +9,4 @@ format: code-summary: "Answer - don't look until you have tried!" --- -These are suggestions -## BIO00088H Group Research Project students - -1. Revise previous Data Analysis materials. You can find the version you took on the VLE site for 17C / 08C. However, my latest versions (in development) are here: [Data Analysis in R](https://3mmarand.github.io/R4BABS/). The Becoming a Bioscientist (BABS) modules replace the Laboratory and Professional Skills modules. BABS1 and BABS2 are stage one, and I've tried to improve them over 17C / 08C. The site is also searchable (icon top right) - -## MSc Bioinformatics students doing BIO00070M - -1. 
Make sure you carry out the [preparatory work for week 2 of 52M](https://3mmarand.github.io/R4BABS/pgt52m/week-2/overview.html) diff --git a/core/week-6/study_before_workshop.qmd b/core/week-6/study_before_workshop.qmd index bc32f3b..f7661f7 100644 --- a/core/week-6/study_before_workshop.qmd +++ b/core/week-6/study_before_workshop.qmd @@ -1,10 +1,25 @@ --- title: "Independent Study to prepare for workshop" -subtitle: "Core 1" +subtitle: "Core: Supporting Information 2" toc: true toc-location: right --- -1. 📖 Read [Understanding file systems](https://3mmarand.github.io/comp4biosci/file_systems.html). This is an approximately 15 - 20 minute read revising file types and filesystems. It covers concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R. +1. Revise [Core: Supporting Information 1 Organising Reproducible Data + Analyses](../week-2/overview.qmd). -2. In previous years you have submitted and RStudio Project as part of your BABS work. In this module you will develop this by submitting a Research Compendium. A Research Compendium is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, (like you have done previously but with better documentation) or it might be a folder including an Quarto/RStudio Project and other material/scripts including the description of unscripted processing. You might want to remind yourself of the example RStudio Project, [Y12345678.zip](Y12345678.zip) used in BABS 2. \ No newline at end of file + i. Do you know your Supporting information will most likely be a + structured folder which *is* either an RStudio Project or + contains an RStudio Project? + ii.
Are you following best practices for code formatting and style? + If not, go through your scripts and edit them. + iii. Do you have numbers hard coded where they could be variables? + iv. Are you using a sensible naming convention for files and + variables? Have you written it down? + v. Make a note of queries you have. Take some time to formulate and + write down your questions. The more specific and clear your + question is, the better answer I will be able to provide. + vi. Post your questions here: + [Menti](https://www.menti.com/m86rqcbb88) Code: 3306 3222. QR: + ![mentimeter qr + code](images/mentimeter_qr_code.png){width="400"} diff --git a/core/week-6/styles.css b/core/week-6/styles.css deleted file mode 100644 index 2ff0570..0000000 --- a/core/week-6/styles.css +++ /dev/null @@ -1,16 +0,0 @@ -/* css styles */ - - -@import url('https://fonts.googleapis.com/css2?family=Open+Sans&family=Source+Code+Pro&display=swap'); - - -// fonts - -$font-family-monospace: "Source Code Pro"; - -/*-- scss:rules --*/ - -code.sourceCode { - font-size: 1.3em; -} - diff --git a/core/week-6/workshop.qmd b/core/week-6/workshop.qmd index c925577..ba18a2c 100644 --- a/core/week-6/workshop.qmd +++ b/core/week-6/workshop.qmd @@ -1,6 +1,6 @@ --- title: "Workshop" -subtitle: "Organising Reproducible Data Analyses" +subtitle: "Supporting Information 2 Documenting and curating Reproducible Data Analyses" author: "Emma Rand" toc: true toc-depth: 4 @@ -19,158 +19,9 @@ editor: ## Session overview -In this workshop we will discuss why reproducibility matters and how to -organise your work to make it reproducible. We will cover: +In this workshop -# Reproducibility -## What is reproducibility? - -- **Reproducible: Same data + same analysis = identical results**. - *"... obtaining consistent results using the same input data; - computational steps, methods, and code; and conditions of analysis.
- This definition is synonymous with"computational reproducibility"* - [@nationalacademiesofsciences2019] - -- Replicable: Different data + same analysis = qualitatively similar - results. The work is not dependent on the specificities of the data. - -- Robust: Same data + different analysis = qualitatively similar or - identical results. The work is not dependent on the specificities of - the analysis. - -- Generalisable: Different data + different analysis = qualitatively - similar results and same conclusions. The findings can be - generalised - -[![The Turing Way\'s definitions of reproducible research -](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions) - -## Why does it matter? - -![futureself, CC-BY-NC, by Julen -Colomb](images/future_you.png){fig-alt="Person working at a computer with an offstage person asking 'How is the analysis going?' The person at the computer replies 'Can't understand the date...and the data collector does not answer my emails or calls' Person offstage: 'That's terrible! So cruel! Who did collect the data? I will sack them!' Person at the computer: 'um...I did, 3 years ago.'" -width="400"} - -- Five selfish reasons to work reproducibly [@markowetz2015]. - Alternatively, see the very entertaining - [talk](https://youtu.be/yVT07Sukv9Q) - -- Many high profile cases of work which did not reproduce e.g. 
Anil - Potti unravelled by @baggerly2009 - -- **Will** become standard in Science and publishing e.g OECD Global - Science Forum Building digital workforce capacity and skills for - data-intensive science [@oecdglobalscienceforum2020] - -## How to achieve reproducibility - -- Scripting - -- Organisation: Project-oriented workflows with file and folder - structure, naming things - -- Documentation: Readme files, code comments, metadata, version - control - -# Scripting - -## Rationale for scripting? - -- Science is the generation of ideas, designing work to test them and - reporting the results. - -- We ensure laboratory and field work is replicable, robust and - generalisable by planning and recording in lab books and using - standard protocols. Repeating results is still hard. - -- Workflows for computational projects, and the data analysis and - reporting of other work can, and should, be 100% reproducible! - -- Scripting is the way to achieve this. - -# Organisation - -## Project-oriented workflow - -- use folders to organise your work - -- you are aiming for structured, systematic and repeatable. - -- inputs and outputs should be clearly identifiable from structure - and/or naming - -Examples - -``` --- liver_transcriptome/ - |__data - |__raw/ - |__processed/ - |__images/ - |__code/ - |__reports/ - |__figures/ -``` - -## Naming things - -![documents, CC-BY-NC, -https://xkcd.com/1459/](images/xkcd-comic-file-names.png){fig-alt="A comic figure is looking over the shoulder of another and is shocked by a list of files with names like 'Untitled 138 copy.docx' and 'Untitled 243.doc'. Caption: 'Protip: Never look in someone else's documents folder'"} - -Guiding principle - Have a convention! 
Good file names are: - -- machine readable - -- human readable - -- play nicely with sorting - -I suggest - -- no spaces in names - -- use snake_case or kebab-case rather than CamelCase or dot.case - -- use all lower case except very occasionally where convention is - otherwise, e.g., README, LICENSE - -- ordering: use left-padded numbers e.g., 01, 02....99 or 001, - 002....999 - -- dates [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format: - 2020-10-16 - -- write down your conventions - -``` --- liver_transcriptome/ - |__data - |__raw/ - |__2022-03-21_donor_1.csv - |__2022-03-21_donor_2.csv - |__2022-03-21_donor_3.csv - |__2022-05-14_donor_1.csv - |__2022-05-14_donor_2.csv - |__2022-05-14_donor_3.csv - |__processed/ - |__images/ - |__code/ - |__functions/ - |__summarise.R - |__normalise.R - |__theme_volcano.R - |__01_data_processing.py - |__02_exploratory.R - |__03_modelling.R - |__04_figures.R - |__reports/ - |__01_report.qmd - |__02_supplementary.qmd - |__figures/ - |__01_volcano_donor_1_vs_donor_2.eps - |__02_volcano_donor_1_vs_donor_3.eps -``` # Documentation @@ -211,35 +62,7 @@ Python: - Ideally, a summary of changes with the date -``` --- liver_transcriptome/ - |__data - |__raw/ - |__2022-03-21_donor_1.csv - |__2022-03-21_donor_2.csv - |__2022-03-21_donor_3.csv - |__2022-05-14_donor_1.csv - |__2022-05-14_donor_2.csv - |__2022-05-14_donor_3.csv - |__processed/ - |__images/ - |__code/ - |__functions/ - |__summarise.R - |__normalise.R - |__theme_volcano.R - |__01_data_processing.py - |__02_exploratory.R - |__03_modelling.R - |__04_figures.R - |__README.md - |__reports/ - |__01_report.qmd - |__02_supplementary.qmd - |__figures/ - |__01_volcano_donor_1_vs_donor_2.eps - |__02_volcano_donor_1_vs_donor_3.eps -``` + ## Code comments @@ -248,49 +71,7 @@ Python: explain what the code is doing and why. They are also used to temporarily remove code from execution. -# Github co-pilot demo - -# Quarto demo - -# Useful exercises - -- Want github co-pilot? 
- - 🎬 Create a [GitHub account](https://github.com/) - - 🎬 Apply for [student - benefits](https://education.github.com/discount_requests/application) - -- Update R and RStudio - - 🎬 [Update R]() - - 🎬 [Update RStudio](https://posit.co/download/rstudio-desktop/). You - will need the prelease [Dessert - Sunflower](https://dailies.rstudio.com/rstudio/desert-sunflower/) - for github Copilot integration - -- Install package building tools - - 🎬 Windows Install - [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html) - - 🎬 Mac install [Xcode from Mac App - Store](https://apps.apple.com/ca/app/xcode/id497799835?mt=12) - -- Update packages: - - 🎬 devtools, tidyverse, BiocManager, readxl - -- Install Quarto - - 🎬 [Install Quarto](https://quarto.org) - -- Install Zotero - - 🎬 Install [Zotero](https://www.zotero.org/) - 🎬 [Sign up for an account](https://www.zotero.org/user/register) You're finished! diff --git a/update-notes.txt b/update-notes.txt index 20a181a..2fb6d29 100644 --- a/update-notes.txt +++ b/update-notes.txt @@ -86,7 +86,7 @@ Curate your and reorganise your code restart R to try. exchange projects with a friend. Do they understand? Readme -- how to make +- how to make: create a new text file in the top level of your project - what goes in - software including versions - session info