diff --git a/_quarto.yml b/_quarto.yml index 91ecd08..f2779e4 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -44,7 +44,7 @@ website: - href: core/core.qmd - text: --- - text: --- - - section: "Week 1: xxx" + - section: "Week 1: Organising analyses" contents: - href: core/week-1/overview.qmd text: About @@ -55,7 +55,7 @@ website: - href: core/week-1/study_after_workshop.qmd text: Consolidate! - text: --- - - section: "Week 2: xxx" + - section: "Week 2: Workflow tips" contents: - href: core/week-2/overview.qmd text: About diff --git a/core/core.qmd b/core/core.qmd index bcadc25..ad995fa 100644 --- a/core/core.qmd +++ b/core/core.qmd @@ -1,71 +1,53 @@ --- -title: "Core Data Analysis for Group Project" +title: "Core Data Analysis" toc: true toc-location: right --- # Content +There are three workshops taken by everyone on BIO00088H and BIO00070M. These are in weeks 1, 2 and 6. The first two cover some useful workflow tips and how to organise your analyses effectively so they are reproducible but you will also have the chance to revise material from stage 1 and 2. +Good organisation is important because you will want to be able to set work aside for holidays and assessment periods and then restart easily. You will also be assessed on the organisation, reproducibility and transparency of your work. -## Week 1 Core 1 Organising Reproducible Data Analyses +## Week 1 Core 1 Organising reproducible data analyses -Note no R coding (too early for MSc BIN 70M who share the core and omics teaching) +This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data. 
-Before -- Optional revision: What they forgot to teach you about computers: operating systems, file systems, file types, working directories and paths -- Why reproducibility (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown) +## Week 2 Core 2 Workflow tips -Workshop + + -- Project organisation: folders, files -- Project oriented workflow -- Naming things -- File formats -- Data management - - google drive: https://www.york.ac.uk/it-services/services/drive/#tab-6 - - documenting - - organisation within files -- Data files. Similarities and differences - - Sequences data - - Image data - - Structure data. -- Keeping a lab book -- Readme -- Reference managers: Zotero? + + + + + -After -## Week 2 Core 2 NEEDS A TITLE + -Before -Possibly: -Code formatting and style, 😎 Cool code Tips, Code 'algorithmically.', Writing functions (R and python??) (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown) + - -Workshop - - - -After + ## Week 6 Core 3 Reproducible Reporting -Before - (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown and BIO00058M-Data-science-2020/slides/04_advanced_rmarkdown.html) -Literate programming -What is quarto -markdown basics: text, code chunks, headings -yaml -automatic numbering of figures and tables -cross references -special characters -citations + + + + + + + + + + + -Workshop + -practice doing the above with your project and data -After diff --git a/core/week-1/images/reproducibility_court.png b/core/week-1/images/reproducibility_court.png new file mode 100644 index 0000000..a0dc984 Binary files /dev/null and b/core/week-1/images/reproducibility_court.png differ diff --git a/core/week-1/images/reproducible-matrix.jpg b/core/week-1/images/reproducible-matrix.jpg new file mode 100644 index 0000000..01287fc Binary files /dev/null and b/core/week-1/images/reproducible-matrix.jpg differ diff --git a/core/week-1/overview.qmd b/core/week-1/overview.qmd index f805429..60d2297 100644 --- 
a/core/week-1/overview.qmd +++ b/core/week-1/overview.qmd @@ -1,33 +1,32 @@ --- title: "Overview" subtitle: "Core 1: Organising Data Analyses" +toc: true +toc-location: right --- -xxxxx +This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data. + ### Learning objectives -- dd -- dd. -- dd -- d +The successful student will be able to: + +- explain the organisation of files and directories in a file system, including root, home and working directories +- explain absolute and relative file paths +- explain why working reproducibly is important +- use a project-oriented workflow to organise work +- give files human- and machine-readable names +- outline some common biological data file formats ### Instructions 1. [Prepare](study_before_workshop.qmd) - i. 📖 Read [What they forgot to teach you about computers](https://3mmarand.github.io/comp4biosci/what_they_forgot.html) + i. 📖 Read Understanding file systems 2. [Workshop](workshop.qmd) - i. 💻 dd. - - ii. 💻 ddd - - iii. 💻 ddd 3. [Consolidate](study_after_workshop.qmd) - i. 💻 dd - - ii. 💻 dd diff --git a/core/week-1/study_after_workshop.qmd b/core/week-1/study_after_workshop.qmd index 2591f8c..1c5c95b 100644 --- a/core/week-1/study_after_workshop.qmd +++ b/core/week-1/study_after_workshop.qmd @@ -1,25 +1,22 @@ --- title: "Independent Study to consolidate this week" subtitle: "Core 1" +toc: true +toc-location: right format: html: code-fold: true code-summary: "Answer - don't look until you have tried!" --- -# Set up +## BIO00088H Group Research Project students -If you have just opened RStudio you will want to load the packages and import the data. +1. 
Start to build the file and folder infrastructure for your project + - + - + - -```{r} -#| code-fold: false -library(tidyverse) -library(readxl) -``` -1. 💻 xx. +## MSc Bioinformatics students doing BIO00070M -```{r} - - -``` +1. \ No newline at end of file diff --git a/core/week-1/study_before_workshop.qmd b/core/week-1/study_before_workshop.qmd index 952b776..4058ea9 100644 --- a/core/week-1/study_before_workshop.qmd +++ b/core/week-1/study_before_workshop.qmd @@ -1,6 +1,8 @@ --- title: "Independent Study to prepare for workshop" subtitle: "Core 1" +toc: true +toc-location: right --- -1. 📖 Read xxxx +1. 📖 Read [Understanding file systems](https://3mmarand.github.io/comp4biosci/file_systems.html). This is an approximately 15 - 20 minute read revising file types and filesystems. It covers concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R. diff --git a/core/week-1/workshop.qmd b/core/week-1/workshop.qmd index e85a3cd..aebd839 100644 --- a/core/week-1/workshop.qmd +++ b/core/week-1/workshop.qmd @@ -1,5 +1,5 @@ --- -title: "Workshop" +title: "Workshop - 🚧 still in Construction" subtitle: "Organising Reproducible Data Analyses" author: "Emma Rand" toc: true @@ -22,47 +22,65 @@ library(tidyverse) # Introduction - ## Session overview -In this workshop you will +In this workshop you will -::: callout-note -## Key -::: +# What is reproducibility? -# Getting started +- **Reproducible: Same data + same analysis = identical results**. *"... obtaining consistent results using the same input +data; computational steps, methods, and code; and conditions of +analysis. 
This definition is synonymous with "computational +reproducibility"* [@nationalacademiesofsciences2019] +- Replicable: Different data + same analysis = qualitatively similar + results. The work is not dependent on the specificities of the data. -# Exercises +- Robust: Same data + different analysis = qualitatively similar or + identical results. The work is not dependent on the specificities of + the analysis. +- Generalisable: Different data + different analysis = qualitatively + similar results and same conclusions. The findings can be + generalised -🎬 -- Project organisation: folders, files -- Project oriented workflow -- Naming things -- File formats -- Data management - - google drive: https://www.york.ac.uk/it-services/services/drive/#tab-6 - - documenting - - organisation within files -- Data files. Similarities and differences - - Sequences data - - Image data - - Structure data. -- Keeping notes -- Readme -- Reference managers: Zotero +[![The Turing Way\'s definitions of reproducible research +[@community2022] +](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions) +# Why reproducibility? -You're finished! +(BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown) - -# 🥳 Well Done! 🎉 +# Project organisation + +## Project oriented workflow + +- Readme +- folders, files + +## Naming things + + +## File formats + +Data files. +- Sequences data +- Image data +- Structure data + +Similarities and differences + +## Tidy data reminder + +You're finished! + +# 🥳 Well Done! 🎉 # Independent study following the workshop @@ -70,10 +88,18 @@ You're finished! # The Code file -These contain all the code needed in the workshop even where it is not visible on the webpage. 
+These contain all the code needed in the workshop even where it is not +visible on the webpage. -The [workshop.qmd](workshop.qmd) file is the file I use to compile the practical. Qmd stands for Quarto markdown. It allows code and ordinary text to be interleaved to produce well-formatted reports including webpages. Right-click on the link and choose Save-As to download. You will be able to open the Qmd file in RStudio. Alternatively, [View in Browser](https://github.com/3mmaRand/). Coding and thinking answers are marked with `#---CODING ANSWER---` and `#---THINKING ANSWER---` +The [workshop.qmd](workshop.qmd) file is the file I use to compile the +practical. Qmd stands for Quarto markdown. It allows code and ordinary +text to be interleaved to produce well-formatted reports including +webpages. Right-click on the link and choose Save-As to download. You +will be able to open the Qmd file in RStudio. Alternatively, [View in +Browser](https://github.com/3mmaRand/). Coding and thinking answers are +marked with `#---CODING ANSWER---` and `#---THINKING ANSWER---` -Pages made with R [@R-core], Quarto [@allaire2022], `knitr` [@knitr], `kableExtra` [@kableExtra] +Pages made with R [@R-core], Quarto [@allaire2022], `knitr` [@knitr], +`kableExtra` [@kableExtra] # References diff --git a/index.qmd b/index.qmd index 7f2f1ec..ddcb835 100644 --- a/index.qmd +++ b/index.qmd @@ -6,9 +6,12 @@ toc-location: right # Overview -You are either an integrated masters student doing 88H Group Project or a MSc Bioinformatics student doing 70M +You are *either* -Data Analysis for the Group Research Project (BIO00088H) compromises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type. 
+- an integrated masters student doing BIO00088H Group Research Project *or* +- an MSc Bioinformatics student doing BIO00070M Research, Professional and Team Skills + +For students doing BIO00088H, Data Analysis comprises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type. MSc Bioinformatics students do the Core workshops and the 'omics workshops as part of BIO00070M. The project types are: @@ -32,7 +35,7 @@ The data analysis workshops are: | 5 | omics/structure/images 3 | | 6 | Core 3 | -MSc Bioinformatics students do the Core workshops and the 'omics workshops as part of 70M + ## Module Learning Outcome linked to this content diff --git a/references.bib b/references.bib index 1b731df..106b37d 100644 --- a/references.bib +++ b/references.bib @@ -158,3 +158,24 @@ @article{nestorowa2016 file = {C\:\\Users\\er13\\Zotero\\storage\\UYBU6MJ7\\Nestorowa et al. - 2016 - A single-cell resolution map of mouse hematopoieti.pdf;C\:\\Users\\er13\\Zotero\\storage\\798ZR55F\\A-single-cell-resolution-map-of-mouse.html} } + +@book{nationalacademiesofsciences2019, + title = {Understanding Reproducibility and Replicability}, + author = {{National Academies of Sciences, Engineering, and Medicine}}, + year = {2019}, + month = {05}, + date = {2019-05-07}, + publisher = {National Academies 
Press (US)}, + url = {https://www.ncbi.nlm.nih.gov/books/NBK547546/} +} + +@book{community2022, + title = {The Turing Way: A handbook for reproducible, ethical and collaborative research}, + author = {Community, The Turing Way}, + year = {2022}, + month = {07}, + date = {2022-07-27}, + publisher = {Zenodo}, + doi = {10.5281/ZENODO.3233853}, + url = {https://zenodo.org/record/3233853} +}