Merge pull request #74 from 3mmaRand/feature/core-w6-2024

Feature/core w6 2024
3mmaRand · Nov 4, 2024 · c55073d
2 parents dad40ee + 8cb9cba
commit c55073d

Showing 10 changed files with 51 additions and 260 deletions.
diff --git a/.renvignore b/.renvignore
@@ -0,0 +1,10 @@
+.Rproj.user
+.Rhistory
+.Rdata
+.httr-oauth
+.DS_Store
+
+/.quarto/
+
+transcriptomics/week-3/data-raw/
+
diff --git a/core/week-2/study_before_workshop.qmd b/core/week-2/study_before_workshop.qmd
@@ -9,7 +9,7 @@ toc-location: right
 
 2.  📖 Read [Workflow in RStudio](https://3mmarand.github.io/comp4biosci/workflow_rstudio.html). You may find it helpful to remind yourself about RStudio Projects. In previous years, you have submitted an "RStudio Project" as part of your BABS work. In this module, you will submit "Supporting Information" for your Project Report. The Supporting Information is a documented and organised collection of all the digital parts of your research project. This includes data (or instructions for accessing data), code and/or non-coded processing, instructions for use, computational requirements and outputs. The Supporting Information could be a single RStudio Project (like you have done previously but with better documentation) or a folder that includes an RStudio Project and other material/scripts.
 
-3.💻 Set up the Virtual Desktop. I very strongly recommend working on 
+3.  Set up the Virtual Desktop. I very strongly recommend working on 
 the University computers for this work. You will be using more specialised R 
 packages than you might be used to. This is especially important if you often 
 have difficulty updating and/or installing software on your own machine,

diff --git a/core/week-6/Y12345678.zip b/core/week-6/Y12345678.zip
diff --git a/core/week-6/images/mentimeter_qr_code.png b/core/week-6/images/mentimeter_qr_code.png
diff --git a/core/week-6/overview.qmd b/core/week-6/overview.qmd
@@ -5,25 +5,34 @@ toc: true
 toc-location: right
 ---
 
-This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data.
+We considered how to organise reproducible data analyses in 
+[Core: Supporting Information 1](../week-2/overview.qmd). This week we will 
+consider how to document and curate reproducible data analyses. You will 
+add a README to your project and discover all the software you are using in R. 
+The workshop will also include a question and answer session.
 
 
 ### Learning objectives
 
 The successful student will be able to:
 
--   explain the organisation of files and directories in a file systems including root, home and working directories
--   explain absolute and relative file paths
--   explain why working reproducibly is important
--   know how to use a project-oriented workflow to organise work
--   be able to give files human- and machine-readable names
--   outline some common biological data file formats
+-   Describe the purpose of a README file
+
+-   List the key components of a README file
+
+-   Use `sessioninfo` to document the software used in an R project
+
+-   Write a README file for a project
+
+
+
 
 ### Instructions
 
 1.  [Prepare](study_before_workshop.qmd)
 
-    i.  📖 Read Understanding file systems
+    i.  Revise [Core: Supporting Information 1](../week-2/overview.qmd)
+        and make a note of queries you have
 
 2.  [Workshop](workshop.qmd)
 

diff --git a/core/week-6/study_after_workshop.qmd b/core/week-6/study_after_workshop.qmd
@@ -1,6 +1,6 @@
 ---
 title: "Independent Study to consolidate this week"
-subtitle: "Core 1"
+subtitle: "Core: Supporting Information 2"
 toc: true
 toc-location: right
 format:
@@ -9,12 +9,4 @@ format:
     code-summary: "Answer - don't look until you have tried!"
 ---
 
-These are suggestions
 
-## BIO00088H Group Research Project students
-
-1.  Revise previous Data Analysis materials. You can find the version you took on the VLE site for 17C / 08C. However, my latest versions (in development) are here: [Data Analysis in R](https://3mmarand.github.io/R4BABS/). The Becoming a Bioscientist (BABS) modules replace the Laboratory and Professional Skills modules. BABS1 and BABS2 are stage one, and I've tried to improve them over 17C / 08C. The site is also searchable (icon top right)
-
-## MSc Bioinformatics students doing BIO00070M
-
-1.  Make sure you carry out the [preparatory work for week 2 of 52M](https://3mmarand.github.io/R4BABS/pgt52m/week-2/overview.html)
diff --git a/core/week-6/study_before_workshop.qmd b/core/week-6/study_before_workshop.qmd
@@ -1,10 +1,25 @@
 ---
 title: "Independent Study to prepare for workshop"
-subtitle: "Core 1"
+subtitle: "Core: Supporting Information 2"
 toc: true
 toc-location: right
 ---
 
-1.  📖 Read  [Understanding file systems](https://3mmarand.github.io/comp4biosci/file_systems.html). This is an approximately 15 - 20 minute read revising file types and filesystems. It covers concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R.
+1.  Revise [Core: Supporting Information 1 Organising Reproducible Data
+    Analyses](../week-2/overview.qmd).
 
-2.  In previous years you have submitted and RStudio Project as part of your BABS work. In this module you will develop this by submitting a Research Compendium. A Research Compendium is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, (like you have done previously but with better documentation) or it might be a folder including an Quarto/RStudio Project and other material/scripts including the description of unscripted processing. You might want to remind yourself of the example RStudio Project, [Y12345678.zip](Y12345678.zip) used in BABS 2.
+    i.  Do you know your Supporting Information will most likely be a
+        structured folder which *is* either an RStudio Project or
+        contains an RStudio Project?
+    ii. Are you following best practices for code formatting and style?
+        If not, go through your scripts and edit.
+    iii. Do you have numbers hard coded where they could be variables?
+    iv. Are you using a sensible naming convention for files and
+        variables? Have you written it down?
+    v.  Make a note of queries you have. Take some time to formulate and
+        write down your questions. The more specific and clear your
+        question is, the better answer I will be able to provide.
+    vi. Post your questions here:
+        [Menti](https://www.menti.com/m86rqcbb88) Code: 3306 3222. QR:
+        ![mentimeter qr
+        code](images/mentimeter_qr_code.png){width="400"}
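
Two of the checks in the list above, hard-coded numbers and a written-down naming convention, can be illustrated with a minimal Python sketch. The threshold value, the function names and the file name pattern are hypothetical examples chosen for illustration; they are not taken from the course materials.

```python
import re

# A named constant instead of a number buried in the analysis code.
# The value 0.05 is an illustrative assumption.
SIGNIFICANCE_THRESHOLD = 0.05


def significant(p_values, threshold=SIGNIFICANCE_THRESHOLD):
    """Return the p-values at or below the threshold."""
    return [p for p in p_values if p <= threshold]


# One possible convention, written down as a pattern:
# ISO 8601 date, then lower-case snake_case, then an extension.
FILE_NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9]+(_[a-z0-9]+)*\.[a-z0-9]+$"
)


def follows_convention(file_name):
    """Check a file name against the written-down convention."""
    return FILE_NAME_PATTERN.fullmatch(file_name) is not None
```

Changing the threshold now means editing one line, and the naming convention is recorded in a form that can be checked automatically as well as read.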
diff --git a/core/week-6/styles.css b/core/week-6/styles.css
diff --git a/core/week-6/workshop.qmd b/core/week-6/workshop.qmd
@@ -1,6 +1,6 @@
 ---
 title: "Workshop"
-subtitle: "Organising Reproducible Data Analyses"
+subtitle: "Supporting Information 2 Documenting and curating Reproducible Data Analyses"
 author: "Emma Rand"
 toc: true
 toc-depth: 4
@@ -19,158 +19,9 @@ editor:
 
 ## Session overview
 
-In this workshop we will discuss why reproducibility matters and how to
-organise your work to make it reproducible. We will cover:
+In this workshop 
 
-# Reproducibility
 
-## What is reproducibility?
-
--   **Reproducible: Same data + same analysis = identical results**.
-    *"... obtaining consistent results using the same input data;
-    computational steps, methods, and code; and conditions of analysis.
-    This definition is synonymous with"computational reproducibility"*
-    [@nationalacademiesofsciences2019]
-
--   Replicable: Different data + same analysis = qualitatively similar
-    results. The work is not dependent on the specificities of the data.
-
--   Robust: Same data + different analysis = qualitatively similar or
-    identical results. The work is not dependent on the specificities of
-    the analysis.
-
--   Generalisable: Different data + different analysis = qualitatively
-    similar results and same conclusions. The findings can be
-    generalised
-
-[![The Turing Way\'s definitions of reproducible research
-](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)
-
-## Why does it matter?
-
-![futureself, CC-BY-NC, by Julen
-Colomb](images/future_you.png){fig-alt="Person working at a computer with an offstage person asking 'How is the analysis going?' The person at the computer replies 'Can't understand the date...and the data collector does not answer my emails or calls' Person offstage: 'That's terrible! So cruel! Who did collect the data? I will sack them!' Person at the computer: 'um...I did, 3 years ago.'"
-width="400"}
-
--   Five selfish reasons to work reproducibly [@markowetz2015].
-    Alternatively, see the very entertaining
-    [talk](https://youtu.be/yVT07Sukv9Q)
-
--   Many high profile cases of work which did not reproduce e.g. Anil
-    Potti unravelled by @baggerly2009
-
--   **Will** become standard in Science and publishing e.g OECD Global
-    Science Forum Building digital workforce capacity and skills for
-    data-intensive science [@oecdglobalscienceforum2020]
-
-## How to achieve reproducibility
-
--   Scripting
-
--   Organisation: Project-oriented workflows with file and folder
-    structure, naming things
-
--   Documentation: Readme files, code comments, metadata, version
-    control
-
-# Scripting
-
-## Rationale for scripting?
-
--   Science is the generation of ideas, designing work to test them and
-    reporting the results.
-
--   We ensure laboratory and field work is replicable, robust and
-    generalisable by planning and recording in lab books and using
-    standard protocols. Repeating results is still hard.
-
--   Workflows for computational projects, and the data analysis and
-    reporting of other work can, and should, be 100% reproducible!
-
--   Scripting is the way to achieve this.
-
-# Organisation
-
-## Project-oriented workflow
-
--   use folders to organise your work
-
--   you are aiming for structured, systematic and repeatable.
-
--   inputs and outputs should be clearly identifiable from structure
-    and/or naming
-
-Examples
-
-```         
--- liver_transcriptome/
-   |__data
-      |__raw/
-      |__processed/
-   |__images/
-   |__code/
-   |__reports/
-   |__figures/
-```
-
-## Naming things
-
-![documents, CC-BY-NC,
-https://xkcd.com/1459/](images/xkcd-comic-file-names.png){fig-alt="A comic figure is looking over the shoulder of another and is shocked by a list of files with names like 'Untitled 138 copy.docx' and 'Untitled 243.doc'. Caption: 'Protip: Never look in someone else's documents folder'"}
-
-Guiding principle - Have a convention! Good file names are:
-
--   machine readable
-
--   human readable
-
--   play nicely with sorting
-
-I suggest
-
--   no spaces in names
-
--   use snake_case or kebab-case rather than CamelCase or dot.case
-
--   use all lower case except very occasionally where convention is
-    otherwise, e.g., README, LICENSE
-
--   ordering: use left-padded numbers e.g., 01, 02....99 or 001,
-    002....999
-
--   dates [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format:
-    2020-10-16
-
--   write down your conventions
-
-```         
--- liver_transcriptome/
-   |__data
-      |__raw/
-         |__2022-03-21_donor_1.csv
-         |__2022-03-21_donor_2.csv
-         |__2022-03-21_donor_3.csv
-         |__2022-05-14_donor_1.csv
-         |__2022-05-14_donor_2.csv
-         |__2022-05-14_donor_3.csv
-      |__processed/
-   |__images/
-   |__code/
-      |__functions/
-         |__summarise.R
-         |__normalise.R
-         |__theme_volcano.R
-      |__01_data_processing.py
-      |__02_exploratory.R
-      |__03_modelling.R
-      |__04_figures.R
-   |__reports/
-      |__01_report.qmd
-      |__02_supplementary.qmd
-   |__figures/
-      |__01_volcano_donor_1_vs_donor_2.eps
-      |__02_volcano_donor_1_vs_donor_3.eps
-```
 
 # Documentation
 
@@ -211,35 +62,7 @@ Python:
 
 -   Ideally, a summary of changes with the date
 
-```         
--- liver_transcriptome/
-   |__data
-      |__raw/
-         |__2022-03-21_donor_1.csv
-         |__2022-03-21_donor_2.csv
-         |__2022-03-21_donor_3.csv
-         |__2022-05-14_donor_1.csv
-         |__2022-05-14_donor_2.csv
-         |__2022-05-14_donor_3.csv
-      |__processed/
-   |__images/
-   |__code/
-      |__functions/
-         |__summarise.R
-         |__normalise.R
-         |__theme_volcano.R
-      |__01_data_processing.py
-      |__02_exploratory.R
-      |__03_modelling.R
-      |__04_figures.R
-   |__README.md
-   |__reports/
-      |__01_report.qmd
-      |__02_supplementary.qmd
-   |__figures/
-      |__01_volcano_donor_1_vs_donor_2.eps
-      |__02_volcano_donor_1_vs_donor_3.eps
-```
+
 
 ## Code comments
 
@@ -248,49 +71,7 @@ Python:
     explain what the code is doing and why. They are also used to
     temporarily remove code from execution.
 
-# Github co-pilot demo
-
-# Quarto demo
-
-# Useful exercises
-
--   Want github co-pilot?
-
-    🎬 Create a [GitHub account](https://github.com/)
-
-    🎬 Apply for [student
-    benefits](https://education.github.com/discount_requests/application)
-
--   Update R and RStudio
-
-    🎬 [Update R]()
-
-    🎬 [Update RStudio](https://posit.co/download/rstudio-desktop/). You
-    will need the prelease [Dessert
-    Sunflower](https://dailies.rstudio.com/rstudio/desert-sunflower/)
-    for github Copilot integration
-
--   Install package building tools
-
-    🎬 Windows Install
-    [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html)
-
-    🎬 Mac install [Xcode from Mac App
-    Store](https://apps.apple.com/ca/app/xcode/id497799835?mt=12)
-
--   Update packages:
-
-    🎬 devtools, tidyverse, BiocManager, readxl
-
--   Install Quarto
-
-    🎬 [Install Quarto](https://quarto.org)
-
--   Install Zotero
-
-    🎬 Install [Zotero](https://www.zotero.org/)
 
-    🎬 [Sign up for an account](https://www.zotero.org/user/register)
 
 You're finished!
 

diff --git a/update-notes.txt b/update-notes.txt
@@ -86,7 +86,7 @@ Curate your and reorganise your code
 restart R to try. exchange projects with a friend. Do they understand?
 
 Readme
-- how to make
+- how to make: create a new text file in the top level of your project
 - what goes in
 - software including versions
 - session info
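
The README notes above (how to make, what goes in, software including versions) can be sketched in code. The workshop itself uses the R package `sessioninfo` for this; what follows is only a Python analogue under stated assumptions: the helper names `software_versions()`, `build_readme()` and `write_readme()` and the section headings are hypothetical and do not come from the course materials. It assembles a plain-text README for the top level of a project and records the interpreter version plus the versions of named packages.

```python
import platform
from importlib import metadata
from pathlib import Path


def software_versions(packages):
    """Return lines recording the Python version and each package's version."""
    lines = [f"Python {platform.python_version()}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the README stays honest.
            lines.append(f"{name} (not installed)")
    return lines


def build_readme(title, description, packages):
    """Assemble README text; the section headings are illustrative only."""
    parts = [
        f"# {title}",
        "",
        "## Description",
        description,
        "",
        "## Software",
        *software_versions(packages),
        "",
    ]
    return "\n".join(parts)


def write_readme(project_dir, text):
    """Write README.md in the top level of the project folder."""
    path = Path(project_dir) / "README.md"
    path.write_text(text, encoding="utf-8")
    return path
```

Generating the software section from the running environment, rather than typing versions by hand, keeps the README in step with what was actually used.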