Merge pull request #1 from 3mmaRand/draft-core-1
Draft core 1
3mmaRand committed Sep 24, 2023
2 parents 24b77ab + 5e06b0a commit 8d29a21
Showing 10 changed files with 137 additions and 107 deletions.
4 changes: 2 additions & 2 deletions _quarto.yml
Original file line number Diff line number Diff line change
@@ -44,7 +44,7 @@ website:
- href: core/core.qmd
- text: ---
- text: ---
- section: "Week 1: xxx"
- section: "Week 1: Organising analyses"
contents:
- href: core/week-1/overview.qmd
text: About
@@ -55,7 +55,7 @@ website:
- href: core/week-1/study_after_workshop.qmd
text: Consolidate!
- text: ---
- section: "Week 2: xxx"
- section: "Week 2: Workflow tips"
contents:
- href: core/week-2/overview.qmd
text: About
74 changes: 28 additions & 46 deletions core/core.qmd
@@ -1,71 +1,53 @@
---
title: "Core Data Analysis for Group Project"
title: "Core Data Analysis"
toc: true
toc-location: right
---

# Content
There are three workshops taken by everyone on BIO00088H and BIO00070M, in weeks 1, 2 and 6. The first two cover how to organise your analyses effectively so that they are reproducible, along with some useful workflow tips. You will also have the chance to revise material from stages 1 and 2.

Good organisation is important because you will want to be able to set work aside for holidays and assessment periods and then restart easily. You will also be assessed on the organisation, reproducibility and transparency of your work.

## Week 1 Core 1 Organising Reproducible Data Analyses
## Week 1 Core 1 Organising reproducible data analyses

Note no R coding (too early for MSc BIN 70M who share the core and omics teaching)
This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project-oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data.

Before
- Optional revision: What they forgot to teach you about computers: operating systems, file systems, file types, working directories and paths

- Why reproducibility (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown)
## Week 2 Core 2 Workflow tips

Workshop
<!-- Before -->
<!-- Apply to GitHub Global Campus as a student -->

- Project organisation: folders, files
- Project oriented workflow
- Naming things
- File formats
- Data management
- google drive: https://www.york.ac.uk/it-services/services/drive/#tab-6
- documenting
- organisation within files
- Data files. Similarities and differences
- Sequences data
- Image data
- Structure data.
- Keeping a lab book
- Readme
- Reference managers: Zotero?
<!-- - Reference managers: Zotero -->
<!-- - GitHub Copilot -->
<!-- - ChatGPT -->
<!-- - Data management -->
<!-- - google drive: https://www.york.ac.uk/it-services/services/drive/#tab-6 -->

After

## Week 2 Core 2 NEEDS A TITLE
<!-- Workshop -->

Before
Possibly:
Code formatting and style, 😎 Cool code Tips, Code 'algorithmically.', Writing functions (R and python??) (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown)
<!-- Code formatting and style, 😎 Cool code Tips, Code 'algorithmically.', Writing functions (R and python??) (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown) -->



Workshop



After
<!-- After -->

## Week 6 Core 3 Reproducible Reporting

Before
(BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown and BIO00058M-Data-science-2020/slides/04_advanced_rmarkdown.html)
Literate programming
What is quarto
markdown basics: text, code chunks, headings
yaml
automatic numbering of figures and tables
cross references
special characters
citations
<!-- Before -->
<!-- (BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown and BIO00058M-Data-science-2020/slides/04_advanced_rmarkdown.html) -->
<!-- Literate programming -->
<!-- What is quarto -->
<!-- markdown basics: text, code chunks, headings -->
<!-- yaml -->
<!-- automatic numbering of figures and tables -->
<!-- cross references -->
<!-- special characters -->
<!-- citations -->

<!-- Workshop -->

Workshop
<!-- practice doing the above with your project and data -->

practice doing the above with your project and data

After
Binary file added core/week-1/images/reproducibility_court.png
Binary file added core/week-1/images/reproducible-matrix.jpg
27 changes: 13 additions & 14 deletions core/week-1/overview.qmd
@@ -1,33 +1,32 @@
---
title: "Overview"
subtitle: "Core 1: Organising Data Analyses"
toc: true
toc-location: right
---

xxxxx
This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project-oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data.


### Learning objectives

- dd
- dd.
- dd
- d
The successful student will be able to:

- explain the organisation of files and directories in a file system including root, home and working directories
- explain absolute and relative file paths
- explain why working reproducibly is important
- use a project-oriented workflow to organise work
- give files human- and machine-readable names
- outline some common biological data file formats

### Instructions

1. [Prepare](study_before_workshop.qmd)

i. 📖 Read [What they forgot to teach you about computers](https://3mmarand.github.io/comp4biosci/what_they_forgot.html)
i. 📖 Read Understanding file systems

2. [Workshop](workshop.qmd)

i. 💻 dd.

ii. 💻 ddd

iii. 💻 ddd

3. [Consolidate](study_after_workshop.qmd)

i. 💻 dd

ii. 💻 dd
21 changes: 9 additions & 12 deletions core/week-1/study_after_workshop.qmd
@@ -1,25 +1,22 @@
---
title: "Independent Study to consolidate this week"
subtitle: "Core 1"
toc: true
toc-location: right
format:
html:
code-fold: true
code-summary: "Answer - don't look until you have tried!"
---

# Set up
## BIO00088H Group Research Project students

If you have just opened RStudio you will want to load the packages and import the data.
1. Start to build the file and folder infrastructure for your project
-
-
-
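
One way to set up this kind of infrastructure is sketched below. It is written in Python purely for illustration, and it is a sketch under stated assumptions rather than a required layout: the folder names (`data-raw`, `data-processed`, `scripts`, `figures`) and the project name `my-project` are hypothetical. Everything is built inside a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names: adapt these to your own project.
FOLDERS = ["data-raw", "data-processed", "scripts", "figures"]


def create_project(root):
    """Create the project folders and a README, then list what exists."""
    root = Path(root)
    for name in FOLDERS:
        # parents=True also creates the project root if it is missing;
        # exist_ok=True makes the function safe to run more than once.
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Project title\n\nWhat the project is and how it is organised.\n",
            encoding="utf-8",
        )
    return sorted(path.name for path in root.iterdir())


# Build the structure in a throwaway location for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp) / "my-project")
    print(created)
```

Keeping raw data, processed data, code and outputs in separate folders, with a README at the top level, makes it easier to set the work aside and restart it later.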

```{r}
#| code-fold: false
library(tidyverse)
library(readxl)
```

1. 💻 xx.
## MSc Bioinformatics students doing BIO00070M

```{r}
```
1.
4 changes: 3 additions & 1 deletion core/week-1/study_before_workshop.qmd
@@ -1,6 +1,8 @@
---
title: "Independent Study to prepare for workshop"
subtitle: "Core 1"
toc: true
toc-location: right
---

1. 📖 Read xxxx
1. 📖 Read [Understanding file systems](https://3mmarand.github.io/comp4biosci/file_systems.html). This is a 15 to 20 minute read that revises file types and file systems, and covers the concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them, but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R.
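
The ideas of working directories and of absolute and relative paths can be illustrated with a minimal sketch. It uses Python's standard `pathlib` module purely for illustration, and the same ideas apply in R. The folder `data-raw` and the file `sample.csv` are hypothetical names and do not need to exist for the code to run.

```python
from pathlib import Path

# The working directory is the location relative paths are resolved from.
working_dir = Path.cwd().resolve()

# The home directory belongs to the current user.
home_dir = Path.home()

# A relative path: it only makes sense relative to some starting directory.
# "data-raw" and "sample.csv" are hypothetical names.
relative = Path("data-raw") / "sample.csv"

# An absolute path starts at the root of the file system.
absolute = working_dir / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# The root (anchor) is the first part of any absolute path.
print(absolute.anchor)

# Removing the working directory from the absolute path recovers the
# relative path.
print(absolute.relative_to(working_dir) == relative)  # True
```

Relative paths are what make a project portable: the same code works on another computer as long as the working directory is the project folder.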
84 changes: 55 additions & 29 deletions core/week-1/workshop.qmd
@@ -1,5 +1,5 @@
---
title: "Workshop"
title: "Workshop - 🚧 still under construction"
subtitle: "Organising Reproducible Data Analyses"
author: "Emma Rand"
toc: true
@@ -22,58 +22,84 @@ library(tidyverse)

# Introduction


## Session overview

In this workshop you will
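In this workshop you will consider why working reproducibly matters, use a project-oriented workflow, name things well, document your work, and examine some file types and the concept of tidy data.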


::: callout-note
## Key


:::
# What is reproducibility?

# Getting started
- **Reproducible: Same data + same analysis = identical results**. *"... obtaining consistent results using the same input
data; computational steps, methods, and code; and conditions of
analysis. This definition is synonymous with "computational
reproducibility"* [@nationalacademiesofsciences2019]

- Replicable: Different data + same analysis = qualitatively similar
results. The work is not dependent on the specificities of the data.

# Exercises
- Robust: Same data + different analysis = qualitatively similar or
identical results. The work is not dependent on the specificities of
the analysis.

- Generalisable: Different data + different analysis = qualitatively
similar results and same conclusions. The findings can be
generalised.
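
The first definition, same data + same analysis = identical results, can be made concrete with a small sketch. It is written in Python for illustration only. The data values are made up, and `bootstrap_mean` is a hypothetical analysis step, not part of the workshop materials. The point it illustrates is that any randomness in an analysis belongs to the "conditions of analysis" and must be fixed, here with a seed, for the result to be reproducible.

```python
import random
import statistics


def bootstrap_mean(values, seed, n_resamples=200):
    """Average the means of resamples drawn with replacement."""
    # A dedicated, seeded generator fixes the random steps of the analysis.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))
        means.append(statistics.fmean(resample))
    return statistics.fmean(means)


# Made-up measurements, for illustration only.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]

# Same data, same code, same seed: the two runs agree exactly.
first = bootstrap_mean(data, seed=2023)
second = bootstrap_mean(data, seed=2023)
print(first == second)  # True
```

Without the fixed seed, repeated runs would generally differ slightly, so the seed needs to be recorded alongside the data and the code.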

🎬
- Project organisation: folders, files
- Project oriented workflow
- Naming things
- File formats
- Data management
- google drive: https://www.york.ac.uk/it-services/services/drive/#tab-6
- documenting
- organisation within files
- Data files. Similarities and differences
- Sequences data
- Image data
- Structure data.
- Keeping notes
- Readme
- Reference managers: Zotero
[![The Turing Way\'s definitions of reproducible research
[@community2022]
](images/reproducible-matrix.jpg){fig-alt="Two by Two cell matrix. Columns are Data, either same or different. Rows are Analysis either same or different. Each of cells contain one of the definitions for reproducibility"}](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions#rr-overview-definitions)

# Why reproducibility?

You're finished!
(BIO00058M-Data-science-2020/slides/03_repro_and_intro_to_rmarkdown) -

# 🥳 Well Done! 🎉

# Project organisation

## Project-oriented workflow

- Readme
- folders, files

## Naming things
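
As a minimal sketch of what "machine-readable" can mean in practice, the check below tests file names against one possible convention: an ISO 8601 date, then fields separated by underscores, with words inside a field joined by hyphens, and no spaces or other punctuation. The convention, the pattern and the example names are all assumptions for illustration, written in Python.

```python
import re

# Assumed convention: YYYY-MM-DD, then one or more "_field" parts where a
# field is lowercase words or numbers joined by hyphens, then one extension.
PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}(_[a-z0-9]+(-[a-z0-9]+)*)+\.[a-z0-9]+"
)


def is_machine_readable(filename):
    """Return True if the whole file name follows the assumed convention."""
    return PATTERN.fullmatch(filename) is not None


# Hypothetical file names.
good = "2023-09-24_growth-curves_plate-01.csv"
bad = "final data (new) v2.csv"

print(is_machine_readable(good))  # True
print(is_machine_readable(bad))   # False
```

Names that follow a pattern like this sort chronologically, can be split into fields by code, and still tell a human reader what the file contains.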



## File formats

Data files.
- Sequences data
- Image data
- Structure data

Similarities and differences

## Tidy data reminder

You're finished!

# 🥳 Well Done! 🎉

# Independent study following the workshop

[Consolidate](study_after_workshop.qmd)

# The Code file

This contains all the code needed in the workshop, even where it is not visible on the webpage.
This contains all the code needed in the workshop, even where it is not
visible on the webpage.

The [workshop.qmd](workshop.qmd) file is the file I use to compile the practical. Qmd stands for Quarto markdown. It allows code and ordinary text to be interleaved to produce well-formatted reports including webpages. Right-click on the link and choose Save-As to download. You will be able to open the Qmd file in RStudio. Alternatively, [View in Browser](https://github.com/3mmaRand/). Coding and thinking answers are marked with `#---CODING ANSWER---` and `#---THINKING ANSWER---`
The [workshop.qmd](workshop.qmd) file is the file I use to compile the
practical. Qmd stands for Quarto markdown. It allows code and ordinary
text to be interleaved to produce well-formatted reports including
webpages. Right-click on the link and choose Save-As to download. You
will be able to open the Qmd file in RStudio. Alternatively, [View in
Browser](https://github.com/3mmaRand/). Coding and thinking answers are
marked with `#---CODING ANSWER---` and `#---THINKING ANSWER---`

Pages made with R [@R-core], Quarto [@allaire2022], `knitr` [@knitr], `kableExtra` [@kableExtra]
Pages made with R [@R-core], Quarto [@allaire2022], `knitr` [@knitr],
`kableExtra` [@kableExtra]

# References
9 changes: 6 additions & 3 deletions index.qmd
@@ -6,9 +6,12 @@ toc-location: right

# Overview

You are either an integrated masters student doing 88H Group Project or a MSc Bioinformatics student doing 70M
You are *either*

Data Analysis for the Group Research Project (BIO00088H) comprises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type.
- an integrated masters student doing BIO00088H Group Research Project *or*
- an MSc Bioinformatics student doing BIO00070M Research, Professional and Team Skills

For students doing BIO00088H, Data Analysis comprises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type. MSc Bioinformatics students do the Core workshops and the 'omics workshops as part of BIO00070M.

The project types are:

@@ -32,7 +35,7 @@ The data analysis workshops are:
| 5 | omics/structure/images 3 |
| 6 | Core 3 |

MSc Bioinformatics students do the Core workshops and the 'omics workshops as part of 70M


## Module Learning Outcome linked to this content

21 changes: 21 additions & 0 deletions references.bib
@@ -158,3 +158,24 @@ @article{nestorowa2016
file = {C\:\\Users\\er13\\Zotero\\storage\\UYBU6MJ7\\Nestorowa et al. - 2016 - A single-cell resolution map of mouse hematopoieti.pdf;C\:\\Users\\er13\\Zotero\\storage\\798ZR55F\\A-single-cell-resolution-map-of-mouse.html}
}


@book{nationalacademiesofsciences2019,
title = {Understanding Reproducibility and Replicability},
author = {{National Academies of Sciences, Engineering, and Medicine}},
year = {2019},
month = {05},
date = {2019-05-07},
publisher = {National Academies Press (US)},
url = {https://www.ncbi.nlm.nih.gov/books/NBK547546/}
}

@book{community2022,
title = {The Turing Way: A handbook for reproducible, ethical and collaborative research},
author = {{The Turing Way Community}},
year = {2022},
month = {07},
date = {2022-07-27},
publisher = {Zenodo},
doi = {10.5281/ZENODO.3233853},
url = {https://zenodo.org/record/3233853}
}
