total 228
-drwxr-xr-x 2 runner docker 4096 Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4096 Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1597 Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25553 Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70839 Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4807 Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13029 Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 58063 Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8550 Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8564 Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4096 Dec 5 12:12 workshop_files
+drwxr-xr-x 2 runner docker 4096 Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4096 Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27497 Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1597 Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70988 Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4807 Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13029 Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 58063 Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8550 Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8564 Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4096 Dec 15 12:46 workshop_files
You can use more than one option at once. The -h option stands for “human readable” and makes the file sizes easier to understand for humans:
@@ -512,18 +512,18 @@
Workshop
ls -hl
total 228K
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files
The -a option stands for “all” and shows us all the files, including hidden files.
@@ -531,20 +531,20 @@
Workshop
ls -alh
total 236K
-drwxr-xr-x 5 runner docker 4.0K Dec 5 12:16 .
-drwxr-xr-x 6 runner docker 4.0K Dec 5 12:16 ..
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files
+drwxr-xr-x 5 runner docker 4.0K Dec 15 13:09 .
+drwxr-xr-x 6 runner docker 4.0K Dec 15 13:09 ..
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files
You can move about with the cd command, which stands for “change directory”. You can use it to move into a directory by specifying the path to the directory:
You will get a warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each of the files because the csv files each have two columns called avgX. If you click on the tracking dataframe you will see it contains the data from all the files.
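The import might look something like this minimal sketch, which assumes the files are comma-separated .txt files in a track/ folder (the folder and the tracking name come from the text; the file pattern and reader are assumptions):

library(tidyverse)

# read every track file into one dataframe, recording which file
# each row came from in a column called "file"
files <- list.files("track", pattern = "\\.txt$", full.names = TRUE)
tracking <- files |>
  set_names() |>
  map_dfr(read_csv, .id = "file")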
Now we can add columns for the type and the concentration by processing the values in the file column. The values are like track/343_0.txt, so we need to remove .txt and track/ and separate the remaining parts into two columns.
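A sketch of that processing; it assumes the paths are in a column called file (as in the sketch above) and that the part before the underscore is the type and the part after is the concentration:

tracking <- tracking |>
  # strip the folder prefix and the file extension
  mutate(file = str_remove(file, "track/"),
         file = str_remove(file, "\\.txt$")) |>
  # split what remains, e.g. "343_0", into two columns
  separate(col = file,
           into = c("type", "concentration"),
           sep = "_",
           remove = FALSE)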
The provided data are cumulative/absolute. We need to calculate the change in VFA with time. There is a function, lag(), that will help us do this: it gives the previous value, which we can subtract from the current value. We need to do that separately for each sample_replicate, so we group by sample_replicate first. We also need to make sure the data are in the right order, so we arrange by sample_replicate and time_day.
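A compact sketch of this step using across(); it assumes the VFA columns sit adjacently from acetate to hexanoate (the workshop code writes out the subtraction for each column individually):

vfa_delta <- vfa_cummul |>
  group_by(sample_replicate) |>
  arrange(sample_replicate, time_day) |>
  # subtract the previous time point's value from the current one
  mutate(across(acetate:hexanoate, ~ .x - lag(.x)))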
Now we have two dataframes, one for the cumulative data and one for the change in VFA.
To make conversions from mM to g/l we need to do mM * 0.001 * MW. We will import the molecular weight data, pivot the VFA data to long format and join the molecular weight data to the VFA data. Then we can calculate the g/l. We will do this for both the cumulative and delta dataframes.
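For the cumulative dataframe the code is as follows (the join and conversion are from the workshop; the pivot is assumed to span the acetate to hexanoate columns). The same steps apply to vfa_delta:

vfa_cummul <- vfa_cummul |>
  pivot_longer(cols = acetate:hexanoate,
               names_to = "vfa",
               values_to = "conc_mM") |>
  left_join(mol_wt, by = "vfa") |>
  mutate(conc_g_l = conc_mM * 0.001 * mw)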
We have 8 VFAs in our dataset. PCA will allow us to plot our samples in the “VFA” space so we can see if treatments, time or replicate cluster.
However, PCA expects a matrix with samples in rows and the variables, the VFAs, in columns. We will need to select the columns we need, pivot wider, and then convert to a matrix.
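The workshop does this in two steps: keep the identifying columns plus vfa and conc_g_l, widen so each VFA becomes a column, then drop the identifiers to leave a purely numeric matrix:

vfa_cummul_pca <- vfa_cummul |>
  select(sample_replicate,
         treatment,
         replicate,
         time_day,
         vfa,
         conc_g_l) |>
  pivot_wider(names_from = vfa,
              values_from = conc_g_l)

mat <- vfa_cummul_pca |>
  ungroup() |>
  select(-sample_replicate,
         -treatment,
         -replicate,
         -time_day) |>
  as.matrix()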
The scale. argument tells prcomp() to scale the data to have a mean of 0 and a standard deviation of 1. The rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.
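Putting that into code, as in the workshop:

pca <- mat |>
  prcomp(scale. = TRUE,
         rank. = 4)

summary(pca)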
Importance of first k=4 (out of 8) components:
                          PC1    PC2     PC3     PC4
Standard deviation     2.4977 0.9026 0.77959 0.45567
Proportion of Variance 0.7798 0.1018 0.07597 0.02595
Cumulative Proportion  0.7798 0.8816 0.95760 0.98355
The Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.7798 of the variance, the second 0.1018, and the third 0.07597. Together the first three components explain nearly 96% of the total variance in the data. Plotting PC1 against PC2 will capture about 78% of the variance, which is likely much better than we would get plotting any two VFAs against each other. To plot PC1 against PC2 we will need to extract the PC1 and PC2 scores from the pca object and add labels for the samples.
🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample information from vfa_cummul_pca:
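This is done in the workshop like so:

pca_labelled <- data.frame(pca$x,
                           sample_replicate = vfa_cummul_pca$sample_replicate,
                           treatment = vfa_cummul_pca$treatment,
                           replicate = vfa_cummul_pca$replicate,
                           time_day = vfa_cummul_pca$time_day)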
We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the VFAs to be the same, since it makes sense to see which clusters of VFAs correlate with the treatments.
🎬 Set the number of clusters for the treatments and vfa:
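A minimal sketch, with hypothetical variable names (the text only fixes both numbers at 2):

# number of clusters for samples (treatments) and for VFAs
n_treatment_clusters <- 2
n_vfa_clusters <- 2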
The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bar is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE in the heatmaply() function.
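Putting that together, a hedged sketch of the call using heatmaply (Galili et al. 2017) and the variables from the sketches above; k_row and k_col set the numbers of row and column clusters, and the exact arguments used in the workshop may differ:

library(heatmaply)

heatmaply(mat,
          k_row = n_treatment_clusters,  # samples are in rows
          k_col = n_vfa_clusters,        # VFAs are in columns
          hide_colorbar = TRUE)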
One of the NC replicates at time = 22 is very different from the other replicates. The CN10 treatments cluster together at high time points. CN10 samples are more similar to NC samples early on. Most of the VFAs behave similarly, with the highest values later in the experiment for CN10, but isohexanoate and hexanoate differ. The difference might be because isohexanoate is especially low in the NC replicates at time = 1 and hexanoate is especially high in NC replicate 2 at time = 22.
Galili, Tal, Alan O’Callaghan, Jonathan Sidi, and Carson Sievert. 2017. “Heatmaply: An R Package for Creating Interactive Cluster Heatmaps for Online Publishing.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btx657.

R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse” 4: 1686. https://doi.org/10.21105/joss.01686.

Xie, Yihui. 2022. “Knitr: A General-Purpose Package for Dynamic Report Generation in R.” https://yihui.org/knitr/.
On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes: one shows genes upregulated in the FGF-treated samples (more yellow; the pink cluster) and the other shows genes downregulated in the FGF-treated samples (more blue; the blue cluster).
It will take a minute to run and display. On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are cells. We can see that cells of the same type don’t cluster that well together. We can also see two clusters of genes, but the pattern of gene expression is not as clear as it was for the frogs and the correspondence with the cell clusters is not as strong.
diff --git a/omics/week-5/workshop_files/figure-html/unnamed-chunk-33-1.png b/omics/week-5/workshop_files/figure-html/unnamed-chunk-33-1.png
index 9b266eb..6e14f92 100644
Binary files a/omics/week-5/workshop_files/figure-html/unnamed-chunk-33-1.png and b/omics/week-5/workshop_files/figure-html/unnamed-chunk-33-1.png differ
diff --git a/omics/week-5/workshop_files/figure-html/unnamed-chunk-65-1.png b/omics/week-5/workshop_files/figure-html/unnamed-chunk-65-1.png
index 062e017..75ba304 100644
Binary files a/omics/week-5/workshop_files/figure-html/unnamed-chunk-65-1.png and b/omics/week-5/workshop_files/figure-html/unnamed-chunk-65-1.png differ
diff --git a/search.json b/search.json
index dfccaa9..216232c 100644
--- a/search.json
+++ b/search.json
@@ -1,1479 +1,1542 @@
[
{
- "objectID": "structures/structures.html",
- "href": "structures/structures.html",
- "title": "Structure Data Analysis for Group Project",
- "section": "",
- "text": "There is an RStudio project containing a Quarto version of the the Antibody Mimetics Workshop by Michael Plevin & Jon Agirre. Instructions to obtain the RStudio project are at the bottom of this document after the set up instructions.\nYou might find RStudio useful for Python because you are already familiar with it. It is also a good way to create Quarto documents with code chunks in more than one language. Quarto documents can be used in RStudio, VS Code or Jupyter notebooks\nSome set up is required before you will be able to execute code in antibody_mimetics_workshop_3.qmd. This in contrast to the Colab notebook which is a cloud-based Jupyter notebook and does not require any set up (except installing packages).\n\n🎬 If using your own machine, install Python from https://www.python.org/downloads/. This should not be necessary if you are using a university machine where Python is already installed.\n🎬 If using your own machine and you did not install Quarto in the Core 1 workshop, install it now from https://quarto.org/docs/get-started/. This should not be necessary if you are using a university machine where quarto is already installed.\n🎬 Open RStudio and check you are using a “Git bash” Terminal: Tools | Global Options| Terminal | New Terminal opens with… . If the option to choose Git bash, you will need to install Git from https://git-scm.com/downloads. Quit RStudio first. This should not be necessary if you are using a university machine where Git bash is already installed.\n🎬 If on your own machine: In RStudio, install the quarto and the recticulate packages. This should not be necessary if you are using a university machine where these packages are already installed.\n🎬 Whether you are using your own machine or a university machine, you need to install some python packages. In RStudio and go to the Terminal window (behind the Console window). Run the following commands in the Terminal window:\npython -m pip install --upgrade pip setuptools wheel\nYou may get these warnings about scripts not being on the path. You can ignore these.\n WARNING: The script wheel.exe is installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n WARNING: The scripts pip.exe, pip3.11.exe, pip3.9.exe and pip3.exe are installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\nspyder 5.1.5 requires pyqt5<5.13, which is not installed.\nspyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\nconda-repo-cli 1.0.4 requires pathlib, which is not installed.\nanaconda-project 0.10.2 requires ruamel-yaml, which is not installed.\nSuccessfully installed pip-23.3.1 setuptools-69.0.2 wheel-0.41.3\npython -m pip install session_info\npython -m pip install wget\npython -m pip install gemmi\nNote: On my windows laptop at home, I also had to install C++ Build Tools to be able to install the gemmi python package. If this is true for you, you will get a fail message telling you to install C++ build tools if you need them. 
These are from https://visualstudio.microsoft.com/visual-cpp-build-tools/ You need to check the Workloads tab and select C++ build tools.\n\nYou can then install the gemmi package again.\nI think that’s it! You can now download the RStudio project and run each chunk in the quarto document.\nThere is an example RStudio project here: structure-analysis. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this is an RStudio Project, do not run the code from inside a project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/structure-analysis\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called structure-analysis-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded.\nYou should be able to open the antibody_mimetics_workshop_3.qmd file and run each chunk. You can also knit the document to html."
- },
- {
- "objectID": "structures/structures.html#programmatic-protein-structure-analysis",
- "href": "structures/structures.html#programmatic-protein-structure-analysis",
- "title": "Structure Data Analysis for Group Project",
+ "objectID": "omics/week-4/study_after_workshop.html",
+ "href": "omics/week-4/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
"section": "",
- "text": "There is an RStudio project containing a Quarto version of the the Antibody Mimetics Workshop by Michael Plevin & Jon Agirre. Instructions to obtain the RStudio project are at the bottom of this document after the set up instructions.\nYou might find RStudio useful for Python because you are already familiar with it. It is also a good way to create Quarto documents with code chunks in more than one language. Quarto documents can be used in RStudio, VS Code or Jupyter notebooks\nSome set up is required before you will be able to execute code in antibody_mimetics_workshop_3.qmd. This in contrast to the Colab notebook which is a cloud-based Jupyter notebook and does not require any set up (except installing packages).\n\n🎬 If using your own machine, install Python from https://www.python.org/downloads/. This should not be necessary if you are using a university machine where Python is already installed.\n🎬 If using your own machine and you did not install Quarto in the Core 1 workshop, install it now from https://quarto.org/docs/get-started/. This should not be necessary if you are using a university machine where quarto is already installed.\n🎬 Open RStudio and check you are using a “Git bash” Terminal: Tools | Global Options| Terminal | New Terminal opens with… . If the option to choose Git bash, you will need to install Git from https://git-scm.com/downloads. Quit RStudio first. This should not be necessary if you are using a university machine where Git bash is already installed.\n🎬 If on your own machine: In RStudio, install the quarto and the recticulate packages. This should not be necessary if you are using a university machine where these packages are already installed.\n🎬 Whether you are using your own machine or a university machine, you need to install some python packages. In RStudio and go to the Terminal window (behind the Console window). Run the following commands in the Terminal window:\npython -m pip install --upgrade pip setuptools wheel\nYou may get these warnings about scripts not being on the path. You can ignore these.\n WARNING: The script wheel.exe is installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n WARNING: The scripts pip.exe, pip3.11.exe, pip3.9.exe and pip3.exe are installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\nspyder 5.1.5 requires pyqt5<5.13, which is not installed.\nspyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\nconda-repo-cli 1.0.4 requires pathlib, which is not installed.\nanaconda-project 0.10.2 requires ruamel-yaml, which is not installed.\nSuccessfully installed pip-23.3.1 setuptools-69.0.2 wheel-0.41.3\npython -m pip install session_info\npython -m pip install wget\npython -m pip install gemmi\nNote: On my windows laptop at home, I also had to install C++ Build Tools to be able to install the gemmi python package. If this is true for you, you will get a fail message telling you to install C++ build tools if you need them. 
These are from https://visualstudio.microsoft.com/visual-cpp-build-tools/ You need to check the Workloads tab and select C++ build tools.\n\nYou can then install the gemmi package again.\nI think that’s it! You can now download the RStudio project and run each chunk in the quarto document.\nThere is an example RStudio project here: structure-analysis. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this is an RStudio Project, do not run the code from inside a project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/structure-analysis\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called structure-analysis-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded.\nYou should be able to open the antibody_mimetics_workshop_3.qmd file and run each chunk. You can also knit the document to html."
+ "text": "You need only do the section for your own project data\n🐸 Frogs\n🎬 Open your frogs-88H Project and script you began in the Consolidation study last week. This is likely to be cont-fgf-s20.R or cont-fgf-s14.R. Use the differential expression analysis you did in the workshop (in cont-fgf-s30.R) as a template to continue your script.\n🐭 Mice\n🎬 Open your mice-88H Project. Make a new script and, using hspc-prog.R as a template, repeat the analysis on a different comparisons.\n🍂 xxxx\n🎬 Follow one of the other examples"
},
{
- "objectID": "core/week-2/workshop.html",
- "href": "core/week-2/workshop.html",
- "title": "Workshop",
- "section": "",
- "text": "In this workshop you will"
+ "objectID": "omics/week-4/study_before_workshop.html#overview",
+ "href": "omics/week-4/study_before_workshop.html#overview",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Overview",
+ "text": "Overview\nIn these slides we will:\n\n\nCheck where you are\n\nlearn some concepts in differential expression\n\nlog2 fold changes\nMultiple correction\nnormalisation\nstatistical model\n\n\nFind out what packages to install before the workshop"
},
{
- "objectID": "core/week-2/workshop.html#session-overview",
- "href": "core/week-2/workshop.html#session-overview",
- "title": "Workshop",
- "section": "",
- "text": "In this workshop you will"
+ "objectID": "omics/week-4/study_before_workshop.html#what-we-did-in-omics-1-hello-data",
+ "href": "omics/week-4/study_before_workshop.html#what-we-did-in-omics-1-hello-data",
+ "title": "Independent Study to prepare for workshop",
+ "section": "What we did in Omics 1: 👋 Hello data!",
+ "text": "What we did in Omics 1: 👋 Hello data!\n\n\n\nDiscovered how many rows and columns we had in our datasets and what these were.\nExamined the distribution\n\nof values across the whole dataset\nof values across the samples/cells (i.e., averaged across genes) to see variation between samples/cells\nof values across the genes (i.e., averaged across samples/cells) to see variation between genes\n\n\nSaved files of filtered or summarised data."
},
{
- "objectID": "core/week-2/workshop.html#omics",
- "href": "core/week-2/workshop.html#omics",
- "title": "Workshop",
- "section": "Omics",
- "text": "Omics\n\ngene/transcript/protein/metabolite expression\ntranscriptomics 1\ntranscriptomics 2\nproteomics"
+ "objectID": "omics/week-4/study_before_workshop.html#where-should-you-be-1",
+ "href": "omics/week-4/study_before_workshop.html#where-should-you-be-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Where should you be?",
+ "text": "Where should you be?\nAfter the Omics 1: 👋 Hello data! Workshop including:\n\n🤗 Look after future you! and\nthe Independent Study to consolidate, you should have:"
},
{
- "objectID": "core/week-2/workshop.html#images",
- "href": "core/week-2/workshop.html#images",
- "title": "Workshop",
- "section": "Images",
- "text": "Images\ncontrol_merged.tif\nlibrary(ijtiff)\nimg <- read_tif(\"data/control_merged.tif\")\nimg\n\nan image at least one and usually more matrices of numbers representing the intensity of light at each pixel in the image\nthe number of matrices depends on the number of ‘channels’ in the image\na channel is a colour in the image\na frame is a single image in a series of images\nwe might normally call this a multi-dimensional array: x and y coordinates of the pixels are 2 dimensions, the channel is the third dimension and time is the forth dimension\n\ndisplay(img)"
+ "objectID": "omics/week-4/study_before_workshop.html#frogs",
+ "href": "omics/week-4/study_before_workshop.html#frogs",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐸 Frogs",
+    "text": "🐸 Frogs\n\n\nAn RStudio Project called frogs-88H which contains:\n\nRaw data (S14, S20 and S30)\nProcessed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)\nTwo scripts called cont-fgf-s30.R and cont-fgf-s20.R OR cont-fgf-s14.R\n\n\n\n\nFiles should be organised into folders. Code should be well commented and easy to read."
},
{
- "objectID": "core/week-2/workshop.html#structure",
- "href": "core/week-2/workshop.html#structure",
- "title": "Workshop",
- "section": "Structure",
- "text": "Structure\n1cq2.pdb"
+ "objectID": "omics/week-4/study_before_workshop.html#mice",
+ "href": "omics/week-4/study_before_workshop.html#mice",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐭 Mice",
+    "text": "🐭 Mice\n\nAn RStudio Project called mice-88H which contains\n\nRaw data (hspc, prog, lthsc)\nProcessed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv)\n\n\nOne script called hspc-prog.R\n\n\nFiles should be organised into folders. Code should be well commented and easy to read."
},
{
- "objectID": "core/week-2/workshop.html#the-command-line",
- "href": "core/week-2/workshop.html#the-command-line",
- "title": "Workshop",
- "section": "The command line",
- "text": "The command line\nThe command line - or shell - is a text interface for your computer. It’s a program that takes in commands, which it passes on to the computer’s operating system to run.\n\nWindows PowerShell is a command-line in windows. It uses bash-like commands unlike the Command Prompt which uses dos commands (a sort of windows only language). You can open is by going to Start | Windows PowerShell or by searching for it in the search bar.\nTerminal is the command line in Mac OS X. You can open it by going to Applications | Utilities | Terminal or by searching for it in the Spotlight search bar.\ngit bash. I used the bash shell that comes with Git"
+ "objectID": "omics/week-4/study_before_workshop.html#section",
+ "href": "omics/week-4/study_before_workshop.html#section",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🍂",
+ "text": "🍂\nEither of the other examples."
},
{
- "objectID": "core/week-2/workshop.html#rstudio-terminal",
- "href": "core/week-2/workshop.html#rstudio-terminal",
- "title": "Workshop",
- "section": "RStudio terminal",
- "text": "RStudio terminal\nThe RStudio terminal is a convenient interface to the shell without leaving RStudio. It is useful for running commands that are not available in R. For example, you can use it to run other programs like fasqc, git, ftp, ssh\nNavigating your file system\nSeveral commands are frequently used to create, inspect, rename, and delete files and directories.\n$\nThe dollar sign is the prompt (like > on the R console), which shows us that the shell is waiting for input.\nYou can find out where you are using the pwd command, which stands for “print working directory”.\n\npwd\n\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2\n\n\nYou can find out what you can see with ls which stands for “list”.\n\nls\n\ndata\nimages\noverview.qmd\nstudy_after_workshop.html\nstudy_after_workshop.qmd\nstudy_before_workshop.html\nstudy_before_workshop.ipynb\nstudy_before_workshop.qmd\nworkshop.html\nworkshop.qmd\nworkshop.rmarkdown\nworkshop_files\n\n\nYou might have noticed that unlike R, the commands do not have brackets after them. Instead, options (or switches) are given after the command. For example, we can modify the ls command to give us more information with the -l option, which stands for “long”.\n\nls -l\n\ntotal 228\ndrwxr-xr-x 2 runner docker 4096 Dec 5 12:12 data\ndrwxr-xr-x 2 runner docker 4096 Dec 5 12:12 images\n-rw-r--r-- 1 runner docker 1597 Dec 5 12:12 overview.qmd\n-rw-r--r-- 1 runner docker 25553 Dec 5 12:16 study_after_workshop.html\n-rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70839 Dec 5 12:16 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4807 Dec 5 12:12 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13029 Dec 5 12:12 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 58063 Dec 5 12:12 workshop.html\n-rw-r--r-- 1 runner docker 8550 Dec 5 12:12 workshop.qmd\n-rw-r--r-- 1 runner docker 8564 Dec 5 12:16 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4096 Dec 5 12:12 workshop_files\n\n\nYou can use more than one option at once. 
The -h option stands for “human readable” and makes the file sizes easier to understand for humans:\n\nls -hl\n\ntotal 228K\ndrwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data\ndrwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images\n-rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd\n-rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html\n-rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html\n-rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd\n-rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files\n\n\nThe -a option stands for “all” and shows us all the files, including hidden files.\n\nls -alh\n\ntotal 236K\ndrwxr-xr-x 5 runner docker 4.0K Dec 5 12:16 .\ndrwxr-xr-x 6 runner docker 4.0K Dec 5 12:16 ..\ndrwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data\ndrwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images\n-rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd\n-rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html\n-rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html\n-rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd\n-rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files\n\n\nYou can move about with the cd command, which stands for “change directory”. You can use it to move into a directory by specifying the path to the directory:\n\ncd data\npwd\ncd ..\npwd\ncd data\npwd\n\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2/data\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2/data\n\n\nhead 1cq2.pdb\nHEADER OXYGEN STORAGE/TRANSPORT 04-AUG-99 1CQ2 \nTITLE NEUTRON STRUCTURE OF FULLY DEUTERATED SPERM WHALE MYOGLOBIN AT 2.0 \nTITLE 2 ANGSTROM \nCOMPND MOL_ID: 1; \nCOMPND 2 MOLECULE: MYOGLOBIN; \nCOMPND 3 CHAIN: A; \nCOMPND 4 ENGINEERED: YES; \nCOMPND 5 OTHER_DETAILS: PROTEIN IS FULLY DEUTERATED \nSOURCE MOL_ID: 1; \nSOURCE 2 ORGANISM_SCIENTIFIC: PHYSETER CATODON; \nhead -20 data/1cq2.pdb\nHEADER OXYGEN STORAGE/TRANSPORT 04-AUG-99 1CQ2 \nTITLE NEUTRON STRUCTURE OF FULLY DEUTERATED SPERM WHALE MYOGLOBIN AT 2.0 \nTITLE 2 ANGSTROM \nCOMPND MOL_ID: 1; \nCOMPND 2 MOLECULE: MYOGLOBIN; \nCOMPND 3 CHAIN: A; \nCOMPND 4 ENGINEERED: YES; \nCOMPND 5 OTHER_DETAILS: PROTEIN IS FULLY DEUTERATED \nSOURCE MOL_ID: 1; \nSOURCE 2 ORGANISM_SCIENTIFIC: PHYSETER CATODON; \nSOURCE 3 ORGANISM_COMMON: SPERM WHALE; \nSOURCE 4 ORGANISM_TAXID: 9755; \nSOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; \nSOURCE 6 EXPRESSION_SYSTEM_TAXID: 562; \nSOURCE 7 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID; \nSOURCE 8 EXPRESSION_SYSTEM_PLASMID: PET15A \nKEYWDS HELICAL, GLOBULAR, ALL-HYDROGEN CONTAINING STRUCTURE, OXYGEN STORAGE- \nKEYWDS 2 TRANSPORT COMPLEX \nEXPDTA NEUTRON DIFFRACTION \nAUTHOR F.SHU,V.RAMAKRISHNAN,B.P.SCHOENBORN \nless 1cq2.pdb\nless is a program that displays the contents of a file, one page at a time. 
It is useful for viewing large files because it does not load the whole file into memory before displaying it. Instead, it reads and displays a few lines at a time. You can navigate forward through the file with the spacebar, and backwards with the b key. Press q to quit.\nA wildcard is a character that can be used as a substitute for any of a class of characters in a search, The most common wildcard characters are the asterisk (*) and the question mark (?).\nls *.csv\ncp stands for “copy”. You can copy a file from one directory to another by giving cp the path to the file you want to copy and the path to the destination directory.\ncp 1cq2.pdb copy_of_1cq2.pdb\ncp 1cq2.pdb ../copy_of_1cq2.pdb\ncp 1cq2.pdb ../bob.txt\nTo delete a file use the rm command, which stands for “remove”.\nrm ../bob.txt\nbut be careful because the file will be gone forever. There is no “are you sure?” or undo.\nTo move a file from one directory to another, use the mv command. mv works like cp except that it also deletes the original file.\nmv ../copy_of_1cq2.pdb .\nMake a directory\nmkdir mynewdir"
+ "objectID": "omics/week-4/study_before_workshop.html#if-you-do-not-have-those",
+ "href": "omics/week-4/study_before_workshop.html#if-you-do-not-have-those",
+ "title": "Independent Study to prepare for workshop",
+ "section": "If you do not have those",
+ "text": "If you do not have those\nGo through:\n\nOmics 1: 👋 Hello data! Workshop including:\n🤗 Look after future you! and\nthe Independent Study to consolidate"
},
{
- "objectID": "core/week-2/workshop.html#differences-between-r-and-python",
- "href": "core/week-2/workshop.html#differences-between-r-and-python",
- "title": "Workshop",
- "section": "Differences between R and python",
- "text": "Differences between R and python\nDemo\nYou’re finished!"
+ "objectID": "omics/week-4/study_before_workshop.html#differential-expression-1",
+ "href": "omics/week-4/study_before_workshop.html#differential-expression-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Differential expression",
+ "text": "Differential expression\n\n\nThe goal of differential expression is to test whether there is a significant difference in gene expression between groups.\nA large number of computational methods have been developed for differential expression analysis\nR is the leading language for differential expression analysis"
},
{
- "objectID": "core/week-2/study_before_workshop.html#overview",
- "href": "core/week-2/study_before_workshop.html#overview",
+ "objectID": "omics/week-4/study_before_workshop.html#differential-expression-2",
+ "href": "omics/week-4/study_before_workshop.html#differential-expression-2",
"title": "Independent Study to prepare for workshop",
- "section": "Overview",
- "text": "Overview\n\nRStudio Projects revisited\n\nusing usethis package\nAdding a README\n\n\nFormatting code\nCode algorithmically / algebraically."
+ "section": "Differential expression",
+ "text": "Differential expression\n\n\nthe statistical concepts are very similar to those you have already encountered in stages 1 and 2\nyou are essentially doing paired- or independent-samples tests\nbut you are doing a lot of them! One for every gene\ndata need normalisation before comparison"
},
{
- "objectID": "core/week-2/study_before_workshop.html#reproducibility-is-a-continuum",
- "href": "core/week-2/study_before_workshop.html#reproducibility-is-a-continuum",
+ "objectID": "omics/week-4/study_before_workshop.html#statistical-concepts",
+ "href": "omics/week-4/study_before_workshop.html#statistical-concepts",
"title": "Independent Study to prepare for workshop",
- "section": "Reproducibility is a continuum",
- "text": "Reproducibility is a continuum\nSome is better than none!\n\nOrganise your project\n\nScript everything.\n\nFormat code and follow a consistent style.\n\nCode algorithmically\nModularise your code: organise into sections and scripts\nDocument your project - commenting, READMEs\nUse literate programming e.g., R Markdown or Quarto\n\n\n\nMore advanced: Version control, continuous integration, environments, containers"
+ "section": "Statistical concepts",
+ "text": "Statistical concepts\nLike familiar tests:\n\n\nthe type of test (the function) you use depends on the type of data you have and the type of assumptions you want to make\nthe tests work by comparing the variation between groups to the variation within groups.\nyou will get: the difference between groups, a test statistic, and a p-value\nyou also get an adjusted p-value which is the ‘correction’ for multiple testing"
},
{
- "objectID": "core/week-2/study_before_workshop.html#rstudio-projects",
- "href": "core/week-2/study_before_workshop.html#rstudio-projects",
+ "objectID": "omics/week-4/study_before_workshop.html#the-difference-between-groups",
+ "href": "omics/week-4/study_before_workshop.html#the-difference-between-groups",
"title": "Independent Study to prepare for workshop",
- "section": "RStudio Projects",
- "text": "RStudio Projects\n\n\nWe used RStudio Projects in stage one but they are so useful, it is worth covering them again in case you are not yet using them.\nWe will also cover the usethisworkflow to create an RStudio Project.\nRStudio Projects make it easy to manage working directories and paths because they set the working directory to the RStudio Projects directory automatically."
+ "section": "The difference between groups",
+ "text": "The difference between groups\n\n\nThe difference between groups is given as the log2 fold change in expression between groups\nA fold change is the expression in one group divided by the expression in the other group\nwe use fold changes because the absolute expression values may not be accurate and relative changes are what matters\nwe use log2 fold changes because they are symmetrical around 0"
},
{
- "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-1",
- "href": "core/week-2/study_before_workshop.html#rstudio-projects-1",
+ "objectID": "omics/week-4/study_before_workshop.html#log2-fold-change",
+ "href": "omics/week-4/study_before_workshop.html#log2-fold-change",
"title": "Independent Study to prepare for workshop",
- "section": "RStudio Projects",
- "text": "RStudio Projects\n\n\n\n-- stem_cell_rna\n |__stem_cell_rna.Rproj \n |__raw_ data/ \n |__2019-03-21_donor_1.csv\n |__README. md\n |__R/\n |__01_data_processing.R\n |__02_exploratory.R\n |__functions/\n |__theme_volcano.R\n |__normalise.R\n\n\nThe project directory is the folder at the top 1\n\n\nThanks to Mine Çetinkaya-Rundel who helped me work out how to highlight a line https://gist.github.com/mine-cetinkaya-rundel/3af3415eab70a65be3791c3dcff6e2e3. Note to futureself: the engine: knitr matters."
+ "section": "log2 fold change",
+ "text": "log2 fold change\n\n\nlog2 means log to the base 2\nSuppose the expression in group A is 5 and the expression in group B is 8\nA/B = 5/8 = 0.625 and B/A = 8/5 = 1.6\nIf B is greater than A the range of A/B is 0 to 1 but the range of B/A is 1 to infinity\nHowever, if we take the log2 of A/B we get -0.678 and the log2 of B/A is 0.678."
},
{
- "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-2",
- "href": "core/week-2/study_before_workshop.html#rstudio-projects-2",
+ "objectID": "omics/week-4/study_before_workshop.html#adjusted-p-value",
+ "href": "omics/week-4/study_before_workshop.html#adjusted-p-value",
"title": "Independent Study to prepare for workshop",
- "section": "RStudio Projects",
- "text": "RStudio Projects\n\n\n\n-- stem_cell_rna\n |__stem_cell_rna.Rproj \n |__raw_ data/ \n |__2019-03-21_donor_1.csv\n |__README. md\n |__R/\n |__01_data_processing.R\n |__02_exploratory.R\n |__functions/\n |__theme_volcano.R\n |__normalise.R\n\n\nthe .RProj file is directly under the project folder. Its presence is what makes the folder an RStudio Project"
+ "section": "Adjusted p-value",
+    "text": "Adjusted p-value\n\n\nThe p-value has to be adjusted because of the number of tests being done\nIn stage 1, we used Tukey’s HSD to adjust for multiple testing following an ANOVA\nHere the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995) is used to adjust for multiple testing\nBH controls the False Discovery Rate (FDR)\nThe FDR is the proportion of false positives among the genes called significant"
},
{
- "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-3",
- "href": "core/week-2/study_before_workshop.html#rstudio-projects-3",
+ "objectID": "omics/week-4/study_before_workshop.html#normalisation",
+ "href": "omics/week-4/study_before_workshop.html#normalisation",
"title": "Independent Study to prepare for workshop",
- "section": "RStudio Projects",
- "text": "RStudio Projects\n\n\nWhen you open an RStudio Project, the working directory is set to the Project directory (i.e., the location of the .Rproj file).\nWhen you use an RStudio Project you do not need to use setwd()\nWhen someone, including future you, opens the project on another machine, all the paths just work."
+ "section": "Normalisation",
+ "text": "Normalisation\n\n\nNormalisation adjusts raw counts to account for factors that prevent direct comparisons\nNormalisation usually influences the experimental design as well as the analysis\nThe 🐭 mouse data have been normalised to simplify the analysis for you; the 🐸 frog data have not but the DE method will do this for you.\nNormalisation is a big topic. See Düren, Lederer, and Qin (2022); Bullard et al. (2010); Lytal, Ran, and An (2020); Abrams et al. (2019); Vallejos et al. (2017); Evans, Hardin, and Stoebel (2017)"
},
{
- "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-4",
- "href": "core/week-2/study_before_workshop.html#rstudio-projects-4",
+ "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function",
+ "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function",
"title": "Independent Study to prepare for workshop",
- "section": "RStudio Projects",
- "text": "RStudio Projects\n\nJenny BryanIn the words of Jenny Bryan:\n\n“If the first line of your R script is setwd(”C:/Users/jenny/path/that/only/I/have”) I will come into your office and SET YOUR COMPUTER ON FIRE”"
+ "section": "Type of test (the function)",
+ "text": "Type of test (the function)\n\n\nA large number of computational methods have been developed for differential expression analysis\nMethods vary in the types of normalisation they do, the statistical model they use, and the assumptions they make\nSome of the most well-known methods are provided by: DESeq2 (Love, Huber, and Anders 2014), edgeR (Robinson, McCarthy, and Smyth 2010; McCarthy, Chen, and Smyth 2012; Chen, Lun, and Smyth 2016), limma (Ritchie et al. 2015) and scran (Lun, McCarthy, and Marioni 2016)"
},
{
- "objectID": "core/week-2/study_before_workshop.html#creating-an-rstudio-project",
- "href": "core/week-2/study_before_workshop.html#creating-an-rstudio-project",
+ "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function-1",
+ "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function-1",
"title": "Independent Study to prepare for workshop",
- "section": "Creating an RStudio Project",
- "text": "Creating an RStudio Project\nThere are two ways to create an RStudio Project.\n\nUsing one of the two menus\nUsing the usethis package"
+ "section": "Type of test (the function)",
+ "text": "Type of test (the function)\n\n\n\nDESeq2 and edgeR\n\nboth require raw counts as input\nboth assume that most genes are not DE\nboth use a negative binomial distribution1 to model the data\nuse slightly different normalisation methods: DESeq2 uses the median of ratios method; edgeR uses the trimmed mean of M values (TMM) method\n\n\n\n\nA discrete distribution for counts, similar to the Poisson distribution"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-a-menu",
- "href": "core/week-2/study_before_workshop.html#using-a-menu",
+ "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function-2",
+ "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function-2",
"title": "Independent Study to prepare for workshop",
- "section": "Using a menu",
- "text": "Using a menu\nThere are two menus:\n\nTop left, File menu\nTop Right, drop-down indicated by the .RProj icon\n\nThey both do the same thing.\nIn both cases you choose: New Project | New Directory | New Project\n\nMake sure you “Browse” to the folder you want to create the project."
+ "section": "Type of test (the function)",
+ "text": "Type of test (the function)\n\n\nscran\n\nworks on normalized log-expression values\nperforms Welch t-tests"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-1",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-1",
+ "objectID": "omics/week-4/study_before_workshop.html#meta-data",
+ "href": "omics/week-4/study_before_workshop.html#meta-data",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nI occasionally use the menu but I mostly use the usethis package.\n\n🎬 Go to RStudio and check your working directory:\n\ngetwd()\n\n\"C:/Users/er13/Desktop\"\n\n\n❔ Is your working directory a good place to create a Project folder?"
+ "section": "Meta data",
+    "text": "Meta data\n\n\nDE methods require two types of data: the expression data and the meta data\nThe meta data is the information about the samples\nIt says which samples (columns) are in which group(s)\nIt is usually stored in a separate file"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-2",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-2",
+ "objectID": "omics/week-4/study_before_workshop.html#data",
+ "href": "omics/week-4/study_before_workshop.html#data",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nIf this is a good place to create a Project directory then…\n🎬 Create a project with:\n\nusethis::create_project(\"bananas\")"
+ "section": "🐸 Data",
+ "text": "🐸 Data\n\nExpression for the whole transcriptome X. laevis v10.1 genome assembly\nValues are raw counts\nThe statistical analysis method we will use DESeq2 (Love, Huber, and Anders 2014) requires raw counts and performs the normalisation itself"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-3",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-3",
+ "objectID": "omics/week-4/study_before_workshop.html#data-1",
+ "href": "omics/week-4/study_before_workshop.html#data-1",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nOtherwise\nIf you want the project directory elsewhere, you will need to give the relative path, e.g.\n\nusethis::create_project(\"../Documents/bananas\")"
+ "section": "🐭 Data",
+ "text": "🐭 Data\n\nExpression for a subset of genes, the surfaceome\nValues are log2 normalised values\nThe statistical analysis method we will use scran (Lun, McCarthy, and Marioni 2016) requires normalised values"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-4",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-4",
+ "objectID": "omics/week-4/study_before_workshop.html#packages-to-install-before-the-workshop",
+ "href": "omics/week-4/study_before_workshop.html#packages-to-install-before-the-workshop",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nThe output will look like this and a new RStudio session will start.\n> usethis::create_project(\"bananas\")\n√ Creating 'bananas/'\n√ Setting active project to 'C:/Users/er13/Desktop/bananas'\n√ Creating 'R/'\n√ Writing 'bananas.Rproj'\n√ Adding '.Rproj.user' to '.gitignore'\n√ Opening 'C:/Users/er13/Desktop/bananas/' in new RStudio session\n√ Setting active project to '<no active project>'"
+ "section": "Packages to install before the workshop",
+    "text": "Packages to install before the workshop\nBiocManager from CRAN in the normal way:\n\ninstall.packages(\"BiocManager\")\n\nDESeq2 from Bioconductor using BiocManager:\n\nBiocManager::install(\"DESeq2\")\n\nscran from Bioconductor using BiocManager:\n\nBiocManager::install(\"scran\")"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-5",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-5",
+ "objectID": "omics/week-4/study_before_workshop.html#workshops-1",
+ "href": "omics/week-4/study_before_workshop.html#workshops-1",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nWhen you create a new RStudio Project with usethis:\n\n\nA folder called bananas/ is created\nRStudio starts a new session in bananas/ i.e., your working directory is now bananas/\n\nA folder called R/ is created\nA file called bananas.Rproj is created\nA file called .gitignore is created\nA hidden directory called .Rproj.user is created"
+ "section": "Workshops",
+ "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments.\nOmics 3: Visualising and Interpreting. PCA, Volcano plots and heatmaps to visualise results. Interpreting the results and finding out more about genes of interest."
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-6",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-6",
+ "objectID": "omics/week-4/study_before_workshop.html#references",
+ "href": "omics/week-4/study_before_workshop.html#references",
"title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\n\n\nthe .Rproj file is what makes the directory an RStudio Project\nthe Rproj.user directory is where project-specific temporary files are stored. You don’t need to mess with it.\nthe .gitignore is used for version controlled projects. If not using git, you can ignore it."
+ "section": "References",
+ "text": "References\n\n\n🔗 About Omics 2: Statistical Analysis\n\n\n\nAbrams, Zachary B., Travis S. Johnson, Kun Huang, Philip R. O. Payne, and Kevin Coombes. 2019. “A Protocol to Evaluate RNA Sequencing Normalization Methods.” BMC Bioinformatics 20 (24): 679. https://doi.org/10.1186/s12859-019-3247-x.\n\n\nBenjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” J. R. Stat. Soc. Series B Stat. Methodol. 57 (1): 289–300. http://www.jstor.org/stable/2346101.\n\n\nBullard, James H., Elizabeth Purdom, Kasper D. Hansen, and Sandrine Dudoit. 2010. “Evaluation of Statistical Methods for Normalization and Differential Expression in mRNA-Seq Experiments.” BMC Bioinformatics 11 (1): 94. https://doi.org/10.1186/1471-2105-11-94.\n\n\nChen, Yunshun, Aaron T. L. Lun, and Gordon K. Smyth. 2016. “From Reads to Genes to Pathways: Differential Expression Analysis of RNA-Seq Experiments Using Rsubread and the edgeR Quasi-Likelihood Pipeline.” https://doi.org/10.12688/f1000research.8987.2.\n\n\nDüren, Yannick, Johannes Lederer, and Li-Xuan Qin. 2022. “Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method.” Nucleic Acids Research 50 (10): e56. https://doi.org/10.1093/nar/gkac064.\n\n\nEvans, Ciaran, Johanna Hardin, and Daniel M Stoebel. 2017. “Selecting Between-Sample RNA-Seq Normalization Methods from the Perspective of Their Assumptions.” Briefings in Bioinformatics 19 (5): 776–92. https://doi.org/10.1093/bib/bbx008.\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nLytal, Nicholas, Di Ran, and Lingling An. 2020. “Normalization Methods on Single-Cell RNA-Seq Data: An Empirical Survey.” Frontiers in Genetics 11. https://www.frontiersin.org/articles/10.3389/fgene.2020.00041.\n\n\nMcCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential Expression Analysis of Multifactor RNA-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97. https://doi.org/10.1093/nar/gks042.\n\n\nRitchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.\n\n\nRobinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. 2010. “edgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1): 139–40. https://doi.org/10.1093/bioinformatics/btp616.\n\n\nVallejos, Catalina A., Davide Risso, Antonio Scialdone, Sandrine Dudoit, and John C. Marioni. 2017. “Normalizing Single-Cell RNA Sequencing Data: Challenges and Opportunities.” Nature Methods 14 (6): 565–71. https://doi.org/10.1038/nmeth.4292."
},
{
- "objectID": "core/week-2/study_before_workshop.html#opening-and-closing",
- "href": "core/week-2/study_before_workshop.html#opening-and-closing",
- "title": "Independent Study to prepare for workshop",
- "section": "Opening and closing",
- "text": "Opening and closing\nYou can close an RStudio Project with ONE of:\n\nFile | Close Project\nUsing the drop-down option on the far right of the tool bar where you see the Project name\n\n\nYou can open an RStudio Project with ONE of:\n\nFile | Open Project or File | Recent Projects\n\nUsing the drop-down option on the far right of the tool bar where you see the Project name\n\nDouble-clicking an .Rproj file from your file explorer/finder\n\nWhen you open project, a new R session starts."
+ "objectID": "omics/kelly/workshop.html",
+ "href": "omics/kelly/workshop.html",
+ "title": "Kelly",
+ "section": "",
+ "text": "VFAs from AD vials\n\nTwo treatments: straw (CN10) and water (NC)\n10 time points: 1, 3, 5, 9, 11, 13, 16, 18, 20, 22\nthree replicates per treatment per time point\n2 x 10 x 3 = 60 groups\n8 VFA with concentration in mM (millimolar): acetate, propanoate, isobutyrate, butyrate, isopentanoate, pentanoate, isohexanoate, hexanoate\n\nTo calculate from this data\n\nRecalculate the data into grams per litre\n\nconvert to molar: 1 millimolar to molar = 0.001 molar\nmultiply by the molecular weight of each VFA\n\n\nCalculate Change in VFA g/l with time\nCalculate the percent representation of each VFA, by mM and by weight\n\n\n8 VFA in mM for 60 samples vfa.csv\nMolecular weights for each VFA in grams per mole mol_wt.txt\n\n\n🎬 Start RStudio from the Start menu\n🎬 Make an RStudio project. Be deliberate about where you create it so that it is a good place for you\n🎬 Use the Files pane to make new folders for the data. I suggest data-raw and data-processed\n🎬 Make a new script called analysis.R to carry out the rest of the work.\n🎬 Load tidyverse (Wickham et al. 2019) for importing, summarising, plotting and filtering.\n\nlibrary(tidyverse)\n\n\n🎬 Save the files to data-raw. Open them and examine them. You may want to use Excel for the csv file.\n🎬 Answer the following questions:\n\nWhat is in the rows and columns of each file?\nHow many rows and columns are there in each file?\nHow are the data organised ?\n\n🎬 Import\n\nvfa_cummul <- read_csv(\"data-raw/vfa.csv\") |> janitor::clean_names()\n\n🎬 Split treatment and replicate to separate columns so there is a treatment column:\n\nvfa_cummul <- vfa_cummul |> \n separate(col = sample_replicate, \n into = c(\"treatment\", \"replicate\"), \n sep = \"-\",\n remove = FALSE)\n\nThe provided data is cumulative/absolute. We need to calculate the change in VFA with time. There is a function, lag() that will help us do this. It will take the previous value and subtract it from the current value. We need to do that separately for each sample_replicate so we need to group by sample_replicate first. We also need to make sure the data is in the right order so we will arrange by sample_replicate and time_day.\n🎬 Create dataframe for the change in VFA\n\nvfa_delta <- vfa_cummul |> \n group_by(sample_replicate) |> \n arrange(sample_replicate, time_day) |>\n mutate(acetate = acetate - lag(acetate),\n propanoate = propanoate - lag(propanoate),\n isobutyrate = isobutyrate - lag(isobutyrate),\n butyrate = butyrate - lag(butyrate),\n isopentanoate = isopentanoate - lag(isopentanoate),\n pentanoate = pentanoate - lag(pentanoate),\n isohexanoate = isohexanoate - lag(isohexanoate),\n hexanoate = hexanoate - lag(hexanoate))\n\nNow we have two dataframes, one for the cumulative data and one for the change in VFA.\nTo make conversions from mM to g/l we need to do mM * 0.001 * MW. We will import the molecular weight data, pivot the VFA data to long format and join the molecular weight data to the VFA data. Then we can calculate the g/l. 
We will do this for both the cumulative and delta dataframes.\n🎬 import molecular weight data\n\nmol_wt <- read_table(\"data-raw/mol_wt.txt\") |>\n mutate(vfa = tolower(vfa))\n\n🎬 Pivot the cumulative data to long format:\nView vfa_cummul to check you understand what you have done.\n🎬 Join molecular weight to data and calculate g/l (mutate to convert to g/l * 0.001 * MW):\n\nvfa_cummul <- vfa_cummul |> \n left_join(mol_wt, by = \"vfa\") |>\n mutate(conc_g_l = conc_mM * 0.001 * mw)\n\nView vfa_cummul to check you understand what you have done.\n🎬 Add a column which is the percent representation of each VFA for mM and g/l:\n\nvfa_cummul <- vfa_cummul |> \n group_by(sample_replicate, time_day) |> \n mutate(percent_conc_g_l = conc_g_l / sum(conc_g_l) * 100,\n percent_conc_mM = conc_mM / sum(conc_mM) * 100)\n\n🎬 Pivot the change data, delta_vfa to long format:\nView vfa_delta to check it looks like vfa_cummul\n🎬 Join molecular weight to data and calculate g/l (mutate to convert to g/l * 0.001 * MW):\n\n🎬 Make summary data for graphing\n\nvfa_cummul_summary <- vfa_cummul |> \n group_by(treatment, time_day, vfa) |> \n summarise(mean_g_l = mean(conc_g_l),\n se_g_l = sd(conc_g_l)/sqrt(length(conc_g_l)),\n mean_mM = mean(conc_mM),\n se_mM = sd(conc_mM)/sqrt(length(conc_mM))) |> \n ungroup()\n\n\nvfa_delta_summary <- vfa_delta |> \n group_by(treatment, time_day, vfa) |> \n summarise(mean_g_l = mean(conc_g_l),\n se_g_l = sd(conc_g_l)/sqrt(length(conc_g_l)),\n mean_mM = mean(conc_mM),\n se_mM = sd(conc_mM)/sqrt(length(conc_mM))) |> \n ungroup()\n\n🎬 Graph the cumulative data, grams per litre:\n\nvfa_cummul_summary |> \n ggplot(aes(x = time_day, colour = vfa)) +\n geom_line(aes(y = mean_g_l), \n linewidth = 1) +\n geom_errorbar(aes(ymin = mean_g_l - se_g_l,\n ymax = mean_g_l + se_g_l),\n width = 0.5, \n show.legend = F,\n linewidth = 1) +\n scale_color_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean VFA concentration (g/l)\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())\n\n\n\n\n🎬 Graph the change data, grams per litre:\n\nvfa_delta_summary |> \n ggplot(aes(x = time_day, colour = vfa)) +\n geom_line(aes(y = mean_g_l), \n linewidth = 1) +\n geom_errorbar(aes(ymin = mean_g_l - se_g_l,\n ymax = mean_g_l + se_g_l),\n width = 0.5, \n show.legend = F,\n linewidth = 1) +\n scale_color_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean change in VFA concentration (g/l)\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())\n\n\n\n\n🎬 Graph the mean percent representation of each VFA g/l. Note geom_col() will plot proportion if we setposition = \"fill\"\n\nvfa_cummul_summary |> \n ggplot(aes(x = time_day, y = mean_g_l, fill = vfa)) +\n geom_col(position = \"fill\") +\n scale_fill_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean Proportion VFA\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())\n\n\n\n\n\nWe have 8 genes in our dataset. PCA will allow us to plot our samples in the “VFA” space so we can see if treatments, time or replicate cluster.\nHowever, PCA expects a matrix with samples in rows and VFA, the variables, in columns. We will need to select the columns we need and pivot wider. 
Then convert to a matrix.\n🎬\n\nvfa_cummul_pca <- vfa_cummul |> \n select(sample_replicate, \n treatment, \n replicate, \n time_day, \n vfa, \n conc_g_l) |> \n pivot_wider(names_from = vfa, \n values_from = conc_g_l)\n\n\nmat <- vfa_cummul_pca |> \n ungroup() |>\n select(-sample_replicate, \n -treatment, \n -replicate, \n -time_day) |> \n as.matrix()\n\n🎬 Perform PCA on the matrix:\n\npca <- mat |>\n prcomp(scale. = TRUE, \n rank. = 4) \n\nThe scale. argument tells prcomp() to scale the data to have a mean of 0 and a standard deviation of 1. The rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=4 (out of 8) components:\n PC1 PC2 PC3 PC4\nStandard deviation 2.4977 0.9026 0.77959 0.45567\nProportion of Variance 0.7798 0.1018 0.07597 0.02595\nCumulative Proportion 0.7798 0.8816 0.95760 0.98355\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.7798 of the variance, the second 0.1018, and the third 0.07597. Together the first three components explain nearly 96% of the total variance in the data. Plotting PC1 against PC2 will capture about 78% of the variance which is likely much better than we would get plotting any two VFA against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the samples.\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample information from vfa_cummul_pca:\n\npca_labelled <- data.frame(pca$x,\n sample_replicate = vfa_cummul_pca$sample_replicate,\n treatment = vfa_cummul_pca$treatment,\n replicate = vfa_cummul_pca$replicate,\n time_day = vfa_cummul_pca$time_day) \n\nThe dataframe should look like 
this:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPC1\nPC2\nPC3\nPC4\nsample_replicate\ntreatment\nreplicate\ntime_day\n\n\n\n-2.9592362\n0.6710553\n0.0068846\n-0.4453904\nCN10-1\nCN10\n1\n1\n\n\n-2.7153060\n0.7338367\n-0.2856872\n-0.2030110\nCN10-2\nCN10\n2\n1\n\n\n-2.7423102\n0.8246832\n-0.4964249\n-0.1434490\nCN10-3\nCN10\n3\n1\n\n\n-1.1909064\n-1.0360724\n1.1249513\n-0.7360599\nCN10-1\nCN10\n1\n3\n\n\n-1.3831563\n0.9572091\n-1.5561657\n0.0582755\nCN10-2\nCN10\n2\n3\n\n\n-1.1628940\n-0.0865412\n-0.6046780\n-0.1976743\nCN10-3\nCN10\n3\n3\n\n\n-0.2769661\n-0.2221055\n1.1579897\n-0.6079395\nCN10-1\nCN10\n1\n5\n\n\n0.3480962\n0.3612522\n0.5841649\n-0.0612366\nCN10-2\nCN10\n2\n5\n\n\n-0.7281116\n1.6179706\n-0.6430170\n0.0660727\nCN10-3\nCN10\n3\n5\n\n\n0.9333578\n-0.1339061\n1.0870945\n-0.4374103\nCN10-1\nCN10\n1\n9\n\n\n2.0277528\n0.6993342\n0.3850147\n0.0723540\nCN10-2\nCN10\n2\n9\n\n\n1.9931908\n0.5127260\n0.6605782\n0.1841974\nCN10-3\nCN10\n3\n9\n\n\n1.8365692\n-0.4189762\n0.7029015\n-0.3873133\nCN10-1\nCN10\n1\n11\n\n\n2.3313978\n0.3274834\n-0.0135608\n0.0264372\nCN10-2\nCN10\n2\n11\n\n\n1.5833035\n0.9263509\n-0.1909483\n0.1358320\nCN10-3\nCN10\n3\n11\n\n\n2.8498246\n0.3815854\n-0.4763500\n-0.0280281\nCN10-1\nCN10\n1\n13\n\n\n3.5652461\n-0.0836709\n-0.5948483\n-0.1612809\nCN10-2\nCN10\n2\n13\n\n\n4.1314944\n-1.2254642\n0.2699666\n-0.3152100\nCN10-3\nCN10\n3\n13\n\n\n3.7338024\n-0.6744610\n0.4344639\n-0.3736234\nCN10-1\nCN10\n1\n16\n\n\n3.6748427\n0.5202498\n-0.4333685\n-0.1607235\nCN10-2\nCN10\n2\n16\n\n\n3.9057053\n0.3599520\n-0.3049074\n0.0540037\nCN10-3\nCN10\n3\n16\n\n\n3.4561583\n-0.0996639\n0.4472090\n-0.0185889\nCN10-1\nCN10\n1\n18\n\n\n3.6354729\n0.3809673\n-0.0934957\n0.0018722\nCN10-2\nCN10\n2\n18\n\n\n2.9872250\n0.7890400\n-0.2361098\n-0.1628506\nCN10-3\nCN10\n3\n18\n\n\n3.3562231\n-0.2866224\n0.1331068\n-0.2056366\nCN10-1\nCN10\n1\n20\n\n\n3.2009943\n0.4795967\n-0.2092384\n-0.5962183\nCN10-2\nCN10\n2\n20\n\n\n3.9948127\n0.7772640\n-0.3181372\n0.1218382\nCN10-3\nCN10\n3\n20\n\n\n2.8874207\n0.4554681\n0.3106044\n-0.2220240\nCN10-1\nCN10\n1\n22\n\n\n3.6868864\n0.9681097\n-0.2174166\n-0.2246775\nCN10-2\nCN10\n2\n22\n\n\n4.8689622\n0.5218563\n-0.2906042\n0.3532981\nCN10-3\nCN10\n3\n22\n\n\n-3.8483418\n1.5205541\n-0.8809715\n-0.5306228\nNC-1\nNC\n1\n1\n\n\n-3.7653460\n1.5598499\n-1.0570798\n-0.4075397\nNC-2\nNC\n2\n1\n\n\n-3.8586309\n1.6044929\n-1.0936576\n-0.4292404\nNC-3\nNC\n3\n1\n\n\n-2.6934553\n-0.9198406\n0.7439841\n-0.9881115\nNC-1\nNC\n1\n3\n\n\n-2.5064076\n-1.0856761\n0.6334250\n-0.8999028\nNC-2\nNC\n2\n3\n\n\n-2.4097945\n-1.2731546\n1.1767665\n-0.8715948\nNC-3\nNC\n3\n3\n\n\n-3.0567309\n0.5804906\n-0.1391344\n-0.3701763\nNC-1\nNC\n1\n5\n\n\n-2.3511737\n-0.3692016\n0.7053757\n-0.3284113\nNC-2\nNC\n2\n5\n\n\n-2.6752311\n-0.0637855\n0.4692194\n-0.3841240\nNC-3\nNC\n3\n5\n\n\n-1.2335368\n-0.6717374\n0.2155285\n0.1060486\nNC-1\nNC\n1\n9\n\n\n-1.6550689\n0.1576557\n0.0687658\n0.2750388\nNC-2\nNC\n2\n9\n\n\n-0.8948103\n-0.8171884\n0.8062876\n0.5032756\nNC-3\nNC\n3\n9\n\n\n-1.2512737\n-0.4720993\n0.4071788\n0.4693106\nNC-1\nNC\n1\n11\n\n\n-1.8091407\n0.0552546\n0.0424090\n0.3918222\nNC-2\nNC\n2\n11\n\n\n-2.4225566\n0.4998948\n-0.1987773\n0.1959282\nNC-3\nNC\n3\n11\n\n\n-0.9193427\n-0.7741826\n0.0918984\n0.5089847\nNC-1\nNC\n1\n13\n\n\n-0.8800183\n-0.7850404\n0.0895146\n0.6050052\nNC-2\nNC\n2\n13\n\n\n-1.3075763\n-0.2525829\n-0.2993318\n0.5874269\nNC-3\nNC\n3\n13\n\n\n-0.9543813\n-0.3170305\n0.0885062\n0.7153071\nNC-1\nNC\n1\n16\n\n\n-0.4303679\n-0.9952374\n0.2038883\n0.8214647\nNC-2\nNC\n2\n16\
n\n\n-0.9457300\n-0.7180646\n0.3081282\n0.6563748\nNC-3\nNC\n3\n16\n\n\n-1.3830063\n0.0614677\n-0.2805342\n0.5462137\nNC-1\nNC\n1\n18\n\n\n-0.7960522\n-0.5792768\n-0.0369684\n0.6621526\nNC-2\nNC\n2\n18\n\n\n-1.6822927\n0.1041656\n0.0634251\n0.4337240\nNC-3\nNC\n3\n18\n\n\n-1.3157478\n-0.0835664\n-0.1246253\n0.5599467\nNC-1\nNC\n1\n20\n\n\n-1.7425068\n0.3029227\n-0.0161466\n0.5134360\nNC-2\nNC\n2\n20\n\n\n-1.3970678\n-0.2923056\n0.4324586\n0.4765460\nNC-3\nNC\n3\n20\n\n\n-1.0777451\n-0.1232925\n0.2388682\n0.7585307\nNC-1\nNC\n1\n22\n\n\n0.4851039\n-4.1291445\n-4.0625050\n-0.4582436\nNC-2\nNC\n2\n22\n\n\n-1.0516226\n-0.7228479\n1.0641320\n0.4955951\nNC-3\nNC\n3\n22\n\n\n\n\n\n🎬 Plot PC1 against PC2 and colour by time and shape by treatment:\n\npca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = factor(time_day),\n shape = treatment)) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Time\") +\n scale_shape_manual(values = c(17, 19),\n name = NULL) +\n theme_classic()\n\n\n\n\n🎬 Plot PC1 against PC2 and colour by time and facet treatment:\n\npca_labelled |> \n ggplot(aes(x = PC1, y = PC2, colour = factor(time_day))) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Time\") +\n facet_wrap(~treatment, ncol = 1) +\n theme_classic()\n\n\n\n\nReplicates are similar at the same time and treatment, especially early, as we might expect. PC1 is essentially an axis of time.\n\nWe are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we can use mat.\n🎬 Set the rownames to the sample id which is a combination of sample_replicate and time_day:\n\nrownames(mat) <- interaction(vfa_cummul_pca$sample_replicate, \n vfa_cummul_pca$time_day)\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the vfa to be the same since it makes sense to see what clusters of VFAs correlate with the treatments.\n🎬 Set the number of clusters for the treatments and vfa:\n\nn_treatment_clusters <- 2\nn_vfa_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"column\",\n k_col = n_vfa_clusters,\n k_row = n_treatment_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = colnames(mat),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bar is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE in the heatmaply() function.\nOne of the NC replicates at time = 22 is very different from the other replicates. The CN10 treatments cluster together at high time points. CN10 samples are more similar to NC samples early on. Most of the VFAs behave similarly with highest values later in the experiment for CN10 but isohexanoate and hexanoate differ. The difference might be because isohexanoate is especially low in the NC replicates at time = 1 and hexanoate is especially high in the NC replicate 2 at time = 22.\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 
2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
},
{
- "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-7",
- "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-7",
- "title": "Independent Study to prepare for workshop",
- "section": "Using the usethis package",
- "text": "Using the usethis package\nOnce the RStudio project has been created, usethis helps you follow good practice.\n\n🎬 We can add a README with:\n\nusethis::use_readme_md()\n\n\n\nThis creates a file called README.md, with a little default text, in the Project directory and opens it for editing.\n\n\nmd stands for markdown, it is a extremely widely used text formatting language which is readable as plain text. If you have ever used asterisks to make text bold or italic, you have used markdown."
+ "objectID": "omics/kelly/workshop.html#overview",
+ "href": "omics/kelly/workshop.html#overview",
+ "title": "Kelly",
+ "section": "",
+ "text": "VFAs from AD vials\n\nTwo treatments: straw (CN10) and water (NC)\n10 time points: 1, 3, 5, 9, 11, 13, 16, 18, 20, 22\nthree replicates per treatment per time point\n2 x 10 x 3 = 60 groups\n8 VFA with concentration in mM (millimolar): acetate, propanoate, isobutyrate, butyrate, isopentanoate, pentanoate, isohexanoate, hexanoate\n\nTo calculate from this data\n\nRecalculate the data into grams per litre\n\nconvert to molar: 1 millimolar to molar = 0.001 molar\nmultiply by the molecular weight of each VFA\n\n\nCalculate Change in VFA g/l with time\nCalculate the percent representation of each VFA, by mM and by weight"
},
{
- "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-1",
- "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Code formatting and style",
- "text": "Code formatting and style\n\n“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”\n\nThe tidyverse style guide"
+ "objectID": "omics/kelly/workshop.html#data-files",
+ "href": "omics/kelly/workshop.html#data-files",
+ "title": "Kelly",
+ "section": "",
+ "text": "8 VFA in mM for 60 samples vfa.csv\nMolecular weights for each VFA in grams per mole mol_wt.txt"
},
{
- "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-2",
- "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-2",
- "title": "Independent Study to prepare for workshop",
- "section": "Code formatting and style",
- "text": "Code formatting and style\nWe have all written code which is hard to read!\nWe all improve over time.\n\n\n\nThe only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code— Hadley Wickham (@hadleywickham) April 17, 2015"
+ "objectID": "omics/kelly/workshop.html#set-up-a-project",
+ "href": "omics/kelly/workshop.html#set-up-a-project",
+ "title": "Kelly",
+ "section": "",
+ "text": "🎬 Start RStudio from the Start menu\n🎬 Make an RStudio project. Be deliberate about where you create it so that it is a good place for you\n🎬 Use the Files pane to make new folders for the data. I suggest data-raw and data-processed\n🎬 Make a new script called analysis.R to carry out the rest of the work.\n🎬 Load tidyverse (Wickham et al. 2019) for importing, summarising, plotting and filtering.\n\nlibrary(tidyverse)"
},
{
- "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-3",
- "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-3",
- "title": "Independent Study to prepare for workshop",
- "section": "Code formatting and style",
- "text": "Code formatting and style\nSome keys points:\n\nbe consistent, emulate experienced coders\n\nuse snake_case for variable names (not CamelCase, dot.case)\n\nuse <- not = for assignment\n\nuse spacing around most operators and after commas\n\nuse indentation\n\navoid long lines, break up code blocks with new lines\n\nuse \" for quoting text (not ') unless the text contains double quotes"
+ "objectID": "omics/kelly/workshop.html#examine-the-data",
+ "href": "omics/kelly/workshop.html#examine-the-data",
+ "title": "Kelly",
+ "section": "",
+ "text": "🎬 Save the files to data-raw. Open them and examine them. You may want to use Excel for the csv file.\n🎬 Answer the following questions:\n\nWhat is in the rows and columns of each file?\nHow many rows and columns are there in each file?\nHow are the data organised ?"
},
{
- "objectID": "core/week-2/study_before_workshop.html#ugly-code",
- "href": "core/week-2/study_before_workshop.html#ugly-code",
- "title": "Independent Study to prepare for workshop",
- "section": "😩 Ugly code 😩",
- "text": "😩 Ugly code 😩\n\ndata<-read_csv('../data-raw/Y101_Y102_Y201_Y202_Y101-5.csv',skip=2)\nlibrary(janitor);sol<-clean_names(data)\ndata=data|>filter(str_detect(description,\"OS=Homo sapiens\"))|>filter(x1pep=='x')\ndata=data|>\nmutate(g=str_extract(description,\n\"GN=[^\\\\s]+\")|>str_replace(\"GN=\",''))\ndata<-data|>mutate(id=str_extract(accession,\"1::[^;]+\")|>str_replace(\"1::\",\"\"))"
+ "objectID": "omics/kelly/workshop.html#import",
+ "href": "omics/kelly/workshop.html#import",
+ "title": "Kelly",
+ "section": "",
+ "text": "🎬 Import\n\nvfa_cummul <- read_csv(\"data-raw/vfa.csv\") |> janitor::clean_names()\n\n🎬 Split treatment and replicate to separate columns so there is a treatment column:\n\nvfa_cummul <- vfa_cummul |> \n separate(col = sample_replicate, \n into = c(\"treatment\", \"replicate\"), \n sep = \"-\",\n remove = FALSE)\n\nThe provided data is cumulative/absolute. We need to calculate the change in VFA with time. There is a function, lag() that will help us do this. It will take the previous value and subtract it from the current value. We need to do that separately for each sample_replicate so we need to group by sample_replicate first. We also need to make sure the data is in the right order so we will arrange by sample_replicate and time_day.\n🎬 Create dataframe for the change in VFA\n\nvfa_delta <- vfa_cummul |> \n group_by(sample_replicate) |> \n arrange(sample_replicate, time_day) |>\n mutate(acetate = acetate - lag(acetate),\n propanoate = propanoate - lag(propanoate),\n isobutyrate = isobutyrate - lag(isobutyrate),\n butyrate = butyrate - lag(butyrate),\n isopentanoate = isopentanoate - lag(isopentanoate),\n pentanoate = pentanoate - lag(pentanoate),\n isohexanoate = isohexanoate - lag(isohexanoate),\n hexanoate = hexanoate - lag(hexanoate))\n\nNow we have two dataframes, one for the cumulative data and one for the change in VFA.\nTo make conversions from mM to g/l we need to do mM * 0.001 * MW. We will import the molecular weight data, pivot the VFA data to long format and join the molecular weight data to the VFA data. Then we can calculate the g/l. We will do this for both the cumulative and delta dataframes.\n🎬 import molecular weight data\n\nmol_wt <- read_table(\"data-raw/mol_wt.txt\") |>\n mutate(vfa = tolower(vfa))\n\n🎬 Pivot the cumulative data to long format:\nView vfa_cummul to check you understand what you have done.\n🎬 Join molecular weight to data and calculate g/l (mutate to convert to g/l * 0.001 * MW):\n\nvfa_cummul <- vfa_cummul |> \n left_join(mol_wt, by = \"vfa\") |>\n mutate(conc_g_l = conc_mM * 0.001 * mw)\n\nView vfa_cummul to check you understand what you have done.\n🎬 Add a column which is the percent representation of each VFA for mM and g/l:\n\nvfa_cummul <- vfa_cummul |> \n group_by(sample_replicate, time_day) |> \n mutate(percent_conc_g_l = conc_g_l / sum(conc_g_l) * 100,\n percent_conc_mM = conc_mM / sum(conc_mM) * 100)\n\n🎬 Pivot the change data, delta_vfa to long format:\nView vfa_delta to check it looks like vfa_cummul\n🎬 Join molecular weight to data and calculate g/l (mutate to convert to g/l * 0.001 * MW):"
},
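The pivot to long format and the join for the change data are left as exercises in the entry above. Here is a minimal sketch in R, assuming the VFA columns run contiguously from acetate to hexanoate and adopting the vfa and conc_mM names used later in the walkthrough:

# pivot the cumulative data to long format: one row per sample,
# time point and VFA, with the concentration in a conc_mM column
vfa_cummul <- vfa_cummul |>
  pivot_longer(cols = acetate:hexanoate,
               names_to = "vfa",
               values_to = "conc_mM")

# the same reshape for the change data, then the join and
# conversion to g/l exactly as for the cumulative data
vfa_delta <- vfa_delta |>
  pivot_longer(cols = acetate:hexanoate,
               names_to = "vfa",
               values_to = "conc_mM") |>
  left_join(mol_wt, by = "vfa") |>
  mutate(conc_g_l = conc_mM * 0.001 * mw)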
{
- "objectID": "core/week-2/study_before_workshop.html#ugly-code-1",
- "href": "core/week-2/study_before_workshop.html#ugly-code-1",
- "title": "Independent Study to prepare for workshop",
- "section": "😩 Ugly code 😩",
- "text": "😩 Ugly code 😩\n\nno spacing or indentation\ninconsistent splitting of code blocks over lines\ninconsistent use of quote characters\nno comments\nvariable names convey no meaning\nuse of = for assignment and inconsistently\nmultiple commands on a line\nlibrary statement in the middle of the analysis"
+ "objectID": "omics/kelly/workshop.html#graphs",
+ "href": "omics/kelly/workshop.html#graphs",
+ "title": "Kelly",
+ "section": "",
+ "text": "🎬 Make summary data for graphing\n\nvfa_cummul_summary <- vfa_cummul |> \n group_by(treatment, time_day, vfa) |> \n summarise(mean_g_l = mean(conc_g_l),\n se_g_l = sd(conc_g_l)/sqrt(length(conc_g_l)),\n mean_mM = mean(conc_mM),\n se_mM = sd(conc_mM)/sqrt(length(conc_mM))) |> \n ungroup()\n\n\nvfa_delta_summary <- vfa_delta |> \n group_by(treatment, time_day, vfa) |> \n summarise(mean_g_l = mean(conc_g_l),\n se_g_l = sd(conc_g_l)/sqrt(length(conc_g_l)),\n mean_mM = mean(conc_mM),\n se_mM = sd(conc_mM)/sqrt(length(conc_mM))) |> \n ungroup()\n\n🎬 Graph the cumulative data, grams per litre:\n\nvfa_cummul_summary |> \n ggplot(aes(x = time_day, colour = vfa)) +\n geom_line(aes(y = mean_g_l), \n linewidth = 1) +\n geom_errorbar(aes(ymin = mean_g_l - se_g_l,\n ymax = mean_g_l + se_g_l),\n width = 0.5, \n show.legend = F,\n linewidth = 1) +\n scale_color_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean VFA concentration (g/l)\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())\n\n\n\n\n🎬 Graph the change data, grams per litre:\n\nvfa_delta_summary |> \n ggplot(aes(x = time_day, colour = vfa)) +\n geom_line(aes(y = mean_g_l), \n linewidth = 1) +\n geom_errorbar(aes(ymin = mean_g_l - se_g_l,\n ymax = mean_g_l + se_g_l),\n width = 0.5, \n show.legend = F,\n linewidth = 1) +\n scale_color_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean change in VFA concentration (g/l)\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())\n\n\n\n\n🎬 Graph the mean percent representation of each VFA g/l. Note geom_col() will plot proportion if we setposition = \"fill\"\n\nvfa_cummul_summary |> \n ggplot(aes(x = time_day, y = mean_g_l, fill = vfa)) +\n geom_col(position = \"fill\") +\n scale_fill_viridis_d(name = NULL) +\n scale_x_continuous(name = \"Time (days)\") +\n scale_y_continuous(name = \"Mean Proportion VFA\") +\n theme_bw() +\n facet_wrap(~treatment) +\n theme(strip.background = element_blank())"
},
{
- "objectID": "core/week-2/study_before_workshop.html#cool-code",
- "href": "core/week-2/study_before_workshop.html#cool-code",
- "title": "Independent Study to prepare for workshop",
- "section": "😎 Cool code 😎",
- "text": "😎 Cool code 😎\n\n# Packages ----------------------------------------------------------------\nlibrary(tidyverse)\nlibrary(janitor)\n\n# Import ------------------------------------------------------------------\n\n# define file name\nfile <- \"../data-raw/Y101_Y102_Y201_Y202_Y101-5.csv\"\n\n# import: column headers and data are from row 3\nsolu_protein <- read_csv(file, skip = 2) |>\n janitor::clean_names()\n\n# Tidy data ----------------------------------------------------------------\n\n# filter out the bovine proteins and those proteins \n# identified from fewer than 2 peptides\nsolu_protein <- solu_protein |>\n filter(str_detect(description, \"OS=Homo sapiens\")) |>\n filter(x1pep == \"x\")\n\n# Extract the genename from description column to a column\n# of its own\nsolu_protein <- solu_protein |>\n mutate(genename = str_extract(description,\"GN=[^\\\\s]+\") |>\n str_replace(\"GN=\", \"\"))\n\n# Extract the top protein identifier from accession column (first\n# Uniprot ID after \"1::\") to a column of its own\nsolu_protein <- solu_protein |>\n mutate(protid = str_extract(accession, \"1::[^;]+\") |>\n str_replace(\"1::\", \"\"))"
- },
- {
- "objectID": "core/week-2/study_before_workshop.html#cool-code-1",
- "href": "core/week-2/study_before_workshop.html#cool-code-1",
- "title": "Independent Study to prepare for workshop",
- "section": "😎 Cool code 😎",
- "text": "😎 Cool code 😎\n\nlibrary() calls collected\nUses code sections to make it easier to navigate\nUses white space and proper indentation\nCommented\nUses more informative name for the dataframe"
- },
- {
- "objectID": "core/week-2/study_before_workshop.html#code-algorithmically-1",
- "href": "core/week-2/study_before_workshop.html#code-algorithmically-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Code ‘algorithmically’",
- "text": "Code ‘algorithmically’\n\n\nWrite code which expresses the structure of the problem/solution.\nAvoid hard coding numbers if at all possible - declare variables instead\nDeclare frequently used values as variables at the start e.g., colour schemes, figure saving settings"
- },
- {
- "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers.",
- "href": "core/week-2/study_before_workshop.html#hard-coding-numbers.",
- "title": "Independent Study to prepare for workshop",
- "section": "😩 Hard coding numbers.",
- "text": "😩 Hard coding numbers.\n\n\nSuppose we want to calculate the sums of squares, \\(SS(x)\\), for the number of eggs in five nests.\nThe formula is given by: \\(\\sum (x_i- \\bar{x})^2\\)\nWe could calculate the mean and copy it, and the individual numbers into the formula"
+ "objectID": "omics/kelly/workshop.html#view-the-relationship-between-samples-using-pca",
+ "href": "omics/kelly/workshop.html#view-the-relationship-between-samples-using-pca",
+ "title": "Kelly",
+ "section": "",
+ "text": "We have 8 genes in our dataset. PCA will allow us to plot our samples in the “VFA” space so we can see if treatments, time or replicate cluster.\nHowever, PCA expects a matrix with samples in rows and VFA, the variables, in columns. We will need to select the columns we need and pivot wider. Then convert to a matrix.\n🎬\n\nvfa_cummul_pca <- vfa_cummul |> \n select(sample_replicate, \n treatment, \n replicate, \n time_day, \n vfa, \n conc_g_l) |> \n pivot_wider(names_from = vfa, \n values_from = conc_g_l)\n\n\nmat <- vfa_cummul_pca |> \n ungroup() |>\n select(-sample_replicate, \n -treatment, \n -replicate, \n -time_day) |> \n as.matrix()\n\n🎬 Perform PCA on the matrix:\n\npca <- mat |>\n prcomp(scale. = TRUE, \n rank. = 4) \n\nThe scale. argument tells prcomp() to scale the data to have a mean of 0 and a standard deviation of 1. The rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=4 (out of 8) components:\n PC1 PC2 PC3 PC4\nStandard deviation 2.4977 0.9026 0.77959 0.45567\nProportion of Variance 0.7798 0.1018 0.07597 0.02595\nCumulative Proportion 0.7798 0.8816 0.95760 0.98355\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.7798 of the variance, the second 0.1018, and the third 0.07597. Together the first three components explain nearly 96% of the total variance in the data. Plotting PC1 against PC2 will capture about 78% of the variance which is likely much better than we would get plotting any two VFA against each other. 
To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the samples.\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample information from vfa_cummul_pca:\n\npca_labelled <- data.frame(pca$x,\n sample_replicate = vfa_cummul_pca$sample_replicate,\n treatment = vfa_cummul_pca$treatment,\n replicate = vfa_cummul_pca$replicate,\n time_day = vfa_cummul_pca$time_day) \n\nThe dataframe should look like this:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPC1\nPC2\nPC3\nPC4\nsample_replicate\ntreatment\nreplicate\ntime_day\n\n\n\n-2.9592362\n0.6710553\n0.0068846\n-0.4453904\nCN10-1\nCN10\n1\n1\n\n\n-2.7153060\n0.7338367\n-0.2856872\n-0.2030110\nCN10-2\nCN10\n2\n1\n\n\n-2.7423102\n0.8246832\n-0.4964249\n-0.1434490\nCN10-3\nCN10\n3\n1\n\n\n-1.1909064\n-1.0360724\n1.1249513\n-0.7360599\nCN10-1\nCN10\n1\n3\n\n\n-1.3831563\n0.9572091\n-1.5561657\n0.0582755\nCN10-2\nCN10\n2\n3\n\n\n-1.1628940\n-0.0865412\n-0.6046780\n-0.1976743\nCN10-3\nCN10\n3\n3\n\n\n-0.2769661\n-0.2221055\n1.1579897\n-0.6079395\nCN10-1\nCN10\n1\n5\n\n\n0.3480962\n0.3612522\n0.5841649\n-0.0612366\nCN10-2\nCN10\n2\n5\n\n\n-0.7281116\n1.6179706\n-0.6430170\n0.0660727\nCN10-3\nCN10\n3\n5\n\n\n0.9333578\n-0.1339061\n1.0870945\n-0.4374103\nCN10-1\nCN10\n1\n9\n\n\n2.0277528\n0.6993342\n0.3850147\n0.0723540\nCN10-2\nCN10\n2\n9\n\n\n1.9931908\n0.5127260\n0.6605782\n0.1841974\nCN10-3\nCN10\n3\n9\n\n\n1.8365692\n-0.4189762\n0.7029015\n-0.3873133\nCN10-1\nCN10\n1\n11\n\n\n2.3313978\n0.3274834\n-0.0135608\n0.0264372\nCN10-2\nCN10\n2\n11\n\n\n1.5833035\n0.9263509\n-0.1909483\n0.1358320\nCN10-3\nCN10\n3\n11\n\n\n2.8498246\n0.3815854\n-0.4763500\n-0.0280281\nCN10-1\nCN10\n1\n13\n\n\n3.5652461\n-0.0836709\n-0.5948483\n-0.1612809\nCN10-2\nCN10\n2\n13\n\n\n4.1314944\n-1.2254642\n0.2699666\n-0.3152100\nCN10-3\nCN10\n3\n13\n\n\n3.7338024\n-0.6744610\n0.4344639\n-0.3736234\nCN10-1\nCN10\n1\n16\n\n\n3.6748427\n0.5202498\n-0.4333685\n-0.1607235\nCN10-2\nCN10\n2\n16\n\n\n3.9057053\n0.3599520\n-0.3049074\n0.0540037\nCN10-3\nCN10\n3\n16\n\n\n3.4561583\n-0.0996639\n0.4472090\n-0.0185889\nCN10-1\nCN10\n1\n18\n\n\n3.6354729\n0.3809673\n-0.0934957\n0.0018722\nCN10-2\nCN10\n2\n18\n\n\n2.9872250\n0.7890400\n-0.2361098\n-0.1628506\nCN10-3\nCN10\n3\n18\n\n\n3.3562231\n-0.2866224\n0.1331068\n-0.2056366\nCN10-1\nCN10\n1\n20\n\n\n3.2009943\n0.4795967\n-0.2092384\n-0.5962183\nCN10-2\nCN10\n2\n20\n\n\n3.9948127\n0.7772640\n-0.3181372\n0.1218382\nCN10-3\nCN10\n3\n20\n\n\n2.8874207\n0.4554681\n0.3106044\n-0.2220240\nCN10-1\nCN10\n1\n22\n\n\n3.6868864\n0.9681097\n-0.2174166\n-0.2246775\nCN10-2\nCN10\n2\n22\n\n\n4.8689622\n0.5218563\n-0.2906042\n0.3532981\nCN10-3\nCN10\n3\n22\n\n\n-3.8483418\n1.5205541\n-0.8809715\n-0.5306228\nNC-1\nNC\n1\n1\n\n\n-3.7653460\n1.5598499\n-1.0570798\n-0.4075397\nNC-2\nNC\n2\n1\n\n\n-3.8586309\n1.6044929\n-1.0936576\n-0.4292404\nNC-3\nNC\n3\n1\n\n\n-2.6934553\n-0.9198406\n0.7439841\n-0.9881115\nNC-1\nNC\n1\n3\n\n\n-2.5064076\n-1.0856761\n0.6334250\n-0.8999028\nNC-2\nNC\n2\n3\n\n\n-2.4097945\n-1.2731546\n1.1767665\n-0.8715948\nNC-3\nNC\n3\n3\n\n\n-3.0567309\n0.5804906\n-0.1391344\n-0.3701763\nNC-1\nNC\n1\n5\n\n\n-2.3511737\n-0.3692016\n0.7053757\n-0.3284113\nNC-2\nNC\n2\n5\n\n\n-2.6752311\n-0.0637855\n0.4692194\n-0.3841240\nNC-3\nNC\n3\n5\n\n\n-1.2335368\n-0.6717374\n0.2155285\n0.1060486\nNC-1\nNC\n1\n9\n\n\n-1.6550689\n0.1576557\n0.0687658\n0.2750388\nNC-2\nNC\n2\n9\n\n\n-0.8948103\n-0.8171884\n0.8062876\n0.5032756\nNC-3\nNC\n3\n9\n\n\n-1.2512737\n-0.4720993\n0.4071788\n0.46931
06\nNC-1\nNC\n1\n11\n\n\n-1.8091407\n0.0552546\n0.0424090\n0.3918222\nNC-2\nNC\n2\n11\n\n\n-2.4225566\n0.4998948\n-0.1987773\n0.1959282\nNC-3\nNC\n3\n11\n\n\n-0.9193427\n-0.7741826\n0.0918984\n0.5089847\nNC-1\nNC\n1\n13\n\n\n-0.8800183\n-0.7850404\n0.0895146\n0.6050052\nNC-2\nNC\n2\n13\n\n\n-1.3075763\n-0.2525829\n-0.2993318\n0.5874269\nNC-3\nNC\n3\n13\n\n\n-0.9543813\n-0.3170305\n0.0885062\n0.7153071\nNC-1\nNC\n1\n16\n\n\n-0.4303679\n-0.9952374\n0.2038883\n0.8214647\nNC-2\nNC\n2\n16\n\n\n-0.9457300\n-0.7180646\n0.3081282\n0.6563748\nNC-3\nNC\n3\n16\n\n\n-1.3830063\n0.0614677\n-0.2805342\n0.5462137\nNC-1\nNC\n1\n18\n\n\n-0.7960522\n-0.5792768\n-0.0369684\n0.6621526\nNC-2\nNC\n2\n18\n\n\n-1.6822927\n0.1041656\n0.0634251\n0.4337240\nNC-3\nNC\n3\n18\n\n\n-1.3157478\n-0.0835664\n-0.1246253\n0.5599467\nNC-1\nNC\n1\n20\n\n\n-1.7425068\n0.3029227\n-0.0161466\n0.5134360\nNC-2\nNC\n2\n20\n\n\n-1.3970678\n-0.2923056\n0.4324586\n0.4765460\nNC-3\nNC\n3\n20\n\n\n-1.0777451\n-0.1232925\n0.2388682\n0.7585307\nNC-1\nNC\n1\n22\n\n\n0.4851039\n-4.1291445\n-4.0625050\n-0.4582436\nNC-2\nNC\n2\n22\n\n\n-1.0516226\n-0.7228479\n1.0641320\n0.4955951\nNC-3\nNC\n3\n22\n\n\n\n\n\n🎬 Plot PC1 against PC2 and colour by time and shape by treatment:\n\npca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = factor(time_day),\n shape = treatment)) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Time\") +\n scale_shape_manual(values = c(17, 19),\n name = NULL) +\n theme_classic()\n\n\n\n\n🎬 Plot PC1 against PC2 and colour by time and facet treatment:\n\npca_labelled |> \n ggplot(aes(x = PC1, y = PC2, colour = factor(time_day))) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Time\") +\n facet_wrap(~treatment, ncol = 1) +\n theme_classic()\n\n\n\n\nReplicates are similar at the same time and treatment, especially early, as we might expect. PC1 is essentially an axis of time."
},
{
- "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers.-1",
- "href": "core/week-2/study_before_workshop.html#hard-coding-numbers.-1",
- "title": "Independent Study to prepare for workshop",
- "section": "😩 Hard coding numbers.",
- "text": "😩 Hard coding numbers.\n\n# mean number of eggs per nest\nsum(3, 5, 6, 7, 8) / 5\n\n[1] 5.8\n\n# ss(x) of number of eggs\n(3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2\n\n[1] 14.8\n\n\nI am coding the calculation of the mean rather using the mean() function only to explain what ‘coding algorithmically’ means using a simple example."
+ "objectID": "omics/kelly/workshop.html#visualise-the-vfa-concentration-using-a-heatmap",
+ "href": "omics/kelly/workshop.html#visualise-the-vfa-concentration-using-a-heatmap",
+ "title": "Kelly",
+ "section": "",
+ "text": "We are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we can use mat\n🎬 Set the rownames to the sample id whihcih is combination of sample_replicate and time_day:\n\nrownames(mat) <- interaction(vfa_cummul_pca$sample_replicate, \n vfa_cummul_pca$time_day)\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the vfa to be the same since it makes sense to see what clusters of genes correlate with the treatments.\n🎬 Set the number of clusters for the treatments and vfa:\n\nn_treatment_clusters <- 2\nn_vfa_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"column\",\n k_col = n_vfa_clusters,\n k_row = n_treatment_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = colnames(mat),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function.\nOne of the NC replicates at time = 22 is very different from the other replicates. The CN10 treatments cluster together at high time points. CN10 samples are more similar to NC samples early on. Most of the VFAs behave similarly with highest values later in the experiment for CN10 but isohexanoate and hexanoate differ. The difference might be because isohexanoate is especially low in the NC replicates at time = 1 and hexanoate is especially high in the NC replicate 2 at time = 22\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
},
{
- "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers",
- "href": "core/week-2/study_before_workshop.html#hard-coding-numbers",
- "title": "Independent Study to prepare for workshop",
- "section": "😩 Hard coding numbers",
- "text": "😩 Hard coding numbers\n\n\nif any of the sample numbers must be altered, all the code needs changing\nit is hard to tell that the output of the first line is a mean\nits hard to recognise that the numbers in the mean calculation correspond to those in the next calculation\nit is hard to tell that 5 is just the number of nests\nno way of know if numbers are the same by coincidence or they refer to the same thing"
+ "objectID": "omics/week-5/workshop.html",
+ "href": "omics/week-5/workshop.html",
+ "title": "Workshop",
+ "section": "",
+ "text": "In the workshop, you will learn how to merge gene information into our results, conduct and plot a Principle Component Analysis (PCA) as well as how to create a nicely formatted Volcano plot and heatmap."
},
{
- "objectID": "core/week-2/study_before_workshop.html#better",
- "href": "core/week-2/study_before_workshop.html#better",
- "title": "Independent Study to prepare for workshop",
- "section": "😎 Better",
- "text": "😎 Better\n\n# eggs each nest\neggs <- c(3, 5, 6, 7, 8)\n\n# mean eggs per nest\nmean_eggs <- sum(eggs) / length(eggs)\n\n# ss(x) of number of eggs\nsum((eggs - mean_eggs)^2)\n\n[1] 14.8"
+ "objectID": "omics/week-5/workshop.html#session-overview",
+ "href": "omics/week-5/workshop.html#session-overview",
+ "title": "Workshop",
+ "section": "",
+ "text": "In the workshop, you will learn how to merge gene information into our results, conduct and plot a Principle Component Analysis (PCA) as well as how to create a nicely formatted Volcano plot and heatmap."
},
{
- "objectID": "core/week-2/study_before_workshop.html#better-1",
- "href": "core/week-2/study_before_workshop.html#better-1",
- "title": "Independent Study to prepare for workshop",
- "section": "😎 Better",
- "text": "😎 Better\n\n\nthe commenting is similar but it is easier to follow\nif any of the sample numbers must be altered, only that number needs changing\nassigning a value you will later use to a variable with a meaningful name allows us to understand the first and second calculations\nmakes use of R’s elementwise calculation which resembles the formula (i.e., is expressed as the general rule)"
+ "objectID": "omics/week-5/workshop.html#import",
+ "href": "omics/week-5/workshop.html#import",
+ "title": "Workshop",
+ "section": "Import",
+ "text": "Import\nWe need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.\n🎬 Import files saved from last week from the results folder: S30_normalised_counts.csv and S30_results.csv. I used the names s30_count_norm and s30_results for the dataframes.\n🎬 Remind yourself what is in the rows and columns and the structure of the dataframes (perhaps using glimpse())\n\n\n\n\n\n\n\n\n\n\n\n\n\nIt is useful to have this information in a single dataframe to which we will add the gene information from xenbase. Having all the information together will make it easier to interpret the results and select genes of interest.\n🎬 Merge the two dataframes:\n\n# merge the results with the normalised counts\ns30_results <- s30_count_norm |>\n left_join(s30_results, by = \"xenbase_gene_id\")\n\nThis means you have the counts for each sample along with the statistical results for each gene."
},
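The import itself is left to the reader in the entry above. A minimal sketch, assuming the two files were saved to a results folder last week; read_csv() comes from the tidyverse already loaded in the analysis:

# import the normalised counts and the statistical results
s30_count_norm <- read_csv("results/S30_normalised_counts.csv")
s30_results <- read_csv("results/S30_results.csv")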
{
- "objectID": "core/week-2/study_before_workshop.html#summary",
- "href": "core/week-2/study_before_workshop.html#summary",
- "title": "Independent Study to prepare for workshop",
- "section": "Summary",
- "text": "Summary\n\n\nUse an RStudio project for any R work (you can also incorporate other languages)\nWrite Cool code not Ugly code: space, consistency, indentation, comments, meaningful variable names\nWrite code which expresses the structure of the problem/solution.\nAvoid hard coding numbers if at all possible - declare variables instead"
+ "objectID": "omics/week-5/workshop.html#add-gene-information-from-xenbase",
+ "href": "omics/week-5/workshop.html#add-gene-information-from-xenbase",
+ "title": "Workshop",
+ "section": "Add gene information from Xenbase",
+ "text": "Add gene information from Xenbase\n\nI got the information from the Xenbase information pages under Data Reports | Gene Information\nThis is listed: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)\nClick on the readme link to see the file format and columns\nI downloaded xenbase.gpi.gz, unzipped it, removed header lines and the Xenopus tropicalis (taxon:8364) entries and saved it as xenbase_info.xlsx\n\nIf you want to emulate what I did you can use the following commands in the terminal after downloading the file:\ngunzip xenbase.gpi.gz\nless xenbase.gpi\nq\ngunzip unzips the file and less allows you to view the file. q quits the viewer. You will see the header lines and that the file contains both Xenopus tropicalis and Xenopus laevis. I read the file in with read_tsv (skipping the first header lines) then filtered out the Xenopus tropicalis entries, dropped some columns and saved the file as an excel file.\nHowever, I have already done this for you and saved the file as xenbase_info.xlsx in the meta folder. We will import this file and join it to the results dataframe.\n🎬 Load the readxl (Wickham and Bryan 2023) package:\n\nlibrary(readxl)\n\n🎬 Import the Xenbase gene information file:\n\ngene_info <- read_excel(\"meta/xenbase_info.xlsx\") \n\nYou should view the resulting dataframe to see what information is available. You can use glimpse() or View().\n🎬 Merge the gene information with the results:\n\n# join the gene info with the results\ns30_results <- s30_results |>\n left_join(gene_info, by = \"xenbase_gene_id\")\n\nWe will also find it useful to import the metadata that maps the sample names to treatments. This will allow us to label the samples in the visualisations.\n🎬 Import the metadata that maps the sample names to treatments:\n\n# Import metadata that maps the sample names to treatments\nmeta <- read_table(\"meta/frog_meta_data.txt\")\nrow.names(meta) <- meta$sample_id\n# We only need the s30\nmeta_s30 <- meta |>\n dplyr::filter(stage == \"stage_30\")"
},
{
- "objectID": "core/week-2/study_before_workshop.html#references",
- "href": "core/week-2/study_before_workshop.html#references",
- "title": "Independent Study to prepare for workshop",
- "section": "References",
- "text": "References\n\n\n🔗 About Core 2: File types, workflow tips and other tools\n\n\n\nBryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk about Version Control?” Am. Stat. 72 (1): 20–27. https://doi.org/10.1080/00031305.2017.1399928.\n\n\nBryan, Jennifer, Jim Hester, Shannon Pileggi, and E. David Aja. n.d. What They Forgot to Teach You about r. https://rstats.wtf/.\n\n\nSandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. 2013. “Ten Simple Rules for Reproducible Computational Research.” PLoS Comput. Biol. 9 (10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285.\n\n\nWilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Comput. Biol. 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510."
+ "objectID": "omics/week-5/workshop.html#log2-transform-the-data",
+ "href": "omics/week-5/workshop.html#log2-transform-the-data",
+ "title": "Workshop",
+ "section": "log2 transform the data",
+ "text": "log2 transform the data\nWe use the normalised counts for data visualisations so that the comparisons are meaningful. Since the fold changes are given is log2 it is useful to log2 transform the normalised counts too. We will add columns to the dataframe with these transformed values. Since we have some counts of 0 we will add a tiny amount to avoid -Inf values.\n🎬 log2 transform the normalised counts:\n\n# log2 transform the counts plus a tiny amount to avoid log(0)\ns30_results <- s30_results |>\n mutate(across(starts_with(\"s30\"), \n \\(x) log2(x + 0.001),\n .names = \"log2_{.col}\"))\n\nThis is a wonderful bit or R wizardry. We are using the across() function to apply a transformation to multiple columns. We have selected all the columns that start with s30. The \\(x) is an “anonymous” function that takes the value of the column and adds 0.001 to it before applying the log2() function. The .names = \"log2_{.col}\" argument tells across() to name the new columns with the prefix log2_ followed by the original column name. You can read more about across() and anonymous functions from my posit::conf(2023) workshop\nI recommend viewing the dataframe to see the new columns.\nWe now have dataframe with all the information we need: normalised counts, log2 normalised counts, statistical comparisons with fold changes and p values, and information about the gene other than just the id."
},
{
- "objectID": "core/week-1/workshop.html",
- "href": "core/week-1/workshop.html",
+ "objectID": "omics/week-5/workshop.html#write-the-significant-genes-to-file",
+ "href": "omics/week-5/workshop.html#write-the-significant-genes-to-file",
"title": "Workshop",
- "section": "",
- "text": "In this workshop we will discuss why reproducibility matters and how to organise your work to make it reproducible. We will cover:"
+ "section": "Write the significant genes to file",
+ "text": "Write the significant genes to file\nWe will create dataframe of the significant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.\n🎬 Create a dataframe of the genes significant at the 0.01 level:\n\ns30_results_sig0.01 <- s30_results |> \n filter(padj <= 0.01)\n\n🎬 Write the dataframe to file\n🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:\n❓How many genes are significant at the 0.01 and 0.05 levels?"
},
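The write step and the 0.05-level dataframe are exercises in the entry above. One possible version, assuming a results folder exists; the output file names here are my own choices:

# write the genes significant at the 0.01 level to file
write_csv(s30_results_sig0.01, "results/s30_results_sig0.01.csv")

# repeat for the 0.05 level
s30_results_sig0.05 <- s30_results |>
  filter(padj <= 0.05)
write_csv(s30_results_sig0.05, "results/s30_results_sig0.05.csv")

# count the genes significant at each level
nrow(s30_results_sig0.01)
nrow(s30_results_sig0.05)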
{
- "objectID": "core/week-1/workshop.html#session-overview",
- "href": "core/week-1/workshop.html#session-overview",
+ "objectID": "omics/week-5/workshop.html#view-the-relationship-between-samples-using-pca",
+ "href": "omics/week-5/workshop.html#view-the-relationship-between-samples-using-pca",
"title": "Workshop",
- "section": "",
- "text": "In this workshop we will discuss why reproducibility matters and how to organise your work to make it reproducible. We will cover:"
+ "section": "View the relationship between samples using PCA",
+ "text": "View the relationship between samples using PCA\nWe have 10,136 genes in our dataset. PCA will allow us to plot our samples in the “gene expression” space so we can see if FGF-treated sample cluster together and control samples cluster together as we would expect. We do this on the log2 transformed normalised counts.\nOur data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects samples in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.\n🎬 Transpose the log2 transformed normalised counts:\n\ns30_log2_trans <- s30_results |> \n select(starts_with(\"log2_\")) |>\n t() |> \n data.frame()\n\nWe have used the select() function to select all the columns that start with log2_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the Xenbase gene ids.\n🎬 Set the column names to the Xenbase gene ids:\n\ncolnames(s30_log2_trans) <- s30_results$xenbase_gene_id\n\n🎬 Perform PCA on the log2 transformed normalised counts:\n\npca <- s30_log2_trans |>\n prcomp(rank. = 4) \n\nThe rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=4 (out of 6) components:\n PC1 PC2 PC3 PC4\nStandard deviation 64.0124 47.3351 38.4706 31.4111\nProportion of Variance 0.4243 0.2320 0.1532 0.1022\nCumulative Proportion 0.4243 0.6562 0.8095 0.9116\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.4243 of the variance, the second 0.2320, and the third 0.1532. Together the first three components explain nearly 81% of the total variance in the data. Plotting PC1 against PC2 will capture about 66% of the variance which is likely much better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the samples.\n🎬 Remove log2 from the row names:\n\nsample_id <- row.names(s30_log2_trans) |> str_remove(\"log2_\")\n\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample ids:\n\npca_labelled <- data.frame(pca$x,\n sample_id)\n\n🎬 Merge with the metadata so we can label points by treatment and sibling pair:\n\npca_labelled <- pca_labelled |> \n left_join(meta_s30, \n by = \"sample_id\")\n\nSince the metadata contained the sample ids, it was especially important to remove the log2_ from the row names so that the join would work. 
The dataframe should look like this:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPC1\nPC2\nPC3\nPC4\nsample_id\nstage\ntreatment\nsibling_rep\n\n\n\n-76.38391\n0.814699\n-60.728327\n-5.820669\nS30_C_5\nstage_30\ncontrol\nfive\n\n\n-67.02571\n25.668563\n51.476835\n28.480254\nS30_C_6\nstage_30\ncontrol\nsix\n\n\n-14.02772\n-78.474054\n15.282058\n-9.213076\nS30_C_A\nstage_30\ncontrol\nA\n\n\n47.60726\n49.035510\n-19.288753\n20.928290\nS30_F_5\nstage_30\nFGF\nfive\n\n\n26.04954\n32.914201\n20.206072\n-55.752818\nS30_F_6\nstage_30\nFGF\nsix\n\n\n83.78054\n-29.958919\n-6.947884\n21.378020\nS30_F_A\nstage_30\nFGF\nA\n\n\n\n\n\n🎬 Plot PC1 against PC2 and colour by sibling pair and shape by treatment:\n\npca <- pca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = sibling_rep,\n shape = treatment)) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Sibling pair\",\n labels = c(\"A\", \".5\", \".6\")) +\n scale_shape_manual(values = c(21, 19),\n name = NULL,\n labels = c(\"Control\", \"FGF-Treated\")) +\n theme_classic()\npca\n\n\n\n\nThere is a good separation between treatments on PC1. The sibling pairs do not seem to cluster together.\n🎬 Save the plot to file:\n\nggsave(\"figures/frog-s30-pca.png\",\n plot = pca,\n height = 3, \n width = 4,\n units = \"in\",\n device = \"png\")"
},
{
- "objectID": "core/week-1/workshop.html#what-is-reproducibility",
- "href": "core/week-1/workshop.html#what-is-reproducibility",
+ "objectID": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap",
+ "href": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap",
"title": "Workshop",
- "section": "What is reproducibility?",
- "text": "What is reproducibility?\n\nReproducible: Same data + same analysis = identical results. “… obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with”computational reproducibility” (National Academies of Sciences et al. 2019)\nReplicable: Different data + same analysis = qualitatively similar results. The work is not dependent on the specificities of the data.\nRobust: Same data + different analysis = qualitatively similar or identical results. The work is not dependent on the specificities of the analysis.\nGeneralisable: Different data + different analysis = qualitatively similar results and same conclusions. The findings can be generalised\n\n\n\n\nThe Turing Way's definitions of reproducible research"
+ "section": "Visualise the expression of the most significant genes using a heatmap",
+ "text": "Visualise the expression of the most significant genes using a heatmap\nA heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.\nWe are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the Xenbase gene symbols.\n🎬 Convert a dataframe of the log2 values to a matrix:\n\nmat <- s30_results_sig0.01 |> \n select(starts_with(\"log2_\")) |>\n as.matrix()\n\n🎬 Set the rownames to the Xenbase gene symbols:\n\nrownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the treatments.\n🎬 Set the number of clusters for the treatments and genes:\n\nn_treatment_clusters <- 2\nn_gene_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"row\",\n k_col = n_treatment_clusters,\n k_row = n_gene_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = str_remove(colnames(mat), pattern = \"log2_\"),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nOn the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples (the pink cluster) and the other shows genes down regulated (more blue, the blue cluster) in the FGF-treated samples.\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function."
},
{
- "objectID": "core/week-1/workshop.html#why-does-it-matter",
- "href": "core/week-1/workshop.html#why-does-it-matter",
+ "objectID": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot",
+ "href": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot",
"title": "Workshop",
- "section": "Why does it matter?",
- "text": "Why does it matter?\n\n\n\nfutureself, CC-BY-NC, by Julen Colomb\n\n\n\nFive selfish reasons to work reproducibly (Markowetz 2015). Alternatively, see the very entertaining talk\nMany high profile cases of work which did not reproduce e.g. Anil Potti unravelled by Baggerly and Coombes (2009)\nWill become standard in Science and publishing e.g OECD Global Science Forum Building digital workforce capacity and skills for data-intensive science (OECD Global Science Forum 2020)"
+ "section": "Visualise all the results with a volcano plot",
+ "text": "Visualise all the results with a volcano plot\ncolour the points if padj < 0.05 and log2FoldChange > 1\n\nlibrary(ggrepel)\n\n\ns30_results <- s30_results |> \n mutate(log10_padj = -log10(padj),\n sig = padj < 0.05,\n bigfc = abs(log2FoldChange) >= 2) \n\n\nvol <- s30_results |> \n ggplot(aes(x = log2FoldChange, \n y = log10_padj, \n colour = interaction(sig, bigfc))) +\n geom_point() +\n geom_hline(yintercept = -log10(0.05), \n linetype = \"dashed\") +\n geom_vline(xintercept = 2, \n linetype = \"dashed\") +\n geom_vline(xintercept = -2, \n linetype = \"dashed\") +\n scale_x_continuous(expand = c(0, 0)) +\n scale_y_continuous(expand = c(0, 0)) +\n scale_colour_manual(values = c(\"gray\", \n \"pink\",\n \"gray30\",\n \"deeppink\")) +\n geom_text_repel(data = subset(s30_results, \n bigfc & sig),\n aes(label = xenbase_gene_symbol),\n size = 3,\n max.overlaps = 50) +\n theme_classic() +\n theme(legend.position = \"none\")\nvol\n\n\n\n\n\nggsave(\"figures/frog-s30-volcano.png\",\n plot = vol,\n height = 4.5, \n width = 4.5,\n units = \"in\",\n device = \"png\")"
},
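The four colours passed to scale_colour_manual() above map to the four combinations of sig and bigfc produced by interaction() (FALSE.FALSE, TRUE.FALSE, FALSE.TRUE, TRUE.TRUE). A quick sketch to tally how many genes fall into each category, assuming the mutated s30_results from the chunk above:

# count genes by significance and fold-change category
s30_results |>
  count(sig, bigfc)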
{
- "objectID": "core/week-1/workshop.html#how-to-achieve-reproducibility",
- "href": "core/week-1/workshop.html#how-to-achieve-reproducibility",
+ "objectID": "omics/week-5/workshop.html#import-1",
+ "href": "omics/week-5/workshop.html#import-1",
"title": "Workshop",
- "section": "How to achieve reproducibility",
- "text": "How to achieve reproducibility\n\nScripting\nOrganisation: Project-oriented workflows with file and folder structure, naming things\nDocumentation: Readme files, code comments, metadata, version control"
+ "section": "Import",
+ "text": "Import\nWe need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.\n🎬 Import the normalised counts for the Prog and HSPC cell types. I used the names prog and hspc for the dataframes.\n🎬 Combine the two dataframes (minus one set of gene ids) into one dataframe called prog_hspc:\n\n# combine into one dataframe dropping one of the gene id columns\nprog_hspc <- bind_cols(prog, hspc[-1])\n\n🎬 Import the statistical results in results/prog_hspc_results.csv. I used the name prog_hspc_results for the dataframe.\n🎬 Remind yourself what is in the rows and columns and the structure of the dataframe (perhaps using glimpse())\n\n\n\n\n\n\nIt is useful to have this information in a single dataframe to which we will add the gene information from Ensembl Having all the information together will make it easier to interpret the results and select genes of interest.\n🎬 Merge the two dataframes:\n\n# merge stats results with normalise values\nprog_hspc_results <- prog_hspc_results |> \n left_join(prog_hspc, by = \"ensembl_gene_id\")\n\nThis means you have the counts for each sample along with the statistical results for each gene."
},
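The two import steps above are left as 🎬 exercises. A minimal sketch, assuming the counts are in data-raw/surfaceome_prog.csv and data-raw/surfaceome_hspc.csv (hypothetical paths - use wherever you saved yours; only results/prog_hspc_results.csv is named in the text):

# import the normalised counts for each cell type
prog <- read_csv("data-raw/surfaceome_prog.csv")
hspc <- read_csv("data-raw/surfaceome_hspc.csv")

# import the statistical results
prog_hspc_results <- read_csv("results/prog_hspc_results.csv")

# check rows, columns and column types
glimpse(prog_hspc_results)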
{
- "objectID": "core/week-1/workshop.html#rationale-for-scripting",
- "href": "core/week-1/workshop.html#rationale-for-scripting",
+ "objectID": "omics/week-5/workshop.html#add-gene-information-from-ensembl-using-biomart",
+ "href": "omics/week-5/workshop.html#add-gene-information-from-ensembl-using-biomart",
"title": "Workshop",
- "section": "Rationale for scripting?",
- "text": "Rationale for scripting?\n\nScience is the generation of ideas, designing work to test them and reporting the results.\nWe ensure laboratory and field work is replicable, robust and generalisable by planning and recording in lab books and using standard protocols. Repeating results is still hard.\nWorkflows for computational projects, and the data analysis and reporting of other work can, and should, be 100% reproducible!\nScripting is the way to achieve this."
+ "section": "Add gene information from Ensembl using biomaRt",
+ "text": "Add gene information from Ensembl using biomaRt\nEnsembl (Martin et al. 2023; Birney et al. 2004)is a bioinformatics project to organise all the biological information around the sequences of large genomes. The are a large number of databases but BioMart (Smedley et al. 2009) provides a consistent interface to the material. There are web-based tools to use these but the R package biomaRt (Durinck et al. 2009) gives you programmatic access making it easier to integrate information into R dataframes\n🎬 Load the biomaRt (Durinck et al. 2009) package:\n\nlibrary(biomaRt)\n\n🎬 Connect to the mouse database and see the first 20 bits of information we can retrieve:\n\n# Connect to the mouse database\nensembl <- useMart(biomart = \"ensembl\", \n dataset = \"mmusculus_gene_ensembl\")\n\n# See what information we can retrieve\nlistAttributes(mart = ensembl) |> head(20)\n\n name description\n1 ensembl_gene_id Gene stable ID\n2 ensembl_gene_id_version Gene stable ID version\n3 ensembl_transcript_id Transcript stable ID\n4 ensembl_transcript_id_version Transcript stable ID version\n5 ensembl_peptide_id Protein stable ID\n6 ensembl_peptide_id_version Protein stable ID version\n7 ensembl_exon_id Exon stable ID\n8 description Gene description\n9 chromosome_name Chromosome/scaffold name\n10 start_position Gene start (bp)\n11 end_position Gene end (bp)\n12 strand Strand\n13 band Karyotype band\n14 transcript_start Transcript start (bp)\n15 transcript_end Transcript end (bp)\n16 transcription_start_site Transcription start site (TSS)\n17 transcript_length Transcript length (including UTRs and CDS)\n18 transcript_tsl Transcript support level (TSL)\n19 transcript_gencode_basic GENCODE basic annotation\n20 transcript_appris APPRIS annotation\n page\n1 feature_page\n2 feature_page\n3 feature_page\n4 feature_page\n5 feature_page\n6 feature_page\n7 feature_page\n8 feature_page\n9 feature_page\n10 feature_page\n11 feature_page\n12 feature_page\n13 feature_page\n14 feature_page\n15 feature_page\n16 feature_page\n17 feature_page\n18 feature_page\n19 feature_page\n20 feature_page\n\n\nThere are many (2,985!) possible bits of information (attributes) that can be obtained. You can replace head(20) with View() to see them all.\nWe use the getBM() function to retrieve information from the database. The filters argument is used to specified what kind of identifier we are supplying to retrieve information. The attributes argument is used to select the information we want to retrieve. The values argument is used to specify the identifiers. The mart argument is used to specify the connection we created.\n🎬 Get the gene information:\n\ngene_info <- getBM(filters = \"ensembl_gene_id\",\n attributes = c(\"ensembl_gene_id\",\n \"external_gene_name\",\n \"description\"),\n values = prog_hspc_results$ensembl_gene_id,\n mart = ensembl)\n\nWe are getting the gene name and and a description. We also need to get the id because we will use that to merge the gene_info dataframe with the prog_hspc_results dataframe. Notice the dataframe returned only has 279 rows - one of the ids does not have information.\n🎬 We can find which is missing with:\n\nprog_hspc_results |> select(ensembl_gene_id) |> \n filter(!ensembl_gene_id %in% gene_info$ensembl_gene_id)\n\nError:\n! 
[conflicted] select found in 2 packages.\nEither pick the one you want with `::`:\n• biomaRt::select\n• plotly::select\nOr declare a preference with `conflicts_prefer()`:\n• `conflicts_prefer(biomaRt::select)`\n• `conflicts_prefer(plotly::select)`\n\n\nOh, conflicted has flagged a conflict for us.\n🎬 Take the appropriate action to resolve the conflict:\n❓ What is the id which is missing information?\n\n\nWe might want to look that up - but let’s worry about it later if it turns out to be something important.\n🎬 Merge the gene information with the results:\n\nprog_hspc_results <- prog_hspc_results |> \n left_join(gene_info, by = \"ensembl_gene_id\")\n\nI recommend viewing the dataframe to see the new columns. We now have dataframe with all the info we need, normalised counts, log2 normalised counts, statistical comparisons with fold changes and p values, information about the gene other than just the id"
},
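The 🎬 “take the appropriate action” step in the entry above is deliberately left open. Since later chunks call dplyr::select() explicitly, one reasonable resolution (a sketch, not the only option) is to name the package at the call site, or to declare a session-wide preference with the conflicted package:

# be explicit about which select() we mean
prog_hspc_results |> 
  dplyr::select(ensembl_gene_id) |> 
  filter(!ensembl_gene_id %in% gene_info$ensembl_gene_id)

# or declare a preference once for the session
conflicts_prefer(dplyr::select)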
{
- "objectID": "core/week-1/workshop.html#project-oriented-workflow",
- "href": "core/week-1/workshop.html#project-oriented-workflow",
+ "objectID": "omics/week-5/workshop.html#write-the-significant-genes-to-file-1",
+ "href": "omics/week-5/workshop.html#write-the-significant-genes-to-file-1",
"title": "Workshop",
- "section": "Project-oriented workflow",
- "text": "Project-oriented workflow\n\nuse folders to organise your work\nyou are aiming for structured, systematic and repeatable.\ninputs and outputs should be clearly identifiable from structure and/or naming\n\nExamples\n-- liver_transcriptome/\n |__data\n |__raw/\n |__processed/\n |__images/\n |__code/\n |__reports/\n |__figures/"
+ "section": "Write the significant genes to file",
+ "text": "Write the significant genes to file\nWe will create dateframe of the signifcant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.\n🎬 Create a dataframe of the genes significant at the 0.01 level:\n\nprog_hspc_results_sig0.01 <- prog_hspc_results |> \n filter(FDR <= 0.01)\n\n🎬 Write the dataframe to file\n🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:\n❓How many genes are significant at the 0.01 and 0.05 levels?"
},
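The write steps above are left as 🎬 exercises. A minimal sketch using write_csv(), with hypothetical file names under results/ (use whatever naming convention you adopted):

# 0.01 level
write_csv(prog_hspc_results_sig0.01,
          file = "results/prog_hspc_results_sig0.01.csv")

# 0.05 level
prog_hspc_results_sig0.05 <- prog_hspc_results |> 
  filter(FDR <= 0.05)

write_csv(prog_hspc_results_sig0.05,
          file = "results/prog_hspc_results_sig0.05.csv")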
{
- "objectID": "core/week-1/workshop.html#naming-things",
- "href": "core/week-1/workshop.html#naming-things",
+ "objectID": "omics/week-5/workshop.html#view-the-relationship-between-cells-using-pca",
+ "href": "omics/week-5/workshop.html#view-the-relationship-between-cells-using-pca",
"title": "Workshop",
- "section": "Naming things",
- "text": "Naming things\n\n\n\ndocuments, CC-BY-NC, https://xkcd.com/1459/\n\n\nGuiding principle - Have a convention! Good file names are:\n\nmachine readable\nhuman readable\nplay nicely with sorting\n\nI suggest\n\nno spaces in names\nuse snake_case or kebab-case rather than CamelCase or dot.case\nuse all lower case except very occasionally where convention is otherwise, e.g., README, LICENSE\nordering: use left-padded numbers e.g., 01, 02….99 or 001, 002….999\ndates ISO 8601 format: 2020-10-16\nwrite down your conventions\n\n-- liver_transcriptome/\n |__data\n |__raw/\n |__2022-03-21_donor_1.csv\n |__2022-03-21_donor_2.csv\n |__2022-03-21_donor_3.csv\n |__2022-05-14_donor_1.csv\n |__2022-05-14_donor_2.csv\n |__2022-05-14_donor_3.csv\n |__processed/\n |__images/\n |__code/\n |__functions/\n |__summarise.R\n |__normalise.R\n |__theme_volcano.R\n |__01_data_processing.py\n |__02_exploratory.R\n |__03_modelling.R\n |__04_figures.R\n |__reports/\n |__01_report.qmd\n |__02_supplementary.qmd\n |__figures/\n |__01_volcano_donor_1_vs_donor_2.eps\n |__02_volcano_donor_1_vs_donor_3.eps"
+ "section": "View the relationship between cells using PCA",
+ "text": "View the relationship between cells using PCA\nWe have 280 genes in our dataset. PCA will allow us to plot our cells in the “gene expression” space so we can see if Prog cells cluster together and HSPC cells cluster together as we would expect. We do this on the log2 transformed normalised counts.\nOur data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects cells in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.\n🎬 Transpose the log2 transformed normalised counts:\n\nprog_hspc_trans <- prog_hspc_results |> \n dplyr::select(starts_with(c(\"Prog_\", \"HSPC_\"))) |>\n t() |> \n data.frame()\n\nWe have used the select() function to select all the columns that start with Prog_ or HSPC_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the gene ids.\n🎬 Set the column names to the gene ids:\n\ncolnames(prog_hspc_trans) <- prog_hspc_results$ensembl_gene_id\n\nperform PCA using standard functions\n\npca <- prog_hspc_trans |>\n prcomp(rank. = 15) \n\nThe rank. argument tells prcomp() to only calculate the first 15 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=15 (out of 280) components:\n PC1 PC2 PC3 PC4 PC5 PC6 PC7\nStandard deviation 12.5612 8.36646 5.98988 5.41386 4.55730 4.06142 3.84444\nProportion of Variance 0.1099 0.04874 0.02498 0.02041 0.01446 0.01149 0.01029\nCumulative Proportion 0.1099 0.15861 0.18359 0.20400 0.21846 0.22995 0.24024\n PC8 PC9 PC10 PC11 PC12 PC13 PC14\nStandard deviation 3.70848 3.66899 3.5549 3.48508 3.44964 3.42393 3.37882\nProportion of Variance 0.00958 0.00937 0.0088 0.00846 0.00829 0.00816 0.00795\nCumulative Proportion 0.24982 0.25919 0.2680 0.27645 0.28473 0.29290 0.30085\n PC15\nStandard deviation 3.33622\nProportion of Variance 0.00775\nCumulative Proportion 0.30860\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.1099 of the variance, the second 0.04874, and the third 0.2498. Together the first three components explain 18% of the total variance in the data. Plotting PC1 against PC2 will capture about 16% of the variance. This is not that high but it likely better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the cells.\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the cell ids:\n\npca_labelled <- data.frame(pca$x,\n cell_id = row.names(prog_hspc_trans))\n\nIt will be helpful to add a column for the cell type so we can label points. One way to do this is to extract the information in the cell_id column into two columns.\n🎬 Extract the cell type and cell number from the cell_id column (keeping the cell_id column):\n\npca_labelled <- pca_labelled |> \n extract(cell_id, \n remove = FALSE,\n c(\"cell_type\", \"cell_number\"),\n \"([a-zA-Z]{4})_([0-9]{3})\")\n\n\"([a-zA-Z]{4})_([0-9]{3})\" is a regular expression - or regex. 
[a-zA-Z] means any lower or upper case letter, {4} means 4 of them, and [0-9] means any number, {3} means 3 of them. The brackets around the two parts of the regex mean we want to extract those parts. The first part goes into cell_type and the second part goes into cell_number. The _ between the two patterns matches the underscore and the fact it isn’t in a bracket means we don’t want to keep it.\nWe can now plot the PC1 and PC2 scores.\n🎬 Plot PC1 against PC2 and colour the points by cell type:\n\npca <- pca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = cell_type)) +\n geom_point(alpha = 0.4) +\n scale_colour_viridis_d(end = 0.8, begin = 0.15,\n name = \"Cell type\") +\n theme_classic()\npca\n\n\n\n\nFairly good separation of cell types but plenty of overlap\n🎬 Save the plot to file:\n\nggsave(\"figures/prog_hspc-pca.png\",\n plot = pca,\n height = 3, \n width = 4,\n units = \"in\",\n device = \"png\")"
},
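The Proportion of Variance row reported by summary(pca) is each component’s variance as a fraction of the total. A short check, assuming the pca object created above (prcomp() keeps the full set of standard deviations even with rank. = 15):

# proportion of total variance explained by each PC
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var[1:3], 5)  # should match 0.10990, 0.04874, 0.02498 above
cumsum(prop_var)[3]      # ~0.18, the variance captured by the first three PCs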
{
- "objectID": "core/week-1/workshop.html#readme-files",
- "href": "core/week-1/workshop.html#readme-files",
+ "objectID": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap-1",
+ "href": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap-1",
"title": "Workshop",
- "section": "Readme files",
- "text": "Readme files\nREADMEs are a form of documentation which have been widely used for a long time. They contain all the information about the other files in a directory. They can be extensive but need not be. Concise is good. Bullet points are good\n\nGive a project title and description, brief\nstart date, last updated date and contact information\nOutline the folder structure\nGive software requirements: programs and versions used or required. There are packages that give session information in R Wickham et al. (2021) and Python Ostblom, Joel (2019)\n\nR:\nsessioninfo::session_info()\nPython:\nimport session_info\nsession_info.show()\n\nInstructions run the code, build reports, and reproduce the figures etc\nWhere to find the data, outputs\nAny other information that needed to understand and recreate the work\nIdeally, a summary of changes with the date\n\n-- liver_transcriptome/\n |__data\n |__raw/\n |__2022-03-21_donor_1.csv\n |__2022-03-21_donor_2.csv\n |__2022-03-21_donor_3.csv\n |__2022-05-14_donor_1.csv\n |__2022-05-14_donor_2.csv\n |__2022-05-14_donor_3.csv\n |__processed/\n |__images/\n |__code/\n |__functions/\n |__summarise.R\n |__normalise.R\n |__theme_volcano.R\n |__01_data_processing.py\n |__02_exploratory.R\n |__03_modelling.R\n |__04_figures.R\n |__README.md\n |__reports/\n |__01_report.qmd\n |__02_supplementary.qmd\n |__figures/\n |__01_volcano_donor_1_vs_donor_2.eps\n |__02_volcano_donor_1_vs_donor_3.eps"
+ "section": "Visualise the expression of the most significant genes using a heatmap",
+ "text": "Visualise the expression of the most significant genes using a heatmap\nA heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.\nWe are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the gene names.\n🎬 Convert a dataframe of the log2 values to a matrix. I have used sample() to select 70 random columns so the heatmap is generated quickly:\n\nmat <- prog_hspc_results_sig0.01 |> \n dplyr::select(starts_with(c(\"Prog\", \"HSPC\"))) |>\n dplyr::select(sample(1:1499, size = 70)) |>\n as.matrix()\n\n🎬 Set the row names to the gene names:\n\nrownames(mat) <- prog_hspc_results_sig0.01$external_gene_name\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the cell types to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the cell types.\n\nn_cell_clusters <- 2\nn_gene_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"row\",\n k_col = n_cell_clusters,\n k_row = n_gene_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = colnames(mat),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nIt will take a minute to run and display. On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are cells. We can see that cells of the same type don’t cluster that well together. We can also see two clusters of genes but the pattern of gene is not as clear as it was for the frogs and the correspondence with the cell clusters is not as strong.\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function.\nUsing all the cells is worth doing but it will take a while to generate the heatmap and then show in the viewer so do it sometime when you’re ready for a coffee break."
},
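The entry above suggests repeating the heatmap with all the cells rather than the 70-column random sample. A sketch is the same pipeline without the sample() step, reusing n_cell_clusters and n_gene_clusters (expect it to be slow):

# matrix of all cells for the 0.01-significant genes
mat_all <- prog_hspc_results_sig0.01 |> 
  dplyr::select(starts_with(c("Prog", "HSPC"))) |> 
  as.matrix()
rownames(mat_all) <- prog_hspc_results_sig0.01$external_gene_name

heatmaply(mat_all,
          scale = "row",
          k_col = n_cell_clusters,
          k_row = n_gene_clusters,
          fontsize_row = 7, fontsize_col = 10,
          heatmap_layers = theme(axis.line = element_blank()))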
{
- "objectID": "core/week-1/workshop.html#code-comments",
- "href": "core/week-1/workshop.html#code-comments",
+ "objectID": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot-1",
+ "href": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot-1",
"title": "Workshop",
- "section": "Code comments",
- "text": "Code comments\n\nComments are notes in the code which are not executed. They are ignored by the computer but are read by humans. They are used to explain what the code is doing and why. They are also used to temporarily remove code from execution."
+ "section": "Visualise all the results with a volcano plot",
+ "text": "Visualise all the results with a volcano plot\ncolour the points if FDR < 0.05 and prog_hspc_results > 1\n\nlibrary(ggrepel)\n\n\nprog_hspc_results <- prog_hspc_results |> \n mutate(log10_FDR = -log10(FDR),\n sig = FDR < 0.05,\n bigfc = abs(summary.logFC) >= 2) \n\n\nvol <- prog_hspc_results |> \n ggplot(aes(x = summary.logFC, \n y = log10_FDR, \n colour = interaction(sig, bigfc))) +\n geom_point() +\n geom_hline(yintercept = -log10(0.05), \n linetype = \"dashed\") +\n geom_vline(xintercept = 2, \n linetype = \"dashed\") +\n geom_vline(xintercept = -2, \n linetype = \"dashed\") +\n scale_x_continuous(expand = c(0, 0)) +\n scale_y_continuous(expand = c(0, 0)) +\n scale_colour_manual(values = c(\"gray\",\n \"pink\",\n \"deeppink\")) +\n geom_text_repel(data = subset(prog_hspc_results, \n bigfc & sig),\n aes(label = external_gene_name),\n size = 3,\n max.overlaps = 50) +\n theme_classic() +\n theme(legend.position = \"none\")\nvol\n\n\n\n\n\nggsave(\"figures/prog-hspc-volcano.png\",\n plot = vol,\n height = 4.5, \n width = 4.5,\n units = \"in\",\n device = \"png\")"
},
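Note that this scale_colour_manual() supplies three colours where the frog version supplied four, which suggests only three of the four sig × bigfc combinations occur in these data. A quick sketch to verify, assuming the mutated prog_hspc_results above; if all four combinations appear, a fourth colour would be needed:

# tally genes by significance and fold-change category
prog_hspc_results |> 
  count(sig, bigfc)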
{
- "objectID": "core/week-1/study_before_workshop.html",
- "href": "core/week-1/study_before_workshop.html",
- "title": "Independent Study to prepare for workshop",
+ "objectID": "omics/week-5/overview.html",
+ "href": "omics/week-5/overview.html",
+ "title": "Overview",
"section": "",
- "text": "📖 Read Understanding file systems. This is an approximately 15 - 20 minute read revising file types and filesystems. It covers concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R."
+ "text": "This week we cover how to visualise and interpret the results of your differential expression analysis. The independent study will allow you to check you have what you should have following the Omics 2: Statistical Analysis workshop and Consolidation study. It will also summarise the the methods and plots we will go through in the workshop. In the workshop, we will learn how to merge gene information into our results, conduct a Principle Component Analysis (PCA) and plot the results as well as how to create a nicely formatted Volcano plot and heatmap.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nverify they have the required RStudio Project set up and the data and code files from the previous Workshop and Consolidation study\nexplain where gene information came from and add it to their results\nperform a PCA and understand how to interpret them\ncreate a heatmap and understand how to interpret them\ncreate a volcano plot and understand how to interpret them\n\n\n\nInstructions\n\nPrepare\n\n📖 Read what you should have so far and about concepts in PCA, volcano plots and heatmaps.\n\nWorkshop\n\n💻 Add gene information to the results of DE\n💻 Perform and plot a PCA\n💻 Visualise results with a heatmap\n💻 Visualise all the results with a volcano plot\nLook after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case.\n\n\n\n\nReferences"
},
{
- "objectID": "core/week-11/workshop.html",
- "href": "core/week-11/workshop.html",
+ "objectID": "omics/week-3/workshop.html",
+ "href": "omics/week-3/workshop.html",
"title": "Workshop",
"section": "",
- "text": "Literate programming is a way of writing code and text together in a single document\nThe document is then processed to produce a report\nQuarto (recommended) or R Markdown\n\nIn this workshop we will go through an example quarto document. You will learn:\n\nwhat the YAML header is\nformatting (bold, italics, headings)\nto control default and individual chunk options\nhow to add citations\nfigures and tables with cross referencing and automatic numbering\nhow to use inline coding to report results\nhow to insert special characters and equations"
+ "text": "In this workshop you will learn what steps to take to get a good understanding of your ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nYou should examine all three data sets because the comparisons will give you a stronger understanding of your own project data."
},
{
- "objectID": "core/week-11/workshop.html#literate-programming",
- "href": "core/week-11/workshop.html#literate-programming",
+ "objectID": "omics/week-3/workshop.html#session-overview",
+ "href": "omics/week-3/workshop.html#session-overview",
"title": "Workshop",
"section": "",
- "text": "Literate programming is a way of writing code and text together in a single document\nThe document is then processed to produce a report\nQuarto (recommended) or R Markdown"
+ "text": "In this workshop you will learn what steps to take to get a good understanding of your ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nYou should examine all three data sets because the comparisons will give you a stronger understanding of your own project data."
},
{
- "objectID": "core/week-11/workshop.html#session-overview",
- "href": "core/week-11/workshop.html#session-overview",
+ "objectID": "omics/week-3/workshop.html#set-up-a-project",
+ "href": "omics/week-3/workshop.html#set-up-a-project",
"title": "Workshop",
- "section": "",
- "text": "In this workshop we will go through an example quarto document. You will learn:\n\nwhat the YAML header is\nformatting (bold, italics, headings)\nto control default and individual chunk options\nhow to add citations\nfigures and tables with cross referencing and automatic numbering\nhow to use inline coding to report results\nhow to insert special characters and equations"
+ "section": "Set up a Project",
+ "text": "Set up a Project\n🎬 Start RStudio from the Start menu\n🎬 Make an RStudio project. Be deliberate about where you create it so that it is a good place for you\n🎬 Use the Files pane to make new folders for the data. I suggest data-raw and data-processed\n🎬 Make a new script called workshop-1.R to carry out the rest of the work.\n🎬 Record what you do and what you find out. All of it!\n🎬 Load tidyverse (Wickham et al. 2019) for importing, summarising, plotting and filtering.\n\nlibrary(tidyverse)"
},
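If you prefer to script the folder creation mentioned above rather than use the Files pane, a minimal sketch:

# create the suggested folders; showWarnings = FALSE makes it safe to re-run
dir.create("data-raw", showWarnings = FALSE)
dir.create("data-processed", showWarnings = FALSE)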
{
- "objectID": "core/week-11/study_before_workshop.html#module-assessment",
- "href": "core/week-11/study_before_workshop.html#module-assessment",
- "title": "Independent Study to prepare for workshop",
- "section": "Module assessment",
- "text": "Module assessment\nThis module is assessed by:\n\nOral presentation 30%\nProject Report and Research Compendium 70% of which\n\n50% report\n20% compendium\n\n\nThese slides are a guide to Research compendium."
+ "objectID": "omics/week-3/workshop.html#examine-the-data-in-a-spreadsheet",
+ "href": "omics/week-3/workshop.html#examine-the-data-in-a-spreadsheet",
+ "title": "Workshop",
+ "section": "Examine the data in a spreadsheet",
+ "text": "Examine the data in a spreadsheet\nThese are the three datasets. Each set compromises several files.\n🐸 Frog development data:\n\nxlaevis_counts_S14.csv\nxlaevis_counts_S20.csv\nxlaevis_counts_S30.csv\n\n🐭 Stem cell data:\n\nsurfaceome_hspc.csv\nsurfaceome_prog.csv\nsurfaceome_lthsc.csv\n\n🍂 xxxx data:\n\nxxx\nxxx\n\n🎬 Save the files to data-raw and open them in Excel\n🎬 Answer the following questions:\n\nDescribe how the sets of data are similar and how they are different.\nWhat is in the rows and columns of each file?\nHow many rows and columns are there in each file? Are these the same? In all cases or some cases? Why?\nGoogle an id. Where does your search take you? How much information is available?\n\n🎬 Did you record all that??"
},
{
- "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium",
- "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium",
- "title": "Independent Study to prepare for workshop",
- "section": "What is a Research Compendium?",
- "text": "What is a Research Compendium?\nOverview of assessment\n\nStage 3 Integrated Masters students are expected to submit a Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual.\nStudents will be assessed on the technical complexity, completeness and organisation of their compendium and the completeness, reproducibility and clarity of their documentation at the project and the code/process level. Marking will focus on the reproducibility of the results and the clarity of the decision making processes rather than the interpretation of the results which is covered in the report. There is no word or size limit for any part of the compendium but its contents should be concise and minimal. Extraneous text, code or files will be penalised."
+ "objectID": "omics/week-3/workshop.html#import",
+ "href": "omics/week-3/workshop.html#import",
+ "title": "Workshop",
+ "section": "Import",
+ "text": "Import\nNow let’s get the data into R and visualise it.\n🎬 Import xlaevis_counts_S30.csv, surfaceome_hspc.csv and xxxxxxxx\n\n# 🐸 import the s30 data\ns30 <- read_csv(\"data-raw/xlaevis_counts_S30.csv\")\n\n\n# 🐭 import the hspc data\nhspc <- read_csv(\"data-raw/surfaceome_hspc.csv\")\n\n\n# 🍂 xxxx import the xxxx data\n# prog <- read_csv(\"\")\n\n🎬 Check these have the number of rows and column you were expecting and that column types and names are as expected."
},
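The check above can be scripted. A short sketch; the expected dimensions come from later in this page (s30 has 11893 rows and 7 columns; hspc has 280 rows, one per gene):

# rows and columns
dim(s30)
dim(hspc)

# column names and types at a glance
glimpse(s30)
glimpse(hspc)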
{
- "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-1",
- "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-1",
- "title": "Independent Study to prepare for workshop",
- "section": "What is a Research Compendium?",
- "text": "What is a Research Compendium?\nOverview of assessment\n\nStage 3 Integrated Masters students are expected to submit a Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual.\nStudents will be assessed on the technical complexity, completeness and organisation of their compendium and the completeness, reproducibility and clarity of their documentation at the project and the code/process level. Marking will focus on the reproducibility of the results and the clarity of the decision making processes rather than the interpretation of the results which is covered in the report. There is no word or size limit for any part of the compendium but its contents should be concise and minimal. Extraneous text, code or files will be penalised."
+ "objectID": "omics/week-3/workshop.html#explore",
+ "href": "omics/week-3/workshop.html#explore",
+ "title": "Workshop",
+ "section": "Explore",
+ "text": "Explore\nThe first task is to get an overview. We want to know\n\nare there any missing values? If so, how many and how are they distributed?\nhow may zeros are there and how are they distributed\ndoes it look as tough all the samples/cells were equally “successful”? Can we spot any problematic anomalies?\nwhat is the distribution of values?\n\nIf our data collection has gone well we would hope to see approximately the same average expression in each sample or cell of the same type. That is replicates should be similar. We would also expect to see that the average expression of genes varies. We might have genes which are zero in every cell/sample. We will want to to filter those out.\nWe get this overview by looking at:\n\nThe distribution of values across the whole dataset\nThe distribution of values across the sample/cells (i.e., averaged across genes). This allows us to see variation between samples/cells:\nThe distribution of values across the genes (i.e., averaged across samples/cells). This allows us to see variation between genes.\n\nDistribution of values across the whole dataset\nIn all data sets, the values are spread over multiple columns so in order to plot the distribution as a whole, we will need to first use pivot_longer() to put the data in ‘tidy’ format (Wickham 2014) by stacking the columns. We could save a copy of the stacked data and then plot it, but here, I have just piped the stacked data straight into ggplot().\n🐸 Frogs\n🎬 Pivot the counts (stack the columns) so all the counts are in a single column (count) and pipe into ggplot() to create a histogram:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(x = count)) +\n geom_histogram()\n\n\n\n\nThis data is very skewed - there are so many low values that we can’t see the tiny bars for the higher values. Logging the counts is a way to make the distribution more visible.\n🎬 Repeat the plot on log of the counts.\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(x = log10(count))) +\n geom_histogram()\n\n\n\n\nI’ve used base 10 only because it easy to convert to the original scale (1 is 10, 2 is 100, 3 is 1000 etc). The warning about rows being removed is expected - these are the counts of 0 since you can’t log a value of 0. The peak at zero suggests quite a few counts of 1. We would expect we would expect the distribution of counts to be roughly log normal because this is expression of all the genes in the genome1. That small peak near the low end suggests that these lower counts might be anomalies.\nThe excess number of low counts indicates we might want to create a cut off for quality control. The removal of low counts is a common processing step in ’omic data. We will revisit this after we have considered the distribution of counts across samples and genes.\n🐭 Mice\n🎬 Pivot the expression values (stack the columns) so all the counts are in a single column (expr) and pipe into ggplot() to create a histogram:\n\nhspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |> \n ggplot(aes(x = expr)) +\n geom_histogram()\n\n\n\n\nThis is a very striking distribution. Is it what we are expecting? Again,the excess number of low values is almost certainly anomalous. They will be inaccurate measure and we will want to exclude expression values below (about) 1. 
We will revisit this after we have considered the distribution of expression across cells and genes.\nWhat about the bimodal appearance of the the ‘real’ values? If we had the whole genome we would not expect to see such a pattern - we’d expect to see a roughly normal distribution2. However, this is a subset of the genome and the nature of the subsetting has had an influence here. These are a subset of cell surface proteins that show a significant difference between at least two of twelve cell subtypes. That is, all of these genes are either high or low.\nDistribution of values across the sample/cells\n🐸 Frog samples\nSummary statistics including the the number of NAs can be seen using the summary(). It is most helpful which you have up to about 30 columns. There is nothing special about the number 30, it is just that text summaries of a larger number of columns are difficult to grasp.\n🎬 Get a quick overview of the columns:\n\n# examine all the columns quickly\n# works well with smaller numbers of column\nsummary(s30)\n\n xenbase_gene_id S30_C_5 S30_C_6 S30_C_A \n Length:11893 Min. : 0.0 Min. : 0.0 Min. : 0.0 \n Class :character 1st Qu.: 14.0 1st Qu.: 14.0 1st Qu.: 23.0 \n Mode :character Median : 70.0 Median : 75.0 Median : 107.0 \n Mean : 317.1 Mean : 335.8 Mean : 426.3 \n 3rd Qu.: 205.0 3rd Qu.: 220.0 3rd Qu.: 301.0 \n Max. :101746.0 Max. :118708.0 Max. :117945.0 \n S30_F_5 S30_F_6 S30_F_A \n Min. : 0.0 Min. : 0.0 Min. : 0.0 \n 1st Qu.: 19.0 1st Qu.: 17.0 1st Qu.: 16.0 \n Median : 88.0 Median : 84.0 Median : 69.0 \n Mean : 376.2 Mean : 376.5 Mean : 260.4 \n 3rd Qu.: 251.0 3rd Qu.: 246.0 3rd Qu.: 187.0 \n Max. :117573.0 Max. :130672.0 Max. :61531.0 \n\n\nNotice that: - the minimum count is 0 and the maximums are very high in all the columns - the medians are quite a lot lower than the means so the data are skewed (hump to the left, tail to the right) - there must be quite a lot of zeros - the columns are roughly similar and it doesn’t look like there is an anomalous replicate.\nTo find out how may zeros there are in a column we can make use of the fact that TRUE evaluates to 1 and FALSE evaluates to 0. This means sum(S30_C_5 == 0) gives the number of 0 in the S30_C_5 column\n🎬 Find the number of zeros in all six columns:\n\ns30 |>\n summarise(sum(S30_C_5 == 0),\n sum(S30_C_6 == 0),\n sum(S30_C_A == 0),\n sum(S30_F_5 == 0),\n sum(S30_F_6 == 0),\n sum(S30_F_A == 0))\n\n# A tibble: 1 × 6\n `sum(S30_C_5 == 0)` `sum(S30_C_6 == 0)` `sum(S30_C_A == 0)`\n <int> <int> <int>\n1 1340 1361 998\n# ℹ 3 more variables: `sum(S30_F_5 == 0)` <int>, `sum(S30_F_6 == 0)` <int>,\n# `sum(S30_F_A == 0)` <int>\n\n\nThere is a better way of doing this that saves you having to repeat so much code - especially useful if you have a lot more than 6 columns. 
We can use pivot_longer() to put the data in tidy format and then use the group_by() and summarise() approach we have used extensively before.\n🎬 Find the number of zeros in all columns:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(sample) |>\n summarise(n_zero = sum(count == 0))\n\n# A tibble: 6 × 2\n sample n_zero\n <chr> <int>\n1 S30_C_5 1340\n2 S30_C_6 1361\n3 S30_C_A 998\n4 S30_F_5 1210\n5 S30_F_6 1199\n6 S30_F_A 963\n\n\nYou could expand to get all the summary information\n🎬 Summarise all the samples:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(sample) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n n_zero = sum(count == 0))\n\n# A tibble: 6 × 8\n sample min lowerq mean median upperq max n_zero\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>\n1 S30_C_5 0 14 317. 70 205 101746 1340\n2 S30_C_6 0 14 336. 75 220 118708 1361\n3 S30_C_A 0 23 426. 107 301 117945 998\n4 S30_F_5 0 19 376. 88 251 117573 1210\n5 S30_F_6 0 17 376. 84 246 130672 1199\n6 S30_F_A 0 16 260. 69 187 61531 963\n\n\nThe mean count ranges from 260 to 426.\nOne advantage this has over using summary() is that the output is a dataframe. For results, this is useful, and makes it easier to:\n\nwrite to file\nuse in ggplot()\n\nformat in a Quarto report\n\n🎬 Save the summary as a dataframe, s30_summary_samp.\nWe can write to file using write_csv()\n🎬 Write s30_summary_samp to a file called “s30_summary_samp.csv”:\n\nwrite_csv(s30_summary_samp, \n file = \"data-processed/s30_summary_samp.csv\")\n\nPlotting the distribution of values is perhaps the easiest way to understand the data. We could plot each column separately or we can pipe the tidy format of data into ggplot() and make use of facet_wrap()\n🎬 Pivot the data and pipe into ggplot:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(count)) +\n geom_density() +\n facet_wrap(. ~ sample, nrow = 3)\n\n\n\n\nWe have many values (11893) so we are not limited to using geom_histogram(). geom_density() gives us a smooth distribution.\nWe have many low values and a few very high ones which makes it tricky to see the distributions. Logging the counts will make these clearer.\n🎬 Repeat the graph but taking the base 10 log of the counts:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(log10(count))) +\n geom_density() +\n facet_wrap(. ~ sample, nrow = 3)\n\n\n\n\nThe key information to take from these plots is:\n\nthe distributions are roughly similar in width, height, location and overall shape so it doesn’t look as though we have any suspect samples\nthe peak at zero suggests quite a few counts of 1.\nsince we would expect the distribution of counts in each sample to be roughly log normal so that the small rise near the low end, even before the peak at zero, suggests that these lower counts might be anomalies.\n\nThe excess number of low counts indicates we might want to create a cut off for quality control. The removal of low counts is a common processing step in ’omic data. We will revisit this after we have considered the distribution of counts across genes (averaged over the samples).\n🐭 Mouse cells\nWe used the summary() function to get an overview of the columns in the frog data. 
Let’s try that here.\n🎬 Get a quick overview of the columns:\n\nsummary(hspc)\n\n ensembl_gene_id HSPC_001 HSPC_002 HSPC_003 \n Length:280 Min. : 0.000 Min. : 0.000 Min. : 0.0000 \n Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Mode :character Median : 0.000 Median : 0.000 Median : 0.9929 \n Mean : 2.143 Mean : 1.673 Mean : 2.5964 \n 3rd Qu.: 2.120 3rd Qu.: 2.239 3rd Qu.: 6.1559 \n Max. :12.567 Max. :11.976 Max. :11.1138 \n HSPC_004 HSPC_006 HSPC_008 HSPC_009 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000 \n Median : 0.000 Median : 1.276 Median : 0.000 Median :0.000 \n Mean : 1.851 Mean : 2.338 Mean : 2.375 Mean :2.220 \n 3rd Qu.: 2.466 3rd Qu.: 3.536 3rd Qu.: 3.851 3rd Qu.:3.594 \n Max. :11.133 Max. :10.014 Max. :11.574 Max. :9.997 \n HSPC_011 HSPC_012 HSPC_014 HSPC_015 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 1.750 Median : 0.000 Median : 0.000 \n Mean : 2.285 Mean : 2.431 Mean : 2.295 Mean : 2.515 \n 3rd Qu.: 3.193 3rd Qu.: 3.741 3rd Qu.: 3.150 3rd Qu.: 3.789 \n Max. :11.260 Max. :10.905 Max. :11.051 Max. :10.751 \n HSPC_016 HSPC_017 HSPC_018 HSPC_020 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.9488 Median : 0.000 Median : 1.248 Median : 0.000 \n Mean : 2.6115 Mean : 2.146 Mean : 2.710 Mean : 2.509 \n 3rd Qu.: 5.9412 3rd Qu.: 2.357 3rd Qu.: 6.006 3rd Qu.: 4.470 \n Max. :11.3082 Max. :12.058 Max. :11.894 Max. :11.281 \n HSPC_021 HSPC_022 HSPC_023 HSPC_024 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.170 Mean : 2.287 Mean : 2.314 Mean : 2.195 \n 3rd Qu.: 2.996 3rd Qu.: 3.351 3rd Qu.: 2.749 3rd Qu.: 2.944 \n Max. :10.709 Max. :11.814 Max. :12.113 Max. :11.279 \n HSPC_025 HSPC_026 HSPC_027 HSPC_028 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 1.572 Median : 1.385 Median : 0.000 Median : 0.000 \n Mean : 2.710 Mean : 2.721 Mean : 2.458 Mean : 1.906 \n 3rd Qu.: 5.735 3rd Qu.: 6.392 3rd Qu.: 5.496 3rd Qu.: 2.037 \n Max. :11.309 Max. :10.865 Max. :11.266 Max. :10.777 \n HSPC_030 HSPC_031 HSPC_033 HSPC_034 \n Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.0000 \n 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Median : 1.119 Median : 0.9026 Median : 0.000 Median : 0.7984 \n Mean : 2.338 Mean : 2.3049 Mean : 1.938 Mean : 2.3220 \n 3rd Qu.: 3.005 3rd Qu.: 2.9919 3rd Qu.: 2.434 3rd Qu.: 4.8324 \n Max. :11.391 Max. :11.1748 Max. :10.808 Max. :10.6707 \n HSPC_035 HSPC_036 HSPC_037 HSPC_038 \n Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.8879 Median : 1.517 Median : 0.000 \n Mean : 1.810 Mean : 2.6918 Mean : 2.327 Mean : 2.212 \n 3rd Qu.: 2.175 3rd Qu.: 5.9822 3rd Qu.: 3.079 3rd Qu.: 2.867 \n Max. :11.221 Max. :11.3018 Max. :11.399 Max. :12.275 \n HSPC_040 HSPC_041 HSPC_042 HSPC_043 \n Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. 
: 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.8673 Median : 1.342 \n Mean : 2.509 Mean : 2.492 Mean : 2.3673 Mean : 2.420 \n 3rd Qu.: 3.995 3rd Qu.: 3.943 3rd Qu.: 3.8371 3rd Qu.: 3.731 \n Max. :11.863 Max. :11.016 Max. :11.4852 Max. :11.123 \n HSPC_044 HSPC_045 HSPC_046 HSPC_047 \n Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.8452 Median : 2.195 \n Mean : 2.382 Mean : 2.277 Mean : 1.9707 Mean : 2.498 \n 3rd Qu.: 3.998 3rd Qu.: 2.843 3rd Qu.: 2.0656 3rd Qu.: 3.937 \n Max. :10.782 Max. :10.629 Max. :11.0311 Max. :10.180 \n HSPC_048 HSPC_049 HSPC_050 HSPC_051 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.0000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Median : 1.108 Median : 1.275 Median : 0.000 Median : 0.9757 \n Mean : 2.289 Mean : 2.453 Mean : 2.673 Mean : 2.2693 \n 3rd Qu.: 2.988 3rd Qu.: 3.819 3rd Qu.: 5.772 3rd Qu.: 3.1644 \n Max. :10.335 Max. :11.844 Max. :11.301 Max. :10.8692 \n HSPC_052 HSPC_053 HSPC_054 HSPC_055 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 1.509 Median : 0.818 Median : 0.000 Median : 0.000 \n Mean : 2.561 Mean : 2.684 Mean : 2.107 Mean : 1.959 \n 3rd Qu.: 4.644 3rd Qu.: 5.937 3rd Qu.: 2.568 3rd Qu.: 2.573 \n Max. :11.674 Max. :11.624 Max. :10.770 Max. :11.105 \n HSPC_056 HSPC_057 HSPC_058 HSPC_060 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 1.399 Median : 0.000 \n Mean : 2.295 Mean : 2.430 Mean : 2.296 Mean : 2.112 \n 3rd Qu.: 3.721 3rd Qu.: 3.806 3rd Qu.: 3.199 3rd Qu.: 2.201 \n Max. :11.627 Max. :10.575 Max. :11.134 Max. :10.631 \n HSPC_061 HSPC_062 HSPC_063 HSPC_064 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 1.515 Median : 1.101 \n Mean : 1.934 Mean : 2.129 Mean : 2.508 Mean : 2.696 \n 3rd Qu.: 2.489 3rd Qu.: 2.875 3rd Qu.: 4.895 3rd Qu.: 6.412 \n Max. :11.190 Max. :10.433 Max. :10.994 Max. :10.873 \n HSPC_065 HSPC_066 HSPC_067 HSPC_068 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.4852 Median : 0.000 Median : 1.441 Median : 0.000 \n Mean : 2.2676 Mean : 2.136 Mean : 2.480 Mean : 2.449 \n 3rd Qu.: 3.8217 3rd Qu.: 2.632 3rd Qu.: 3.548 3rd Qu.: 4.517 \n Max. :10.9023 Max. :11.608 Max. :11.147 Max. :10.901 \n HSPC_069 HSPC_070 HSPC_071 HSPC_072 \n Min. : 0.000 Min. : 0.0000 Min. : 0.0000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.8949 Median : 0.9272 Median : 1.121 \n Mean : 2.406 Mean : 2.5826 Mean : 2.2844 Mean : 2.545 \n 3rd Qu.: 4.705 3rd Qu.: 5.4749 3rd Qu.: 3.2531 3rd Qu.: 4.939 \n Max. :11.258 Max. :11.6715 Max. :10.7886 Max. :11.397 \n HSPC_073 HSPC_074 HSPC_075 HSPC_076 \n Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.00 Median : 1.674 Median : 0.000 \n Mean : 2.491 Mean : 2.46 Mean : 2.413 Mean : 2.289 \n 3rd Qu.: 4.134 3rd Qu.: 3.40 3rd Qu.: 3.013 3rd Qu.: 2.550 \n Max. :11.844 Max. :11.66 Max. :11.976 Max. :12.136 \n HSPC_077 HSPC_078 HSPC_079 HSPC_080 \n Min. : 0.0000 Min. : 0.000 Min. 
: 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.6624 Median : 1.492 Median : 1.384 Median : 1.013 \n Mean : 2.4336 Mean : 2.637 Mean : 2.432 Mean : 2.881 \n 3rd Qu.: 5.4937 3rd Qu.: 5.472 3rd Qu.: 3.617 3rd Qu.: 7.220 \n Max. :11.6020 Max. :10.673 Max. :11.199 Max. :11.836 \n HSPC_081 HSPC_082 HSPC_083 HSPC_084 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.7671 Median : 0.000 Median : 1.896 Median : 1.128 \n Mean : 1.9227 Mean : 2.474 Mean : 2.864 Mean : 2.289 \n 3rd Qu.: 1.6349 3rd Qu.: 3.488 3rd Qu.: 5.101 3rd Qu.: 2.792 \n Max. :11.4681 Max. :11.962 Max. :10.865 Max. :11.834 \n HSPC_085 HSPC_087 HSPC_088 HSPC_089 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.157 Mean : 2.314 Mean : 2.202 Mean : 2.329 \n 3rd Qu.: 3.010 3rd Qu.: 3.245 3rd Qu.: 2.092 3rd Qu.: 3.246 \n Max. :10.809 Max. :10.976 Max. :11.362 Max. :11.301 \n HSPC_090 HSPC_094 HSPC_095 HSPC_096 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000 \n Median : 0.000 Median : 0.000 Median : 2.055 Median :0.000 \n Mean : 2.286 Mean : 2.186 Mean : 2.756 Mean :2.348 \n 3rd Qu.: 4.174 3rd Qu.: 2.002 3rd Qu.: 4.370 3rd Qu.:4.482 \n Max. :11.124 Max. :11.694 Max. :11.385 Max. :9.601 \n HSPC_098 HSPC_099 HSPC_100 HSPC_101 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 1.007 \n Mean : 2.209 Mean : 2.082 Mean : 2.313 Mean : 2.587 \n 3rd Qu.: 3.354 3rd Qu.: 2.505 3rd Qu.: 2.775 3rd Qu.: 5.334 \n Max. :11.070 Max. :10.200 Max. :11.452 Max. :11.456 \n HSPC_102 HSPC_103 HSPC_104 HSPC_105 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 1.111 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.210 Mean : 2.853 Mean : 2.099 Mean : 1.893 \n 3rd Qu.: 2.993 3rd Qu.: 6.123 3rd Qu.: 2.720 3rd Qu.: 2.129 \n Max. :11.153 Max. :11.328 Max. :10.746 Max. :10.721 \n HSPC_106 HSPC_107 HSPC_108 HSPC_109 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 1.595 \n Mean : 1.980 Mean : 2.279 Mean : 2.296 Mean : 2.420 \n 3rd Qu.: 2.425 3rd Qu.: 3.396 3rd Qu.: 3.361 3rd Qu.: 4.006 \n Max. :10.919 Max. :10.982 Max. :11.744 Max. :10.463 \n HSPC_110 HSPC_111 HSPC_114 HSPC_115 \n Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.9173 Median : 2.349 \n Mean : 2.159 Mean : 1.800 Mean : 1.8376 Mean : 2.943 \n 3rd Qu.: 2.667 3rd Qu.: 2.214 3rd Qu.: 1.8741 3rd Qu.: 6.223 \n Max. :11.121 Max. :11.109 Max. :10.4645 Max. :11.124 \n HSPC_117 HSPC_118 HSPC_119 HSPC_120 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 1.187 \n Mean : 1.919 Mean : 1.855 Mean : 2.289 Mean : 2.041 \n 3rd Qu.: 2.306 3rd Qu.: 2.387 3rd Qu.: 3.292 3rd Qu.: 2.610 \n Max. :14.579 Max. :11.119 Max. :12.534 Max. :11.438 \n HSPC_121 HSPC_122 HSPC_123 HSPC_125 \n Min. : 0.000 Min. : 0.000 Min. 
: 0.000 ...\n\n[summary() output truncated: the same six-number block repeats for every cell column through HSPC_852. In every cell, Min. and 1st Qu. are 0.000; medians range from 0 to about 3, means from about 1.5 to 3.2, and Max. values from about 9.5 to 13.9.]\n\nHmmmm, did you get all that? Nope, me neither! We have 701 cells but we only have 6 samples for the frogs. 
We will need a different approach to get an overview, but I find it is still useful to look at a few of the columns.\n🎬 Get a quick overview of the first 20 columns:\n\nsummary(hspc[1:20])\n\n ensembl_gene_id HSPC_001 HSPC_002 HSPC_003 \n Length:280 Min. : 0.000 Min. : 0.000 Min. : 0.0000 \n Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Mode :character Median : 0.000 Median : 0.000 Median : 0.9929 \n Mean : 2.143 Mean : 1.673 Mean : 2.5964 \n 3rd Qu.: 2.120 3rd Qu.: 2.239 3rd Qu.: 6.1559 \n Max. :12.567 Max. :11.976 Max. :11.1138 \n HSPC_004 HSPC_006 HSPC_008 HSPC_009 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000 \n Median : 0.000 Median : 1.276 Median : 0.000 Median :0.000 \n Mean : 1.851 Mean : 2.338 Mean : 2.375 Mean :2.220 \n 3rd Qu.: 2.466 3rd Qu.: 3.536 3rd Qu.: 3.851 3rd Qu.:3.594 \n Max. :11.133 Max. :10.014 Max. :11.574 Max. :9.997 \n HSPC_011 HSPC_012 HSPC_014 HSPC_015 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 1.750 Median : 0.000 Median : 0.000 \n Mean : 2.285 Mean : 2.431 Mean : 2.295 Mean : 2.515 \n 3rd Qu.: 3.193 3rd Qu.: 3.741 3rd Qu.: 3.150 3rd Qu.: 3.789 \n Max. :11.260 Max. :10.905 Max. :11.051 Max. :10.751 \n HSPC_016 HSPC_017 HSPC_018 HSPC_020 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.9488 Median : 0.000 Median : 1.248 Median : 0.000 \n Mean : 2.6115 Mean : 2.146 Mean : 2.710 Mean : 2.509 \n 3rd Qu.: 5.9412 3rd Qu.: 2.357 3rd Qu.: 6.006 3rd Qu.: 4.470 \n Max. :11.3082 Max. :12.058 Max. :11.894 Max. :11.281 \n HSPC_021 HSPC_022 HSPC_023 HSPC_024 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.170 Mean : 2.287 Mean : 2.314 Mean : 2.195 \n 3rd Qu.: 2.996 3rd Qu.: 3.351 3rd Qu.: 2.749 3rd Qu.: 2.944 \n Max. :10.709 Max. :11.814 Max. :12.113 Max. :11.279 \n\nNotice that:\n\nthe maximum values are much lower than for the frogs and have decimals. That is because the mouse data are logged (to base 2) normalised counts, not raw counts as they are in the frog data set.\na minimum value of 0 appears in all 20 columns - perhaps that is true across the whole dataset (or at least common)\nat least some of the medians are zeros so there must be quite a lot of zeros\nthe few columns we can see are roughly similar\nit would not be very practical to plot the distributions of values in each cell using facet_wrap().\n\nIn this data set, there is even more of an advantage to using the pivot_longer(), group_by() and summarise() approach. We will be able to open the dataframe in the Viewer and make plots to examine whether the distributions are similar across cells.\n🎬 Summarise all the cells:\n\nhspc_summary_samp <- hspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |>\n group_by(cell) |>\n summarise(min = min(expr),\n lowerq = quantile(expr, 0.25),\n mean = mean(expr),\n median = median(expr),\n sd = sd(expr),\n upperq = quantile(expr, 0.75),\n max = max(expr),\n n_zero = sum(expr == 0))\n\nNotice that I have used cell as the column name rather than sample and expr (expression) rather than count. I’ve also added the standard deviation.\n🎬 View the hspc_summary_samp dataframe (click on it in the environment).\nAll cells have quite a few zeros and the lower quartile is 0 for all cells, i.e., every cell has many genes with zero expression.\nTo get a better understanding of the distribution of expression in cells we can create a ggplot using the pointrange geom. Pointrange puts a dot at the mean and a line between a minimum and a maximum, such as +/- one s.d. It is not unlike a boxplot, but for when you need the boxes to be very narrow!\n🎬 Create a pointrange plot.\n\nhspc_summary_samp |> \n ggplot(aes(x = cell, y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd),\n size = 0.1)\n\n[pointrange plot: a dot at the mean expression of each cell with a line spanning the mean +/- one s.d.]\n\nYou will need to use the Zoom button to pop the plot window out so you can make it as wide as possible.\nThe things to notice are:\n\nthe average expression is similar for all cells. This is good to know - if some cells had much lower expression perhaps there is something wrong with them, or their sequencing, and they should be excluded.\nthe distributions are roughly similar in width too\n\nThe default order of cell is alphabetical. It can be easier to see these (non-)effects if we order the lines by the size of the mean.\n🎬 Order a pointrange plot with reorder(variable_to_order, order_by).\n\nhspc_summary_samp |> \n ggplot(aes(x = reorder(cell, mean), y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd),\n size = 0.1)\n\n[pointrange plot: the same, with cells ordered by mean expression]\n\nreorder() arranges cell in increasing order of mean.\n🎬 Write hspc_summary_samp to a file called “hspc_summary_samp.csv”:
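A minimal sketch of one way to do this, using write_csv() from readr (loaded with the tidyverse); the data-processed/ folder is an assumption here - adjust the path to suit your project:\n\n# assumed folder name - change to match your project layout\nwrite_csv(hspc_summary_samp, \"data-processed/hspc_summary_samp.csv\")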
Distribution of values across the genes\n🐸 Frog genes\nThere are lots of genes in this dataset, so we will take the same approach as we took for the distributions across mouse cells. We will pivot the data to tidy and then summarise the counts for each gene.\n🎬 Summarise the counts for each gene:\n\ns30_summary_gene <- s30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(xenbase_gene_id) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n sd = sd(count),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n total = sum(count),\n n_zero = sum(count == 0))\n\nI have calculated the values we used before with one addition: the sum of the counts (total).\n🎬 View the s30_summary_gene dataframe.\nNotice that we have:\n\na lot of genes with counts of zero in every sample\na lot of genes with zero counts in several of the samples\nsome very, very low counts.\n\nThese should be filtered out because they are unreliable - or, at the least, uninformative. The goal of our downstream analysis will be to see if there is a significant difference in gene expression between the control and FGF-treated siblings. Since we have only three replicates in each group, having one or two unreliable, missing or zero values makes such a determination impossible for a particular gene. We will use the total counts and the number of samples with non-zero values to filter our genes later.\nAs we have a lot of genes, it is again helpful to plot the mean counts with pointrange to get an overview. We will plot the log of the counts - we saw earlier that logging made it easier to understand the distribution of counts over such a wide range. We will also order the genes from lowest to highest mean count.\n🎬 Plot the logged mean counts for each gene in order of size using geom_pointrange():\n\ns30_summary_gene |> \n ggplot(aes(x = reorder(xenbase_gene_id, mean), y = log10(mean))) +\n geom_pointrange(aes(ymin = log10(mean - sd), \n ymax = log10(mean + sd)),\n size = 0.1)\n\n[pointrange plot: log10 mean count +/- s.d. for each gene, ordered by mean count]\n\n(Remember, the warning is expected since we have zeros.)\nYou can see we also have quite a few genes with means less than 1 (log below zero). Note that the variability between genes (average counts between 0 and 102586) is far greater than between samples (average counts from 260 to 426), which is exactly what we would expect to see.\n🎬 Write s30_summary_gene to a file called “s30_summary_gene.csv”:
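As before, a minimal sketch using readr’s write_csv(), with the data-processed/ folder again an assumption:\n\n# assumed folder name - change to match your project layout\nwrite_csv(s30_summary_gene, \"data-processed/s30_summary_gene.csv\")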
\nDistribution of values across the genes\n🐸 Frog genes\nThere are lots of genes in this dataset, therefore we will take the same approach as we took for the distributions across mouse cells. We will pivot the data to tidy and then summarise the counts for each gene.\n🎬 Summarise the counts for each gene:\n\ns30_summary_gene <- s30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(xenbase_gene_id) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n sd = sd(count),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n total = sum(count),\n n_zero = sum(count == 0))\n\nI have calculated the values we used before with one addition: the sum of the counts (total).\n🎬 View the s30_summary_gene dataframe.\nNotice that we have:\n\na lot of genes with counts of zero in every sample\na lot of genes with zero counts in several of the samples\nsome very, very low counts.\n\nThese should be filtered out because they are unreliable - or, at the least, uninformative. The goal of our downstream analysis will be to see if there is a significant difference in gene expression between the control and FGF-treated siblings. Since we have only three replicates in each group, having one or two unreliable, missing or zero values makes such a determination impossible for a particular gene. We will use the total counts and the number of samples with non-zero values to filter our genes later.\nAs we have a lot of genes, it is again helpful to plot the mean counts with pointrange to get an overview. We will plot the log of the counts - we saw earlier that logging made it easier to understand the distribution of counts over such a wide range. We will also order the genes from lowest to highest mean count.\n🎬 Plot the logged mean counts for each gene in order of size using geom_pointrange():\n\ns30_summary_gene |> \n ggplot(aes(x = reorder(xenbase_gene_id, mean), y = log10(mean))) +\n geom_pointrange(aes(ymin = log10(mean - sd), \n ymax = log10(mean + sd )),\n size = 0.1)\n\n\n\n\n(Remember, the warning is expected since we have zeros.)\nYou can see we also have quite a few genes with means less than 1 (log below zero). Note that the variability between genes (average counts between 0 and 102586) is far greater than between samples (average counts from 260 to 426) which is exactly what we would expect to see.\n🎬 Write s30_summary_gene to a file called “s30_summary_gene.csv”:\n🐭 Mouse genes\nThere are fewer genes in this dataset, but still more than you can understand without the overview provided by a plot. We will again pivot the data to tidy and then summarise the expression for each gene.\n🎬 Summarise the expression for each gene:\n\nhspc_summary_gene <- hspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |>\n group_by(ensembl_gene_id) |>\n summarise(min = min(expr),\n lowerq = quantile(expr, 0.25),\n sd = sd(expr),\n mean = mean(expr),\n median = median(expr),\n upperq = quantile(expr, 0.75),\n max = max(expr),\n total = sum(expr),\n n_zero = sum(expr == 0))\n\n🎬 View the hspc_summary_gene dataframe. Remember these are normalised and logged (base 2) so we should not see very large values.\nNotice that we have:\n\nno genes with 0 in every cell\nvery few genes (9) with no zeros at all\nquite a few genes with zero in many cells but this matters less than zeros in the frog samples because we had just 6 samples and we have 701 cells.\n\nAs we have a lot of genes, it is again helpful to plot the mean expression with pointrange to get an overview. We do not need to log the values but ordering the genes will help.\n🎬 Plot the mean expression for each gene in order of size using geom_pointrange():\n\nhspc_summary_gene |> \n ggplot(aes(x = reorder(ensembl_gene_id, mean), y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd),\n size = 0.1)\n\n\n\n\nNote again that the variability between genes (average expression between 0.02 and 10.03) is far greater than between cells (average expression from 1.46 to 3.18) which is expected.\n🎬 Write hspc_summary_gene to a file called “hspc_summary_gene.csv”:
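\nThe same pattern once more (a sketch, assuming the same folder structure):\n\nwrite_csv(hspc_summary_gene, \n file = \"data-processed/hspc_summary_gene.csv\")"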
},
{
- "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-2",
- "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-2",
- "title": "Independent Study to prepare for workshop",
- "section": "What is a Research Compendium?",
- "text": "What is a Research Compendium?\n\n\n\nZipped folder containing all data, code and text associated with a research project organised and documented clearly. Any unscripted processing should be described.\nEverything needed to understand what the project is and reproduce the results, and no more. The compendium should not be a dumping ground for data files and scripts. It needs to be curated. You may generate files that are not needed to reproduce your work and these should be removed.\nYour compendium might be a single Quarto/RStudio Project, or it might be folder including an RStudio Project and some additional materials including the description of unscripted processing.\nIdeally uses literate programming to create submitted report"
+ "objectID": "omics/week-3/workshop.html#filtering-for-qc",
+ "href": "omics/week-3/workshop.html#filtering-for-qc",
+ "title": "Workshop",
+ "section": "Filtering for QC",
+ "text": "Filtering for QC\n🐸 Frog filtering\nOur samples look to be similarly well sequenced. There are no samples we should remove. However, some genes are not express or the expression values are so low in for a gene that they are uninformative. We will filter the s30_summary_gene dataframe to obtain a list of xenbase_gene_id we can use to filter s30.\nMy suggestion is to include only the genes with counts in at least 3 samples3 and those with total counts above 20.\n🎬 Filter the summary by gene dataframe:\n\ns30_summary_gene_filtered <- s30_summary_gene |> \n filter(total > 20) |> \n filter(n_zero < 4)\n\n🎬 Write the filtered summary by gene to file:\n\nwrite_csv(s30_summary_gene_filtered, \n file = \"data-processed/s30_summary_gene_filtered.csv\")\n\n🎬 Use the list of xenbase_gene_id in the filtered summary to filter the original dataset:\n\ns30_filtered <- s30 |> \n filter(xenbase_gene_id %in% s30_summary_gene_filtered$xenbase_gene_id)\n\n🎬 Write the filtered original to file:\n\nwrite_csv(s30_filtered, \n file = \"data-processed/s30_filtered.csv\")\n\n🐭 Mouse filtering\nWe will take a different approach to filtering the single cell data. For the Frog samples we are examining the control and the FGF treated samples. This means have a low number of counts overall means the gene is not really expressed (detected) in any condition, and filtering out those genes is removing things that definitely are not interesting. For the mice, we have examined only one cell type but will be making comparisons between cells types. It may be that low expression of a gene in this cell type tells us something if that gene is highly expressed in another cell type. Instead, we will make statistical comparisons between the cell types and then filter based on overall expression, the difference in expression between cell types and whether that difference is significant.\nThe number of “replicates” is also important. When you have only three in each group it is not possible to make statistical comparisons when several replicates are zero. This is less of an issue with single cell data."
},
{
- "objectID": "core/week-11/study_before_workshop.html#use-guidelines-from-core-1-and-2",
- "href": "core/week-11/study_before_workshop.html#use-guidelines-from-core-1-and-2",
- "title": "Independent Study to prepare for workshop",
- "section": "Use guidelines from Core 1 and 2",
- "text": "Use guidelines from Core 1 and 2\n\nfollow the guidance in Core 1 on organisation, naming things and documentation\nfollow the guidance in Core 2 on well-formatted code, consistency, modularisation and documentation"
+ "objectID": "omics/week-3/workshop.html#look-after-future-you",
+ "href": "omics/week-3/workshop.html#look-after-future-you",
+ "title": "Workshop",
+ "section": "🤗 Look after future you!",
+ "text": "🤗 Look after future you!\nYou need only do the section for your own project data\n🐸 Frogs and future you\n🎬 Create a new Project, frogs-88H, populated with folders and your data. Make a script file called cont-fgf-s30.R. This will a be commented analysis of the control vs FGF at S30 comparison. You will build on this each workshop and be able to use it as a template to examine other comparisons. Copy in the appropriate code and comments from workshop-1.R. Edit to improve your comments where your understanding has developed since you made them. Make sure you can close down RStudio, reopen it and run your whole script again.\n🐭 Mice and future you\n🎬 Create a new Project, mice-88H, populated with folders and your data. Make a script file called hspc-prog.R. This will a be commented analysis of the hspc cells vs the prog cells. At this point you will have only code for the hspc cells. You will build on this each workshop and be able to use it as a template to examine other comparisons. Copy in the appropriate code and comments from workshop-1.R. Edit to improve your comments where your understanding has developed since you made them. Make sure you can close down RStudio, reopen it and run your whole script again.\n🍂 xxxx and future you\nDo one of the other two examples."
},
{
- "objectID": "core/week-11/study_before_workshop.html#project-level-documentation",
- "href": "core/week-11/study_before_workshop.html#project-level-documentation",
- "title": "Independent Study to prepare for workshop",
- "section": "Project level documentation",
- "text": "Project level documentation\n\n\nas concise as possible, bullet points are good\nprimarily in the README file but some details may be in scripts\ntitle, concise description of the work, author exam number, date, overview of compendium contents\nall the software information including versions\ninstructions needed to reproduce the work, order of workflow, settings/parameter values for software"
+ "objectID": "omics/week-3/workshop.html#footnotes",
+ "href": "omics/week-3/workshop.html#footnotes",
+ "title": "Workshop",
+ "section": "Footnotes",
+ "text": "Footnotes\n\nThis a result of the Central limit theorem,one consequence of which is that adding together lots of distributions - whatever distributions they are - will tend to a normal distribution.↩︎\nThis a result of the Central limit theorem,one consequence of which is that adding together lots of distributions - whatever distributions they are - will tend to a normal distribution.↩︎\nI chose three because that would keep [0, 0, 0] [#,#,#]. This is difference we cannot test statistically, but which would matter biologically.↩︎"
},
{
- "objectID": "core/week-11/study_before_workshop.html#project-level-documentation---cont",
- "href": "core/week-11/study_before_workshop.html#project-level-documentation---cont",
- "title": "Independent Study to prepare for workshop",
- "section": "Project level documentation - cont",
- "text": "Project level documentation - cont\n\n\ndescription, format and provenance of the data\nstyle conventions used in the code,\nany other information needed to understand the project and reproduce the results"
+ "objectID": "omics/week-3/overview.html",
+ "href": "omics/week-3/overview.html",
+ "title": "Overview",
+ "section": "",
+ "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nexplore ’omics data to find the number of rows and columns and know how these correspond to samples and variables\nexplore the distribution of expression measures across whole data sets, across variables and across samples by summarising and plotting\nexplain what distributions are expected and interpret the distributions they have\nexplain on what basis we might filter out variables or samples\nimport, explore and filter ’omics data reproducibly so they can understand and reuse their code in the future\n\n\n\nInstructions\n\nPrepare\n\n📖 Read how the data were generated and how they have been processed so far\n\nWorkshop\n\n💻 Set up a Project\n💻 Import data\n💻 Explore the distribution of values across samples/cells and across genes/species\n💻 Look after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case."
},
{
- "objectID": "core/week-11/study_before_workshop.html#script-level-documentation",
- "href": "core/week-11/study_before_workshop.html#script-level-documentation",
- "title": "Independent Study to prepare for workshop",
- "section": "Script level documentation",
- "text": "Script level documentation\nShorthand for documentation at the script and/or code chunk level and/or process level where unscripted processing is used.\n\n\noverview of the script/chunk/process and its purpose\ncode comments"
+ "objectID": "core/week-2/study_after_workshop.html",
+ "href": "core/week-2/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
+ "section": "",
+ "text": "bbbb"
},
{
- "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-3",
- "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-3",
+ "objectID": "core/week-2/study_before_workshop.html#overview",
+ "href": "core/week-2/study_before_workshop.html#overview",
"title": "Independent Study to prepare for workshop",
- "section": "What is a Research Compendium?",
- "text": "What is a Research Compendium?\n\n\nA research compendium is something you develop throughout your research project. It is not something you create at the end.\nYou update and reorganise as you go.\nWhen you plan your research include the planning of recording, organising, and documenting your data and its analysis.\nThink ahead to how and where you will be recording your data and how you will be analysing."
+ "section": "Overview",
+ "text": "Overview\n\nRStudio Projects revisited\n\nusing usethis package\nAdding a README\n\n\nFormatting code\nCode algorithmically / algebraically."
},
{
- "objectID": "core/week-11/study_before_workshop.html#further-reading",
- "href": "core/week-11/study_before_workshop.html#further-reading",
+ "objectID": "core/week-2/study_before_workshop.html#reproducibility-is-a-continuum",
+ "href": "core/week-2/study_before_workshop.html#reproducibility-is-a-continuum",
"title": "Independent Study to prepare for workshop",
- "section": "Further Reading",
- "text": "Further Reading\n\nThe Turing Way (Community 2022)\nPackaging Data Analytical Work Reproducibly Using R (and Friends) (Marwick, Boettiger, and Mullen 2018)\nTen simple rules for writing and sharing computational analyses in Jupyter Notebooks (Rule et al. 2019)\nTen Simple rules for (Sandve et al. 2013)"
+ "section": "Reproducibility is a continuum",
+ "text": "Reproducibility is a continuum\nSome is better than none!\n\nOrganise your project\n\nScript everything.\n\nFormat code and follow a consistent style.\n\nCode algorithmically\nModularise your code: organise into sections and scripts\nDocument your project - commenting, READMEs\nUse literate programming e.g., R Markdown or Quarto\n\n\n\nMore advanced: Version control, continuous integration, environments, containers"
},
{
- "objectID": "core/week-11/study_before_workshop.html#references",
- "href": "core/week-11/study_before_workshop.html#references",
+ "objectID": "core/week-2/study_before_workshop.html#rstudio-projects",
+ "href": "core/week-2/study_before_workshop.html#rstudio-projects",
"title": "Independent Study to prepare for workshop",
- "section": "References",
- "text": "References\n\n\n🔗 About Core 3 Research Compendia and Reproducible Reporting\n\n\n\nCommunity, The Turing Way. 2022. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research. Zenodo. https://doi.org/10.5281/ZENODO.3233853.\n\n\nMarwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using r (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.\n\n\nRule, Adam, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, et al. 2019. “Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks.” Edited by Fran Lewitter. PLOS Computational Biology 15 (7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007.\n\n\nSandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. 2013. “Ten Simple Rules for Reproducible Computational Research.” PLoS Comput. Biol. 9 (10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285."
- },
- {
- "objectID": "core/week-6/overview.html",
- "href": "core/week-6/overview.html",
- "title": "Overview",
- "section": "",
- "text": "This week’s session is a drop-in and introduces no new material. Instead, it is an opportunity to ask questions about the content from Core 1 and 2 and to revise skills from stage 1 and 2 as needed.\n\nInstructions\n\nPrepare\n\n📖 Review content from Core 1 and 2\n\nWorkshop\n\n💻 Ask questions about the content from Core 1 and 2 as needed\n💻 Revise skills from stage 1 and 2 (88H students) or 52M (70M students) as needed\n\nConsolidate\n\nThere is no consolidation work for this drop-in"
+ "section": "RStudio Projects",
+ "text": "RStudio Projects\n\n\nWe used RStudio Projects in stage one but they are so useful, it is worth covering them again in case you are not yet using them.\nWe will also cover the usethisworkflow to create an RStudio Project.\nRStudio Projects make it easy to manage working directories and paths because they set the working directory to the RStudio Projects directory automatically."
},
{
- "objectID": "core/week-6/study_after_workshop.html",
- "href": "core/week-6/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
- "section": "",
- "text": "There is no consolidation work other than to continue revising what you have learned over the course of your degree about data analysis."
+ "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-1",
+ "href": "core/week-2/study_before_workshop.html#rstudio-projects-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "RStudio Projects",
+ "text": "RStudio Projects\n\n\n\n-- stem_cell_rna\n |__stem_cell_rna.Rproj \n |__raw_ data/ \n |__2019-03-21_donor_1.csv\n |__README. md\n |__R/\n |__01_data_processing.R\n |__02_exploratory.R\n |__functions/\n |__theme_volcano.R\n |__normalise.R\n\n\nThe project directory is the folder at the top 1\n\n\nThanks to Mine Çetinkaya-Rundel who helped me work out how to highlight a line https://gist.github.com/mine-cetinkaya-rundel/3af3415eab70a65be3791c3dcff6e2e3. Note to futureself: the engine: knitr matters."
},
{
- "objectID": "images/images.html",
- "href": "images/images.html",
- "title": "Image Data Analysis for Group Project",
- "section": "",
- "text": "The following ImageJ workflow uses the processing steps you used in workshop 3 with one change. That change is to save the results to file rather than having the results window pop up and saving from there. Or maybe two changes: it also tells you to use meaning systematic file names that will be easy to process when importing data. The RStudio workflow shows you how to import multiple files into one dataframe with columns indicating the treatment.\n\nSave files with systematic names: ev_0.avi 343_0.avi ev_1.avi 343_1.avi ev_2.5.avi 343_2.5.avi\nOpen ImageJ\nOpen video file eg ev_2.5.avi\n\nConvert to 8-bit: Image | Type | 8-bit\nCrop to petri dish: Select then Image | Crop\nCalculate average pixel intensity: Image | Stacks | Z Project\n\nProjection type: Average Intensity to create AVG_ev_2.5.avi\n\n\n\nSubtract average from image: Process | Image Calculator\n\nImage 1: ev_2.5.avi\n\nOperation: Subtract\nImage 2: AVG_ev_2.5.avi\n\nCreate new window: checked\nOK, Yes to Process all\n\n\nInvert: Edit | Invert\nAdjust threshold: Image | Adjust | Threshold\n\nMethod: Default\nThresholding: Default, B&W\nDark background: checked\nAuto or adjust a little but make sure the larvae do not disappear at later points in the video (use the slider)\nApply\n\n\nInvert: Edit | Invert\nTrack: Plugins | wrMTrck\n\nSet minSize: 10\nSet maxSize: 400\nSet maxVelocity: 10\nSet maxAreaChange: 200\nSet bendThreshold: 1\n\nImportant: check Save Results File This is different to what you did in the workshop. It will help because the results will be saved automatically rather than to saving from the Results window that other pops up. Consequently, you will be able to save the results files with systematic names relating to their treatments and then read them into R simultaneously. That will also allow you to add information from the name of the file (which has the treatment information) to the resulting dataframes\n\n\nwrMTrck window with the settings listed above shown\n\n\nClick OK. Save to a folder for all the tracking data files. I recommend deleting the “Results of..” part of the name\n\n\nCheck that the Summary window indicates 3 tracks and that the 3 larvae are what is tracked by using the slider on the Result image\nRepeat for all videos\n\nThis is the code you need to import multiple csv files into a single dataframe and add a column with the treatment information from the file name. This is why systematic file names are good.\nIt assumes\n\nyour files are called type_concentration.txt for example: ev_0.txt 343_0.txt ev_1.txt 343_1.txt ev_2.5.txt 343_2.5.txt.\nthe .txt datafile are in a folder called track inside your working directory\nyou have installed the following packages: tidyverse, janitor\n\n\n🎬 Load the tidyverse\n\nlibrary(tidyverse)\n\n🎬 Put the file names into a vector we will iterate through\n\n# get a vector of the file names\nfiles <- list.files(path = \"track\", full.names = TRUE )\n\nWe can use map_df() from the purrr package which is one of the tidyverse gems loaded with tidyvserse. map_df() will iterate through files and read them into a dataframe with a specified import function. We are using read_table(). map_df() keeps track of the file by adding an index column called file to the resulting dataframe. Instead of this being a number (1 - 6 here) we can use set_names() to use the file names instead. 
The clean_names() function from the janitor package will clean up the column names (make them lower case, replace spaces with _ remove special characters etc)\n🎬 Import multiple csv files into one dataframe called tracking\n\n# import multiple data files into one dataframe called tracking\n# using map_df() from purrr package\n# clean the column names up using janitor::clean_names()\ntracking <- files |> \n set_names() |>\n map_dfr(read_table, .id = \"file\") |>\n janitor::clean_names()\n\nYou will get a warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each of the files because the csv files each have two columns called avgX. If you click on the tracking dataframe you see is contains the data from all the files.\nNow we can add columns for the type and the concentration by processing the values in the file. The values are like track/343_0.txt so we need to remove .txt and track/ and separate the remaining words into two columns.\n🎬 Process the file column to add columns for the type and the concentration\n\n# extract type and concentration from file name\n# and put them into additopnal separate columns\ntracking <- tracking |> \n mutate(file = str_remove(file, \".txt\")) |>\n mutate(file = str_remove(file, \"track/\")) |>\n extract(file, remove = \n FALSE,\n into = c(\"type\", \"conc\"), \n regex = \"([^_]{2,3})_(.+)\") \n\n[^_]{2,3} matches two or three characters that are not _ at the start of the string (^)\n.+ matches one or more characters. The extract() function puts the first match into the first column, type, and the second match into the second column, conc. The remove = FALSE argument means the original column is kept.\nYou now have a dataframe with all the tracking data which is relatively easy to summarise and plot using tools you know.\nThere is an example RStudio project containing this code here: tips. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this is an RStudio Project, do not run the code from inside a project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/tips\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called tips-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded. You can now run the code in the project."
+ "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-2",
+ "href": "core/week-2/study_before_workshop.html#rstudio-projects-2",
+ "title": "Independent Study to prepare for workshop",
+ "section": "RStudio Projects",
+ "text": "RStudio Projects\n\n\n\n-- stem_cell_rna\n |__stem_cell_rna.Rproj \n |__raw_ data/ \n |__2019-03-21_donor_1.csv\n |__README. md\n |__R/\n |__01_data_processing.R\n |__02_exploratory.R\n |__functions/\n |__theme_volcano.R\n |__normalise.R\n\n\nthe .RProj file is directly under the project folder. Its presence is what makes the folder an RStudio Project"
},
{
- "objectID": "images/images.html#worm-tracking",
- "href": "images/images.html#worm-tracking",
- "title": "Image Data Analysis for Group Project",
- "section": "",
- "text": "The following ImageJ workflow uses the processing steps you used in workshop 3 with one change. That change is to save the results to file rather than having the results window pop up and saving from there. Or maybe two changes: it also tells you to use meaning systematic file names that will be easy to process when importing data. The RStudio workflow shows you how to import multiple files into one dataframe with columns indicating the treatment.\n\nSave files with systematic names: ev_0.avi 343_0.avi ev_1.avi 343_1.avi ev_2.5.avi 343_2.5.avi\nOpen ImageJ\nOpen video file eg ev_2.5.avi\n\nConvert to 8-bit: Image | Type | 8-bit\nCrop to petri dish: Select then Image | Crop\nCalculate average pixel intensity: Image | Stacks | Z Project\n\nProjection type: Average Intensity to create AVG_ev_2.5.avi\n\n\n\nSubtract average from image: Process | Image Calculator\n\nImage 1: ev_2.5.avi\n\nOperation: Subtract\nImage 2: AVG_ev_2.5.avi\n\nCreate new window: checked\nOK, Yes to Process all\n\n\nInvert: Edit | Invert\nAdjust threshold: Image | Adjust | Threshold\n\nMethod: Default\nThresholding: Default, B&W\nDark background: checked\nAuto or adjust a little but make sure the larvae do not disappear at later points in the video (use the slider)\nApply\n\n\nInvert: Edit | Invert\nTrack: Plugins | wrMTrck\n\nSet minSize: 10\nSet maxSize: 400\nSet maxVelocity: 10\nSet maxAreaChange: 200\nSet bendThreshold: 1\n\nImportant: check Save Results File This is different to what you did in the workshop. It will help because the results will be saved automatically rather than to saving from the Results window that other pops up. Consequently, you will be able to save the results files with systematic names relating to their treatments and then read them into R simultaneously. That will also allow you to add information from the name of the file (which has the treatment information) to the resulting dataframes\n\n\nwrMTrck window with the settings listed above shown\n\n\nClick OK. Save to a folder for all the tracking data files. I recommend deleting the “Results of..” part of the name\n\n\nCheck that the Summary window indicates 3 tracks and that the 3 larvae are what is tracked by using the slider on the Result image\nRepeat for all videos\n\nThis is the code you need to import multiple csv files into a single dataframe and add a column with the treatment information from the file name. This is why systematic file names are good.\nIt assumes\n\nyour files are called type_concentration.txt for example: ev_0.txt 343_0.txt ev_1.txt 343_1.txt ev_2.5.txt 343_2.5.txt.\nthe .txt datafile are in a folder called track inside your working directory\nyou have installed the following packages: tidyverse, janitor\n\n\n🎬 Load the tidyverse\n\nlibrary(tidyverse)\n\n🎬 Put the file names into a vector we will iterate through\n\n# get a vector of the file names\nfiles <- list.files(path = \"track\", full.names = TRUE )\n\nWe can use map_df() from the purrr package which is one of the tidyverse gems loaded with tidyvserse. map_df() will iterate through files and read them into a dataframe with a specified import function. We are using read_table(). map_df() keeps track of the file by adding an index column called file to the resulting dataframe. Instead of this being a number (1 - 6 here) we can use set_names() to use the file names instead. 
The clean_names() function from the janitor package will clean up the column names (make them lower case, replace spaces with _ remove special characters etc)\n🎬 Import multiple csv files into one dataframe called tracking\n\n# import multiple data files into one dataframe called tracking\n# using map_df() from purrr package\n# clean the column names up using janitor::clean_names()\ntracking <- files |> \n set_names() |>\n map_dfr(read_table, .id = \"file\") |>\n janitor::clean_names()\n\nYou will get a warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each of the files because the csv files each have two columns called avgX. If you click on the tracking dataframe you see is contains the data from all the files.\nNow we can add columns for the type and the concentration by processing the values in the file. The values are like track/343_0.txt so we need to remove .txt and track/ and separate the remaining words into two columns.\n🎬 Process the file column to add columns for the type and the concentration\n\n# extract type and concentration from file name\n# and put them into additopnal separate columns\ntracking <- tracking |> \n mutate(file = str_remove(file, \".txt\")) |>\n mutate(file = str_remove(file, \"track/\")) |>\n extract(file, remove = \n FALSE,\n into = c(\"type\", \"conc\"), \n regex = \"([^_]{2,3})_(.+)\") \n\n[^_]{2,3} matches two or three characters that are not _ at the start of the string (^)\n.+ matches one or more characters. The extract() function puts the first match into the first column, type, and the second match into the second column, conc. The remove = FALSE argument means the original column is kept.\nYou now have a dataframe with all the tracking data which is relatively easy to summarise and plot using tools you know.\nThere is an example RStudio project containing this code here: tips. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this is an RStudio Project, do not run the code from inside a project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/tips\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called tips-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded. You can now run the code in the project."
+ "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-3",
+ "href": "core/week-2/study_before_workshop.html#rstudio-projects-3",
+ "title": "Independent Study to prepare for workshop",
+ "section": "RStudio Projects",
+ "text": "RStudio Projects\n\n\nWhen you open an RStudio Project, the working directory is set to the Project directory (i.e., the location of the .Rproj file).\nWhen you use an RStudio Project you do not need to use setwd()\nWhen someone, including future you, opens the project on another machine, all the paths just work."
},
{
- "objectID": "omics/week-3/workshop.html",
- "href": "omics/week-3/workshop.html",
- "title": "Workshop",
- "section": "",
- "text": "In this workshop you will learn what steps to take to get a good understanding of your ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nYou should examine all three data sets because the comparisons will give you a stronger understanding of your own project data."
+ "objectID": "core/week-2/study_before_workshop.html#rstudio-projects-4",
+ "href": "core/week-2/study_before_workshop.html#rstudio-projects-4",
+ "title": "Independent Study to prepare for workshop",
+ "section": "RStudio Projects",
+ "text": "RStudio Projects\n\nJenny BryanIn the words of Jenny Bryan:\n\n“If the first line of your R script is setwd(”C:/Users/jenny/path/that/only/I/have”) I will come into your office and SET YOUR COMPUTER ON FIRE”"
},
{
- "objectID": "omics/week-3/workshop.html#session-overview",
- "href": "omics/week-3/workshop.html#session-overview",
- "title": "Workshop",
- "section": "",
- "text": "In this workshop you will learn what steps to take to get a good understanding of your ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nYou should examine all three data sets because the comparisons will give you a stronger understanding of your own project data."
+ "objectID": "core/week-2/study_before_workshop.html#creating-an-rstudio-project",
+ "href": "core/week-2/study_before_workshop.html#creating-an-rstudio-project",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Creating an RStudio Project",
+ "text": "Creating an RStudio Project\nThere are two ways to create an RStudio Project.\n\nUsing one of the two menus\nUsing the usethis package"
},
{
- "objectID": "omics/week-3/workshop.html#set-up-a-project",
- "href": "omics/week-3/workshop.html#set-up-a-project",
- "title": "Workshop",
- "section": "Set up a Project",
- "text": "Set up a Project\n🎬 Start RStudio from the Start menu\n🎬 Make an RStudio project. Be deliberate about where you create it so that it is a good place for you\n🎬 Use the Files pane to make new folders for the data. I suggest data-raw and data-processed\n🎬 Make a new script called workshop-1.R to carry out the rest of the work.\n🎬 Record what you do and what you find out. All of it!\n🎬 Load tidyverse (Wickham et al. 2019) for importing, summarising, plotting and filtering.\n\nlibrary(tidyverse)"
+ "objectID": "core/week-2/study_before_workshop.html#using-a-menu",
+ "href": "core/week-2/study_before_workshop.html#using-a-menu",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using a menu",
+ "text": "Using a menu\nThere are two menus:\n\nTop left, File menu\nTop Right, drop-down indicated by the .RProj icon\n\nThey both do the same thing.\nIn both cases you choose: New Project | New Directory | New Project\n\nMake sure you “Browse” to the folder you want to create the project."
},
{
- "objectID": "omics/week-3/workshop.html#examine-the-data-in-a-spreadsheet",
- "href": "omics/week-3/workshop.html#examine-the-data-in-a-spreadsheet",
- "title": "Workshop",
- "section": "Examine the data in a spreadsheet",
- "text": "Examine the data in a spreadsheet\nThese are the three datasets. Each set compromises several files.\n🐸 Frog development data:\n\nxlaevis_counts_S14.csv\nxlaevis_counts_S20.csv\nxlaevis_counts_S30.csv\n\n🐭 Stem cell data:\n\nsurfaceome_hspc.csv\nsurfaceome_prog.csv\nsurfaceome_lthsc.csv\n\n🍂 xxxx data:\n\nxxx\nxxx\n\n🎬 Save the files to data-raw and open them in Excel\n🎬 Answer the following questions:\n\nDescribe how the sets of data are similar and how they are different.\nWhat is in the rows and columns of each file?\nHow many rows and columns are there in each file? Are these the same? In all cases or some cases? Why?\nGoogle an id. Where does your search take you? How much information is available?\n\n🎬 Did you record all that??"
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-1",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nI occasionally use the menu but I mostly use the usethis package.\n\n🎬 Go to RStudio and check your working directory:\n\ngetwd()\n\n\"C:/Users/er13/Desktop\"\n\n\n❔ Is your working directory a good place to create a Project folder?"
},
{
- "objectID": "omics/week-3/workshop.html#import",
- "href": "omics/week-3/workshop.html#import",
- "title": "Workshop",
- "section": "Import",
- "text": "Import\nNow let’s get the data into R and visualise it.\n🎬 Import xlaevis_counts_S30.csv, surfaceome_hspc.csv and xxxxxxxx\n\n# 🐸 import the s30 data\ns30 <- read_csv(\"data-raw/xlaevis_counts_S30.csv\")\n\n\n# 🐭 import the hspc data\nhspc <- read_csv(\"data-raw/surfaceome_hspc.csv\")\n\n\n# 🍂 xxxx import the xxxx data\n# prog <- read_csv(\"\")\n\n🎬 Check these have the number of rows and column you were expecting and that column types and names are as expected."
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-2",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-2",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nIf this is a good place to create a Project directory then…\n🎬 Create a project with:\n\nusethis::create_project(\"bananas\")"
},
{
- "objectID": "omics/week-3/workshop.html#explore",
- "href": "omics/week-3/workshop.html#explore",
- "title": "Workshop",
- "section": "Explore",
- "text": "Explore\nThe first task is to get an overview. We want to know\n\nare there any missing values? If so, how many and how are they distributed?\nhow may zeros are there and how are they distributed\ndoes it look as tough all the samples/cells were equally “successful”? Can we spot any problematic anomalies?\nwhat is the distribution of values?\n\nIf our data collection has gone well we would hope to see approximately the same average expression in each sample or cell of the same type. That is replicates should be similar. We would also expect to see that the average expression of genes varies. We might have genes which are zero in every cell/sample. We will want to to filter those out.\nWe get this overview by looking at:\n\nThe distribution of values across the whole dataset\nThe distribution of values across the sample/cells (i.e., averaged across genes). This allows us to see variation between samples/cells:\nThe distribution of values across the genes (i.e., averaged across samples/cells). This allows us to see variation between genes.\n\nDistribution of values across the whole dataset\nIn all data sets, the values are spread over multiple columns so in order to plot the distribution as a whole, we will need to first use pivot_longer() to put the data in ‘tidy’ format (Wickham 2014) by stacking the columns. We could save a copy of the stacked data and then plot it, but here, I have just piped the stacked data straight into ggplot().\n🐸 Frogs\n🎬 Pivot the counts (stack the columns) so all the counts are in a single column (count) and pipe into ggplot() to create a histogram:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(x = count)) +\n geom_histogram()\n\n\n\n\nThis data is very skewed - there are so many low values that we can’t see the tiny bars for the higher values. Logging the counts is a way to make the distribution more visible.\n🎬 Repeat the plot on log of the counts.\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(x = log10(count))) +\n geom_histogram()\n\n\n\n\nI’ve used base 10 only because it easy to convert to the original scale (1 is 10, 2 is 100, 3 is 1000 etc). The warning about rows being removed is expected - these are the counts of 0 since you can’t log a value of 0. The peak at zero suggests quite a few counts of 1. We would expect we would expect the distribution of counts to be roughly log normal because this is expression of all the genes in the genome1. That small peak near the low end suggests that these lower counts might be anomalies.\nThe excess number of low counts indicates we might want to create a cut off for quality control. The removal of low counts is a common processing step in ’omic data. We will revisit this after we have considered the distribution of counts across samples and genes.\n🐭 Mice\n🎬 Pivot the expression values (stack the columns) so all the counts are in a single column (expr) and pipe into ggplot() to create a histogram:\n\nhspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |> \n ggplot(aes(x = expr)) +\n geom_histogram()\n\n\n\n\nThis is a very striking distribution. Is it what we are expecting? Again,the excess number of low values is almost certainly anomalous. They will be inaccurate measure and we will want to exclude expression values below (about) 1. 
We will revisit this after we have considered the distribution of expression across cells and genes.\nWhat about the bimodal appearance of the ‘real’ values? If we had the whole genome we would not expect to see such a pattern - we’d expect to see a roughly normal distribution2. However, this is a subset of the genome and the nature of the subsetting has had an influence here. These are a subset of cell surface proteins that show a significant difference between at least two of twelve cell subtypes. That is, all of these genes are either high or low.\nDistribution of values across the sample/cells\n🐸 Frog samples\nSummary statistics including the number of NAs can be seen using summary(). It is most helpful when you have up to about 30 columns. There is nothing special about the number 30, it is just that text summaries of a larger number of columns are difficult to grasp.\n🎬 Get a quick overview of the columns:\n\n# examine all the columns quickly\n# works well with smaller numbers of columns\nsummary(s30)\n\n xenbase_gene_id S30_C_5 S30_C_6 S30_C_A \n Length:11893 Min. : 0.0 Min. : 0.0 Min. : 0.0 \n Class :character 1st Qu.: 14.0 1st Qu.: 14.0 1st Qu.: 23.0 \n Mode :character Median : 70.0 Median : 75.0 Median : 107.0 \n Mean : 317.1 Mean : 335.8 Mean : 426.3 \n 3rd Qu.: 205.0 3rd Qu.: 220.0 3rd Qu.: 301.0 \n Max. :101746.0 Max. :118708.0 Max. :117945.0 \n S30_F_5 S30_F_6 S30_F_A \n Min. : 0.0 Min. : 0.0 Min. : 0.0 \n 1st Qu.: 19.0 1st Qu.: 17.0 1st Qu.: 16.0 \n Median : 88.0 Median : 84.0 Median : 69.0 \n Mean : 376.2 Mean : 376.5 Mean : 260.4 \n 3rd Qu.: 251.0 3rd Qu.: 246.0 3rd Qu.: 187.0 \n Max. :117573.0 Max. :130672.0 Max. :61531.0 \n\n\nNotice that: - the minimum count is 0 and the maximums are very high in all the columns - the medians are quite a lot lower than the means so the data are skewed (hump to the left, tail to the right) - there must be quite a lot of zeros - the columns are roughly similar and it doesn’t look like there is an anomalous replicate.\nTo find out how many zeros there are in a column we can make use of the fact that TRUE evaluates to 1 and FALSE evaluates to 0. This means sum(S30_C_5 == 0) gives the number of 0s in the S30_C_5 column.\n🎬 Find the number of zeros in all six columns:\n\ns30 |>\n summarise(sum(S30_C_5 == 0),\n sum(S30_C_6 == 0),\n sum(S30_C_A == 0),\n sum(S30_F_5 == 0),\n sum(S30_F_6 == 0),\n sum(S30_F_A == 0))\n\n# A tibble: 1 × 6\n `sum(S30_C_5 == 0)` `sum(S30_C_6 == 0)` `sum(S30_C_A == 0)`\n <int> <int> <int>\n1 1340 1361 998\n# ℹ 3 more variables: `sum(S30_F_5 == 0)` <int>, `sum(S30_F_6 == 0)` <int>,\n# `sum(S30_F_A == 0)` <int>\n\n\nThere is a better way of doing this that saves you having to repeat so much code - especially useful if you have a lot more than 6 columns. We can use pivot_longer() to put the data in tidy format and then use the group_by() and summarise() approach we have used extensively before.\n🎬 Find the number of zeros in all columns:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(sample) |>\n summarise(n_zero = sum(count == 0))\n\n# A tibble: 6 × 2\n sample n_zero\n <chr> <int>\n1 S30_C_5 1340\n2 S30_C_6 1361\n3 S30_C_A 998\n4 S30_F_5 1210\n5 S30_F_6 1199\n6 S30_F_A 963\n\n\nYou could expand to get all the summary information.\n🎬 Summarise all the samples:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(sample) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n n_zero = sum(count == 0))\n\n# A tibble: 6 × 8\n sample min lowerq mean median upperq max n_zero\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>\n1 S30_C_5 0 14 317. 70 205 101746 1340\n2 S30_C_6 0 14 336. 75 220 118708 1361\n3 S30_C_A 0 23 426. 107 301 117945 998\n4 S30_F_5 0 19 376. 88 251 117573 1210\n5 S30_F_6 0 17 376. 84 246 130672 1199\n6 S30_F_A 0 16 260. 69 187 61531 963\n\n\nThe mean count ranges from 260 to 426.\nOne advantage this has over using summary() is that the output is a dataframe. For results, this is useful, and makes it easier to:\n\nwrite to file\nuse in ggplot()\n\nformat in a Quarto report\n\n🎬 Save the summary as a dataframe, s30_summary_samp.
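\nOne way to do that is to repeat the pipeline above and assign it to a name (a sketch; the same code as shown, just assigned):\n\ns30_summary_samp <- s30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(sample) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n n_zero = sum(count == 0))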
\nWe can write to file using write_csv().\n🎬 Write s30_summary_samp to a file called “s30_summary_samp.csv”:\n\nwrite_csv(s30_summary_samp, \n file = \"data-processed/s30_summary_samp.csv\")\n\nPlotting the distribution of values is perhaps the easiest way to understand the data. We could plot each column separately or we can pipe the tidy format of data into ggplot() and make use of facet_wrap().\n🎬 Pivot the data and pipe into ggplot:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(count)) +\n geom_density() +\n facet_wrap(. ~ sample, nrow = 3)\n\n\n\n\nWe have many values (11893) so we are not limited to using geom_histogram(). geom_density() gives us a smooth distribution.\nWe have many low values and a few very high ones which makes it tricky to see the distributions. Logging the counts will make these clearer.\n🎬 Repeat the graph but taking the base 10 log of the counts:\n\ns30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n ggplot(aes(log10(count))) +\n geom_density() +\n facet_wrap(. ~ sample, nrow = 3)\n\n\n\n\nThe key information to take from these plots is:\n\nthe distributions are roughly similar in width, height, location and overall shape so it doesn’t look as though we have any suspect samples\nthe peak at zero suggests quite a few counts of 1.\nsince we would expect the distribution of counts in each sample to be roughly log normal, the small rise near the low end, even before the peak at zero, suggests that these lower counts might be anomalies.\n\nThe excess number of low counts indicates we might want to create a cut off for quality control. The removal of low counts is a common processing step in ’omic data. We will revisit this after we have considered the distribution of counts across genes (averaged over the samples).\n🐭 Mouse cells\nWe used the summary() function to get an overview of the columns in the frog data. 
Let’s try that here.\n🎬 Get a quick overview of the columns:\n\nsummary(hspc)\n\n ensembl_gene_id HSPC_001 HSPC_002 HSPC_003 \n Length:280 Min. : 0.000 Min. : 0.000 Min. : 0.0000 \n Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Mode :character Median : 0.000 Median : 0.000 Median : 0.9929 \n Mean : 2.143 Mean : 1.673 Mean : 2.5964 \n 3rd Qu.: 2.120 3rd Qu.: 2.239 3rd Qu.: 6.1559 \n Max. :12.567 Max. :11.976 Max. :11.1138 \n HSPC_004 HSPC_006 HSPC_008 HSPC_009 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000 \n Median : 0.000 Median : 1.276 Median : 0.000 Median :0.000 \n Mean : 1.851 Mean : 2.338 Mean : 2.375 Mean :2.220 \n 3rd Qu.: 2.466 3rd Qu.: 3.536 3rd Qu.: 3.851 3rd Qu.:3.594 \n Max. :11.133 Max. :10.014 Max. :11.574 Max. :9.997 \n HSPC_011 HSPC_012 HSPC_014 HSPC_015 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 1.750 Median : 0.000 Median : 0.000 \n Mean : 2.285 Mean : 2.431 Mean : 2.295 Mean : 2.515 \n 3rd Qu.: 3.193 3rd Qu.: 3.741 3rd Qu.: 3.150 3rd Qu.: 3.789 \n Max. :11.260 Max. :10.905 Max. :11.051 Max. :10.751 \n HSPC_016 HSPC_017 HSPC_018 HSPC_020 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.9488 Median : 0.000 Median : 1.248 Median : 0.000 \n Mean : 2.6115 Mean : 2.146 Mean : 2.710 Mean : 2.509 \n 3rd Qu.: 5.9412 3rd Qu.: 2.357 3rd Qu.: 6.006 3rd Qu.: 4.470 \n Max. :11.3082 Max. :12.058 Max. :11.894 Max. :11.281 \n HSPC_021 HSPC_022 HSPC_023 HSPC_024 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.170 Mean : 2.287 Mean : 2.314 Mean : 2.195 \n 3rd Qu.: 2.996 3rd Qu.: 3.351 3rd Qu.: 2.749 3rd Qu.: 2.944 \n Max. :10.709 Max. :11.814 Max. :12.113 Max. :11.279 \n HSPC_025 HSPC_026 HSPC_027 HSPC_028 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 1.572 Median : 1.385 Median : 0.000 Median : 0.000 \n Mean : 2.710 Mean : 2.721 Mean : 2.458 Mean : 1.906 \n 3rd Qu.: 5.735 3rd Qu.: 6.392 3rd Qu.: 5.496 3rd Qu.: 2.037 \n Max. :11.309 Max. :10.865 Max. :11.266 Max. :10.777 \n HSPC_030 HSPC_031 HSPC_033 HSPC_034 \n Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.0000 \n 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Median : 1.119 Median : 0.9026 Median : 0.000 Median : 0.7984 \n Mean : 2.338 Mean : 2.3049 Mean : 1.938 Mean : 2.3220 \n 3rd Qu.: 3.005 3rd Qu.: 2.9919 3rd Qu.: 2.434 3rd Qu.: 4.8324 \n Max. :11.391 Max. :11.1748 Max. :10.808 Max. :10.6707 \n HSPC_035 HSPC_036 HSPC_037 HSPC_038 \n Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.8879 Median : 1.517 Median : 0.000 \n Mean : 1.810 Mean : 2.6918 Mean : 2.327 Mean : 2.212 \n 3rd Qu.: 2.175 3rd Qu.: 5.9822 3rd Qu.: 3.079 3rd Qu.: 2.867 \n Max. :11.221 Max. :11.3018 Max. :11.399 Max. :12.275 \n HSPC_040 HSPC_041 HSPC_042 HSPC_043 \n Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. 
[... summary() output truncated: the same six statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) repeat for every one of the 701 HSPC cells ...]


Hmmmm, did you get all that? Nope, me neither! We have 701 cells but we only have 6 samples for the frogs.
We will need a different approach to get an overview but I find it is still useful to look at the first few columns\n🎬 Get a quick overview of the first 20 columns:\n\nsummary(hspc[1:20])\n\n ensembl_gene_id HSPC_001 HSPC_002 HSPC_003 \n Length:280 Min. : 0.000 Min. : 0.000 Min. : 0.0000 \n Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 \n Mode :character Median : 0.000 Median : 0.000 Median : 0.9929 \n Mean : 2.143 Mean : 1.673 Mean : 2.5964 \n 3rd Qu.: 2.120 3rd Qu.: 2.239 3rd Qu.: 6.1559 \n Max. :12.567 Max. :11.976 Max. :11.1138 \n HSPC_004 HSPC_006 HSPC_008 HSPC_009 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.000 \n Median : 0.000 Median : 1.276 Median : 0.000 Median :0.000 \n Mean : 1.851 Mean : 2.338 Mean : 2.375 Mean :2.220 \n 3rd Qu.: 2.466 3rd Qu.: 3.536 3rd Qu.: 3.851 3rd Qu.:3.594 \n Max. :11.133 Max. :10.014 Max. :11.574 Max. :9.997 \n HSPC_011 HSPC_012 HSPC_014 HSPC_015 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 1.750 Median : 0.000 Median : 0.000 \n Mean : 2.285 Mean : 2.431 Mean : 2.295 Mean : 2.515 \n 3rd Qu.: 3.193 3rd Qu.: 3.741 3rd Qu.: 3.150 3rd Qu.: 3.789 \n Max. :11.260 Max. :10.905 Max. :11.051 Max. :10.751 \n HSPC_016 HSPC_017 HSPC_018 HSPC_020 \n Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.9488 Median : 0.000 Median : 1.248 Median : 0.000 \n Mean : 2.6115 Mean : 2.146 Mean : 2.710 Mean : 2.509 \n 3rd Qu.: 5.9412 3rd Qu.: 2.357 3rd Qu.: 6.006 3rd Qu.: 4.470 \n Max. :11.3082 Max. :12.058 Max. :11.894 Max. :11.281 \n HSPC_021 HSPC_022 HSPC_023 HSPC_024 \n Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000 \n 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 \n Median : 0.000 Median : 0.000 Median : 0.000 Median : 0.000 \n Mean : 2.170 Mean : 2.287 Mean : 2.314 Mean : 2.195 \n 3rd Qu.: 2.996 3rd Qu.: 3.351 3rd Qu.: 2.749 3rd Qu.: 2.944 \n Max. :10.709 Max. :11.814 Max. :12.113 Max. :11.279 \n\n\nNotice that:\n\nthe maximum value is much lower than for the frogs and has decimals. That is because the mouse data are logged (to base 2) normalised counts, not raw counts as they are in the frog data set.\na minimum value of 0 appears in all 20 columns - perhaps that is true across the whole dataset (or at least common)\nat least some of the medians are zeros so there must be quite a lot of zeros\nthe few columns we can see are roughly similar\nit would not be very practical to plot the distributions of values in each cell using facet_wrap().\n\nIn this data set, there is even more of an advantage to using the pivot_longer(), group_by() and summarise() approach. We will be able to open the dataframe in the Viewer and make plots to examine whether the distributions are similar across cells.\n🎬 Summarise all the cells:\n\nhspc_summary_samp <- hspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |>\n group_by(cell) |>\n summarise(min = min(expr),\n lowerq = quantile(expr, 0.25),\n mean = mean(expr),\n median = median(expr),\n sd = sd(expr),\n upperq = quantile(expr, 0.75),\n max = max(expr),\n n_zero = sum(expr == 0))\n\nNotice that I have used cell as the column name rather than sample and expr (expression) rather than count. 
I’ve also added the standard deviation.\n🎬 View the hspc_summary_samp dataframe (click on it in the environment).\nAll cells have quite a few zeros and the lower quartile is 0 for all cells, i.e., every cell has many genes with zero expression.\nTo get a better understanding of the distribution of expressions in cells we can create a ggplot using the pointrange geom. Pointrange puts a dot at the mean and a line between a minimum and a maximum such as +/- one s.d. Not unlike a boxplot, but for when you need the boxes to be very narrow!\n🎬 Create a pointrange plot.\n\nhspc_summary_samp |> \n ggplot(aes(x = cell, y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd ),\n size = 0.1)\n\n\n\n\nYou will need to use the Zoom button to pop the plot window out so you can make it as wide as possible.\nThe things to notice are:\n\nthe average expression in cells is similar for all cells. This is good to know - if some cells had much lower expression perhaps there is something wrong with them, or their sequencing, and they should be excluded.\nthe distributions are roughly similar in width too\n\nThe default order of cell is alphabetical. It can be easier to see these (non-) effects if we order the lines by the size of the mean.\n🎬 Order a pointrange plot with reorder(variable_to_order, order_by).\n\nhspc_summary_samp |> \n ggplot(aes(x = reorder(cell, mean), y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd ),\n size = 0.1)\n\n\n\n\nreorder() arranges cell in order of increasing mean\n🎬 Write hspc_summary_samp to a file called “hspc_summary_samp.csv”:\nDistribution of values across the genes\n🐸 Frog genes\nThere are lots of genes in this dataset therefore we will take the same approach as we took for the distributions across mouse cells. We will pivot the data to tidy and then summarise the counts for each gene.\n🎬 Summarise the counts for each gene:\n\ns30_summary_gene <- s30 |>\n pivot_longer(cols = -xenbase_gene_id,\n names_to = \"sample\",\n values_to = \"count\") |>\n group_by(xenbase_gene_id) |>\n summarise(min = min(count),\n lowerq = quantile(count, 0.25),\n sd = sd(count),\n mean = mean(count),\n median = median(count),\n upperq = quantile(count, 0.75),\n max = max(count),\n total = sum(count),\n n_zero = sum(count == 0))\n\nI have calculated the values we used before with one addition: the sum of the counts (total).\n🎬 View the s30_summary_gene dataframe.\nNotice that we have:\n\na lot of genes with counts of zero in every sample\na lot of genes with zero counts in several of the samples\nsome very very low counts.\n\nThese should be filtered out because they are unreliable - or, at the least, uninformative. The goal of our downstream analysis will be to see if there is a significant difference in gene expression between the control and FGF-treated siblings. Since we have only three replicates in each group, having one or two unreliable, missing or zero values makes such a determination impossible for a particular gene. We will use the total counts and the number of samples with non-zero values to filter our genes later.\nAs we have a lot of genes, it is again helpful to plot the mean counts with pointrange to get an overview. We will plot the log of the counts - we saw earlier that logging made it easier to understand the distribution of counts over such a wide range. 
We will also order the genes from lowest to highest mean count.\n🎬 Plot the logged mean counts for each gene in order of size using geom_pointrange():\n\ns30_summary_gene |> \n ggplot(aes(x = reorder(xenbase_gene_id, mean), y = log10(mean))) +\n geom_pointrange(aes(ymin = log10(mean - sd), \n ymax = log10(mean + sd )),\n size = 0.1)\n\n\n\n\n(Remember, the warning is expected since we have zeros).\nYou can see we also have quite a few genes with means less than 1 (log below zero). Note that the variability between genes (average counts between 0 and 102586) is far greater than between samples (average counts from 260 to 426) which is exactly what we would expect to see.\n🎬 Write s30_summary_gene to a file called “s30_summary_gene.csv”:\n🐭 Mouse genes\nThere are fewer genes in this dataset, but still more than you can understand without the overview provided by a plot. We will again pivot the data to tidy and then summarise the expression for each gene.\n🎬 Summarise the expression for each gene:\n\nhspc_summary_gene <- hspc |>\n pivot_longer(cols = -ensembl_gene_id,\n names_to = \"cell\",\n values_to = \"expr\") |>\n group_by(ensembl_gene_id) |>\n summarise(min = min(expr),\n lowerq = quantile(expr, 0.25),\n sd = sd(expr),\n mean = mean(expr),\n median = median(expr),\n upperq = quantile(expr, 0.75),\n max = max(expr),\n total = sum(expr),\n n_zero = sum(expr == 0))\n\n🎬 View the hspc_summary_gene dataframe. Remember these are normalised and logged (base 2) so we should not see very large values.\nNotice that we have:\n\nno genes with 0 in every cell\nvery few genes (9) with no zeros at all\nquite a few genes with zero in many cells but this matters less than zeros in the frog samples because we had just 6 samples and we have 701 cells.\n\nAs we have a lot of genes, it is again helpful to plot the mean expression with pointrange to get an overview. We do not need to log the values but ordering the genes will help.\n🎬 Plot the mean expression for each gene in order of size using geom_pointrange():\n\nhspc_summary_gene |> \n ggplot(aes(x = reorder(ensembl_gene_id, mean), y = mean)) +\n geom_pointrange(aes(ymin = mean - sd, \n ymax = mean + sd),\n size = 0.1)\n\n\n\n\nNote again that the variability between genes (average expression between 0.02 and 10.03) is far greater than between cells (average expression from 1.46 to 3.18) which is expected.\n🎬 Write hspc_summary_gene to a file called “hspc_summary_gene.csv”:"
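The three 🎬 "write to file" prompts in this section leave the code as an exercise. A minimal sketch, assuming readr's write_csv() (loaded with tidyverse) and the data-processed/ folder used in the filtering steps later in the workshop:

```r
# Write each summary dataframe to a csv file.
# The data-processed/ folder is an assumption borrowed from the
# filtering steps; adjust the paths to match your project layout.
write_csv(hspc_summary_samp, file = "data-processed/hspc_summary_samp.csv")
write_csv(s30_summary_gene, file = "data-processed/s30_summary_gene.csv")
write_csv(hspc_summary_gene, file = "data-processed/hspc_summary_gene.csv")
```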
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-3",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-3",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nOtherwise\nIf you want the project directory elsewhere, you will need to give the relative path, e.g.\n\nusethis::create_project(\"../Documents/bananas\")"
},
{
- "objectID": "omics/week-3/workshop.html#filtering-for-qc",
- "href": "omics/week-3/workshop.html#filtering-for-qc",
- "title": "Workshop",
- "section": "Filtering for QC",
- "text": "Filtering for QC\n🐸 Frog filtering\nOur samples look to be similarly well sequenced. There are no samples we should remove. However, some genes are not express or the expression values are so low in for a gene that they are uninformative. We will filter the s30_summary_gene dataframe to obtain a list of xenbase_gene_id we can use to filter s30.\nMy suggestion is to include only the genes with counts in at least 3 samples3 and those with total counts above 20.\n🎬 Filter the summary by gene dataframe:\n\ns30_summary_gene_filtered <- s30_summary_gene |> \n filter(total > 20) |> \n filter(n_zero < 4)\n\n🎬 Write the filtered summary by gene to file:\n\nwrite_csv(s30_summary_gene_filtered, \n file = \"data-processed/s30_summary_gene_filtered.csv\")\n\n🎬 Use the list of xenbase_gene_id in the filtered summary to filter the original dataset:\n\ns30_filtered <- s30 |> \n filter(xenbase_gene_id %in% s30_summary_gene_filtered$xenbase_gene_id)\n\n🎬 Write the filtered original to file:\n\nwrite_csv(s30_filtered, \n file = \"data-processed/s30_filtered.csv\")\n\n🐭 Mouse filtering\nWe will take a different approach to filtering the single cell data. For the Frog samples we are examining the control and the FGF treated samples. This means have a low number of counts overall means the gene is not really expressed (detected) in any condition, and filtering out those genes is removing things that definitely are not interesting. For the mice, we have examined only one cell type but will be making comparisons between cells types. It may be that low expression of a gene in this cell type tells us something if that gene is highly expressed in another cell type. Instead, we will make statistical comparisons between the cell types and then filter based on overall expression, the difference in expression between cell types and whether that difference is significant.\nThe number of “replicates” is also important. When you have only three in each group it is not possible to make statistical comparisons when several replicates are zero. This is less of an issue with single cell data."
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-4",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-4",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nThe output will look like this and a new RStudio session will start.\n> usethis::create_project(\"bananas\")\n√ Creating 'bananas/'\n√ Setting active project to 'C:/Users/er13/Desktop/bananas'\n√ Creating 'R/'\n√ Writing 'bananas.Rproj'\n√ Adding '.Rproj.user' to '.gitignore'\n√ Opening 'C:/Users/er13/Desktop/bananas/' in new RStudio session\n√ Setting active project to '<no active project>'"
},
{
- "objectID": "omics/week-3/workshop.html#look-after-future-you",
- "href": "omics/week-3/workshop.html#look-after-future-you",
- "title": "Workshop",
- "section": "🤗 Look after future you!",
- "text": "🤗 Look after future you!\nYou need only do the section for your own project data\n🐸 Frogs and future you\n🎬 Create a new Project, frogs-88H, populated with folders and your data. Make a script file called cont-fgf-s30.R. This will a be commented analysis of the control vs FGF at S30 comparison. You will build on this each workshop and be able to use it as a template to examine other comparisons. Copy in the appropriate code and comments from workshop-1.R. Edit to improve your comments where your understanding has developed since you made them. Make sure you can close down RStudio, reopen it and run your whole script again.\n🐭 Mice and future you\n🎬 Create a new Project, mice-88H, populated with folders and your data. Make a script file called hspc-prog.R. This will a be commented analysis of the hspc cells vs the prog cells. At this point you will have only code for the hspc cells. You will build on this each workshop and be able to use it as a template to examine other comparisons. Copy in the appropriate code and comments from workshop-1.R. Edit to improve your comments where your understanding has developed since you made them. Make sure you can close down RStudio, reopen it and run your whole script again.\n🍂 xxxx and future you\nDo one of the other two examples."
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-5",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-5",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nWhen you create a new RStudio Project with usethis:\n\n\nA folder called bananas/ is created\nRStudio starts a new session in bananas/ i.e., your working directory is now bananas/\n\nA folder called R/ is created\nA file called bananas.Rproj is created\nA file called .gitignore is created\nA hidden directory called .Rproj.user is created"
},
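If you want to confirm what create_project() made, a quick hypothetical check from R (list.files() is base R; no.. = TRUE drops the "." and ".." entries):

```r
# List everything in bananas/, including hidden files such as
# .gitignore and the .Rproj.user directory.
# Assumes your working directory is the parent of bananas/.
list.files("bananas", all.files = TRUE, no.. = TRUE)
```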
{
- "objectID": "omics/week-3/workshop.html#footnotes",
- "href": "omics/week-3/workshop.html#footnotes",
- "title": "Workshop",
- "section": "Footnotes",
- "text": "Footnotes\n\nThis a result of the Central limit theorem,one consequence of which is that adding together lots of distributions - whatever distributions they are - will tend to a normal distribution.↩︎\nThis a result of the Central limit theorem,one consequence of which is that adding together lots of distributions - whatever distributions they are - will tend to a normal distribution.↩︎\nI chose three because that would keep [0, 0, 0] [#,#,#]. This is difference we cannot test statistically, but which would matter biologically.↩︎"
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-6",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-6",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\n\n\nthe .Rproj file is what makes the directory an RStudio Project\nthe Rproj.user directory is where project-specific temporary files are stored. You don’t need to mess with it.\nthe .gitignore is used for version controlled projects. If not using git, you can ignore it."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#overview",
- "href": "omics/week-3/study_before_workshop.html#overview",
+ "objectID": "core/week-2/study_before_workshop.html#opening-and-closing",
+ "href": "core/week-2/study_before_workshop.html#opening-and-closing",
"title": "Independent Study to prepare for workshop",
- "section": "Overview",
- "text": "Overview\n\n\nConcise summary of the experimental design and aims\nWhat the raw data consist of\nWhat has been done to the data so far\nWhat steps we will take in the workshop"
+ "section": "Opening and closing",
+ "text": "Opening and closing\nYou can close an RStudio Project with ONE of:\n\nFile | Close Project\nUsing the drop-down option on the far right of the tool bar where you see the Project name\n\n\nYou can open an RStudio Project with ONE of:\n\nFile | Open Project or File | Recent Projects\n\nUsing the drop-down option on the far right of the tool bar where you see the Project name\n\nDouble-clicking an .Rproj file from your file explorer/finder\n\nWhen you open project, a new R session starts."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#the-data",
- "href": "omics/week-3/study_before_workshop.html#the-data",
+ "objectID": "core/week-2/study_before_workshop.html#using-the-usethis-package-7",
+ "href": "core/week-2/study_before_workshop.html#using-the-usethis-package-7",
"title": "Independent Study to prepare for workshop",
- "section": "The Data",
- "text": "The Data\nThere are three datasets\n\n🐸 transcriptomic data (bulk RNA-seq) from frog embryos.\n🐭 transcriptomic data (single cell RNA-seq) from stemcells\n🍂 ??????? Metabolomic / Metagenomic data from anaerobic digesters"
+ "section": "Using the usethis package",
+ "text": "Using the usethis package\nOnce the RStudio project has been created, usethis helps you follow good practice.\n\n🎬 We can add a README with:\n\nusethis::use_readme_md()\n\n\n\nThis creates a file called README.md, with a little default text, in the Project directory and opens it for editing.\n\n\nmd stands for markdown, it is a extremely widely used text formatting language which is readable as plain text. If you have ever used asterisks to make text bold or italic, you have used markdown."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-1",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-1",
+ "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-1",
+ "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Experimental design",
- "text": "🐸 Experimental design\n\nSchematic of frog development experiment"
+ "section": "Code formatting and style",
+ "text": "Code formatting and style\n\n“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”\n\nThe tidyverse style guide"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-2",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-2",
+ "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-2",
+ "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-2",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Experimental design",
- "text": "🐸 Experimental design\n\nSchematic of frog development experiment\n\n3 fertilisations\ntwo siblings from each fertilisation one control, on FGF treated\nsequenced at three time points\n3 x 2 x 3 = 18 groups"
+ "section": "Code formatting and style",
+ "text": "Code formatting and style\nWe have all written code which is hard to read!\nWe all improve over time.\n\n\n\nThe only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code— Hadley Wickham (@hadleywickham) April 17, 2015"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-3",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-3",
+ "objectID": "core/week-2/study_before_workshop.html#code-formatting-and-style-3",
+ "href": "core/week-2/study_before_workshop.html#code-formatting-and-style-3",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Experimental design",
- "text": "🐸 Experimental design\n\nSchematic of frog development experiment\n\n3 fertilisations. These are the replicates, .5, .6, A\ntwo siblings from each fertilisation one control, one FGF treated. The treatments are paired\nsequenced at three time points. S14, S20, S30\n3 x 2 x 3 = 18 groups"
+ "section": "Code formatting and style",
+ "text": "Code formatting and style\nSome keys points:\n\nbe consistent, emulate experienced coders\n\nuse snake_case for variable names (not CamelCase, dot.case)\n\nuse <- not = for assignment\n\nuse spacing around most operators and after commas\n\nuse indentation\n\navoid long lines, break up code blocks with new lines\n\nuse \" for quoting text (not ') unless the text contains double quotes"
},
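A small sketch (a made-up example, not from the slides) with the points above applied: snake_case names, <- for assignment, spaces around operators and after commas, indentation, and double quotes:

```r
# Mean mass of a hypothetical sample of eggs, written in the
# style recommended above
egg_masses <- c(1.2, 1.5, 0.9)

mean_egg_mass <- sum(egg_masses) / length(egg_masses)

if (mean_egg_mass > 1) {
  message("Mean egg mass is above 1 g")
}
```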
{
- "objectID": "omics/week-3/study_before_workshop.html#aim",
- "href": "omics/week-3/study_before_workshop.html#aim",
+ "objectID": "core/week-2/study_before_workshop.html#ugly-code",
+ "href": "core/week-2/study_before_workshop.html#ugly-code",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Aim",
- "text": "🐸 Aim\n\n\nfind genes important in frog development\nImportant means the genes that are differentially expressed between the control-treated and the FGF-treated siblings\nDifferentially expressed means the expression in one group is significantly higher than in the other"
+ "section": "😩 Ugly code 😩",
+ "text": "😩 Ugly code 😩\n\ndata<-read_csv('../data-raw/Y101_Y102_Y201_Y202_Y101-5.csv',skip=2)\nlibrary(janitor);sol<-clean_names(data)\ndata=data|>filter(str_detect(description,\"OS=Homo sapiens\"))|>filter(x1pep=='x')\ndata=data|>\nmutate(g=str_extract(description,\n\"GN=[^\\\\s]+\")|>str_replace(\"GN=\",''))\ndata<-data|>mutate(id=str_extract(accession,\"1::[^;]+\")|>str_replace(\"1::\",\"\"))"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#guided-analysis",
- "href": "omics/week-3/study_before_workshop.html#guided-analysis",
+ "objectID": "core/week-2/study_before_workshop.html#ugly-code-1",
+ "href": "core/week-2/study_before_workshop.html#ugly-code-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Guided analysis",
- "text": "🐸 Guided analysis\n\n\nThe workshops will take you through comparing the control and FGF treated sibling at S30\nThis is the “least interesting” comparison\nYou will be guided to carefully document your work so you can apply the same methods to other comparisons"
+ "section": "😩 Ugly code 😩",
+ "text": "😩 Ugly code 😩\n\nno spacing or indentation\ninconsistent splitting of code blocks over lines\ninconsistent use of quote characters\nno comments\nvariable names convey no meaning\nuse of = for assignment and inconsistently\nmultiple commands on a line\nlibrary statement in the middle of the analysis"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-4",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-4",
+ "objectID": "core/week-2/study_before_workshop.html#cool-code",
+ "href": "core/week-2/study_before_workshop.html#cool-code",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Experimental design",
- "text": "🐭 Experimental design\n\nSchematic of stem cell experiment"
+ "section": "😎 Cool code 😎",
+ "text": "😎 Cool code 😎\n\n# Packages ----------------------------------------------------------------\nlibrary(tidyverse)\nlibrary(janitor)\n\n# Import ------------------------------------------------------------------\n\n# define file name\nfile <- \"../data-raw/Y101_Y102_Y201_Y202_Y101-5.csv\"\n\n# import: column headers and data are from row 3\nsolu_protein <- read_csv(file, skip = 2) |>\n janitor::clean_names()\n\n# Tidy data ----------------------------------------------------------------\n\n# filter out the bovine proteins and those proteins \n# identified from fewer than 2 peptides\nsolu_protein <- solu_protein |>\n filter(str_detect(description, \"OS=Homo sapiens\")) |>\n filter(x1pep == \"x\")\n\n# Extract the genename from description column to a column\n# of its own\nsolu_protein <- solu_protein |>\n mutate(genename = str_extract(description,\"GN=[^\\\\s]+\") |>\n str_replace(\"GN=\", \"\"))\n\n# Extract the top protein identifier from accession column (first\n# Uniprot ID after \"1::\") to a column of its own\nsolu_protein <- solu_protein |>\n mutate(protid = str_extract(accession, \"1::[^;]+\") |>\n str_replace(\"1::\", \"\"))"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-5",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-5",
+ "objectID": "core/week-2/study_before_workshop.html#cool-code-1",
+ "href": "core/week-2/study_before_workshop.html#cool-code-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Experimental design",
- "text": "🐭 Experimental design\n\nSchematic of stem cell experiment\n\nCells were sorted using flow cytometry on the basis of cell surface markers\nThere are three cell types: LT-HSCs, HSPCs, Progs\nMany cells of each cell type were sequenced"
+ "section": "😎 Cool code 😎",
+ "text": "😎 Cool code 😎\n\nlibrary() calls collected\nUses code sections to make it easier to navigate\nUses white space and proper indentation\nCommented\nUses more informative name for the dataframe"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#experimental-design-6",
- "href": "omics/week-3/study_before_workshop.html#experimental-design-6",
+ "objectID": "core/week-2/study_before_workshop.html#code-algorithmically-1",
+ "href": "core/week-2/study_before_workshop.html#code-algorithmically-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Experimental design",
- "text": "🐭 Experimental design\n\nSchematic of stem cell experiment\n\nThere are three cell types: LT-HSCs, HSPCs, Progs These are the “treaments”\nMany cells of each type were sequenced: These are the replicates\n155 LT-HSCs, 701 HSPCs, 798 Progs"
+ "section": "Code ‘algorithmically’",
+ "text": "Code ‘algorithmically’\n\n\nWrite code which expresses the structure of the problem/solution.\nAvoid hard coding numbers if at all possible - declare variables instead\nDeclare frequently used values as variables at the start e.g., colour schemes, figure saving settings"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#aim-1",
- "href": "omics/week-3/study_before_workshop.html#aim-1",
+ "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers.",
+ "href": "core/week-2/study_before_workshop.html#hard-coding-numbers.",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Aim",
- "text": "🐭 Aim\n\n\nfind genes for cell surface proteins that are important in stem cell identity\nImportant means genes that are differentially expressed between at least two cell types\nDifferentially expressed means the expression in one group is significantly higher than in the other"
+ "section": "😩 Hard coding numbers.",
+ "text": "😩 Hard coding numbers.\n\n\nSuppose we want to calculate the sums of squares, \\(SS(x)\\), for the number of eggs in five nests.\nThe formula is given by: \\(\\sum (x_i- \\bar{x})^2\\)\nWe could calculate the mean and copy it, and the individual numbers into the formula"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#guided-analysis-1",
- "href": "omics/week-3/study_before_workshop.html#guided-analysis-1",
+ "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers.-1",
+ "href": "core/week-2/study_before_workshop.html#hard-coding-numbers.-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Guided analysis",
- "text": "🐭 Guided analysis\n\n\nThe workshops will take you through comparing the HSPC and Prog cells\nThis is the “least interesting” comparison\nYou will be guided to carefully document your work so you can apply the same methods to other comparisons"
+ "section": "😩 Hard coding numbers.",
+ "text": "😩 Hard coding numbers.\n\n# mean number of eggs per nest\nsum(3, 5, 6, 7, 8) / 5\n\n[1] 5.8\n\n# ss(x) of number of eggs\n(3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2\n\n[1] 14.8\n\n\nI am coding the calculation of the mean rather using the mean() function only to explain what ‘coding algorithmically’ means using a simple example."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data",
- "href": "omics/week-3/study_before_workshop.html#raw-sequence-data",
+ "objectID": "core/week-2/study_before_workshop.html#hard-coding-numbers",
+ "href": "core/week-2/study_before_workshop.html#hard-coding-numbers",
"title": "Independent Study to prepare for workshop",
- "section": "Raw Sequence data",
- "text": "Raw Sequence data\n\n\nThe raw data are “reads” from a sequencing machine.\nA read is sequence of DNA or RNA shorter than the whole genome or transcriptome\nThe length of the reads depends on the type of sequencing machine\n\nShort-read technologies e.g. Illumina have higher base accuracy but are harder to align\nLong-read technologies e.g. Nanopore have lower base accuracy but are easier to align"
+ "section": "😩 Hard coding numbers",
+ "text": "😩 Hard coding numbers\n\n\nif any of the sample numbers must be altered, all the code needs changing\nit is hard to tell that the output of the first line is a mean\nits hard to recognise that the numbers in the mean calculation correspond to those in the next calculation\nit is hard to tell that 5 is just the number of nests\nno way of know if numbers are the same by coincidence or they refer to the same thing"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data-1",
- "href": "omics/week-3/study_before_workshop.html#raw-sequence-data-1",
+ "objectID": "core/week-2/study_before_workshop.html#better",
+ "href": "core/week-2/study_before_workshop.html#better",
"title": "Independent Study to prepare for workshop",
- "section": "Raw Sequence data",
- "text": "Raw Sequence data\n\n\nSequencing technology is constantly improving\nOptional: You can read more about Sequencing technologies in Statistically useful experimental design (Rand and Forrester 2022)"
+ "section": "😎 Better",
+ "text": "😎 Better\n\n# eggs each nest\neggs <- c(3, 5, 6, 7, 8)\n\n# mean eggs per nest\nmean_eggs <- sum(eggs) / length(eggs)\n\n# ss(x) of number of eggs\nsum((eggs - mean_eggs)^2)\n\n[1] 14.8"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data-2",
- "href": "omics/week-3/study_before_workshop.html#raw-sequence-data-2",
+ "objectID": "core/week-2/study_before_workshop.html#better-1",
+ "href": "core/week-2/study_before_workshop.html#better-1",
"title": "Independent Study to prepare for workshop",
- "section": "Raw Sequence data",
- "text": "Raw Sequence data\n\n\nThe RNA-seq data are from an Illumina machine 150-300bp; Metagenomic data are often Nanopore 10,000 - 30000bp\nReads are in FASTQ files\nFASTQ files contain the sequence of each read and a quality score for each base"
+ "section": "😎 Better",
+ "text": "😎 Better\n\n\nthe commenting is similar but it is easier to follow\nif any of the sample numbers must be altered, only that number needs changing\nassigning a value you will later use to a variable with a meaningful name allows us to understand the first and second calculations\nmakes use of R’s elementwise calculation which resembles the formula (i.e., is expressed as the general rule)"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#general-steps",
- "href": "omics/week-3/study_before_workshop.html#general-steps",
+ "objectID": "core/week-2/study_before_workshop.html#summary",
+ "href": "core/week-2/study_before_workshop.html#summary",
"title": "Independent Study to prepare for workshop",
- "section": "General steps",
- "text": "General steps\n\n\nReads are filtered and trimmed on the basis of the quality score\nThey are then aligned/pseudo-aligned to a reference genome/transcriptome or, in metagenomics, assembled de novo.\nReads are then counted to quantify the expression or number of genomes in metagenomics\nCounts are normalised to account for differences in sequencing depth and gene/transcript/genome length before statistical analysis"
+ "section": "Summary",
+ "text": "Summary\n\n\nUse an RStudio project for any R work (you can also incorporate other languages)\nWrite Cool code not Ugly code: space, consistency, indentation, comments, meaningful variable names\nWrite code which expresses the structure of the problem/solution.\nAvoid hard coding numbers if at all possible - declare variables instead"
},
{
- "objectID": "omics/week-3/study_before_workshop.html#data",
- "href": "omics/week-3/study_before_workshop.html#data",
+ "objectID": "core/week-2/study_before_workshop.html#references",
+ "href": "core/week-2/study_before_workshop.html#references",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Data",
- "text": "🐸 Data\n\nUnpublished (so far!)\nExpression for the whole transcriptome X. laevis v10.1 genome assembly\nValues are raw counts\nThe statistical analysis method we will use DESeq2 (Love, Huber, and Anders 2014) requires raw counts and performs the normalisation itself"
+ "section": "References",
+ "text": "References\n\n\n🔗 About Core 2: File types, workflow tips and other tools\n\n\n\nBryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk about Version Control?” Am. Stat. 72 (1): 20–27. https://doi.org/10.1080/00031305.2017.1399928.\n\n\nBryan, Jennifer, Jim Hester, Shannon Pileggi, and E. David Aja. n.d. What They Forgot to Teach You about r. https://rstats.wtf/.\n\n\nSandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. 2013. “Ten Simple Rules for Reproducible Computational Research.” PLoS Comput. Biol. 9 (10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285.\n\n\nWilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Comput. Biol. 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#data-1",
- "href": "omics/week-3/study_before_workshop.html#data-1",
- "title": "Independent Study to prepare for workshop",
- "section": "🐭 Data",
- "text": "🐭 Data\n\nPublished in Nestorowa et al. (2016)\nExpression for a subset of genes, the surfaceome\nValues are log2 normalised values\nThe statistical analysis method we will use scran (Lun, McCarthy, and Marioni 2016) requires normalised values"
+ "objectID": "core/core.html",
+ "href": "core/core.html",
+ "title": "Core Data Analysis",
+ "section": "",
+ "text": "There are three workshops taken by everyone on BIO00088H. These are in weeks 1, 2 and 11. The first two cover some useful workflow tips and how to organise your analyses effectively so they are reproducible but you will also have the chance to revise material from stage 1 and 2. The third workshop covers Research Compendia and Reproducible Reporting. In week 6 there is a drop-in session where you can ask questions about the material covered in the first two workshops.\nStudents doing BIO00070M will do week 1 and 2 of the core workshops, then 3-5 of the Omics workshops. You can also attend the week 6 drop-in. You do not do the week 11 session because your assessment differs. However, you will learn about Reproducible reporting in BIO00052M in week 10 because your that applies to your 52M assessment.\nGood organisation is important because you will want to be able to set work aside for holidays and assessment periods and then restart easily. You will also be assessed on the organisation, reproducibility and transparency of your work.\n\n\nThis week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work.\n\n\n\nThis week we will consider File types, workflow tips and other tools. The independent study (~20 mins) reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab.\n\n\n\nThis week there is a drop-in session where you can ask questions about the material particular covered in the first two workshops. However, we will also endeavour to answer questions about any of the material in the omics, images or structure strand.\n\n\n\nThis week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown)."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#workshops-1",
- "href": "omics/week-3/study_before_workshop.html#workshops-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Workshops",
- "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values overall, across samples and across genes to check things are as we expect and detect genes/samples that need to be removed\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments. This is the main analysis step. We will use different methods for bulk and single cell data.\nOmics 3: Visualising and Interpreting Production of volcano plots and heatmaps to visualise the results of the statistical analysis. We will also look at how to interpret the results and how to find out more about the genes of interest."
+ "objectID": "core/core.html#week-1-core-1-organising-reproducible-data-analyses",
+ "href": "core/core.html#week-1-core-1-organising-reproducible-data-analyses",
+ "title": "Core Data Analysis",
+ "section": "",
+ "text": "This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work."
},
{
- "objectID": "omics/week-3/study_before_workshop.html#references",
- "href": "omics/week-3/study_before_workshop.html#references",
- "title": "Independent Study to prepare for workshop",
- "section": "References",
- "text": "References\n\n\n🔗 About Omics 1: Hello data!\n\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nNestorowa, Sonia, Fiona K. Hamey, Blanca Pijuan Sala, Evangelia Diamanti, Mairi Shepherd, Elisa Laurenti, Nicola K. Wilson, David G. Kent, and Berthold Göttgens. 2016. “A Single-Cell Resolution Map of Mouse Hematopoietic Stem and Progenitor Cell Differentiation.” Blood 128 (8): e20–31. https://doi.org/10.1182/blood-2016-05-716480.\n\n\nRand, Emma, and Sarah Forrester. 2022. “Statistically Useful Experimental Design.” https://cloud-span.github.io/experimental_design00-overview/."
+ "objectID": "core/core.html#week-2-core-2-file-types-workflow-tips-and-other-tools",
+ "href": "core/core.html#week-2-core-2-file-types-workflow-tips-and-other-tools",
+ "title": "Core Data Analysis",
+ "section": "",
+ "text": "This week we will consider File types, workflow tips and other tools. The independent study (~20 mins) reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab."
},
{
- "objectID": "omics/week-5/workshop.html",
- "href": "omics/week-5/workshop.html",
- "title": "Workshop",
+ "objectID": "core/core.html#week-6-core-drop-in",
+ "href": "core/core.html#week-6-core-drop-in",
+ "title": "Core Data Analysis",
"section": "",
- "text": "In the workshop, you will learn how to merge gene information into our results, conduct and plot a Principle Component Analysis (PCA) as well as how to create a nicely formatted Volcano plot and heatmap."
+ "text": "This week there is a drop-in session where you can ask questions about the material particular covered in the first two workshops. However, we will also endeavour to answer questions about any of the material in the omics, images or structure strand."
},
{
- "objectID": "omics/week-5/workshop.html#session-overview",
- "href": "omics/week-5/workshop.html#session-overview",
- "title": "Workshop",
+ "objectID": "core/core.html#week-11-core-3-research-compendia-and-reproducible-reporting",
+ "href": "core/core.html#week-11-core-3-research-compendia-and-reproducible-reporting",
+ "title": "Core Data Analysis",
"section": "",
- "text": "In the workshop, you will learn how to merge gene information into our results, conduct and plot a Principle Component Analysis (PCA) as well as how to create a nicely formatted Volcano plot and heatmap."
+ "text": "This week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown)."
},
{
- "objectID": "omics/week-5/workshop.html#import",
- "href": "omics/week-5/workshop.html#import",
+ "objectID": "core/week-11/workshop.html",
+ "href": "core/week-11/workshop.html",
"title": "Workshop",
- "section": "Import",
- "text": "Import\nWe need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.\n🎬 Import files saved from last week from the results folder: S30_normalised_counts.csv and S30_results.csv. I used the names s30_count_norm and s30_results for the dataframes.\n🎬 Remind yourself what is in the rows and columns and the structure of the dataframes (perhaps using glimpse())\n\n\n\n\n\n\n\n\n\n\n\n\n\nIt is useful to have this information in a single dataframe to which we will add the gene information from xenbase. Having all the information together will make it easier to interpret the results and select genes of interest.\n🎬 Merge the two dataframes:\n\n# merge the results with the normalised counts\ns30_results <- s30_count_norm |>\n left_join(s30_results, by = \"xenbase_gene_id\")\n\nThis means you have the counts for each sample along with the statistical results for each gene."
+ "section": "",
+ "text": "Literate programming is a way of writing code and text together in a single document\nThe document is then processed to produce a report\nQuarto (recommended) or R Markdown\n\nIn this workshop we will go through an example quarto document. You will learn:\n\nwhat the YAML header is\nformatting (bold, italics, headings)\nto control default and individual chunk options\nhow to add citations\nfigures and tables with cross referencing and automatic numbering\nhow to use inline coding to report results\nhow to insert special characters and equations"
},
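To make the features listed above concrete, a minimal, hypothetical .qmd sketch (everything in it is made up for illustration) showing a YAML header, a chunk option and inline code:

````markdown
---
title: "Example report"   # the YAML header holds document metadata
format: html
---

```{r}
#| echo: false
# a chunk option: run this code but hide it in the rendered report
eggs <- c(3, 5, 6, 7, 8)
```

The mean clutch size was `r mean(eggs)` eggs.
````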
{
- "objectID": "omics/week-5/workshop.html#add-gene-information-from-xenbase",
- "href": "omics/week-5/workshop.html#add-gene-information-from-xenbase",
+ "objectID": "core/week-11/workshop.html#literate-programming",
+ "href": "core/week-11/workshop.html#literate-programming",
"title": "Workshop",
- "section": "Add gene information from Xenbase",
- "text": "Add gene information from Xenbase\n\nI got the information from the Xenbase information pages under Data Reports | Gene Information\nThis is listed: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)\nClick on the readme link to see the file format and columns\nI downloaded xenbase.gpi.gz, unzipped it, removed header lines and the Xenopus tropicalis (taxon:8364) entries and saved it as xenbase_info.xlsx\n\nIf you want to emulate what I did you can use the following commands in the terminal after downloading the file:\ngunzip xenbase.gpi.gz\nless xenbase.gpi\nq\ngunzip unzips the file and less allows you to view the file. q quits the viewer. You will see the header lines and that the file contains both Xenopus tropicalis and Xenopus laevis. I read the file in with read_tsv (skipping the first header lines) then filtered out the Xenopus tropicalis entries, dropped some columns and saved the file as an excel file.\nHowever, I have already done this for you and saved the file as xenbase_info.xlsx in the meta folder. We will import this file and join it to the results dataframe.\n🎬 Load the readxl (Wickham and Bryan 2023) package:\n\nlibrary(readxl)\n\n🎬 Import the Xenbase gene information file:\n\ngene_info <- read_excel(\"meta/xenbase_info.xlsx\") \n\nYou should view the resulting dataframe to see what information is available. You can use glimpse() or View().\n🎬 Merge the gene information with the results:\n\n# join the gene info with the results\ns30_results <- s30_results |>\n left_join(gene_info, by = \"xenbase_gene_id\")\n\nWe will also find it useful to import the metadata that maps the sample names to treatments. This will allow us to label the samples in the visualisations.\n🎬 Import the metadata that maps the sample names to treatments:\n\n# Import metadata that maps the sample names to treatments\nmeta <- read_table(\"meta/frog_meta_data.txt\")\nrow.names(meta) <- meta$sample_id\n# We only need the s30\nmeta_s30 <- meta |>\n dplyr::filter(stage == \"stage_30\")"
+ "section": "",
+ "text": "Literate programming is a way of writing code and text together in a single document\nThe document is then processed to produce a report\nQuarto (recommended) or R Markdown"
},
{
- "objectID": "omics/week-5/workshop.html#log2-transform-the-data",
- "href": "omics/week-5/workshop.html#log2-transform-the-data",
+ "objectID": "core/week-11/workshop.html#session-overview",
+ "href": "core/week-11/workshop.html#session-overview",
"title": "Workshop",
- "section": "log2 transform the data",
- "text": "log2 transform the data\nWe use the normalised counts for data visualisations so that the comparisons are meaningful. Since the fold changes are given is log2 it is useful to log2 transform the normalised counts too. We will add columns to the dataframe with these transformed values. Since we have some counts of 0 we will add a tiny amount to avoid -Inf values.\n🎬 log2 transform the normalised counts:\n\n# log2 transform the counts plus a tiny amount to avoid log(0)\ns30_results <- s30_results |>\n mutate(across(starts_with(\"s30\"), \n \\(x) log2(x + 0.001),\n .names = \"log2_{.col}\"))\n\nThis is a wonderful bit or R wizardry. We are using the across() function to apply a transformation to multiple columns. We have selected all the columns that start with s30. The \\(x) is an “anonymous” function that takes the value of the column and adds 0.001 to it before applying the log2() function. The .names = \"log2_{.col}\" argument tells across() to name the new columns with the prefix log2_ followed by the original column name. You can read more about across() and anonymous functions from my posit::conf(2023) workshop\nI recommend viewing the dataframe to see the new columns.\nWe now have dataframe with all the information we need: normalised counts, log2 normalised counts, statistical comparisons with fold changes and p values, and information about the gene other than just the id."
+ "section": "",
+ "text": "In this workshop we will go through an example quarto document. You will learn:\n\nwhat the YAML header is\nformatting (bold, italics, headings)\nto control default and individual chunk options\nhow to add citations\nfigures and tables with cross referencing and automatic numbering\nhow to use inline coding to report results\nhow to insert special characters and equations"
},
{
- "objectID": "omics/week-5/workshop.html#write-the-significant-genes-to-file",
- "href": "omics/week-5/workshop.html#write-the-significant-genes-to-file",
+ "objectID": "core/week-11/overview.html",
+ "href": "core/week-11/overview.html",
+ "title": "Overview",
+ "section": "",
+ "text": "This week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown).\n\nLearning objectives\nThe successful student will be able to:\n\nexplain what a research compendium is and describe its components\nrelate the content and concepts in Core 1 and Core 2 to the research compendium\nCreate a quarto document and:\n\nappreciate the role of the YAML header\nformat text as bold, italics, headings etc\nadd citations and a bibliography\ncreate automatically numbered figures and tables with cross references in text\nset default code chunk behaviour and those for individual chunks\nuse inline code to report results\ninsert special characters and mathematical expressions with LaTeX\n\n\n\n\nInstructions\n\nPrepare\nWorkshop\nConsolidate by working on your project and research compendium"
+ },
+ {
+ "objectID": "core/week-1/workshop.html",
+ "href": "core/week-1/workshop.html",
"title": "Workshop",
- "section": "Write the significant genes to file",
- "text": "Write the significant genes to file\nWe will create dataframe of the significant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.\n🎬 Create a dataframe of the genes significant at the 0.01 level:\n\ns30_results_sig0.01 <- s30_results |> \n filter(padj <= 0.01)\n\n🎬 Write the dataframe to file\n🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:\n❓How many genes are significant at the 0.01 and 0.05 levels?"
+ "section": "",
+ "text": "In this workshop we will discuss why reproducibility matters and how to organise your work to make it reproducible. We will cover:"
},
{
- "objectID": "omics/week-5/workshop.html#view-the-relationship-between-samples-using-pca",
- "href": "omics/week-5/workshop.html#view-the-relationship-between-samples-using-pca",
+ "objectID": "core/week-1/workshop.html#session-overview",
+ "href": "core/week-1/workshop.html#session-overview",
"title": "Workshop",
- "section": "View the relationship between samples using PCA",
- "text": "View the relationship between samples using PCA\nWe have 10,136 genes in our dataset. PCA will allow us to plot our samples in the “gene expression” space so we can see if FGF-treated sample cluster together and control samples cluster together as we would expect. We do this on the log2 transformed normalised counts.\nOur data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects samples in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.\n🎬 Transpose the log2 transformed normalised counts:\n\ns30_log2_trans <- s30_results |> \n select(starts_with(\"log2_\")) |>\n t() |> \n data.frame()\n\nWe have used the select() function to select all the columns that start with log2_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the Xenbase gene ids.\n🎬 Set the column names to the Xenbase gene ids:\n\ncolnames(s30_log2_trans) <- s30_results$xenbase_gene_id\n\n🎬 Perform PCA on the log2 transformed normalised counts:\n\npca <- s30_log2_trans |>\n prcomp(rank. = 4) \n\nThe rank. argument tells prcomp() to only calculate the first 4 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=4 (out of 6) components:\n PC1 PC2 PC3 PC4\nStandard deviation 64.0124 47.3351 38.4706 31.4111\nProportion of Variance 0.4243 0.2320 0.1532 0.1022\nCumulative Proportion 0.4243 0.6562 0.8095 0.9116\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.4243 of the variance, the second 0.2320, and the third 0.1532. Together the first three components explain nearly 81% of the total variance in the data. Plotting PC1 against PC2 will capture about 66% of the variance which is likely much better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the samples.\n🎬 Remove log2 from the row names:\n\nsample_id <- row.names(s30_log2_trans) |> str_remove(\"log2_\")\n\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample ids:\n\npca_labelled <- data.frame(pca$x,\n sample_id)\n\n🎬 Merge with the metadata so we can label points by treatment and sibling pair:\n\npca_labelled <- pca_labelled |> \n left_join(meta_s30, \n by = \"sample_id\")\n\nSince the metadata contained the sample ids, it was especially important to remove the log2_ from the row names so that the join would work. 
The dataframe should look like this:\n\nPC1 PC2 PC3 PC4 sample_id stage treatment sibling_rep\n-76.38391 0.814699 -60.728327 -5.820669 S30_C_5 stage_30 control five\n-67.02571 25.668563 51.476835 28.480254 S30_C_6 stage_30 control six\n-14.02772 -78.474054 15.282058 -9.213076 S30_C_A stage_30 control A\n47.60726 49.035510 -19.288753 20.928290 S30_F_5 stage_30 FGF five\n26.04954 32.914201 20.206072 -55.752818 S30_F_6 stage_30 FGF six\n83.78054 -29.958919 -6.947884 21.378020 S30_F_A stage_30 FGF A\n\n🎬 Plot PC1 against PC2 and colour by sibling pair and shape by treatment:\n\npca <- pca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = sibling_rep,\n shape = treatment)) +\n geom_point(size = 3) +\n scale_colour_viridis_d(end = 0.95, begin = 0.15,\n name = \"Sibling pair\",\n labels = c(\"A\", \".5\", \".6\")) +\n scale_shape_manual(values = c(21, 19),\n name = NULL,\n labels = c(\"Control\", \"FGF-Treated\")) +\n theme_classic()\npca\n\n\n\n\nThere is a good separation between treatments on PC1. The sibling pairs do not seem to cluster together.\n🎬 Save the plot to file:\n\nggsave(\"figures/frog-s30-pca.png\",\n plot = pca,\n height = 3, \n width = 4,\n units = \"in\",\n device = \"png\")"
+ "section": "",
+ "text": "In this workshop we will discuss why reproducibility matters and how to organise your work to make it reproducible. We will cover:"
},
{
- "objectID": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap",
- "href": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap",
+ "objectID": "core/week-1/workshop.html#what-is-reproducibility",
+ "href": "core/week-1/workshop.html#what-is-reproducibility",
"title": "Workshop",
- "section": "Visualise the expression of the most significant genes using a heatmap",
- "text": "Visualise the expression of the most significant genes using a heatmap\nA heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.\nWe are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the Xenbase gene symbols.\n🎬 Convert a dataframe of the log2 values to a matrix:\n\nmat <- s30_results_sig0.01 |> \n select(starts_with(\"log2_\")) |>\n as.matrix()\n\n🎬 Set the rownames to the Xenbase gene symbols:\n\nrownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the treatments.\n🎬 Set the number of clusters for the treatments and genes:\n\nn_treatment_clusters <- 2\nn_gene_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"row\",\n k_col = n_treatment_clusters,\n k_row = n_gene_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = str_remove(colnames(mat), pattern = \"log2_\"),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nOn the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples (the pink cluster) and the other shows genes down regulated (more blue, the blue cluster) in the FGF-treated samples.\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function."
+ "section": "What is reproducibility?",
+ "text": "What is reproducibility?\n\nReproducible: Same data + same analysis = identical results. “… obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis. This definition is synonymous with”computational reproducibility” (National Academies of Sciences et al. 2019)\nReplicable: Different data + same analysis = qualitatively similar results. The work is not dependent on the specificities of the data.\nRobust: Same data + different analysis = qualitatively similar or identical results. The work is not dependent on the specificities of the analysis.\nGeneralisable: Different data + different analysis = qualitatively similar results and same conclusions. The findings can be generalised\n\n\n\n\nThe Turing Way's definitions of reproducible research"
},
{
- "objectID": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot",
- "href": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot",
+ "objectID": "core/week-1/workshop.html#why-does-it-matter",
+ "href": "core/week-1/workshop.html#why-does-it-matter",
"title": "Workshop",
- "section": "Visualise all the results with a volcano plot",
- "text": "Visualise all the results with a volcano plot\ncolour the points if padj < 0.05 and log2FoldChange > 1\n\nlibrary(ggrepel)\n\n\ns30_results <- s30_results |> \n mutate(log10_padj = -log10(padj),\n sig = padj < 0.05,\n bigfc = abs(log2FoldChange) >= 2) \n\n\nvol <- s30_results |> \n ggplot(aes(x = log2FoldChange, \n y = log10_padj, \n colour = interaction(sig, bigfc))) +\n geom_point() +\n geom_hline(yintercept = -log10(0.05), \n linetype = \"dashed\") +\n geom_vline(xintercept = 2, \n linetype = \"dashed\") +\n geom_vline(xintercept = -2, \n linetype = \"dashed\") +\n scale_x_continuous(expand = c(0, 0)) +\n scale_y_continuous(expand = c(0, 0)) +\n scale_colour_manual(values = c(\"gray\", \n \"pink\",\n \"gray30\",\n \"deeppink\")) +\n geom_text_repel(data = subset(s30_results, \n bigfc & sig),\n aes(label = xenbase_gene_symbol),\n size = 3,\n max.overlaps = 50) +\n theme_classic() +\n theme(legend.position = \"none\")\nvol\n\n\n\n\n\nggsave(\"figures/frog-s30-volcano.png\",\n plot = vol,\n height = 4.5, \n width = 4.5,\n units = \"in\",\n device = \"png\")"
+ "section": "Why does it matter?",
+ "text": "Why does it matter?\n\n\n\nfutureself, CC-BY-NC, by Julen Colomb\n\n\n\nFive selfish reasons to work reproducibly (Markowetz 2015). Alternatively, see the very entertaining talk\nMany high profile cases of work which did not reproduce e.g. Anil Potti unravelled by Baggerly and Coombes (2009)\nWill become standard in Science and publishing e.g OECD Global Science Forum Building digital workforce capacity and skills for data-intensive science (OECD Global Science Forum 2020)"
},
{
- "objectID": "omics/week-5/workshop.html#import-1",
- "href": "omics/week-5/workshop.html#import-1",
+ "objectID": "core/week-1/workshop.html#how-to-achieve-reproducibility",
+ "href": "core/week-1/workshop.html#how-to-achieve-reproducibility",
"title": "Workshop",
- "section": "Import",
- "text": "Import\nWe need to import both the normalised counts and the statistical results. We will need all of these for the visualisation and interpretation.\n🎬 Import the normalised counts for the Prog and HSPC cell types. I used the names prog and hspc for the dataframes.\n🎬 Combine the two dataframes (minus one set of gene ids) into one dataframe called prog_hspc:\n\n# combine into one dataframe dropping one of the gene id columns\nprog_hspc <- bind_cols(prog, hspc[-1])\n\n🎬 Import the statistical results in results/prog_hspc_results.csv. I used the name prog_hspc_results for the dataframe.\n🎬 Remind yourself what is in the rows and columns and the structure of the dataframe (perhaps using glimpse())\n\n\n\n\n\n\nIt is useful to have this information in a single dataframe to which we will add the gene information from Ensembl Having all the information together will make it easier to interpret the results and select genes of interest.\n🎬 Merge the two dataframes:\n\n# merge stats results with normalise values\nprog_hspc_results <- prog_hspc_results |> \n left_join(prog_hspc, by = \"ensembl_gene_id\")\n\nThis means you have the counts for each sample along with the statistical results for each gene."
+ "section": "How to achieve reproducibility",
+ "text": "How to achieve reproducibility\n\nScripting\nOrganisation: Project-oriented workflows with file and folder structure, naming things\nDocumentation: Readme files, code comments, metadata, version control"
},
{
- "objectID": "omics/week-5/workshop.html#add-gene-information-from-ensembl-using-biomart",
- "href": "omics/week-5/workshop.html#add-gene-information-from-ensembl-using-biomart",
+ "objectID": "core/week-1/workshop.html#rationale-for-scripting",
+ "href": "core/week-1/workshop.html#rationale-for-scripting",
"title": "Workshop",
- "section": "Add gene information from Ensembl using biomaRt",
- "text": "Add gene information from Ensembl using biomaRt\nEnsembl (Martin et al. 2023; Birney et al. 2004)is a bioinformatics project to organise all the biological information around the sequences of large genomes. The are a large number of databases but BioMart (Smedley et al. 2009) provides a consistent interface to the material. There are web-based tools to use these but the R package biomaRt (Durinck et al. 2009) gives you programmatic access making it easier to integrate information into R dataframes\n🎬 Load the biomaRt (Durinck et al. 2009) package:\n\nlibrary(biomaRt)\n\n🎬 Connect to the mouse database and see the first 20 bits of information we can retrieve:\n\n# Connect to the mouse database\nensembl <- useMart(biomart = \"ensembl\", \n dataset = \"mmusculus_gene_ensembl\")\n\n# See what information we can retrieve\nlistAttributes(mart = ensembl) |> head(20)\n\n name description\n1 ensembl_gene_id Gene stable ID\n2 ensembl_gene_id_version Gene stable ID version\n3 ensembl_transcript_id Transcript stable ID\n4 ensembl_transcript_id_version Transcript stable ID version\n5 ensembl_peptide_id Protein stable ID\n6 ensembl_peptide_id_version Protein stable ID version\n7 ensembl_exon_id Exon stable ID\n8 description Gene description\n9 chromosome_name Chromosome/scaffold name\n10 start_position Gene start (bp)\n11 end_position Gene end (bp)\n12 strand Strand\n13 band Karyotype band\n14 transcript_start Transcript start (bp)\n15 transcript_end Transcript end (bp)\n16 transcription_start_site Transcription start site (TSS)\n17 transcript_length Transcript length (including UTRs and CDS)\n18 transcript_tsl Transcript support level (TSL)\n19 transcript_gencode_basic GENCODE basic annotation\n20 transcript_appris APPRIS annotation\n page\n1 feature_page\n2 feature_page\n3 feature_page\n4 feature_page\n5 feature_page\n6 feature_page\n7 feature_page\n8 feature_page\n9 feature_page\n10 feature_page\n11 feature_page\n12 feature_page\n13 feature_page\n14 feature_page\n15 feature_page\n16 feature_page\n17 feature_page\n18 feature_page\n19 feature_page\n20 feature_page\n\n\nThere are many (2,985!) possible bits of information (attributes) that can be obtained. You can replace head(20) with View() to see them all.\nWe use the getBM() function to retrieve information from the database. The filters argument is used to specified what kind of identifier we are supplying to retrieve information. The attributes argument is used to select the information we want to retrieve. The values argument is used to specify the identifiers. The mart argument is used to specify the connection we created.\n🎬 Get the gene information:\n\ngene_info <- getBM(filters = \"ensembl_gene_id\",\n attributes = c(\"ensembl_gene_id\",\n \"external_gene_name\",\n \"description\"),\n values = prog_hspc_results$ensembl_gene_id,\n mart = ensembl)\n\nWe are getting the gene name and and a description. We also need to get the id because we will use that to merge the gene_info dataframe with the prog_hspc_results dataframe. Notice the dataframe returned only has 279 rows - one of the ids does not have information.\n🎬 We can find which is missing with:\n\nprog_hspc_results |> select(ensembl_gene_id) |> \n filter(!ensembl_gene_id %in% gene_info$ensembl_gene_id)\n\nError:\n! 
[conflicted] select found in 2 packages.\nEither pick the one you want with `::`:\n• biomaRt::select\n• plotly::select\nOr declare a preference with `conflicts_prefer()`:\n• `conflicts_prefer(biomaRt::select)`\n• `conflicts_prefer(plotly::select)`\n\n\nOh, conflicted has flagged a conflict for us.\n🎬 Take the appropriate action to resolve the conflict:\n❓ What is the id which is missing information?\n\n\nWe might want to look that up - but let’s worry about it later if it turns out to be something important.\n🎬 Merge the gene information with the results:\n\nprog_hspc_results <- prog_hspc_results |> \n left_join(gene_info, by = \"ensembl_gene_id\")\n\nI recommend viewing the dataframe to see the new columns. We now have a dataframe with all the information we need: normalised counts, log2 normalised counts, statistical comparisons with fold changes and p values, and information about the gene other than just the id."
+ "section": "Rationale for scripting?",
+ "text": "Rationale for scripting?\n\nScience is the generation of ideas, designing work to test them and reporting the results.\nWe ensure laboratory and field work is replicable, robust and generalisable by planning and recording in lab books and using standard protocols. Repeating results is still hard.\nWorkflows for computational projects, and the data analysis and reporting of other work can, and should, be 100% reproducible!\nScripting is the way to achieve this."
},
{
- "objectID": "omics/week-5/workshop.html#write-the-significant-genes-to-file-1",
- "href": "omics/week-5/workshop.html#write-the-significant-genes-to-file-1",
+ "objectID": "core/week-1/workshop.html#project-oriented-workflow",
+ "href": "core/week-1/workshop.html#project-oriented-workflow",
"title": "Workshop",
- "section": "Write the significant genes to file",
- "text": "Write the significant genes to file\nWe will create dateframe of the signifcant genes and write them to file. These are the files you want to examine in more detail along with the visualisations to select your genes of interest.\n🎬 Create a dataframe of the genes significant at the 0.01 level:\n\nprog_hspc_results_sig0.01 <- prog_hspc_results |> \n filter(FDR <= 0.01)\n\n🎬 Write the dataframe to file\n🎬 Create a dataframe of the genes significant at the 0.05 level and write to file:\n❓How many genes are significant at the 0.01 and 0.05 levels?"
+ "section": "Project-oriented workflow",
+ "text": "Project-oriented workflow\n\nuse folders to organise your work\nyou are aiming for structured, systematic and repeatable.\ninputs and outputs should be clearly identifiable from structure and/or naming\n\nExamples\n-- liver_transcriptome/\n |__data\n |__raw/\n |__processed/\n |__images/\n |__code/\n |__reports/\n |__figures/"
},
{
- "objectID": "omics/week-5/workshop.html#view-the-relationship-between-cells-using-pca",
- "href": "omics/week-5/workshop.html#view-the-relationship-between-cells-using-pca",
+ "objectID": "core/week-1/workshop.html#naming-things",
+ "href": "core/week-1/workshop.html#naming-things",
"title": "Workshop",
- "section": "View the relationship between cells using PCA",
- "text": "View the relationship between cells using PCA\nWe have 280 genes in our dataset. PCA will allow us to plot our cells in the “gene expression” space so we can see if Prog cells cluster together and HSPC cells cluster together as we would expect. We do this on the log2 transformed normalised counts.\nOur data have genes in rows and samples in columns which is a common organisation for gene expression data. However, PCA expects cells in rows and genes, the variables, in columns. We can transpose the data to get it in the correct format.\n🎬 Transpose the log2 transformed normalised counts:\n\nprog_hspc_trans <- prog_hspc_results |> \n dplyr::select(starts_with(c(\"Prog_\", \"HSPC_\"))) |>\n t() |> \n data.frame()\n\nWe have used the select() function to select all the columns that start with Prog_ or HSPC_. We then use the t() function to transpose the dataframe. We then convert the resulting matrix to a dataframe using data.frame(). If you view that dataframe you’ll see it has default column name which we can fix using colnames() to set the column names to the gene ids.\n🎬 Set the column names to the gene ids:\n\ncolnames(prog_hspc_trans) <- prog_hspc_results$ensembl_gene_id\n\nperform PCA using standard functions\n\npca <- prog_hspc_trans |>\n prcomp(rank. = 15) \n\nThe rank. argument tells prcomp() to only calculate the first 15 principal components. This is useful for visualisation as we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.\n\nsummary(pca)\n\nImportance of first k=15 (out of 280) components:\n PC1 PC2 PC3 PC4 PC5 PC6 PC7\nStandard deviation 12.5612 8.36646 5.98988 5.41386 4.55730 4.06142 3.84444\nProportion of Variance 0.1099 0.04874 0.02498 0.02041 0.01446 0.01149 0.01029\nCumulative Proportion 0.1099 0.15861 0.18359 0.20400 0.21846 0.22995 0.24024\n PC8 PC9 PC10 PC11 PC12 PC13 PC14\nStandard deviation 3.70848 3.66899 3.5549 3.48508 3.44964 3.42393 3.37882\nProportion of Variance 0.00958 0.00937 0.0088 0.00846 0.00829 0.00816 0.00795\nCumulative Proportion 0.24982 0.25919 0.2680 0.27645 0.28473 0.29290 0.30085\n PC15\nStandard deviation 3.33622\nProportion of Variance 0.00775\nCumulative Proportion 0.30860\n\n\nThe Proportion of Variance tells us how much of the variance is explained by each component. We can see that the first component explains 0.1099 of the variance, the second 0.04874, and the third 0.2498. Together the first three components explain 18% of the total variance in the data. Plotting PC1 against PC2 will capture about 16% of the variance. This is not that high but it likely better than we would get plotting any two genes against each other. To plot the PC1 against PC2 we will need to extract the PC1 and PC2 score from the pca object and add labels for the cells.\n🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the cell ids:\n\npca_labelled <- data.frame(pca$x,\n cell_id = row.names(prog_hspc_trans))\n\nIt will be helpful to add a column for the cell type so we can label points. One way to do this is to extract the information in the cell_id column into two columns.\n🎬 Extract the cell type and cell number from the cell_id column (keeping the cell_id column):\n\npca_labelled <- pca_labelled |> \n extract(cell_id, \n remove = FALSE,\n c(\"cell_type\", \"cell_number\"),\n \"([a-zA-Z]{4})_([0-9]{3})\")\n\n\"([a-zA-Z]{4})_([0-9]{3})\" is a regular expression - or regex. 
[a-zA-Z] means any lower or upper case letter, {4} means 4 of them, and [0-9] means any number, {3} means 3 of them. The brackets around the two parts of the regex mean we want to extract those parts. The first part goes into cell_type and the second part goes into cell_number. The _ between the two patterns matches the underscore and the fact it isn’t in a bracket means we don’t want to keep it.\nWe can now plot the PC1 and PC2 scores.\n🎬 Plot PC1 against PC2 and colour the points by cell type:\n\npca <- pca_labelled |> \n ggplot(aes(x = PC1, y = PC2, \n colour = cell_type)) +\n geom_point(alpha = 0.4) +\n scale_colour_viridis_d(end = 0.8, begin = 0.15,\n name = \"Cell type\") +\n theme_classic()\npca\n\n\n\n\nThere is fairly good separation of cell types but plenty of overlap.\n🎬 Save the plot to file:\n\nggsave(\"figures/prog_hspc-pca.png\",\n plot = pca,\n height = 3, \n width = 4,\n units = \"in\",\n device = \"png\")"
+ "section": "Naming things",
+ "text": "Naming things\n\n\n\ndocuments, CC-BY-NC, https://xkcd.com/1459/\n\n\nGuiding principle - Have a convention! Good file names are:\n\nmachine readable\nhuman readable\nplay nicely with sorting\n\nI suggest\n\nno spaces in names\nuse snake_case or kebab-case rather than CamelCase or dot.case\nuse all lower case except very occasionally where convention is otherwise, e.g., README, LICENSE\nordering: use left-padded numbers e.g., 01, 02….99 or 001, 002….999\ndates ISO 8601 format: 2020-10-16\nwrite down your conventions\n\n-- liver_transcriptome/\n |__data\n |__raw/\n |__2022-03-21_donor_1.csv\n |__2022-03-21_donor_2.csv\n |__2022-03-21_donor_3.csv\n |__2022-05-14_donor_1.csv\n |__2022-05-14_donor_2.csv\n |__2022-05-14_donor_3.csv\n |__processed/\n |__images/\n |__code/\n |__functions/\n |__summarise.R\n |__normalise.R\n |__theme_volcano.R\n |__01_data_processing.py\n |__02_exploratory.R\n |__03_modelling.R\n |__04_figures.R\n |__reports/\n |__01_report.qmd\n |__02_supplementary.qmd\n |__figures/\n |__01_volcano_donor_1_vs_donor_2.eps\n |__02_volcano_donor_1_vs_donor_3.eps"
},
{
- "objectID": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap-1",
- "href": "omics/week-5/workshop.html#visualise-the-expression-of-the-most-significant-genes-using-a-heatmap-1",
+ "objectID": "core/week-1/workshop.html#readme-files",
+ "href": "core/week-1/workshop.html#readme-files",
"title": "Workshop",
- "section": "Visualise the expression of the most significant genes using a heatmap",
- "text": "Visualise the expression of the most significant genes using a heatmap\nA heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.\nWe are going to create an interactive heatmap with the heatmaply (Galili et al. 2017) package. heatmaply takes a matrix as input so we need to convert a dataframe of the log2 values to a matrix. We will also set the rownames to the gene names.\n🎬 Convert a dataframe of the log2 values to a matrix. I have used sample() to select 70 random columns so the heatmap is generated quickly:\n\nmat <- prog_hspc_results_sig0.01 |> \n dplyr::select(starts_with(c(\"Prog\", \"HSPC\"))) |>\n dplyr::select(sample(1:1499, size = 70)) |>\n as.matrix()\n\n🎬 Set the row names to the gene names:\n\nrownames(mat) <- prog_hspc_results_sig0.01$external_gene_name\n\nYou might want to view the matrix by clicking on it in the environment pane.\n🎬 Load the heatmaply package:\n\nlibrary(heatmaply)\n\nWe need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the cell types to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the cell types.\n\nn_cell_clusters <- 2\nn_gene_clusters <- 2\n\n🎬 Create the heatmap:\n\nheatmaply(mat, \n scale = \"row\",\n k_col = n_cell_clusters,\n k_row = n_gene_clusters,\n fontsize_row = 7, fontsize_col = 10,\n labCol = colnames(mat),\n labRow = rownames(mat),\n heatmap_layers = theme(axis.line = element_blank()))\n\n\n\n\n\nIt will take a minute to run and display. On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are cells. We can see that cells of the same type don’t cluster that well together. We can also see two clusters of genes but the pattern of gene is not as clear as it was for the frogs and the correspondence with the cell clusters is not as strong.\nThe heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can “Show in a new window” to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bars is not adding much to the plot. You can remove it by setting hide_colorbar = TRUE, in the heatmaply() function.\nUsing all the cells is worth doing but it will take a while to generate the heatmap and then show in the viewer so do it sometime when you’re ready for a coffee break."
+ "section": "Readme files",
+ "text": "Readme files\nREADMEs are a form of documentation which have been widely used for a long time. They contain all the information about the other files in a directory. They can be extensive but need not be. Concise is good. Bullet points are good\n\nGive a project title and description, brief\nstart date, last updated date and contact information\nOutline the folder structure\nGive software requirements: programs and versions used or required. There are packages that give session information in R Wickham et al. (2021) and Python Ostblom, Joel (2019)\n\nR:\nsessioninfo::session_info()\nPython:\nimport session_info\nsession_info.show()\n\nInstructions run the code, build reports, and reproduce the figures etc\nWhere to find the data, outputs\nAny other information that needed to understand and recreate the work\nIdeally, a summary of changes with the date\n\n-- liver_transcriptome/\n |__data\n |__raw/\n |__2022-03-21_donor_1.csv\n |__2022-03-21_donor_2.csv\n |__2022-03-21_donor_3.csv\n |__2022-05-14_donor_1.csv\n |__2022-05-14_donor_2.csv\n |__2022-05-14_donor_3.csv\n |__processed/\n |__images/\n |__code/\n |__functions/\n |__summarise.R\n |__normalise.R\n |__theme_volcano.R\n |__01_data_processing.py\n |__02_exploratory.R\n |__03_modelling.R\n |__04_figures.R\n |__README.md\n |__reports/\n |__01_report.qmd\n |__02_supplementary.qmd\n |__figures/\n |__01_volcano_donor_1_vs_donor_2.eps\n |__02_volcano_donor_1_vs_donor_3.eps"
},
{
- "objectID": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot-1",
- "href": "omics/week-5/workshop.html#visualise-all-the-results-with-a-volcano-plot-1",
+ "objectID": "core/week-1/workshop.html#code-comments",
+ "href": "core/week-1/workshop.html#code-comments",
"title": "Workshop",
- "section": "Visualise all the results with a volcano plot",
- "text": "Visualise all the results with a volcano plot\ncolour the points if FDR < 0.05 and prog_hspc_results > 1\n\nlibrary(ggrepel)\n\n\nprog_hspc_results <- prog_hspc_results |> \n mutate(log10_FDR = -log10(FDR),\n sig = FDR < 0.05,\n bigfc = abs(summary.logFC) >= 2) \n\n\nvol <- prog_hspc_results |> \n ggplot(aes(x = summary.logFC, \n y = log10_FDR, \n colour = interaction(sig, bigfc))) +\n geom_point() +\n geom_hline(yintercept = -log10(0.05), \n linetype = \"dashed\") +\n geom_vline(xintercept = 2, \n linetype = \"dashed\") +\n geom_vline(xintercept = -2, \n linetype = \"dashed\") +\n scale_x_continuous(expand = c(0, 0)) +\n scale_y_continuous(expand = c(0, 0)) +\n scale_colour_manual(values = c(\"gray\",\n \"pink\",\n \"deeppink\")) +\n geom_text_repel(data = subset(prog_hspc_results, \n bigfc & sig),\n aes(label = external_gene_name),\n size = 3,\n max.overlaps = 50) +\n theme_classic() +\n theme(legend.position = \"none\")\nvol\n\n\n\n\n\nggsave(\"figures/prog-hspc-volcano.png\",\n plot = vol,\n height = 4.5, \n width = 4.5,\n units = \"in\",\n device = \"png\")"
+ "section": "Code comments",
+ "text": "Code comments\n\nComments are notes in the code which are not executed. They are ignored by the computer but are read by humans. They are used to explain what the code is doing and why. They are also used to temporarily remove code from execution."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#overview",
- "href": "omics/week-5/study_before_workshop.html#overview",
- "title": "Independent Study to prepare for workshop",
- "section": "Overview",
- "text": "Overview\nIn these slides we will:\n\n\nCheck where you are\n\nlearn some concepts used omics visualisation\n\nPrinciple Component Analysis (PCA)\nVolcano plots\nHeatmaps\n\n\nFind out what packages to install before the workshop"
+ "objectID": "core/week-1/overview.html",
+ "href": "core/week-1/overview.html",
+ "title": "Overview",
+ "section": "",
+ "text": "This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data.\n\nLearning objectives\nThe successful student will be able to:\n\nexplain the organisation of files and directories in a file systems including root, home and working directories\nexplain absolute and relative file paths\nexplain why working reproducibly is important\nknow how to use a project-oriented workflow to organise work\nbe able to give files human- and machine-readable names\noutline some common biological data file formats\n\n\n\nInstructions\n\nPrepare\n\n📖 Read Understanding file systems\n\nWorkshop\nConsolidate"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis",
- "href": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis",
- "title": "Independent Study to prepare for workshop",
- "section": "What we did in Omics 2: Statistical Analysis",
- "text": "What we did in Omics 2: Statistical Analysis\n\n\ncarried out differential expression analysis\nfound genes not expressed at all, or expressed in one group only\nSaved results files"
+ "objectID": "core/week-6/workshop.html",
+ "href": "core/week-6/workshop.html",
+ "title": "Workshop",
+ "section": "",
+ "text": "Use this session to ask any questions about Core 1 Organising reproducible data analyses and Core 2 File types, workflow tips and other tools in particular, or about R and RStudio in general. We will also try to answer any questions about the ’mics, Image and Structure strands.\n88H students might also review Stage 1 and 2 content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions because there is no 2FA and the resources are searchable.\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n70M students might also review 52M content to see if there are areas you might benefit from revisiting. You can access these through the VLE site but you might find it helpful to use this link without 2FA.\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto.\n\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#where-should-you-be-1",
- "href": "omics/week-5/study_before_workshop.html#where-should-you-be-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Where should you be?",
- "text": "Where should you be?\nAfter the Omics 2: 👋 Statistical Analysis Workshop including:\n\n🤗 Look after future you! and\nthe Independent Study to consolidate, you should have:"
+ "objectID": "core/week-6/workshop.html#session-overview",
+ "href": "core/week-6/workshop.html#session-overview",
+ "title": "Workshop",
+ "section": "",
+ "text": "Use this session to ask any questions about Core 1 Organising reproducible data analyses and Core 2 File types, workflow tips and other tools in particular, or about R and RStudio in general. We will also try to answer any questions about the ’mics, Image and Structure strands.\n88H students might also review Stage 1 and 2 content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions because there is no 2FA and the resources are searchable.\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n70M students might also review 52M content to see if there are areas you might benefit from revisiting. You can access these through the VLE site but you might find it helpful to use this link without 2FA.\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto.\n\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#frogs",
- "href": "omics/week-5/study_before_workshop.html#frogs",
- "title": "Independent Study to prepare for workshop",
- "section": "🐸 Frogs",
- "text": "🐸 Frogs\n\n\nAn RStudio Project called frogs-88H which contains:\n\nRaw data (S14, S20 and S30)\nProcessed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)\nResults files (s30_fgf_only.csv, S30_normalised_counts.csv, S30_results.csv and equivalents for S14 OR S20)\n\nTwo scripts called cont-fgf-s30.R and either cont-fgf-s20.R OR cont-fgf-s14.R\n\n\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
+ "objectID": "core/week-6/overview.html",
+ "href": "core/week-6/overview.html",
+ "title": "Overview",
+ "section": "",
+ "text": "This week’s session is a drop-in and introduces no new material. Instead, it is an opportunity to ask questions about the content from Core 1 and 2 and to revise skills from stage 1 and 2 as needed.\n\nInstructions\n\nPrepare\n\n📖 Review content from Core 1 and 2\n\nWorkshop\n\n💻 Ask questions about the content from Core 1 and 2 as needed\n💻 Revise skills from stage 1 and 2 (88H students) or 52M (70M students) as needed\n\nConsolidate\n\nThere is no consolidation work for this drop-in"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#mice",
- "href": "omics/week-5/study_before_workshop.html#mice",
- "title": "Independent Study to prepare for workshop",
- "section": "🐭 Mice",
- "text": "🐭 Mice\n\n\nAn RStudio Project called mice-88H which contains\n\nRaw data (hspc, prog, lthsc)\nProcessed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv, lthsc_summary_gene.csv, lthsc_summary_samp.csv)\n\n\nResults files (prog_hspc_results.csv and an equivalent for lthsc vs prog or hspc vs lthsc)\nTwo scripts called hspc-prog.R and either hspc-lthsc.R OR prog-lthsc.R\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
+ "objectID": "images/images.html",
+ "href": "images/images.html",
+ "title": "Image Data Analysis for Group Project",
+ "section": "",
+ "text": "The following ImageJ workflow uses the processing steps you used in workshop 3 with one change. That change is to save the results to file rather than having the results window pop up and saving from there. Or maybe two changes: it also tells you to use meaning systematic file names that will be easy to process when importing data. The RStudio workflow shows you how to import multiple files into one dataframe with columns indicating the treatment.\n\nSave files with systematic names: ev_0.avi 343_0.avi ev_1.avi 343_1.avi ev_2.5.avi 343_2.5.avi\nOpen ImageJ\nOpen video file eg ev_2.5.avi\n\nConvert to 8-bit: Image | Type | 8-bit\nCrop to petri dish: Select then Image | Crop\nCalculate average pixel intensity: Image | Stacks | Z Project\n\nProjection type: Average Intensity to create AVG_ev_2.5.avi\n\n\n\nSubtract average from image: Process | Image Calculator\n\nImage 1: ev_2.5.avi\n\nOperation: Subtract\nImage 2: AVG_ev_2.5.avi\n\nCreate new window: checked\nOK, Yes to Process all\n\n\nInvert: Edit | Invert\nAdjust threshold: Image | Adjust | Threshold\n\nMethod: Default\nThresholding: Default, B&W\nDark background: checked\nAuto or adjust a little but make sure the larvae do not disappear at later points in the video (use the slider)\nApply\n\n\nInvert: Edit | Invert\nTrack: Plugins | wrMTrck\n\nSet minSize: 10\nSet maxSize: 400\nSet maxVelocity: 10\nSet maxAreaChange: 200\nSet bendThreshold: 1\n\nImportant: check Save Results File This is different to what you did in the workshop. It will help because the results will be saved automatically rather than to saving from the Results window that other pops up. Consequently, you will be able to save the results files with systematic names relating to their treatments and then read them into R simultaneously. That will also allow you to add information from the name of the file (which has the treatment information) to the resulting dataframes\n\n\nwrMTrck window with the settings listed above shown\n\n\nClick OK. Save to a folder for all the tracking data files. I recommend deleting the “Results of..” part of the name\n\n\nCheck that the Summary window indicates 3 tracks and that the 3 larvae are what is tracked by using the slider on the Result image\nRepeat for all videos\n\nThis is the code you need to import multiple csv files into a single dataframe and add a column with the treatment information from the file name. This is why systematic file names are good.\nIt assumes\n\nyour files are called type_concentration.txt for example: ev_0.txt 343_0.txt ev_1.txt 343_1.txt ev_2.5.txt 343_2.5.txt.\nthe .txt datafile are in a folder called track inside your working directory\nyou have installed the following packages: tidyverse, janitor\n\n\n🎬 Load the tidyverse\n\nlibrary(tidyverse)\n\n🎬 Put the file names into a vector we will iterate through\n\n# get a vector of the file names\nfiles <- list.files(path = \"track\", full.names = TRUE )\n\nWe can use map_df() from the purrr package which is one of the tidyverse gems loaded with tidyvserse. map_df() will iterate through files and read them into a dataframe with a specified import function. We are using read_table(). map_df() keeps track of the file by adding an index column called file to the resulting dataframe. Instead of this being a number (1 - 6 here) we can use set_names() to use the file names instead. 
The clean_names() function from the janitor package will clean up the column names (make them lower case, replace spaces with _, remove special characters, etc.)\n🎬 Import multiple data files into one dataframe called tracking\n\n# import multiple data files into one dataframe called tracking\n# using map_df() from purrr package\n# clean the column names up using janitor::clean_names()\ntracking <- files |> \n set_names() |>\n map_dfr(read_table, .id = \"file\") |>\n janitor::clean_names()\n\nYou will get a warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each of the files because the files each have two columns called avgX. If you click on the tracking dataframe you will see it contains the data from all the files.\nNow we can add columns for the type and the concentration by processing the values in the file. The values are like track/343_0.txt so we need to remove .txt and track/ and separate the remaining words into two columns.\n🎬 Process the file column to add columns for the type and the concentration\n\n# extract type and concentration from file name\n# and put them into additional separate columns\ntracking <- tracking |> \n mutate(file = str_remove(file, \".txt\")) |>\n mutate(file = str_remove(file, \"track/\")) |>\n extract(file, remove = \n FALSE,\n into = c(\"type\", \"conc\"), \n regex = \"([^_]{2,3})_(.+)\") \n\n[^_]{2,3} matches two or three characters that are not _ (the ^ inside the square brackets means “not”)\n.+ matches one or more characters. The extract() function puts the first match into the first column, type, and the second match into the second column, conc. The remove = FALSE argument means the original column is kept.\nYou now have a dataframe with all the tracking data which is relatively easy to summarise and plot using tools you know.\nThere is an example RStudio project containing this code here: tips. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this code downloads an RStudio Project, do not run it from inside another project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/tips\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called tips-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded. You can now run the code in the project."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#section",
- "href": "omics/week-5/study_before_workshop.html#section",
- "title": "Independent Study to prepare for workshop",
- "section": "🍂",
- "text": "🍂\nEither of the other examples."
- },
- {
- "objectID": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those",
- "href": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those",
- "title": "Independent Study to prepare for workshop",
- "section": "If you do not have those",
- "text": "If you do not have those\nGo through:\n\nOmics 2: Statistical Analysis including:\n🤗 Look after future you! and\nthe Independent Study to consolidate"
+ "objectID": "images/images.html#worm-tracking",
+ "href": "images/images.html#worm-tracking",
+ "title": "Image Data Analysis for Group Project",
+ "section": "",
+ "text": "The following ImageJ workflow uses the processing steps you used in workshop 3 with one change. That change is to save the results to file rather than having the results window pop up and saving from there. Or maybe two changes: it also tells you to use meaning systematic file names that will be easy to process when importing data. The RStudio workflow shows you how to import multiple files into one dataframe with columns indicating the treatment.\n\nSave files with systematic names: ev_0.avi 343_0.avi ev_1.avi 343_1.avi ev_2.5.avi 343_2.5.avi\nOpen ImageJ\nOpen video file eg ev_2.5.avi\n\nConvert to 8-bit: Image | Type | 8-bit\nCrop to petri dish: Select then Image | Crop\nCalculate average pixel intensity: Image | Stacks | Z Project\n\nProjection type: Average Intensity to create AVG_ev_2.5.avi\n\n\n\nSubtract average from image: Process | Image Calculator\n\nImage 1: ev_2.5.avi\n\nOperation: Subtract\nImage 2: AVG_ev_2.5.avi\n\nCreate new window: checked\nOK, Yes to Process all\n\n\nInvert: Edit | Invert\nAdjust threshold: Image | Adjust | Threshold\n\nMethod: Default\nThresholding: Default, B&W\nDark background: checked\nAuto or adjust a little but make sure the larvae do not disappear at later points in the video (use the slider)\nApply\n\n\nInvert: Edit | Invert\nTrack: Plugins | wrMTrck\n\nSet minSize: 10\nSet maxSize: 400\nSet maxVelocity: 10\nSet maxAreaChange: 200\nSet bendThreshold: 1\n\nImportant: check Save Results File This is different to what you did in the workshop. It will help because the results will be saved automatically rather than to saving from the Results window that other pops up. Consequently, you will be able to save the results files with systematic names relating to their treatments and then read them into R simultaneously. That will also allow you to add information from the name of the file (which has the treatment information) to the resulting dataframes\n\n\nwrMTrck window with the settings listed above shown\n\n\nClick OK. Save to a folder for all the tracking data files. I recommend deleting the “Results of..” part of the name\n\n\nCheck that the Summary window indicates 3 tracks and that the 3 larvae are what is tracked by using the slider on the Result image\nRepeat for all videos\n\nThis is the code you need to import multiple csv files into a single dataframe and add a column with the treatment information from the file name. This is why systematic file names are good.\nIt assumes\n\nyour files are called type_concentration.txt for example: ev_0.txt 343_0.txt ev_1.txt 343_1.txt ev_2.5.txt 343_2.5.txt.\nthe .txt datafile are in a folder called track inside your working directory\nyou have installed the following packages: tidyverse, janitor\n\n\n🎬 Load the tidyverse\n\nlibrary(tidyverse)\n\n🎬 Put the file names into a vector we will iterate through\n\n# get a vector of the file names\nfiles <- list.files(path = \"track\", full.names = TRUE )\n\nWe can use map_df() from the purrr package which is one of the tidyverse gems loaded with tidyvserse. map_df() will iterate through files and read them into a dataframe with a specified import function. We are using read_table(). map_df() keeps track of the file by adding an index column called file to the resulting dataframe. Instead of this being a number (1 - 6 here) we can use set_names() to use the file names instead. 
The clean_names() function from the janitor package will clean up the column names (make them lower case, replace spaces with _, remove special characters, etc.)\n🎬 Import multiple data files into one dataframe called tracking\n\n# import multiple data files into one dataframe called tracking\n# using map_df() from purrr package\n# clean the column names up using janitor::clean_names()\ntracking <- files |> \n set_names() |>\n map_dfr(read_table, .id = \"file\") |>\n janitor::clean_names()\n\nYou will get a warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each of the files because the files each have two columns called avgX. If you click on the tracking dataframe you will see it contains the data from all the files.\nNow we can add columns for the type and the concentration by processing the values in the file. The values are like track/343_0.txt so we need to remove .txt and track/ and separate the remaining words into two columns.\n🎬 Process the file column to add columns for the type and the concentration\n\n# extract type and concentration from file name\n# and put them into additional separate columns\ntracking <- tracking |> \n mutate(file = str_remove(file, \".txt\")) |>\n mutate(file = str_remove(file, \"track/\")) |>\n extract(file, remove = \n FALSE,\n into = c(\"type\", \"conc\"), \n regex = \"([^_]{2,3})_(.+)\") \n\n[^_]{2,3} matches two or three characters that are not _ (the ^ inside the square brackets means “not”)\n.+ matches one or more characters. The extract() function puts the first match into the first column, type, and the second match into the second column, conc. The remove = FALSE argument means the original column is kept.\nYou now have a dataframe with all the tracking data which is relatively easy to summarise and plot using tools you know.\nThere is an example RStudio project containing this code here: tips. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this code downloads an RStudio Project, do not run it from inside another project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/tips\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called tips-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded. You can now run the code in the project."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#examine-the-results-files-1",
- "href": "omics/week-5/study_before_workshop.html#examine-the-results-files-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Examine the results files",
- "text": "Examine the results files\nRemind yourself of the key columns you have in the results files:\n\na log2 fold change\nan unadjusted p-value\na p value adjusted for multiple testing (FDR or padj)\na gene id"
+ "objectID": "structures/structures.html",
+ "href": "structures/structures.html",
+ "title": "Structure Data Analysis for Group Project",
+ "section": "",
+ "text": "There is an RStudio project containing a Quarto version of the the Antibody Mimetics Workshop by Michael Plevin & Jon Agirre. Instructions to obtain the RStudio project are at the bottom of this document after the set up instructions.\nYou might find RStudio useful for Python because you are already familiar with it. It is also a good way to create Quarto documents with code chunks in more than one language. Quarto documents can be used in RStudio, VS Code or Jupyter notebooks\nSome set up is required before you will be able to execute code in antibody_mimetics_workshop_3.qmd. This in contrast to the Colab notebook which is a cloud-based Jupyter notebook and does not require any set up (except installing packages).\n\n🎬 If using your own machine, install Python from https://www.python.org/downloads/. This should not be necessary if you are using a university machine where Python is already installed.\n🎬 If using your own machine and you did not install Quarto in the Core 1 workshop, install it now from https://quarto.org/docs/get-started/. This should not be necessary if you are using a university machine where quarto is already installed.\n🎬 Open RStudio and check you are using a “Git bash” Terminal: Tools | Global Options| Terminal | New Terminal opens with… . If the option to choose Git bash, you will need to install Git from https://git-scm.com/downloads. Quit RStudio first. This should not be necessary if you are using a university machine where Git bash is already installed.\n🎬 If on your own machine: In RStudio, install the quarto and the recticulate packages. This should not be necessary if you are using a university machine where these packages are already installed.\n🎬 Whether you are using your own machine or a university machine, you need to install some python packages. In RStudio and go to the Terminal window (behind the Console window). Run the following commands in the Terminal window:\npython -m pip install --upgrade pip setuptools wheel\nYou may get these warnings about scripts not being on the path. You can ignore these.\n WARNING: The script wheel.exe is installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n WARNING: The scripts pip.exe, pip3.11.exe, pip3.9.exe and pip3.exe are installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\nspyder 5.1.5 requires pyqt5<5.13, which is not installed.\nspyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\nconda-repo-cli 1.0.4 requires pathlib, which is not installed.\nanaconda-project 0.10.2 requires ruamel-yaml, which is not installed.\nSuccessfully installed pip-23.3.1 setuptools-69.0.2 wheel-0.41.3\npython -m pip install session_info\npython -m pip install wget\npython -m pip install gemmi\nNote: On my windows laptop at home, I also had to install C++ Build Tools to be able to install the gemmi python package. If this is true for you, you will get a fail message telling you to install C++ build tools if you need them. 
These are from https://visualstudio.microsoft.com/visual-cpp-build-tools/. You need to check the Workloads tab and select C++ build tools.\n\nYou can then install the gemmi package again.\nI think that’s it! You can now download the RStudio project and run each chunk in the quarto document.\nThere is an example RStudio project here: structure-analysis. You can also download the project as a zip file from there but there is some code that will do that automatically for you. Since this code downloads an RStudio Project, do not run it from inside another project. You may want to navigate to a particular directory or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/structure-analysis\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called structure-analysis-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about that, it is just a way to tell you which version of the repo you downloaded.\nYou should be able to open the antibody_mimetics_workshop_3.qmd file and run each chunk. You can also render the document to html."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#frogs-1",
- "href": "omics/week-5/study_before_workshop.html#frogs-1",
- "title": "Independent Study to prepare for workshop",
- "section": "🐸 Frogs",
- "text": "🐸 Frogs\n\n\nRows: 10,136\nColumns: 7\n$ baseMean <dbl> 237.553928, 531.565700, 86.392830, 49.813502, 419.9983…\n$ log2FoldChange <dbl> 0.096601855, -0.089588528, -0.192811203, -0.008858703,…\n$ lfcSE <dbl> 0.2079396, 0.1557384, 0.3253216, 0.4342614, 0.1685420,…\n$ stat <dbl> 0.46456683, -0.57525007, -0.59267874, -0.02039947, -0.…\n$ pvalue <dbl> 0.64224169, 0.56512218, 0.55339617, 0.98372471, 0.8699…\n$ padj <dbl> 0.9998970, 0.9998970, 0.9998970, 0.9998970, 0.9998970,…\n$ xenbase_gene_id <chr> \"XB-GENE-1000007\", \"XB-GENE-1000023\", \"XB-GENE-1000062…\n\n\n\n\n\nbaseMean is the mean of the normalised counts for the gene across all samples\n\nlfcSE standard error of the fold change\n\nstat is the test statistic (the Wald statistic)\nGenerated by DESeq2 (Love, Huber, and Anders 2014)"
+ "objectID": "structures/structures.html#programmatic-protein-structure-analysis",
+ "href": "structures/structures.html#programmatic-protein-structure-analysis",
+ "title": "Structure Data Analysis for Group Project",
+ "section": "",
+ "text": "There is an RStudio project containing a Quarto version of the the Antibody Mimetics Workshop by Michael Plevin & Jon Agirre. Instructions to obtain the RStudio project are at the bottom of this document after the set up instructions.\nYou might find RStudio useful for Python because you are already familiar with it. It is also a good way to create Quarto documents with code chunks in more than one language. Quarto documents can be used in RStudio, VS Code or Jupyter notebooks\nSome set up is required before you will be able to execute code in antibody_mimetics_workshop_3.qmd. This in contrast to the Colab notebook which is a cloud-based Jupyter notebook and does not require any set up (except installing packages).\n\n🎬 If using your own machine, install Python from https://www.python.org/downloads/. This should not be necessary if you are using a university machine where Python is already installed.\n🎬 If using your own machine and you did not install Quarto in the Core 1 workshop, install it now from https://quarto.org/docs/get-started/. This should not be necessary if you are using a university machine where quarto is already installed.\n🎬 Open RStudio and check you are using a “Git bash” Terminal: Tools | Global Options| Terminal | New Terminal opens with… . If the option to choose Git bash, you will need to install Git from https://git-scm.com/downloads. Quit RStudio first. This should not be necessary if you are using a university machine where Git bash is already installed.\n🎬 If on your own machine: In RStudio, install the quarto and the recticulate packages. This should not be necessary if you are using a university machine where these packages are already installed.\n🎬 Whether you are using your own machine or a university machine, you need to install some python packages. In RStudio and go to the Terminal window (behind the Console window). Run the following commands in the Terminal window:\npython -m pip install --upgrade pip setuptools wheel\nYou may get these warnings about scripts not being on the path. You can ignore these.\n WARNING: The script wheel.exe is installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n WARNING: The scripts pip.exe, pip3.11.exe, pip3.9.exe and pip3.exe are installed in 'C:\\Users\\er13\\AppData\\Roaming\\Python\\Python39\\Scripts' which is not on PATH.\n Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\nspyder 5.1.5 requires pyqt5<5.13, which is not installed.\nspyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\nconda-repo-cli 1.0.4 requires pathlib, which is not installed.\nanaconda-project 0.10.2 requires ruamel-yaml, which is not installed.\nSuccessfully installed pip-23.3.1 setuptools-69.0.2 wheel-0.41.3\npython -m pip install session_info\npython -m pip install wget\npython -m pip install gemmi\nNote: On my windows laptop at home, I also had to install C++ Build Tools to be able to install the gemmi python package. If this is true for you, you will get a fail message telling you to install C++ build tools if you need them. 
I think that’s it! You can now download the RStudio project and run each chunk in the Quarto document.\nThere is an example RStudio project here: structure-analysis. You can download the project as a zip file from there, but there is some code that will do that automatically for you. Since this code creates an RStudio Project, do not run it from inside another project. You may want to navigate to a particular directory first or edit the destdir:\n\nusethis::use_course(url = \"3mmaRand/structure-analysis\", destdir = \".\")\n\nYou can agree to deleting the zip. You should find RStudio restarts and you have a new project called structure-analysis-xxxxxx. The xxxxxx is a commit reference - you do not need to worry about it; it is just a way to tell you which version of the repo you downloaded.\nYou should be able to open the antibody_mimetics_workshop_3.qmd file and run each chunk. You can also render the document to HTML."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#mice-1",
- "href": "omics/week-5/study_before_workshop.html#mice-1",
- "title": "Independent Study to prepare for workshop",
- "section": "🐭 Mice",
- "text": "🐭 Mice\n\n\nRows: 280\nColumns: 6\n$ Top <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…\n$ p.value <dbl> 7.038138e-117, 4.736622e-90, 1.832630e-88, 4.211954e-7…\n$ FDR <dbl> 1.970679e-114, 6.631271e-88, 1.710455e-86, 2.948368e-7…\n$ summary.logFC <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…\n$ logFC.hspc <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…\n$ ensembl_gene_id <chr> \"ENSMUSG00000028639\", \"ENSMUSG00000024053\", \"ENSMUSG00…\n\n\n\n\nTop is the rank of the gene ordered by the p-value (smallest first)\n\nsummary.logFC and logFC.hspc give the same value (in this case since comparing two cell types)\ngenerated by scran (Lun, McCarthy, and Marioni 2016)"
+ "objectID": "index.html",
+ "href": "index.html",
+ "title": "Data Analysis for the Group Research Project",
+ "section": "",
+ "text": "You are either\n\nan integrated masters student doing BIO00088H Group Research Project or\nan MSc Bioinformatics student doing BIO00070M Research, Professional and Team Skills\n\nFor students doing BIO00088H, Data Analysis compromises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type. MSc Bioinformatics students do the Core workshops and the ’omics workshops as part of BIO00070M.\nThe project types are:\n\n\n\n\n\n\n\nProject\nData Strand\n\n\n\n\nStem Cells, Jillian Barlow\n’omics, Emma Rand\n\n\nDevelopmental Biology, Betsy Pownal\n’omics, Emma Rand\n\n\nMicrobial Ecology, Kelly Redeker\n’omics, Emma Rand\n\n\nStructural Biochemistry, Michael Plevin\nmolecular-structure, Jon Agirre\n\n\nNeuroscience, Sean Sweeney\nimage-analysis, Richard Bingham\n\n\nxxxxxxxxxxxx, Richard Maguire\nimage-analysis, Richard Bingham\n\n\n\nThe data analysis workshops are:\n\n\n\nWeek\nData Strand\n\n\n\n\n1\nCore 1 Organising reproducible data analyses\n\n\n2\nCore 2 File types, workflow tips and other tools\n\n\n3\nomics/structure/images 1\n\n\n4\nomics/structure/images 2\n\n\n5\nomics/structure/images 3\n\n\n6\nDrop-in\n\n\n6\nCore 3 Research Compendia and Reproducible Reporting\n\n\n\n\n\nStudents who successfully complete this module will be able to\n\nuse appropriate computational techniques to reproducibly process, analyse and visualise data and generate scientific reports based on project work.\n\n\n\n\nAll material is on the VLE so why is this site useful? This site collects everything together in a searchable way. The search icon is on the top right.\n\n\n\nRand E (2023). Data Analysis for Group Project. https://3mmarand.github.io/BIO00088H-data/."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#adding-gene-information-1",
- "href": "omics/week-5/study_before_workshop.html#adding-gene-information-1",
- "title": "Independent Study to prepare for workshop",
- "section": "Adding gene information",
- "text": "Adding gene information\n\n\nThe gene id is difficult to interpret in plots/tables\nTherefore we need to add information such as the gene name and a description to the results\nFor the 🐸 Frog data information comes from Xenbase (Fisher et al. 2023)\nFor the 🐭 Mice data information comes from Ensembl (Birney et al. 2004)"
+ "objectID": "index.html#module-learning-outcome-linked-to-this-content",
+ "href": "index.html#module-learning-outcome-linked-to-this-content",
+ "title": "Data Analysis for the Group Research Project",
+ "section": "",
+ "text": "Students who successfully complete this module will be able to\n\nuse appropriate computational techniques to reproducibly process, analyse and visualise data and generate scientific reports based on project work."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#xenbase",
- "href": "omics/week-5/study_before_workshop.html#xenbase",
- "title": "Independent Study to prepare for workshop",
- "section": "🐸 Xenbase",
- "text": "🐸 Xenbase\n\nxenbase logoXenbase is a model organism database that provides genomic, molecular, and developmental biology information about Xenopus laevis and Xenopus tropicalis.\n\nIt took me some time to find the information you need."
+ "objectID": "index.html#what-is-this-site-for",
+ "href": "index.html#what-is-this-site-for",
+ "title": "Data Analysis for the Group Research Project",
+ "section": "",
+ "text": "All material is on the VLE so why is this site useful? This site collects everything together in a searchable way. The search icon is on the top right."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#xenbase-1",
- "href": "omics/week-5/study_before_workshop.html#xenbase-1",
- "title": "Independent Study to prepare for workshop",
- "section": "🐸 Xenbase",
- "text": "🐸 Xenbase\n\n\nI got the information from the Xenbase information pages under Data Reports | Gene Information\nThis is listed: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)\nClick on the readme link to see the file format and columns\nI downloaded xenbase.gpi.gz, unzipped it, removed header lines and the Xenopus tropicalis (taxon:8364) entries and saved it as xenbase_info.xlsx\nIn the workshop you will import this file and merge the information with the results file"
+ "objectID": "index.html#please-cite-as",
+ "href": "index.html#please-cite-as",
+ "title": "Data Analysis for the Group Research Project",
+ "section": "",
+ "text": "Rand E (2023). Data Analysis for Group Project. https://3mmarand.github.io/BIO00088H-data/."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#ensembl",
- "href": "omics/week-5/study_before_workshop.html#ensembl",
+ "objectID": "core/week-6/study_before_workshop.html",
+ "href": "core/week-6/study_before_workshop.html",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Ensembl",
- "text": "🐭 Ensembl\n\n\nEnsembl creates, integrates and distributes reference datasets and analysis tools that enable genomics\nBioMart provides a access to these large datasets\nbiomaRt (Durinck et al. 2009) is a Bioconductor package gives you programmatic access to BioMart.\nIn the workshop you use this package to get information you can merge with the results file"
+ "section": "",
+ "text": "📖 Read materials from Core 1 Organising reproducible data analyses and make a note of questions you have\n📖 Read materials from Core 2 File types, workflow tips and other tools and make a note of questions you have.\n📖 Review Stage 1 and 2 (88H students) or 52M (70M students) content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions, particularly for stage 1.\n\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n52M\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#what-is-the-purpose-of-an-omics-plot",
- "href": "omics/week-5/study_before_workshop.html#what-is-the-purpose-of-an-omics-plot",
- "title": "Independent Study to prepare for workshop",
- "section": "What is the purpose of an Omics plot?",
- "text": "What is the purpose of an Omics plot?\n\n\nIn general, we plot data to help us summarise and understand it\nThis is especially import for omics data where we have a very large number of variables and often a large number of observations\nWe will look at three plots very commonly used in omics analysis: Principal Component Analysis (PCA) plot, Heatmaps and Volcano Plots"
+ "objectID": "core/week-6/study_after_workshop.html",
+ "href": "core/week-6/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
+ "section": "",
+ "text": "There is no consolidation work other than to continue revising what you have learned over the course of your degree about data analysis."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#pca",
- "href": "omics/week-5/study_before_workshop.html#pca",
+ "objectID": "core/week-1/study_before_workshop.html",
+ "href": "core/week-1/study_before_workshop.html",
"title": "Independent Study to prepare for workshop",
- "section": "PCA",
- "text": "PCA\n\n\nPrincipal Component Analysis is an unsupervised machine learning technique\nUnsupervised methods1 are unsupervised in that they do not use/optimise to a particular output. The goal is to uncover structure. They do not test hypotheses\nIt is often used to visualise high dimensional data because it is a dimension reduction technique\n\n\nYou may wish to read a previous introduction to unsupervised methods I have written An introduction to Machine Learning: Unsupervised methods (Rand 2021)"
+ "section": "",
+ "text": "📖 Read Understanding file systems. This is an approximately 15 - 20 minute read revising file types and filesystems. It covers concepts of working directories and paths. We learned these ideas in stage 1 and you may feel completely confident with them but many students will benefit from a refresher. For BIO00070M students, this is part of the work you will also be asked to complete for BIO00052M Data Analysis in R."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#pca-1",
- "href": "omics/week-5/study_before_workshop.html#pca-1",
- "title": "Independent Study to prepare for workshop",
- "section": "PCA",
- "text": "PCA\n\n\nIt takes a large number of continuous variables (like gene expression) and reduces them to a smaller number of variables (called principal components) that explain most of the variation in the data\nThe principal components can be plotted to see how samples cluster together"
+ "objectID": "core/week-1/study_after_workshop.html",
+ "href": "core/week-1/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
+ "section": "",
+ "text": "These are suggestions"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#pca-2",
- "href": "omics/week-5/study_before_workshop.html#pca-2",
- "title": "Independent Study to prepare for workshop",
- "section": "PCA",
- "text": "PCA\n\n\nTo see if samples cluster as we would expect, we might plot the expression of one gene against another\n\n\n\n\n\n\nSamples\n\n\n\n\n\nCells\n\n\n\n\nThis gives some insight but we have 280 (mice) or 10,000+(frogs) genes to consider. How do we know if the pair we use is typical? How can we consider al the genes at once?"
+ "objectID": "core/week-1/study_after_workshop.html#bio00088h-group-research-project-students",
+ "href": "core/week-1/study_after_workshop.html#bio00088h-group-research-project-students",
+ "title": "Independent Study to consolidate this week",
+ "section": "BIO00088H Group Research Project students",
+ "text": "BIO00088H Group Research Project students\n\nRevise previous Data Analysis materials. You can find the version you took on the VLE site for 17C / 08C. However, my latest versions (in development) are here: Data Analysis in R. The Becoming a Bioscientist (BABS) modules replace the Laboratory and Professional Skills modules. BABS1 and BABS2 are stage one, and I’ve tried to improve them over 17C / 08C. The site is also searchable (icon top right)"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#pca-3",
- "href": "omics/week-5/study_before_workshop.html#pca-3",
- "title": "Independent Study to prepare for workshop",
- "section": "PCA",
- "text": "PCA\n\n\nPCA is a solution for this - It takes a large number of continuous variables (like gene expression) and reduces them to a smaller number of “principal components” that explain most of the variation in the data.\n\n\n\n\n\n\nSamples\n\n\n\n\n\nCells"
+ "objectID": "core/week-1/study_after_workshop.html#msc-bioinformatics-students-doing-bio00070m",
+ "href": "core/week-1/study_after_workshop.html#msc-bioinformatics-students-doing-bio00070m",
+ "title": "Independent Study to consolidate this week",
+ "section": "MSc Bioinformatics students doing BIO00070M",
+ "text": "MSc Bioinformatics students doing BIO00070M\n\nMake sure you carry out the preparatory work for week 2 of 52M"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#pca-4",
- "href": "omics/week-5/study_before_workshop.html#pca-4",
+ "objectID": "core/week-11/study_before_workshop.html#module-assessment",
+ "href": "core/week-11/study_before_workshop.html#module-assessment",
"title": "Independent Study to prepare for workshop",
- "section": "PCA",
- "text": "PCA\nWe have done PCA in Omics 3, but often PCA might be one of the first exploratory steps because it gives you an idea whether you expect general patterns in gene expression that distinguish groups."
+ "section": "Module assessment",
+ "text": "Module assessment\nThis module is assessed by:\n\nOral presentation 30%\nProject Report and Research Compendium 70% of which\n\n50% report\n20% compendium\n\n\nThese slides are a guide to Research compendium."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#heatmaps-1",
- "href": "omics/week-5/study_before_workshop.html#heatmaps-1",
+ "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium",
+ "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium",
"title": "Independent Study to prepare for workshop",
- "section": "Heatmaps",
- "text": "Heatmaps\n\n\nare a grid of genes on one axis and samples on the other with each grid cell coloured by another variable\nin this case the other variable is gene expression\nthey allow you to quickly get an overview of the expression patterns across genes and samples\nwe often couple them with clustering to group genes and samples with similar expression patterns together which helps us see which genes are responsible for distinguishing groups"
+ "section": "What is a Research Compendium?",
+ "text": "What is a Research Compendium?\nOverview of assessment\n\nStage 3 Integrated Masters students are expected to submit a Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual.\nStudents will be assessed on the technical complexity, completeness and organisation of their compendium and the completeness, reproducibility and clarity of their documentation at the project and the code/process level. Marking will focus on the reproducibility of the results and the clarity of the decision making processes rather than the interpretation of the results which is covered in the report. There is no word or size limit for any part of the compendium but its contents should be concise and minimal. Extraneous text, code or files will be penalised."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#section-1",
- "href": "omics/week-5/study_before_workshop.html#section-1",
+ "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-1",
+ "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-1",
"title": "Independent Study to prepare for workshop",
- "section": "",
- "text": "Heat map for the frog data\n\nSee next slide for information"
+ "section": "What is a Research Compendium?",
+ "text": "What is a Research Compendium?\nOverview of assessment\n\nStage 3 Integrated Masters students are expected to submit a Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual.\nStudents will be assessed on the technical complexity, completeness and organisation of their compendium and the completeness, reproducibility and clarity of their documentation at the project and the code/process level. Marking will focus on the reproducibility of the results and the clarity of the decision making processes rather than the interpretation of the results which is covered in the report. There is no word or size limit for any part of the compendium but its contents should be concise and minimal. Extraneous text, code or files will be penalised."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#heatmaps-2",
- "href": "omics/week-5/study_before_workshop.html#heatmaps-2",
+ "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-2",
+ "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-2",
"title": "Independent Study to prepare for workshop",
- "section": "Heatmaps",
- "text": "Heatmaps\n\n\nOn the vertical axis are genes which are differentially expressed at the 0.01 level\nOn the horizontal axis are samples\nWe can see that the FGF-treated samples cluster together and the control samples cluster together\nWe can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples and the other shows genes downregulated (more blue) in the FGF-treated samples"
+ "section": "What is a Research Compendium?",
+ "text": "What is a Research Compendium?\n\n\n\nZipped folder containing all data, code and text associated with a research project organised and documented clearly. Any unscripted processing should be described.\nEverything needed to understand what the project is and reproduce the results, and no more. The compendium should not be a dumping ground for data files and scripts. It needs to be curated. You may generate files that are not needed to reproduce your work and these should be removed.\nYour compendium might be a single Quarto/RStudio Project, or it might be folder including an RStudio Project and some additional materials including the description of unscripted processing.\nIdeally uses literate programming to create submitted report"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-1",
- "href": "omics/week-5/study_before_workshop.html#volcano-plots-1",
+ "objectID": "core/week-11/study_before_workshop.html#use-guidelines-from-core-1-and-2",
+ "href": "core/week-11/study_before_workshop.html#use-guidelines-from-core-1-and-2",
"title": "Independent Study to prepare for workshop",
- "section": "Volcano plots",
- "text": "Volcano plots\n\n\nVolcano plots often used to visualise the results of differential expression analysis\nThey are just a scatter of the corrected p value against the fold change….\nalmost - the we actually plot the negative log of the corrected p value against the fold change"
+ "section": "Use guidelines from Core 1 and 2",
+ "text": "Use guidelines from Core 1 and 2\n\nfollow the guidance in Core 1 on organisation, naming things and documentation\nfollow the guidance in Core 2 on well-formatted code, consistency, modularisation and documentation"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-2",
- "href": "omics/week-5/study_before_workshop.html#volcano-plots-2",
+ "objectID": "core/week-11/study_before_workshop.html#project-level-documentation",
+ "href": "core/week-11/study_before_workshop.html#project-level-documentation",
"title": "Independent Study to prepare for workshop",
- "section": "Volcano plots",
- "text": "Volcano plots\n\n\nThis is because just plotting the p-value means the axis is counter intuitive. Small p-values (i.e., significant values) are at the bottom of the axis)\nAnd since p-values range from 1 to very tiny the points are all squashed at the bottom of the axis\n\n\n\nVolcano plot FDR against fold change"
+ "section": "Project level documentation",
+ "text": "Project level documentation\n\n\nas concise as possible, bullet points are good\nprimarily in the README file but some details may be in scripts\ntitle, concise description of the work, author exam number, date, overview of compendium contents\nall the software information including versions\ninstructions needed to reproduce the work, order of workflow, settings/parameter values for software"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-3",
- "href": "omics/week-5/study_before_workshop.html#volcano-plots-3",
+ "objectID": "core/week-11/study_before_workshop.html#project-level-documentation---cont",
+ "href": "core/week-11/study_before_workshop.html#project-level-documentation---cont",
"title": "Independent Study to prepare for workshop",
- "section": "Volcano plots",
- "text": "Volcano plots\n\n\nPlotting the negative log of the corrected p-value means that the values are spread out and the significant values are at the top of the axis\n\n\n\nVolcano plot -log(FDR) against fold change"
+ "section": "Project level documentation - cont",
+ "text": "Project level documentation - cont\n\n\ndescription, format and provenance of the data\nstyle conventions used in the code,\nany other information needed to understand the project and reproduce the results"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#visualisations",
- "href": "omics/week-5/study_before_workshop.html#visualisations",
+ "objectID": "core/week-11/study_before_workshop.html#script-level-documentation",
+ "href": "core/week-11/study_before_workshop.html#script-level-documentation",
"title": "Independent Study to prepare for workshop",
- "section": "Visualisations",
- "text": "Visualisations\n\nShould be done on normalised data so meaningful comparisons can be made\nThe 🐭 mouse data were already log2normalised\nThe 🐸 frog data were normalised by the DE method and saved to file. We will log2 transform before doing visualisations"
+ "section": "Script level documentation",
+ "text": "Script level documentation\nShorthand for documentation at the script and/or code chunk level and/or process level where unscripted processing is used.\n\n\noverview of the script/chunk/process and its purpose\ncode comments"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop",
- "href": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop",
+ "objectID": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-3",
+ "href": "core/week-11/study_before_workshop.html#what-is-a-research-compendium-3",
"title": "Independent Study to prepare for workshop",
- "section": "Packages to install before the workshop",
- "text": "Packages to install before the workshop\nheatmaply (Galili et al. 2017) and ggrepel (Slowikowski 2023) from CRAN in the the normal way:\n\ninstall.packages(\"heatmaply\")\ninstall.packages(\"ggrepel\")\n\nbiomaRt (Durinck et al. 2009) from Bioconductor using BiocManager (Morgan and Ramos 2023)\n\nBiocManager::install(\"biomaRt\")"
+ "section": "What is a Research Compendium?",
+ "text": "What is a Research Compendium?\n\n\nA research compendium is something you develop throughout your research project. It is not something you create at the end.\nYou update and reorganise as you go.\nWhen you plan your research include the planning of recording, organising, and documenting your data and its analysis.\nThink ahead to how and where you will be recording your data and how you will be analysing."
},
{
- "objectID": "omics/week-5/study_before_workshop.html#workshops-1",
- "href": "omics/week-5/study_before_workshop.html#workshops-1",
+ "objectID": "core/week-11/study_before_workshop.html#further-reading",
+ "href": "core/week-11/study_before_workshop.html#further-reading",
"title": "Independent Study to prepare for workshop",
- "section": "Workshops",
- "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments.\nOmics 3: Visualising and Interpreting. PCA, Volcano plots and heatmaps to visualise results. Interpreting the results and finding out more about genes of interest."
+ "section": "Further Reading",
+ "text": "Further Reading\n\nThe Turing Way (Community 2022)\nPackaging Data Analytical Work Reproducibly Using R (and Friends) (Marwick, Boettiger, and Mullen 2018)\nTen simple rules for writing and sharing computational analyses in Jupyter Notebooks (Rule et al. 2019)\nTen Simple rules for (Sandve et al. 2013)"
},
{
- "objectID": "omics/week-5/study_before_workshop.html#references",
- "href": "omics/week-5/study_before_workshop.html#references",
+ "objectID": "core/week-11/study_before_workshop.html#references",
+ "href": "core/week-11/study_before_workshop.html#references",
"title": "Independent Study to prepare for workshop",
"section": "References",
- "text": "References\n\n\n🔗 About Omics 3: Visualising and Interpreting\n\n\n\nBirney, Ewan, T. Daniel Andrews, Paul Bevan, Mario Caccamo, Yuan Chen, Laura Clarke, Guy Coates, et al. 2004. “An Overview of Ensembl.” Genome Research 14 (5): 925–28. https://doi.org/10.1101/gr.1860604.\n\n\nDurinck, Steffen, Paul T. Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the r/Bioconductor Package biomaRt” 4.\n\n\nFisher, Malcolm, Christina James-Zorn, Virgilio Ponferrada, Andrew J Bell, Nivitha Sundararaj, Erik Segerdell, Praneet Chaturvedi, et al. 2023. “Xenbase: Key Features and Resources of the Xenopus Model Organism Knowledgebase.” Genetics 224 (1): iyad018. https://doi.org/10.1093/genetics/iyad018.\n\n\nGalili, Tal, O’Callaghan, Alan, Sidi, Jonathan, Sievert, and Carson. 2017. “Heatmaply: An r Package for Creating Interactive Cluster Heatmaps for Online Publishing.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btx657.\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nMorgan, Martin, and Marcel Ramos. 2023. BiocManager: Access the Bioconductor Project Package Repository. https://bioconductor.github.io/BiocManager/.\n\n\nRand, Emma. 2021. Data Science Strand of BIO00058M. https://doi.org/10.5281/zenodo.5527705.\n\n\nSlowikowski, Kamil. 2023. Ggrepel: Automatically Position Non-Overlapping Text Labels with ’Ggplot2’. https://github.com/slowkow/ggrepel."
+ "text": "References\n\n\n🔗 About Core 3 Research Compendia and Reproducible Reporting\n\n\n\nCommunity, The Turing Way. 2022. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research. Zenodo. https://doi.org/10.5281/ZENODO.3233853.\n\n\nMarwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using r (and Friends).” The American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.\n\n\nRule, Adam, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, et al. 2019. “Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks.” Edited by Fran Lewitter. PLOS Computational Biology 15 (7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007.\n\n\nSandve, Geir Kjetil, Anton Nekrutenko, James Taylor, and Eivind Hovig. 2013. “Ten Simple Rules for Reproducible Computational Research.” PLoS Comput. Biol. 9 (10): e1003285. https://doi.org/10.1371/journal.pcbi.1003285."
},
{
- "objectID": "omics/week-4/workshop.html",
- "href": "omics/week-4/workshop.html",
- "title": "Workshop",
+ "objectID": "core/week-11/study_after_workshop.html",
+ "href": "core/week-11/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
"section": "",
- "text": "In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
+ "text": "💻 Just work on your project!"
},
{
- "objectID": "omics/week-4/workshop.html#session-overview",
- "href": "omics/week-4/workshop.html#session-overview",
- "title": "Workshop",
+ "objectID": "core/week-2/overview.html",
+ "href": "core/week-2/overview.html",
+ "title": "Overview",
"section": "",
- "text": "In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
+ "text": "This week we will consider File types, workflow tips and other tools. The independent study reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab.\n\nLearning objectives\nThe successful student will be able to:\n\nexplain why RStudio are useful/essential and be able to use the usethis package\nwrite cool 😎 code not 😩 ugly code\nexplain the value of code which expresses the structure of the problem/solution.\ndescribe some common file types for biological data\nuse some useful shortcuts to help write cool 😎 code\nknow what the command line is and how to use it for simple tasks\nuse Google colab to run code\nrecognise some of the differences between R and Python\n\n\n\nInstructions\n\nPrepare 20 mins reading on RStudio Projects revisited, formatting code and coding algorithmically\nWorkshop\n\n💬 Types of biological data files\n🪄 Workflow tips and shortcuts\n💻 The command line\n💻 Google colab\n💻 Python\n\nConsolidate\n\n💻 not sure yet :)"
},
{
- "objectID": "omics/week-4/workshop.html#import",
- "href": "omics/week-4/workshop.html#import",
+ "objectID": "core/week-2/workshop.html",
+ "href": "core/week-2/workshop.html",
"title": "Workshop",
- "section": "Import",
- "text": "Import\nWe need to import the S30 data that were filtered to remove genes with 4, 5 or 6 zeros and those where the total counts was less than 20.\n🎬 Import the data from the data-processed folder."
+ "section": "",
+ "text": "In this workshop you will"
},
{
- "objectID": "omics/week-4/workshop.html#genes-expressed-in-one-treatment",
- "href": "omics/week-4/workshop.html#genes-expressed-in-one-treatment",
+ "objectID": "core/week-2/workshop.html#session-overview",
+ "href": "core/week-2/workshop.html#session-overview",
"title": "Workshop",
- "section": "Genes expressed in one treatment",
- "text": "Genes expressed in one treatment\nThe genes expressed in only one treatment group are those with zeros in all three replicates in one group and non-zero values in all three replicates in the other group. For example, those shown here:\n\n\n# A tibble: 3 × 7\n xenbase_gene_id S30_C_5 S30_C_6 S30_C_A S30_F_5 S30_F_6 S30_F_A\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 XB-GENE-1018260 0 0 0 10 2 16\n2 XB-GENE-17330117 0 0 0 13 4 17\n3 XB-GENE-17332184 0 0 0 6 19 6\n\n\nWe will use filter() to find these genes.\n🎬 Find the genes that are expressed only in the FGF-treated group:\n\ns30_fgf_only <- s30_filtered |> \n filter(S30_C_5 == 0, \n S30_C_6 == 0, \n S30_C_A == 0, \n S30_F_5 > 0, \n S30_F_6 > 0, \n S30_F_A > 0)\n\n❓ How many genes are expressed only in the FGF-treated group?\n\n\n🎬 Now you find any genes that are expressed only in the control group.\n❓ Do the results make sense to you in light of what you know about the biology?\n\n\n\n\n\n\n\n🎬 Write to file (saved in results) all the genes that are expressed one group only."
+ "section": "",
+ "text": "In this workshop you will"
},
{
- "objectID": "omics/week-4/workshop.html#create-deseqdataset-object",
- "href": "omics/week-4/workshop.html#create-deseqdataset-object",
+ "objectID": "core/week-2/workshop.html#omics",
+ "href": "core/week-2/workshop.html#omics",
"title": "Workshop",
- "section": "Create DESeqDataSet object",
- "text": "Create DESeqDataSet object\n🎬 Load the DESeq2 package:\nA DEseqDataSet object is a custom data type that is used by DESeq2. Custom data types are common in the Bioconductor1 packages. They are used to store data in a way that is useful for the analysis. These data types typically have data, transformed data, metadata and experimental designs within them.\nTo create a DESeqDataSet object, we need to provide three things:\n\nThe raw counts - these are what we imported into s30_filtered\n\nThe meta data which gives information about the samples and which treatment groups they belong to\nA design matrix which captures the design of the statistical model.\n\nThe counts must in a matrix rather than a dataframe. Unlike a dataframe, a matrix has columns of all the same type. That is, it will contain only the counts. The gene ids are given as row names rather than a column. The matrix() function will create a matrix from a dataframe of columns of the same type and the select() function can be used to remove the gene ids column.\n🎬 Create a matrix of the counts:\n\ns30_count_mat <- s30_filtered |>\n select(-xenbase_gene_id) |>\n as.matrix()\n\n🎬 Add the gene ids as row names to the matrix:\n\n# add the row names to the matrix\nrownames(s30_count_mat) <- s30_filtered$xenbase_gene_id\n\nYou might want to view the matrix.\nThe metadata are in a file, frog_meta_data.txt. This is a tab-delimited file. The first column is the sample name and the second column is the treatment group.\n🎬 Make a folder called meta and save the file to it.\n🎬 Read the metadata into a dataframe:\n\nmeta <- read_table(\"meta/frog_meta_data.txt\")\n\n🎬 Examine the resulting dataframe.\nWe need to add the sample names as row names to the metadata dataframe. This is because the DESeqDataSet object will use the row names to match the samples in the metadata to the samples in the counts matrix.\n🎬 Add the sample names as row names to the metadata dataframe:\n\nrow.names(meta) <- meta$sample_id\n\n(you will get a warning message but you can ignore it)\nWe are dealing only with the S30 data so we need to remove the samples that are not in the S30 data.\n🎬 Filter the metadata to keep only the S30 information:\n\nmeta_S30 <- meta |>\n dplyr::filter(stage == \"stage_30\")\n\n\n\n# A tibble: 6 × 4\n sample_id stage treatment sibling_rep\n* <chr> <chr> <chr> <chr> \n1 S30_C_5 stage_30 control five \n2 S30_C_6 stage_30 control six \n3 S30_C_A stage_30 control A \n4 S30_F_5 stage_30 FGF five \n5 S30_F_6 stage_30 FGF six \n6 S30_F_A stage_30 FGF A \n\n\nWe can now create the DESeqDataSet object. The design formula describes the statistical model You should notice that it is the same sort of formula we used in t.test(), lm(),glm() etc. The ~ indicates that the left hand side is the response variable (in this case counts) and the right hand side are the explanatory variables. We are interested in the difference between the treatments but we include sibling_rep to account for the fact that the data are paired. 
The names of the columns in the count matrix have to match the names in the metadata dataframe and the names of the explanatory variables in the design formula have to match the names of columns in the metadata.\n🎬 Create the DESeqDataSet object:\n\ndds <- DESeqDataSetFromMatrix(countData = s30_count_mat,\n colData = meta_S30,\n design = ~ treatment + sibling_rep)\n\nThe warning “Warning: some variables in design formula are characters, converting to factors” just means that the variable type of treatment and sibling_rep in the metadata dataframe are char. This is not a as DESeqDataSetFromMatrix() has made them into the factors it needs.\n🎬 Examine the DESeqDataSet object.\nThe counts are in dds@assays@data@listData[[\"counts\"]] and the metadata are in dds@colData but the easiest way to see them is to use the counts() and colData() functions from the DESeq2 package.\n🎬 View the counts:\n\ncounts(dds) |> View()\n\nError in .External2(C_dataviewer, x, title): unable to start data viewer\n\n\nYou should be able to see that this is the same as in s30_count_mat.\n\ncolData(dds)\n\nDataFrame with 6 rows and 4 columns\n sample_id stage treatment sibling_rep\n <character> <character> <factor> <factor>\nS30_C_5 S30_C_5 stage_30 control five\nS30_C_6 S30_C_6 stage_30 control six \nS30_C_A S30_C_A stage_30 control A \nS30_F_5 S30_F_5 stage_30 FGF five\nS30_F_6 S30_F_6 stage_30 FGF six \nS30_F_A S30_F_A stage_30 FGF A"
+ "section": "Omics",
+ "text": "Omics\n\ngene/transcript/protein/metabolite expression\ntranscriptomics 1\ntranscriptomics 2\nproteomics"
},
{
- "objectID": "omics/week-4/workshop.html#prepare-the-normalised-counts",
- "href": "omics/week-4/workshop.html#prepare-the-normalised-counts",
- "title": "Workshop",
- "section": "Prepare the normalised counts",
- "text": "Prepare the normalised counts\nThe normalised counts are the counts that have been transformed to account for the library size (i.e., the total number of reads in a sample) and the gene length. We have to first estimate the normalisation factors and store them in the DESeqDataSet object and then we can get the normalised counts.\n🎬 Estimate the factors for normalisation and store them in the DESeqDataSet object:\n\ndds <- estimateSizeFactors(dds)\n\n🎬 Look at the factors (just for information):\n\nsizeFactors(dds)\n\n S30_C_5 S30_C_6 S30_C_A S30_F_5 S30_F_6 S30_F_A \n0.8812200 0.9454600 1.2989886 1.0881870 1.0518961 0.8322894 \n\n\nTo get the normalised counts we again used the counts() function but this time we use the normalized=TRUE argument.\n🎬 Save the normalised to a matrix:\n\nnormalised_counts <- counts(dds, normalized = TRUE)\n\nWe will write the normalised counts to a file so that we can use them in the future.\n🎬 Make a dataframe of the normalised counts, add a column for the gene ids and write to file:\n\ndata.frame(normalised_counts,\n xenbase_gene_id = row.names(normalised_counts)) |>\n write_csv(file = \"results/S30_normalised_counts.csv\")"
- },
- {
- "objectID": "omics/week-4/workshop.html#differential-expression-analysis",
- "href": "omics/week-4/workshop.html#differential-expression-analysis",
+ "objectID": "core/week-2/workshop.html#images",
+ "href": "core/week-2/workshop.html#images",
"title": "Workshop",
- "section": "Differential expression analysis",
- "text": "Differential expression analysis\nWe used the DESeq() function to do the differential expression analysis. This function fits the statistical model to the data and then uses the model to calculate the significance of the difference between the treatments. It again stored the results in the DESseqDataSet object. Note that the differential expression needs the raw (unnormalised counts) as it does its own normalisation as part of the process.\n🎬 Run the differential expression analysis:\n\ndds <- DESeq(dds)\n\nThe function will take only a few moments to run on this data but can take longer for bigger datasets.\nWe need to define the contrasts we want to test. We want to test the difference between the treatments so we will define the contrast as FGF and control.\n🎬 Define the contrast:\n\ncontrast_fgf <- c(\"treatment\", \"FGF\", \"control\")\n\nNote that treatment is the name of the column in the metadata dataframe and FGF and control are the names of the levels in the treatment column. By putting them in the order FGF , control we are saying the fold change will be FGF / control. If we had put them in the order control, FGF we would have got the fold change as control / FGF. This means:\n\npositive log fold changes indicate FGF > control and\nnegative log fold changes indicates control > FGF.\n\n🎬 Extract the results from the DESseqDataSet object:\n\nresults_fgf <- results(dds,\n contrast = contrast_fgf)\n\nThis will give us the log2 fold change and p-value for the contrast.\n🎬 Save the results to a file:\n\ndata.frame(results_fgf,\n xenbase_gene_id = row.names(results_fgf)) |> \n write_csv(file = \"results/S30_results.csv\")"
+ "section": "Images",
+ "text": "Images\ncontrol_merged.tif\nlibrary(ijtiff)\nimg <- read_tif(\"data/control_merged.tif\")\nimg\n\nan image at least one and usually more matrices of numbers representing the intensity of light at each pixel in the image\nthe number of matrices depends on the number of ‘channels’ in the image\na channel is a colour in the image\na frame is a single image in a series of images\nwe might normally call this a multi-dimensional array: x and y coordinates of the pixels are 2 dimensions, the channel is the third dimension and time is the forth dimension\n\ndisplay(img)"
},
{
- "objectID": "omics/week-4/workshop.html#import-1",
- "href": "omics/week-4/workshop.html#import-1",
+ "objectID": "core/week-2/workshop.html#structure",
+ "href": "core/week-2/workshop.html#structure",
"title": "Workshop",
- "section": "Import",
- "text": "Import\n🎬 Import surfaceome_hspc.csv and surfaceome_prog.csv into dataframes called hspc and prog respectively."
+ "section": "Structure",
+ "text": "Structure\n1cq2.pdb"
},
{
- "objectID": "omics/week-4/workshop.html#combine-the-two-datasets",
- "href": "omics/week-4/workshop.html#combine-the-two-datasets",
+ "objectID": "core/week-2/workshop.html#the-command-line",
+ "href": "core/week-2/workshop.html#the-command-line",
"title": "Workshop",
- "section": "Combine the two datasets",
- "text": "Combine the two datasets\nWe need to combine the two datasets of 701 and 798 cells into one dataset of 1499 cells, i.e., 1499 columns. The number of rows is the number of genes, 280. Before combining, we must make sure genes in the same order in both dataframes or we would be comparing the expression of one gene in one cell type to the expression of a different gene in the other cell type!\n🎬 Check the gene ids are in the same order in both dataframes:\n\nidentical(prog$ensembl_gene_id, hspc$ensembl_gene_id)\n\n[1] TRUE\n\n\nscran can use a matrix or a dataframe of counts but theses must be log normalised counts. If using a dataframe, the columns must only contain the expression values (not the gene ids).\n🎬 Combine the two dataframes (minus the gene ids) into one dataframe called prog_hspc:\n\nprog_hspc <- bind_cols(prog[-1], hspc[-1])\n\n🎬 Now add the gene ids as the row names:\n\nrow.names(prog_hspc) <- prog$ensembl_gene_id"
+ "section": "The command line",
+ "text": "The command line\nThe command line - or shell - is a text interface for your computer. It’s a program that takes in commands, which it passes on to the computer’s operating system to run.\n\nWindows PowerShell is a command-line in windows. It uses bash-like commands unlike the Command Prompt which uses dos commands (a sort of windows only language). You can open is by going to Start | Windows PowerShell or by searching for it in the search bar.\nTerminal is the command line in Mac OS X. You can open it by going to Applications | Utilities | Terminal or by searching for it in the Spotlight search bar.\ngit bash. I used the bash shell that comes with Git"
},
{
- "objectID": "omics/week-4/workshop.html#filter-to-remove-unexpressed-genes",
- "href": "omics/week-4/workshop.html#filter-to-remove-unexpressed-genes",
+ "objectID": "core/week-2/workshop.html#rstudio-terminal",
+ "href": "core/week-2/workshop.html#rstudio-terminal",
"title": "Workshop",
- "section": "Filter to remove unexpressed genes",
- "text": "Filter to remove unexpressed genes\nIn this dataset, we will not see and genes that are not expressed in any of the cells because we are using a specific subset of the transcriptome that was deliberately selected. However, we will go through how to do this because it is an important step in most analyses.\nFor the 🐸 frog data you should remember that we were able to filter out our unexpressed genes in Omics 1 because we were examining both groups to be compared. In that workshop, we discussed that we could not filter out unexpressed genes in the 🐭 mouse data because we only had one cell types at that time. During the Consolidate Independent Study you examined the hspc cells.\nWhere the sum of all the values in the rows is zero, all the entries must be zero. We can use this to find the filter the genes that are not expressed in any of the cells. To do row wise aggregates such as the sum across rows we can use the rowwise() function. c_across() allows us to use the colon notation Prog_001:HSPC_852 in sum() rather than having to list all the column names: sum(Prog_001, Prog_002, Prog_002, Prog_004,.....)\n🎬 Find the genes that are 0 in every column of the prog_hspc dataframe:\n\nprog_hspc |> \n rowwise() |> \n filter(sum(c_across(Prog_001:HSPC_852)) == 0)\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\nNotice that we have summed across all the columns.\n❓ What do you conclude?\n\n\nWe might also examine the genes which are least expressed.\n🎬 Find ten least expressed genes:\n\nrowSums(prog_hspc) |> sort() |> head(10)\n\nENSMUSG00000041046 ENSMUSG00000012428 ENSMUSG00000022225 ENSMUSG00000027863 \n 30.70322 35.35796 50.45975 61.27461 \nENSMUSG00000019359 ENSMUSG00000020701 ENSMUSG00000030772 ENSMUSG00000027376 \n 68.90961 77.95594 84.11234 97.69333 \nENSMUSG00000023132 ENSMUSG00000026285 \n 120.43065 126.95425 \n\n\n❓ What do you conclude?"
+ "section": "RStudio terminal",
+ "text": "RStudio terminal\nThe RStudio terminal is a convenient interface to the shell without leaving RStudio. It is useful for running commands that are not available in R. For example, you can use it to run other programs like fasqc, git, ftp, ssh\nNavigating your file system\nSeveral commands are frequently used to create, inspect, rename, and delete files and directories.\n$\nThe dollar sign is the prompt (like > on the R console), which shows us that the shell is waiting for input.\nYou can find out where you are using the pwd command, which stands for “print working directory”.\n\npwd\n\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2\n\n\nYou can find out what you can see with ls which stands for “list”.\n\nls\n\ndata\nimages\noverview.html\noverview.qmd\nstudy_after_workshop.qmd\nstudy_before_workshop.html\nstudy_before_workshop.ipynb\nstudy_before_workshop.qmd\nworkshop.html\nworkshop.qmd\nworkshop.rmarkdown\nworkshop_files\n\n\nYou might have noticed that unlike R, the commands do not have brackets after them. Instead, options (or switches) are given after the command. For example, we can modify the ls command to give us more information with the -l option, which stands for “long”.\n\nls -l\n\ntotal 228\ndrwxr-xr-x 2 runner docker 4096 Dec 15 12:46 data\ndrwxr-xr-x 2 runner docker 4096 Dec 15 12:46 images\n-rw-r--r-- 1 runner docker 27497 Dec 15 13:09 overview.html\n-rw-r--r-- 1 runner docker 1597 Dec 15 12:46 overview.qmd\n-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70988 Dec 15 13:09 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4807 Dec 15 12:46 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13029 Dec 15 12:46 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 58063 Dec 15 12:46 workshop.html\n-rw-r--r-- 1 runner docker 8550 Dec 15 12:46 workshop.qmd\n-rw-r--r-- 1 runner docker 8564 Dec 15 13:09 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4096 Dec 15 12:46 workshop_files\n\n\nYou can use more than one option at once. 
The -h option stands for “human readable” and makes the file sizes easier to understand for humans:\n\nls -hl\n\ntotal 228K\ndrwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data\ndrwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images\n-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html\n-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd\n-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html\n-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd\n-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files\n\n\nThe -a option stands for “all” and shows us all the files, including hidden files.\n\nls -alh\n\ntotal 236K\ndrwxr-xr-x 5 runner docker 4.0K Dec 15 13:09 .\ndrwxr-xr-x 6 runner docker 4.0K Dec 15 13:09 ..\ndrwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data\ndrwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images\n-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html\n-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd\n-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd\n-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html\n-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb\n-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd\n-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html\n-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd\n-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown\ndrwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files\n\n\nYou can move about with the cd command, which stands for “change directory”. You can use it to move into a directory by specifying the path to the directory:\n\ncd data\npwd\ncd ..\npwd\ncd data\npwd\n\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2/data\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2\n/home/runner/work/BIO00088H-data/BIO00088H-data/core/week-2/data\n\n\nhead 1cq2.pdb\nHEADER OXYGEN STORAGE/TRANSPORT 04-AUG-99 1CQ2 \nTITLE NEUTRON STRUCTURE OF FULLY DEUTERATED SPERM WHALE MYOGLOBIN AT 2.0 \nTITLE 2 ANGSTROM \nCOMPND MOL_ID: 1; \nCOMPND 2 MOLECULE: MYOGLOBIN; \nCOMPND 3 CHAIN: A; \nCOMPND 4 ENGINEERED: YES; \nCOMPND 5 OTHER_DETAILS: PROTEIN IS FULLY DEUTERATED \nSOURCE MOL_ID: 1; \nSOURCE 2 ORGANISM_SCIENTIFIC: PHYSETER CATODON; \nhead -20 data/1cq2.pdb\nHEADER OXYGEN STORAGE/TRANSPORT 04-AUG-99 1CQ2 \nTITLE NEUTRON STRUCTURE OF FULLY DEUTERATED SPERM WHALE MYOGLOBIN AT 2.0 \nTITLE 2 ANGSTROM \nCOMPND MOL_ID: 1; \nCOMPND 2 MOLECULE: MYOGLOBIN; \nCOMPND 3 CHAIN: A; \nCOMPND 4 ENGINEERED: YES; \nCOMPND 5 OTHER_DETAILS: PROTEIN IS FULLY DEUTERATED \nSOURCE MOL_ID: 1; \nSOURCE 2 ORGANISM_SCIENTIFIC: PHYSETER CATODON; \nSOURCE 3 ORGANISM_COMMON: SPERM WHALE; \nSOURCE 4 ORGANISM_TAXID: 9755; \nSOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; \nSOURCE 6 EXPRESSION_SYSTEM_TAXID: 562; \nSOURCE 7 EXPRESSION_SYSTEM_VECTOR_TYPE: PLASMID; \nSOURCE 8 EXPRESSION_SYSTEM_PLASMID: PET15A \nKEYWDS HELICAL, GLOBULAR, ALL-HYDROGEN CONTAINING STRUCTURE, OXYGEN STORAGE- \nKEYWDS 2 TRANSPORT COMPLEX \nEXPDTA NEUTRON DIFFRACTION \nAUTHOR F.SHU,V.RAMAKRISHNAN,B.P.SCHOENBORN \nless 1cq2.pdb\nless is a program that displays the contents of a file, one page at a time. 
It is useful for viewing large files because it does not load the whole file into memory before displaying it. Instead, it reads and displays a few lines at a time. You can navigate forward through the file with the spacebar, and backwards with the b key. Press q to quit.\nA wildcard is a character that can be used as a substitute for any of a class of characters in a search. The most common wildcard characters are the asterisk (*) and the question mark (?).\nls *.csv\ncp stands for “copy”. You can copy a file from one directory to another by giving cp the path to the file you want to copy and the path to the destination directory.\ncp 1cq2.pdb copy_of_1cq2.pdb\ncp 1cq2.pdb ../copy_of_1cq2.pdb\ncp 1cq2.pdb ../bob.txt\nTo delete a file, use the rm command, which stands for “remove”.\nrm ../bob.txt\nBe careful: the file will be gone forever. There is no “are you sure?” or undo.\nTo move a file from one directory to another, use the mv command. mv works like cp except that it also deletes the original file.\nmv ../copy_of_1cq2.pdb .\nTo make a new directory, use the mkdir command, which stands for “make directory”.\nmkdir mynewdir
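\nFinally, if you ever want a one-off shell command without leaving the R console, base R can run it for you (a minimal sketch; the output is printed to the console):\n\nsystem2(\"ls\", args = \"-lh\") # run ls -lh from R"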
},
{
- "objectID": "omics/week-4/workshop.html#find-the-genes-that-are-expressed-in-only-one-cell-type",
- "href": "omics/week-4/workshop.html#find-the-genes-that-are-expressed-in-only-one-cell-type",
+ "objectID": "core/week-2/workshop.html#differences-between-r-and-python",
+ "href": "core/week-2/workshop.html#differences-between-r-and-python",
"title": "Workshop",
- "section": "Find the genes that are expressed in only one cell type",
- "text": "Find the genes that are expressed in only one cell type\nTo find the genes that are expressed in only one cell type, we can use the same approach as above but only sum the columns for one cell type.\n🎬 Find the genes that are 0 in every column for the HSPC cells:\n\nprog_hspc |> \n rowwise() |> \n filter(sum(c_across(HSPC_001:HSPC_852)) == 0)\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\nWe have summed across the HSPC cells only. Note that if we knew there were some rows that were all zero across both cell types, we would need to add |> filter(sum(c_across(Prog_001:Prog_852)) != 0)\nmeaning zero in all the HSPC but not zero in all the Prog\n🎬 Now you find the genes that are 0 in every column for the Prog cells:\n\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\n❓ What do you conclude?"
+ "section": "Differences between R and python",
+ "text": "Differences between R and python\nDemo\nYou’re finished!"
},
{
- "objectID": "omics/week-4/workshop.html#differential-expression-analysis-1",
- "href": "omics/week-4/workshop.html#differential-expression-analysis-1",
- "title": "Workshop",
- "section": "Differential expression analysis",
- "text": "Differential expression analysis\nLike DESeq2, scran uses a statistical model to calculate the significance of the difference between the treatments and needs metadata to define the treatments.\n🎬 Load the scran package:\nThe meta data needed for the frog data was information about which columns were in which treatment group and which sibling group and we had that information in a file. Similarly, here we need information on which columns are from which cell type. Instead of having this is a file, we will create a vector that indicates which column belongs to which cell type.\n🎬 Create a vector that indicates which column belongs to which cell type:\n\ncell_type <- rep(c(\"prog\",\"hspc\"), \n times = c(length(prog) - 1,\n length(hspc) - 1))\n\nThe number of times each cell type is repeated is the number of columns in that cell type minus 1. This is because we have removed the column with the gene ids. Do check that the length of the cell_type vector is the same as the number of columns in the prog_hspc dataframe.\n🎬 Run the differential expression analysis:\n\nres_prog_hspc <- findMarkers(prog_hspc, \n cell_type)\n\nfindMarkers() is the function that runs the differential expression analysis. The first argument is the dataframe containing the data. The second argument is the vector indicating which columns are in which cell type. It gives us two dataframes of the results - rather unnecessarily. One is the results with fold changes that are Prog/HSPC and the other is the results with fold changes that are HSPC/Prog. These have the same magnitude, just a different sign\nThe dataframe res_prog_hspc$prog is log prog - log hspc (i.e.,Prog/HSPC). This means - Positive fold change: prog is higher than hspc - Negative fold change: hspc is higher than prog\nThe dataframe res_prog_hspc$hspc is log hspc - log prog (i.e., HSPC/Prog). . This means - Positive fold change: hspc is higher than prog - Negative fold change: prog is higher than hspc\n\n\n\nThe res_prog_hspc$prog dataframe\n\n\n\n\n\n\n\n\n\n\n\nTop\np.value\nFDR\nsummary.logFC\nlogFC.hspc\nensembl_gene_id\n\n\n\nENSMUSG00000028639\n1\n0\n0\n1.596910\n1.596910\nENSMUSG00000028639\n\n\nENSMUSG00000024053\n2\n0\n0\n3.035165\n3.035165\nENSMUSG00000024053\n\n\nENSMUSG00000041329\n3\n0\n0\n3.261056\n3.261056\nENSMUSG00000041329\n\n\nENSMUSG00000030336\n4\n0\n0\n-2.146491\n-2.146491\nENSMUSG00000030336\n\n\nENSMUSG00000016494\n5\n0\n0\n-3.056730\n-3.056730\nENSMUSG00000016494\n\n\nENSMUSG00000002808\n6\n0\n0\n3.000810\n3.000810\nENSMUSG00000002808\n\n\n\n\n\n\n\n\nThe res_prog_hspc$hspc dataframe. Notice the sign of the fold change is the other way\n\n\n\n\n\n\n\n\n\n\n\nTop\np.value\nFDR\nsummary.logFC\nlogFC.prog\nensembl_gene_id\n\n\n\nENSMUSG00000028639\n1\n0\n0\n-1.596910\n-1.596910\nENSMUSG00000028639\n\n\nENSMUSG00000024053\n2\n0\n0\n-3.035165\n-3.035165\nENSMUSG00000024053\n\n\nENSMUSG00000041329\n3\n0\n0\n-3.261056\n-3.261056\nENSMUSG00000041329\n\n\nENSMUSG00000030336\n4\n0\n0\n2.146491\n2.146491\nENSMUSG00000030336\n\n\nENSMUSG00000016494\n5\n0\n0\n3.056730\n3.056730\nENSMUSG00000016494\n\n\nENSMUSG00000002808\n6\n0\n0\n-3.000810\n-3.000810\nENSMUSG00000002808\n\n\n\n\n\n🎬 Write the results to file:\n\ndata.frame(res_prog_hspc$prog, \n ensembl_gene_id = row.names(res_prog_hspc$prog)) |> \n write_csv(\"results/prog_hspc_results.csv\")"
+ "objectID": "omics/omics.html",
+ "href": "omics/omics.html",
+ "title": "Omics Data Analysis for Group Project",
+ "section": "",
+ "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\n\n\n\nThis week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both.\n\n\n\nbefore\n\nrecap what we have\nPCA\nvolcano plot described\nGO terms\n\n\nworkshop\n\nPCA\nvolcano plot\nannotating with go terms\n\nafter\n\ndocument what you have done\nrepeat on another comparison\n\nReferences"
},
{
- "objectID": "omics/week-4/workshop.html#footnotes",
- "href": "omics/week-4/workshop.html#footnotes",
- "title": "Workshop",
- "section": "Footnotes",
- "text": "Footnotes\n\nBioconductor is a project that develops and supports R packages for bioinformatics.↩︎"
+ "objectID": "omics/omics.html#omics-1-hello-data",
+ "href": "omics/omics.html#omics-1-hello-data",
+ "title": "Omics Data Analysis for Group Project",
+ "section": "",
+ "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control."
},
{
- "objectID": "omics/week-4/study_before_workshop.html#overview",
- "href": "omics/week-4/study_before_workshop.html#overview",
- "title": "Independent Study to prepare for workshop",
- "section": "Overview",
- "text": "Overview\nIn these slides we will:\n\n\nCheck where you are\n\nlearn some concepts in differential expression\n\nlog2 fold changes\nMultiple correction\nnormalisation\nstatistical model\n\n\nFind out what packages to install before the workshop"
+ "objectID": "omics/omics.html#omics-2-statistical-analysis",
+ "href": "omics/omics.html#omics-2-statistical-analysis",
+ "title": "Omics Data Analysis for Group Project",
+ "section": "",
+ "text": "This week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
},
{
- "objectID": "omics/week-4/study_before_workshop.html#what-we-did-in-omics-1-hello-data",
- "href": "omics/week-4/study_before_workshop.html#what-we-did-in-omics-1-hello-data",
- "title": "Independent Study to prepare for workshop",
- "section": "What we did in Omics 1: 👋 Hello data!",
- "text": "What we did in Omics 1: 👋 Hello data!\n\n\n\nDiscovered how many rows and columns we had in our datasets and what these were.\nExamined the distribution\n\nof values across the whole dataset\nof values across the samples/cells (i.e., averaged across genes) to see variation between samples/cells\nof values across the genes (i.e., averaged across samples/cells) to see variation between genes\n\n\nSaved files of filtered or summarised data."
+ "objectID": "omics/omics.html#omics-3-visualising-and-interpreting",
+ "href": "omics/omics.html#omics-3-visualising-and-interpreting",
+ "title": "Omics Data Analysis for Group Project",
+ "section": "",
+ "text": "before\n\nrecap what we have\nPCA\nvolcano plot described\nGO terms\n\n\nworkshop\n\nPCA\nvolcano plot\nannotating with go terms\n\nafter\n\ndocument what you have done\nrepeat on another comparison\n\nReferences"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#where-should-you-be-1",
- "href": "omics/week-4/study_before_workshop.html#where-should-you-be-1",
+ "objectID": "omics/week-3/study_before_workshop.html#overview",
+ "href": "omics/week-3/study_before_workshop.html#overview",
"title": "Independent Study to prepare for workshop",
- "section": "Where should you be?",
- "text": "Where should you be?\nAfter the Omics 1: 👋 Hello data! Workshop including:\n\n🤗 Look after future you! and\nthe Independent Study to consolidate, you should have:"
+ "section": "Overview",
+ "text": "Overview\n\n\nConcise summary of the experimental design and aims\nWhat the raw data consist of\nWhat has been done to the data so far\nWhat steps we will take in the workshop"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#frogs",
- "href": "omics/week-4/study_before_workshop.html#frogs",
+ "objectID": "omics/week-3/study_before_workshop.html#the-data",
+ "href": "omics/week-3/study_before_workshop.html#the-data",
"title": "Independent Study to prepare for workshop",
- "section": "🐸 Frogs",
- "text": "🐸 Frogs\n\n\nAn RStudio Project called frogs-88H which contains:\n\nRaw data (S14, S20 and S30)\nProcessed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)\nTwo scripts called cont-fgf-s30.R and cont-fgf-s20.R OR cont-fgf-s14.R\n\n\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
+ "section": "The Data",
+ "text": "The Data\nThere are three datasets\n\n🐸 transcriptomic data (bulk RNA-seq) from frog embryos.\n🐭 transcriptomic data (single cell RNA-seq) from stemcells\n🍂 ??????? Metabolomic / Metagenomic data from anaerobic digesters"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#mice",
- "href": "omics/week-4/study_before_workshop.html#mice",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-1",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-1",
"title": "Independent Study to prepare for workshop",
- "section": "🐭 Mice",
- "text": "🐭 Mice\n\nAn RStudio Project called mice-88H which contains\n\nRaw data (hspc, prog, lthsc)\nProcessed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv)\n\n\nOne script called hspc-prog.R\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
+ "section": "🐸 Experimental design",
+ "text": "🐸 Experimental design\n\nSchematic of frog development experiment"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#section",
- "href": "omics/week-4/study_before_workshop.html#section",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-2",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-2",
"title": "Independent Study to prepare for workshop",
- "section": "🍂",
- "text": "🍂\nEither of the other examples."
+ "section": "🐸 Experimental design",
+ "text": "🐸 Experimental design\n\nSchematic of frog development experiment\n\n3 fertilisations\ntwo siblings from each fertilisation one control, on FGF treated\nsequenced at three time points\n3 x 2 x 3 = 18 groups"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#if-you-do-not-have-those",
- "href": "omics/week-4/study_before_workshop.html#if-you-do-not-have-those",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-3",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-3",
"title": "Independent Study to prepare for workshop",
- "section": "If you do not have those",
- "text": "If you do not have those\nGo through:\n\nOmics 1: 👋 Hello data! Workshop including:\n🤗 Look after future you! and\nthe Independent Study to consolidate"
+ "section": "🐸 Experimental design",
+ "text": "🐸 Experimental design\n\nSchematic of frog development experiment\n\n3 fertilisations. These are the replicates, .5, .6, A\ntwo siblings from each fertilisation one control, one FGF treated. The treatments are paired\nsequenced at three time points. S14, S20, S30\n3 x 2 x 3 = 18 groups"
},
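The 3 x 2 x 3 = 18 groups described in the entry above can be checked with a quick enumeration. A minimal sketch in R, using the level labels given in the slides; the variable and column names here are illustrative assumptions, not the course's own code:

```r
# Enumerate the 18 groups: 3 fertilisations x 2 treatments x 3 stages.
# Names and labels are illustrative assumptions.
design <- expand.grid(
  fertilisation = c(".5", ".6", "A"),   # the replicates
  treatment     = c("control", "FGF"),  # paired siblings
  stage         = c("S14", "S20", "S30")
)
nrow(design)  # 18
```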
{
- "objectID": "omics/week-4/study_before_workshop.html#differential-expression-1",
- "href": "omics/week-4/study_before_workshop.html#differential-expression-1",
+ "objectID": "omics/week-3/study_before_workshop.html#aim",
+ "href": "omics/week-3/study_before_workshop.html#aim",
"title": "Independent Study to prepare for workshop",
- "section": "Differential expression",
- "text": "Differential expression\n\n\nThe goal of differential expression is to test whether there is a significant difference in gene expression between groups.\nA large number of computational methods have been developed for differential expression analysis\nR is the leading language for differential expression analysis"
+ "section": "🐸 Aim",
+ "text": "🐸 Aim\n\n\nfind genes important in frog development\nImportant means the genes that are differentially expressed between the control-treated and the FGF-treated siblings\nDifferentially expressed means the expression in one group is significantly higher than in the other"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#differential-expression-2",
- "href": "omics/week-4/study_before_workshop.html#differential-expression-2",
+ "objectID": "omics/week-3/study_before_workshop.html#guided-analysis",
+ "href": "omics/week-3/study_before_workshop.html#guided-analysis",
"title": "Independent Study to prepare for workshop",
- "section": "Differential expression",
- "text": "Differential expression\n\n\nthe statistical concepts are very similar to those you have already encountered in stages 1 and 2\nyou are essentially doing paired- or independent-samples tests\nbut you are doing a lot of them! One for every gene\ndata need normalisation before comparison"
+ "section": "🐸 Guided analysis",
+ "text": "🐸 Guided analysis\n\n\nThe workshops will take you through comparing the control and FGF treated sibling at S30\nThis is the “least interesting” comparison\nYou will be guided to carefully document your work so you can apply the same methods to other comparisons"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#statistical-concepts",
- "href": "omics/week-4/study_before_workshop.html#statistical-concepts",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-4",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-4",
"title": "Independent Study to prepare for workshop",
- "section": "Statistical concepts",
- "text": "Statistical concepts\nLike familiar tests:\n\n\nthe type of test (the function) you use depends on the type of data you have and the type of assumptions you want to make\nthe tests work by comparing the variation between groups to the variation within groups.\nyou will get: the difference between groups, a test statistic, and a p-value\nyou also get an adjusted p-value which is the ‘correction’ for multiple testing"
+ "section": "🐭 Experimental design",
+ "text": "🐭 Experimental design\n\nSchematic of stem cell experiment"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#the-difference-between-groups",
- "href": "omics/week-4/study_before_workshop.html#the-difference-between-groups",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-5",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-5",
"title": "Independent Study to prepare for workshop",
- "section": "The difference between groups",
- "text": "The difference between groups\n\n\nThe difference between groups is given as the log2 fold change in expression between groups\nA fold change is the expression in one group divided by the expression in the other group\nwe use fold changes because the absolute expression values may not be accurate and relative changes are what matters\nwe use log2 fold changes because they are symmetrical around 0"
+ "section": "🐭 Experimental design",
+ "text": "🐭 Experimental design\n\nSchematic of stem cell experiment\n\nCells were sorted using flow cytometry on the basis of cell surface markers\nThere are three cell types: LT-HSCs, HSPCs, Progs\nMany cells of each cell type were sequenced"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#log2-fold-change",
- "href": "omics/week-4/study_before_workshop.html#log2-fold-change",
+ "objectID": "omics/week-3/study_before_workshop.html#experimental-design-6",
+ "href": "omics/week-3/study_before_workshop.html#experimental-design-6",
"title": "Independent Study to prepare for workshop",
- "section": "log2 fold change",
- "text": "log2 fold change\n\n\nlog2 means log to the base 2\nSuppose the expression in group A is 5 and the expression in group B is 8\nA/B = 5/8 = 0.625 and B/A = 8/5 = 1.6\nIf B is greater than A the range of A/B is 0 to 1 but the range of B/A is 1 to infinity\nHowever, if we take the log2 of A/B we get -0.678 and the log2 of B/A is 0.678."
+ "section": "🐭 Experimental design",
+ "text": "🐭 Experimental design\n\nSchematic of stem cell experiment\n\nThere are three cell types: LT-HSCs, HSPCs, Progs These are the “treaments”\nMany cells of each type were sequenced: These are the replicates\n155 LT-HSCs, 701 HSPCs, 798 Progs"
},
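Since the cell types act as the treatments and the individual cells as the replicates, the natural metadata is a grouping vector with one label per cell. A sketch using the cell numbers quoted above, assuming the expression columns are ordered LT-HSC, then HSPC, then Prog:

```r
# One label per sequenced cell, in the assumed column order.
cell_type <- rep(c("lthsc", "hspc", "prog"),
                 times = c(155, 701, 798))
table(cell_type)  # 155 lthsc, 701 hspc, 798 prog
```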
{
- "objectID": "omics/week-4/study_before_workshop.html#adjusted-p-value",
- "href": "omics/week-4/study_before_workshop.html#adjusted-p-value",
+ "objectID": "omics/week-3/study_before_workshop.html#aim-1",
+ "href": "omics/week-3/study_before_workshop.html#aim-1",
"title": "Independent Study to prepare for workshop",
- "section": "Adjusted p-value",
- "text": "Adjusted p-value\n\n\nThe p-value has to be adjusted because of the number of tested being done\nIn stage 1, we used Tukey’s HSD to adjust for multiple testing following an ANOVA\nHere the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995) is used to adjust for multiple testing\nBH controls the False Discovery Rate (FDR)\nThe FDR is the proportion of false positives among the genes called significant"
+ "section": "🐭 Aim",
+ "text": "🐭 Aim\n\n\nfind genes for cell surface proteins that are important in stem cell identity\nImportant means genes that are differentially expressed between at least two cell types\nDifferentially expressed means the expression in one group is significantly higher than in the other"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#normalisation",
- "href": "omics/week-4/study_before_workshop.html#normalisation",
+ "objectID": "omics/week-3/study_before_workshop.html#guided-analysis-1",
+ "href": "omics/week-3/study_before_workshop.html#guided-analysis-1",
"title": "Independent Study to prepare for workshop",
- "section": "Normalisation",
- "text": "Normalisation\n\n\nNormalisation adjusts raw counts to account for factors that prevent direct comparisons\nNormalisation usually influences the experimental design as well as the analysis\nThe 🐭 mouse data have been normalised to simplify the analysis for you; the 🐸 frog data have not but the DE method will do this for you.\nNormalisation is a big topic. See Düren, Lederer, and Qin (2022); Bullard et al. (2010); Lytal, Ran, and An (2020); Abrams et al. (2019); Vallejos et al. (2017); Evans, Hardin, and Stoebel (2017)"
+ "section": "🐭 Guided analysis",
+ "text": "🐭 Guided analysis\n\n\nThe workshops will take you through comparing the HSPC and Prog cells\nThis is the “least interesting” comparison\nYou will be guided to carefully document your work so you can apply the same methods to other comparisons"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function",
- "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function",
+ "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data",
+ "href": "omics/week-3/study_before_workshop.html#raw-sequence-data",
"title": "Independent Study to prepare for workshop",
- "section": "Type of test (the function)",
- "text": "Type of test (the function)\n\n\nA large number of computational methods have been developed for differential expression analysis\nMethods vary in the types of normalisation they do, the statistical model they use, and the assumptions they make\nSome of the most well-known methods are provided by: DESeq2 (Love, Huber, and Anders 2014), edgeR (Robinson, McCarthy, and Smyth 2010; McCarthy, Chen, and Smyth 2012; Chen, Lun, and Smyth 2016), limma (Ritchie et al. 2015) and scran (Lun, McCarthy, and Marioni 2016)"
+ "section": "Raw Sequence data",
+ "text": "Raw Sequence data\n\n\nThe raw data are “reads” from a sequencing machine.\nA read is sequence of DNA or RNA shorter than the whole genome or transcriptome\nThe length of the reads depends on the type of sequencing machine\n\nShort-read technologies e.g. Illumina have higher base accuracy but are harder to align\nLong-read technologies e.g. Nanopore have lower base accuracy but are easier to align"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function-1",
- "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function-1",
+ "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data-1",
+ "href": "omics/week-3/study_before_workshop.html#raw-sequence-data-1",
"title": "Independent Study to prepare for workshop",
- "section": "Type of test (the function)",
- "text": "Type of test (the function)\n\n\n\nDESeq2 and edgeR\n\nboth require raw counts as input\nboth assume that most genes are not DE\nboth use a negative binomial distribution1 to model the data\nuse slightly different normalisation methods: DESeq2 uses the median of ratios method; edgeR uses the trimmed mean of M values (TMM) method\n\n\n\n\nA discrete distribution for counts, similar to the Poisson distribution"
+ "section": "Raw Sequence data",
+ "text": "Raw Sequence data\n\n\nSequencing technology is constantly improving\nOptional: You can read more about Sequencing technologies in Statistically useful experimental design (Rand and Forrester 2022)"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#type-of-test-the-function-2",
- "href": "omics/week-4/study_before_workshop.html#type-of-test-the-function-2",
+ "objectID": "omics/week-3/study_before_workshop.html#raw-sequence-data-2",
+ "href": "omics/week-3/study_before_workshop.html#raw-sequence-data-2",
"title": "Independent Study to prepare for workshop",
- "section": "Type of test (the function)",
- "text": "Type of test (the function)\n\n\nscran\n\nworks on normalized log-expression values\nperforms Welch t-tests"
+ "section": "Raw Sequence data",
+ "text": "Raw Sequence data\n\n\nThe RNA-seq data are from an Illumina machine 150-300bp; Metagenomic data are often Nanopore 10,000 - 30000bp\nReads are in FASTQ files\nFASTQ files contain the sequence of each read and a quality score for each base"
},
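For reference, a FASTQ record is four lines: a header beginning with @, the read sequence, a + separator, and one quality character per base. The record below is made up purely to show the layout:

```
@read_001 example record (illustrative only)
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCFF
```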
{
- "objectID": "omics/week-4/study_before_workshop.html#meta-data",
- "href": "omics/week-4/study_before_workshop.html#meta-data",
+ "objectID": "omics/week-3/study_before_workshop.html#general-steps",
+ "href": "omics/week-3/study_before_workshop.html#general-steps",
"title": "Independent Study to prepare for workshop",
- "section": "Meta data",
- "text": "Meta data\n\n\nDE methods require two types of data: the expression data and the meta data\nThe meta data is the information about the samples\nIt says which samples (columns) are in which group (s)\nIt is usually stored in a separate file"
+ "section": "General steps",
+ "text": "General steps\n\n\nReads are filtered and trimmed on the basis of the quality score\nThey are then aligned/pseudo-aligned to a reference genome/transcriptome or, in metagenomics, assembled de novo.\nReads are then counted to quantify the expression or number of genomes in metagenomics\nCounts are normalised to account for differences in sequencing depth and gene/transcript/genome length before statistical analysis"
},
{
- "objectID": "omics/week-4/study_before_workshop.html#data",
- "href": "omics/week-4/study_before_workshop.html#data",
+ "objectID": "omics/week-3/study_before_workshop.html#data",
+ "href": "omics/week-3/study_before_workshop.html#data",
"title": "Independent Study to prepare for workshop",
"section": "🐸 Data",
- "text": "🐸 Data\n\nExpression for the whole transcriptome X. laevis v10.1 genome assembly\nValues are raw counts\nThe statistical analysis method we will use DESeq2 (Love, Huber, and Anders 2014) requires raw counts and performs the normalisation itself"
+ "text": "🐸 Data\n\nUnpublished (so far!)\nExpression for the whole transcriptome X. laevis v10.1 genome assembly\nValues are raw counts\nThe statistical analysis method we will use DESeq2 (Love, Huber, and Anders 2014) requires raw counts and performs the normalisation itself"
},
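As a sketch of the pattern that entry describes, DESeq2 takes the raw counts plus sample metadata and handles normalisation internally. The object and column names here are assumptions; the workshop's own code may differ:

```r
library(DESeq2)

# `raw_counts` (genes x samples, integer counts) and `meta` (a data
# frame with a `treatment` column for the samples) are assumed to exist.
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData   = meta,
                              design    = ~ treatment)
dds <- DESeq(dds)  # size factors, dispersions, model fitting
results(dds)       # log2 fold changes, p-values, adjusted p-values
```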
{
- "objectID": "omics/week-4/study_before_workshop.html#data-1",
- "href": "omics/week-4/study_before_workshop.html#data-1",
+ "objectID": "omics/week-3/study_before_workshop.html#data-1",
+ "href": "omics/week-3/study_before_workshop.html#data-1",
"title": "Independent Study to prepare for workshop",
"section": "🐭 Data",
- "text": "🐭 Data\n\nExpression for a subset of genes, the surfaceome\nValues are log2 normalised values\nThe statistical analysis method we will use scran (Lun, McCarthy, and Marioni 2016) requires normalised values"
- },
- {
- "objectID": "omics/week-4/study_before_workshop.html#packages-to-install-before-the-workshop",
- "href": "omics/week-4/study_before_workshop.html#packages-to-install-before-the-workshop",
- "title": "Independent Study to prepare for workshop",
- "section": "Packages to install before the workshop",
- "text": "Packages to install before the workshop\nBiocManager from CRAN in the the normal way:\n\ninstall.packages(\"BiocManager\")\n\nDESeq2 from Bioconductor using BiocManager:\n\nBiocManager::install(\"DESeq2\")\n\nscran from Bioconductor using BiocManager:\n\nBiocManager::install(\"scran\")"
+ "text": "🐭 Data\n\nPublished in Nestorowa et al. (2016)\nExpression for a subset of genes, the surfaceome\nValues are log2 normalised values\nThe statistical analysis method we will use scran (Lun, McCarthy, and Marioni 2016) requires normalised values"
},
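The equivalent sketch for the mouse data: scran takes the log2 normalised values directly, with a vector saying which cell belongs to which group. The object names are assumptions based on the course's naming:

```r
library(scran)

# `prog_hspc` (genes x cells, log2 normalised values) and `cell_type`
# (one group label per column) are assumed to exist.
res <- findMarkers(prog_hspc, cell_type)
res$hspc  # per-group result tables; the fold changes differ only in sign
```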
{
- "objectID": "omics/week-4/study_before_workshop.html#workshops-1",
- "href": "omics/week-4/study_before_workshop.html#workshops-1",
+ "objectID": "omics/week-3/study_before_workshop.html#workshops-1",
+ "href": "omics/week-3/study_before_workshop.html#workshops-1",
"title": "Independent Study to prepare for workshop",
"section": "Workshops",
- "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments.\nOmics 3: Visualising and Interpreting. PCA, Volcano plots and heatmaps to visualise results. Interpreting the results and finding out more about genes of interest."
+ "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values overall, across samples and across genes to check things are as we expect and detect genes/samples that need to be removed\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments. This is the main analysis step. We will use different methods for bulk and single cell data.\nOmics 3: Visualising and Interpreting Production of volcano plots and heatmaps to visualise the results of the statistical analysis. We will also look at how to interpret the results and how to find out more about the genes of interest."
},
{
- "objectID": "omics/week-4/study_before_workshop.html#references",
- "href": "omics/week-4/study_before_workshop.html#references",
+ "objectID": "omics/week-3/study_before_workshop.html#references",
+ "href": "omics/week-3/study_before_workshop.html#references",
"title": "Independent Study to prepare for workshop",
"section": "References",
- "text": "References\n\n\n🔗 About Omics 2: Statistical Analysis\n\n\n\nAbrams, Zachary B., Travis S. Johnson, Kun Huang, Philip R. O. Payne, and Kevin Coombes. 2019. “A Protocol to Evaluate RNA Sequencing Normalization Methods.” BMC Bioinformatics 20 (24): 679. https://doi.org/10.1186/s12859-019-3247-x.\n\n\nBenjamini, Yoav, and Yosef Hochberg. 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” J. R. Stat. Soc. Series B Stat. Methodol. 57 (1): 289–300. http://www.jstor.org/stable/2346101.\n\n\nBullard, James H., Elizabeth Purdom, Kasper D. Hansen, and Sandrine Dudoit. 2010. “Evaluation of Statistical Methods for Normalization and Differential Expression in mRNA-Seq Experiments.” BMC Bioinformatics 11 (1): 94. https://doi.org/10.1186/1471-2105-11-94.\n\n\nChen, Yunshun, Aaron T. L. Lun, and Gordon K. Smyth. 2016. “From Reads to Genes to Pathways: Differential Expression Analysis of RNA-Seq Experiments Using Rsubread and the edgeR Quasi-Likelihood Pipeline.” https://doi.org/10.12688/f1000research.8987.2.\n\n\nDüren, Yannick, Johannes Lederer, and Li-Xuan Qin. 2022. “Depth Normalization of Small RNA Sequencing: Using Data and Biology to Select a Suitable Method.” Nucleic Acids Research 50 (10): e56. https://doi.org/10.1093/nar/gkac064.\n\n\nEvans, Ciaran, Johanna Hardin, and Daniel M Stoebel. 2017. “Selecting Between-Sample RNA-Seq Normalization Methods from the Perspective of Their Assumptions.” Briefings in Bioinformatics 19 (5): 776–92. https://doi.org/10.1093/bib/bbx008.\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nLytal, Nicholas, Di Ran, and Lingling An. 2020. “Normalization Methods on Single-Cell RNA-Seq Data: An Empirical Survey.” Frontiers in Genetics 11. https://www.frontiersin.org/articles/10.3389/fgene.2020.00041.\n\n\nMcCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential Expression Analysis of Multifactor RNA-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97. https://doi.org/10.1093/nar/gks042.\n\n\nRitchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7): e47. https://doi.org/10.1093/nar/gkv007.\n\n\nRobinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. 2010. “edgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1): 139–40. https://doi.org/10.1093/bioinformatics/btp616.\n\n\nVallejos, Catalina A., Davide Risso, Antonio Scialdone, Sandrine Dudoit, and John C. Marioni. 2017. “Normalizing Single-Cell RNA Sequencing Data: Challenges and Opportunities.” Nature Methods 14 (6): 565–71. https://doi.org/10.1038/nmeth.4292."
+ "text": "References\n\n\n🔗 About Omics 1: Hello data!\n\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nNestorowa, Sonia, Fiona K. Hamey, Blanca Pijuan Sala, Evangelia Diamanti, Mairi Shepherd, Elisa Laurenti, Nicola K. Wilson, David G. Kent, and Berthold Göttgens. 2016. “A Single-Cell Resolution Map of Mouse Hematopoietic Stem and Progenitor Cell Differentiation.” Blood 128 (8): e20–31. https://doi.org/10.1182/blood-2016-05-716480.\n\n\nRand, Emma, and Sarah Forrester. 2022. “Statistically Useful Experimental Design.” https://cloud-span.github.io/experimental_design00-overview/."
},
{
- "objectID": "index.html",
- "href": "index.html",
- "title": "Data Analysis for the Group Research Project",
+ "objectID": "omics/week-3/study_after_workshop.html",
+ "href": "omics/week-3/study_after_workshop.html",
+ "title": "Independent Study to consolidate this week",
"section": "",
- "text": "You are either\n\nan integrated masters student doing BIO00088H Group Research Project or\nan MSc Bioinformatics student doing BIO00070M Research, Professional and Team Skills\n\nFor students doing BIO00088H, Data Analysis compromises six workshops covering computational skills needed in your project. Three of these are core and taken by everyone and three are specific to your project type. MSc Bioinformatics students do the Core workshops and the ’omics workshops as part of BIO00070M.\nThe project types are:\n\n\n\n\n\n\n\nProject\nData Strand\n\n\n\n\nStem Cells, Jillian Barlow\n’omics, Emma Rand\n\n\nDevelopmental Biology, Betsy Pownal\n’omics, Emma Rand\n\n\nMicrobial Ecology, Kelly Redeker\n’omics, Emma Rand\n\n\nStructural Biochemistry, Michael Plevin\nmolecular-structure, Jon Agirre\n\n\nNeuroscience, Sean Sweeney\nimage-analysis, Richard Bingham\n\n\nxxxxxxxxxxxx, Richard Maguire\nimage-analysis, Richard Bingham\n\n\n\nThe data analysis workshops are:\n\n\n\nWeek\nData Strand\n\n\n\n\n1\nCore 1 Organising reproducible data analyses\n\n\n2\nCore 2 File types, workflow tips and other tools\n\n\n3\nomics/structure/images 1\n\n\n4\nomics/structure/images 2\n\n\n5\nomics/structure/images 3\n\n\n6\nDrop-in\n\n\n6\nCore 3 Research Compendia and Reproducible Reporting\n\n\n\n\n\nStudents who successfully complete this module will be able to\n\nuse appropriate computational techniques to reproducibly process, analyse and visualise data and generate scientific reports based on project work.\n\n\n\n\nAll material is on the VLE so why is this site useful? This site collects everything together in a searchable way. The search icon is on the top right.\n\n\n\nRand E (2023). Data Analysis for Group Project. https://3mmarand.github.io/BIO00088H-data/."
+ "text": "You need only do the section for your own project data\n🐸 Frogs\n🎬 Open your frogs-88H Project. Make a new script and, using cont-fgf-s30.R as a template, repeat the analysis on one of the other comparisons.\n🐭 Mice\n🎬 Open your mice-88H Project. Open your hspc-prog.R script and, using you code working with the hspc cells as a template, repeat the analysis on the prog cells.\n🍂 xxxx\nFollow one of the other examples."
},
{
- "objectID": "index.html#module-learning-outcome-linked-to-this-content",
- "href": "index.html#module-learning-outcome-linked-to-this-content",
- "title": "Data Analysis for the Group Research Project",
- "section": "",
- "text": "Students who successfully complete this module will be able to\n\nuse appropriate computational techniques to reproducibly process, analyse and visualise data and generate scientific reports based on project work."
+ "objectID": "omics/week-5/study_before_workshop.html#overview",
+ "href": "omics/week-5/study_before_workshop.html#overview",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Overview",
+ "text": "Overview\nIn these slides we will:\n\n\nCheck where you are\n\nlearn some concepts used omics visualisation\n\nPrinciple Component Analysis (PCA)\nVolcano plots\nHeatmaps\n\n\nFind out what packages to install before the workshop"
},
{
- "objectID": "index.html#what-is-this-site-for",
- "href": "index.html#what-is-this-site-for",
- "title": "Data Analysis for the Group Research Project",
- "section": "",
- "text": "All material is on the VLE so why is this site useful? This site collects everything together in a searchable way. The search icon is on the top right."
+ "objectID": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis",
+ "href": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis",
+ "title": "Independent Study to prepare for workshop",
+ "section": "What we did in Omics 2: Statistical Analysis",
+ "text": "What we did in Omics 2: Statistical Analysis\n\n\ncarried out differential expression analysis\nfound genes not expressed at all, or expressed in one group only\nSaved results files"
},
{
- "objectID": "index.html#please-cite-as",
- "href": "index.html#please-cite-as",
- "title": "Data Analysis for the Group Research Project",
- "section": "",
- "text": "Rand E (2023). Data Analysis for Group Project. https://3mmarand.github.io/BIO00088H-data/."
+ "objectID": "omics/week-5/study_before_workshop.html#where-should-you-be-1",
+ "href": "omics/week-5/study_before_workshop.html#where-should-you-be-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Where should you be?",
+ "text": "Where should you be?\nAfter the Omics 2: 👋 Statistical Analysis Workshop including:\n\n🤗 Look after future you! and\nthe Independent Study to consolidate, you should have:"
},
{
- "objectID": "omics/omics.html",
- "href": "omics/omics.html",
- "title": "Omics Data Analysis for Group Project",
- "section": "",
- "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\n\n\n\nThis week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both.\n\n\n\nbefore\n\nrecap what we have\nPCA\nvolcano plot described\nGO terms\n\n\nworkshop\n\nPCA\nvolcano plot\nannotating with go terms\n\nafter\n\ndocument what you have done\nrepeat on another comparison\n\nReferences"
+ "objectID": "omics/week-5/study_before_workshop.html#frogs",
+ "href": "omics/week-5/study_before_workshop.html#frogs",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐸 Frogs",
+ "text": "🐸 Frogs\n\n\nAn RStudio Project called frogs-88H which contains:\n\nRaw data (S14, S20 and S30)\nProcessed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)\nResults files (s30_fgf_only.csv, S30_normalised_counts.csv, S30_results.csv and equivalents for S14 OR S20)\n\nTwo scripts called cont-fgf-s30.R and either cont-fgf-s20.R OR cont-fgf-s14.R\n\n\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
},
{
- "objectID": "omics/omics.html#omics-1-hello-data",
- "href": "omics/omics.html#omics-1-hello-data",
- "title": "Omics Data Analysis for Group Project",
- "section": "",
- "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control."
+ "objectID": "omics/week-5/study_before_workshop.html#mice",
+ "href": "omics/week-5/study_before_workshop.html#mice",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐭 Mice",
+ "text": "🐭 Mice\n\n\nAn RStudio Project called mice-88H which contains\n\nRaw data (hspc, prog, lthsc)\nProcessed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv, lthsc_summary_gene.csv, lthsc_summary_samp.csv)\n\n\nResults files (prog_hspc_results.csv and an equivalent for lthsc vs prog or hspc vs lthsc)\nTwo scripts called hspc-prog.R and either hspc-lthsc.R OR prog-lthsc.R\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read."
},
{
- "objectID": "omics/omics.html#omics-2-statistical-analysis",
- "href": "omics/omics.html#omics-2-statistical-analysis",
- "title": "Omics Data Analysis for Group Project",
- "section": "",
- "text": "This week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
+ "objectID": "omics/week-5/study_before_workshop.html#section",
+ "href": "omics/week-5/study_before_workshop.html#section",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🍂",
+ "text": "🍂\nEither of the other examples."
},
{
- "objectID": "omics/omics.html#omics-3-visualising-and-interpreting",
- "href": "omics/omics.html#omics-3-visualising-and-interpreting",
- "title": "Omics Data Analysis for Group Project",
- "section": "",
- "text": "before\n\nrecap what we have\nPCA\nvolcano plot described\nGO terms\n\n\nworkshop\n\nPCA\nvolcano plot\nannotating with go terms\n\nafter\n\ndocument what you have done\nrepeat on another comparison\n\nReferences"
+ "objectID": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those",
+ "href": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those",
+ "title": "Independent Study to prepare for workshop",
+ "section": "If you do not have those",
+ "text": "If you do not have those\nGo through:\n\nOmics 2: Statistical Analysis including:\n🤗 Look after future you! and\nthe Independent Study to consolidate"
},
{
- "objectID": "omics/week-4/study_after_workshop.html",
- "href": "omics/week-4/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
- "section": "",
- "text": "You need only do the section for your own project data\n🐸 Frogs\n🎬 Open your frogs-88H Project and script you began in the Consolidation study last week. This is likely to be cont-fgf-s20.R or cont-fgf-s14.R. Use the differential expression analysis you did in the workshop (in cont-fgf-s30.R) as a template to continue your script.\n🐭 Mice\n🎬 Open your mice-88H Project. Make a new script and, using hspc-prog.R as a template, repeat the analysis on a different comparisons.\n🍂 xxxx\n🎬 Follow one of the other examples"
+ "objectID": "omics/week-5/study_before_workshop.html#examine-the-results-files-1",
+ "href": "omics/week-5/study_before_workshop.html#examine-the-results-files-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Examine the results files",
+ "text": "Examine the results files\nRemind yourself of the key columns you have in the results files:\n\na log2 fold change\nan unadjusted p-value\na p value adjusted for multiple testing (FDR or padj)\na gene id"
},
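Those columns are what the visualisations are built from. A minimal sketch of pulling out the significant genes from the frog results file named earlier, using the DESeq2 column names shown on the next slide; the folder and the thresholds are illustrative assumptions, not prescribed by the course:

```r
library(tidyverse)

# Thresholds (padj < 0.05, |log2FC| > 1) are illustrative assumptions.
s30_results <- read_csv("results/S30_results.csv")
s30_results |>
  filter(padj < 0.05, abs(log2FoldChange) > 1) |>
  arrange(padj)
```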
{
- "objectID": "omics/week-4/overview.html",
- "href": "omics/week-4/overview.html",
- "title": "Overview",
- "section": "",
- "text": "This week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nverify they have the required RStudio Project set up and the data and code files from the previous Workshop and Consolidation study\nexplain the goal of differential expression analysis and the importance of normalisation\nexplain why and how the nature of the input values determines the analysis package used\ndescribe the metadata needed to carry out differential expression analysis and the statistical models used by DESeq2 and scran\nfind genes that are unexpressed or expressed in a a single cell type or treatment group\nperform differential expression analysis on raw counts using DESeq2 or on logged normalised expression values using scran or both.\nexplain the output of differential expression: log fold change, p-value, adjusted p-value,\n\n\n\nInstructions\n\nPrepare\n\n📖 Read what you should have so far and about concepts in differential expression analysis.\n\nWorkshop\n\n💻 Find unexpressed genes and those expressed in a single cell type or treatment group.\n💻 Set up the metadata for differential expression analysis.\n💻 Perform differential expression analysis on raw counts using DESeq2 or on logged normalised expression values using scran or both.\nLook after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case.\n\n\n\n\n\n\n\n\n\n\nReferences\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2."
+ "objectID": "omics/week-5/study_before_workshop.html#frogs-1",
+ "href": "omics/week-5/study_before_workshop.html#frogs-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐸 Frogs",
+ "text": "🐸 Frogs\n\n\nRows: 10,136\nColumns: 7\n$ baseMean <dbl> 237.553928, 531.565700, 86.392830, 49.813502, 419.9983…\n$ log2FoldChange <dbl> 0.096601855, -0.089588528, -0.192811203, -0.008858703,…\n$ lfcSE <dbl> 0.2079396, 0.1557384, 0.3253216, 0.4342614, 0.1685420,…\n$ stat <dbl> 0.46456683, -0.57525007, -0.59267874, -0.02039947, -0.…\n$ pvalue <dbl> 0.64224169, 0.56512218, 0.55339617, 0.98372471, 0.8699…\n$ padj <dbl> 0.9998970, 0.9998970, 0.9998970, 0.9998970, 0.9998970,…\n$ xenbase_gene_id <chr> \"XB-GENE-1000007\", \"XB-GENE-1000023\", \"XB-GENE-1000062…\n\n\n\n\n\nbaseMean is the mean of the normalised counts for the gene across all samples\n\nlfcSE standard error of the fold change\n\nstat is the test statistic (the Wald statistic)\nGenerated by DESeq2 (Love, Huber, and Anders 2014)"
},
{
- "objectID": "omics/week-5/study_after_workshop.html",
- "href": "omics/week-5/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
- "section": "",
- "text": "You need only do the section for one of the examples.\n🐸 Frogs\n🎬 Open your frogs-88H Project and script you began in the Consolidation study of Omics 1 and continued to work on in Omics 2. This is likely to be cont-fgf-s20.R or cont-fgf-s14.R. Use the code you used in the workshop (in cont-fgf-s30.R) as a template to visualise the s20/s14 results.\n🐭 Mice\n🎬 Open your mice-88H Project and the script you began in the Consolidation study of Omics 2. This is likely to be hspc-lthsc.R or lthsc-prog.R. Use the code you used in the workshop (in hspc-prog.R) as a template to visualise the hspc-lthsc/lthsc-prog results.\n🍂 xxxx\n🎬 Follow one of the other examples"
+ "objectID": "omics/week-5/study_before_workshop.html#mice-1",
+ "href": "omics/week-5/study_before_workshop.html#mice-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐭 Mice",
+ "text": "🐭 Mice\n\n\nRows: 280\nColumns: 6\n$ Top <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…\n$ p.value <dbl> 7.038138e-117, 4.736622e-90, 1.832630e-88, 4.211954e-7…\n$ FDR <dbl> 1.970679e-114, 6.631271e-88, 1.710455e-86, 2.948368e-7…\n$ summary.logFC <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…\n$ logFC.hspc <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…\n$ ensembl_gene_id <chr> \"ENSMUSG00000028639\", \"ENSMUSG00000024053\", \"ENSMUSG00…\n\n\n\n\nTop is the rank of the gene ordered by the p-value (smallest first)\n\nsummary.logFC and logFC.hspc give the same value (in this case since comparing two cell types)\ngenerated by scran (Lun, McCarthy, and Marioni 2016)"
},
{
- "objectID": "omics/week-5/overview.html",
- "href": "omics/week-5/overview.html",
- "title": "Overview",
- "section": "",
- "text": "This week we cover how to visualise and interpret the results of your differential expression analysis. The independent study will allow you to check you have what you should have following the Omics 2: Statistical Analysis workshop and Consolidation study. It will also summarise the the methods and plots we will go through in the workshop. In the workshop, we will learn how to merge gene information into our results, conduct a Principle Component Analysis (PCA) and plot the results as well as how to create a nicely formatted Volcano plot and heatmap.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nverify they have the required RStudio Project set up and the data and code files from the previous Workshop and Consolidation study\nexplain where gene information came from and add it to their results\nperform a PCA and understand how to interpret them\ncreate a heatmap and understand how to interpret them\ncreate a volcano plot and understand how to interpret them\n\n\n\nInstructions\n\nPrepare\n\n📖 Read what you should have so far and about concepts in PCA, volcano plots and heatmaps.\n\nWorkshop\n\n💻 Add gene information to the results of DE\n💻 Perform and plot a PCA\n💻 Visualise results with a heatmap\n💻 Visualise all the results with a volcano plot\nLook after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case.\n\n\n\n\nReferences"
+ "objectID": "omics/week-5/study_before_workshop.html#adding-gene-information-1",
+ "href": "omics/week-5/study_before_workshop.html#adding-gene-information-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Adding gene information",
+ "text": "Adding gene information\n\n\nThe gene id is difficult to interpret in plots/tables\nTherefore we need to add information such as the gene name and a description to the results\nFor the 🐸 Frog data information comes from Xenbase (Fisher et al. 2023)\nFor the 🐭 Mice data information comes from Ensembl (Birney et al. 2004)"
},
{
- "objectID": "omics/week-3/study_after_workshop.html",
- "href": "omics/week-3/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
- "section": "",
- "text": "You need only do the section for your own project data\n🐸 Frogs\n🎬 Open your frogs-88H Project. Make a new script and, using cont-fgf-s30.R as a template, repeat the analysis on one of the other comparisons.\n🐭 Mice\n🎬 Open your mice-88H Project. Open your hspc-prog.R script and, using you code working with the hspc cells as a template, repeat the analysis on the prog cells.\n🍂 xxxx\nFollow one of the other examples."
+ "objectID": "omics/week-5/study_before_workshop.html#xenbase",
+ "href": "omics/week-5/study_before_workshop.html#xenbase",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐸 Xenbase",
+ "text": "🐸 Xenbase\n\nxenbase logoXenbase is a model organism database that provides genomic, molecular, and developmental biology information about Xenopus laevis and Xenopus tropicalis.\n\nIt took me some time to find the information you need."
},
{
- "objectID": "omics/week-3/overview.html",
- "href": "omics/week-3/overview.html",
- "title": "Overview",
- "section": "",
- "text": "This week you will meet your data. The independent study will concisely cover how these data were generated and how they have been processed before being given to you. There will also be an overview of the analysis we will carry out over three workshops. In the workshop, you will learn what steps to take to get a good understanding of ’omics data before you consider any statistical analysis. This is an often overlooked, but very valuable and informative, part of any data pipeline. It gives you the deep understanding of the data structures and values that you will need to code and trouble-shoot code, allows you to spot failed or problematic samples and informs your decisions on quality control.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nexplore ’omics data to find the number of rows and columns and know how these correspond to samples and variables\nexplore the distribution of expression measures across whole data sets, across variables and across samples by summarising and plotting\nexplain what distributions are expected and interpret the distributions they have\nexplain on what basis we might filter out variables or samples\nimport, explore and filter ’omics data reproducibly so they can understand and reuse their code in the future\n\n\n\nInstructions\n\nPrepare\n\n📖 Read how the data were generated and how they have been processed so far\n\nWorkshop\n\n💻 Set up a Project\n💻 Import data\n💻 Explore the distribution of values across samples/cells and across genes/species\n💻 Look after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case."
+ "objectID": "omics/week-5/study_before_workshop.html#xenbase-1",
+ "href": "omics/week-5/study_before_workshop.html#xenbase-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "🐸 Xenbase",
+ "text": "🐸 Xenbase\n\n\nI got the information from the Xenbase information pages under Data Reports | Gene Information\nThis is listed: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)\nClick on the readme link to see the file format and columns\nI downloaded xenbase.gpi.gz, unzipped it, removed header lines and the Xenopus tropicalis (taxon:8364) entries and saved it as xenbase_info.xlsx\nIn the workshop you will import this file and merge the information with the results file"
},
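A sketch of the merge that entry describes, assuming xenbase_info.xlsx sits in the project. The folder names are assumptions; the frog results do contain a xenbase_gene_id column to join on:

```r
library(tidyverse)
library(readxl)

# Folder names ("meta/", "results/") are assumptions.
xenbase_info <- read_excel("meta/xenbase_info.xlsx")
s30_results  <- read_csv("results/S30_results.csv") |>
  left_join(xenbase_info, by = "xenbase_gene_id")
```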
{
- "objectID": "core/week-6/study_before_workshop.html",
- "href": "core/week-6/study_before_workshop.html",
+ "objectID": "omics/week-5/study_before_workshop.html#ensembl",
+ "href": "omics/week-5/study_before_workshop.html#ensembl",
"title": "Independent Study to prepare for workshop",
- "section": "",
- "text": "📖 Read materials from Core 1 Organising reproducible data analyses and make a note of questions you have\n📖 Read materials from Core 2 File types, workflow tips and other tools and make a note of questions you have.\n📖 Review Stage 1 and 2 (88H students) or 52M (70M students) content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions, particularly for stage 1.\n\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n52M\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto."
+ "section": "🐭 Ensembl",
+ "text": "🐭 Ensembl\n\n\nEnsembl creates, integrates and distributes reference datasets and analysis tools that enable genomics\nBioMart provides a access to these large datasets\nbiomaRt (Durinck et al. 2009) is a Bioconductor package gives you programmatic access to BioMart.\nIn the workshop you use this package to get information you can merge with the results file"
},
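A minimal sketch of that programmatic access, assuming the mouse gene set and a results dataframe with an ensembl_gene_id column:

```r
library(biomaRt)

# connect to the Ensembl genes mart for mouse
ensembl <- useEnsembl(biomart = "genes",
                      dataset = "mmusculus_gene_ensembl")

# fetch the gene name and description for the genes in the results;
# available attributes can be checked with listAttributes(ensembl)
gene_info <- getBM(attributes = c("ensembl_gene_id",
                                  "external_gene_name",
                                  "description"),
                   filters = "ensembl_gene_id",
                   values = results$ensembl_gene_id,
                   mart = ensembl)
```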
{
- "objectID": "core/week-6/workshop.html",
- "href": "core/week-6/workshop.html",
- "title": "Workshop",
- "section": "",
- "text": "Use this session to ask any questions about Core 1 Organising reproducible data analyses and Core 2 File types, workflow tips and other tools in particular, or about R and RStudio in general. We will also try to answer any questions about the ’mics, Image and Structure strands.\n88H students might also review Stage 1 and 2 content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions because there is no 2FA and the resources are searchable.\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n70M students might also review 52M content to see if there are areas you might benefit from revisiting. You can access these through the VLE site but you might find it helpful to use this link without 2FA.\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto.\n\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
+ "objectID": "omics/week-5/study_before_workshop.html#what-is-the-purpose-of-an-omics-plot",
+ "href": "omics/week-5/study_before_workshop.html#what-is-the-purpose-of-an-omics-plot",
+ "title": "Independent Study to prepare for workshop",
+ "section": "What is the purpose of an Omics plot?",
+ "text": "What is the purpose of an Omics plot?\n\n\nIn general, we plot data to help us summarise and understand it\nThis is especially import for omics data where we have a very large number of variables and often a large number of observations\nWe will look at three plots very commonly used in omics analysis: Principal Component Analysis (PCA) plot, Heatmaps and Volcano Plots"
},
{
- "objectID": "core/week-6/workshop.html#session-overview",
- "href": "core/week-6/workshop.html#session-overview",
- "title": "Workshop",
- "section": "",
- "text": "Use this session to ask any questions about Core 1 Organising reproducible data analyses and Core 2 File types, workflow tips and other tools in particular, or about R and RStudio in general. We will also try to answer any questions about the ’mics, Image and Structure strands.\n88H students might also review Stage 1 and 2 content to see if there are areas you might benefit from revisiting. You can access these through the past VLE sites but you might find it helpful to use the latest versions because there is no 2FA and the resources are searchable.\nStage 1\n\nData Analysis in R for Becoming a Bioscientist 1.Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data.\nData Analysis in R for Becoming a Bioscientist 2. The logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one- and two-way analysis of variance (ANOVA).\n\nStage 2\n\nGet Introductory Statistical Tests as Linear models: A guide for R users\nA simple introduction to GLM for analysing Poisson and Binomial responses in R\n\n70M students might also review 52M content to see if there are areas you might benefit from revisiting. You can access these through the VLE site but you might find it helpful to use this link without 2FA.\n\n52M Data Analysis in R. Core concepts about scientific computing, types of variable, the role of variables in analysis and how to use RStudio to organise analysis and import, summarise and plot data, the logic of hypothesis testing, confidence intervals, what is meant by a statistical model, two-sample tests and one-way analysis of variance (ANOVA) and reproducible reports in Quarto.\n\nPages made with R (R Core Team 2023), Quarto (Allaire et al. 2022), knitr (Xie 2022), kableExtra (Zhu 2021)"
+ "objectID": "omics/week-5/study_before_workshop.html#pca",
+ "href": "omics/week-5/study_before_workshop.html#pca",
+ "title": "Independent Study to prepare for workshop",
+ "section": "PCA",
+ "text": "PCA\n\n\nPrincipal Component Analysis is an unsupervised machine learning technique\nUnsupervised methods1 are unsupervised in that they do not use/optimise to a particular output. The goal is to uncover structure. They do not test hypotheses\nIt is often used to visualise high dimensional data because it is a dimension reduction technique\n\n\nYou may wish to read a previous introduction to unsupervised methods I have written An introduction to Machine Learning: Unsupervised methods (Rand 2021)"
},
{
- "objectID": "core/core.html",
- "href": "core/core.html",
- "title": "Core Data Analysis",
- "section": "",
- "text": "There are three workshops taken by everyone on BIO00088H. These are in weeks 1, 2 and 11. The first two cover some useful workflow tips and how to organise your analyses effectively so they are reproducible but you will also have the chance to revise material from stage 1 and 2. The third workshop covers Research Compendia and Reproducible Reporting. In week 6 there is a drop-in session where you can ask questions about the material covered in the first two workshops.\nStudents doing BIO00070M will do week 1 and 2 of the core workshops, then 3-5 of the Omics workshops. You can also attend the week 6 drop-in. You do not do the week 11 session because your assessment differs. However, you will learn about Reproducible reporting in BIO00052M in week 10 because your that applies to your 52M assessment.\nGood organisation is important because you will want to be able to set work aside for holidays and assessment periods and then restart easily. You will also be assessed on the organisation, reproducibility and transparency of your work.\n\n\nThis week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work.\n\n\n\nThis week we will consider File types, workflow tips and other tools. The independent study (~20 mins) reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab.\n\n\n\nThis week there is a drop-in session where you can ask questions about the material particular covered in the first two workshops. However, we will also endeavour to answer questions about any of the material in the omics, images or structure strand.\n\n\n\nThis week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown)."
+ "objectID": "omics/week-5/study_before_workshop.html#pca-1",
+ "href": "omics/week-5/study_before_workshop.html#pca-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "PCA",
+ "text": "PCA\n\n\nIt takes a large number of continuous variables (like gene expression) and reduces them to a smaller number of variables (called principal components) that explain most of the variation in the data\nThe principal components can be plotted to see how samples cluster together"
},
{
- "objectID": "core/core.html#week-1-core-1-organising-reproducible-data-analyses",
- "href": "core/core.html#week-1-core-1-organising-reproducible-data-analyses",
- "title": "Core Data Analysis",
- "section": "",
- "text": "This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work."
+ "objectID": "omics/week-5/study_before_workshop.html#pca-2",
+ "href": "omics/week-5/study_before_workshop.html#pca-2",
+ "title": "Independent Study to prepare for workshop",
+ "section": "PCA",
+ "text": "PCA\n\n\nTo see if samples cluster as we would expect, we might plot the expression of one gene against another\n\n\n\n\n\n\nSamples\n\n\n\n\n\nCells\n\n\n\n\nThis gives some insight but we have 280 (mice) or 10,000+(frogs) genes to consider. How do we know if the pair we use is typical? How can we consider al the genes at once?"
},
{
- "objectID": "core/core.html#week-2-core-2-file-types-workflow-tips-and-other-tools",
- "href": "core/core.html#week-2-core-2-file-types-workflow-tips-and-other-tools",
- "title": "Core Data Analysis",
- "section": "",
- "text": "This week we will consider File types, workflow tips and other tools. The independent study (~20 mins) reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab."
+ "objectID": "omics/week-5/study_before_workshop.html#pca-3",
+ "href": "omics/week-5/study_before_workshop.html#pca-3",
+ "title": "Independent Study to prepare for workshop",
+ "section": "PCA",
+ "text": "PCA\n\n\nPCA is a solution for this - It takes a large number of continuous variables (like gene expression) and reduces them to a smaller number of “principal components” that explain most of the variation in the data.\n\n\n\n\n\n\nSamples\n\n\n\n\n\nCells"
},
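A minimal sketch of a PCA with base R's prcomp(), assuming a log2 normalised matrix (log2_expr_mat, hypothetical) with genes in rows and samples in columns:

```r
library(ggplot2)

# prcomp() expects observations (samples) in rows, so transpose;
# rank. = 2 keeps just the first two principal components
pca <- prcomp(t(log2_expr_mat), rank. = 2)

# the scores for each sample are in pca$x
pca_dat <- data.frame(pca$x, sample = colnames(log2_expr_mat))

ggplot(pca_dat, aes(x = PC1, y = PC2, label = sample)) +
  geom_point() +
  geom_text(vjust = -1)
```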
{
- "objectID": "core/core.html#week-6-core-drop-in",
- "href": "core/core.html#week-6-core-drop-in",
- "title": "Core Data Analysis",
- "section": "",
- "text": "This week there is a drop-in session where you can ask questions about the material particular covered in the first two workshops. However, we will also endeavour to answer questions about any of the material in the omics, images or structure strand."
+ "objectID": "omics/week-5/study_before_workshop.html#pca-4",
+ "href": "omics/week-5/study_before_workshop.html#pca-4",
+ "title": "Independent Study to prepare for workshop",
+ "section": "PCA",
+ "text": "PCA\nWe have done PCA in Omics 3, but often PCA might be one of the first exploratory steps because it gives you an idea whether you expect general patterns in gene expression that distinguish groups."
},
{
- "objectID": "core/core.html#week-11-core-3-research-compendia-and-reproducible-reporting",
- "href": "core/core.html#week-11-core-3-research-compendia-and-reproducible-reporting",
- "title": "Core Data Analysis",
- "section": "",
- "text": "This week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown)."
+ "objectID": "omics/week-5/study_before_workshop.html#heatmaps-1",
+ "href": "omics/week-5/study_before_workshop.html#heatmaps-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Heatmaps",
+ "text": "Heatmaps\n\n\nare a grid of genes on one axis and samples on the other with each grid cell coloured by another variable\nin this case the other variable is gene expression\nthey allow you to quickly get an overview of the expression patterns across genes and samples\nwe often couple them with clustering to group genes and samples with similar expression patterns together which helps us see which genes are responsible for distinguishing groups"
},
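A minimal sketch with heatmaply, assuming a matrix of log2 normalised values for the significant genes (sig_mat, hypothetical):

```r
library(heatmaply)

# scale = "row" z-scores each gene so colour shows relative expression;
# rows and columns are clustered so similar patterns sit together
heatmaply(sig_mat,
          scale = "row",
          hclust_method = "average")
```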
{
- "objectID": "core/week-11/study_after_workshop.html",
- "href": "core/week-11/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
+ "objectID": "omics/week-5/study_before_workshop.html#section-1",
+ "href": "omics/week-5/study_before_workshop.html#section-1",
+ "title": "Independent Study to prepare for workshop",
"section": "",
- "text": "💻 Just work on your project!"
+ "text": "Heat map for the frog data\n\nSee next slide for information"
},
{
- "objectID": "core/week-11/overview.html",
- "href": "core/week-11/overview.html",
- "title": "Overview",
- "section": "",
- "text": "This week we will cover the “Research compendium” and reproducible reporting which are part of the assessment. Research Compendium that is a documented collection of all the digital parts of the research project including data (or access to data), code and outputs. The Compendium might be a single Quarto/RStudio Project, or it might be a folder including an Quarto/RStudio Project and some additional materials including the description of unscripted processing. The collection is organised and documented in such a way that reproducing all the results is straightforward for another individual. We will also cover reproducible reporting which means using literate programming to weave together code and text together in a single document. Quarto is a multi-language literate programming tool (very like R Markdown).\n\nLearning objectives\nThe successful student will be able to:\n\nexplain what a research compendium is and describe its components\nrelate the content and concepts in Core 1 and Core 2 to the research compendium\nCreate a quarto document and:\n\nappreciate the role of the YAML header\nformat text as bold, italics, headings etc\nadd citations and a bibliography\ncreate automatically numbered figures and tables with cross references in text\nset default code chunk behaviour and those for individual chunks\nuse inline code to report results\ninsert special characters and mathematical expressions with LaTeX\n\n\n\n\nInstructions\n\nPrepare\nWorkshop\nConsolidate by working on your project and research compendium"
+ "objectID": "omics/week-5/study_before_workshop.html#heatmaps-2",
+ "href": "omics/week-5/study_before_workshop.html#heatmaps-2",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Heatmaps",
+ "text": "Heatmaps\n\n\nOn the vertical axis are genes which are differentially expressed at the 0.01 level\nOn the horizontal axis are samples\nWe can see that the FGF-treated samples cluster together and the control samples cluster together\nWe can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples and the other shows genes downregulated (more blue) in the FGF-treated samples"
},
{
- "objectID": "core/week-1/study_after_workshop.html",
- "href": "core/week-1/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
- "section": "",
- "text": "These are suggestions"
+ "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-1",
+ "href": "omics/week-5/study_before_workshop.html#volcano-plots-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Volcano plots",
+ "text": "Volcano plots\n\n\nVolcano plots often used to visualise the results of differential expression analysis\nThey are just a scatter of the corrected p value against the fold change….\nalmost - the we actually plot the negative log of the corrected p value against the fold change"
},
{
- "objectID": "core/week-1/study_after_workshop.html#bio00088h-group-research-project-students",
- "href": "core/week-1/study_after_workshop.html#bio00088h-group-research-project-students",
- "title": "Independent Study to consolidate this week",
- "section": "BIO00088H Group Research Project students",
- "text": "BIO00088H Group Research Project students\n\nRevise previous Data Analysis materials. You can find the version you took on the VLE site for 17C / 08C. However, my latest versions (in development) are here: Data Analysis in R. The Becoming a Bioscientist (BABS) modules replace the Laboratory and Professional Skills modules. BABS1 and BABS2 are stage one, and I’ve tried to improve them over 17C / 08C. The site is also searchable (icon top right)"
+ "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-2",
+ "href": "omics/week-5/study_before_workshop.html#volcano-plots-2",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Volcano plots",
+ "text": "Volcano plots\n\n\nThis is because just plotting the p-value means the axis is counter intuitive. Small p-values (i.e., significant values) are at the bottom of the axis)\nAnd since p-values range from 1 to very tiny the points are all squashed at the bottom of the axis\n\n\n\nVolcano plot FDR against fold change"
},
{
- "objectID": "core/week-1/study_after_workshop.html#msc-bioinformatics-students-doing-bio00070m",
- "href": "core/week-1/study_after_workshop.html#msc-bioinformatics-students-doing-bio00070m",
+ "objectID": "omics/week-5/study_before_workshop.html#volcano-plots-3",
+ "href": "omics/week-5/study_before_workshop.html#volcano-plots-3",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Volcano plots",
+ "text": "Volcano plots\n\n\nPlotting the negative log of the corrected p-value means that the values are spread out and the significant values are at the top of the axis\n\n\n\nVolcano plot -log(FDR) against fold change"
+ },
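A minimal sketch of a volcano plot, assuming DESeq2-style result columns (log2FoldChange, padj) as in the S30_results.csv written in Omics 2:

```r
library(tidyverse)
library(ggrepel)

results <- read_csv("results/S30_results.csv")

# -log10 of the adjusted p-value puts the most significant genes at the top
ggplot(results, aes(x = log2FoldChange, y = -log10(padj))) +
  geom_point(alpha = 0.5) +
  geom_text_repel(data = filter(results, padj < 0.01),
                  aes(label = xenbase_gene_id),
                  size = 2)
```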
+ {
+ "objectID": "omics/week-5/study_before_workshop.html#visualisations",
+ "href": "omics/week-5/study_before_workshop.html#visualisations",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Visualisations",
+ "text": "Visualisations\n\nShould be done on normalised data so meaningful comparisons can be made\nThe 🐭 mouse data were already log2normalised\nThe 🐸 frog data were normalised by the DE method and saved to file. We will log2 transform before doing visualisations"
+ },
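A minimal sketch of that log2 transform for the frog data, assuming the normalised counts file written in Omics 2:

```r
library(tidyverse)

s30_norm <- read_csv("results/S30_normalised_counts.csv")

# log2(x + 1) avoids taking the log of zero counts
s30_log <- s30_norm |>
  mutate(across(starts_with("S30"), \(x) log2(x + 1)))
```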
+ {
+ "objectID": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop",
+ "href": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Packages to install before the workshop",
+ "text": "Packages to install before the workshop\nheatmaply (Galili et al. 2017) and ggrepel (Slowikowski 2023) from CRAN in the the normal way:\n\ninstall.packages(\"heatmaply\")\ninstall.packages(\"ggrepel\")\n\nbiomaRt (Durinck et al. 2009) from Bioconductor using BiocManager (Morgan and Ramos 2023)\n\nBiocManager::install(\"biomaRt\")"
+ },
+ {
+ "objectID": "omics/week-5/study_before_workshop.html#workshops-1",
+ "href": "omics/week-5/study_before_workshop.html#workshops-1",
+ "title": "Independent Study to prepare for workshop",
+ "section": "Workshops",
+ "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments.\nOmics 3: Visualising and Interpreting. PCA, Volcano plots and heatmaps to visualise results. Interpreting the results and finding out more about genes of interest."
+ },
+ {
+ "objectID": "omics/week-5/study_before_workshop.html#references",
+ "href": "omics/week-5/study_before_workshop.html#references",
+ "title": "Independent Study to prepare for workshop",
+ "section": "References",
+ "text": "References\n\n\n🔗 About Omics 3: Visualising and Interpreting\n\n\n\nBirney, Ewan, T. Daniel Andrews, Paul Bevan, Mario Caccamo, Yuan Chen, Laura Clarke, Guy Coates, et al. 2004. “An Overview of Ensembl.” Genome Research 14 (5): 925–28. https://doi.org/10.1101/gr.1860604.\n\n\nDurinck, Steffen, Paul T. Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the r/Bioconductor Package biomaRt” 4.\n\n\nFisher, Malcolm, Christina James-Zorn, Virgilio Ponferrada, Andrew J Bell, Nivitha Sundararaj, Erik Segerdell, Praneet Chaturvedi, et al. 2023. “Xenbase: Key Features and Resources of the Xenopus Model Organism Knowledgebase.” Genetics 224 (1): iyad018. https://doi.org/10.1093/genetics/iyad018.\n\n\nGalili, Tal, O’Callaghan, Alan, Sidi, Jonathan, Sievert, and Carson. 2017. “Heatmaply: An r Package for Creating Interactive Cluster Heatmaps for Online Publishing.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btx657.\n\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2.\n\n\nMorgan, Martin, and Marcel Ramos. 2023. BiocManager: Access the Bioconductor Project Package Repository. https://bioconductor.github.io/BiocManager/.\n\n\nRand, Emma. 2021. Data Science Strand of BIO00058M. https://doi.org/10.5281/zenodo.5527705.\n\n\nSlowikowski, Kamil. 2023. Ggrepel: Automatically Position Non-Overlapping Text Labels with ’Ggplot2’. https://github.com/slowkow/ggrepel."
+ },
+ {
+ "objectID": "omics/week-5/study_after_workshop.html",
+ "href": "omics/week-5/study_after_workshop.html",
"title": "Independent Study to consolidate this week",
- "section": "MSc Bioinformatics students doing BIO00070M",
- "text": "MSc Bioinformatics students doing BIO00070M\n\nMake sure you carry out the preparatory work for week 2 of 52M"
+ "section": "",
+ "text": "You need only do the section for one of the examples.\n🐸 Frogs\n🎬 Open your frogs-88H Project and script you began in the Consolidation study of Omics 1 and continued to work on in Omics 2. This is likely to be cont-fgf-s20.R or cont-fgf-s14.R. Use the code you used in the workshop (in cont-fgf-s30.R) as a template to visualise the s20/s14 results.\n🐭 Mice\n🎬 Open your mice-88H Project and the script you began in the Consolidation study of Omics 2. This is likely to be hspc-lthsc.R or lthsc-prog.R. Use the code you used in the workshop (in hspc-prog.R) as a template to visualise the hspc-lthsc/lthsc-prog results.\n🍂 xxxx\n🎬 Follow one of the other examples"
},
{
- "objectID": "core/week-1/overview.html",
- "href": "core/week-1/overview.html",
+ "objectID": "omics/week-4/overview.html",
+ "href": "omics/week-4/overview.html",
"title": "Overview",
"section": "",
- "text": "This week you will revise some essential concepts for scientific computing: file system organisation, file types, working directories and paths. The workshop will cover a rationale for working reproducibly, project oriented workflow, naming things and documenting your work. We will also examine some file types and the concept of tidy data.\n\nLearning objectives\nThe successful student will be able to:\n\nexplain the organisation of files and directories in a file systems including root, home and working directories\nexplain absolute and relative file paths\nexplain why working reproducibly is important\nknow how to use a project-oriented workflow to organise work\nbe able to give files human- and machine-readable names\noutline some common biological data file formats\n\n\n\nInstructions\n\nPrepare\n\n📖 Read Understanding file systems\n\nWorkshop\nConsolidate"
+ "text": "This week we cover differential expression analysis on raw counts or log normalised values. The independent study will allow you to check you have what you should have following the Omics 1: Hello Data workshop and Consolidation study. It will also summarise the concepts and methods we will use in the workshop. In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both.\nWe suggest you sit together with your group in the workshop.\n\nLearning objectives\nThe successful student will be able to:\n\nverify they have the required RStudio Project set up and the data and code files from the previous Workshop and Consolidation study\nexplain the goal of differential expression analysis and the importance of normalisation\nexplain why and how the nature of the input values determines the analysis package used\ndescribe the metadata needed to carry out differential expression analysis and the statistical models used by DESeq2 and scran\nfind genes that are unexpressed or expressed in a a single cell type or treatment group\nperform differential expression analysis on raw counts using DESeq2 or on logged normalised expression values using scran or both.\nexplain the output of differential expression: log fold change, p-value, adjusted p-value,\n\n\n\nInstructions\n\nPrepare\n\n📖 Read what you should have so far and about concepts in differential expression analysis.\n\nWorkshop\n\n💻 Find unexpressed genes and those expressed in a single cell type or treatment group.\n💻 Set up the metadata for differential expression analysis.\n💻 Perform differential expression analysis on raw counts using DESeq2 or on logged normalised expression values using scran or both.\nLook after future you!\n\nConsolidate\n\n💻 Use the work you completed in the workshop as a template to apply to a new case.\n\n\n\n\n\n\n\n\n\n\nReferences\n\nLove, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2” 15: 550. https://doi.org/10.1186/s13059-014-0550-8.\n\n\nLun, Aaron T. L., Davis J. McCarthy, and John C. Marioni. 2016. “A Step-by-Step Workflow for Low-Level Analysis of Single-Cell RNA-Seq Data with Bioconductor” 5: 2122. https://doi.org/10.12688/f1000research.9501.2."
},
{
- "objectID": "core/week-2/study_after_workshop.html",
- "href": "core/week-2/study_after_workshop.html",
- "title": "Independent Study to consolidate this week",
+ "objectID": "omics/week-4/workshop.html",
+ "href": "omics/week-4/workshop.html",
+ "title": "Workshop",
"section": "",
- "text": "bbbb"
+ "text": "In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
},
{
- "objectID": "core/week-2/overview.html",
- "href": "core/week-2/overview.html",
- "title": "Overview",
+ "objectID": "omics/week-4/workshop.html#session-overview",
+ "href": "omics/week-4/workshop.html#session-overview",
+ "title": "Workshop",
"section": "",
- "text": "This week we will consider File types, workflow tips and other tools. The independent study reiterates the value of RStudio projects and shows you how you create them with usethis. You will also learn how to recognise and write cool 😎 code, not 😩 ugly code and code algorithmically. In the workshop we will examine some common biological data formats and discover some awesome short cuts to help you write cool 😎 code. You will also get a brief introduction to the command line and Google Colab.\n\nLearning objectives\nThe successful student will be able to:\n\nexplain why RStudio are useful/essential and be able to use the usethis package\nwrite cool 😎 code not 😩 ugly code\nexplain the value of code which expresses the structure of the problem/solution.\ndescribe some common file types for biological data\nuse some useful shortcuts to help write cool 😎 code\nknow what the command line is and how to use it for simple tasks\nuse Google colab to run code\nrecognise some of the differences between R and Python\n\n\n\nInstructions\n\nPrepare 20 mins reading on RStudio Projects revisited, formatting code and coding algorithmically\nWorkshop\n\n💬 Types of biological data files\n🪄 Workflow tips and shortcuts\n💻 The command line\n💻 Google colab\n💻 Python\n\nConsolidate\n\n💻 not sure yet :)"
+ "text": "In the workshop, you will learn how to perform differential expression analysis on raw counts using DESeq2 (Love, Huber, and Anders 2014) or on logged normalised expression values using scran (Lun, McCarthy, and Marioni 2016) or both."
+ },
+ {
+ "objectID": "omics/week-4/workshop.html#import",
+ "href": "omics/week-4/workshop.html#import",
+ "title": "Workshop",
+ "section": "Import",
+ "text": "Import\nWe need to import the S30 data that were filtered to remove genes with 4, 5 or 6 zeros and those where the total counts was less than 20.\n🎬 Import the data from the data-processed folder."
+ },
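A minimal sketch; the file name is an assumption based on what you saved in the Omics 1 consolidation:

```r
library(tidyverse)

# filtered counts written at the end of Omics 1 (name assumed)
s30_filtered <- read_csv("data-processed/s30_filtered.csv")
```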
+ {
+ "objectID": "omics/week-4/workshop.html#genes-expressed-in-one-treatment",
+ "href": "omics/week-4/workshop.html#genes-expressed-in-one-treatment",
+ "title": "Workshop",
+ "section": "Genes expressed in one treatment",
+ "text": "Genes expressed in one treatment\nThe genes expressed in only one treatment group are those with zeros in all three replicates in one group and non-zero values in all three replicates in the other group. For example, those shown here:\n\n\n# A tibble: 3 × 7\n xenbase_gene_id S30_C_5 S30_C_6 S30_C_A S30_F_5 S30_F_6 S30_F_A\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 XB-GENE-1018260 0 0 0 10 2 16\n2 XB-GENE-17330117 0 0 0 13 4 17\n3 XB-GENE-17332184 0 0 0 6 19 6\n\n\nWe will use filter() to find these genes.\n🎬 Find the genes that are expressed only in the FGF-treated group:\n\ns30_fgf_only <- s30_filtered |> \n filter(S30_C_5 == 0, \n S30_C_6 == 0, \n S30_C_A == 0, \n S30_F_5 > 0, \n S30_F_6 > 0, \n S30_F_A > 0)\n\n❓ How many genes are expressed only in the FGF-treated group?\n\n\n🎬 Now you find any genes that are expressed only in the control group.\n❓ Do the results make sense to you in light of what you know about the biology?\n\n\n\n\n\n\n\n🎬 Write to file (saved in results) all the genes that are expressed one group only."
+ },
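A possible answer for the control-only task, with suggested (not prescribed) output file names for the write-up step:

```r
# genes with counts in every control replicate and none in any FGF replicate
s30_con_only <- s30_filtered |> 
  filter(S30_C_5 > 0, 
         S30_C_6 > 0, 
         S30_C_A > 0, 
         S30_F_5 == 0, 
         S30_F_6 == 0, 
         S30_F_A == 0)

# write both single-group gene sets to the results folder
write_csv(s30_fgf_only, "results/s30_fgf_only.csv")
write_csv(s30_con_only, "results/s30_con_only.csv")
```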
+ {
+ "objectID": "omics/week-4/workshop.html#create-deseqdataset-object",
+ "href": "omics/week-4/workshop.html#create-deseqdataset-object",
+ "title": "Workshop",
+ "section": "Create DESeqDataSet object",
+ "text": "Create DESeqDataSet object\n🎬 Load the DESeq2 package:\nA DEseqDataSet object is a custom data type that is used by DESeq2. Custom data types are common in the Bioconductor1 packages. They are used to store data in a way that is useful for the analysis. These data types typically have data, transformed data, metadata and experimental designs within them.\nTo create a DESeqDataSet object, we need to provide three things:\n\nThe raw counts - these are what we imported into s30_filtered\n\nThe meta data which gives information about the samples and which treatment groups they belong to\nA design matrix which captures the design of the statistical model.\n\nThe counts must in a matrix rather than a dataframe. Unlike a dataframe, a matrix has columns of all the same type. That is, it will contain only the counts. The gene ids are given as row names rather than a column. The matrix() function will create a matrix from a dataframe of columns of the same type and the select() function can be used to remove the gene ids column.\n🎬 Create a matrix of the counts:\n\ns30_count_mat <- s30_filtered |>\n select(-xenbase_gene_id) |>\n as.matrix()\n\n🎬 Add the gene ids as row names to the matrix:\n\n# add the row names to the matrix\nrownames(s30_count_mat) <- s30_filtered$xenbase_gene_id\n\nYou might want to view the matrix.\nThe metadata are in a file, frog_meta_data.txt. This is a tab-delimited file. The first column is the sample name and the second column is the treatment group.\n🎬 Make a folder called meta and save the file to it.\n🎬 Read the metadata into a dataframe:\n\nmeta <- read_table(\"meta/frog_meta_data.txt\")\n\n🎬 Examine the resulting dataframe.\nWe need to add the sample names as row names to the metadata dataframe. This is because the DESeqDataSet object will use the row names to match the samples in the metadata to the samples in the counts matrix.\n🎬 Add the sample names as row names to the metadata dataframe:\n\nrow.names(meta) <- meta$sample_id\n\n(you will get a warning message but you can ignore it)\nWe are dealing only with the S30 data so we need to remove the samples that are not in the S30 data.\n🎬 Filter the metadata to keep only the S30 information:\n\nmeta_S30 <- meta |>\n dplyr::filter(stage == \"stage_30\")\n\n\n\n# A tibble: 6 × 4\n sample_id stage treatment sibling_rep\n* <chr> <chr> <chr> <chr> \n1 S30_C_5 stage_30 control five \n2 S30_C_6 stage_30 control six \n3 S30_C_A stage_30 control A \n4 S30_F_5 stage_30 FGF five \n5 S30_F_6 stage_30 FGF six \n6 S30_F_A stage_30 FGF A \n\n\nWe can now create the DESeqDataSet object. The design formula describes the statistical model You should notice that it is the same sort of formula we used in t.test(), lm(),glm() etc. The ~ indicates that the left hand side is the response variable (in this case counts) and the right hand side are the explanatory variables. We are interested in the difference between the treatments but we include sibling_rep to account for the fact that the data are paired. 
The names of the columns in the count matrix have to match the names in the metadata dataframe and the names of the explanatory variables in the design formula have to match the names of columns in the metadata.\n🎬 Create the DESeqDataSet object:\n\ndds <- DESeqDataSetFromMatrix(countData = s30_count_mat,\n colData = meta_S30,\n design = ~ treatment + sibling_rep)\n\nThe warning “Warning: some variables in design formula are characters, converting to factors” just means that the variable type of treatment and sibling_rep in the metadata dataframe are char. This is not a as DESeqDataSetFromMatrix() has made them into the factors it needs.\n🎬 Examine the DESeqDataSet object.\nThe counts are in dds@assays@data@listData[[\"counts\"]] and the metadata are in dds@colData but the easiest way to see them is to use the counts() and colData() functions from the DESeq2 package.\n🎬 View the counts:\n\ncounts(dds) |> View()\n\nError in .External2(C_dataviewer, x, title): unable to start data viewer\n\n\nYou should be able to see that this is the same as in s30_count_mat.\n\ncolData(dds)\n\nDataFrame with 6 rows and 4 columns\n sample_id stage treatment sibling_rep\n <character> <character> <factor> <factor>\nS30_C_5 S30_C_5 stage_30 control five\nS30_C_6 S30_C_6 stage_30 control six \nS30_C_A S30_C_A stage_30 control A \nS30_F_5 S30_F_5 stage_30 FGF five\nS30_F_6 S30_F_6 stage_30 FGF six \nS30_F_A S30_F_A stage_30 FGF A"
+ },
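The loading step in the entry above shows no code; a minimal sketch:

```r
# DESeq2 is a Bioconductor package, installed with BiocManager::install("DESeq2")
library(DESeq2)
```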
+ {
+ "objectID": "omics/week-4/workshop.html#prepare-the-normalised-counts",
+ "href": "omics/week-4/workshop.html#prepare-the-normalised-counts",
+ "title": "Workshop",
+ "section": "Prepare the normalised counts",
+ "text": "Prepare the normalised counts\nThe normalised counts are the counts that have been transformed to account for the library size (i.e., the total number of reads in a sample) and the gene length. We have to first estimate the normalisation factors and store them in the DESeqDataSet object and then we can get the normalised counts.\n🎬 Estimate the factors for normalisation and store them in the DESeqDataSet object:\n\ndds <- estimateSizeFactors(dds)\n\n🎬 Look at the factors (just for information):\n\nsizeFactors(dds)\n\n S30_C_5 S30_C_6 S30_C_A S30_F_5 S30_F_6 S30_F_A \n0.8812200 0.9454600 1.2989886 1.0881870 1.0518961 0.8322894 \n\n\nTo get the normalised counts we again used the counts() function but this time we use the normalized=TRUE argument.\n🎬 Save the normalised to a matrix:\n\nnormalised_counts <- counts(dds, normalized = TRUE)\n\nWe will write the normalised counts to a file so that we can use them in the future.\n🎬 Make a dataframe of the normalised counts, add a column for the gene ids and write to file:\n\ndata.frame(normalised_counts,\n xenbase_gene_id = row.names(normalised_counts)) |>\n write_csv(file = \"results/S30_normalised_counts.csv\")"
+ },
+ {
+ "objectID": "omics/week-4/workshop.html#differential-expression-analysis",
+ "href": "omics/week-4/workshop.html#differential-expression-analysis",
+ "title": "Workshop",
+ "section": "Differential expression analysis",
+ "text": "Differential expression analysis\nWe used the DESeq() function to do the differential expression analysis. This function fits the statistical model to the data and then uses the model to calculate the significance of the difference between the treatments. It again stored the results in the DESseqDataSet object. Note that the differential expression needs the raw (unnormalised counts) as it does its own normalisation as part of the process.\n🎬 Run the differential expression analysis:\n\ndds <- DESeq(dds)\n\nThe function will take only a few moments to run on this data but can take longer for bigger datasets.\nWe need to define the contrasts we want to test. We want to test the difference between the treatments so we will define the contrast as FGF and control.\n🎬 Define the contrast:\n\ncontrast_fgf <- c(\"treatment\", \"FGF\", \"control\")\n\nNote that treatment is the name of the column in the metadata dataframe and FGF and control are the names of the levels in the treatment column. By putting them in the order FGF , control we are saying the fold change will be FGF / control. If we had put them in the order control, FGF we would have got the fold change as control / FGF. This means:\n\npositive log fold changes indicate FGF > control and\nnegative log fold changes indicates control > FGF.\n\n🎬 Extract the results from the DESseqDataSet object:\n\nresults_fgf <- results(dds,\n contrast = contrast_fgf)\n\nThis will give us the log2 fold change and p-value for the contrast.\n🎬 Save the results to a file:\n\ndata.frame(results_fgf,\n xenbase_gene_id = row.names(results_fgf)) |> \n write_csv(file = \"results/S30_results.csv\")"
+ },
+ {
+ "objectID": "omics/week-4/workshop.html#import-1",
+ "href": "omics/week-4/workshop.html#import-1",
+ "title": "Workshop",
+ "section": "Import",
+ "text": "Import\n🎬 Import surfaceome_hspc.csv and surfaceome_prog.csv into dataframes called hspc and prog respectively."
+ },
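A minimal sketch; the data-raw/ folder is an assumption, so adjust the paths to wherever you saved the files:

```r
library(tidyverse)

hspc <- read_csv("data-raw/surfaceome_hspc.csv")
prog <- read_csv("data-raw/surfaceome_prog.csv")
```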
+ {
+ "objectID": "omics/week-4/workshop.html#combine-the-two-datasets",
+ "href": "omics/week-4/workshop.html#combine-the-two-datasets",
+ "title": "Workshop",
+ "section": "Combine the two datasets",
+ "text": "Combine the two datasets\nWe need to combine the two datasets of 701 and 798 cells into one dataset of 1499 cells, i.e., 1499 columns. The number of rows is the number of genes, 280. Before combining, we must make sure genes in the same order in both dataframes or we would be comparing the expression of one gene in one cell type to the expression of a different gene in the other cell type!\n🎬 Check the gene ids are in the same order in both dataframes:\n\nidentical(prog$ensembl_gene_id, hspc$ensembl_gene_id)\n\n[1] TRUE\n\n\nscran can use a matrix or a dataframe of counts but theses must be log normalised counts. If using a dataframe, the columns must only contain the expression values (not the gene ids).\n🎬 Combine the two dataframes (minus the gene ids) into one dataframe called prog_hspc:\n\nprog_hspc <- bind_cols(prog[-1], hspc[-1])\n\n🎬 Now add the gene ids as the row names:\n\nrow.names(prog_hspc) <- prog$ensembl_gene_id"
+ },
+ {
+ "objectID": "omics/week-4/workshop.html#filter-to-remove-unexpressed-genes",
+ "href": "omics/week-4/workshop.html#filter-to-remove-unexpressed-genes",
+ "title": "Workshop",
+ "section": "Filter to remove unexpressed genes",
+ "text": "Filter to remove unexpressed genes\nIn this dataset, we will not see and genes that are not expressed in any of the cells because we are using a specific subset of the transcriptome that was deliberately selected. However, we will go through how to do this because it is an important step in most analyses.\nFor the 🐸 frog data you should remember that we were able to filter out our unexpressed genes in Omics 1 because we were examining both groups to be compared. In that workshop, we discussed that we could not filter out unexpressed genes in the 🐭 mouse data because we only had one cell types at that time. During the Consolidate Independent Study you examined the hspc cells.\nWhere the sum of all the values in the rows is zero, all the entries must be zero. We can use this to find the filter the genes that are not expressed in any of the cells. To do row wise aggregates such as the sum across rows we can use the rowwise() function. c_across() allows us to use the colon notation Prog_001:HSPC_852 in sum() rather than having to list all the column names: sum(Prog_001, Prog_002, Prog_002, Prog_004,.....)\n🎬 Find the genes that are 0 in every column of the prog_hspc dataframe:\n\nprog_hspc |> \n rowwise() |> \n filter(sum(c_across(Prog_001:HSPC_852)) == 0)\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\nNotice that we have summed across all the columns.\n❓ What do you conclude?\n\n\nWe might also examine the genes which are least expressed.\n🎬 Find ten least expressed genes:\n\nrowSums(prog_hspc) |> sort() |> head(10)\n\nENSMUSG00000041046 ENSMUSG00000012428 ENSMUSG00000022225 ENSMUSG00000027863 \n 30.70322 35.35796 50.45975 61.27461 \nENSMUSG00000019359 ENSMUSG00000020701 ENSMUSG00000030772 ENSMUSG00000027376 \n 68.90961 77.95594 84.11234 97.69333 \nENSMUSG00000023132 ENSMUSG00000026285 \n 120.43065 126.95425 \n\n\n❓ What do you conclude?"
+ },
+ {
+ "objectID": "omics/week-4/workshop.html#find-the-genes-that-are-expressed-in-only-one-cell-type",
+ "href": "omics/week-4/workshop.html#find-the-genes-that-are-expressed-in-only-one-cell-type",
+ "title": "Workshop",
+ "section": "Find the genes that are expressed in only one cell type",
+ "text": "Find the genes that are expressed in only one cell type\nTo find the genes that are expressed in only one cell type, we can use the same approach as above but only sum the columns for one cell type.\n🎬 Find the genes that are 0 in every column for the HSPC cells:\n\nprog_hspc |> \n rowwise() |> \n filter(sum(c_across(HSPC_001:HSPC_852)) == 0)\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\nWe have summed across the HSPC cells only. Note that if we knew there were some rows that were all zero across both cell types, we would need to add |> filter(sum(c_across(Prog_001:Prog_852)) != 0)\nmeaning zero in all the HSPC but not zero in all the Prog\n🎬 Now you find the genes that are 0 in every column for the Prog cells:\n\n\n# A tibble: 0 × 1,499\n# Rowwise: \n# ℹ 1,499 variables: Prog_001 <dbl>, Prog_002 <dbl>, Prog_003 <dbl>,\n# Prog_004 <dbl>, Prog_006 <dbl>, Prog_007 <dbl>, Prog_008 <dbl>,\n# Prog_009 <dbl>, Prog_010 <dbl>, Prog_011 <dbl>, Prog_012 <dbl>,\n# Prog_013 <dbl>, Prog_014 <dbl>, Prog_015 <dbl>, Prog_016 <dbl>,\n# Prog_017 <dbl>, Prog_018 <dbl>, Prog_019 <dbl>, Prog_020 <dbl>,\n# Prog_021 <dbl>, Prog_022 <dbl>, Prog_023 <dbl>, Prog_024 <dbl>,\n# Prog_025 <dbl>, Prog_026 <dbl>, Prog_027 <dbl>, Prog_028 <dbl>, …\n\n\n❓ What do you conclude?"
+ },
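A possible answer for the Prog task that uses tidyselect, so you do not need to know the name of the last Prog column:

```r
# genes that are zero in every Prog cell
prog_hspc |> 
  rowwise() |> 
  filter(sum(c_across(starts_with("Prog"))) == 0)
```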
+ {
+ "objectID": "omics/week-4/workshop.html#differential-expression-analysis-1",
+ "href": "omics/week-4/workshop.html#differential-expression-analysis-1",
+ "title": "Workshop",
+ "section": "Differential expression analysis",
+ "text": "Differential expression analysis\nLike DESeq2, scran uses a statistical model to calculate the significance of the difference between the treatments and needs metadata to define the treatments.\n🎬 Load the scran package:\nThe meta data needed for the frog data was information about which columns were in which treatment group and which sibling group and we had that information in a file. Similarly, here we need information on which columns are from which cell type. Instead of having this is a file, we will create a vector that indicates which column belongs to which cell type.\n🎬 Create a vector that indicates which column belongs to which cell type:\n\ncell_type <- rep(c(\"prog\",\"hspc\"), \n times = c(length(prog) - 1,\n length(hspc) - 1))\n\nThe number of times each cell type is repeated is the number of columns in that cell type minus 1. This is because we have removed the column with the gene ids. Do check that the length of the cell_type vector is the same as the number of columns in the prog_hspc dataframe.\n🎬 Run the differential expression analysis:\n\nres_prog_hspc <- findMarkers(prog_hspc, \n cell_type)\n\nfindMarkers() is the function that runs the differential expression analysis. The first argument is the dataframe containing the data. The second argument is the vector indicating which columns are in which cell type. It gives us two dataframes of the results - rather unnecessarily. One is the results with fold changes that are Prog/HSPC and the other is the results with fold changes that are HSPC/Prog. These have the same magnitude, just a different sign\nThe dataframe res_prog_hspc$prog is log prog - log hspc (i.e.,Prog/HSPC). This means - Positive fold change: prog is higher than hspc - Negative fold change: hspc is higher than prog\nThe dataframe res_prog_hspc$hspc is log hspc - log prog (i.e., HSPC/Prog). . This means - Positive fold change: hspc is higher than prog - Negative fold change: prog is higher than hspc\n\n\n\nThe res_prog_hspc$prog dataframe\n\n\n\n\n\n\n\n\n\n\n\nTop\np.value\nFDR\nsummary.logFC\nlogFC.hspc\nensembl_gene_id\n\n\n\nENSMUSG00000028639\n1\n0\n0\n1.596910\n1.596910\nENSMUSG00000028639\n\n\nENSMUSG00000024053\n2\n0\n0\n3.035165\n3.035165\nENSMUSG00000024053\n\n\nENSMUSG00000041329\n3\n0\n0\n3.261056\n3.261056\nENSMUSG00000041329\n\n\nENSMUSG00000030336\n4\n0\n0\n-2.146491\n-2.146491\nENSMUSG00000030336\n\n\nENSMUSG00000016494\n5\n0\n0\n-3.056730\n-3.056730\nENSMUSG00000016494\n\n\nENSMUSG00000002808\n6\n0\n0\n3.000810\n3.000810\nENSMUSG00000002808\n\n\n\n\n\n\n\n\nThe res_prog_hspc$hspc dataframe. Notice the sign of the fold change is the other way\n\n\n\n\n\n\n\n\n\n\n\nTop\np.value\nFDR\nsummary.logFC\nlogFC.prog\nensembl_gene_id\n\n\n\nENSMUSG00000028639\n1\n0\n0\n-1.596910\n-1.596910\nENSMUSG00000028639\n\n\nENSMUSG00000024053\n2\n0\n0\n-3.035165\n-3.035165\nENSMUSG00000024053\n\n\nENSMUSG00000041329\n3\n0\n0\n-3.261056\n-3.261056\nENSMUSG00000041329\n\n\nENSMUSG00000030336\n4\n0\n0\n2.146491\n2.146491\nENSMUSG00000030336\n\n\nENSMUSG00000016494\n5\n0\n0\n3.056730\n3.056730\nENSMUSG00000016494\n\n\nENSMUSG00000002808\n6\n0\n0\n-3.000810\n-3.000810\nENSMUSG00000002808\n\n\n\n\n\n🎬 Write the results to file:\n\ndata.frame(res_prog_hspc$prog, \n ensembl_gene_id = row.names(res_prog_hspc$prog)) |> \n write_csv(\"results/prog_hspc_results.csv\")"
+ },
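The loading step in the entry above shows no code; a minimal sketch:

```r
# scran is a Bioconductor package, installed with BiocManager::install("scran")
library(scran)
```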
+ {
+ "objectID": "omics/week-4/workshop.html#footnotes",
+ "href": "omics/week-4/workshop.html#footnotes",
+ "title": "Workshop",
+ "section": "Footnotes",
+ "text": "Footnotes\n\nBioconductor is a project that develops and supports R packages for bioinformatics.↩︎"
}
]
\ No newline at end of file
diff --git a/structures/structures.html b/structures/structures.html
index 90e7404..bf75fb1 100644
--- a/structures/structures.html
+++ b/structures/structures.html
@@ -208,7 +208,7 @@