total 228
-drwxr-xr-x 2 runner docker 4096 Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4096 Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1597 Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25553 Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70839 Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4807 Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13029 Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 58063 Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8550 Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8564 Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4096 Dec 5 12:12 workshop_files
+drwxr-xr-x 2 runner docker 4096 Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4096 Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27497 Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1597 Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70988 Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4807 Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13029 Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 58063 Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8550 Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8564 Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4096 Dec 15 12:46 workshop_files
You can use more than one option at once. The -h option stands for “human readable” and shows file sizes with units (for example 4.0K rather than 4096), which makes them easier to read:
@@ -512,18 +512,18 @@
Workshop
ls -hl
total 228K
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files
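The -a option stands for “all” and shows all the files, including hidden files (those whose names start with a dot). This is why the . (current directory) and .. (parent directory) entries now appear in the listing.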
@@ -531,20 +531,20 @@
Workshop
ls -alh
total 236K
-drwxr-xr-x 5 runner docker 4.0K Dec 5 12:16 .
-drwxr-xr-x 6 runner docker 4.0K Dec 5 12:16 ..
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 data
-drwxr-xr-x 2 runner docker 4.0K Dec 5 12:12 images
--rw-r--r-- 1 runner docker 1.6K Dec 5 12:12 overview.qmd
--rw-r--r-- 1 runner docker 25K Dec 5 12:16 study_after_workshop.html
--rw-r--r-- 1 runner docker 184 Dec 5 12:12 study_after_workshop.qmd
--rw-r--r-- 1 runner docker 70K Dec 5 12:16 study_before_workshop.html
--rw-r--r-- 1 runner docker 4.7K Dec 5 12:12 study_before_workshop.ipynb
--rw-r--r-- 1 runner docker 13K Dec 5 12:12 study_before_workshop.qmd
--rw-r--r-- 1 runner docker 57K Dec 5 12:12 workshop.html
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:12 workshop.qmd
--rw-r--r-- 1 runner docker 8.4K Dec 5 12:16 workshop.rmarkdown
-drwxr-xr-x 3 runner docker 4.0K Dec 5 12:12 workshop_files
+drwxr-xr-x 5 runner docker 4.0K Dec 15 13:09 .
+drwxr-xr-x 6 runner docker 4.0K Dec 15 13:09 ..
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 data
+drwxr-xr-x 2 runner docker 4.0K Dec 15 12:46 images
+-rw-r--r-- 1 runner docker 27K Dec 15 13:09 overview.html
+-rw-r--r-- 1 runner docker 1.6K Dec 15 12:46 overview.qmd
+-rw-r--r-- 1 runner docker 184 Dec 15 12:46 study_after_workshop.qmd
+-rw-r--r-- 1 runner docker 70K Dec 15 13:09 study_before_workshop.html
+-rw-r--r-- 1 runner docker 4.7K Dec 15 12:46 study_before_workshop.ipynb
+-rw-r--r-- 1 runner docker 13K Dec 15 12:46 study_before_workshop.qmd
+-rw-r--r-- 1 runner docker 57K Dec 15 12:46 workshop.html
+-rw-r--r-- 1 runner docker 8.4K Dec 15 12:46 workshop.qmd
+-rw-r--r-- 1 runner docker 8.4K Dec 15 13:09 workshop.rmarkdown
+drwxr-xr-x 3 runner docker 4.0K Dec 15 12:46 workshop_files
You can move between directories with the cd command, which stands for “change directory”. To move into a directory, give cd the path to that directory:
You will get the warning Duplicated column names deduplicated: 'avgX' => 'avgX_1' [15] for each file, because each csv file has two columns called avgX. If you click on the tracking dataframe, you will see that it contains the data from all the files.
Now we can add columns for the type and the concentration by processing the file names. The values look like track/343_0.txt, so we need to remove track/ and .txt and then separate the remaining text at the underscore into two columns.
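The sketch below is not the workshop's code. It shows the same two steps (strip the prefix and suffix, then split at the underscore) in base R. Only track/343_0.txt comes from the text; the other file names and the column names type and conc are hypothetical.

```r
# Example values in the form described above; only "track/343_0.txt" comes
# from the workshop, the other two are made up for illustration
file <- c("track/343_0.txt", "track/343_10.txt", "track/500_0.txt")

# Remove the leading "track/" and the trailing ".txt"
stem <- sub("\\.txt$", "", sub("^track/", "", file))

# Split what remains at the underscore: one row per file, two columns
parts <- do.call(rbind, strsplit(stem, "_", fixed = TRUE))

# Hypothetical column names: type (first part) and conc (second part, as a number)
tracking_info <- data.frame(
  file = file,
  type = parts[, 1],
  conc = as.numeric(parts[, 2]),
  stringsAsFactors = FALSE
)
tracking_info
```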
The provided data are cumulative (absolute) values, so we need to calculate the change in each VFA over time. The function lag() will help with this: it returns the previous value in a column, which we can then subtract from the current value. We need to do this separately for each sample_replicate, so we group by sample_replicate first. We also need the rows in the right order, so we arrange by sample_replicate and time_day.
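As a base-R sketch of the same idea (not the workshop's group_by()/arrange()/lag() code), the example below orders a small made-up dataset by replicate and time, takes the previous value within each replicate and subtracts it. The column acetate and all values are hypothetical.

```r
# Hypothetical cumulative data for one VFA in two replicates, deliberately out of order
vfa <- data.frame(
  sample_replicate = c("A_1", "A_1", "A_1", "B_1", "B_1", "B_1"),
  time_day = c(3, 1, 2, 1, 2, 3),
  acetate = c(9, 2, 5, 1, 4, 4)
)

# Put rows in order: replicate first, then time (like arrange(sample_replicate, time_day))
vfa <- vfa[order(vfa$sample_replicate, vfa$time_day), ]

# Within each replicate, shift values down by one so each row sees the previous
# value; the first time point has no previous value, so it gets NA (as lag() does)
previous <- ave(vfa$acetate, vfa$sample_replicate,
                FUN = function(x) c(NA, head(x, -1)))

# Change since the previous time point
vfa$acetate_delta <- vfa$acetate - previous
vfa
```

Grouping matters: without it, the first row of replicate B_1 would subtract the last value of A_1.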
Now we have two dataframes, one for the cumulative data and one for the change in VFA.
+
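To convert from mM to g/l we calculate mM * 0.001 * MW, where MW is the molecular weight in g/mol: multiplying by 0.001 converts mmol/l to mol/l, and multiplying by MW converts mol/l to g/l. We will import the molecular weight data, pivot the VFA data to long format and join the molecular weight data to the VFA data. Then we can calculate the concentrations in g/l. We will do this for both the cumulative and delta dataframes.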
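A minimal base-R sketch of the join and conversion, assuming the VFA data are already in long format. merge() stands in for a tidyverse join, and the data, column names and molecular weights (those of acetic and propanoic acid) are illustrative assumptions; the workshop's own molecular weight file should be used instead.

```r
# Hypothetical long-format data: one row per sample and VFA, concentration in mM
vfa_long <- data.frame(
  sample_replicate = c("A_1", "A_1", "B_1", "B_1"),
  vfa = c("acetate", "propanoate", "acetate", "propanoate"),
  conc_mM = c(10, 5, 20, 2)
)

# Hypothetical molecular weight table in g/mol (values for acetic and propanoic acid)
mol_wt <- data.frame(vfa = c("acetate", "propanoate"), mw = c(60.05, 74.08))

# Join the molecular weights onto the VFA data by VFA name
vfa_long <- merge(vfa_long, mol_wt, by = "vfa")

# mM -> mol/l (* 0.001) -> g/l (* MW in g/mol)
vfa_long$conc_g_l <- vfa_long$conc_mM * 0.001 * vfa_long$mw
vfa_long
```

For example, 10 mM acetate becomes 10 * 0.001 * 60.05 = 0.6005 g/l.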
We have 8 VFAs in our dataset. PCA will allow us to plot our samples in “VFA” space so we can see whether samples cluster by treatment, time or replicate.
+
However, PCA expects a matrix with samples in rows and the variables (the VFAs) in columns. We will need to select the columns we need, pivot wider and then convert the result to a matrix.
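As a hedged base-R illustration of the shape PCA needs (not the workshop's pivot code), tapply() can pivot long data straight into a samples x VFAs matrix. The sample names, VFAs and values below are hypothetical.

```r
# Hypothetical long-format data: one row per sample and VFA
vfa_long <- data.frame(
  sample = c("s1", "s1", "s2", "s2"),
  vfa = c("acetate", "propanoate", "acetate", "propanoate"),
  conc = c(2.1, 0.4, 9.8, 1.7)
)

# Pivot wider into a samples (rows) x VFAs (columns) matrix; each sample/VFA
# pair has exactly one value, so sum() simply returns that value
vfa_mat <- tapply(vfa_long$conc, list(vfa_long$sample, vfa_long$vfa), sum)

is.matrix(vfa_mat)
vfa_mat
```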
The scale. argument tells prcomp() to scale each variable to a standard deviation of 1 (centring to a mean of 0 is already done by default through center = TRUE). The rank. argument tells prcomp() to keep only the first 4 principal components. This is useful for visualisation because we can only plot in 2 or 3 dimensions. We can see the results of the PCA by viewing the summary() of the pca object.
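The sketch below runs prcomp() with the same two arguments on random data standing in for the VFA matrix (12 hypothetical samples x 8 variables). It also recomputes the Proportion of Variance by hand, which is how summary() derives it from the component standard deviations.

```r
set.seed(1)
# Hypothetical data: 12 samples (rows) x 8 variables (columns)
vfa_mat <- matrix(rnorm(12 * 8), nrow = 12,
                  dimnames = list(paste0("sample", 1:12), paste0("vfa", 1:8)))

# scale. = TRUE gives every column unit variance (center = TRUE, the default, centres it);
# rank. = 4 keeps only the first 4 components in pca$x and pca$rotation
pca <- prcomp(vfa_mat, scale. = TRUE, rank. = 4)
summary(pca)

# pca$sdev still holds all 8 standard deviations, so the proportion of
# variance is each component's variance divided by the total variance
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 4)
```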
Importance of first k=4 (out of 8) components:
+ PC1 PC2 PC3 PC4
+Standard deviation 2.4977 0.9026 0.77959 0.45567
+Proportion of Variance 0.7798 0.1018 0.07597 0.02595
+Cumulative Proportion 0.7798 0.8816 0.95760 0.98355
+
+
+
The Proportion of Variance tells us how much of the variance is explained by each component. The first component explains 0.7798 (about 78%) of the variance, the second 0.1018 and the third 0.07597. Together, the first three components explain nearly 96% of the total variance in the data. Plotting PC1 against PC2 will capture about 88% of the variance (0.8816, the Cumulative Proportion for PC2), which is likely much better than we would get by plotting any two VFAs against each other. To plot PC1 against PC2 we need to extract the PC1 and PC2 scores from the pca object and add labels for the samples.
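The proportions in the summary can be checked from the standard deviations. Writing s_k for the Standard deviation of component k: because scale. = TRUE gives each of the 8 VFAs a variance of 1, the variances of all 8 components add up to 8.

```latex
\text{Proportion of Variance}_k = \frac{s_k^2}{\sum_{j=1}^{8} s_j^2} = \frac{s_k^2}{8},
\qquad
\frac{2.4977^2}{8} = \frac{6.2385}{8} \approx 0.7798,
\qquad
0.7798 + 0.1018 = 0.8816
```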
+
🎬 Create a dataframe of the PC1 and PC2 scores which are in pca$x and add the sample information from vfa_cummul_pca:
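A base-R sketch of this step on made-up data. Here sample_info is a hypothetical stand-in for the sample columns of vfa_cummul_pca; the key assumption is that its rows are in the same order as the rows of the matrix given to prcomp(), because pca$x keeps that order.

```r
set.seed(1)
# Hypothetical sample information, one row per sample
sample_info <- data.frame(
  treatment = rep(c("CN10", "NC"), each = 6),
  time_day = rep(c(1, 22), times = 6),
  replicate = rep(1:3, each = 2, times = 2)
)

# Hypothetical 12 x 8 data matrix in the same row order as sample_info
vfa_mat <- matrix(rnorm(12 * 8), nrow = 12)
pca <- prcomp(vfa_mat, scale. = TRUE, rank. = 4)

# Rows of pca$x match the input rows, so the scores can be bound column-wise
pca_labelled <- data.frame(sample_info, pca$x[, c("PC1", "PC2")])
head(pca_labelled)
```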
We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to 2 and use the same number of clusters for the VFAs, since we want to see which clusters of VFAs correlate with the treatments.
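heatmaply() does its own clustering internally; the sketch below only illustrates, in base R, what asking for 2 clusters means: hierarchical clustering with hclust() and then cutting the tree into k groups with cutree(), once for the samples (rows) and once for the VFAs (columns). The data and the variable name n_clusters are hypothetical.

```r
set.seed(1)
# Hypothetical scaled data: 12 samples (rows) x 8 VFAs (columns)
vfa_mat <- scale(matrix(rnorm(12 * 8), nrow = 12,
                        dimnames = list(paste0("sample", 1:12), paste0("vfa", 1:8))))

n_clusters <- 2

# Cluster the samples: distances between rows, then cut the tree into k groups
sample_clusters <- cutree(hclust(dist(vfa_mat)), k = n_clusters)

# Cluster the VFAs: transpose so the VFAs become rows
vfa_clusters <- cutree(hclust(dist(t(vfa_mat))), k = n_clusters)

table(sample_clusters)
table(vfa_clusters)
```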
+
🎬 Set the number of clusters for the treatments and vfa:
The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can use “Show in a new window” to see it at a larger size. You can also zoom in and out, pan around the heatmap and download it as a png. You might feel the colour bar is not adding much to the plot; you can remove it by setting hide_colorbar = TRUE in the heatmaply() call.
+
One of the NC replicates at time = 22 is very different from the other replicates. The CN10 treatments cluster together at high time points, whereas early on CN10 samples are more similar to NC samples. Most of the VFAs behave similarly, with the highest values later in the experiment for CN10, but isohexanoate and hexanoate differ. This might be because isohexanoate is especially low in the NC replicates at time = 1 and hexanoate is especially high in NC replicate 2 at time = 22.
+Galili, Tal, Alan O’Callaghan, Jonathan Sidi, and Carson Sievert. 2017. “Heatmaply: An R Package for Creating Interactive Cluster Heatmaps for Online Publishing.” Bioinformatics. https://doi.org/10.1093/bioinformatics/btx657.
+
+
+R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
+
+
+Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
+
+
+Xie, Yihui. 2022. “Knitr: A General-Purpose Package for Dynamic Report Generation in R.” https://yihui.org/knitr/.
+