diff --git a/omics/week-5/Rplot001.jpg b/omics/week-5/Rplot001.jpg
new file mode 100644
index 0000000..fd0b397
Binary files /dev/null and b/omics/week-5/Rplot001.jpg differ
diff --git a/omics/week-5/workshop.qmd b/omics/week-5/workshop.qmd
index af95501..28c8ba9 100644
--- a/omics/week-5/workshop.qmd
+++ b/omics/week-5/workshop.qmd
@@ -447,41 +447,58 @@ ggsave("figures/frog-s30-pca.png",
 
 ## Visualise the expression of the most significant genes using a heatmap
 
-only should do on sig genes. but use the log 2 normalised values
+A heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes, but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level.
 
+We are going to create an interactive heatmap with the **`heatmaply`** [@heatmaply] package. **`heatmaply`** takes a matrix as input so we need to convert a dataframe of the log~2~ values to a matrix. We will also set the rownames to the Xenbase gene symbols.
+
+🎬 Convert a dataframe of the log~2~ values to a matrix:
 
 ```{r}
 mat <- s30_results_sig0.01 |>
   select(starts_with("log2_")) |>
   as.matrix()
 ```
+🎬 Set the rownames to the Xenbase gene symbols:
 
 ```{r}
 rownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol
 ```
+You might want to view the matrix by clicking on it in the environment pane.
+
+
+🎬 Load the **`heatmaply`** package:
 
 ```{r}
 library(heatmaply)
 ```
 
+We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the treatments.
+
+🎬 Set the number of clusters for the treatments and genes:
+
 ```{r}
 n_treatment_clusters <- 2
 n_gene_clusters <- 2
 ```
+
+🎬 Create the heatmap:
 
 ```{r}
 #| fig-height: 8
 heatmaply(mat,
           scale = "row",
-          hide_colorbar = TRUE,
           k_col = n_treatment_clusters,
           k_row = n_gene_clusters,
-          label_names = c("Gene", "Sample", "Expression (normalised, log2)"),
           fontsize_row = 7, fontsize_col = 10,
           labCol = str_remove(colnames(mat), pattern = "log2_"),
           labRow = rownames(mat),
           heatmap_layers = theme(axis.line = element_blank()))
 ```
+
+On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples (the pink cluster) and the other shows genes downregulated (more blue, the blue cluster) in the FGF-treated samples.
+
+The heatmap will open in the viewer pane (rather than the plot pane) because it is html. You can "Show in a new window" to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a png. You might feel the colour bar is not adding much to the plot. You can remove it by setting `hide_colorbar = TRUE` in the `heatmaply()` function.
+
 ## Visualise all the results with a volcano plot
 
 colour the points if padj \< 0.05 and log2FoldChange \> 1