diff --git a/omics/week-5/Rplot001.jpg b/omics/week-5/Rplot001.jpg new file mode 100644 index 0000000..fd0b397 Binary files /dev/null and b/omics/week-5/Rplot001.jpg differ diff --git a/omics/week-5/workshop.qmd b/omics/week-5/workshop.qmd index af95501..ed7d378 100644 --- a/omics/week-5/workshop.qmd +++ b/omics/week-5/workshop.qmd @@ -447,41 +447,58 @@ ggsave("figures/frog-s30-pca.png", ## Visualise the expression of the most significant genes using a heatmap -only should do on sig genes. but use the log 2 normalised values +A heatmap is a common way to visualise gene expression data. Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level. +We are going to create an interactive heatmap with the **`heatmaply`** [@heatmaply] package. **`heatmaply`** takes a matrix as input so we need to convert a dataframe of the log~2~ values to a matrix. We will also set the rownames to the Xenbase gene symbols. + +🎬 Convert a dataframe of the log~2~ values to a matrix: ```{r} mat <- s30_results_sig0.01 |> select(starts_with("log2_")) |> as.matrix() ``` +🎬 Set the rownames to the Xenbase gene symbols: ```{r} rownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol ``` +You might want to view the matrix by clicking on it in the environment pane. + + +🎬 Load the **`heatmaply`** package: ```{r} library(heatmaply) ``` +We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the treatments to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the treatments. 
+ +🎬 Set the number of clusters for the treatments and genes: + ```{r} n_treatment_clusters <- 2 n_gene_clusters <- 2 ``` + +🎬 Create the heatmap: ```{r} #| fig-height: 8 heatmaply(mat, scale = "row", - hide_colorbar = TRUE, k_col = n_treatment_clusters, k_row = n_gene_clusters, - label_names = c("Gene", "Sample", "Expression (normalised, log2)"), fontsize_row = 7, fontsize_col = 10, labCol = str_remove(colnames(mat), pattern = "log2_"), labRow = rownames(mat), heatmap_layers = theme(axis.line = element_blank())) ``` + +On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are samples. We can see that the FGF-treated samples cluster together and the control samples cluster together. We can also see two clusters of genes; one of these shows genes upregulated (more yellow) in the FGF-treated samples (the pink cluster) and the other shows genes downregulated (more blue, the blue cluster) in the FGF-treated samples. + +The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can "Show in a new window" to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a PNG. You might feel the colour bar is not adding much to the plot. You can remove it by setting `hide_colorbar = TRUE,` in the `heatmaply()` function. + ## Visualise all the results with a volcano plot colour the points if padj \< 0.05 and log2FoldChange \> 1 @@ -583,7 +600,7 @@ prog_hspc_results <- read_csv("results/prog_hspc_results.csv") ``` 🎬 Remind yourself what is in the rows and columns and the structure of -the dataframes (perhaps using `glimpse()`) +the dataframe (perhaps using `glimpse()`) ```{r} #| include: false @@ -838,12 +855,12 @@ ggsave("figures/prog_hspc-pca.png", ## Visualise the expression of the most significant genes using a heatmap -```{r} -library(heatmaply) -``` +A heatmap is a common way to visualise gene expression data. 
Often people will create heatmaps with thousands of genes but it can be more informative to use a subset along with clustering methods. We will use the genes which are significant at the 0.01 level. + +We are going to create an interactive heatmap with the **`heatmaply`** [@heatmaply] package. **`heatmaply`** takes a matrix as input so we need to convert a dataframe of the log~2~ values to a matrix. We will also set the rownames to the gene names. -we will use the most significant genes on a random subset of the cells -since \~1500 columns is a lot + +🎬 Convert a dataframe of the log~2~ values to a matrix. I have used `sample()` to select 70 random columns so the heatmap is generated quickly: ```{r} mat <- prog_hspc_results_sig0.01 |> @@ -852,32 +869,47 @@ mat <- prog_hspc_results_sig0.01 |> as.matrix() ``` + +🎬 Set the row names to the gene names: + ```{r} rownames(mat) <- prog_hspc_results_sig0.01$external_gene_name ``` +You might want to view the matrix by clicking on it in the environment pane. + +🎬 Load the **`heatmaply`** package: +```{r} +library(heatmaply) +``` + +We need to tell the clustering algorithm how many clusters to create. We will set the number of clusters for the cell types to be 2 and the number of clusters for the genes to be the same since it makes sense to see what clusters of genes correlate with the cell types. + ```{r} n_cell_clusters <- 2 n_gene_clusters <- 2 ``` + +🎬 Create the heatmap: + ```{r} heatmaply(mat, scale = "row", - hide_colorbar = TRUE, k_col = n_cell_clusters, k_row = n_gene_clusters, - label_names = c("Gene", "Cell id", "Expression (normalised, log2)"), fontsize_row = 7, fontsize_col = 10, labCol = colnames(mat), labRow = rownames(mat), heatmap_layers = theme(axis.line = element_blank())) ``` -will take a few mins to run, and longer to appear in the viewer -separation is not as strong as for the frog data run a few times to see -different subset +It will take a minute to run and display. 
On the vertical axis are genes which are differentially expressed at the 0.01 level. On the horizontal axis are cells. We can see that cells of the same type don't cluster that well together. We can also see two clusters of genes but the pattern of genes is not as clear as it was for the frogs and the correspondence with the cell clusters is not as strong. + +The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can "Show in a new window" to see it in a larger format. You can also zoom in and out and pan around the heatmap and download it as a PNG. You might feel the colour bar is not adding much to the plot. You can remove it by setting `hide_colorbar = TRUE,` in the `heatmaply()` function. + +Using all the cells is worth doing but it will take a while to generate the heatmap and then show in the viewer so do it sometime when you're ready for a coffee break. ## Visualise all the results with a volcano plot