Merge pull request #25 from 3mmaRand/omics-03-heatmaps
Omics 03 heatmaps
3mmaRand committed Oct 24, 2023
2 parents 17ff1ba + 2cb0568 commit 13f0fa6
Showing 2 changed files with 46 additions and 14 deletions.
Binary file added omics/week-5/Rplot001.jpg
omics/week-5/workshop.qmd: 46 additions and 14 deletions (60 changes)
@@ -447,41 +447,58 @@ ggsave("figures/frog-s30-pca.png",

## Visualise the expression of the most significant genes using a heatmap

A heatmap is a common way to visualise gene expression data. People often create heatmaps with thousands of genes, but it can be more informative to use a subset along with clustering methods. We will use only the genes that are significant at the 0.01 level and plot their log~2~ normalised values.

We are going to create an interactive heatmap with the **`heatmaply`** [@heatmaply] package. **`heatmaply`** takes a matrix as input, so we need to convert a dataframe of the log~2~ values to a matrix. We will also set the row names to the Xenbase gene symbols.

🎬 Convert a dataframe of the log~2~ values to a matrix:
```{r}
# Keep only the log2 expression columns and convert them to a numeric matrix
mat <- s30_results_sig0.01 |>
  select(starts_with("log2_")) |>
  as.matrix()
```

🎬 Set the rownames to the Xenbase gene symbols:
```{r}
rownames(mat) <- s30_results_sig0.01$xenbase_gene_symbol
```
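To see what these two steps produce without the workshop data, here is a small sketch on a made-up data frame. The column names (`gene_symbol`, `padj`, `log2_ctrl_1`, `log2_fgf_1`) and values are hypothetical, and base R's `startsWith()` stands in for `select(starts_with())` so it runs without extra packages.

```r
# Hypothetical toy results: gene labels, a p-value column and two log2 columns
toy_results <- data.frame(
  gene_symbol = c("geneA", "geneB", "geneC"),
  padj        = c(0.001, 0.004, 0.009),
  log2_ctrl_1 = c(5.2, 7.9, 3.1),
  log2_fgf_1  = c(8.4, 6.0, 3.3)
)

# Keep only the columns whose names start with "log2_" and convert to a matrix
toy_mat <- as.matrix(toy_results[, startsWith(names(toy_results), "log2_")])

# Label each row with its gene symbol so the heatmap rows are labelled by gene
rownames(toy_mat) <- toy_results$gene_symbol

toy_mat
```

The result is a numeric matrix with one row per gene and one column per sample, which is the shape `heatmaply()` expects.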

You might want to view the matrix by clicking on it in the environment pane.


🎬 Load the **`heatmaply`** package:
```{r}
library(heatmaply)
```

We need to tell **`heatmaply`** how many clusters to show in each dendrogram. We will set the number of treatment clusters to 2 because there are two treatments (FGF and control). We will use the same number of gene clusters because we want to see which groups of genes correspond to the treatments.

🎬 Set the number of clusters for the treatments and genes:

```{r}
n_treatment_clusters <- 2
n_gene_clusters <- 2
```
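`k_col` and `k_row` set how many groups each dendrogram is cut into and coloured. A rough base-R sketch of the idea on a made-up matrix (the gene and sample names are hypothetical) is below. It assumes Euclidean distance with `hclust()`, which we believe are **`heatmaply`**'s default `distfun` and `hclustfun` (check `?heatmaply`); it is meant to show the idea, not to reproduce **`heatmaply`**'s exact output.

```r
# Hypothetical matrix: 6 genes (rows) x 4 samples (columns)
toy_expr <- matrix(
  c(1, 1, 5, 5,
    2, 1, 6, 5,
    1, 2, 5, 6,
    5, 6, 1, 2,
    6, 5, 2, 1,
    5, 5, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6),
                  c("ctrl_1", "ctrl_2", "fgf_1", "fgf_2"))
)

# Cluster samples: dist() compares rows, so transpose to compare columns.
# k = 2 matches n_treatment_clusters
col_clusters <- cutree(hclust(dist(t(toy_expr))), k = 2)

# Cluster genes (rows); k = 2 matches n_gene_clusters
row_clusters <- cutree(hclust(dist(toy_expr)), k = 2)

col_clusters
row_clusters
```

Here the two control samples fall in one cluster and the two FGF samples in the other, and the genes split into the three with high FGF values and the three with low FGF values.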


🎬 Create the heatmap:
```{r}
#| fig-height: 8
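# Interactive, clustered heatmap of the genes significant at the 0.01 level
heatmaply(mat,
          scale = "row",                   # centre and scale each gene (row)
          hide_colorbar = TRUE,            # hide the colour bar
          k_col = n_treatment_clusters,    # groups shown in the sample dendrogram
          k_row = n_gene_clusters,         # groups shown in the gene dendrogram
          label_names = c("Gene", "Sample", "Expression (normalised, log2)"),
          fontsize_row = 7, fontsize_col = 10,
          labCol = str_remove(colnames(mat), pattern = "log2_"),  # drop prefix
          labRow = rownames(mat),
          heatmap_layers = theme(axis.line = element_blank()))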
```
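`labCol` only changes the labels shown on the plot, not the matrix itself. `str_remove(colnames(mat), pattern = "log2_")` drops the `log2_` prefix from each column name. A base-R equivalent on some hypothetical column names is sketched below; note that the `^` anchors the match to the start of the name, whereas `str_remove()` removes the first match anywhere, which gives the same result for names like these.

```r
# Hypothetical column names in the same style as the log2 matrix
sample_cols <- c("log2_ctrl_1", "log2_ctrl_2", "log2_fgf_1")

# Remove the "log2_" prefix from the start of each name
sample_labels <- sub("^log2_", "", sample_cols)

sample_labels
```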


On the vertical axis are the genes that are differentially expressed at the 0.01 level, and on the horizontal axis are the samples. The FGF-treated samples cluster together, as do the control samples. There are also two clusters of genes: one (the pink cluster) contains genes that are upregulated (more yellow) in the FGF-treated samples, and the other (the blue cluster) contains genes that are downregulated (more blue) in the FGF-treated samples.
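Because `scale = "row"` centres and scales each gene, the colours show each gene's expression relative to its own average across the samples rather than its absolute level. A minimal base-R sketch of row scaling on a made-up matrix (hypothetical values; not **`heatmaply`**'s internal code):

```r
# Hypothetical log2 values: 2 genes x 4 samples with very different baselines
toy_log2 <- matrix(
  c( 2,  2,  4,  4,
    10, 10, 12, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("low_gene", "high_gene"),
                  c("ctrl_1", "ctrl_2", "fgf_1", "fgf_2"))
)

# scale() centres and scales columns, so transpose, scale, and transpose back
row_scaled <- t(scale(t(toy_log2)))

round(row_scaled, 3)
```

After scaling, both genes get identical values because they have the same up-in-FGF pattern, even though one is expressed at a much higher level. Without row scaling, the highly expressed gene would dominate the colour range.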

The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can click "Show in a new window" to see it in a larger format. You can also zoom in and out, pan around the heatmap and download it as a PNG. The colour bar adds little to this plot, so the code above hides it with `hide_colorbar = TRUE`; set this to `FALSE` if you want to see it.

## Visualise all the results with a volcano plot

colour the points if padj \< 0.05 and log2FoldChange \> 1
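A sketch of that colouring rule on a made-up results table is below. The column names `padj` and `log2FoldChange` follow the note above; the genes and values are hypothetical. It adds a logical column and maps it to colour in a base-R volcano plot.

```r
# Hypothetical results table
toy_res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, 0.4, 1.8, -3.0),
  padj           = c(0.001, 0.002, 0.2, 0.01)
)

# TRUE only when both thresholds in the note are met
toy_res$highlight <- toy_res$padj < 0.05 & toy_res$log2FoldChange > 1

# Volcano plot: fold change against -log10 adjusted p-value
plot(toy_res$log2FoldChange, -log10(toy_res$padj),
     col = ifelse(toy_res$highlight, "red", "grey50"), pch = 19,
     xlab = "log2 fold change", ylab = "-log10(padj)")
```

As written, the rule highlights only upregulated genes: `g4` is significant with a large negative fold change but is not coloured.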
@@ -583,7 +600,7 @@ prog_hspc_results <- read_csv("results/prog_hspc_results.csv")
```

🎬 Remind yourself what is in the rows and columns and the structure of
the dataframe (perhaps using `glimpse()`)

```{r}
#| include: false
@@ -838,12 +855,12 @@ ggsave("figures/prog_hspc-pca.png",

## Visualise the expression of the most significant genes using a heatmap

A heatmap is a common way to visualise gene expression data. People often create heatmaps with thousands of genes, but it can be more informative to use a subset along with clustering methods. We will use the genes that are significant at the 0.01 level.

We are going to create an interactive heatmap with the **`heatmaply`** [@heatmaply] package. **`heatmaply`** takes a matrix as input, so we need to convert a dataframe of the log~2~ values to a matrix. We will also set the row names to the gene names.

🎬 Convert a dataframe of the log~2~ values to a matrix. We will use the most significant genes but only a random subset of the cells, because about 1500 columns is a lot for a heatmap. I have used `sample()` to select 70 random columns so that the heatmap is generated quickly:

```{r}
mat <- prog_hspc_results_sig0.01 |>
@@ -852,32 +869,47 @@ mat <- prog_hspc_results_sig0.01 |>
as.matrix()
```
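The column-sampling step is not visible in this view of the diff. One way to pick 70 random columns with `sample()` is sketched below on a made-up matrix; the object names, sizes and seed are hypothetical. Setting a seed with `set.seed()` makes the subset reproducible; without it, each run selects a different subset.

```r
# Hypothetical matrix: 20 genes x 1500 cells of log2 values
n_genes <- 20
n_cells <- 1500
toy_cells <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)),
                                    paste0("cell", seq_len(n_cells))))

set.seed(42)  # optional: remove to get a different subset on each run

# Choose 70 column indices at random, without replacement
keep <- sample(ncol(toy_cells), size = 70)
toy_subset <- toy_cells[, keep]

dim(toy_subset)
```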


🎬 Set the row names to the gene names:

```{r}
rownames(mat) <- prog_hspc_results_sig0.01$external_gene_name
```

You might want to view the matrix by clicking on it in the environment pane.

🎬 Load the **`heatmaply`** package:
```{r}
library(heatmaply)
```

We need to tell **`heatmaply`** how many clusters to show in each dendrogram. We will set the number of cell clusters to 2 because there are two cell types. We will use the same number of gene clusters because we want to see which groups of genes correspond to the cell types.

🎬 Set the number of clusters for the cell types and genes:

```{r}
n_cell_clusters <- 2
n_gene_clusters <- 2
```


🎬 Create the heatmap:

```{r}
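# Interactive, clustered heatmap of the genes significant at the 0.01 level
heatmaply(mat,
          scale = "row",                   # centre and scale each gene (row)
          hide_colorbar = TRUE,            # hide the colour bar
          k_col = n_cell_clusters,         # groups shown in the cell dendrogram
          k_row = n_gene_clusters,         # groups shown in the gene dendrogram
          label_names = c("Gene", "Cell id", "Expression (normalised, log2)"),
          fontsize_row = 7, fontsize_col = 10,
          labCol = colnames(mat),
          labRow = rownames(mat),
          heatmap_layers = theme(axis.line = element_blank()))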
```

It will take a minute or so to run and display. On the vertical axis are the genes that are differentially expressed at the 0.01 level, and on the horizontal axis are the cells. Cells of the same type do not cluster together very well. There are also two clusters of genes, but the pattern of gene expression is not as clear as it was for the frogs, and the correspondence with the cell clusters is not as strong. Because the columns are a random sample of the cells, you may get a different subset each time you run the code; try running it a few times to see how the clustering changes.

The heatmap will open in the viewer pane (rather than the plot pane) because it is HTML. You can click "Show in a new window" to see it in a larger format. You can also zoom in and out, pan around the heatmap and download it as a PNG. The colour bar adds little to this plot, so the code above hides it with `hide_colorbar = TRUE`; set this to `FALSE` if you want to see it.

Using all the cells is worth doing, but generating the heatmap and displaying it in the viewer will take a while, so do it when you're ready for a coffee break.

## Visualise all the results with a volcano plot
