diff --git a/_site/omics/week-5/images/Xenbase-Logo-Medium.png b/_site/omics/week-5/images/Xenbase-Logo-Medium.png
new file mode 100644
index 0000000..2121bb0
Binary files /dev/null and b/_site/omics/week-5/images/Xenbase-Logo-Medium.png differ
diff --git a/_site/omics/week-5/meta/xenbase_info.xlsx b/_site/omics/week-5/meta/xenbase_info.xlsx
new file mode 100644
index 0000000..940c933
Binary files /dev/null and b/_site/omics/week-5/meta/xenbase_info.xlsx differ
diff --git a/_site/omics/week-5/study_before_workshop.html b/_site/omics/week-5/study_before_workshop.html
new file mode 100644
index 0000000..1c44600
--- /dev/null
+++ b/_site/omics/week-5/study_before_workshop.html
@@ -0,0 +1,958 @@

Data Analysis for Group Project - Independent Study to prepare for workshop
Independent Study to prepare for workshop

Omics 3: Visualising and Interpreting

Emma Rand

23 October, 2023

Overview

In these slides we will:

  • Check where you are
  • Learn some concepts used in omics visualisation:
    • Principal Component Analysis (PCA)
    • Volcano plots
    • Heatmaps
  • Find out what packages to install before the workshop

Where should you be?

What we did in Omics 2: Statistical Analysis

  • Carried out differential expression analysis
  • Found genes that were not expressed at all, or were expressed in only one group
  • Saved the results files

Where should you be?

After the Omics 2: 👋 Statistical Analysis Workshop (including 🤗 Look after future you!) and the Independent Study to consolidate, you should have:

🐸 Frogs

  • An RStudio Project called frogs-88H which contains:
    • Raw data (S14, S20 and S30)
    • Processed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)
    • Results files (s30_fgf_only.csv, S30_normalised_counts.csv, S30_results.csv and equivalents for S14 OR S20)
    • Two scripts called cont-fgf-s30.R and either cont-fgf-s20.R OR cont-fgf-s14.R

Files should be organised into folders. Code should be well commented and easy to read.

🐭 Mice

  • An RStudio Project called mice-88H which contains:
    • Raw data (hspc, prog, lthsc)
    • Processed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv, lthsc_summary_gene.csv, lthsc_summary_samp.csv)
    • Results files (prog_hspc_results.csv and an equivalent for lthsc vs prog OR hspc vs lthsc)
    • Two scripts called hspc-prog.R and either hspc-lthsc.R OR prog-lthsc.R

Files should be organised into folders. Code should be well commented and easy to read.

🐂

Either of the other examples.

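If you do not have those

Go through:

  • Omics 2: Statistical Analysis, including 🤗 Look after future you!
  • The Independent Study to consolidate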

Examine the results files

Examine the results files

Remind yourself of the key columns in your results files:

  • a fold change, logged to base 2
  • an unadjusted p-value
  • a p-value adjusted for multiple testing (FDR or padj)
  • a gene id

🐸 Frogs

Rows: 10,136
Columns: 7
$ baseMean        <dbl> 237.553928, 531.565700, 86.392830, 49.813502, 419.9983…
$ log2FoldChange  <dbl> 0.096601855, -0.089588528, -0.192811203, -0.008858703,…
$ lfcSE           <dbl> 0.2079396, 0.1557384, 0.3253216, 0.4342614, 0.1685420,…
$ stat            <dbl> 0.46456683, -0.57525007, -0.59267874, -0.02039947, -0.…
$ pvalue          <dbl> 0.64224169, 0.56512218, 0.55339617, 0.98372471, 0.8699…
$ padj            <dbl> 0.9998970, 0.9998970, 0.9998970, 0.9998970, 0.9998970,…
$ xenbase_gene_id <chr> "XB-GENE-1000007", "XB-GENE-1000023", "XB-GENE-1000062…

  • baseMean is the mean of the normalised counts for the gene across all samples
  • lfcSE is the standard error of the log2 fold change
  • stat is the test statistic (the Wald statistic)
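
The first four rows above suggest how these columns relate to each other, assuming the results come from DESeq2's default Wald test. stat is log2FoldChange divided by lfcSE, and pvalue is the two-sided tail probability of stat under a standard normal distribution. The base R check below uses only the values printed in the output above.

```r
# First four log2FoldChange and lfcSE values from the frog results above
log2fc <- c(0.096601855, -0.089588528, -0.192811203, -0.008858703)
lfc_se <- c(0.2079396, 0.1557384, 0.3253216, 0.4342614)

# Wald statistic: the estimate divided by its standard error
stat <- log2fc / lfc_se

# Two-sided p-value from the standard normal distribution
pvalue <- 2 * pnorm(-abs(stat))

# Compare with the stat and pvalue columns above
round(stat, 4)
round(pvalue, 4)
```
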

🐭 Mice

Rows: 280
Columns: 6
$ Top             <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ p.value         <dbl> 7.038138e-117, 4.736622e-90, 1.832630e-88, 4.211954e-7…
$ FDR             <dbl> 1.970679e-114, 6.631271e-88, 1.710455e-86, 2.948368e-7…
$ summary.logFC   <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…
$ logFC.hspc      <dbl> 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.…
$ ensembl_gene_id <chr> "ENSMUSG00000028639", "ENSMUSG00000024053", "ENSMUSG00…

  • Top is the rank of the gene when genes are ordered by p-value (smallest first)
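
For the three top-ranked mouse genes, each FDR value equals p.value × 280 / Top. This is what a Benjamini-Hochberg adjustment over the 280 genes in the file gives. Base R's p.adjust() with method = "BH" reproduces those three values. Its n argument is set to 280 because only three p-values are passed in. This checks the printed numbers; it is not a statement about how the file was generated.

```r
# p.value for the three top-ranked mouse genes shown above
p <- c(7.038138e-117, 4.736622e-90, 1.832630e-88)

# Benjamini-Hochberg adjustment; n = 280 is the number of rows in the file,
# needed here because only the first three p-values are supplied
fdr <- p.adjust(p, method = "BH", n = 280)

# Compare with the FDR column above
signif(fdr, 7)
```
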

Adding gene information

Adding gene information

  • Gene ids are difficult to interpret in plots and tables
  • We therefore add information such as the gene name and a description to the results
  • For the 🐸 frog data, this information comes from Xenbase
  • For the 🐭 mouse data, this information comes from Ensembl

🐸 Xenbase

[Image: Xenbase logo]

Xenbase (http://www.xenbase.org/) is a model organism database that provides genomic, molecular, and developmental biology information about Xenopus laevis and Xenopus tropicalis.

It took me some time to find the information you need.

🐸 Xenbase

  • I got the information from the Xenbase information pages (https://www.xenbase.org/xenbase/static-xenbase/ftpDatafiles.jsp) under Data Reports | Gene Information
  • It is listed as: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)
  • Click on the readme link to see the file format and columns
  • I downloaded xenbase.gpi.gz, unzipped it, removed the header lines and the Xenopus tropicalis (taxon:8364) entries, and saved it as xenbase_info.xlsx
  • In the workshop you will merge this information with the results file
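
The preparation and merge steps described above can be sketched in base R. This is a minimal sketch on a few made-up tab-separated lines. The five-column layout (database, gene id, symbol, name, taxon) is a simplified assumption for illustration, so check the readme for the real GPI columns. The gene symbols and names are hypothetical. The two results rows use the first two xenbase_gene_id, log2FoldChange and padj values from the frog results.

```r
# Made-up lines in a simplified, GPI-like tab-separated layout (hypothetical).
# Header lines start with "!"; taxon:8364 is Xenopus tropicalis.
gpi_lines <- c(
  "! header line (hypothetical)",
  "Xenbase\tXB-GENE-1000007\tgeneA\tgene A\ttaxon:8355",
  "Xenbase\tXB-GENE-1000023\tgeneB\tgene B\ttaxon:8355",
  "Xenbase\tXB-GENE-9999999\tgeneC\tgene C\ttaxon:8364"
)

# 1. Remove the header lines
body <- gpi_lines[!startsWith(gpi_lines, "!")]

# 2. Read the remaining lines as a tab-separated table
info <- read.delim(
  text = paste(body, collapse = "\n"),
  header = FALSE,
  col.names = c("db", "xenbase_gene_id", "symbol", "name", "taxon"),
  colClasses = "character",
  quote = ""
)

# 3. Drop the Xenopus tropicalis (taxon:8364) entries
info <- info[info$taxon != "taxon:8364", ]

# First two rows of the frog results
results <- data.frame(
  xenbase_gene_id = c("XB-GENE-1000007", "XB-GENE-1000023"),
  log2FoldChange = c(0.096601855, -0.089588528),
  padj = c(0.9998970, 0.9998970)
)

# 4. Left join: keep every results row and add symbol and name where ids match
annotated <- merge(
  results,
  info[, c("xenbase_gene_id", "symbol", "name")],
  by = "xenbase_gene_id",
  all.x = TRUE
)
annotated
```

With the tidyverse, dplyr::left_join(results, info, by = "xenbase_gene_id") performs the same kind of join.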

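🐭 Ensembl

  • Ensembl (https://www.ensembl.org/index.html) creates, integrates and distributes reference datasets and analysis tools that enable genomics
  • BioMart provides access to these large datasets
  • biomaRt is a Bioconductor package that gives you programmatic access to BioMart
  • In the workshop you will use this package to get gene information that you can merge with the results file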

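Plots

Purpose of the plots

  • Dimension reduction: summarising many variables (genes) in a few dimensions that can be plotted

PCA

  • Principal Component Analysis (PCA) is useful when there are lots of variables

t-SNE

  • t-SNE is useful when there are lots of variables and lots of observations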
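
Here is a minimal PCA sketch in base R using prcomp(). It assumes a small made-up count matrix with genes in rows and samples in columns, arranged as two groups of three samples. Samples are the observations and genes are the variables, so the matrix is transposed first. The counts are log2 transformed and not scaled. This illustrates the technique and is not necessarily the exact approach taken in the workshop.

```r
# Made-up counts: 5 genes (rows) x 6 samples (columns), two groups
counts <- matrix(
  c( 10,  12,  11, 100, 110,  95,
    200, 190, 210,  20,  25,  22,
     50,  55,  52,  48,  51,  53,
      5,   6,   4,  60,  55,  58,
    300, 310, 290, 305, 295, 300),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5),
                  c("ctrl1", "ctrl2", "ctrl3", "trt1", "trt2", "trt3"))
)

# Log transform, then put samples in rows so each sample is one observation
log_counts <- log2(counts + 1)
pca <- prcomp(t(log_counts))  # centred by default, not scaled

# Proportion of the total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# PC1 and PC2 scores for each sample: these are the points on a PCA plot
pca$x[, c("PC1", "PC2")]
```

Here almost all the variance lies on PC1, which separates the two groups.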

Normalising before plotting

Normalising before plotting

  • Count data are usually log transformed before plotting
  • The regularised log (rlog) transformation is a refinement that reduces the exaggerated variability of low-count genes on the log scale. https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/03_DGE_QC_analysis.html gives a good explanation of rlog
  • The rlog transformation of the normalised counts is only needed for visualisation during quality assessment. It is not used for the differential expression (DE) analysis because DESeq2 takes care of that itself
  • In the workshop we will just log transform the counts
  • The 🐭 mouse data have been normalised to simplify the analysis for you; the 🐸 frog data have not, but the DE method will do this for you
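
Here is a minimal sketch of the simple log transformation, assuming a small made-up count matrix. Adding 1 before taking logs (a pseudocount) avoids log2(0) = -Inf for genes with zero counts. The regularised alternative, DESeq2::rlog(), is not shown.

```r
# Made-up counts: 3 genes (rows) x 2 samples (columns), including one zero
counts <- matrix(
  c(0, 7, 1023,   # sample s1
    3, 15, 511),  # sample s2
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

log2(counts)                    # the zero count becomes -Inf, which cannot be plotted
log_counts <- log2(counts + 1)  # pseudocount of 1: a zero count maps to 0
log_counts
```
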

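Packages to install before the workshop

Install heatmaply and ggrepel from CRAN in the normal way:

# heatmaply makes interactive heatmaps; ggrepel stops text labels on ggplot2 plots overlapping
install.packages("heatmaply")
install.packages("ggrepel")

Install biomaRt from Bioconductor using BiocManager:

# Install BiocManager first if you do not already have it
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")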
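
Before the workshop, you can check which of these packages are already available. The sketch below uses base R's requireNamespace(), which returns FALSE instead of throwing an error when a package is not installed.

```r
# Packages needed for the workshop
pkgs <- c("heatmaply", "ggrepel", "biomaRt")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of any packages that still need installing
pkgs[!installed]
```
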

Workshops

Workshops

  • Omics 1: Hello data. Getting to know the data and checking the distributions of values
  • Omics 2: Statistical Analysis. Identifying which genes are differentially expressed between treatments
  • Omics 3: Visualising and Interpreting. PCA, volcano plots and heatmaps to visualise results; interpreting the results and finding out more about genes of interest
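
A volcano plot puts the log2 fold change on the x-axis and -log10 of the adjusted p-value on the y-axis. Genes with large changes and small adjusted p-values therefore sit in the upper corners. Below is a minimal base R sketch with made-up values. The thresholds (padj < 0.05 and |log2 fold change| > 1) are common conventions assumed here, not necessarily the ones used in the workshop.

```r
# Made-up results for five genes
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -3.1, 0.2, 1.4, -0.5),
  padj = c(1e-10, 1e-6, 0.8, 0.03, 0.01)
)

# y-axis values: larger means a smaller adjusted p-value
res$neg_log10_padj <- -log10(res$padj)

# Flag genes that pass both (assumed) thresholds
res$significant <- res$padj < 0.05 & abs(res$log2FoldChange) > 1

# Basic volcano plot with significant genes in red and dashed threshold lines
plot(res$log2FoldChange, res$neg_log10_padj,
     col = ifelse(res$significant, "red", "grey40"), pch = 19,
     xlab = "log2 fold change", ylab = "-log10(adjusted p-value)")
abline(h = -log10(0.05), v = c(-1, 1), lty = 2)
```
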

+ + + + + \ No newline at end of file diff --git a/_site/search.json b/_site/search.json index 20eaa12..1e7152a 100644 --- a/_site/search.json +++ b/_site/search.json @@ -1417,5 +1417,166 @@ "Week 4: Statistical Analysis", "Prepare!" ] + }, + { + "objectID": "omics/week-5/study_before_workshop.html#overview", + "href": "omics/week-5/study_before_workshop.html#overview", + "title": "Independent Study to prepare for workshop", + "section": "Overview", + "text": "Overview\nIn these slides we will:\n\n\nCheck where you are\n\nlearn some concepts used omics visualisation\n\nPrinciple Component Analysis (PCA)\nVolcano plots\nHeatmaps\n\n\nFind out what packages to install before the workshop" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis", + "href": "omics/week-5/study_before_workshop.html#what-we-did-in-omics-2-statistical-analysis", + "title": "Independent Study to prepare for workshop", + "section": "What we did in Omics 2: Statistical Analysis", + "text": "What we did in Omics 2: Statistical Analysis\n\n\ncarried out differential expression analysis\nfound genes not expressed at all, or expressed in one group only\nSaved results files" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#where-should-you-be-1", + "href": "omics/week-5/study_before_workshop.html#where-should-you-be-1", + "title": "Independent Study to prepare for workshop", + "section": "Where should you be?", + "text": "Where should you be?\nAfter the Omics 2: šŸ‘‹ Statistical Analysis Workshop including:\n\nšŸ¤— Look after future you! 
and\nthe Independent Study to consolidate, you should have:" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#frogs", + "href": "omics/week-5/study_before_workshop.html#frogs", + "title": "Independent Study to prepare for workshop", + "section": "šŸø Frogs", + "text": "šŸø Frogs\n\n\nAn RStudio Project called frogs-88H which contains:\n\nRaw data (S14, S20 and S30)\nProcessed data (s30_filtered.csv, s30_summary_gene.csv, s30_summary_gene_filtered.csv, s30_summary_samp.csv and equivalents for S14 OR S20)\nResults files (s30_fgf_only.csv, S30_normalised_counts.csv, S30_results.csv and equivalents for S14 OR S20)\n\nTwo scripts called cont-fgf-s30.R and either cont-fgf-s20.R OR cont-fgf-s14.R\n\n\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read." + }, + { + "objectID": "omics/week-5/study_before_workshop.html#mice", + "href": "omics/week-5/study_before_workshop.html#mice", + "title": "Independent Study to prepare for workshop", + "section": "šŸ­ Mice", + "text": "šŸ­ Mice\n\n\nAn RStudio Project called mice-88H which contains\n\nRaw data (hspc, prog, lthsc)\nProcessed data (hspc_summary_gene.csv, hspc_summary_samp.csv, prog_summary_gene.csv, prog_summary_samp.csv, lthsc_summary_gene.csv, lthsc_summary_samp.csv)\n\n\nResults files (prog_hspc_results.csv and an equivalent for lthsc vs prog or hspc vs lthsc)\nTwo scripts called hspc-prog.R and either hspc-lthsc.R OR prog-lthsc.R\n\n\n\nFiles should be organised into folders. Code should well commented and easy to read." + }, + { + "objectID": "omics/week-5/study_before_workshop.html#section", + "href": "omics/week-5/study_before_workshop.html#section", + "title": "Independent Study to prepare for workshop", + "section": "šŸ‚", + "text": "šŸ‚\nEither of the other examples." 
+ }, + { + "objectID": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those", + "href": "omics/week-5/study_before_workshop.html#if-you-do-not-have-those", + "title": "Independent Study to prepare for workshop", + "section": "If you do not have those", + "text": "If you do not have those\nGo through:\n\nOmics 2: Statistical Analysis including:\nšŸ¤— Look after future you! and\nthe Independent Study to consolidate" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#examine-the-results-files-1", + "href": "omics/week-5/study_before_workshop.html#examine-the-results-files-1", + "title": "Independent Study to prepare for workshop", + "section": "Examine the results files", + "text": "Examine the results files\nRemind yourself of the key columns you have in the results files:\n\na fold change, logged to base 2\nan unadjusted p-value\na p value adjusted for multiple testing (FDR or padj)\na gene id" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#frogs-1", + "href": "omics/week-5/study_before_workshop.html#frogs-1", + "title": "Independent Study to prepare for workshop", + "section": "šŸø Frogs", + "text": "šŸø Frogs\n\n\nRows: 10,136\nColumns: 7\n$ baseMean 237.553928, 531.565700, 86.392830, 49.813502, 419.9983ā€¦\n$ log2FoldChange 0.096601855, -0.089588528, -0.192811203, -0.008858703,ā€¦\n$ lfcSE 0.2079396, 0.1557384, 0.3253216, 0.4342614, 0.1685420,ā€¦\n$ stat 0.46456683, -0.57525007, -0.59267874, -0.02039947, -0.ā€¦\n$ pvalue 0.64224169, 0.56512218, 0.55339617, 0.98372471, 0.8699ā€¦\n$ padj 0.9998970, 0.9998970, 0.9998970, 0.9998970, 0.9998970,ā€¦\n$ xenbase_gene_id \"XB-GENE-1000007\", \"XB-GENE-1000023\", \"XB-GENE-1000062ā€¦\n\n\n\n\nbaseMean is the mean of the normalised counts for the gene across all samples\n\nlfcSE standard error of the fold change\n\nstat is the test statistic (the Wald statistic)" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#mice-1", + "href": 
"omics/week-5/study_before_workshop.html#mice-1", + "title": "Independent Study to prepare for workshop", + "section": "šŸ­ Mice", + "text": "šŸ­ Mice\n\n\nRows: 280\nColumns: 6\n$ Top 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,ā€¦\n$ p.value 7.038138e-117, 4.736622e-90, 1.832630e-88, 4.211954e-7ā€¦\n$ FDR 1.970679e-114, 6.631271e-88, 1.710455e-86, 2.948368e-7ā€¦\n$ summary.logFC 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.ā€¦\n$ logFC.hspc 1.596910, 3.035165, 3.261056, -2.146491, -3.056730, 3.ā€¦\n$ ensembl_gene_id \"ENSMUSG00000028639\", \"ENSMUSG00000024053\", \"ENSMUSG00ā€¦\n\n\n\nTop is the rank of the gene ordered by the p value (smallest first)" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#from-xenbase", + "href": "omics/week-5/study_before_workshop.html#from-xenbase", + "title": "Independent Study to prepare for workshop", + "section": "from xenbase", + "text": "from xenbase\n\nxenbase logoXenbase (http://www.xenbase.org/, RRID:SCR_003280)\nXenbase is a model organism database that provides genomic, molecular, and developmental biology information about Xenopus laevis and Xenopus tropicalis. 
Xenbase is funded by the National Institutes of Health (NIH) and the National Science Foundation (NSF).\nour data gives the xenbase gene id so we are using xenbase to get the information a lot of the information would also be in the ncbi" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#from-the-ncbi", + "href": "omics/week-5/study_before_workshop.html#from-the-ncbi", + "title": "Independent Study to prepare for workshop", + "section": "from the ncbi", + "text": "from the ncbi\nbiomart is a package that allows you to get information from the ncbi database such as gene names and descriptions" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#plots-purpose", + "href": "omics/week-5/study_before_workshop.html#plots-purpose", + "title": "Independent Study to prepare for workshop", + "section": "plots purpose", + "text": "plots purpose\ndimsenion reduction" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#pca", + "href": "omics/week-5/study_before_workshop.html#pca", + "title": "Independent Study to prepare for workshop", + "section": "pca", + "text": "pca\nlots of variables" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#tsne", + "href": "omics/week-5/study_before_workshop.html#tsne", + "title": "Independent Study to prepare for workshop", + "section": "tsne", + "text": "tsne\nlots of variables and lots of observations" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#normalising-before-plotting", + "href": "omics/week-5/study_before_workshop.html#normalising-before-plotting", + "title": "Independent Study to prepare for workshop", + "section": "normalising before plotting", + "text": "normalising before plotting\nlog\nnormalisation regularised log is a method to bias from low count genes. https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/03_DGE_QC_analysis.html\n\n\nT\n\n\nrlog is a method to bias from low count genes. 
https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/03_DGE_QC_analysis.html gives a good explanation of regularized the log transform (rlog)\nThe rlog transformation of the normalized counts is only necessary for these visualization methods during this quality assessment. They are not used for DE because DESeq2 takes care of that\nin the workshop we just to log transformed\n\nThe šŸ­ mouse data have been normalised to simplify the analysis for you; the šŸø frog data have not but the DE method will do this for you." + }, + { + "objectID": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop", + "href": "omics/week-5/study_before_workshop.html#packages-to-install-before-the-workshop", + "title": "Independent Study to prepare for workshop", + "section": "Packages to install before the workshop", + "text": "Packages to install before the workshop\nheatmaply ggrepel from CRAN in the the normal way:\n\ninstall.packages(\"heatmaply\")\ninstall.packages(\"ggrepel\")\n\nbiomaRt from Bioconductor using BiocManager:\n\nBiocManager::install(\"biomaRt\")" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#workshops-1", + "href": "omics/week-5/study_before_workshop.html#workshops-1", + "title": "Independent Study to prepare for workshop", + "section": "Workshops", + "text": "Workshops\n\nOmics 1: Hello data Getting to know the data. Checking the distributions of values\nOmics 2: Statistical Analysis Identifying which genes are differentially expressed between treatments.\nOmics 3: Visualising and Interpreting. PCA, Volcano plots and heatmaps to visualise results. Interpreting the results and finding out more about genes of interest." 
+ }, + { + "objectID": "omics/week-5/study_before_workshop.html#references", + "href": "omics/week-5/study_before_workshop.html#references", + "title": "Independent Study to prepare for workshop", + "section": "References", + "text": "References\n\n\nšŸ”— About Omics 3: Visualising and Interpreting" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#adding-gene-information-1", + "href": "omics/week-5/study_before_workshop.html#adding-gene-information-1", + "title": "Independent Study to prepare for workshop", + "section": "Adding gene information", + "text": "Adding gene information\n\n\nThe gene id is difficult to interpret in plots/tables\nTherefore we need to add information such as the gene name and a description to the results\nFor the šŸø Frog data information comes from xenbase\nFor the šŸ­ Mice data information comes from Ensembl" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#xenbase", + "href": "omics/week-5/study_before_workshop.html#xenbase", + "title": "Independent Study to prepare for workshop", + "section": "šŸø Xenbase", + "text": "šŸø Xenbase\n\nxenbase logoXenbase\nXenbase is a model organism database that provides genomic, molecular, and developmental biology information about Xenopus laevis and Xenopus tropicalis.\nIt took me some time to find the information you need." 
+ }, + { + "objectID": "omics/week-5/study_before_workshop.html#xenbase-1", + "href": "omics/week-5/study_before_workshop.html#xenbase-1", + "title": "Independent Study to prepare for workshop", + "section": "šŸø Xenbase", + "text": "šŸø Xenbase\n\n\nI got the information from the Xenbase information pages under Data Reports | Gene Information\nThis is listed: Xenbase Gene Product Information [readme] gzipped gpi (tab separated)\nClick on the readme link to see the file format and columns\nI downloaded xenbase.gpi.gz, unzipped it, removed header lines and the Xenopus tropicalis (taxon:8364) entries and saved it as xenbase_info.xlsx\nIn the workshop you will merge this information with the results file" + }, + { + "objectID": "omics/week-5/study_before_workshop.html#ensembl", + "href": "omics/week-5/study_before_workshop.html#ensembl", + "title": "Independent Study to prepare for workshop", + "section": "šŸ­ Ensembl", + "text": "šŸ­ Ensembl\nfrom the ncbi\nbiomart is a package that allows you to get information from the ncbi database such as gene names and descriptions" } ] \ No newline at end of file diff --git a/_site/site_libs/quarto-html/quarto-html.min.css b/_site/site_libs/quarto-html/quarto-html.min.css index 8b13789..c2857c3 100644 --- a/_site/site_libs/quarto-html/quarto-html.min.css +++ b/_site/site_libs/quarto-html/quarto-html.min.css @@ -1 +1 @@ - +/*# sourceMappingURL=0a6b880beb84f9b6f36107a76f82c5b1.css.map */ diff --git a/omics/week-5/overview.qmd b/omics/week-5/overview.qmd index 60313b4..872786e 100644 --- a/omics/week-5/overview.qmd +++ b/omics/week-5/overview.qmd @@ -5,7 +5,7 @@ toc: true toc-location: right --- -This week we cover how to visualise and interpret the results of your differential expression analysis. The independent study will allow you to check you have what you should have following the [Omics 2: Statistical Analysis workshop](../week-4/workshop.html) and [Consolidation study](../week-4/study_after_workshop.html). 
It will also summarise the the methods and plots we will go through in the workshop. In the workshop, we will learn how to conduct a Principle Component Analysis (PCA) and plot the results as well as how to create a nicely formatted Volcano plot and heatmap. We will also consider three factors that help us choose an interesting/important gene: the absolute expression, the fold change and the adjusted p-value. +This week we cover how to visualise and interpret the results of your differential expression analysis. The independent study will allow you to check you have what you should have following the [Omics 2: Statistical Analysis workshop](../week-4/workshop.html) and [Consolidation study](../week-4/study_after_workshop.html). It will also summarise the the methods and plots we will go through in the workshop. In the workshop, we will learn how to conduct a Principle Component Analysis (PCA) and plot the results as well as how to create a nicely formatted Volcano plot and heatmap. We suggest you sit together with your group in the workshop. diff --git a/omics/week-5/study_before_workshop.qmd b/omics/week-5/study_before_workshop.qmd index 9a84a68..3795d28 100644 --- a/omics/week-5/study_before_workshop.qmd +++ b/omics/week-5/study_before_workshop.qmd @@ -15,6 +15,13 @@ editor: wrap: 72 --- +```{r} +#| include: false +library(tidyverse) + +``` + + ## Overview In these slides we will: @@ -22,9 +29,11 @@ In these slides we will: ::: incremental - Check where you are -- learn some concepts +- learn some concepts used omics visualisation - - + - Principle Component Analysis (PCA) + - Volcano plots + - Heatmaps - Find out what packages to install before the workshop ::: @@ -34,13 +43,14 @@ In these slides we will: ## What we did in Omics 2: Statistical Analysis ::: incremental -::: {style="font-size: 90%;"} -- -- -- Saved files . 
-::: +- carried out differential expression analysis + +- found genes not expressed at all, or expressed in one group only + +- Saved results files + ::: ## Where should you be? @@ -56,13 +66,16 @@ Workshop](../week-4/workshop.html) including: ## šŸø Frogs -::: {style="font-size: 90%;"} +::: {style="font-size: 70%;"} + - An RStudio Project called `frogs-88H` which contains: - Raw data (S14, S20 and S30) - Processed data (`s30_filtered.csv`, `s30_summary_gene.csv`, `s30_summary_gene_filtered.csv`, `s30_summary_samp.csv` and equivalents for S14 *OR* S20) - - Two scripts called `cont-fgf-s30.R` and `cont-fgf-s20.R` *OR* + - Results files (`s30_fgf_only.csv`, `S30_normalised_counts.csv`, `S30_results.csv` and + equivalents for S14 *OR* S20) + - Two scripts called `cont-fgf-s30.R` and either `cont-fgf-s20.R` *OR* `cont-fgf-s14.R` ::: @@ -71,12 +84,18 @@ easy to read. ## šŸ­ Mice +::: {style="font-size: 70%;"} + - An RStudio Project called `mice-88H` which contains - Raw data (hspc, prog, lthsc) - Processed data (`hspc_summary_gene.csv`, `hspc_summary_samp.csv`, `prog_summary_gene.csv`, - `prog_summary_samp.csv`) -- One script called `hspc-prog.R` + `prog_summary_samp.csv`, `lthsc_summary_gene.csv`, + `lthsc_summary_samp.csv`) +- Results files (`prog_hspc_results.csv` and an equivalent for lthsc vs prog or hspc vs lthsc) +- Two scripts called `hspc-prog.R` and either `hspc-lthsc.R` *OR* + `prog-lthsc.R` +::: Files should be organised into folders. Code should well commented and easy to read. 
@@ -101,33 +120,98 @@ Go through: ## Examine the results files +Remind yourself of the key columns you have in the results files: + +- a fold change, logged to base 2 +- an unadjusted p-value +- a p value adjusted for multiple testing (`FDR` or `padj`) +- a gene id + + +## šŸø Frogs + +```{r} +#| echo: false +read_csv("results/s30_results.csv") |> glimpse() + +``` +- `baseMean` is the mean of the normalised counts for the gene across + all samples +- `lfcSE` standard error of the fold change +- `stat` is the test statistic (the Wald statistic) + + + + +## šŸ­ Mice +```{r} +#| echo: false +read_csv("results/prog_hspc_results.csv") |> glimpse() + + +``` + +- Top is the rank of the gene ordered by the p value (smallest first) + + # Adding gene information -## from xenbase +## Adding gene information + +::: incremental + +- The gene id is difficult to interpret in plots/tables + +- Therefore we need to add information such as the gene name and a description to the results +- For the šŸø Frog data information comes from xenbase -![xenbase logo](images/Xenbase-Logo-Medium.png){width="700"} +- For the šŸ­ Mice data information comes from Ensembl + +::: +## šŸø Xenbase -Xenbase (http://www.xenbase.org/, RRID:SCR_003280) -Xenbase is a model organism database that provides genomic, molecular, -and developmental biology information about Xenopus laevis and Xenopus -tropicalis. Xenbase is funded by the National Institutes of Health -(NIH) and the National Science Foundation (NSF). +![xenbase logo](images/Xenbase-Logo-Medium.png){width="800"} -our data gives the xenbase gene id so we are using xenbase to get the information -a lot of the information would also be in the ncbi -## from the ncbi +[Xenbase](http://www.xenbase.org/) is a model organism database that provides genomic, molecular, and developmental biology information about *Xenopus laevis* and *Xenopus tropicalis*. 
-biomart is a package that allows you to get information from the ncbi -database such as gene names and descriptions +. . . +It took me some time to find the information you need. +## šŸø Xenbase + +::: incremental + +- I got the information from the [Xenbase information pages](https://www.xenbase.org/xenbase/static-xenbase/ftpDatafiles.jsp) under Data Reports | Gene Information + +- This is listed: Xenbase Gene Product Information [readme] [gzipped gpi (tab separated)](https://download.xenbase.org/xenbase/GenePageReports/xenbase.gpi.gz) + +- Click on the readme link to see the file format and columns + +- I downloaded [xenbase.gpi.gz](https://download.xenbase.org/xenbase/GenePageReports/xenbase.gpi.gz), unzipped it, removed header lines and the *Xenopus tropicalis* (taxon:8364) entries and saved it as [xenbase_info.xlsx](meta/xenbase_info.xlsx) + +- In the workshop you will merge this information with the results file +::: + +## šŸ­ Ensembl + +::: incremental + +- [Ensembl](https://www.ensembl.org/index.html) creates, integrates and distributes reference datasets and analysis tools that enable genomics + +- [BioMart](https://grch37.ensembl.org/info/data/biomart/index.html) provides a access to these large datasets + +- **`biomaRt`** is a Bioconductor package gives you programmatic access to BioMart. + +- In the workshop you use this package to get information you can merge with the results file +::: diff --git a/omics/week-5/workshop.qmd b/omics/week-5/workshop.qmd index 21bd7c3..f1bc095 100644 --- a/omics/week-5/workshop.qmd +++ b/omics/week-5/workshop.qmd @@ -138,7 +138,7 @@ If you click on the readme link you can see information telling you that the fil šŸŽ¬ ...... -```{bash} +```bash gunzip xenbase.gpi.gz less xenbase.gpi q