DataViz Projects on the topic of SARS-CoV-2 genomic sequencing, featuring search and matching of mutations (AA Substitutions).
Filter recent GISAID data by Continent/Country and by Lineage, date range etc.
Select any mutation from the AA Substitutions to chart the frequency of sequences with that mutation by country, over time. Top 5 countries (by sample volume) shown.
The note at the top-right and grey chart across the bottom show the sample size for all Lineages.
Filter recent GISAID data by Continent/Country/Location and by Lineage, date range etc.
Select any mutation from the AA Substitutions to chart the frequency of sequences with that mutation by location (state/province/region etc), over time. Top 5 locations (by sample volume) shown.
The note at the top-right and grey chart across the bottom show the sample size for all Lineages.
Filter recent GISAID data by Continent/Country/Location and by Lineage, date range etc.
Select any mutation from the AA Substitutions to chart the growth of those samples against a "L2" Lineage e.g. BA.2.*.
The note at the top-right and grey chart across the bottom show the sample size for all Lineages.
Filter recent GISAID data by Continent/Country/Location and by Lineage, date range etc.
Select any mutation from the AA Substitutions to chart the growth of those samples against a Lineage e.g. BA.5.2.
The note at the top-right and grey chart across the bottom show the sample size for all Lineages.
The Lineage growth comparison (log) page was suggested by Uffe Poulsen, based on a chart produced by Alex Selby.
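As a rough illustration of what a log growth comparison can look like outside Power BI, here is a minimal Python sketch. It assumes the comparison is the natural log of the ratio between the selected samples and the baseline Lineage, with a straight-line fit giving the relative growth per day. That formula, the function names and the sample counts are all assumptions for illustration, not taken from the report itself.

```python
import math

def log_ratio_series(daily_counts):
    """Turn (day_index, selected_count, baseline_count) rows into
    (day_index, ln(selected / baseline)) points.
    Rows with a zero count are skipped because the log is undefined."""
    return [
        (day, math.log(selected / baseline))
        for day, selected, baseline in daily_counts
        if selected > 0 and baseline > 0
    ]

def fit_slope(points):
    """Ordinary least-squares slope of y against x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical counts: the selected samples double relative to the
# baseline Lineage every 7 days. The day-3 row has a zero count.
counts = [(0, 10, 1000), (3, 0, 1000), (7, 20, 1000), (14, 40, 1000)]

points = log_ratio_series(counts)
growth_per_day = fit_slope(points)
doubling_days = math.log(2) / growth_per_day
```

On a log scale, steady relative growth shows up as a straight line, which is why the slope is a convenient single number to compare.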
Filter recent GISAID data by Continent/Country/Location and/or by Lineage etc. Select a required AA Substitution, which will then apply as a filter.
Then scroll down the table summarising the mutations found, drilling into any rows of interest. The rows can be drilled by AA Prefix (e.g. "Spike"), AA Substitution (e.g. Spike R346T), Country and Location. On each row, the count of matching samples is shown, along with a sparkline by Collection date.
Select any row to filter the detailed table at the right.
Visualise the geographical spread of selected mutations. Use the play control at bottom for an animated view of the spread. Note the animation speed is slow (5 secs per date) to allow for calculation time. Record and speed up before presenting.
Locations are approximate - typically by reporting state/province or country. Bubble sizes are driven by the % of the total set of samples selected.
Track the weekly progress of selected mutations for any combination of Continents and Countries. Shows the count of matching samples vs the overall total, by week collected, and also as a %.
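The weekly figures can be sketched in a few lines of Python. This is only an illustration of the calculation, not the Power BI implementation: the field names (`collected`, `mutations`), the sample rows and the choice of Monday as the start of the week are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(day):
    """Return the Monday of the week containing `day` (assumed convention)."""
    return day - timedelta(days=day.weekday())

def weekly_progress(samples, matches):
    """Return rows of (week, matching_count, total_count, percent),
    ordered by week, for every week that has at least one sample."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sample in samples:
        week = week_start(sample["collected"])
        totals[week] += 1
        if matches(sample):
            hits[week] += 1
    return [
        (week, hits[week], totals[week], 100.0 * hits[week] / totals[week])
        for week in sorted(totals)
    ]

# Hypothetical samples, already filtered to the chosen Continents/Countries.
samples = [
    {"collected": date(2022, 9, 5), "mutations": {"Spike_R346T"}},
    {"collected": date(2022, 9, 7), "mutations": set()},
    {"collected": date(2022, 9, 11), "mutations": {"Spike_R346T"}},
    {"collected": date(2022, 9, 12), "mutations": set()},
]

rows = weekly_progress(samples, lambda s: "Spike_R346T" in s["mutations"])
for week, matching, total, percent in rows:
    print(week, matching, total, f"{percent:.1f}%")
```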
Summary
From gisaid.org we gather their EpiCoV metadata dataset. For most countries, this dataset is the most complete and up-to-date available. It can be supplemented with GISAID search results (e.g. for the most recent samples), downloaded in "Patient status metadata" format.
The AA Substitution values for each GISAID sample can be used as a filter, either independently or together with GISAID's PANGO lineages. Their frequencies and trends over time can be compared on the frequency page.
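To show how such a filter works, here is a minimal Python sketch. It assumes the AA Substitutions value arrives as a parenthesised, comma-separated string such as `(NSP12_P323L,Spike_D614G)`. The field names and sample rows are hypothetical, and in the project itself the filtering is done in Power BI.

```python
def parse_aa_substitutions(raw):
    """Split a string like '(NSP12_P323L,Spike_D614G)' into a set of names.
    An empty value such as '()' yields an empty set."""
    inner = raw.strip().strip("()")
    return {item.strip() for item in inner.split(",") if item.strip()}

def has_mutation(sample, mutation):
    """True if the sample carries the given AA Substitution."""
    return mutation in parse_aa_substitutions(sample["aa_substitutions"])

# Hypothetical sample rows for illustration only.
samples = [
    {"id": "s1", "lineage": "BA.2",
     "aa_substitutions": "(NSP12_P323L,Spike_D614G,Spike_R346T)"},
    {"id": "s2", "lineage": "BA.5.2",
     "aa_substitutions": "(Spike_D614G)"},
    {"id": "s3", "lineage": "BA.5.2",
     "aa_substitutions": "(Spike_D614G,Spike_R346T)"},
]

# Filter by mutation alone, then by mutation together with a lineage.
by_mutation = [s["id"] for s in samples if has_mutation(s, "Spike_R346T")]
by_both = [
    s["id"] for s in samples
    if has_mutation(s, "Spike_R346T") and s["lineage"] == "BA.5.2"
]
```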
Using the nextclade cli tool, we classify the GISAID samples to obtain the nextclade pango lineage, plus the AA & nucleotide insertions, deletions & substitutions, frameshifts and reversions. These attributes can be used as a filter for any analysis. Their frequencies and trends over time can be compared on the frequency page.
I'm mainly following the visualisation style I first saw presented by Trevor Bedford. The main features are clean, simple line charts, filtered by default to the top 5 countries/locations in the selected data. For each chart point, the frequency of that lineage + mutation in the last 7 days is calculated, always comparing to all the sequencing data available for that country/location.
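The calculation behind each chart point can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Power BI logic itself: the window is taken as the 7 days ending on (and including) the chart date, and the field names and sample rows are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def top_locations(samples, n=5):
    """Locations ranked by total sample volume, largest first."""
    counts = Counter(s["location"] for s in samples)
    return [location for location, _ in counts.most_common(n)]

def trailing_frequency(samples, location, day, matches, window_days=7):
    """Share of a location's samples in the trailing window that match.
    The denominator is every sample for that location in the window,
    whatever its lineage. Returns None if the window is empty."""
    start = day - timedelta(days=window_days - 1)
    in_window = [
        s for s in samples
        if s["location"] == location and start <= s["collected"] <= day
    ]
    if not in_window:
        return None
    return sum(1 for s in in_window if matches(s)) / len(in_window)

# Hypothetical samples for illustration only.
samples = [
    {"location": "Denmark", "collected": date(2022, 9, 1), "mutations": {"Spike_R346T"}},
    {"location": "Denmark", "collected": date(2022, 9, 3), "mutations": set()},
    {"location": "Denmark", "collected": date(2022, 9, 7), "mutations": {"Spike_R346T"}},
    {"location": "Denmark", "collected": date(2022, 9, 8), "mutations": set()},
    {"location": "Austria", "collected": date(2022, 9, 7), "mutations": set()},
]

def carries_r346t(sample):
    return "Spike_R346T" in sample["mutations"]

ranked = top_locations(samples)
freq_sep7 = trailing_frequency(samples, "Denmark", date(2022, 9, 7), carries_r346t)
freq_sep8 = trailing_frequency(samples, "Denmark", date(2022, 9, 8), carries_r346t)
```

Returning `None` for an empty window, rather than zero, keeps a gap in sequencing from being drawn as a drop in frequency.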
Other visualisations include interactive drill-down tables to compare mutations, and an interactive map that can be "played" over time - to help understand the geographic spread of any sub-set of samples.
In this project, the data is presented in Power BI, an interactive data visualisation tool. This allows the data to be filtered interactively and presented in multiple formats, for faster, easier analysis.
GISAID citation: Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33-46. DOI:10.1002/gch2.1018 PMCID: 31565258
Contributions, issues, feature requests and sponsorship are all welcome!
Give a ⭐️ if you like this project!