Why are genomes hard to assemble?
-
Biological
(Very) High ploidy, heterozygosity, repeat content
-
Sequencing
(Very) Large genomes, imperfect sequencing
-
Computational
(Very) Large genomes, complex structure
-
Accuracy
(Very) Hard to assess correctness
Detangle the assembly graph using long reads, mate pairs, and other linking information
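To show how long reads can detangle a repeat, here is a simplified Python sketch. It assumes the assembly graph is already built and that each long read has been aligned to it and reduced to the ordered list of node IDs it passes through. A read that enters and leaves the repeat node shows which incoming path connects to which outgoing path. The node names, the `min_support` threshold, and both functions are hypothetical. Real assemblers must also handle sequencing errors, reads that only partly span a repeat, and mate-pair distance constraints.

```python
from collections import Counter

def spanning_pairs(reads, repeat):
    """Count (predecessor, successor) node pairs observed around `repeat`.

    Each read is the ordered list of graph node IDs it aligns through.
    Only reads that both enter and leave the repeat node add evidence.
    """
    pairs = Counter()
    for path in reads:
        for i in range(1, len(path) - 1):
            if path[i] == repeat:
                pairs[(path[i - 1], path[i + 1])] += 1
    return pairs

def resolve_repeat(reads, repeat, min_support=2):
    """Keep only in->out pairings supported by at least `min_support` reads."""
    return {pair: n for pair, n in spanning_pairs(reads, repeat).items()
            if n >= min_support}

# Toy graph: A and B both lead into repeat R, which leads out to C and D.
reads = [["A", "R", "C"], ["A", "R", "C"], ["B", "R", "D"],
         ["B", "R", "D"], ["A", "R", "D"], ["R", "C"]]
print(resolve_repeat(reads, "R"))
# {('A', 'C'): 2, ('B', 'D'): 2}
```

The single A->D read falls below the support threshold and is treated as noise. The read ["R", "C"] is ignored because it never enters R from either side.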
Goal of whole-genome alignment (WGA)
For two genomes, A and B, find a mapping from each position in A to its corresponding position in B
Not so fast...
Genome A may have insertions, deletions, translocations, inversions, duplications, or SNPs relative to B (sometimes all of the above)
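A common first step toward that mapping is to find anchors: short sequences that occur exactly once in each genome, in the spirit of the maximal unique matches used by MUMmer. The sketch below is a toy version. It assumes error-free uppercase ACGT strings and a fixed k-mer length `k`, and the function names are illustrative. It records the strand of each anchor, so an inverted segment appears as reverse-complement anchors whose positions in B decrease. The sketch does not chain anchors into a full alignment or handle duplications.

```python
from collections import defaultdict

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMP)[::-1]

def kmer_positions(seq, k):
    """Map every k-mer in `seq` to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def unique_anchors(a, b, k):
    """Return sorted (pos_in_a, pos_in_b, strand) for k-mers unique in both.

    strand '+' means the k-mer occurs as-is in B; '-' means its reverse
    complement occurs in B, as expected inside an inversion.
    """
    ia, ib = kmer_positions(a, k), kmer_positions(b, k)
    anchors = []
    for kmer, pa in ia.items():
        if len(pa) != 1:
            continue  # repeated in A, so ambiguous: skip it
        fwd, rev = ib.get(kmer, []), ib.get(revcomp(kmer), [])
        if len(fwd) == 1 and not rev:
            anchors.append((pa[0], fwd[0], "+"))
        elif len(rev) == 1 and not fwd:
            anchors.append((pa[0], rev[0], "-"))
    return sorted(anchors)

a = "ACGGTCAT"
print(unique_anchors(a, revcomp(a), 4))
# [(0, 4, '-'), (1, 3, '-'), (2, 2, '-'), (3, 1, '-'), (4, 0, '-')]
```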
WGA visualization
-
Moleculo Sequencing
Clever library preparation technique that turns a short-read sequencer into a quasi-long-read sequencer
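Broadly, the trick works as follows. Long DNA fragments are split into many dilute pools, and every short read from a pool carries that pool's barcode. The reads from each pool are then assembled locally into a synthetic long read. The Python sketch below illustrates only the computational half. It assumes error-free reads tagged as `(barcode, sequence)` pairs and uses a naive greedy exact-overlap merge. The barcode format, the `min_len` overlap threshold, and the function names are hypothetical and do not reflect Moleculo's actual pipeline. Reads contained inside another read are not merged.

```python
from collections import defaultdict

def group_by_barcode(tagged_reads):
    """Demultiplex (barcode, sequence) pairs into one read list per pool."""
    wells = defaultdict(list)
    for barcode, seq in tagged_reads:
        wells[barcode].append(seq)
    return wells

def best_overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = best_overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave contigs separate
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

tagged = [("w1", "ACGTAC"), ("w1", "TACGGA"), ("w2", "TTTGCA"), ("w1", "GGATTC")]
wells = group_by_barcode(tagged)
print(greedy_assemble(wells["w1"]))
# ['ACGTACGGATTC']
```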
-
PacBio SMRT Sequencing
Imaging of fluorescently labeled, phospholinked nucleotides as they are incorporated by a polymerase anchored in a Zero-Mode Waveguide (ZMW)
-
Oxford Nanopore MinION
SK-BR-3: the most commonly used Her2-amplified breast cancer cell line
❓ Can we resolve the complex structural variations, especially around Her2?
Improving SMRTcell Performance
PacBio and Illumina coverage values are highly correlated, but Illumina shows greater variance because of poorly mapped reads
Confirmed both known gene fusions in this region
Joint coverage and breakpoint analysis to discover underlying events
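The coverage comparison above can be made concrete by binning the genome, counting per-bin depth on each platform, and computing a correlation along with a measure of spread. The Python sketch below is a minimal version under those assumptions. The function names and the toy depth values are illustrative only and are not data from this study. It reports the coefficient of variation (standard deviation divided by mean) instead of raw variance, because the two platforms are usually sequenced to different mean depths, and raw variance grows with depth.

```python
import math
from statistics import fmean, pvariance

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_coverage(pacbio, illumina):
    """Correlation and depth-normalized spread of binned coverage."""
    return {
        "pearson_r": pearson(pacbio, illumina),
        "cv_pacbio": math.sqrt(pvariance(pacbio)) / fmean(pacbio),
        "cv_illumina": math.sqrt(pvariance(illumina)) / fmean(illumina),
    }

# Synthetic per-bin depths, for illustration only.
stats = compare_coverage([10, 12, 14, 16], [18, 26, 26, 34])
print(round(stats["pearson_r"], 3))  # 0.949
print(stats["cv_illumina"] > stats["cv_pacbio"])  # True
```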
Cancer lesion reconstruction
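One simple way to combine the two signals in a joint coverage and breakpoint analysis is to find positions where binned depth steps up or down sharply, then keep only the steps that lie close to a breakpoint observed in split read alignments. The Python sketch below shows this pairing step. It assumes coverage is already binned and breakpoints are already extracted as coordinates on a single chromosome. The fold-change threshold, the tolerance, and the function names are hypothetical. A real analysis would use proper segmentation and also resolve which distant segments each breakpoint joins.

```python
def coverage_steps(coverage, bin_size, min_fold=1.5):
    """Coordinates of bin boundaries where depth changes by >= min_fold."""
    steps = []
    for i in range(1, len(coverage)):
        lo, hi = sorted((coverage[i - 1], coverage[i]))
        # A drop to (or rise from) zero depth always counts as a step.
        if hi > 0 and (lo == 0 or hi / lo >= min_fold):
            steps.append(i * bin_size)
    return steps

def supported_events(coverage, bin_size, breakpoints, tolerance, min_fold=1.5):
    """Pair each coverage step with split-read breakpoints within `tolerance` bp."""
    events = []
    for pos in coverage_steps(coverage, bin_size, min_fold):
        hits = [bp for bp in breakpoints if abs(bp - pos) <= tolerance]
        if hits:
            events.append((pos, hits))
    return events

# Toy example: an amplified segment between 3,000 and 5,000 bp.
cov = [30, 30, 30, 90, 90, 30, 30]
print(supported_events(cov, 1000, [3020, 5400, 8000], tolerance=100))
# [(3000, [3020])]
```

In this toy example, the step at 5,000 bp has no split-read support within 100 bp, so it is reported only by `coverage_steps` and not as a supported event.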