n8upham/mammalGenomesNCBI

Perspective in Science (2023): "Genomics expands the mammalverse"

Nathan S. Upham and Michael J. Landis

Code for parsing NCBI mammal genome metadata

Downloaded from NCBI Genome on 9 Feb 2023

Query: https://www.ncbi.nlm.nih.gov/data-hub/genome/?taxon=40674

  • 2,800 total genomes (= 264 RefSeq + 2,536 other assemblies)

  • 754 unique taxa (includes species, subspecies, hybrids, and taxa with no species epithet)

    • 168 chromoLevel unique taxa
    • 489 scaffoldLevel unique taxa
    • 254 contigLevel unique taxa
  • 675 species after aligning with the mammal tree taxonomy and collapsing redundancy

    • 76 chromoLevel species (= 68 + 8 more from the 'redundantOrNotUsing' category)
    • 414 scaffoldLevel species (= 418 - 4 from above)
    • 185 contigLevel species (= 189 - 4 from above)
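
One way to arrive at per-species tallies like those above is to assign each species the most complete assembly level found among its genomes. The sketch below assumes that rule; the collapsing logic actually used for these counts is not described here, and the record layout, the level ranking, and the example rows are all hypothetical.

```python
from collections import Counter

# Assumed ranking: chromosome-level assemblies are treated as the most complete.
LEVEL_RANK = {"contigLevel": 0, "scaffoldLevel": 1, "chromoLevel": 2}


def best_level_per_species(records):
    """Map each species to the highest-ranked assembly level seen for it.

    `records` is an iterable of (species, level) pairs (hypothetical layout).
    """
    best = {}
    for species, level in records:
        current = best.get(species)
        if current is None or LEVEL_RANK[level] > LEVEL_RANK[current]:
            best[species] = level
    return best


# Made-up rows for illustration only; not taken from the NCBI download.
records = [
    ("Species A", "contigLevel"),
    ("Species A", "chromoLevel"),
    ("Species B", "scaffoldLevel"),
    ("Species B", "contigLevel"),
    ("Species C", "contigLevel"),
]

best = best_level_per_species(records)
counts = Counter(best.values())
print(dict(counts))
```

Under this rule every species is counted exactly once, so the three per-level counts sum to the species total (76 + 414 + 185 = 675 in the list above).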

Summary of taxonomic sampling of genome species (taxonomy of mammal tree)

| Higher taxon | GENOME SP | TOTAL SP | % SP |
|---|---|---|---|
| Afrotheria | 12 | 92 | 13% |
| Euarchontoglires | 208 | 2963 | 7% |
| Laurasiatheria | 277 | 2456 | 11% |
| Marsupialia | 168 ** | 362 | 46% |
| Monotremata | 2 | 5 | 40% |
| Xenarthra | 8 | 33 | 24% |

** There were only 7 marsupial genomes before 2021; then 11 + 148 + 2 were added in 2021 (= 161 genomes), for the current total of 168
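
The percentages and the marsupial footnote can be re-derived from the counts. This is a minimal sketch, assuming the % SP column is simply genome species divided by total species, rounded to a whole percent; the variable and function names are illustrative.

```python
# (genome species, total species) per higher taxon, copied from the table above.
sampling = {
    "Afrotheria": (12, 92),
    "Euarchontoglires": (208, 2963),
    "Laurasiatheria": (277, 2456),
    "Marsupialia": (168, 362),
    "Monotremata": (2, 5),
    "Xenarthra": (8, 33),
}


def percent_sampled(genome_sp, total_sp):
    """Percentage of species with a genome, rounded to a whole number."""
    return round(100 * genome_sp / total_sp)


percents = {taxon: percent_sampled(g, t) for taxon, (g, t) in sampling.items()}
total_genome_species = sum(g for g, _ in sampling.values())

# Footnote arithmetic: 7 marsupial genomes before 2021 plus those added in 2021.
marsupials_added_2021 = 11 + 148 + 2
marsupial_total = 7 + marsupials_added_2021

for taxon, pct in percents.items():
    print(f"{taxon}: {pct}%")
```

The genome-species column sums to 675, matching the species total reported after aligning with the mammal tree taxonomy.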

| Order | GENOME SP | TOTAL SP | % SP |
|---|---|---|---|
| AFROSORICIDA | 3 | 55 | 5% |
| ARTIODACTYLA | 128 | 348 | 37% |
| CARNIVORA | 74 | 298 | 25% |
| CHIROPTERA | 49 | 1287 | 4% |
| CINGULATA | 3 | 21 | 14% |
| DASYUROMORPHIA | 63 | 78 | 81% |
| DERMOPTERA | 2 | 2 | 100% |
| DIDELPHIMORPHIA | 3 | 106 | 3% |
| DIPROTODONTIA | 85 | 146 | 58% |
| EULIPOTYPHLA | 12 | 491 | 2% |
| HYRACOIDEA | 2 | 5 | 40% |
| LAGOMORPHA | 6 | 91 | 7% |
| MACROSCELIDEA | 1 | 19 | 5% |
| MICROBIOTHERIA | 1 | 1 | 100% |
| MONOTREMATA | 2 | 5 | 40% |
| NOTORYCTEMORPHIA | 2 | 2 | 100% |
| PAUCITUBERCULATA | 0 ** | 7 | 0% |
| PERAMELEMORPHIA | 14 | 22 | 64% |
| PERISSODACTYLA | 10 | 24 | 42% |
| PHOLIDOTA | 4 | 8 | 50% |
| PILOSA | 5 | 12 | 42% |
| PRIMATES | 83 | 458 | 18% |
| PROBOSCIDEA | 2 | 7 | 29% |
| RODENTIA | 115 | 2392 | 5% |
| SCANDENTIA | 2 | 20 | 10% |
| SIRENIA | 3 | 5 | 60% |
| TUBULIDENTATA | 1 | 1 | 100% |

** Paucituberculata is the only extant mammal order still without a genome! The extant marsupial family Caenolestidae (shrew-opossums) comprises 7 species in the Andes mountains of South America.
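
The order-level counts roll up to the higher-taxon counts reported earlier. The sketch below is one way to check that; the assignment of orders to higher taxa is an assumption based on standard mammal classification (it is not listed in this README), and the marsupial mole order is spelled Notoryctemorphia.

```python
from collections import defaultdict

# order: (higher taxon, genome species, total species).
# Counts are copied from the order table; the higher-taxon column is assumed.
ORDERS = {
    "AFROSORICIDA": ("Afrotheria", 3, 55),
    "ARTIODACTYLA": ("Laurasiatheria", 128, 348),
    "CARNIVORA": ("Laurasiatheria", 74, 298),
    "CHIROPTERA": ("Laurasiatheria", 49, 1287),
    "CINGULATA": ("Xenarthra", 3, 21),
    "DASYUROMORPHIA": ("Marsupialia", 63, 78),
    "DERMOPTERA": ("Euarchontoglires", 2, 2),
    "DIDELPHIMORPHIA": ("Marsupialia", 3, 106),
    "DIPROTODONTIA": ("Marsupialia", 85, 146),
    "EULIPOTYPHLA": ("Laurasiatheria", 12, 491),
    "HYRACOIDEA": ("Afrotheria", 2, 5),
    "LAGOMORPHA": ("Euarchontoglires", 6, 91),
    "MACROSCELIDEA": ("Afrotheria", 1, 19),
    "MICROBIOTHERIA": ("Marsupialia", 1, 1),
    "MONOTREMATA": ("Monotremata", 2, 5),
    "NOTORYCTEMORPHIA": ("Marsupialia", 2, 2),
    "PAUCITUBERCULATA": ("Marsupialia", 0, 7),
    "PERAMELEMORPHIA": ("Marsupialia", 14, 22),
    "PERISSODACTYLA": ("Laurasiatheria", 10, 24),
    "PHOLIDOTA": ("Laurasiatheria", 4, 8),
    "PILOSA": ("Xenarthra", 5, 12),
    "PRIMATES": ("Euarchontoglires", 83, 458),
    "PROBOSCIDEA": ("Afrotheria", 2, 7),
    "RODENTIA": ("Euarchontoglires", 115, 2392),
    "SCANDENTIA": ("Euarchontoglires", 2, 20),
    "SIRENIA": ("Afrotheria", 3, 5),
    "TUBULIDENTATA": ("Afrotheria", 1, 1),
}


def roll_up(orders):
    """Sum genome and total species counts per higher taxon."""
    totals = defaultdict(lambda: [0, 0])
    for taxon, genome_sp, total_sp in orders.values():
        totals[taxon][0] += genome_sp
        totals[taxon][1] += total_sp
    return {taxon: tuple(pair) for taxon, pair in totals.items()}


by_taxon = roll_up(ORDERS)
# Orders with no sequenced species at all.
unsampled = sorted(name for name, (_, g, _t) in ORDERS.items() if g == 0)

print(by_taxon)
print(unsampled)
```

With this mapping, each higher taxon's sums reproduce the GENOME SP and TOTAL SP values in the first table, and Paucituberculata is the only order with zero genome species.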

Summary of genomes by attributes

| variable | med | low95 | up95 | low50 | up50 |
|---|---|---|---|---|---|
| Body mass (all) | 0.08 | 0.00 | 180.39 | 0.02 | 0.63 |
| Body mass (genome) | 2.07 | 0.01 | 2392.26 | 0.08 | 22.42 |
| Latitude (all) | 15.35 | 0.52 | 52.44 | 5.87 | 28.06 |
| Latitude (genome) | 22.30 | 0.64 | 58.69 | 8.39 | 34.90 |
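
A summary row of this shape can be computed with the Python standard library. The sketch below assumes that low95/up95 are the 2.5th and 97.5th percentiles and low50/up50 the 25th and 75th percentiles; the table does not define them, and the input values here are synthetic.

```python
import statistics


def summarize(values):
    """Return the median plus bounds of the central 95% and 50% ranges.

    Assumption: low95/up95 = 2.5th/97.5th percentiles,
    low50/up50 = 25th/75th percentiles.
    """
    # n=40 yields cut points every 2.5%: index 0 is 2.5%, index 38 is 97.5%.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return {
        "med": statistics.median(values),
        "low95": cuts[0],
        "up95": cuts[38],
        "low50": cuts[9],
        "up50": cuts[29],
    }


# Synthetic values for illustration only; not the body mass or latitude data.
example = list(range(101))
summary = summarize(example)
print(summary)
```

Different quantile interpolation methods give slightly different bounds on small samples, so exact agreement with the table would depend on the method used originally.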
  • Wilcoxon rank sum tests with continuity correction:
    • Body mass: W = 1150239, p-value < 2.2e-16
    • Latitude: W = 1404678, p-value = 3.09e-14
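
For readers who want to see what such a test computes, here is a dependency-free sketch of a two-sided Wilcoxon rank sum (Mann-Whitney) test using the normal approximation with continuity correction. The test's name matches the output heading of R's `wilcox.test`, but the tool actually used is not stated here; the function names and the sample data below are illustrative, and the W and p-values above cannot be reproduced without the underlying data.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks (ties get the average rank) and the tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_rank_sum(x, y):
    """Return (W, two-sided p) via the normal approximation with continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    # W is the rank sum of x minus its minimum possible value.
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    diff = w - n1 * n2 / 2
    correction = 0.5 * ((diff > 0) - (diff < 0))  # shrink |diff| by 0.5
    z = (diff - correction) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p_value


# Synthetic samples for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
w_ab, p_ab = wilcoxon_rank_sum(group_a, group_b)
w_ba, p_ba = wilcoxon_rank_sum(group_b, group_a)
print(w_ab, w_ba, p_ab)
```

The sketch always uses the normal approximation and does not guard against the degenerate case where every value is tied; statistical packages may switch to an exact test for small samples without ties.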

Summary of the Zoonomia 241-way alignment

  • 239 species (241 taxa) in the Zoonomia Consortium alignment
    • "A total of 242 genome assemblies, representing 240 species, are included in the Zoonomia Cactus alignment. We included all non-redundant, high-quality assemblies posted on NCBI for >6 months as of March 3, 2018, or for a shorter time if an associated publication was available. One species (dog) is represented by two genomes. Due to a technical error, one genome available on NCBI (Tarsius_syrichta-2.0.1) was not included in this initial alignment, and the genome for Dipodomys stephensi was represented twice."
    • So, 241 taxa are in alignment V2 on UCSC: https://cglgenomics.ucsc.edu/data/cactus/
    • Of these, 2 are redundant at the species level (Canis lupus dingo, Ceratotherium simum cottoni), leaving 239 species
      • 121 species have genomes newly generated by Zoonomia, published in 2019
      • 118 species were aligned from RefSeq
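
The taxa-versus-species distinction above amounts to collapsing subspecies labels to their binomial. A minimal sketch, assuming taxon labels are Latin names separated by spaces or underscores; the short list is illustrative and is not the actual alignment roster.

```python
def species_name(taxon):
    """Reduce a taxon label to genus plus species epithet (its first two words)."""
    parts = taxon.replace("_", " ").split()
    return " ".join(parts[:2])


def count_taxa_and_species(taxa):
    """Return (number of unique taxa, number of unique species)."""
    unique_taxa = set(taxa)
    unique_species = {species_name(t) for t in unique_taxa}
    return len(unique_taxa), len(unique_species)


# Illustrative labels only; the two subspecies are those named in the text.
taxa = [
    "Canis lupus familiaris",
    "Canis lupus dingo",
    "Ceratotherium simum",
    "Ceratotherium simum cottoni",
    "Homo sapiens",
]

n_taxa, n_species = count_taxa_and_species(taxa)
print(n_taxa, n_species)

# Arithmetic stated in the list above.
species_in_alignment = 241 - 2
species_by_source = 121 + 118
```

Here five taxa collapse to three species, in the same way that 241 taxa collapse to 239 species once the two redundant subspecies are merged.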

All plotted.
