A GenBank data mining program for (mostly fungal) taxonomists
GenMine downloads GenBank nucleotide records and filters the downloaded data by genes frequently used in taxonomy.
Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816
https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816
- pip
pip install GenMine
- conda
conda install -c cwseo GenMine
- Download all Penicillium records
GenMine -e [email protected] -g Penicillium
- Download all Penicillium records, then keep only records containing the term "Korea"
GenMine -e [email protected] -g Penicillium -a Korea
- Download records by accession number
GenMine -e [email protected] -c ON417149.1 ON417150.1
- Download records of multiple genera
GenMine -e [email protected] -g Penicillium Trichoderma Alternaria
- Download records of multiple genera listed in a file
GenMine -e [email protected] -g genera.txt
"genera.txt" should look like this, with one genus per line
Penicillium
Trichoderma
Alternaria
- Download records of multiple accessions listed in a file
GenMine -e [email protected] -c accessions.txt
"accessions.txt" should look like this, with one accession per line
ON417149.1
ON417150.1
MW554209.1
OK643788.1
- Continue an interrupted download (needed only for accession runs; for genus runs, GenMine resumes automatically if you launch it in the same location)
GenMine -e [email protected] -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this will not work for a run that has already finished
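When resuming an accession run, the value passed to -o has to match an existing result directory. The sketch below shows one way to pick the most recent one. It assumes that result directories are always named with a `YYYY-MM-DD-HH-MM-SS` timestamp, which is inferred only from the single example above; the helper names `parse_run_timestamp`, `pick_latest_run` and `latest_run_dir` are hypothetical and are not part of GenMine.

```python
from datetime import datetime
from pathlib import Path

# Assumption: run directories follow the pattern seen in the example
# "2022-11-02-00-12-08". GenMine itself may name them differently.
RUN_DIR_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_run_timestamp(name):
    """Return the datetime encoded in a directory name, or None if it does not match."""
    try:
        return datetime.strptime(name, RUN_DIR_FORMAT)
    except ValueError:
        return None


def pick_latest_run(names):
    """Return the name with the most recent timestamp, ignoring non-matching names."""
    candidates = []
    for name in names:
        timestamp = parse_run_timestamp(name)
        if timestamp is not None:
            candidates.append((timestamp, name))
    return max(candidates)[1] if candidates else None


def latest_run_dir(base="."):
    """Look for timestamp-named subdirectories of `base` and return the newest name."""
    names = [path.name for path in Path(base).iterdir() if path.is_dir()]
    return pick_latest_run(names)
```

The returned name could then be passed to -o. Note that this sketch cannot tell whether the selected run already finished, in which case resuming will not work (see Caution 2).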
- Basic Parameters
--genus, -g : List of genera to find | File with one genus per line
--accession, -c : List of accessions to get | File with one accession per line
--email, -e : your email for NCBI access
- Optional Parameters
--additional, -a : additional terms (e.g. country name) to filter by
--max, -m : maximum length of the sequence to parse (default: 5000)
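For scripted use, the parameters above can be assembled into a command line programmatically. The following is a minimal sketch that only uses the short flags documented here (-e, -g, -c, -a, -m). The function name `build_genmine_command` and the address `you@example.com` are placeholders, and whether -g and -c may be combined in one call, or whether -a accepts more than one term, is not stated in this README, so treat those cases as untested.

```python
def build_genmine_command(email, genera=None, accessions=None,
                          additional=None, max_len=None):
    """Build a GenMine argument list from the documented flags.

    genera / accessions: lists of names, or a one-element list holding a file name.
    additional: a single filter term such as a country name.
    max_len: maximum sequence length to parse (the README gives 5000 as the default).
    """
    if not genera and not accessions:
        raise ValueError("give at least one genus (-g) or accession (-c)")

    command = ["GenMine", "-e", email]
    if genera:
        command += ["-g", *genera]
    if accessions:
        command += ["-c", *accessions]
    if additional:
        command += ["-a", additional]
    if max_len is not None:
        command += ["-m", str(max_len)]
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the standard library, which avoids shell quoting problems with file names.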
WIP
GenMine is a Python program, based on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared to Entrez, GenMine has some advantages and disadvantages
- GenMine doesn't miss records, especially with multiple terms
- GenMine can resume interrupted downloads, which is especially useful on a poor internet connection
- GenMine classifies downloaded records by gene types (ITS, LSU, SSU, BenA etc...)
- If you want more gene types, open an issue!
- We are currently working on better gene annotations
- GenMine is slower than Entrez (sometimes a lot), as a cost of completeness and stability
- Bug reports and suggestions are welcome in GitHub Issues or directly at [email protected]
- However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may instead be taken up in our upcoming software
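To illustrate what classifying records by gene type (ITS, LSU, SSU, BenA) can look like, here is a small keyword-matching sketch applied to GenBank definition lines. It is not GenMine's actual annotation logic: the pattern table `GENE_PATTERNS` and the function `classify_definition` are hypothetical, and the keywords are only common phrasings found in definition lines.

```python
import re

# Hypothetical keyword table; GenMine's real rules may differ.
# Abbreviations are matched case-sensitively so that the English word "its"
# is not mistaken for the ITS region.
GENE_PATTERNS = {
    "ITS": re.compile(r"(?i:internal transcribed spacer)|\bITS\d?\b"),
    "LSU": re.compile(r"(?i:large subunit ribosomal)|\b28S\b|\bLSU\b"),
    "SSU": re.compile(r"(?i:small subunit ribosomal)|\b18S\b|\bSSU\b"),
    "BenA": re.compile(r"(?i:beta-tubulin|\bbenA\b)"),
}


def classify_definition(definition):
    """Return every gene label whose pattern occurs in a record's definition line."""
    return [gene for gene, pattern in GENE_PATTERNS.items()
            if pattern.search(definition)]
```

A record spanning several regions gets several labels, and a definition that matches nothing returns an empty list, which a caller could treat as "unclassified".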