GenMine

A GenBank data mining program for (mostly fungal) taxonomists

GenMine downloads GenBank nucleotide records and filters the downloaded data by genes frequently used in taxonomy.

Citation: Chang Wan Seo, Sung Hyun Kim, Young Woon Lim & Myung Soo Park (2022) Re-Identification on Korean Penicillium Sequences in GenBank Collected by Software GenMine, Mycobiology, DOI: 10.1080/12298093.2022.2116816

https://www.tandfonline.com/doi/full/10.1080/12298093.2022.2116816

Installation

pip

pip install GenMine

conda

conda install -c cwseo GenMine

Usage

Basic usage

Download all Penicillium records

GenMine -e [email protected] -g Penicillium

Download all Penicillium records, then filter them by the term "Korea"

GenMine -e [email protected] -g Penicillium -a Korea

Download records by accession number

GenMine -e [email protected] -c ON417149.1 ON417150.1

Advanced usage

Download records of multiple genera

GenMine -e [email protected] -g Penicillium Trichoderma Alternaria

Download records of multiple genera listed in a file

GenMine -e [email protected] -g genera.txt

"genera.txt" should contain one genus per line, like this

Penicillium
Trichoderma
Alternaria
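When the list of genera comes from another script or a spreadsheet, the file can be generated rather than typed by hand. The following is a minimal sketch that only assumes the one-term-per-line layout shown above; the helper name `write_term_file` is hypothetical and not part of GenMine.

```python
from pathlib import Path


def write_term_file(path, terms):
    """Write one term per line, dropping blanks and duplicates but keeping order."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    # Produces a file in the same layout as the example above.
    write_term_file("genera.txt", ["Penicillium", "Trichoderma", "Alternaria"])
```

The resulting file can then be passed to the -g option as in the command above.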

Download records of multiple accessions listed in a file

GenMine -e [email protected] -c accessions.txt

"accessions.txt" should contain one accession per line, like this

ON417149.1
ON417150.1
MW554209.1
OK643788.1
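Before starting a long download, it can help to check the accession list for obvious typos. The sketch below is not part of GenMine. It assumes the common GenBank nucleotide accession shapes (one letter plus five digits, or two letters plus six or eight digits, with an optional version suffix such as ".1"), so it may reject other valid identifiers such as RefSeq accessions.

```python
import re

# Assumed pattern: 1 letter + 5 digits, or 2 letters + 6 or 8 digits,
# optionally followed by a ".version" suffix.
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$")


def split_accessions(lines):
    """Separate lines that look like accessions from lines that do not."""
    valid, invalid = [], []
    for line in lines:
        token = line.strip()
        if not token:
            continue  # ignore empty lines
        if ACCESSION_RE.match(token):
            valid.append(token)
        else:
            invalid.append(token)
    return valid, invalid


valid, invalid = split_accessions(
    ["ON417149.1", "ON417150.1", "MW554209.1", "OK643788.1", "", "not-an-id"]
)
print(valid)    # the four accessions from the example file
print(invalid)  # ['not-an-id']
```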

Continue an interrupted download (needed only for accession runs; genus runs resume automatically if you launch GenMine from the same location)

GenMine -e [email protected] -c accessions.txt -o "2022-11-02-00-12-08"
# Caution 1: -o must be the name of the previous run's result directory
# Caution 2: this will not work for a run that has already finished
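If several result directories sit in the same location, the most recent one can be found by name. This is a sketch under one assumption: that result directories are named with a year-month-day-hour-minute-second timestamp, as the example name "2022-11-02-00-12-08" suggests. The helper `latest_run_dir` is hypothetical, and it cannot tell whether a run has already finished.

```python
from datetime import datetime
from pathlib import Path

# Assumed from the example directory name "2022-11-02-00-12-08".
RUN_DIR_FORMAT = "%Y-%m-%d-%H-%M-%S"


def latest_run_dir(parent="."):
    """Return the name of the newest timestamp-named directory, or None."""
    runs = []
    for entry in Path(parent).iterdir():
        if not entry.is_dir():
            continue
        try:
            stamp = datetime.strptime(entry.name, RUN_DIR_FORMAT)
        except ValueError:
            continue  # not a timestamp-named directory
        runs.append((stamp, entry.name))
    return max(runs)[1] if runs else None


if __name__ == "__main__":
    print(latest_run_dir("."))
```

The returned name is what would be passed to -o in the command above.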

Arguments

Basic Parameters

--genus, -g : genera to search for, given as a list or as a file with one genus per line
--accession, -c : accessions to fetch, given as a list or as a file with one accession per line
--email, -e : your email address for NCBI access

Optional Parameters

--additional, -a : additional terms (e.g. a country name) used to filter records
--max, -m : maximum length of the sequences to parse (default: 5000)
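The "list or file" behaviour of -g and -c can be pictured with a small argparse sketch. This is a hypothetical re-creation based only on the flags documented above, not GenMine's actual source. The program name `genmine-sketch`, the helper `resolve_terms`, the placeholder email, and the choice to let -a accept several terms are all assumptions.

```python
import argparse
from pathlib import Path


def build_parser():
    """Hypothetical parser mirroring the documented flags (not GenMine's source)."""
    parser = argparse.ArgumentParser(prog="genmine-sketch")
    parser.add_argument("--email", "-e", required=True, help="email for NCBI access")
    parser.add_argument("--genus", "-g", nargs="*", default=[], help="genera, or a file of genera")
    parser.add_argument("--accession", "-c", nargs="*", default=[], help="accessions, or a file of accessions")
    parser.add_argument("--additional", "-a", nargs="*", default=[], help="additional filter terms")
    parser.add_argument("--max", "-m", type=int, default=5000, help="maximum sequence length")
    return parser


def resolve_terms(values):
    """Return the terms as given, or the lines of a file if one existing file is named."""
    if len(values) == 1 and Path(values[0]).is_file():
        lines = Path(values[0]).read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return list(values)


# Placeholder email, for illustration only.
args = build_parser().parse_args(["-e", "user@example.org", "-g", "Penicillium", "Trichoderma"])
genera = resolve_terms(args.genus)
print(genera, args.max)
```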

Output explanations

Main output

WIP

Features

GenMine is a Python program, built on the Entrez library, that parses records from GenBank and sorts them by gene name. Compared to using Entrez directly, GenMine has some advantages and disadvantages.

Advantages

GenMine does not miss records, especially when searching with multiple terms
GenMine can resume interrupted downloads, which is especially useful on an unstable internet connection
GenMine classifies downloaded records by gene type (ITS, LSU, SSU, BenA, etc.)

If you want more gene types, please open an issue!
We are currently working on better gene annotations.
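To give a feel for what classification by gene type involves, here is an illustrative keyword-matching sketch over record descriptions. It is not GenMine's actual annotation logic. The keyword table and the function name `classify` are assumptions chosen for the four gene types named above.

```python
# Hypothetical keyword table; GenMine's real rules may differ.
GENE_KEYWORDS = {
    "ITS": ("internal transcribed spacer", "its1", "its2"),
    "LSU": ("large subunit ribosomal rna", "28s"),
    "SSU": ("small subunit ribosomal rna", "18s"),
    "BenA": ("beta-tubulin", "bena"),
}


def classify(description):
    """Return every gene type whose keywords appear in the description."""
    text = description.lower()
    return [
        gene
        for gene, keywords in GENE_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


print(classify("Penicillium chrysogenum beta-tubulin (benA) gene, partial cds"))
# ['BenA']
```

A single record can span several regions, which is why the function returns a list rather than a single label.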

Limitations

Slower than Entrez (sometimes considerably), as a trade-off for completeness and stability

Bug reports and Suggestions

Bug reports and suggestions can be submitted through GitHub Issues or sent directly to [email protected]
However, we want GenMine to remain a small tool. Suggestions that go beyond the purpose of GenMine may be adopted in our upcoming software.
