How can I use metaeuk to annotation genome without reference #46

Nana7m1 · 2022-04-30T04:27:31Z

Dear developers and other users,
As the title says, I want to use MetaEuk to annotate a genome without a reference, but I cannot find how to do this in the manual.

Best
Nana7m1

elileka · 2022-05-01T16:31:01Z

Hello,

The way to do it is to download or construct a reference database to run against. What do you know about your genome? What taxonomic group is it? I could try to provide further advice based on your answer :)
Once you have the reference database at hand, you could use easy-predict to find similar genes in your input genome.
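For readers following along, a minimal sketch of such an easy-predict call is shown here. The file names (contigs.fasta, proteins.fasta), the output prefix and the temporary directory are hypothetical placeholders, not values from this thread; the argument order (contigs, reference, output prefix, temporary directory) follows the MetaEuk README as far as I know, so check `metaeuk easy-predict -h` for your installed version.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
CONTIGS="contigs.fasta"      # genome or metagenome contigs to annotate
PROTEINS="proteins.fasta"    # reference proteins, e.g. from a related taxonomic group
OUT_PREFIX="predsResults"    # prefix for the prediction output files
TMP_DIR="tmp"                # scratch directory for intermediate files

# Assumed argument order: contigs, reference, output prefix, tmp dir
CMD="metaeuk easy-predict $CONTIGS $PROTEINS $OUT_PREFIX $TMP_DIR"
echo "$CMD"

# Run only when metaeuk is installed and both input files exist
if command -v metaeuk >/dev/null 2>&1 && [ -f "$CONTIGS" ] && [ -f "$PROTEINS" ]; then
    $CMD
fi
```

In recent versions the predicted proteins should end up in files that start with the output prefix, including a protein FASTA file and a GFF file.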

Best,
Eli

tiantianlili · 2023-12-21T07:02:35Z

Hello, thank you for developing this software. I would like to follow up on this question. I obtained contigs longer than 1 kbp from metagenome data of soil contaminated with heavy metals. I noticed that you recommend many MMseqs2 reference datasets, some of which are nucleic acid databases (https://github.com/soedinglab/MMseqs2/wiki#downloading-databases). May I ask which database is the most suitable for me (SILVA?)?

elileka · 2023-12-23T02:44:45Z

Hi,

As a reference DB, MetaEuk takes either proteins or protein profiles. Therefore the nucleotide DBs available through the databases command, including SILVA, are not relevant.

Choosing the right protein/protein profile DB depends on your scientific goal. Here are two ideas I have, based on the details you provided:

UniRef50 can be a good start to find homologs for proteins, which mostly were not discovered through metagenomic experiments. This DB can be downloaded through the databases command and it has taxonomic and other info, which can be used to annotate your sample.
If you are mainly interested in discovering homologs of rare, environmental proteins and less in annotation, you can download one of these DBs. Specifically, SRC (soil) and BFD seem most suitable for your sample. However, note that (1) environmental DBs like these are generally not annotated and that (2) these DBs are large: 200-300 Gb, which means higher requirements (storage, runtime, etc.) so I would first test on smaller scales.
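As a rough sketch of the UniRef50 route: the commands below download the database with the MMseqs2 `databases` command and then pass it to easy-predict as the reference. This assumes MMseqs2 is installed alongside MetaEuk; the paths refdb/uniref50, contigs.fasta and the output prefix are hypothetical placeholders, and the download is large, so make sure enough disk space is available.

```shell
#!/bin/sh
# Hypothetical paths; replace them with your own.
DB_NAME="UniRef50"           # database name as listed by `mmseqs databases`
DB_PATH="refdb/uniref50"     # where the downloaded reference DB is written
CONTIGS="contigs.fasta"      # soil metagenome contigs
OUT_PREFIX="predsResults"    # prefix for the prediction output files
TMP_DIR="tmp"                # scratch directory for intermediate files

DOWNLOAD_CMD="mmseqs databases $DB_NAME $DB_PATH $TMP_DIR"
PREDICT_CMD="metaeuk easy-predict $CONTIGS $DB_PATH $OUT_PREFIX $TMP_DIR"
echo "$DOWNLOAD_CMD"
echo "$PREDICT_CMD"

# Run only when both tools are installed and the contigs file exists
if command -v mmseqs >/dev/null 2>&1 && command -v metaeuk >/dev/null 2>&1 \
   && [ -f "$CONTIGS" ]; then
    mkdir -p "$(dirname "$DB_PATH")"
    $DOWNLOAD_CMD && $PREDICT_CMD
fi
```

Running `mmseqs databases` without arguments lists the available database names, which is a quick way to confirm which ones are protein DBs before downloading.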

You can also have a look at BUSCO if you are interested in estimating the genomic completeness of specific organisms via single-copy marker genes of various phylogenetic groups. BUSCO uses MetaEuk internally.

Best,
Eli

tiantianlili · 2023-12-23T03:09:14Z

Thank you very much for your detailed reply. I'll try the UniRef50 and SRC databases first, hopefully with good results.

Best
li tian
