Error running Phylophlan #88

rybioinf · 2022-04-16T23:05:49Z

Dear Phylophlan developers,

I am running into an issue running PhyloPhlAn. It fails with the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte", which I am not sure how to troubleshoot. I installed and updated PhyloPhlAn via bioconda. Any advice is greatly appreciated.

Here is the code/ output:

(phylophlan) @macbook-pro Documents % phylophlan -I Reference_Genomes_2 -d phylophlan_databases --diversity low -f phylophlan_configs/supermatrix_nt.cfg
Traceback (most recent call last):
  File "/Users/ryan/miniconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 3226, in phylophlan_main
    db_type, db_dna, db_aa = init_database(args.database, args.databases_folder, args.db_type, configs, 'db_dna', 'db_aa',
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 817, in init_database
    d = Counter([len(set(seq))
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 817, in <listcomp>
    d = Counter([len(set(seq))
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/Bio/SeqIO/FastaIO.py", line 47, in SimpleFastaParser
    for line in handle:
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte
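
The traceback shows the FASTA parser hitting a byte that is not valid UTF-8, but it does not say which file was being read. As a rough way to narrow that down, the sketch below scans a folder and reports every file that fails to decode as UTF-8. The function name `find_non_utf8` is hypothetical and not part of PhyloPhlAn, and the offset it reports is counted from the start of the file, so it may not match the position in the error message, which can be relative to a read buffer.

```python
from pathlib import Path


def find_non_utf8(folder):
    """Return (path, byte offset, offending byte) for each file that is not valid UTF-8.

    Hypothetical helper for illustration; it is not part of PhyloPhlAn.
    """
    problems = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        # Reads each file fully into memory, which is fine for a quick check.
        data = path.read_bytes()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that could not be decoded.
            problems.append((str(path), err.start, data[err.start]))
    return problems
```

Calling `find_non_utf8("phylophlan_databases")` or `find_non_utf8("Reference_Genomes_2")` would list any compressed, binary, or differently encoded files in those folders.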

fasnicar · 2022-04-19T20:50:25Z

Hi, thanks for reporting this.

I think the database is not set properly here. There is no database named phylophlan_databases; that is the default folder where all available databases are stored. So my guess is that you would like to use the phylophlan database with the -d param.
If that's the case, there is a second issue: you specified the default config file for a nucleotide database (-f phylophlan_configs/supermatrix_nt.cfg). The phylophlan database is a set of proteins, so you would need to use a config file for a database of amino acids (like the default supermatrix_aa.cfg).

If you run into other issues, please also use the --verbose param, which provides more context for debugging.
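
To make the suggested changes concrete, here is a minimal sketch that takes the command exactly as posted and swaps in the values mentioned above. It only rewrites the argument list and does not run PhyloPhlAn. The helper `set_option` is hypothetical, and placing supermatrix_aa.cfg in the same phylophlan_configs folder is an assumption.

```python
import shlex

# Command exactly as posted in the original report.
original = (
    "phylophlan -I Reference_Genomes_2 -d phylophlan_databases "
    "--diversity low -f phylophlan_configs/supermatrix_nt.cfg"
)


def set_option(args, flag, value):
    """Return a copy of args with the value that follows `flag` replaced."""
    out = list(args)
    out[out.index(flag) + 1] = value
    return out


args = shlex.split(original)
# Use the phylophlan database instead of the databases folder name.
args = set_option(args, "-d", "phylophlan")
# Assumption: the amino-acid config sits next to the nucleotide one.
args = set_option(args, "-f", "phylophlan_configs/supermatrix_aa.cfg")
# Ask for more context in case something else goes wrong.
args.append("--verbose")

corrected = shlex.join(args)
print(corrected)
```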

Many thanks,
Francesco

rybioinf · 2022-04-23T21:06:40Z

Hi Francesco, Thank you for the response. I am a bit confused now as to what my input files should be. If I use the phylophlan database and the supermatrix_aa.cfg configuration file, would it be inappropriate to use nt fasta files as my input genomes? I appreciate your help. Best, Ryan

fasnicar · 2022-04-28T21:28:41Z

Hi Ryan, no, using genomes as input is totally fine. With a database of proteins, PhyloPhlAn can map them using a translated search (this is the [map_dna] section in the config file). The reverse does not work: proteomes as input cannot be mapped against a database of genes.

Please, let me know if something is still not clear.
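
One way to check whether a given config file has the [map_dna] section mentioned above is to list its sections. This is a minimal sketch that assumes the .cfg files are INI-style and readable by Python's `configparser`; the function name `config_sections` is hypothetical.

```python
import configparser


def config_sections(cfg_path):
    """Return the section names found in an INI-style config file.

    Hypothetical helper; assumes the file can be parsed by configparser.
    """
    # Interpolation is disabled so '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    with open(cfg_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    return parser.sections()
```

For example, `"map_dna" in config_sections("phylophlan_configs/supermatrix_aa.cfg")` would indicate whether that file defines a step for mapping nucleotide input.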

fasnicar self-assigned this Apr 19, 2022
