Error running Phylophlan #88

Open
rybioinf opened this issue Apr 16, 2022 · 3 comments

@rybioinf

Dear Phylophlan developers,

I am running into an issue running PhyloPhlAn. It fails with the error "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte", and I am not sure how to troubleshoot it. I installed and updated PhyloPhlAn via bioconda. Any advice is greatly appreciated.

Here is the command and its output:

```
(phylophlan) @macbook-pro Documents % phylophlan -I Reference_Genomes_2 -d phylophlan_databases --diversity low -f phylophlan_configs/supermatrix_nt.cfg
Traceback (most recent call last):
  File "/Users/ryan/miniconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 3226, in phylophlan_main
    db_type, db_dna, db_aa = init_database(args.database, args.databases_folder, args.db_type, configs, 'db_dna', 'db_aa',
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 817, in init_database
    d = Counter([len(set(seq))
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/phylophlan/phylophlan.py", line 817, in <listcomp>
    d = Counter([len(set(seq))
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/site-packages/Bio/SeqIO/FastaIO.py", line 47, in SimpleFastaParser
    for line in handle:
  File "/Users/ryan/miniconda3/envs/phylophlan/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte
```
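The traceback shows the failure happens while `init_database` reads a FASTA file through Biopython's `SimpleFastaParser`, so some file it opened is not valid UTF-8 text (for example a compressed archive or another binary file). As a rough way to find it, the hypothetical helper below (not part of PhyloPhlAn) walks a folder and reports the first undecodable byte in each file. The "position 1035" in the error is counted within the chunk being decoded, so for larger files it may not line up exactly with the file offset reported here.

```python
import os


def find_non_utf8_files(folder):
    """Return (path, offset, byte) for every file under `folder` that is not valid UTF-8."""
    problems = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Read raw bytes so the decode error can be caught and located here.
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first byte that failed to decode.
                problems.append((path, err.start, data[err.start]))
    return problems


def report(folder):
    """Print one line per problematic file, in the same style as the error message."""
    for path, offset, byte in find_non_utf8_files(folder):
        print(f"{path}: byte 0x{byte:02x} at position {offset}")
```

For example, calling `report("phylophlan_databases")` from the directory where the command was run would list any files in that folder that cannot be read as UTF-8.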

@fasnicar self-assigned this Apr 19, 2022
@fasnicar
Collaborator

Hi, thanks for reporting this.

I think the database is not set properly. There is no database named phylophlan_databases; that is the name of the default folder where all available databases are stored. So my guess is that you want to use the phylophlan database with the -d param.
If that's the case, there is a second issue: you specified the default config file for a nucleotide database (-f phylophlan_configs/supermatrix_nt.cfg). The phylophlan database is a set of proteins, so you need a config file for a database of amino acids (such as the default supermatrix_aa.cfg).
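A mismatch like this (a nucleotide config paired with a protein database) can be spotted before launching a run. The sketch below is a hypothetical helper, not PhyloPhlAn's own database check: it reads a plain-text FASTA file and labels it "dna" or "aa" depending on whether most sequences consist almost entirely of A, C, G, T/U and N. The 0.9 threshold is an arbitrary assumption chosen for illustration.

```python
from collections import Counter

# Letters counted as nucleotides; N covers ambiguous bases.
NUCLEOTIDES = set("ACGTUN")


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def guess_alphabet(sequences, threshold=0.9):
    """Vote 'dna' or 'aa' for each sequence and return the majority label (None if empty)."""
    votes = Counter()
    for seq in sequences:
        if not seq:
            continue
        nt_fraction = sum(ch in NUCLEOTIDES for ch in seq) / len(seq)
        votes["dna" if nt_fraction >= threshold else "aa"] += 1
    return votes.most_common(1)[0][0] if votes else None
```

If `guess_alphabet(seq for _, seq in read_fasta(path))` returns "aa" for the database file, a config meant for amino-acid databases is the matching choice.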

If you run into other issues, please also use the --verbose param, which provides more context for debugging.

Many thanks,
Francesco

@rybioinf
Author

rybioinf commented Apr 23, 2022 via email

@fasnicar
Collaborator

Hi Ryan, no, using genomes as input is totally fine. With a database of proteins, PhyloPhlAn can map them using a translated search (this is the [map_dna] section in the config file). The reverse is not true: proteomes as input against a database of genes will not work.
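To illustrate what a translated search relies on: a genome can be translated into protein in all six reading frames (three per strand), and those translations can then be compared with a protein database. The reverse direction does not work this way, because most amino acids are encoded by several codons, so a protein does not determine a unique DNA sequence. The sketch below is a conceptual, stdlib-only illustration using the standard genetic code (NCBI table 1); it is not PhyloPhlAn's implementation of the [map_dna] step.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate codon by codon; codons not in the table (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the three forward-strand frames followed by the three reverse-complement frames."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[frame:]) for strand in (dna, reverse_complement) for frame in range(3)]
```

For example, `six_frame_translation("ATGGCCTAA")` starts with `"MA*"` (ATG, GCC, then the TAA stop codon), and its fourth entry, the first reverse-complement frame, is `"LGH"`.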

Please, let me know if something is still not clear.
