Sequence identifier warnings from get-ncbi-data. #158

Open
mikerobeson opened this issue May 10, 2023 · 0 comments

A user initially reported this issue when running the following command:

qiime rescript get-ncbi-data   \
    --p-query "txid4751[ORGN] AND (ITS1 OR ITS2 OR its1 OR its2) NOT environmental sample[Filter] NOT environmental samples[Filter] NOT environmental[Title] NOT uncultured[Title] NOT unclassified[Title] NOT unidentified[Title] NOT unverified[Title]" \
    --p-ranks kingdom phylum class order family genus species \
    --p-rank-propagation \
    --p-n-jobs 4 \
    --o-sequences ITS-ref-seqs-ng.qza \
    --o-taxonomy ITS-ref-tax-ng.qza \
    --verbose

This produced the following warnings:

WARNING:2023-05-10 08:31:04,095:MainProcess:Using pdb|8E5T|3 as a sequence identifier, because it did not come down with an accession version.
WARNING:2023-05-10 08:31:04,096:MainProcess:Using pdb|7V08|6 as a sequence identifier, because it did not come down with an accession version.
WARNING:2023-05-10 08:31:04,096:MainProcess:Using pdb|7UQZ|6 as a sequence identifier, because it did not come down with an accession version.
WARNING:2023-05-10 08:31:04,096:MainProcess:Using pdb|7UQB|6 as a sequence identifier, because it did not come down with an accession version.
...
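When the log is long, it can help to collect the fallback identifiers and tally them by database prefix. Below is a minimal sketch in plain Python. It assumes the warning text keeps the wording shown above; the function names `fallback_ids` and `count_by_prefix` are hypothetical and not part of RESCRIPt.

```python
import re
from collections import Counter

# Matches the warning text shown above. Assumption: the wording of the
# message is stable across RESCRIPt versions.
FALLBACK_ID = re.compile(r"Using (\S+) as a sequence identifier")


def fallback_ids(log_lines):
    """Return identifiers reported as lacking an accession version."""
    ids = []
    for line in log_lines:
        match = FALLBACK_ID.search(line)
        if match:
            ids.append(match.group(1))
    return ids


def count_by_prefix(ids):
    """Count identifiers by the database prefix before the first '|'."""
    return Counter(identifier.split("|", 1)[0] for identifier in ids)
```

Running `count_by_prefix(fallback_ids(lines))` on the captured `--verbose` output should show whether prefixes other than `pdb` are affected.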

I was able to reproduce the issue. I exported the resulting FASTA file and observed sequences with headers like those shown above. I also manually ran BLAST on a few of these sequences, and they did appear to contain ITS sequences, though I have not tested this thoroughly. I am not sure why pdb identifiers are used when the returned data might actually contain the requested ITS DNA sequences.
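For anyone who wants to inspect or set aside the affected records, here is a rough sketch that splits exported FASTA text into records whose header starts with `pdb|` and everything else. It is plain Python with no QIIME 2 dependency; the function name `split_fasta_by_prefix` and its `prefix` parameter are hypothetical, and it assumes the identifier is the first thing after `>` in each header.

```python
def split_fasta_by_prefix(fasta_text, prefix="pdb|"):
    """Split FASTA text into (matching, other) lists of (header, sequence).

    A record is "matching" when its header, without the leading '>',
    starts with `prefix`.
    """
    matching, other = [], []
    header, chunks = None, []

    def flush():
        # Store the record collected so far, if there is one.
        if header is not None:
            record = (header, "".join(chunks))
            (matching if header.startswith(prefix) else other).append(record)

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()
    return matching, other
```

The `matching` list can then be checked with BLAST or dropped before building a classifier, depending on whether the sequences turn out to be legitimate ITS data.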

The warning message comes from these specific lines in ncbi.py.

This is probably not a true bug, but the pdb identifiers make it difficult to trace these data back to their origin.
