Frameshift correction #227

Open
hoelzer opened this issue Apr 6, 2022 · 0 comments
Labels
enhancement New feature or request


hoelzer (Collaborator) commented Apr 6, 2022

Frameshifts (FSs) are frequently introduced into consensus sequences, and in almost all cases they are errors.

Suggestion:

We could integrate a new tool, proovframe, which corrects frameshifts by aligning reference protein sequences to the consensus sequences.

So far I have only tried this with a single example sequence, so it still needs proper benchmarking:

- Top: original poreCov consensus sequence containing a frameshift.
- Middle: the same sequence after proovframe correction using all SARS-CoV-2 proteins as reference. However, this introduces a new error in ORF1a, likely due to the polyprotein structure of ORF1ab (ORF1ab is translated via a programmed -1 ribosomal frameshift at the ORF1a/ORF1b junction).
- Bottom: I therefore removed the ORF1ab polyprotein sequence from the reference FASTA, and this seems to work: the sequence is fixed.

[Image: the top, middle and bottom sequences described above]

Reference protein FASTA used (without the ORF1ab polyprotein):
GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa.zip
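A reference like this can be derived from the full RefSeq protein FASTA by dropping the polyprotein record before running `proovframe map`. The following is a minimal sketch, assuming the ORF1ab record's header contains the text `ORF1ab polyprotein` and that the full file is named `GCF_009858895.2_ASM985889v3_protein.faa`; both are assumptions, so check the headers of your copy first. `drop_orf1ab` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a protein FASTA from stdin to stdout, skipping
# every record whose header line contains "ORF1ab polyprotein".
# ASSUMPTION: this header text matches the RefSeq annotation of your file.
drop_orf1ab() {
  awk '
    /^>/ { keep = ($0 !~ /ORF1ab polyprotein/) }  # decide once per record, at its header
    keep                                          # print header and sequence lines of kept records
  '
}
```

Usage: `drop_orf1ab < GCF_009858895.2_ASM985889v3_protein.faa > GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa`. Because the pattern requires the `b`, a separate `ORF1a polyprotein` record, if the file has one, is kept.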

Commands:

```shell
# map the reference proteins to the consensus sequence(s)
proovframe/bin/proovframe map -a GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa -o raw-seqs.tsv sample.consensus.fasta

# fix frameshifts in the consensus sequence(s)
proovframe/bin/proovframe fix -o corrected.fasta sample.consensus.fasta raw-seqs.tsv
```

However, I would suggest providing these FS-corrected consensus sequences in addition to the default consensus sequences. Proper benchmarking is needed to determine whether these corrections introduce any other errors into SARS-CoV-2 sequences.
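To ship the corrected sequence next to the default consensus rather than replacing it, the `proovframe map` and `proovframe fix` commands above could be wrapped per sample. This is a minimal sketch, not part of poreCov: the function names `fs_corrected_name` and `fs_correct_consensus`, the `PROOVFRAME` variable, and the `.proovframe.tsv` / `.fs-corrected.fasta` suffixes are all hypothetical choices; only the two proovframe invocations are taken from the commands above.

```shell
#!/usr/bin/env bash
# Path to the proovframe launcher (hypothetical variable); defaults to the
# location used in the commands above.
PROOVFRAME="${PROOVFRAME:-proovframe/bin/proovframe}"

# Hypothetical helper: derive the output name for the corrected consensus,
# e.g. sample.consensus.fasta -> sample.consensus.fs-corrected.fasta
fs_corrected_name() {
  local in="$1"
  echo "${in%.fasta}.fs-corrected.fasta"
}

# Hypothetical helper: write a frameshift-corrected copy of one consensus
# FASTA next to the original, leaving the default consensus untouched.
# Prints the path of the corrected file on success.
# Usage: fs_correct_consensus <consensus.fasta> <reference_proteins.faa>
fs_correct_consensus() {
  local consensus="$1" proteins="$2"
  local hits="${consensus%.fasta}.proovframe.tsv"
  local out
  out="$(fs_corrected_name "$consensus")"

  # Step 1: align the reference proteins to the consensus sequence(s).
  "$PROOVFRAME" map -a "$proteins" -o "$hits" "$consensus" || return 1
  # Step 2: fix frameshifts using those alignments, writing a separate file.
  "$PROOVFRAME" fix -o "$out" "$consensus" "$hits" || return 1

  echo "$out"
}
```

For example, `fs_correct_consensus sample.consensus.fasta GCF_009858895.2_ASM985889v3_protein_noORF1ab.faa` would leave `sample.consensus.fasta` as is and add `sample.consensus.fs-corrected.fasta`, so both versions are available for the benchmarking mentioned above.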

hoelzer added the enhancement label on Apr 6, 2022