Capture uploaded allele correctly for VCF input #1744
base: main
7dbea87 to df56c61
# Update the flag marking multi-allelic variants as minimised in split_variants/rejoin_variants
$vf->{minimised} = 1;
Multi-allelic variants are not getting minimised for the default format, for example 1 961320 961324 GCAGG/GCA/GCAG +, but the output still contains MINIMISED=1 (without the PR they are also not minimised, but there is no MINIMISED=1).
Hi @nakib103, can you please test this example with the latest commit? The allele is expected to be similar to the one produced when running with --minimal.
Yes, the alleles are the same when running with or without --minimal. But the original problem remains: the output says the variant is minimised when it is not:
1_961320_GCAGG/GCA/GCAG 1:961320-961324 - ENSG00000188976 ENST00000327044 Transcript upstream_gene_variant - -UPLOADED_ALLELE=GCAGG/GCA/GCAG;IMPACT=MODIFIER;DISTANCE=2067;STRAND=-1;MINIMISED=1
Hi @nakib103, the output is still minimised if you look at the 3rd column (correct me if I got it wrong). The allele is "-" because the minimised representation of the first alternative allele is GG/- and that of the second is G/-. The problem, however, is that there is no way to tell the alternative alleles apart, as both show "-". Ideally the minimal representation should be GG/-/G. This is also an existing problem with the --minimal option and should probably be addressed in a future ticket. Please let me know if this makes sense.
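The difference between trimming each alternative allele on its own and trimming all alleles together can be sketched in Perl. This is a minimal illustration, not VEP's minimisation code: the helper trim_alleles is hypothetical, it trims a shared prefix and then a shared suffix (the trim order is an assumption), and it ignores coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper, not VEP's implementation: trim the longest prefix and
# then the longest suffix shared by ALL given alleles (REF first). Empty
# alleles are written as "-". Coordinates are ignored in this sketch.
sub trim_alleles {
    my @alleles = @_;

    # Count leading bases shared by every allele.
    my $prefix = 0;
    PREFIX: while (1) {
        for my $allele (@alleles) {
            last PREFIX if length($allele) <= $prefix;
            last PREFIX if substr($allele, $prefix, 1) ne substr($alleles[0], $prefix, 1);
        }
        $prefix++;
    }
    @alleles = map { substr($_, $prefix) } @alleles;

    # Count trailing bases shared by every (prefix-trimmed) allele.
    my $suffix = 0;
    SUFFIX: while (1) {
        for my $allele (@alleles) {
            last SUFFIX if length($allele) <= $suffix;
            last SUFFIX if substr($allele, -1 - $suffix, 1) ne substr($alleles[0], -1 - $suffix, 1);
        }
        $suffix++;
    }
    @alleles = map { substr($_, 0, length($_) - $suffix) } @alleles;

    return map { length($_) ? $_ : '-' } @alleles;
}

my ($ref, @alts) = ('GCAGG', 'GCA', 'GCAG');

# Per-allele minimisation: each ALT is trimmed against REF on its own.
for my $alt (@alts) {
    print join('/', trim_alleles($ref, $alt)), "\n";    # GG/- then G/-
}

# Joint minimisation: all alleles are trimmed together.
print join('/', trim_alleles($ref, @alts)), "\n";       # GG/-/G
```

Trimming each ALT against REF separately is what makes both alternative alleles collapse to "-"; trimming all alleles together keeps them distinguishable as GG/-/G.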
I was actually looking at the Uploaded_variation column: it does not seem to be minimised the way it is for bi-allelic variants (see 1 961320 961324 GCAGG/GCA +). The Allele column, however, does show minimised alleles. It seems the Uploaded_variation string follows different logic; should we address this?
Yes, I agree @nakib103. The uploaded variation was not expected to minimise alleles, but it was minimising them in some cases because of the way we populate the column. This was difficult to correct because it is also the default option, hence the approach of capturing the original allele with a different flag (--uploaded_allele). However, I agree it may also be worth investigating why the uploaded variation minimises the allele in some cases but not in others.
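Why the original allele has to be captured separately for VCF input can be sketched in Perl. This is an illustration under assumptions, not VEP code: uploaded_allele_from_vcf and vep_style_allele_string are hypothetical helpers, and the second one only approximates the usual handling of VCF indels (dropping a leading padding base shared by all alleles and shifting the start by one). The VCF record used is a VCF-style version of the example discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper: the allele string exactly as uploaded in a VCF record,
# i.e. REF followed by each comma-separated ALT, joined with "/".
sub uploaded_allele_from_vcf {
    my ($ref, $alt) = @_;
    return join('/', $ref, split(/,/, $alt));
}

# Hypothetical helper, an assumption about the conversion rather than VEP's
# code: if every allele starts with the same (padding) base, drop that base,
# shift the start by one and write empty alleles as "-".
sub vep_style_allele_string {
    my ($pos, $ref, $alt) = @_;
    my @alleles = ($ref, split(/,/, $alt));
    my $first   = substr($ref, 0, 1);

    unless (grep { substr($_, 0, 1) ne $first } @alleles) {
        for my $allele (@alleles) {
            $allele = substr($allele, 1);    # modifies @alleles via aliasing
            $allele = '-' unless length $allele;
        }
        $pos++;
    }
    return ($pos, join('/', @alleles));
}

# VCF-style record: POS 961320, REF GCAGG, ALT GCA,GCAG.
print uploaded_allele_from_vcf('GCAGG', 'GCA,GCAG'), "\n";    # GCAGG/GCA/GCAG
my ($start, $allele_string) = vep_style_allele_string(961320, 'GCAGG', 'GCA,GCAG');
print "$start $allele_string\n";                               # 961321 CAGG/CA/CAG
```

Once the padding base has been dropped, the REF/ALT as written in the VCF can no longer be read back from the converted allele string, which is why a separate field for the original allele is useful.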
The …
When the input is in VCF format, for example: in the VEP default output, the …
Hi @dglemos, yes, this is right. The idea is to use …
Yes, UPLOADED_ALLELE is enough to match the alleles. It should be returned by default, though: if the variant is minimised, the original alleles should be returned too.
I think it is generally advisable not to change the default output list, so as not to break existing pipelines.
Ticket: ENSVAR-5858
To test: use --allele_number for multi-allelic examples.
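A possible check for such a test, sketched in Perl: with --allele_number, VEP is documented to add ALLELE_NUM (0 = REF, 1 = first ALT, and so on; VCF input only), so combining it with UPLOADED_ALLELE identifies which original alternative allele an output line refers to, even when the Allele column shows "-" for both. The helpers parse_extra and original_allele, and the Extra strings below, are hypothetical examples rather than real VEP output.

```perl
use strict;
use warnings;

# Split the semicolon-separated KEY=VALUE pairs of the Extra column into a hash.
sub parse_extra {
    my ($extra) = @_;
    my %fields;
    for my $pair (split /;/, $extra) {
        my ($key, $value) = split /=/, $pair, 2;
        $fields{$key} = defined $value ? $value : 1;
    }
    return \%fields;
}

# Hypothetical helper: map ALLELE_NUM (0 = REF, 1 = first ALT, ...) back to
# the allele as originally uploaded, taken from UPLOADED_ALLELE. Returns
# undef (in scalar context) if a field is missing or the index is out of range.
sub original_allele {
    my ($extra) = @_;
    my $fields = parse_extra($extra);
    return unless defined $fields->{UPLOADED_ALLELE}
              && defined $fields->{ALLELE_NUM};
    my @uploaded = split m{/}, $fields->{UPLOADED_ALLELE};
    return $uploaded[ $fields->{ALLELE_NUM} ];
}

# Two hypothetical output lines for the multi-allelic example whose Allele
# column is "-" in both cases; ALLELE_NUM tells them apart.
for my $extra ('UPLOADED_ALLELE=GCAGG/GCA/GCAG;IMPACT=MODIFIER;ALLELE_NUM=1',
               'UPLOADED_ALLELE=GCAGG/GCA/GCAG;IMPACT=MODIFIER;ALLELE_NUM=2') {
    print original_allele($extra), "\n";    # GCA, then GCAG
}
```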