Interpreting confidence intervals #383

Open
weewus opened this issue May 24, 2024 · 8 comments

@weewus

weewus commented May 24, 2024

I'm looking at delly vcf outputs and trying to compare them to other callers.
What distribution and confidence level (%) are used for CIPOS and CIEND? Is this user-defined?

@tobiasrausch
Member

This is based on the paired-end distributions and how many pairs support an SV and whether the SV is supported by split-reads.

@weewus
Author

weewus commented Jun 7, 2024

Firstly, thanks for getting back so quickly :)

This helps in understanding what supports CIPOS and CIEND, but it doesn't quite tell me what they mean.

The use case is understanding overlaps between variants. If two variants don't overlap when not considering CIPOS and CIEND, but overlap when considering CIPOS and CIEND for each - how should this be interpreted?

My impression was that regardless of how CIPOS and CIEND are calculated, they would give information about the % significance for POS and END locations for some probability distribution.

I'm not really sure where to start if CIPOS and CIEND don't give some % significance for POS and END locations.
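The overlap scenario described above can be sketched mechanically. This is a minimal illustration, assuming CIPOS and CIEND are offset pairs relative to POS and END (as in `CIPOS=-5,5`); the `SV` container, the `overlaps` helper, and the coordinates are hypothetical and not part of delly. It shows only how widening by the intervals changes the answer, and does not attach any confidence level to them.

```python
from typing import NamedTuple, Tuple


class SV(NamedTuple):
    """Minimal SV record (hypothetical container mirroring the VCF keys)."""
    pos: int
    end: int
    cipos: Tuple[int, int] = (0, 0)  # offsets relative to POS, e.g. (-5, 5)
    ciend: Tuple[int, int] = (0, 0)  # offsets relative to END


def overlaps(a: SV, b: SV, use_ci: bool = False) -> bool:
    """Return True if the two closed intervals share at least one position."""
    a_start, a_end = a.pos, a.end
    b_start, b_end = b.pos, b.end
    if use_ci:
        # Widen each interval to the outermost positions its CI allows.
        a_start, a_end = a.pos + a.cipos[0], a.end + a.ciend[1]
        b_start, b_end = b.pos + b.cipos[0], b.end + b.ciend[1]
    return a_start <= b_end and b_start <= a_end


# Made-up coordinates for illustration only.
first = SV(pos=1000, end=2000, cipos=(-5, 5), ciend=(-5, 5))
second = SV(pos=2003, end=3000, cipos=(-5, 5), ciend=(-5, 5))

print(overlaps(first, second))               # False
print(overlaps(first, second, use_ci=True))  # True
```

A pair that flips from `False` to `True` here only says the breakpoints are within each other's reported uncertainty; how much weight to give that depends on what the intervals represent.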

@zhangshouwei309194

> This is based on the paired-end distributions and how many pairs support an SV and whether the SV is supported by split-reads.

Dear author,
While using delly for SV detection, I found something that confused me. For example, SR reports 39 split reads, but RV (the number of junction reads) is 0. I would expect junction reads and split reads to mean the same thing. Is there a reason in genotyping for this difference?
8 121299546 BND00008260 A [10:123241058[A 2340 PASS PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.1.6;END=121299547;CHR2=10;POS2=123241058;PE=0;MAPQ=0;CT=5to5;CIPOS=-5,5;CIEND=-5,5;SRMAPQ=60;INSLEN=0;HOMLEN=5;SR=39;SRQ=1;CONSENSUS=CTCTCCATAACCAAGAAAATAAACATGCCAAGAGGAATTTGGTGAGTAAACAATGTTAAGTCCTAAGAGCTGCTAATGGGACCACTTTGAGCCATGAACTAATAAATCTCCACCACATCAAAAGAGAACTTTTTGCTTACAATGATAAAAACGAAATTTTGTCCTAAATGGAACCGTTTTTCTTGAGCATATGGTAATGATTTTCAGAAGGAAAGAAACTTCGATTTTTATATCCACCAGAC;CE=1.92421 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-75.1615,-768.704:10000:PASS:338:4712:4374:2:0:0:250:0
I look forward to your reply. Thank you.
Yours sincerely,
Phillip
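To make the two counts in the record above easier to compare, here is a small sketch that pulls SR out of the INFO column and RV, RR, DV, and DR out of the sample column. It assumes whitespace-separated columns, as pasted in this thread (real VCF files are tab-separated), and uses an abbreviated copy of the record with the long CONSENSUS value and a few other INFO keys left out. The parsing is plain string handling, not a delly or VCF-library API.

```python
# Abbreviated copy of the record above: CONSENSUS and some INFO keys omitted.
record = (
    "8 121299546 BND00008260 A [10:123241058[A 2340 PASS "
    "PRECISE;SVTYPE=BND;PE=0;CIPOS=-5,5;CIEND=-5,5;SR=39;SRQ=1 "
    "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV "
    "0/0:0,-75.1615,-768.704:10000:PASS:338:4712:4374:2:0:0:250:0"
)

fields = record.split()
info_col, format_col, sample_col = fields[7], fields[8], fields[9]

# INFO is a ';'-separated list of KEY=VALUE pairs; flags such as PRECISE
# carry no value and are stored as True.
info = {}
for entry in info_col.split(";"):
    key, _, value = entry.partition("=")
    info[key] = value if value else True

# FORMAT keys and sample values are both ':'-separated and line up by position.
sample = dict(zip(format_col.split(":"), sample_col.split(":")))

print("INFO   SR =", info["SR"])  # site-level field
print("FORMAT RV =", sample["RV"], "RR =", sample["RR"])  # per-sample fields
print("FORMAT DV =", sample["DV"], "DR =", sample["DR"])
print("GT =", sample["GT"])
```

Note that SR lives in the site-level INFO column while RV is a per-sample genotype field, so the two values are reported at different levels of the record.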

@tobiasrausch
Member

I think that's a duplicate question with issue #385 so I am closing this one.

@weewus
Author

weewus commented Aug 5, 2024

Not to beat a dead horse, but I don't think this was really answered :3

> Firstly, thanks for getting back so quickly :)
>
> This helps in understanding what supports CIPOS and CIEND, but it doesn't quite tell me what they mean.
>
> The use case is understanding overlaps between variants. If two variants don't overlap when not considering CIPOS and CIEND, but overlap when considering CIPOS and CIEND for each - how should this be interpreted?
>
> My impression was that regardless of how CIPOS and CIEND are calculated, they would give information about the % significance for POS and END locations for some probability distribution.
>
> I'm not really sure where to start if CIPOS and CIEND don't give some % significance for POS and END locations.

@tobiasrausch
Member

Ah sorry. CIPOS and CIEND are derived entirely from the mapping locations of reads, and because germline SVs are often homology-mediated (repeat-mediated), these intervals can be quite misleading. That's why SV comparison tools are gradually moving towards comparing SV alleles instead of relying solely on reciprocal overlap. That's also what I implemented in sansa.
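For reference, reciprocal overlap is commonly computed as the smaller of the two overlap fractions. The sketch below is a generic version of that definition under the assumption of half-open intervals `[start, end)`; the function name and coordinates are made up, and it is not taken from sansa or any other tool.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Smaller of the two overlap fractions for half-open intervals [start, end)."""
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0  # disjoint or merely touching intervals
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


# Made-up coordinates for illustration only.
print(reciprocal_overlap(1000, 2000, 1500, 2500))  # 0.5
print(reciprocal_overlap(1000, 2000, 1100, 1200))  # 0.1
print(reciprocal_overlap(1000, 2000, 2000, 3000))  # 0.0
```

The second call shows why the minimum is used: a small SV nested inside a large one covers all of itself but only a tenth of the larger call. Because the measure depends only on coordinates, breakpoints shifted within a repeat change the score even when the underlying allele is the same.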

@tobiasrausch reopened this Aug 5, 2024

@weewus
Author

weewus commented Aug 7, 2024

Thanks! Ok, off topic from confidence intervals now, but by comparing SV alleles do you mean comparing Alt Sequence information?
I read a bit into sansa, but couldn't quite see where the SV allele divergence value comes from.

@tobiasrausch
Member

Sansa uses delly's INFO/CONSENSUS sequence which is the ALT sequence + surrounding sequence (i.e., a local assembly of SV supporting reads).
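To give a rough idea of what a sequence-based comparison could look like, the sketch below scores two sequences by edit distance normalized by the longer length. This is an assumption made purely for illustration: it is not sansa's algorithm, the `edit_distance` and `divergence` helpers are hypothetical, and the short sequences are made up rather than real CONSENSUS values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def divergence(a: str, b: str) -> float:
    """Edit distance divided by the longer length (0.0 means identical)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Made-up sequences standing in for two consensus sequences.
print(divergence("ACGTACGT", "ACGTACGT"))  # 0.0
print(divergence("ACGTACGT", "ACGTTCGT"))  # 0.125
```

Unlike a coordinate overlap, a score like this stays low when two calls describe the same sequence change at slightly different positions.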
