
About Constraint violation score #4

Open
kmtiny opened this issue Aug 17, 2018 · 10 comments

Comments

@kmtiny

kmtiny commented Aug 17, 2018

Hi,
I have some questions about predicting mutation effects with ExPecto:

  1. In my input *.vcf file, should I restrict the mutations to transcriptional regulatory regions near the TSS, or can I use all called mutations to predict mutation effects?
  2. To calculate the 'constraint violation score', I learned that it is computed as the product of the 'predicted mutation effect' and the 'variation potential directionality score'. The 'predicted mutation effect' of each mutation in different tissues can be obtained directly from ExPecto. The latter, however, is described in the paper as the sum of predicted log(fold change) values for all mutations per gene. Should I obtain it by summing the predicted mutation effects of all mutations on the target gene in my *.vcf file, or should I use the corresponding value in the file 'variation_potential.directionality_scores.txt' provided in Supplementary_Data.2 of the paper?
  3. The paper defines the 'constraint violation score' as follows:
    'The constraint violation score was computed as the product of the predicted variant effect of the prioritized LD variant and the variation potential directionality score of the nearest TSS'. How should I understand 'the variation potential directionality score of the nearest TSS'?

I hope you can help. Thank you!
@jzthree
Collaborator

jzthree commented Aug 17, 2018

Hi,

Hope this helps:

  1. You can use all mutations, but for computational efficiency I recommend focusing on variants within 10kb or 20kb of the TSS. Mutations that are further away usually get very small predicted effects.

  2. The 'variation potential directionality score' can be obtained from 'variation_potential.directionality_scores.txt'. It was calculated based on all potential single nucleotide mutations within 1kb of the TSS.

  3. The constraint violation score is computed as the product of the predicted expression effect (log fold change) and the variation potential directionality score. Both scores should be computed with respect to the same gene (TSS) - the latter is already computed and can be obtained as in 2. In the examples we showed in the paper, we used the nearest TSS as the TSS of interest.
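
As a rough illustration of points 1 and 3, here is a minimal Python sketch. It assumes the predicted effects and the directionality scores have already been loaded; the `Variant` class, the helper names, the gene names, and all numbers are made up for this example and are not part of ExPecto or of the supplementary file's actual layout.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Variant:
    chrom: str
    pos: int        # variant position
    gene: str       # gene whose TSS is taken as the TSS of interest
    tss_pos: int    # position of that TSS
    effect: float   # predicted expression effect (log fold change) in one tissue


def within_window(variant: Variant, max_dist: int = 20_000) -> bool:
    """Keep variants at most max_dist bp away from the TSS of interest."""
    return abs(variant.pos - variant.tss_pos) <= max_dist


def constraint_violation(variant: Variant, directionality: Mapping[str, float]) -> float:
    """Predicted effect times the directionality score of the same gene."""
    return variant.effect * directionality[variant.gene]


# Hypothetical values, for illustration only.
directionality_scores = {"GENE_A": -2.0, "GENE_B": 1.5}
variants = [
    Variant("chr1", 10_500, "GENE_A", 10_000, 0.25),
    Variant("chr1", 95_000, "GENE_A", 10_000, 0.01),  # 85 kb from the TSS, skipped
    Variant("chr2", 4_800, "GENE_B", 5_000, 0.5),
]

scores = {
    (v.chrom, v.pos): constraint_violation(v, directionality_scores)
    for v in variants
    if within_window(v)
}
print(scores)
```

The key point is that both factors refer to the same gene: the effect is the one predicted for that gene's TSS, and the directionality score is looked up by that gene.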

Best,
Jian

@kmtiny
Author

kmtiny commented Aug 17, 2018

Thanks for your timely reply.

@kmtiny
Author

kmtiny commented Aug 20, 2018

I still have a question. We know that the constraint violation score of each mutation on a gene can be calculated with the formula in the paper. Could we then sum the scores of all mutations on a gene to represent their combined impact on that gene? If not, what would the sum mean? Thank you.

@jzthree
Collaborator

jzthree commented Aug 20, 2018

I think you are asking about the variation potential directionality score, which is the sum of the predicted mutation effects of all potential mutations - right? The sum is used to measure the bias of the distribution of predicted mutation effects: whether the distribution is biased toward positive-effect mutations or negative-effect mutations. It may be more intuitive to think about the mean of the predicted mutation effects, which differs from the sum only by a constant factor in this case.
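
To make the sum-versus-mean remark concrete, here is a small Python sketch with made-up effect values (not ExPecto output). The sum and the mean always share the same sign and differ by the number of mutations considered, so either one indicates the direction of the bias.

```python
import math

# Hypothetical predicted effects (log fold change) of all potential
# mutations around one TSS; the values are invented for illustration.
effects = [0.5, -0.25, -1.0, 0.125, -0.375]

directionality_sum = sum(effects)
directionality_mean = directionality_sum / len(effects)

# The two summaries differ only by the factor len(effects).
assert math.isclose(directionality_mean * len(effects), directionality_sum)

bias = "negative" if directionality_sum < 0 else "positive"
print(directionality_sum, directionality_mean, bias)
```

If the same number of potential mutations is considered for every gene, that factor is the same everywhere, so ranking genes by the sum or by the mean gives the same order.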

@kmtiny
Author

kmtiny commented Aug 20, 2018

Hi, Jian
Thanks for your timely reply. My question is indeed about the "constraint violation score", the sum of which was mentioned in the paper. Suppose we calculate the sum of the "constraint violation scores" of all mutations on a gene: would that sum be meaningful?
In short, for a gene, can we sum the scores of all mutations on it?
Thank you!

@kmtiny
Author

kmtiny commented Aug 20, 2018

Correction: in the sentence containing "the sum of which was mentioned", "was mentioned" should read "was not mentioned".

@jzthree
Collaborator

jzthree commented Aug 20, 2018

I see, that is an interesting question. The sum would be equivalent to the square of the variation potential directionality score, so it can probably be interpreted as the size of the variation potential directionality.
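
The equivalence can be checked numerically. The sketch below uses made-up effect values and assumes the sum runs over the same set of potential mutations that defines the directionality score. For a subset of mutations (for example, only those observed in a VCF file), the sum is instead the directionality score times the sum of the effects in that subset; that second case is an extrapolation added here for illustration, not something stated in the paper.

```python
# Hypothetical predicted effects of all potential mutations for one gene.
effects = [0.5, -0.25, -1.0, 0.125, -0.375]

# Directionality score: sum of the predicted effects.
directionality = sum(effects)

# Constraint violation score of each mutation: effect * directionality.
cv_scores = [e * directionality for e in effects]

# Summed over all potential mutations, this equals directionality squared.
total = sum(cv_scores)
assert total == directionality ** 2

# Summed over a subset only, it is directionality * (sum of subset effects).
observed = effects[:2]
subset_total = sum(e * directionality for e in observed)
assert subset_total == directionality * sum(observed)
print(total, subset_total)
```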

@kmtiny
Author

kmtiny commented Aug 21, 2018

Hi, Jian
An error occurred when I ran ExPecto with the command line "python chromatin.py xx.vcf". The output is as follows:

Number of variants with reference allele matched with reference genome:
704
Number of input variants:
704
Traceback (most recent call last):
  File "/work1/xuelab/project/guokm/software/ExPecto/chromatin.py", line 154, in <module>
    input = torch.from_numpy(ref_encoded[int(i*batchSize):int((i+1)*batchSize),:,:]).unsqueeze(3)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 3)

How should I solve it?
Thank you!

@jzthree
Collaborator

jzthree commented Aug 21, 2018

Did you try running git pull to get the newest code? I just made a commit to fix a bug that may cause this.

@kmtiny
Author

kmtiny commented Aug 24, 2018

The error reported above was resolved after updating the code. Thank you!
