Chess sim output .tsv file explained #61

Open
aminakur opened this issue Aug 10, 2023 · 1 comment

Comments

@aminakur

Could you please provide information about the z_bg and p_bg columns in the chess sim output file?

@liz-is
Collaborator

liz-is commented Aug 11, 2023

When you ran chess sim, you must have specified either --background-regions or --background-query. With either option, CHESS calculates a z-score and a corresponding p-value for each pair of regions, which is what the z_bg and p_bg columns report. They measure how significant the similarity of that pair is relative to the similarity scores obtained by comparing your reference region to the background regions. This is described in more detail in the CHESS paper:

An additional application of CHESS is to assess whether contact matrices originating from different genomic regions, or different genomes, are similar. An appropriate null model can be used to test whether the similarity measured by S is statistically significant. For example, a region R containing just a single TAD might obtain a high score when compared to a particular query Q, which also contains a single TAD. However, the same is true for any region with a single TAD, which is why the similarity of R and Q is not particularly informative in this instance. The score for the comparison of R versus Q should then be assigned a low significance. Conversely, when two highly complex regions with many structural features are assigned a strong similarity score S, it is unlikely to find an equally similar region in the genome by chance and the comparison is given high statistical significance. To compute a suitable null model, CHESS compares the reference matrix R to all other regions of the same size across the genome (referred to as 𝑄𝐵𝑖 in Fig. 1b). The distribution of scores from the null model is then used to calculate a z-score, corresponding to a normalized effect size, and a P value denoting the frequency of scores equal to or higher than S in the null model (Fig. 1c and Methods). Therefore, CHESS enables a quantitative comparison and assessment of statistical significance of contact matrix similarities.
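To make the quoted description concrete, here is a minimal sketch of how a z-score and an empirical p-value can be derived from a set of background scores. This is not the CHESS implementation: the function name `background_significance`, the use of the population standard deviation, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def background_significance(score, background_scores):
    """Return (z, p) for `score` against a list of background (null) scores.

    Hypothetical helper for illustration only, not part of CHESS.
    """
    if len(background_scores) < 2:
        raise ValueError("need at least two background scores")
    mu = mean(background_scores)
    # Assumption: population standard deviation; CHESS may use another estimator.
    sigma = pstdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    # z-score: how far the observed score lies from the background mean,
    # in units of the background standard deviation.
    z = (score - mu) / sigma
    # Empirical p-value: fraction of background scores equal to or higher
    # than the observed score.
    p = sum(1 for s in background_scores if s >= score) / len(background_scores)
    return z, p


# Made-up similarity scores for reference-vs-background comparisons.
background = [0.10, 0.20, 0.30, 0.40, 0.50]
z, p = background_significance(0.45, background)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Read this way, a large z together with a small p means that few background regions resemble the reference as closely as the query does.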
