
Statistical methods employed comparing insulation scores #190

Open
luzporras opened this issue May 17, 2024 · 1 comment

Comments

@luzporras

I'm comparing the insulation scores of two samples using FANC. I've created two BED files, one with fanc compare and the other with fanc.DifferenceRegions.from_regions, both using a 50 kb bin and a 150 kb window. However, the values in the two BED files differ, and I'm unsure which one to use.

My main questions concern the statistical methods behind these outputs. Specifically: what statistical analyses underlie the calculations that produce them, and what criteria determine whether a difference between the insulation scores of the two samples is significant?

Thanks,
Luz P

@kaukrise
Collaborator

kaukrise commented Jun 4, 2024

Hey, apologies for the late response.

Here is some pseudocode for fanc compare:

if inputs are matrices:
  if comparison == 'fold-change':
    use FoldChangeMatrix
  elif comparison == 'difference':
    use DifferenceMatrix
elif inputs are scores:
  if comparison == 'fold-change':
    use FoldChangeScores
  elif comparison == 'difference':
    use DifferenceScores
elif inputs are regions:
  if comparison == 'fold-change':
    use FoldChangeRegions
  elif comparison == 'difference':
    use DifferenceRegions
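The dispatch above can be sketched as a lookup table keyed on input kind and comparison type. This is a minimal, hypothetical sketch rather than FANC code. The keys 'matrix', 'scores' and 'regions' and the function name `pick_comparison_class` are illustrative, and the function returns class names as strings instead of importing FANC. The aliases 'fc' and 'diff' mirror the region branch of the FANC source excerpt in this thread. Assuming the other branches accept the same aliases is a guess.

```python
# Hypothetical sketch of the fanc compare dispatch; returns class names as strings.
COMPARISON_ALIASES = {
    "fold-change": "fold-change",
    "fc": "fold-change",
    "difference": "difference",
    "diff": "difference",
}

# (input kind, canonical comparison) -> comparison class name
CLASS_TABLE = {
    ("matrix", "fold-change"): "FoldChangeMatrix",
    ("matrix", "difference"): "DifferenceMatrix",
    ("scores", "fold-change"): "FoldChangeScores",
    ("scores", "difference"): "DifferenceScores",
    ("regions", "fold-change"): "FoldChangeRegions",
    ("regions", "difference"): "DifferenceRegions",
}


def pick_comparison_class(input_kind: str, comparison: str = "fold-change") -> str:
    """Return the comparison class name for an input kind and comparison type.

    The 'fold-change' default follows the note that fold change is the
    fanc compare default.
    """
    canonical = COMPARISON_ALIASES.get(comparison)
    if canonical is None:
        raise ValueError("Comparison type -c {} not recognised!".format(comparison))
    try:
        return CLASS_TABLE[(input_kind, canonical)]
    except KeyError:
        raise ValueError("Unknown input kind: {}".format(input_kind)) from None


print(pick_comparison_class("regions"))          # FoldChangeRegions
print(pick_comparison_class("regions", "diff"))  # DifferenceRegions
```

Calling it without a comparison type returns the fold-change class. That matches the default described below.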

As you can see, fanc compare uses DifferenceRegions under the hood if you provide BED files and the --comparison difference argument. The default, however, is fold-change; maybe that is where the difference stems from?
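To see why the two settings give different BED files from the same scores, here is a small numeric sketch. The scores are made up. It assumes fold change is the plain ratio `score1 / score2` and returns NaN when the second score is zero. FANC's exact fold-change formula is not shown in this thread, so treat `fold_change` here as an assumption.

```python
import math

# Made-up insulation scores for the same three regions in two samples.
sample1 = [-0.5, 0.2, 0.8]
sample2 = [-0.25, 0.4, 0.8]


def difference(score1: float, score2: float) -> float:
    # Same arithmetic as DifferenceRegions.compare: a plain subtraction.
    return score1 - score2


def fold_change(score1: float, score2: float) -> float:
    # Assumption: plain ratio, NaN when the second score is zero.
    return score1 / score2 if score2 != 0 else math.nan


diffs = [difference(a, b) for a, b in zip(sample1, sample2)]
fcs = [fold_change(a, b) for a, b in zip(sample1, sample2)]
print(diffs)  # [-0.25, -0.2, 0.0]
print(fcs)    # [2.0, 0.5, 1.0]
```

"No change" is 0 in one output and 1 in the other. The first region also has a negative difference but a fold change above 1, because both scores are negative. The two files therefore rank and threshold regions differently, even though both come from the same input.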

You can see the actual code here:

elif isinstance(matrix1, RegionBased) and isinstance(matrix2, RegionBased):
    ComparisonRegions = None
    if comparison == 'fold-change' or comparison == 'fc':
        ComparisonRegions = FoldChangeRegions
    elif comparison == 'difference' or comparison == 'diff':
        ComparisonRegions = DifferenceRegions
    else:
        parser.error("Comparison type -c {} not recognised!".format(comparison))

    if output_format is None:
        comparison_output = output_file
    else:
        output_format = output_format.lower()
        comparison_output = None

    cmp = ComparisonRegions.from_regions(matrix1, matrix2, file_name=comparison_output,
                                         tmpdir=tmp, mode='w', log=log)

So, the call would be:

DifferenceRegions.from_regions(
  matrix1, matrix2, 
  file_name=comparison_output, 
  tmpdir=tmp, 
  mode='w', 
  log=log
)

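All fanc compare does is calculate either the difference or the fold change of the values in the BED files for each region. There are no statistics involved, so neither output includes a significance test or threshold. DifferenceRegions, for example, just subtracts the two scores: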

class DifferenceRegions(ComparisonRegions):
    _classid = 'DIFFERENCEREGIONS'

    def __init__(self, *args, **kwargs):
        ComparisonRegions.__init__(self, *args, **kwargs)

    def compare(self, score1, score2):
        return score1 - score2
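To make the "no statistics" point concrete, here is a hypothetical, standalone sketch of the same pattern. A base class matches regions between two inputs and applies a `compare` method to each pair. A subclass subtracts, like `DifferenceRegions.compare`. The names `RegionComparisonSketch`, `DifferenceSketch` and `BedRow`, the in-memory tuples standing in for BED rows, and the exact-coordinate matching are all assumptions for illustration. This is not FANC's `ComparisonRegions` implementation.

```python
from typing import Dict, List, Tuple

# (chrom, start, end, score): a hypothetical in-memory stand-in for a BED row.
BedRow = Tuple[str, int, int, float]


class RegionComparisonSketch:
    """Hypothetical base class: applies compare() to each region shared by both inputs."""

    def compare(self, score1: float, score2: float) -> float:
        raise NotImplementedError

    def compare_regions(self, bed1: List[BedRow], bed2: List[BedRow]) -> List[BedRow]:
        # Index sample 2 by exact coordinates (assumption: both inputs use identical bins).
        scores2: Dict[Tuple[str, int, int], float] = {
            (chrom, start, end): score for chrom, start, end, score in bed2
        }
        result: List[BedRow] = []
        for chrom, start, end, score1 in bed1:
            key = (chrom, start, end)
            if key in scores2:  # regions missing from sample 2 are skipped
                result.append((chrom, start, end, self.compare(score1, scores2[key])))
        return result


class DifferenceSketch(RegionComparisonSketch):
    # Mirrors DifferenceRegions.compare: a plain subtraction, no statistics.
    def compare(self, score1: float, score2: float) -> float:
        return score1 - score2


bed1 = [("chr1", 0, 50000, 0.75), ("chr1", 50000, 100000, -0.25), ("chr1", 100000, 150000, 0.5)]
bed2 = [("chr1", 0, 50000, 0.25), ("chr1", 50000, 100000, -0.5)]
print(DifferenceSketch().compare_regions(bed1, bed2))
# [('chr1', 0, 50000, 0.5), ('chr1', 50000, 100000, 0.25)]
```

The result is one number per region. Nothing in this path computes a p-value or makes a significance call.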
