Statistical methods employed comparing insulation scores #190

luzporras · 2024-05-17T14:36:22Z

I'm comparing the insulation scores of two samples using FANC. I've created two .bed files, one with fanc compare and the other with fanc.DifferenceRegions.from_regions, both using a 50kb bin size and a 150kb window. However, the two outputs differ, and I'm unsure which one to use.

My main questions are about the statistical methods behind these outputs. Specifically, what statistical analyses underlie the calculations, and what are the criteria for deciding whether the difference between the insulation scores of the two samples is significant?

Thanks,
Luz P

kaukrise · 2024-06-04T19:44:55Z

Hey, apologies for the late response.

Here is some pseudocode for fanc compare:

if input are matrices:
  if comparison == 'fold-change':
    use FoldChangeMatrix
  elif comparison == 'difference':
    use DifferenceMatrix
elif input are scores:
  if comparison == 'fold-change':
    use FoldChangeScores
  elif comparison == 'difference':
    use DifferenceScores
elif input are regions:
  if comparison == 'fold-change':
    use FoldChangeRegions
  elif comparison == 'difference':
    use DifferenceRegions

As you can see, fanc compare uses DifferenceRegions under the hood if you provide BED files and the --comparison difference argument. The default, however, is fold-change, so maybe that is where the discrepancy comes from?

You can see the actual code here:

fanc/fanc/commands/fanc_commands.py

Lines 3533 to 3548 in d5d8608

    
    elif isinstance(matrix1, RegionBased) and isinstance(matrix2, RegionBased):
        ComparisonRegions = None
        if comparison == 'fold-change' or comparison == 'fc':
            ComparisonRegions = FoldChangeRegions
        elif comparison == 'difference' or comparison == 'diff':
            ComparisonRegions = DifferenceRegions
        else:
            parser.error("Comparison type -c {} not recognised!".format(comparison))
        if output_format is None:
            comparison_output = output_file
        else:
            output_format = output_format.lower()
            comparison_output = None
        cmp = ComparisonRegions.from_regions(matrix1, matrix2, file_name=comparison_output,
                                             tmpdir=tmp, mode='w', log=log)

So, with --comparison difference, the call would be:

DifferenceRegions.from_regions(
  matrix1, matrix2, 
  file_name=comparison_output, 
  tmpdir=tmp, 
  mode='w', 
  log=log
)

All fanc compare does is calculate either the difference or the fold-change of the values in the two BED files for each region. There are no statistics involved.

fanc/fanc/architecture/comparisons.py

Lines 532 to 540 in d5d8608

    
    class DifferenceRegions(ComparisonRegions):
        _classid = 'DIFFERENCEREGIONS'

        def __init__(self, *args, **kwargs):
            ComparisonRegions.__init__(self, *args, **kwargs)

        def compare(self, score1, score2):
            return score1 - score2
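
To make the discrepancy concrete, here is a minimal plain-Python sketch (it does not use FANC itself) of the same per-region scores compared both ways. The sample values are made up, and fold_change is written as a simple ratio, which is an assumption for illustration; only the subtraction mirrors the DifferenceRegions.compare method quoted above.

```python
# Hypothetical per-region insulation scores for two samples (made-up values).
sample1 = [0.5, -0.25, 1.0]
sample2 = [0.25, -0.5, 0.5]


def difference(score1, score2):
    # Same operation as DifferenceRegions.compare in the excerpt above.
    return score1 - score2


def fold_change(score1, score2):
    # Assumption: fold-change as a plain ratio. FANC's actual fold-change
    # class may treat zeros or log-transformed scores differently.
    return score1 / score2


# One output value per region, with no test statistic or p-value involved.
diffs = [difference(a, b) for a, b in zip(sample1, sample2)]
ratios = [fold_change(a, b) for a, b in zip(sample1, sample2)]

for region, (d, r) in enumerate(zip(diffs, ratios)):
    print(f"region {region}: difference={d}, fold-change={r}")
```

The two lists disagree for every region even though the inputs are identical, so a BED file written in fold-change mode is not expected to match one produced by DifferenceRegions.from_regions.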
