Interpreting Results
The following is an explanation of each header (i.e., column) in the Bayesian protein quantification output ("BayesianFoldChangeAnalysis.tsv"). You may wish to read about how the algorithm works. Briefly, after peptide quantification and normalization steps are performed, FlashLFQ estimates each protein's fold-change from its constituent peptides' fold-changes. The mean of the peptide fold-changes, estimated using Bayesian statistics, is used as the protein's fold-change. You can choose in FlashLFQ's settings whether shared peptides are used for protein quantification.
Protein Group – the name (accession) of the protein.
Gene – the gene name(s) of the protein.
Organism – the organism the protein came from.
Control Condition – this is the condition the quantification is relative to. So for a "normal vs. tumor" analysis, the control condition would be "normal". The protein fold-change reported by FlashLFQ would be a change in protein abundance from "normal" to "tumor".
Treatment Condition – the "tumor" condition in the example above.
Log2 Fold-Change Cutoff – this is what is considered to be “noise” in terms of a protein’s quantitative change. It can either be specified by the user, or FlashLFQ can estimate a cutoff for you from the standard deviation in peptide fold-change measurements between conditions. For now, all proteins have the same noise-level cutoff.
Protein Log2 Fold-Change – the protein’s log2-transformed fold-change. A log2 fold-change of 1.0 means the protein is doubling in abundance between conditions; 2.0 means it’s quadrupling, -1.0 means it’s halving.
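As a quick illustration of the log2 scale described above, the following sketch converts between log2 fold-changes and linear treatment/control abundance ratios. The function names are hypothetical helpers written for this example; they are not part of FlashLFQ.

```python
import math


def log2_fc_to_ratio(log2_fc: float) -> float:
    """Convert a log2 fold-change to a linear treatment/control ratio."""
    return 2.0 ** log2_fc


def ratio_to_log2_fc(ratio: float) -> float:
    """Convert a linear treatment/control ratio (must be > 0) to a log2 fold-change."""
    return math.log2(ratio)


# Values from the "Protein Log2 Fold-Change" column map to ratios like this:
for log2_fc in (1.0, 2.0, -1.0):
    print(f"log2 fold-change {log2_fc:+.1f} -> ratio {log2_fc_to_ratio(log2_fc):.2f}")
```

A ratio above 1 means the protein is more abundant in the treatment condition than in the control condition; a ratio below 1 means it is less abundant.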
Standard Deviation of Peptide Log2 Fold-Changes – the standard deviation of the peptide fold-change measurements that the protein fold-change is based on.
Protein Intensity for Treatment Condition – a measurement of the intensity of the protein in the treatment condition, based on the peptides observed. FlashLFQ does not use this value to estimate the fold-change; it is an output of the protein quantification algorithm, not an input.
Number of Peptides – number of peptides observed that were from this protein, across all data files. Typically, longer and more abundant proteins have more peptides observed.
Number of Fold-Change Measurements – the protein’s fold-change is estimated from the peptide abundance measurements that were observed in the control and treatment conditions. This column shows how many comparisons could be made from the data. Sometimes peptide-level data is missing, especially if the protein is completely missing from one of the conditions; this provides fewer measurements for protein quantification.
List of Fold-Change Measurements Grouped by Peptide – The actual peptide-level fold-change data that the protein fold-change is based on.
Posterior Error Probability – this is the estimated probability that the protein’s change falls below the specified noise level. For example, 0.1 means the protein has a 10% probability of being noise.
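One way to picture this probability is as the share of the protein's plausible fold-change values that lie inside the noise band. The sketch below is only an illustration under stated assumptions, not FlashLFQ's implementation: it assumes you already have a list of draws from the posterior distribution of the protein's log2 fold-change, and it assumes "below the noise level" means the absolute log2 fold-change is smaller than the cutoff. The function name is hypothetical.

```python
def posterior_error_probability(posterior_draws, log2_cutoff):
    """Fraction of posterior draws whose absolute value is below the cutoff.

    Assumption for this sketch: posterior_draws are samples of the protein's
    log2 fold-change, and log2_cutoff is the "Log2 Fold-Change Cutoff" value.
    """
    if not posterior_draws:
        raise ValueError("at least one posterior draw is required")
    inside_noise_band = sum(1 for draw in posterior_draws if abs(draw) < log2_cutoff)
    return inside_noise_band / len(posterior_draws)


# Made-up draws for illustration only: 2 of the 10 fall inside the +/-0.5 band.
draws = [0.05, -0.2, 0.8, 1.1, 1.3, 0.9, 1.0, 1.2, 0.95, 1.05]
print(posterior_error_probability(draws, log2_cutoff=0.5))
```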
False Discovery Rate – the estimated proportion of false discoveries in the list of proteins so far. A false discovery rate (FDR) of 0.05 means that, of the proteins listed so far, 5% are estimated to fall below the noise level. An FDR of 5% is a typical "cutoff" for deciding whether a protein's abundance change is worthy of further validation, but you may be more tolerant or more strict. You may of course choose to use a posterior error probability cutoff or some other metric instead of an FDR cutoff.
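The relationship between the two columns can be sketched as follows. This example assumes a common convention, which may differ from what FlashLFQ does internally: proteins are sorted by ascending posterior error probability (PEP), and the FDR at each position is the mean PEP of all proteins listed up to and including that position. The function names and the PEP values are hypothetical.

```python
def running_fdr(peps):
    """Return (pep, fdr) pairs sorted by ascending PEP.

    Assumption for this sketch: the FDR at a given rank is the mean PEP
    of every entry at or above that rank in the sorted list.
    """
    results = []
    pep_total = 0.0
    for rank, pep in enumerate(sorted(peps), start=1):
        pep_total += pep
        results.append((pep, pep_total / rank))
    return results


def count_passing(peps, fdr_cutoff=0.05):
    """Count how many entries have a running FDR at or below the cutoff."""
    return sum(1 for _, fdr in running_fdr(peps) if fdr <= fdr_cutoff)


# Made-up PEP values for illustration only.
example_peps = [0.30, 0.01, 0.90, 0.02, 0.03]
for pep, fdr in running_fdr(example_peps):
    print(f"PEP {pep:.2f} -> FDR so far {fdr:.3f}")
print("proteins passing a 5% FDR cutoff:", count_passing(example_peps))
```

Because the list is sorted by ascending PEP, the running mean never decreases, so an FDR cutoff selects a contiguous block of proteins from the top of the list.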