Edit normalization documentation

drmani · drmani · commit 3e8327cc2ffb · 2021-03-23T13:06:13.000-04:00
diff --git a/release/version-1_0/panoply_normalize_ms_data/panoply_normalize_ms_data.md b/release/version-1_0/panoply_normalize_ms_data/panoply_normalize_ms_data.md
@@ -10,6 +10,8 @@ Normalization methods available are:
 * Median-MAD normalization (`median`): median-centering followed by median absolute deviation (MAD) scaling
 * Two-component mixture model-based normalization (`2comp`): In this method, we assume that for every sample there is a set of unregulated proteins or PTM sites. In the normalized sample, these proteins or PTM sites should have a log ratio centered at zero. In addition, there are proteins or PTM sites that are either up- or downregulated. This normalization scheme attempts to identify the unregulated proteins or PTM sites, and centers the distribution of these log-ratios around zero in order to nullify the effect of differential protein loading and/or systematic MS variation. A two-component Gaussian mixture model is used to achieve this effect. The two Gaussians N(&mu;<sub>i1</sub>, &sigma;<sub>i1</sub>) and N(&mu;<sub>i2</sub>, &sigma;<sub>i2</sub>) for a sample *i* are fitted and used in the normalization process as follows: the mode *m*<sub>i</sub> of the log-ratio distribution is determined for each sample using kernel density estimation with a Gaussian kernel and Sheather-Jones bandwidth. A two-component Gaussian mixture model is then fit with the mean of both Gaussians constrained to be *m*<sub>i</sub>, i.e., &mu;<sub>i1</sub> = &mu;<sub>i2</sub> = *m*<sub>i</sub>. The Gaussian with the smaller estimated standard deviation &sigma;<sub>i</sub> = min(&sigma;<sub>i1</sub>, &sigma;<sub>i2</sub>) is assumed to represent the unregulated component of proteins/PTM sites, and is used to normalize the sample by subtracting the mode *m*<sub>i</sub> from the log-ratio of each protein/PTM site and dividing by the standard deviation &sigma;<sub>i</sub>. See (Mertins et al., 2016) and (Gillette et al., 2020).
 
+**CAVEAT:** The two-component mixture model-based normalization (`2comp`) method has been tuned for log-transformed ratios relative to a common reference, and hence cannot be used with intensity-based (e.g., label-free) data.
+
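The `2comp` steps described above can be sketched for a single sample as follows. This is a minimal illustration in Python using only the standard library, not the module's actual implementation. The function names (`kde_mode`, `fit_common_mean_mixture`, `normalize_2comp`) are hypothetical, the input is assumed to be a list of log-ratios with no missing values and non-zero spread, and Silverman's rule-of-thumb bandwidth is swapped in for the Sheather-Jones bandwidth to keep the sketch short.

```python
import math
import random
import statistics


def kde_mode(x, grid_size=256):
    """Location of the peak of a Gaussian kernel density estimate of x."""
    n = len(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(statistics.stdev(x), (q3 - q1) / 1.34)
    # Assumption: Silverman's rule of thumb stands in for Sheather-Jones.
    h = 0.9 * spread * n ** -0.2
    lo, hi = min(x), max(x)
    best_t, best_d = lo, -1.0
    for j in range(grid_size):
        t = lo + (hi - lo) * j / (grid_size - 1)
        # Unnormalized density at t; constants do not affect the argmax.
        d = sum(math.exp(-0.5 * ((t - v) / h) ** 2) for v in x)
        if d > best_d:
            best_t, best_d = t, d
    return best_t


def fit_common_mean_mixture(x, m, n_iter=200, tol=1e-6):
    """EM for two Gaussians that share the fixed mean m.

    Returns (sigma1, sigma2, weight1).
    """
    d2 = [(v - m) ** 2 for v in x]
    n = len(d2)
    base = math.sqrt(sum(d2) / n)
    s1, s2, w = 0.5 * base, 2.0 * base, 0.5  # one narrow, one wide start
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        r = []
        for q in d2:
            p1 = w / s1 * math.exp(-0.5 * q / s1 ** 2)
            p2 = (1.0 - w) / s2 * math.exp(-0.5 * q / s2 ** 2)
            r.append(p1 / (p1 + p2) if p1 + p2 > 0.0 else 0.5)
        # M-step: only the scales and the weight are updated; the mean stays m.
        r_sum = min(max(sum(r), 1e-9), n - 1e-9)
        new_s1 = math.sqrt(sum(ri * q for ri, q in zip(r, d2)) / r_sum)
        new_s2 = math.sqrt(
            sum((1.0 - ri) * q for ri, q in zip(r, d2)) / (n - r_sum)
        )
        converged = abs(new_s1 - s1) + abs(new_s2 - s2) < tol
        s1, s2, w = max(new_s1, 1e-12), max(new_s2, 1e-12), r_sum / n
        if converged:
            break
    return s1, s2, w


def normalize_2comp(x):
    """Center on the KDE mode and scale by the narrower component's sigma."""
    m = kde_mode(x)
    s1, s2, _ = fit_common_mean_mixture(x, m)
    sigma = min(s1, s2)  # narrower Gaussian = assumed unregulated component
    return [(v - m) / sigma for v in x], m, sigma


if __name__ == "__main__":
    # Synthetic sample: a narrow "unregulated" bulk plus a wide "regulated" tail.
    random.seed(0)
    sample = [random.gauss(0.5, 0.2) for _ in range(1600)]
    sample += [random.gauss(0.5, 1.0) for _ in range(400)]
    normalized, mode, sigma = normalize_2comp(sample)
    print(f"mode={mode:.3f} sigma={sigma:.3f}")
```

Because both component means are pinned to the mode, the EM updates only touch the two standard deviations and the mixing weight, which is what allows the narrower Gaussian to be read off as the unregulated component.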
 Normalization is followed by filtering:
 
 * For Spectrum Mill based tables, exclude rows (proteins/PTM sites) with numratio less than `minNumratio` in more than `minNumratioFraction` samples