Commit 3e8327c

Edit normalization documentation
1 parent f10780d commit 3e8327c

1 file changed: +2 -0 lines changed

release/version-1_0/panoply_normalize_ms_data/panoply_normalize_ms_data.md

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,8 @@ Normalization methods available are:
* Median-MAD normalization (`median`): median-centering followed by median absolute deviation (MAD) scaling
* Two-component mixture model-based normalization (`2comp`): This method assumes that every sample contains a set of unregulated proteins or PTM sites whose log ratios should be centered at zero after normalization, alongside proteins or PTM sites that are up- or downregulated. The scheme attempts to identify the unregulated proteins or PTM sites and centers the distribution of their log ratios around zero in order to nullify the effect of differential protein loading and/or systematic MS variation. A two-component Gaussian mixture model is used to achieve this. The two Gaussians N (&mu;<sub>i1</sub>, &sigma;<sub>i1</sub>) and N (&mu;<sub>i2</sub>, &sigma;<sub>i2</sub>) for a sample *i* are fitted and used in the normalization process as follows: the mode *m*<sub>i</sub> of the log-ratio distribution is determined for each sample using kernel density estimation with a Gaussian kernel and Sheather-Jones bandwidth. A two-component Gaussian mixture model is then fit with the means of both Gaussians constrained to be *m*<sub>i</sub>, i.e., &mu;<sub>i1</sub> = &mu;<sub>i2</sub> = *m*<sub>i</sub>. The Gaussian with the smaller estimated standard deviation &sigma;<sub>i</sub> = min (&sigma;<sub>i1</sub>, &sigma;<sub>i2</sub>) is assumed to represent the unregulated component of proteins/PTM sites, and is used to normalize the sample by subtracting the mean *m*<sub>i</sub> from the log ratio of each protein/PTM site and dividing by the standard deviation &sigma;<sub>i</sub>. See (Mertins et al., 2016) and (Gillette et al., 2020).
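
The `2comp` steps described above (mode by kernel density estimation, a two-Gaussian mixture with both means pinned to that mode, then scaling by the smaller standard deviation) can be sketched in plain Python. This is an illustrative sketch only, not the PANOPLY implementation: the function names `kde_mode`, `fit_shared_mean_mixture`, and `normalize_2comp` are hypothetical, Silverman's rule-of-thumb bandwidth is swapped in for the bandwidth selector named in the text, the mixture is fitted with a simple EM loop, and the input is assumed to be one sample's log ratios with no missing values.

```python
import math
import random
import statistics


def kde_mode(x, grid_size=256):
    """Mode of a Gaussian-kernel density estimate of x, found on a grid.

    Assumption: Silverman's rule-of-thumb bandwidth is used here as a
    stand-in for the bandwidth selector named in the documentation.
    """
    n = len(x)
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if IQR is 0
    h = 0.9 * spread * n ** (-0.2)
    lo, hi = min(x), max(x)
    best_g, best_d = lo, -1.0
    for k in range(grid_size):
        g = lo + (hi - lo) * k / (grid_size - 1)
        # Unnormalized density at g; constants do not affect the argmax.
        d = sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in x)
        if d > best_d:
            best_g, best_d = g, d
    return best_g


def fit_shared_mean_mixture(x, m, n_iter=100):
    """EM for a two-Gaussian mixture whose means are both fixed at m.

    Only the mixing weights and the two standard deviations are estimated.
    Returns (weights, sigmas).
    """
    d2 = [(v - m) ** 2 for v in x]
    n = len(d2)
    s = math.sqrt(sum(d2) / n)
    sig = [0.5 * s, 2.0 * s]  # start with one narrow and one wide component
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each data point.
        resp = []
        for q in d2:
            p0 = w[0] / sig[0] * math.exp(-0.5 * q / sig[0] ** 2)
            p1 = w[1] / sig[1] * math.exp(-0.5 * q / sig[1] ** 2)
            resp.append(p0 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        # M-step: update weights and standard deviations (means stay at m).
        n0 = max(sum(resp), 1e-12)
        n1 = max(n - n0, 1e-12)
        var0 = sum(r * q for r, q in zip(resp, d2)) / n0
        var1 = sum((1 - r) * q for r, q in zip(resp, d2)) / n1
        sig = [max(math.sqrt(var0), 1e-12), max(math.sqrt(var1), 1e-12)]
        w = [n0 / n, n1 / n]
    return w, sig


def normalize_2comp(x):
    """Center on the KDE mode and scale by the narrower component's sigma."""
    m = kde_mode(x)
    _, sig = fit_shared_mean_mixture(x, m)
    sigma = min(sig)
    return [(v - m) / sigma for v in x], m, sigma


if __name__ == "__main__":
    # Simulated sample: many unregulated sites plus a wider regulated set.
    random.seed(1)
    sample = [random.gauss(0.3, 0.2) for _ in range(500)]
    sample += [random.gauss(0.3, 1.0) for _ in range(100)]
    normalized, mode, sigma = normalize_2comp(sample)
    print(f"mode={mode:.3f} sigma={sigma:.3f}")
```

Fixing both means at the mode leaves only the two standard deviations and the mixing weights to estimate, which is why the EM update above never touches the means.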

+**CAVEAT:** The two-component mixture model-based normalization (`2comp`) method has been tuned for log-transformed ratio (to a common reference) data, and hence cannot be used with intensity-based (e.g., label-free) data.
+
Normalization is followed by filtering:

* For Spectrum Mill-based tables, exclude rows (proteins/PTM sites) with numratio less than `minNumratio` in more than a fraction `minNumratioFraction` of samples
