* Two-component mixture model-based normalization (`2comp`): This method assumes that every sample contains a set of unregulated proteins or PTM sites whose log ratios should be centered at zero after normalization, alongside proteins or PTM sites that are either up- or downregulated. The normalization scheme attempts to identify the unregulated proteins or PTM sites and centers the distribution of their log ratios around zero in order to nullify the effect of differential protein loading and/or systematic MS variation. A two-component Gaussian mixture model is used to achieve this. The two Gaussians N(μ<sub>i1</sub>, σ<sub>i1</sub>) and N(μ<sub>i2</sub>, σ<sub>i2</sub>) for a sample *i* are fitted and used in the normalization process as follows: the mode *m*<sub>i</sub> of the log-ratio distribution is determined for each sample using kernel density estimation with a Gaussian kernel and Sheather-Jones bandwidth. A two-component Gaussian mixture model is then fitted with the means of both Gaussians constrained to be *m*<sub>i</sub>, i.e., μ<sub>i1</sub> = μ<sub>i2</sub> = *m*<sub>i</sub>. The Gaussian with the smaller estimated standard deviation, σ<sub>i</sub> = min(σ<sub>i1</sub>, σ<sub>i2</sub>), is assumed to represent the unregulated component of proteins/PTM sites, and is used to normalize the sample by subtracting the mode *m*<sub>i</sub> from each protein/PTM site log ratio and dividing by the standard deviation σ<sub>i</sub>. See (Mertins et al., 2016) and (Gillette et al., 2020).
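
The steps described for `2comp` can be illustrated for a single sample with a minimal Python sketch. This is not the pipeline's actual implementation: the function names (`kde_mode`, `fit_shared_mean_mixture`, `normalize_2comp`) are hypothetical, the mixture is fitted with a plain EM loop, and Silverman's rule-of-thumb bandwidth is used as a simple stand-in for the Sheather-Jones bandwidth named above.

```python
import math
import statistics


def kde_mode(values, grid_size=512):
    """Mode of a Gaussian-kernel density estimate, searched on a regular grid.

    Assumption: Silverman's rule-of-thumb bandwidth replaces the
    Sheather-Jones selector named in the text.
    """
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    h = 0.9 * spread * n ** -0.2
    if h <= 0:
        raise ValueError("log ratios must not all be identical")
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]

    def density(x):
        # Unnormalized density; constant factors do not move the argmax.
        return sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return max(grid, key=density)


def fit_shared_mean_mixture(values, mean, n_iter=200, tol=1e-8):
    """EM for two Gaussians whose means are both fixed at `mean`.

    Only the standard deviations and the mixing weight are estimated.
    Returns (sigma_1, sigma_2, weight_of_component_1).
    """
    n = len(values)
    sq = [(v - mean) ** 2 for v in values]
    base = math.sqrt(sum(sq) / n)
    s1, s2, w = 0.5 * base, 2.0 * base, 0.5  # one narrow, one wide start
    for _ in range(n_iter):
        # E-step: responsibility of component 1, computed in log space.
        resp = []
        for d in sq:
            l1 = math.log(w) - math.log(s1) - 0.5 * d / s1 ** 2
            l2 = math.log(1 - w) - math.log(s2) - 0.5 * d / s2 ** 2
            diff = max(-700.0, min(700.0, l2 - l1))
            resp.append(1.0 / (1.0 + math.exp(diff)))
        # M-step: weighted variances around the fixed mean, then the weight.
        r_sum = min(max(sum(resp), 1e-9), n - 1e-9)
        new_s1 = math.sqrt(sum(r * d for r, d in zip(resp, sq)) / r_sum)
        new_s2 = math.sqrt(sum((1 - r) * d for r, d in zip(resp, sq)) / (n - r_sum))
        new_s1, new_s2 = max(new_s1, 1e-12), max(new_s2, 1e-12)
        w = min(max(r_sum / n, 1e-6), 1 - 1e-6)
        converged = abs(new_s1 - s1) + abs(new_s2 - s2) < tol
        s1, s2 = new_s1, new_s2
        if converged:
            break
    return s1, s2, w


def normalize_2comp(values):
    """Center one sample's log ratios at the mode and scale by the smaller sigma."""
    m = kde_mode(values)
    s1, s2, _ = fit_shared_mean_mixture(values, m)
    sigma = min(s1, s2)  # narrower Gaussian = assumed unregulated component
    return [(v - m) / sigma for v in values], m, sigma
```

Calling `normalize_2comp(log_ratios)` on one sample's non-missing log ratios returns the normalized values together with the estimated mode and the standard deviation of the narrower component; in a full data matrix the function would be applied to each sample column independently.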