move airPLS from google code to github

zmzhang · Sep 13, 2016 · e21fb0f
commit e21fb0f
Showing 18 changed files with 584 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,66 @@
+# 1. Introduction #
+
+*The adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm requires neither user intervention nor prior information such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signal, and it obtains those weights adaptively from the difference between the previously fitted baseline and the original signal. This baseline estimator is fast and flexible.*
+
+
+# 2. Installation #
+
+## 2.1 MATLAB version ##
+
+- Install MATLAB 6.5 or higher on your computer.
+- Download and unzip the package from this url.
+
+## 2.2 R version ##
+
+By taking advantage of the sparse matrices in the R package "Matrix", we implemented sparse versions of the Whittaker smoother and the airPLS algorithm. airPLS 2.0 is now at least 100 times faster than airPLS 1.0.
+
+- First, download and install R 2.12.2 from the following URLs:
+
+	for Linux: http://cran.r-project.org/src/base/R-2/R-2.12.2.tar.gz
+
+	for Windows: http://cran.r-project.org/bin/windows/base/old/2.12.2/R-2.12.2-win.exe
+
+- Then, download the airPLS package from this project's download page.
+
+	for Linux: 
+
+	for Windows: 
+
+## 2.3 Python version ##
+
+The Python version of airPLS, built on SciPy, was written by Renato Lombardo of the University of Palermo.
+
+
+
+- Install Python
+	Python 2.7 is recommended
+	https://www.python.org/ftp/python/2.7.10/python-2.7.10.msi
+
+
+- Install NumPy, SciPy and Matplotlib with the following commands:
+
+	```shell
+	pip install numpy
+	pip install scipy
+	pip install matplotlib
+	```
+- Clone this project and run airPLS.py
+
+## 2.4 C++ version ##
+
+Tuning the parameters is a known difficulty in the R and MATLAB versions of airPLS. We have therefore rewritten the airPLS algorithm in C++ with MFC (Visual Studio 2010) to provide a better user interface for baseline correction. The lambda parameter can be tuned easily by dragging a slider.
+
+It can be downloaded from url
+
+
+# 3. Contact #
+
+For any questions, please contact:
+
+	Zhi-Min Zhang: [email protected]
+
+# 4. How to cite #
+
+Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).
+
+[Download pdf and endnote citation here](http://pubs.rsc.org/is/content/articlelanding/2010/an/b922045c)
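
The reweighting scheme summarized in the README introduction can be written out as follows. This is read from the code in this commit rather than quoted from the paper, and the symbols are chosen here for illustration: $x$ is the measured signal, $z$ the fitted baseline, $D$ the difference matrix of order $k$, $W = \mathrm{diag}(w)$ the weight matrix, and $t$ the iteration counter.

```latex
\begin{align}
% Penalized least squares objective minimized at each iteration
Q &= \sum_i w_i \,(x_i - z_i)^2 + \lambda \sum_i \left(\Delta^k z_i\right)^2 \\
% Setting the gradient with respect to z to zero gives the sparse linear system
(W + \lambda D^\top D)\, z &= W x \\
% Weight update at iteration t, with d = x - z
w_i &=
\begin{cases}
0, & d_i \ge 0 \\
\exp\!\left( \dfrac{t\,|d_i|}{\left|\sum_{j:\,d_j<0} d_j\right|} \right), & d_i < 0
\end{cases} \\
% Stopping rule used in the code (or the iteration limit is reached)
\Bigl|\sum_{j:\,d_j<0} d_j\Bigr| &< 0.001 \sum_i |x_i|
\end{align}
```

Points lying above the current baseline are treated as peak points and ignored in the next fit, while points below it are weighted more heavily as the iterations proceed.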
diff --git a/airPLS.m b/airPLS.m
@@ -0,0 +1,65 @@
+function [Xc,Z]= airPLS(X,lambda,order,wep,p,itermax)
+%  Baseline correction using adaptive iteratively reweighted Penalized Least Squares
+%  Input 
+%         X: matrix of spectra or chromatograms, one sample per row (size m*n, m samples and n variables)
+%         lambda: smoothing parameter set by the user. The larger lambda is, the smoother z will be 
+%         order: an integer indicating the order of the difference of penalties
+%         wep: weight exception proportion at both the start and end
+%         p: asymmetry parameter for the start and end
+%         itermax: maximum number of iterations
+%  Output
+%         Xc: the corrected spectra or chromatograms (size m*n)
+%         Z: the fitted baselines (size m*n)
+%  Examples:
+%         Xc=airPLS(X);
+%         [Xc,Z]=airPLS(X,10e5,2,0.1,0.5,20);
+%  Reference:
+%         (1) Eilers, P. H. C., A perfect smoother. Analytical Chemistry 75 (14), 3631 (2003).
+%         (2) Eilers, P. H. C., Baseline Correction with Asymmetric Least
+%         Squares Smoothing, http://www.science.uva.nl/~hboelens/publications/draftpub/Eilers_2005.pdf
+%         (3) Gan, Feng, Ruan, Guihua, and Mo, Jinyuan, Baseline correction by improved iterative polynomial fitting with automatic threshold. Chemometrics and Intelligent Laboratory Systems 82 (1-2), 59 (2006).
+% 
+%  zhimin zhang @ central south university on Mar 30,2011
+
+if nargin < 6
+    itermax=20;
+  if nargin < 5
+     p=0.05;
+    if nargin < 4
+       wep=0.1;
+      if nargin < 3
+          order=2;
+          if nargin < 2
+               lambda=10e7;
+              if nargin < 1
+                   error('airPLS:NotEnoughInputs','Not enough input arguments. See airPLS.');
+              end    
+          end  
+      end  
+    end
+  end
+end
+
+[m,n]=size(X);
+wi = [1:ceil(n*wep) floor(n-n*wep):n];
+D = diff(speye(n), order);
+DD = lambda*D'*D;
+for i=1:m
+    w=ones(n,1);
+    x=X(i,:);
+    for j=1:itermax
+        W=spdiags(w, 0, n, n);
+        C = chol(W + DD);
+        z = (C\(C'\(w .* x')))';
+        d = x-z;
+        dssn= abs(sum(d(d<0)));
+        if(dssn<0.001*sum(abs(x))) 
+            break;
+        end
+        w(d>=0) = 0;
+        w(wi)   = p;
+        w(d<0)  = exp(j*abs(d(d<0))/dssn);
+    end
+    Z(i,:)=z;
+end
+Xc=X-Z;
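
The MATLAB routine above treats the ends of the signal differently from the Python and R ports: in every iteration it first sets the weights of the leading and trailing `wep` fraction of points to `p`, and then overwrites the weights of points below the baseline with the exponential term. The following is a rough Python 3 sketch of that per-row loop, assuming NumPy and SciPy are available. The name `airpls_row` is hypothetical and not part of this commit, the dense `np.diff(np.eye(n))` is used only for brevity, and `spsolve` stands in for the Cholesky solve.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags
from scipy.sparse.linalg import spsolve


def airpls_row(x, lam=1e8, order=2, wep=0.1, p=0.05, itermax=20):
    """Fit the baseline of one spectrum, mirroring the inner loop of airPLS.m.

    Hypothetical port for illustration; defaults copy those in airPLS.m
    (lambda=10e7, order=2, wep=0.1, p=0.05, itermax=20).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # 0-based translation of MATLAB's wi = [1:ceil(n*wep) floor(n-n*wep):n]
    head = np.arange(0, int(np.ceil(n * wep)))
    tail = np.arange(int(np.floor(n - n * wep)) - 1, n)
    edges = np.concatenate([head, tail])
    # Difference matrix of the requested order (dense construction for brevity)
    D = csc_matrix(np.diff(np.eye(n), n=order, axis=0))
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for j in range(1, itermax + 1):
        W = diags(w, 0, shape=(n, n), format="csc")
        z = spsolve((W + penalty).tocsc(), w * x)
        d = x - z
        dssn = abs(d[d < 0].sum())
        if dssn < 0.001 * np.abs(x).sum():
            break
        w[d >= 0] = 0.0   # points above the baseline are treated as peaks
        w[edges] = p      # small fixed weight at both ends
        w[d < 0] = np.exp(j * np.abs(d[d < 0]) / dssn)
    return z


# Usage: a Gaussian peak on a linear baseline
t = np.arange(1000)
baseline = 0.001 * t + 0.5
y = baseline + np.exp(-0.5 * ((t - 500) / 10.0) ** 2)
corrected = y - airpls_row(y)
```

Keeping a small nonzero weight at both ends means the linear system stays solvable even when every interior point has been classified as a peak.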
diff --git a/airPLS.py b/airPLS.py
@@ -0,0 +1,117 @@
+#!/usr/bin/python
+'''
+airPLS.py Copyright 2014 Renato Lombardo - [email protected]
+Baseline correction using adaptive iteratively reweighted penalized least squares
+
+This program is a translation into Python of the R source code of airPLS version 2.0
+by Yizeng Liang and Zhang Zhimin - https://code.google.com/p/airpls
+Reference:
+Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).
+
+Description from the original documentation:
+
+Baseline drift always blurs or even swamps signals and deteriorates analytical results, particularly in multivariate analysis.  It is necessary to correct baseline drift to perform further data analysis. Simple or modified polynomial fitting has been found to be effective to some extent. However, this method requires user intervention and is prone to variability, especially in low signal-to-noise ratio environments. The proposed adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm doesn't require any user intervention or prior information, such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signals, and the weights of the SSE are obtained adaptively using the difference between the previously fitted baseline and the original signals. This baseline estimator is general, fast and flexible in fitting baselines.
+
+
+LICENCE
+This program is free software: you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public License
+along with this program.  If not, see <http://www.gnu.org/licenses/>
+'''
+
+import numpy as np
+from scipy.sparse import csc_matrix, eye, diags
+from scipy.sparse.linalg import spsolve
+
+def WhittakerSmooth(x,w,lambda_,differences=1):
+    '''
+    Penalized least squares algorithm for background fitting
+    
+    input
+        x: input data (e.g. a chromatogram or spectrum)
+        w: binary masks (value of the mask is zero if a point belongs to peaks and one otherwise)
+        lambda_: smoothing parameter set by the user. The larger lambda_ is, the smoother the resulting background
+        differences: integer indicating the order of the difference of penalties
+    
+    output
+        the fitted background vector
+    '''
+    X=np.matrix(x)
+    m=X.size
+    E=eye(m,format='csc')
+    D=E
+    for _ in range(differences): D=D[1:]-D[:-1] # numpy.diff() does not work with sparse matrices, so take first differences repeatedly to reach the requested order.
+    W=diags(w,0,shape=(m,m))
+    A=csc_matrix(W+(lambda_*D.T*D))
+    B=csc_matrix(W*X.T)
+    background=spsolve(A,B)
+    return np.array(background)
+
+def airPLS(x, lambda_=100, porder=1, itermax=15):
+    '''
+    Adaptive iteratively reweighted penalized least squares for baseline fitting
+    
+    input
+        x: input data (e.g. a chromatogram or spectrum)
+        lambda_: smoothing parameter set by the user. The larger lambda_ is, the smoother the resulting background, z
+        porder: integer indicating the order of the difference of penalties
+    
+    output
+        the fitted background vector
+    '''
+    m=x.shape[0]
+    w=np.ones(m)
+    for i in range(1,itermax+1):
+        z=WhittakerSmooth(x,w,lambda_, porder)
+        d=x-z
+        dssn=np.abs(d[d<0].sum())
+        if(dssn<0.001*(abs(x)).sum() or i==itermax):
+            if(i==itermax): print('WARNING: max iteration reached!')
+            break
+        w[d>=0]=0 # d>0 means that this point is part of a peak, so its weight is set to 0 in order to ignore it
+        w[d<0]=np.exp(i*np.abs(d[d<0])/dssn)
+        w[0]=np.exp(i*(d[d<0]).max()/dssn) 
+        w[-1]=w[0]
+    return z
+
+if __name__=='__main__':
+    '''
+    Example usage and testing
+    '''
+    print('Testing...')
+    from scipy.stats import norm
+    import matplotlib.pyplot as pl
+    x=np.arange(0,1000,1)
+    g1=norm(loc = 100, scale = 1.0) # generate three Gaussians as the signal
+    g2=norm(loc = 300, scale = 3.0)
+    g3=norm(loc = 750, scale = 5.0)
+    signal=g1.pdf(x)+g2.pdf(x)+g3.pdf(x)
+    baseline1=5e-4*x+0.2 # linear baseline
+    baseline2=0.2*np.sin(np.pi*x/x.max()) # sinusoidal baseline
+    noise=np.random.random(x.shape[0])/500
+    print('Generating simulated experiment')
+    y1=signal+baseline1+noise
+    y2=signal+baseline2+noise
+    print('Removing baselines')
+    c1=y1-airPLS(y1) # corrected values
+    c2=y2-airPLS(y2) # with baseline removed
+    print('Plotting results')
+    fig,ax=pl.subplots(nrows=2,ncols=1)
+    ax[0].plot(x,y1,'-k')
+    ax[0].plot(x,c1,'-r')
+    ax[0].set_title('Linear baseline')
+    ax[1].plot(x,y2,'-k')
+    ax[1].plot(x,c2,'-r')
+    ax[1].set_title('Sinusoidal baseline')
+    pl.show()
+    print('Done!')
+
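
The `numpy.diff()` workaround in `WhittakerSmooth` takes first differences of a sparse identity matrix. Applying that same step repeatedly yields difference matrices of higher order, which is what the R version obtains from `diff(E,1,differences)`. A minimal Python 3 sketch, assuming SciPy is installed; `sparse_difference_matrix` is a hypothetical helper and not part of this commit:

```python
import numpy as np
from scipy.sparse import identity


def sparse_difference_matrix(m, differences=1):
    """Return the sparse (m - differences) x m matrix of order-`differences` differences."""
    D = identity(m, format="csr")
    for _ in range(differences):
        # One pass of row-wise first differences, as in the workaround above
        D = D[1:] - D[:-1]
    return D.tocsc()


# Usage: second differences of a quadratic sequence are constant
D2 = sparse_difference_matrix(6, differences=2)
second_diff = D2 @ (np.arange(6.0) ** 2)
```

The result agrees with `np.diff(np.eye(m), n=differences, axis=0)` while staying sparse, which matters for long spectra.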
diff --git a/airPLS_R/DESCRIPTION b/airPLS_R/DESCRIPTION
@@ -0,0 +1,12 @@
+Package: airPLS
+Type: Package
+Title: adaptive iteratively reweighted Penalized Least Squares for baseline correction
+Version: 3.0.0
+Date: 2014-10-21
+Depends: R (>= 3.0.0), Matrix
+Suggests:
+Author: Yizeng Liang, Zhimin Zhang, Shan Chen
+Maintainer: Zhimin Zhang <[email protected]>
+Description: adaptive iteratively reweighted Penalized Least Squares for baseline correction
+License: LGPL version 2 or newer
+Packaged: Wen Oct 21 20:40:21 2014;
diff --git a/airPLS_R/NAMESPACE b/airPLS_R/NAMESPACE
@@ -0,0 +1,6 @@
+importFrom("Matrix", spMatrix, diff, t, solve)
+
+export(
+       "airPLS",
+	   "WhittakerSmooth"
+       )
diff --git a/airPLS_R/R/airPLS.R b/airPLS_R/R/airPLS.R
@@ -0,0 +1,33 @@
+WhittakerSmooth <- function(x,w,lambda,differences=1) {
+  x=matrix(x,nrow = 1, ncol=length(x))
+  L=length(x)
+  E=spMatrix(L,L,i=seq(1,L),j=seq(1,L),rep(1,L))
+  D=as(diff(E,1,differences),"dgCMatrix")
+  W=as(spMatrix(L,L,i=seq(1,L),j=seq(1,L),w),"dgCMatrix")
+  background=solve((W+lambda*t(D)%*%D),t((w*x)));
+  return(as.vector(background))
+ }
+
+airPLS <- function(x,lambda=10,differences=1, itermax=20){
+
+  x = as.vector(x)
+  m = length(x)
+  w = rep(1,m)
+  control = 1
+  i = 1
+  while(control==1){
+     z = WhittakerSmooth(x,w,lambda,differences)
+     d = x-z
+     sum_smaller = abs(sum(d[d<0])) 
+     if(sum_smaller<0.001*sum(abs(x))||i==itermax)
+     {
+      control = 0
+     }
+     w[d>=0] = 0
+     w[d<0] = exp(i*abs(d[d<0])/sum_smaller)
+     w[1] = exp(i*max(d[d<0])/sum_smaller)
+     w[m] = exp(i*max(d[d<0])/sum_smaller)
+     i=i+1
+  }
+  return(z) 
+}
diff --git a/airPLS_R/data/chromatogram.rda b/airPLS_R/data/chromatogram.rda
diff --git a/airPLS_R/data/nmr.rda b/airPLS_R/data/nmr.rda
diff --git a/airPLS_R/data/raman.rda b/airPLS_R/data/raman.rda
diff --git a/airPLS_R/man/WhittakerSmooth.Rd b/airPLS_R/man/WhittakerSmooth.Rd
@@ -0,0 +1,27 @@
+\name{WhittakerSmooth}
+\alias{WhittakerSmooth}
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{Whittaker Smoother}
+\description{
+  penalized least squares algorithm for background fitting
+}
+\usage{
+WhittakerSmooth(x,w,lambda,differences=1) 
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{x}{ Raman spectrum }
+  \item{w}{ binary masks (value of the mask is zero if a point belongs to peaks and one otherwise) }
+  \item{lambda}{ smoothing parameter set by the user. The larger lambda is, the smoother z will be }
+  \item{differences}{ an integer indicating the order of the difference of penalties}
+}
+
+
+\value{
+  the fitted vector
+}
+
+
+\author{Yizeng Liang, Zhang Zhimin}
+
+\keyword{WhittakerSmooth}
diff --git a/airPLS_R/man/airPLS-function.Rd b/airPLS_R/man/airPLS-function.Rd
@@ -0,0 +1,28 @@
+\name{airPLS-function}
+\alias{airPLS-function}
+%- Also NEED an '\alias' for EACH other topic documented here.
+\title{adaptive iteratively reweighted penalized least squares}
+\description{
+ adaptive iteratively reweighted penalized least squares for baseline fitting
+}
+\usage{
+airPLS(x,lambda=10,differences=1, itermax=20)
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+  \item{x}{ spectrum }
+  \item{lambda}{ smoothing parameter set by the user. The larger lambda is, the smoother z will be }
+  \item{differences}{ an integer indicating the order of the difference of penalties}
+}
+
+
+\value{
+  the fitted vector
+}
+
+
+\author{Yizeng Liang, Zhang Zhimin}
+
+\seealso{\code{\link{WhittakerSmooth}}}
+
+\keyword{airPLS-function}
diff --git a/airPLS_R/man/airPLS.Rd b/airPLS_R/man/airPLS.Rd
@@ -0,0 +1,19 @@
+\name{airPLS}
+\alias{airPLS}
+\title{
+Baseline correction using adaptive iteratively reweighted penalized least squares
+}
+\description{
+Baseline drift always blurs or even swamps signals and deteriorates analytical results, particularly in multivariate analysis.  It is necessary to correct baseline drift to perform further data analysis. Simple or modified polynomial fitting has been found to be effective to some extent. However, this method requires user intervention and is prone to variability, especially in low signal-to-noise ratio environments. The proposed adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm doesn't require any user intervention or prior information, such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signals, and the weights of the SSE are obtained adaptively using the difference between the previously fitted baseline and the original signals. This baseline estimator is general, fast and flexible in fitting baselines.
+}
+\details{
+\tabular{ll}{
+Package: \tab airPLS\cr
+Type: \tab Package\cr
+Version: \tab 1.0.0\cr
+Date: \tab 2009-10-09\cr
+License: \tab GPL (>= 2)\cr
+}
+}
+\author{ yizeng liang <[email protected]>, zhimin zhang <[email protected]>, chen shan <[email protected]>}
+\keyword{ package }