move airPLS from google code to github
zmzhang committed Sep 13, 2016
0 parents commit e21fb0f
Showing 18 changed files with 584 additions and 0 deletions.
66 changes: 66 additions & 0 deletions README.md
@@ -0,0 +1,66 @@
# 1. Introduction #

*The adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm requires no user intervention and no prior information such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signal, and the weights are obtained adaptively from the difference between the previously fitted baseline and the original signal. This baseline estimator is fast and flexible in fitting baselines.*
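
The reweighting step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the packaged implementation: the helper name `update_weights` and the toy data are assumptions made for this example, and the special handling of the first and last points found in the shipped code is left out.

```python
import math

def update_weights(signal, baseline, iteration):
    """One airPLS-style reweighting step (hypothetical helper for illustration)."""
    residuals = [s - b for s, b in zip(signal, baseline)]
    # Absolute sum of the residuals that fall below the fitted baseline.
    negative_sum = abs(sum(r for r in residuals if r < 0))
    if negative_sum == 0:
        raise ValueError("no points below the baseline; the fit has converged")
    weights = []
    for r in residuals:
        if r >= 0:
            # At or above the baseline: treated as part of a peak and ignored.
            weights.append(0.0)
        else:
            # Below the baseline: weight grows with the iteration number
            # and with the relative size of the residual.
            weights.append(math.exp(iteration * abs(r) / negative_sum))
    return weights

signal = [1.0, 5.0, 0.5, 1.0]
baseline = [1.0, 1.0, 1.0, 1.5]
weights = update_weights(signal, baseline, iteration=1)
print(weights)
```

Points on or above the current baseline drop out of the next fit, while points below it pull the next baseline down towards them.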


# 2. Installation #

## 2.1 MATLAB version ##

- Install MATLAB 6.5 or higher on your computer.
- Download and unzip the package from this url.

## 2.2 R version ##

By taking advantage of the sparse matrices in the R package "Matrix", we implemented a sparse version of the Whittaker smoother and the airPLS algorithm. airPLS 2.0 is now at least 100 times faster than airPLS 1.0.
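
To see why sparse storage helps, consider the penalty matrix D'D built from a first-order difference matrix D: it is tridiagonal, so only about 3n of its n*n cells are nonzero. The sketch below is a plain-Python illustration under that assumption; the function name `penalty_nonzeros` is made up for this example, and it does not use the "Matrix" package.

```python
def penalty_nonzeros(n):
    """Return the nonzero entries of D'D for a first-order difference matrix D.

    D has shape (n-1, n); row r holds -1 at column r and +1 at column r+1.
    """
    entries = {}
    for row in range(n - 1):
        pattern = ((row, -1.0), (row + 1, 1.0))
        # Accumulate the outer product of this row of D with itself.
        for i, vi in pattern:
            for j, vj in pattern:
                entries[(i, j)] = entries.get((i, j), 0.0) + vi * vj
    return entries

n = 1000
entries = penalty_nonzeros(n)
dense_cells = n * n          # cells a dense matrix would store
sparse_cells = len(entries)  # cells a sparse matrix needs to store
print(dense_cells, sparse_cells)
```

For a signal of 1000 points, a dense representation stores a million cells while the sparse one stores fewer than three thousand, and the linear solve can exploit the same structure.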

- First, download and install R 2.12.2 from the following URLs:

for linux: http://cran.r-project.org/src/base/R-2/R-2.12.2.tar.gz

for windows: http://cran.r-project.org/bin/windows/base/old/2.12.2/R-2.12.2-win.exe

- Then download the airPLS package from this project's download page.

for linux:

for windows:

## 2.3 Python version ##

The Python version of airPLS, built on SciPy, was written by Renato Lombardo of the University of Palermo.



- Install Python. Python 2.7 is recommended:
  https://www.python.org/ftp/python/2.7.10/python-2.7.10.msi


- Install NumPy, SciPy and Matplotlib with the following commands:

```shell
pip install numpy
pip install scipy
pip install matplotlib
```
- Clone this project and run airPLS.py.

## 2.4 C++ version ##

We noticed that parameter tuning is awkward in the R and MATLAB versions of airPLS, so we rewrote the algorithm in C++ with MFC (Visual Studio 2010) to provide a better user interface for baseline correction. The lambda parameter can be tuned easily by dragging a slider.

It can be downloaded from url
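
The effect of lambda can be sketched without any GUI. The example below is a minimal, dependency-free Whittaker smoother for a first-order difference penalty, solved with the Thomas algorithm for tridiagonal systems. It is an illustration only: the names `whittaker_smooth` and `roughness` and the toy signal are assumptions, it expects at least two points and positive weights, and it is not the code used by the C++ program.

```python
def whittaker_smooth(y, weights, lam):
    """Solve (W + lam * D'D) z = W y for a first-order difference penalty."""
    n = len(y)
    # D'D is tridiagonal: 1 at both ends of the diagonal, 2 elsewhere, -1 off-diagonal.
    diag = [weights[i] + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    rhs = [weights[i] * y[i] for i in range(n)]
    off = -lam
    # Forward elimination (Thomas algorithm).
    for i in range(1, n):
        factor = off / diag[i - 1]
        diag[i] -= factor * off
        rhs[i] -= factor * rhs[i - 1]
    # Back substitution.
    z = [0.0] * n
    z[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        z[i] = (rhs[i] - off * z[i + 1]) / diag[i]
    return z

def roughness(z):
    """Sum of squared first differences; smaller means smoother."""
    return sum((b - a) ** 2 for a, b in zip(z, z[1:]))

y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
w = [1.0] * len(y)
z_small = whittaker_smooth(y, w, lam=0.1)
z_large = whittaker_smooth(y, w, lam=100.0)
rough_small = roughness(z_small)
rough_large = roughness(z_large)
print(rough_small, rough_large)
```

A small lambda lets the fit follow the zig-zag of the input, while a large lambda flattens it towards a nearly constant line, which is the trade-off the slider exposes.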


# 3. Contact #

For any questions, please contact:

Zhi-Min Zhang: [email protected]

# 4. How to cite #

Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).

[Download pdf and endnote citation here](http://pubs.rsc.org/is/content/articlelanding/2010/an/b922045c)
65 changes: 65 additions & 0 deletions airPLS.m
@@ -0,0 +1,65 @@
function [Xc,Z]= airPLS(X,lambda,order,wep,p,itermax)
% Baseline correction using adaptive iteratively reweighted Penalized Least Squares;
% Input
% X:row matrix of spectra or chromatogram (size m*n, m is sample and n is variable)
% lambda: lambda is an adjustable parameter, it can be adjusted by user. The larger lambda is, the smoother z will be
% order: an integer indicating the order of the difference of penalties
% wep: weight exception proportion at both the start and end
% p: asymmetry parameter for the start and end
% itermax: maximum iteration times
% Output
% Xc: the corrected spectra or chromatogram vector (size m*n)
% Z: the fitted vector (size m*n)
% Examples:
% Xc=airPLS(X);
% [Xc,Z]=airPLS(X,10e5,2,0.1,0.5,20);
% Reference:
% (1) Eilers, P. H. C., A perfect smoother. Analytical Chemistry 75 (14), 3631 (2003).
% (2) Eilers, P. H. C., Baseline Correction with Asymmetric Least
% Squares Smoothing, http://www.science.uva.nl/~hboelens/publications/draftpub/Eilers_2005.pdf
% (3) Gan, Feng, Ruan, Guihua, and Mo, Jinyuan, Baseline correction by improved iterative polynomial fitting with automatic threshold. Chemometrics and Intelligent Laboratory Systems 82 (1-2), 59 (2006).
%
% zhimin zhang @ central south university on Mar 30,2011

if nargin < 6
    itermax=20;
    if nargin < 5
        p=0.05;
        if nargin < 4
            wep=0.1;
            if nargin < 3
                order=2;
                if nargin < 2
                    lambda=10e7;
                    if nargin < 1
                        error('airPLS:NotEnoughInputs','Not enough input arguments. See airPLS.');
                    end
                end
            end
        end
    end
end

[m,n]=size(X);
wi = [1:ceil(n*wep) floor(n-n*wep):n];
D = diff(speye(n), order);
DD = lambda*D'*D;
for i=1:m
    w=ones(n,1);
    x=X(i,:);
    for j=1:itermax
        W=spdiags(w, 0, n, n);
        C = chol(W + DD);
        z = (C\(C'\(w .* x')))';
        d = x-z;
        dssn= abs(sum(d(d<0)));
        if(dssn<0.001*sum(abs(x)))
            break;
        end
        w(d>=0) = 0;
        w(wi) = p;
        w(d<0) = exp(j*abs(d(d<0))/dssn);
    end
    Z(i,:)=z;
end
Xc=X-Z;
117 changes: 117 additions & 0 deletions airPLS.py
@@ -0,0 +1,117 @@
#!/usr/bin/python
'''
airPLS.py Copyright 2014 Renato Lombardo - [email protected]
Baseline correction using adaptive iteratively reweighted penalized least squares
This program is a translation in python of the R source code of airPLS version 2.0
by Yizeng Liang and Zhang Zhimin - https://code.google.com/p/airpls
Reference:
Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).
Description from the original documentation:
Baseline drift always blurs or even swamps signals and deteriorates analytical results, particularly in multivariate analysis. Baseline drift must be corrected before further data analysis. Simple or modified polynomial fitting has been found to be effective to some extent. However, this method requires user intervention and is prone to variability, especially in low signal-to-noise ratio environments. The proposed adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm requires no user intervention and no prior information such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signals, and the weights are obtained adaptively from the difference between the previously fitted baseline and the original signals. This baseline estimator is general, fast and flexible in fitting baselines.
LICENCE
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>
'''

import numpy as np
from scipy.sparse import csc_matrix, eye, diags
from scipy.sparse.linalg import spsolve

def WhittakerSmooth(x,w,lambda_,differences=1):
    '''
    Penalized least squares algorithm for background fitting
    input
        x: input data (i.e. chromatogram or spectrum)
        w: binary masks (value of the mask is zero if a point belongs to peaks and one otherwise)
        lambda_: parameter that can be adjusted by user. The larger lambda is, the smoother the resulting background
        differences: integer indicating the order of the difference of penalties
    output
        the fitted background vector
    '''
    X=np.matrix(x)
    m=X.size
    E=eye(m,format='csc')
    D=E
    for _ in range(differences): # numpy.diff() does not work with sparse matrix. This is a workaround.
        D=D[1:]-D[:-1]
    W=diags(w,0,shape=(m,m))
    A=csc_matrix(W+(lambda_*D.T*D))
    B=csc_matrix(W*X.T)
    background=spsolve(A,B)
    return np.array(background)

def airPLS(x, lambda_=100, porder=1, itermax=15):
    '''
    Adaptive iteratively reweighted penalized least squares for baseline fitting
    input
        x: input data (i.e. chromatogram or spectrum)
        lambda_: parameter that can be adjusted by user. The larger lambda is, the smoother the resulting background, z
        porder: integer indicating the order of the difference of penalties
        itermax: maximum number of iterations
    output
        the fitted background vector
    '''
    m=x.shape[0]
    w=np.ones(m)
    for i in range(1,itermax+1):
        z=WhittakerSmooth(x,w,lambda_, porder)
        d=x-z
        dssn=np.abs(d[d<0].sum())
        if(dssn<0.001*(abs(x)).sum() or i==itermax):
            if(i==itermax): print('WARNING: max iteration reached!')
            break
        w[d>=0]=0 # d>=0 means that this point is part of a peak, so its weight is set to 0 in order to ignore it
        w[d<0]=np.exp(i*np.abs(d[d<0])/dssn)
        w[0]=np.exp(i*(d[d<0]).max()/dssn)
        w[-1]=w[0]
    return z

if __name__=='__main__':
    '''
    Example usage and testing
    '''
    print('Testing...')
    from scipy.stats import norm
    import matplotlib.pyplot as pl
    x=np.arange(0,1000,1)
    g1=norm(loc = 100, scale = 1.0) # generate three gaussians as a signal
    g2=norm(loc = 300, scale = 3.0)
    g3=norm(loc = 750, scale = 5.0)
    signal=g1.pdf(x)+g2.pdf(x)+g3.pdf(x)
    baseline1=5e-4*x+0.2 # linear baseline
    baseline2=0.2*np.sin(np.pi*x/x.max()) # sinusoidal baseline
    noise=np.random.random(x.shape[0])/500
    print('Generating simulated experiment')
    y1=signal+baseline1+noise
    y2=signal+baseline2+noise
    print('Removing baselines')
    c1=y1-airPLS(y1) # corrected values
    c2=y2-airPLS(y2) # with baseline removed
    print('Plotting results')
    fig,ax=pl.subplots(nrows=2,ncols=1)
    ax[0].plot(x,y1,'-k')
    ax[0].plot(x,c1,'-r')
    ax[0].set_title('Linear baseline')
    ax[1].plot(x,y2,'-k')
    ax[1].plot(x,c2,'-r')
    ax[1].set_title('Sinusoidal baseline')
    pl.show()
    print('Done!')

12 changes: 12 additions & 0 deletions airPLS_R/DESCRIPTION
@@ -0,0 +1,12 @@
Package: airPLS
Type: Package
Title: adaptive iteratively reweighted Penalized Least Squares for baseline correction
Version: 3.0.0
Date: 2014-10-21
Depends: R (>= 3.0.0), Matrix
Suggests:
Author: Yizeng Liang, Zhimin Zhang, Shan Chen
Maintainer: Zhimin Zhang <[email protected]>
Description: adaptive iteratively reweighted Penalized Least Squares for baseline correction
License: LGPL version 2 or newer
Packaged: Wen Oct 21 20:40:21 2014;
6 changes: 6 additions & 0 deletions airPLS_R/NAMESPACE
@@ -0,0 +1,6 @@
importFrom("Matrix", spMatrix, diff, t, solve)

export(
"airPLS",
"WhittakerSmooth"
)
33 changes: 33 additions & 0 deletions airPLS_R/R/airPLS.R
@@ -0,0 +1,33 @@
WhittakerSmooth <- function(x,w,lambda,differences=1) {
  x=matrix(x,nrow = 1, ncol=length(x))
  L=length(x)
  E=spMatrix(L,L,i=seq(1,L),j=seq(1,L),rep(1,L))
  D=as(diff(E,1,differences),"dgCMatrix")
  W=as(spMatrix(L,L,i=seq(1,L),j=seq(1,L),w),"dgCMatrix")
  background=solve((W+lambda*t(D)%*%D),t((w*x)))
  return(as.vector(background))
}

airPLS <- function(x,lambda=10,differences=1, itermax=20){

  x = as.vector(x)
  m = length(x)
  w = rep(1,m)
  control = 1
  i = 1
  while(control==1){
    z = WhittakerSmooth(x,w,lambda,differences)
    d = x-z
    sum_smaller = abs(sum(d[d<0]))
    if(sum_smaller<0.001*sum(abs(x))||i==itermax)
    {
      control = 0
    }
    w[d>=0] = 0
    w[d<0] = exp(i*abs(d[d<0])/sum_smaller)
    w[1] = exp(i*max(d[d<0])/sum_smaller)
    w[m] = exp(i*max(d[d<0])/sum_smaller)
    i=i+1
  }
  return(z)
}
Binary file added airPLS_R/data/chromatogram.rda
Binary file not shown.
Binary file added airPLS_R/data/nmr.rda
Binary file not shown.
Binary file added airPLS_R/data/raman.rda
Binary file not shown.
27 changes: 27 additions & 0 deletions airPLS_R/man/WhittakerSmooth.Rd
@@ -0,0 +1,27 @@
\name{WhittakerSmooth}
\alias{WhittakerSmooth}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Whittaker Smoother}
\description{
penalized least squares algorithm for background fitting
}
\usage{
WhittakerSmooth(x,w,lambda,differences=1)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{x}{ raman spectrum }
\item{w}{ binary masks (value of the mask is zero if a point belongs to peaks and one otherwise) }
\item{lambda}{lambda is an adjustable parameter, it can be adjusted by user. The larger lambda is, the smoother z will be }
\item{differences}{ an integer indicating the order of the difference of penalties}
}


\value{
the fitted vector
}


\author{Yizeng Liang, Zhang Zhimin}

\keyword{WhittakerSmooth}
28 changes: 28 additions & 0 deletions airPLS_R/man/airPLS-function.Rd
@@ -0,0 +1,28 @@
\name{airPLS-function}
\alias{airPLS-function}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{adaptive iteratively reweighted penalized least squares}
\description{
adaptive iteratively reweighted penalized least squares for baseline fitting
}
\usage{
airPLS(x,lambda=10,differences=1, itermax=20)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
\item{x}{ spectrum }
\item{lambda}{lambda is an adjustable parameter, it can be adjusted by user. The larger lambda is, the smoother z will be }
\item{differences}{ an integer indicating the order of the difference of penalties}
\item{itermax}{ the maximum number of iterations}
}


\value{
the fitted vector
}


\author{Yizeng Liang, Zhang Zhimin}

\seealso{\code{\link{WhittakerSmooth}}}

\keyword{airPLS-function}
19 changes: 19 additions & 0 deletions airPLS_R/man/airPLS.Rd
@@ -0,0 +1,19 @@
\name{airPLS}
\alias{airPLS}
\title{
Baseline correction using adaptive iteratively reweighted penalized least squares
}
\description{
Baseline drift always blurs or even swamps signals and deteriorates analytical results, particularly in multivariate analysis. Baseline drift must be corrected before further data analysis. Simple or modified polynomial fitting has been found to be effective to some extent. However, this method requires user intervention and is prone to variability, especially in low signal-to-noise ratio environments. The proposed adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm requires no user intervention and no prior information such as detected peaks. It iteratively changes the weights of the sum of squared errors (SSE) between the fitted baseline and the original signals, and the weights are obtained adaptively from the difference between the previously fitted baseline and the original signals. This baseline estimator is general, fast and flexible in fitting baselines.
}
\details{
\tabular{ll}{
Package: \tab airPLS\cr
Type: \tab Package\cr
Version: \tab 1.0.0\cr
Date: \tab 2009-10-09\cr
License: \tab GPL (>= 2)\cr
}
}
\author{ yizeng liang<[email protected]>, zhimin zhang <[email protected]>, chen shan <[email protected]>}
\keyword{ package }