diff --git a/01-spatial-data-handling.Rmd b/01-spatial-data-handling.Rmd
index c262440..18f2256 100644
--- a/01-spatial-data-handling.Rmd
+++ b/01-spatial-data-handling.Rmd
@@ -7,7 +7,7 @@ This R notebook covers the functionality of the [Spatial Data Handling](http://g
 The notes are written with R beginners in mind, more seasoned R users can probably skip most of the comments on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).
 
-In this lab, we will use the City of Chicago open data portal to download data on abandoned vehicles. Our end goal is to create a choropleth map with abandoned vehicles per capita for Chicago community areas. Before we can create the maps, we will need to download the information, select observations, aggregate data, join different files and carry out variable transformations in order to obtain a so-called “spatially intensive” variable for mapping (i.e., not just a count of abandoned vehicles, but a per capita ratio).
+In this lab, we will use the City of Chicago open data portal to download data on abandoned vehicles. Our end goal is to create a choropleth map with abandoned vehicles per capita for Chicago community areas. Before we can create the maps, we will need to download the information, select observations, aggregate data, join different files and carry out variable transformations in order to obtain a so-called “spatially intensive” variable for mapping (i.e., not just a count of abandoned vehicles, but a per capita ratio). These manipulations (also called data munging or wrangling) are typically required to get your data set ready for analysis. It is commonly argued that this takes around 80% of the effort in a data science project (@data_mining).
 
 ### Objectives {-}
diff --git a/02-eda-1.Rmd b/02-eda-1.Rmd
index d19118e..3d3f817 100644
--- a/02-eda-1.Rmd
+++ b/02-eda-1.Rmd
@@ -118,8 +118,8 @@ We follow the discussion in the GeoDa workbook and start with the common univari
 descriptive graphs, the histogram and box plot. Before covering the specifics, we
 provide a brief overview of the principles behind the **ggplot** operations.
 
-Note that linking and brushing between a plot and a map is not (yet) readily
-implemented in R, so that our discussion will focus primarily on static graphs.
+Note that linking^[Linking refers to how a selection in any of the views results in the same observation being immediately selected in all other views.] and brushing^[Brushing is a dynamic extension of the linking process. For some early exposition and discussion of these ideas pertaining to so-called dynamic graphics, see, e.g., the classic references @s87, @bc87, @bcw87 and @m89, as well as the outline of the legacy `GeoDa` functionality in @ask06.] between a plot and a map is not (yet) readily
+implemented in R, so that our discussion will focus primarily on static graphs.
 
 ### A quick introduction to **ggplot** {-}
 We will be using the commands in the **ggplot2** package for the descriptive statistics plots. There are many options to create nice looking graphs in R, including the functionality in base R, but we chose **ggplot2** for its clean logic and its
@@ -654,7 +654,7 @@ results on the graph. We don't pursue this any further.
 
 #### Loess smoother {-}
 The default nonlinear smoother in **ggplot** uses the **loess** algorithm as a locally
-weighted regression model. This is similar in spirit to the **LOWESS** method used in GeoDa, but not the same.^[See the GeoDa workbook for further discussion] The implementation is along the same lines as the linear smoother, using
+weighted regression model. This is similar in spirit to the **LOWESS** method used in GeoDa, but not the same.^[See the GeoDa workbook, as well as @c79 and @l99, for further discussion.] The implementation is along the same lines as the linear smoother, using
 `geom_smooth`, with the only difference that the `method` is now `loess`, as shown below.
 
 ```{r}
@@ -810,7 +810,7 @@ ggplot(nyc.data,aes(x=kids2000,y=pubast00)) +
 
 ### Chow test {-}
 
-In GeoDa, a Chow test on the equality of the regression coefficients between the selected and unselected observations is calculated on the fly and shown at the
+In GeoDa, a Chow test (@c60) on the equality of the regression coefficients between the selected and unselected observations is calculated on the fly and shown at the
 bottom of the scatter plot. This is not supported by **ggplot**, but we can run
 separate regressions for each subset using `lm`. We can also run the Chow test itself,
 using the `chow.test` command from the **gap** package.
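To see what this looks like in practice, here is a minimal sketch of the subset regressions and the Chow test, assuming the `nyc.data` set with the `kids2000` and `pubast00` variables used above; the logical flag `selected` is a hypothetical column standing in for the brushed subset.

```{r}
# a sketch only: `selected` is a hypothetical flag marking the brushed subset
library(gap)

sel <- nyc.data[nyc.data$selected, ]
rest <- nyc.data[!nyc.data$selected, ]

# separate regressions for each subset
summary(lm(pubast00 ~ kids2000, data = sel))
summary(lm(pubast00 ~ kids2000, data = rest))

# Chow test on the equality of the coefficients in the two regressions
chow.test(sel$pubast00, sel$kids2000, rest$pubast00, rest$kids2000)
```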
diff --git a/03-eda-2.Rmd b/03-eda-2.Rmd
index f93e377..f7b70b8 100644
--- a/03-eda-2.Rmd
+++ b/03-eda-2.Rmd
@@ -360,7 +360,7 @@ plot_ly(nyc.data, x = ~kids2000, y = ~pubast00, z = ~rent2002) %>%
 ## True Multivariate EDA: Parallel Coordinate Plot and Conditional Plots {-}
 True multivariate EDA deals with situations where more than three variables are
 considered. We follow the GeoDa Workbook and illustrate the Parallel Coordinate
-Plot, or PCP, and conditional plots. For the former, we again need to resort to
+Plot, or PCP^[The parallel coordinate plot or PCP is designed to visually identify clusters and patterns in a multi-dimensional variable space. Originally suggested by @i85 (see also @id90), it has become a main feature in many visual data mining frameworks, e.g., @w90 and @wd03.], and conditional plots^[Conditional plots are also known as facet graphs or Trellis graphs (@bcs96).]. For the former, we again need to resort to
 **GGally**, but for the latter, we can exploit the `facet_wrap` and `facet_grid`
 functions of **ggplot**. In addition, we can turn these plots into interactive
 graphs by means of the **plotly** functionality.
diff --git a/04-mapping.Rmd b/04-mapping.Rmd
index 32f6015..2314334 100644
--- a/04-mapping.Rmd
+++ b/04-mapping.Rmd
@@ -555,7 +555,7 @@ tm_shape(nyc.bound) +
 ### Natural breaks map {-}
 
-A natural breaks map is obtained by specifying the **style = "jenks"** in `tm_fill`. All
+A natural breaks map^[A natural breaks map uses a nonlinear algorithm to group observations such that the within-group homogeneity is maximized, following the pathbreaking work of @f58 and @j77.] is obtained by specifying **style = "jenks"** in `tm_fill`. All
 the other options are as before. Again, we illustrate this for four categories,
 with **n=4**.
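For reference, the pattern boils down to something like the sketch below, assuming the `nyc.bound` layer used throughout these notes; the attribute name `rent2002` is an assumption here, standing in for whatever variable is being mapped.

```{r}
# a sketch of a four-category natural breaks map; rent2002 is a placeholder
library(tmap)

tm_shape(nyc.bound) +
  tm_fill("rent2002", style = "jenks", n = 4) +
  tm_borders()
```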
@@ -616,7 +616,7 @@ tm_shape(nyc.bound) +
 ## Extreme Value Maps {-}
 
 In addition to the common map classifications, GeoDa also supports three types of extreme
-value maps: a percentile map, box map, and standard deviation map. For details on the
+value maps^[Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (@a94).]: a percentile map, box map, and standard deviation map. For details on the
 rationale and methodology behind these maps, we refer to the GeoDa Workbook.
 
 Of the three extreme value maps, only
@@ -978,7 +978,7 @@ tm_shape(nyc.bound) +
 ### Co-location map {-}
 
-A special case of a map for categorical variables is a so-called co-location map,
+A special case of a map for categorical variables is a so-called co-location map^[The idea behind a co-location map is the extension of the unique value map concept to a multivariate context. In essence, it is the implementation of ideas related to the principles of map overlay or map algebra applied to categorical maps. Map algebra tends to be geared to applications for raster data, i.e., regular grids. However, since the polygons for the different variables are identical, the same principles can be applied in the context of categorical maps. The classic reference on the principles of map algebra is @t90.],
 implemented in GeoDa. This map shows the values for those locations where two
 categorical variables take on the same value (it is up to the user to make sure the
 values make sense). Further details are given in the GeoDa Workbook.
@@ -1103,7 +1103,7 @@ tm_shape(nyc.bound) +
 ## Conditional Map {-}
 
-A conditional map, or facet map, or small multiples, is created by the `tm_facets` command.
+A conditional map^[Discussed at length in @cp10.], or facet map, or small multiples, is created by the `tm_facets` command.
 This largely follows the logic of the `facet_grid` command in **ggplot** that we covered
 in the EDA notes. An extensive set of options is available to customize the
 facet maps. An in-depth coverage of all the subtleties is beyond our scope
@@ -1154,7 +1154,7 @@ tm_shape(nyc.bound) +
 ## Cartogram {-}
 
-A final map functionality that we replicate from the GeoDa Workbook is the cartogram. GeoDa
+A final map functionality that we replicate from the GeoDa Workbook is the cartogram^[A cartogram is a map type where the original layout of the areal units is replaced by a geometric form (usually a circle, rectangle, or hexagon) that is proportional to the value of the variable for the location. This is in contrast to a standard choropleth map, where the size of the polygon corresponds to the area of the location in question. The cartogram has a long history, and many variants have been suggested, some quite creative. In essence, the construction of a cartogram is an example of a nonlinear optimization problem, where the geometric forms have to be located such that they reflect the topology (spatial arrangement) of the locations as closely as possible (see @t04 for an extensive discussion of various aspects of the cartogram).]. GeoDa
 implements a so-called circular cartogram, where circles represent spatial units and
 their size is proportional to a specified variable.
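A hedged sketch of the basic facet-map pattern follows, again assuming `nyc.bound` with the placeholder variable `rent2002`; the conditioning variable `boro` is a hypothetical categorical column.

```{r}
# a sketch of a conditional (facet) map; boro is a placeholder category
library(tmap)

tm_shape(nyc.bound) +
  tm_fill("rent2002") +
  tm_borders() +
  tm_facets(by = "boro")
```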
diff --git a/05-rate-mapping.Rmd b/05-rate-mapping.Rmd
index 9d63ec9..dd20cd8 100644
--- a/05-rate-mapping.Rmd
+++ b/05-rate-mapping.Rmd
@@ -356,7 +356,7 @@ other observations.
 This idea goes back to the fundamental contributions of James and Stein (the so-called
 James-Stein paradox), who showed that in some instances biased estimators may have
 better precision in a mean squared error sense.
 
-GeoDa includes three methods to smooth the rates: an Empirical Bayes approach, a
+GeoDa includes three methods to smooth the rates: an Empirical Bayes approach^[There are several excellent books and articles on Bayesian statistics, with @gcsdvr14 as a classic reference.], a
 spatial averaging approach, and a combination between the two. We will consider the
 spatial approaches after we discuss distance-based spatial weights. Here, we focus
 on the Empirical Bayes (EB) method. First, we provide some formal
@@ -396,7 +396,7 @@ prior and the likelihood in such a way that a proper posterior distribution
 results. In the context of rate estimation, the standard approach is to specify a
 Poisson distribution for the observed count of events (conditional upon the risk
 parameter), and a Gamma distribution for the prior of the risk parameter $\pi$.
-This is referred to as the Poisson-Gamma model.
+This is referred to as the Poisson-Gamma model^[For an extensive discussion, see, for example, the classic papers by @ck87 and @m91.].
 
 In this model, the prior distribution for the (unknown) risk parameter $\pi$ is
 $Gamma(\alpha,\beta)$, where $\alpha$ and $\beta$ are the shape and scale
@@ -439,7 +439,7 @@ In essence, the EB technique consists of computing a weighted average between th
 raw rate for each county and the state average, with weights proportional to the
 underlying population at risk. Simply put, small counties (i.e., with a small
 population at risk) will tend to have their rates adjusted considerably, whereas
-for larger counties the rates will barely change.
+for larger counties the rates will barely change^[For an extensive technical discussion, see also @alk06.].
 
 More formally, the EB estimate for the risk in location i is:
 
 $$\pi_i^{EB}=w_ir_i + (1-w_i)\theta$$
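As a quick preview of the computation, here is a minimal sketch using `EBest` from **spdep**; the data frame `df` and its columns `events` (case counts) and `pop` (population at risk) are hypothetical names.

```{r}
# a sketch of EB rate smoothing; df, events and pop are placeholders
library(spdep)

eb <- EBest(n = df$events, x = df$pop, family = "poisson")
head(eb)  # raw rates next to their EB-shrunken counterparts
```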
diff --git a/06-contiguity-spatial-weights.Rmd b/06-contiguity-spatial-weights.Rmd
index 491465b..5163947 100644
--- a/06-contiguity-spatial-weights.Rmd
+++ b/06-contiguity-spatial-weights.Rmd
@@ -7,7 +7,7 @@ This notebook covers the functionality of the [Contiguity-Based Spatial Weights]
 The notes are written with R beginners in mind, more seasoned R users can probably skip most of the comments on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).
 
-For this notebook, we use U.S. Homicide data. Our goal in this lab is show how to implement contiguity based spatial weights
+For this notebook, we use U.S. Homicide data. Our goal in this lab is to show how to implement contiguity-based spatial weights.
 
 ```{r}
@@ -105,7 +105,7 @@ In practice, the construction of the spatial weights from the geometry of the da
 cannot be done by visual inspection or manual calculation, except in the most
 trivial of situations. To assess whether two polygons are contiguous requires
 the use of explicit spatial data structures to deal with the location and arrangement of
-the polygons. This is implemented through the spatial weights functionality in
+the polygons^[Further technical details on spatial weights are contained in Chapters 3 and 4 of @ar14.]. This is implemented through the spatial weights functionality in
 GeoDa.
 
 We will do this with the **sf** and **spdep** libraries. We will create our neighbors using **sf** first, as the **spdep** library doesn't
@@ -235,7 +235,7 @@ sf.nb.queen <- as.nb.sgbp(sf.sgbp.queen)
 ## Higher Order Contiguity {-}
 
-Now we move on to higher order contiguity weights. To make these we will need the
+Now we move on to higher order contiguity weights^[Importantly, there is quite a difference between higher order contiguity and lower order neighbors: pure higher order contiguity does not include any lower order neighbors. This is the notion appropriate for use in a statistical analysis of spatial autocorrelation for different spatial lag orders. In order to achieve this, all redundant and circular paths need to be removed (see @as96 for a technical discussion).]. To make these, we will need the
 **spdep** package. We will use the `nblag` and `nblag_cumul` functions to compute
 the higher order weights.
@@ -431,7 +431,7 @@ summary(rook.card)
 ## Saving Neighbors {-}
 
 To save our neighbors list, we use the `write.nb.gal` function from
-the **spdep** package. The file format is a GAL Lattice file. We
+the **spdep** package. The file format is a GAL Lattice file^[The GAL weights file is a simple text file that contains, for each observation, the number of neighbors and their identifiers. The format was suggested in the 1980s by the Geometric Algorithms Lab at Nottingham University, and achieved widespread use after its inclusion in `SpaceStat` (@a92) and its subsequent adoption by the R `spdep` package and others.]. We
 input our neighbors list first, and the filename second. We have two options from this
 point. We can save the file with the old style or the new GeoDa header style.
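Pulling these pieces together, a minimal sketch of the queen contiguity workflow and the GAL export looks as follows; the sf polygon layer `us.bound` is a hypothetical name for the homicide data layer.

```{r}
# a sketch of queen contiguity neighbors and a GAL export; us.bound is a placeholder
library(spdep)

queen.nb <- poly2nb(us.bound, queen = TRUE)  # queen criterion: a shared point suffices
summary(queen.nb)

# neighbors list first, filename second; oldstyle = TRUE gives the old GAL header
write.nb.gal(queen.nb, "us_queen.gal", oldstyle = TRUE)
```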
diff --git a/07-distance-based-spatial-weights.Rmd b/07-distance-based-spatial-weights.Rmd
index 268eb11..eb81029 100644
--- a/07-distance-based-spatial-weights.Rmd
+++ b/07-distance-based-spatial-weights.Rmd
@@ -7,7 +7,7 @@ This notebook covers the functionality of the [Distance-Based Spatial Weights](ht
 The notes are written with R beginners in mind, more seasoned R users can probably skip most of the comments on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).
 
-For this notebook, we use Cleveland home sale point data. Our goal in this lab is show how to implement distance-band spatial weights
+For this notebook, we use Cleveland home sale point data. Our goal in this lab is to show how to implement distance-band spatial weights.^[Further technical details on distance-based spatial weights are contained in Chapters 3 and 4 of @ar14, although the software illustrations are for an earlier `GeoDa` interface design.]
 
@@ -473,7 +473,7 @@ plot(k6, coords, lwd=.2, col="blue", cex = .5)
 ## Generalizing the Concept of Contiguity {-}
 
 In GeoDa, the concept of contiguity can be generalized to point layers by converting
-the latter to a tessellation, specifically Thiessen polygons. Queen or rook contiguity
+the latter to a tessellation, specifically Thiessen polygons^[For a more extensive technical discussion and historical background, see @y16.]. Queen or rook contiguity
 weights can then be created for the polygons, in the usual way.
 
 Similarly, the concepts of distance-band weights and k-nearest neighbor weights can be
diff --git a/08-spatial-weights-as-distance-functions.Rmd b/08-spatial-weights-as-distance-functions.Rmd
index 6f49b88..6034bb8 100644
--- a/08-spatial-weights-as-distance-functions.Rmd
+++ b/08-spatial-weights-as-distance-functions.Rmd
@@ -96,7 +96,7 @@ $$w_{ij}=f(d_{ij},\theta)$$
 
 with f as a functional form and $\theta$ a vector of parameters. In order to conform
 to Tobler's first law of geography, a distance decay effect must be
-respected. In other words, the value of the function of distance needs to decrease with a
+respected^[Tobler's so-called first law of geography postulates that everything is related to everything else, but closer things more so (@t70).]. In other words, the value of the function of distance needs to decrease with a
 growing distance. More formally, the partial derivative of the distance function with
 respect to distance should be negative, $\partial{}w_{ij}/\partial{}d_{ij}<0$.
@@ -294,7 +294,7 @@ invd.weights.knn$weights[1]
 
 Kernel weights are used in non-parametric approaches to model spatial covariance, such
 as in the HAC method for heteroskedastic and spatial autocorrelation consistent
-variance estimates.
+variance estimates^[This method is currently not implemented in GeoDa, but is available in GeoDaSpace and PySAL (see @hp94 and @kp07, among others, for technical aspects, and @ar14 for implementation details).].
 
 The kernel weights are defined as a function $K(z)$ of the ratio between the distance $d_{ij}$
 from i to j, and the bandwidth $h_i$, with $z=d_{ij}/h_i$. This ensures that z is
@@ -334,7 +334,7 @@ farthest apart.
 
 In creating kernel weights, we will cover two important options: the fixed bandwidth
 and the variable bandwidth. For the fixed bandwidth, we will be using distance-band
-neighbors. For the variable bandwidth we will need kth-nearest neighbors.
+neighbors. For the variable bandwidth, we will need kth-nearest neighbors^[In GeoDa, the default value for k equals the cube root of the number of observations (following the recommendation in @kp07). In general, a wider bandwidth gives smoother and more robust results, so the bandwidth should always be set at least as large as the recommended default.].
 
 To start, we will compute a new distance-band neighbors list with the critical
 threshold, calculated earlier in the notebook.
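The distance-band and k-nearest neighbor constructs come together in a short sketch like the one below, assuming the `coords` matrix used earlier in the notebook; `threshold` is a stand-in for the critical distance computed there.

```{r}
# a sketch of distance-based neighbors; threshold is a placeholder value
library(spdep)

k6 <- knn2nb(knearneigh(coords, k = 6))        # 6 nearest neighbors
dist.band <- dnearneigh(coords, 0, threshold)  # all neighbors within the band

# inverse-distance weights attached to the distance-band neighbors
dists <- nbdists(dist.band, coords)
invd <- lapply(dists, function(d) 1 / d)
invd.weights <- nb2listw(dist.band, glist = invd, style = "B")
```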
diff --git a/09-applications-of-spatial-weights.Rmd b/09-applications-of-spatial-weights.Rmd
index 6527641..89b05f7 100644
--- a/09-applications-of-spatial-weights.Rmd
+++ b/09-applications-of-spatial-weights.Rmd
@@ -503,7 +503,7 @@ kable(head(df))
 
 $$[W_y]_i = \Sigma_jK_{ij}y_j$$
 
-
+Kernel-based spatially lagged variables correspond to a form of local smoothing. They can be used in specialized regression specifications, such as geographically weighted regression (GWR)^[GWR is not implemented in GeoDa. For further details on the use of kernel-based spatially lagged variables in GWR, see, e.g., @fbc02.].
 
@@ -513,7 +513,7 @@ $$[W_y]_i = \Sigma_jK_{ij}y_j$$
 ### Principle {-}
 
 A spatial rate smoother is a special case of a nonparametric rate estimator, based on
-the principle of locally weighted estimation. Rather than applying a local average to
+the principle of locally weighted estimation (see, e.g., @wg04). Rather than applying a local average to
 the rate itself, as in an application of a spatial window average, the weighted
 average is applied separately to the numerator and denominator.
@@ -530,7 +530,7 @@ diagonal)
 
 Different smoothers are obtained for different spatial definitions of neighbors and/or
 different weights applied to those neighbors (e.g., contiguity weights, inverse
-distance weights, or kernel weights).
+distance weights, or kernel weights)^[An early example was the spatial rate smoother outlined in @k96, based on the notion of a spatial moving average or window average (see also @97).].
 
 The window average is not applied to the rate itself, but it is computed separately for the
@@ -942,8 +942,7 @@ set to zero.
 
 The spatial EB smoothed rate is computed as a weighted average of the crude rate and
-the prior, in the same manner as for the standard EB rate
-
+the prior, in the same manner as for the standard EB rate (see the discussion in the chapter on mapping rates, as well as @alk06, for technical details).
 
 For reference, the EB rate in this case is as denoted below:
 
 $$w_i = \frac{\sigma_i^2}{\sigma_i^2 + \mu_i / P_i}$$
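A compact sketch of the window-average logic, assuming the `queen.nb` neighbors used in later chapters and a data frame `df` with hypothetical `events` and `pop` columns:

```{r}
# a sketch of a spatial rate smoother; events and pop are placeholders
library(spdep)

# include each observation in its own window, then build binary weights
w.self <- nb2listw(include.self(queen.nb), style = "B")

# smooth numerator and denominator separately, then take the ratio
smoothed.rate <- lag.listw(w.self, df$events) / lag.listw(w.self, df$pop)
```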
diff --git a/10-global-spatial-autocorrelation1.Rmd b/10-global-spatial-autocorrelation1.Rmd
index ea753af..ac9cdae 100644
--- a/10-global-spatial-autocorrelation1.Rmd
+++ b/10-global-spatial-autocorrelation1.Rmd
@@ -227,8 +227,8 @@ queen.weights <- nb2listw(queen.nb)
 #### Moran's I
 
 Moran's I statistic is arguably the most commonly used indicator of global spatial
-autocorrelation. It was initially suggested by Moran (1948), and popularized through
-the classic work on spatial autocorrelation by Cliff and Ord (1973). In essence, it is
+autocorrelation. It was initially suggested by @m48, and popularized through
+the classic work on spatial autocorrelation by @co73. In essence, it is
 a cross-product statistic between a variable and its spatial lag, with the variable
 expressed in deviations from its mean. For an observation at location i, this is
 expressed as $z_i = x_i - \bar{x}$, where $\bar{x}$ is the mean of variable x.
@@ -244,7 +244,7 @@ as the sum of all of the weights and n as the number of observations.
 
 Inference for Moran's I is based on a null hypothesis of spatial randomness. The
 distribution of the statistic under the null can be derived using either an
 assumption of normality (independent normal random variates), or so-called
-randomization (i.e., each value is equally likely to occur at any location).
+randomization (i.e., each value is equally likely to occur at any location)^[While the analytical derivations provide easy-to-interpret expressions for the mean and the variance of the statistic under the null hypothesis, inference based on them employs an approximation to a standard normal distribution, which may be inappropriate when the underlying assumptions are not satisfied. See @co73 or @81 for an extensive technical discussion.].
 
 An alternative to an analytical derivation is a computational approach based on
 permutation. This calculates a reference distribution for the statistic under the
@@ -269,7 +269,7 @@ significant than a result with a p-value of 0.001 with 999 permutations.
 
 #### Moran scatter plot
 
-The Moran scatter plot, first outlined in Anselin (1996), consists of a plot with the spatially
+The Moran scatter plot, first outlined in @a96, consists of a plot with the spatially
 lagged variable on the y-axis and the original variable on the x-axis. The slope of
 the linear fit to the scatter plot equals Moran's I.
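For orientation, the permutation approach is a one-liner in **spdep**; the sketch below assumes the `queen.weights` object from above and borrows the `guerry` data with the `Donatns` variable from the local autocorrelation notes (an assumption in this context).

```{r}
# a sketch of Moran's I with 999 random permutations
library(spdep)

moran.mc(guerry$Donatns, queen.weights, nsim = 999)
```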
@@ -550,7 +550,7 @@ clev.points<- clev.points %>% mutate(bottom_left = if_else((x < mid_x & y < mid_
 
 Before we run the Chow test, we will visualize the difference in slopes of the selected data, non-selected
-data and the aggregate data. With **ggplot2**, we can accomplish this by setting categorical colors based
+data and the aggregate data^[This is a visual implementation of a regionalized Moran's I, where the indicator of spatial autocorrelation is calculated for a subset of the observations (@mm96).]. With **ggplot2**, we can accomplish this by setting categorical colors based
 on whether or not an observation is "Selected" or "Rest". To do this, we specify `aes(color = bottom_left)`
 in both `geom_point` and `geom_smooth`. This will give us colored points and regression lines for
 "Selected" and "Rest". Then, to get blue and red colors, we use `scale_color_manual`. For this plot, we do not set
@@ -620,7 +620,7 @@ A non-parametric spatial correlogram is an alternative measure of global
 spatial autocorrelation that does not rely on the specification of a spatial
 weights matrix. Instead, a local regression is fit to the covariances or
 correlations computed for all pairs of observations as a function of the distance
-between them (for example, as outlined in Bjornstad and Falck 2001).
+between them (for example, as outlined in @bf01)^[See also @hp94 for a technical discussion of the general principle.].
 
 With standardized variables z, this boils down to a local regression:
diff --git a/11-global-spatial-autocorrelation2.Rmd b/11-global-spatial-autocorrelation2.Rmd
index 2a0de94..0bab14f 100644
--- a/11-global-spatial-autocorrelation2.Rmd
+++ b/11-global-spatial-autocorrelation2.Rmd
@@ -161,14 +161,14 @@ queen.weights <- nb2listw(queen.nb)
 
 The concept of bivariate spatial correlation is complex and often misinterpreted. It
 is typically considered to be the correlation between one variable and the spatial
-lag of another variable, as originally implemented in the precursor of GeoDa.
+lag of another variable, as originally implemented in the precursor of GeoDa (e.g., as described in @ass02).
 However, this does not take into account the inherent correlation between the two
 variables. More precisely, the bivariate spatial correlation is between $x_i$ and
 $\Sigma_jw_{ij}y_j$, but does not take into account the correlation between $x_i$
 and $y_i$, i.e., between the two variables at the same location.
 
 As a result, this statistic is often interpreted incorrectly, as it may overestimate
-the spatial aspect of the correlation that instead may be due mostly to in-place correlation.
+the spatial aspect of the correlation that instead may be due mostly to in-place correlation. @l01 provides an alternative that considers a separation between the correlative aspect and the spatial aspect, but we will not pursue that here.
 
 Below, we provide a more in-depth assessment of the different aspects of bivariate
 spatial and non-spatial association, but first we turn to the original concept of a bivariate Moran scatter
@@ -518,8 +518,11 @@ used thus far.
 
 ## Moran Scatter Plot for EB Rates
 
+An Empirical Bayes (EB) standardization was suggested by @ar99 as a means to correct Moran's I spatial autocorrelation test statistic for varying population densities across observational units, when the variable of interest is a rate or proportion. This standardization borrows ideas from the Bayesian shrinkage estimator outlined in the discussion of Empirical Bayes smoothing.
+This approach is different from EB smoothing in that the spatial autocorrelation is not computed for a smoothed version of the original rate, but for a transformed standardized random variable. In other words, the crude rate is turned into a new variable that has a mean of zero and unit variance, thus avoiding problems with variance instability. The mean and variance used in the transformation are computed for each individual observation, thereby properly accounting for the instability in variance.
+The technical aspects are given in detail in @ar99 and in the review by @alk06, but we briefly touch on some salient issues in the [GeoDa notes](https://geodacenter.github.io/workbook/5b_global_adv/lab5b.html).
 
 ### Concept
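In **spdep**, this EB standardization is available as a permutation test; a minimal sketch, again with hypothetical `events` and `pop` columns in `df` and the `queen.weights` object from above:

```{r}
# a sketch of Moran's I for EB-standardized rates; events and pop are placeholders
library(spdep)

EBImoran.mc(n = df$events, x = df$pop, listw = queen.weights, nsim = 999)
```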
diff --git a/12-local-spatial-autocorrelation1.Rmd b/12-local-spatial-autocorrelation1.Rmd
index c4a6096..c4cb796 100644
--- a/12-local-spatial-autocorrelation1.Rmd
+++ b/12-local-spatial-autocorrelation1.Rmd
@@ -133,7 +133,7 @@ tm_shape(guerry) +
 ### Principle
 
-The local Moran statistic was suggested in Anselin(1995) as a way to identify
+The local Moran statistic was suggested in @a95 as a way to identify
 local clusters and local spatial outliers. Most global spatial autocorrelation
 statistics can be expressed as a double sum over the i and j indices, such as
 $\Sigma_i\Sigma_jg_{ij}$. The local form of such a statistic would then be, for each
 observation (location) i, the
@@ -236,7 +236,7 @@ significance_map(guerry,"Donatns", type = "moran", permutations = 99999) +
 
 An important methodological issue associated with the local spatial autocorrelation statistics is the
 selection of the p-value cut-off to properly reflect the desired Type I error. Not only are the pseudo
 p-values not analytical, since they are the result of a computational permutation
-process, but they also suffer from the problem of multiple comparisons. The bottom line is that a
+process, but they also suffer from the problem of multiple comparisons (for a detailed discussion, see @ds06 and @19). The bottom line is that a
 traditional choice of 0.05 is likely to lead to many false positives, i.e., rejections of the null
 when in fact it holds.
@@ -262,7 +262,7 @@ significance_map(guerry,"Donatns", type = "moran",permutations = 99999, alpha =
 
 The Bonferroni bound constructs a bound on the overall p-value by taking $\alpha$ and
 dividing it by the number of comparisons. In our context, the latter corresponds to the
 number of observations, n. As a result, the Bonferroni bound would be $\alpha/n = .00012$,
-the cutoff p-value to be used to determine significance. We assign **bonferroni** to be
+the cutoff p-value to be used to determine significance^[Note that in their recent overview of computer age statistical inference, @eh16 suggest the use of the term interesting observations, rather than significant ones, which we will adopt as well.]. We assign **bonferroni** to be
 .01 / 85. Then we use `moran_map` with `permutations = 99999` and `alpha = bonferroni`.
 This will give us a local Moran cluster map with a Bonferroni significance cut-off.
 
 ```{r}
@@ -294,7 +294,7 @@ permutations, to ensure that the minimum p-value can be less than $\alpha/n$. The
 maximum number of permutations supported is 99999 for **spatmap**, which is also the
 same for GeoDa. **spatmap** gives an error message if the number of permutations cannot
 yield a significant location with the desired alpha level. This means that the
 Bonferroni approach will be limited for datasets with many locations.
-With $\alpha = .01$, datasets with n > 1000, cannot yield significant locations.
+With $\alpha = .01$, datasets with n > 1000 cannot yield significant locations^[A slightly less conservative option is to use the False Discovery Rate (FDR), first proposed by @bh95.].
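For completeness, the statistic itself (without the mapping layer) is a single call in **spdep**; this sketch assumes the `guerry` data and the `queen.weights` object used in these notes.

```{r}
# a sketch of the local Moran statistic for each location
library(spdep)

lisa <- localmoran(guerry$Donatns, queen.weights)
head(lisa)  # Ii, its expectation and variance, z-value and p-value
```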
diff --git a/refs.bib b/refs.bib
index cd08afe..eb05d3b 100644
--- a/refs.bib
+++ b/refs.bib
@@ -1,5 +1,12 @@
+@book{data_mining,
+title = {Exploratory Data Mining and Data Cleaning},
+author = {Dasu, Tamraparni and Johnson, Theodore},
+year = {2003},
+publisher = {John Wiley \& Sons},
+address = {Hoboken, NJ}
+}
 
-@article{tennekes_tmap_2018,
+@Manual{tennekes_tmap_2018,
 title = {tmap: {Thematic} {Maps} in {R}},
 volume = {84},
 issn = {1548-7660},
@@ -26,4 +33,511 @@ @article{ligges_scatterplot3d_2003
 journal = {Journal of Statistical Software},
 author = {Ligges, Uwe and Mächler, Martin},
 year = {2003}
-}
\ No newline at end of file
+}
+
+@article{ask06,
+author = {Anselin, Luc and Ibnu Syabri and Youngihn Kho},
+journal = {Geographical Analysis},
+pages = {5-22},
+title = {GeoDa, an Introduction to Spatial Data Analysis},
+volume = {38},
+year = {2006},
+}
+
+@article{bc87,
+author = {Becker, Richard A. and W. S. Cleveland},
+journal = {Technometrics},
+pages = {127-42},
+title = {Brushing Scatterplots},
+volume = {29},
+year = {1987},
+}
+
+@article{bcw87,
+author = {Becker, Richard A. and W. S. Cleveland and A. R. Wilks},
+journal = {Statistical Science},
+pages = {355-95},
+title = {Dynamic Graphics for Data Analysis},
+volume = {2},
+year = {1987},
+}
+
+@article{c60,
+author = {Chow, G.},
+journal = {Econometrica},
+pages = {591-605},
+title = {Tests of Equality Between Sets of Coefficients in Two Linear Regressions},
+volume = {28},
+year = {1960},
+}
+
+@article{c79,
+author = {Cleveland, William S.},
+journal = {Journal of the American Statistical Association},
+pages = {829-36},
+title = {Robust Locally Weighted Regression and Smoothing Scatterplots},
+volume = {74},
+year = {1979},
+}
+
+@book{l99,
+address = {Heidelberg},
+author = {Loader, Catherine},
+publisher = {Springer-Verlag},
+title = {Local Regression and Likelihood},
+year = {1999},
+}
+
+@incollection{04,
+address = {Berlin},
+author = {Loader, Catherine},
+editor = {James E. Gentle and Wolfgang Härdle and Yuichi Mori},
+booktitle = {Handbook of Computational Statistics: Concepts and Methods},
+pages = {539-63},
+publisher = {Springer-Verlag},
+title = {Smoothing: Local Regression Techniques},
+year = {2004},
+}
+
+@article{m89,
+author = {Monmonier, Mark},
+journal = {Geographical Analysis},
+pages = {81-84},
+title = {Geographic Brushing: Enhancing Exploratory Analysis of the Scatterplot Matrix},
+volume = {21},
+year = {1989},
+}
+
+@article{s87,
+author = {Stuetzle, W.},
+journal = {Journal of the American Statistical Association},
+pages = {466-75},
+title = {Plot Windows},
+volume = {82},
+year = {1987},
+}
+
+@article{bcs96,
+author = {Becker, Richard A. and W. S. Cleveland and M-J. Shyu},
+journal = {Journal of Computational and Graphical Statistics},
+pages = {123-55},
+title = {The Visual Design and Control of Trellis Displays},
+volume = {5},
+year = {1996},
+}
+
+@article{i85,
+author = {Inselberg, A.},
+journal = {Visual Computer},
+pages = {69-91},
+title = {The Plane with Parallel Coordinates},
+volume = {1},
+year = {1985},
+}
+
+@inproceedings{id90,
+author = {Inselberg, Alfred and B. Dimsdale},
+booktitle = {Proceedings of the IEEE Visualization 90},
+pages = {361-78},
+title = {Parallel Coordinates: A Tool for Visualizing Multi-Dimensional Geometry},
+year = {1990},
+}
+@article{w90,
+author = {Wegman, Edward J.},
+journal = {Journal of the American Statistical Association},
+pages = {664-75},
+title = {Hyperdimensional Data Analysis Using Parallel Coordinates},
+volume = {85},
+year = {1990},
+}
+
+@article{wd03,
+author = {Wegman, Edward J. and Alan Dorfman},
+journal = {Computational Statistics and Data Analysis},
+number = {4},
+pages = {633-49},
+title = {Visualizing Cereal World},
+volume = {43},
+year = {2003},
+}
+
+@incollection{a94,
+address = {Luxembourg},
+author = {Anselin, Luc},
+booktitle = {New Tools for Spatial Analysis},
+editor = {Marco Painho},
+pages = {45-54},
+publisher = {Eurostat},
+title = {Exploratory Spatial Data Analysis and Geographic Information Systems},
+year = {1994},
+}
+
+@book{cp10,
+address = {Boca Raton, FL},
+author = {Carr, Daniel B. and Linda Williams Pickle},
+publisher = {Chapman \& Hall/CRC},
+title = {Visualizing Data Patterns with Micromaps},
+year = {2010},
+}
+
+@article{f58,
+author = {Fisher, W. D.},
+journal = {Journal of the American Statistical Association},
+pages = {789-98},
+title = {On Grouping for Maximum Homogeneity},
+volume = {53},
+year = {1958},
+}
+
+@techreport{j77,
+author = {Jenks, G. F.},
+institution = {Department of Geography, University of Kansas},
+address = {Lawrence, KS},
+type = {Occasional Paper},
+number = {2},
+title = {Optimal Data Classification for Choropleth Maps},
+year = {1977},
+}
+
+@article{t04,
+author = {Tobler, Waldo},
+journal = {Annals of the Association of American Geographers},
+pages = {58-73},
+title = {Thirty Five Years of Computer Cartograms},
+volume = {94},
+year = {2004},
+}
+
+@book{t90,
+address = {Englewood Cliffs, NJ},
+author = {Tomlin, C. Dana},
+publisher = {Prentice-Hall},
+title = {Geographic Information Systems and Cartographic Modeling},
+year = {1990},
+}
+
+@unpublished{alk06,
+author = {Anselin, Luc and Nancy Lozano-Gracia and Julia Koschinsky},
+note = {Technical Report. Urbana, IL: Spatial Analysis Laboratory, Department of Geography, University of Illinois},
+title = {Rate Transformations and Smoothing},
+year = {2006},
+}
+
+@article{ck87,
+author = {Clayton, David and John Kaldor},
+journal = {Biometrics},
+pages = {671-81},
+title = {Empirical {B}ayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping},
+volume = {43},
+year = {1987},
+}
+
+@book{gcsdvr14,
+address = {Boca Raton, FL},
+author = {Gelman, Andrew and John B. Carlin and Hal S. Stern and David B. Dunson and Aki Vehtari and Donald B. Rubin},
+publisher = {Chapman \& Hall},
+title = {{B}ayesian Data Analysis, Third Edition},
+year = {2014},
+}
+
+@book{lbr03,
+address = {Chichester},
+author = {Lawson, Andrew B. and William J. Browne and Carmen L. Vidal Rodeiro},
+publisher = {John Wiley},
+title = {Disease Mapping with WinBUGS and MLwiN},
+year = {2003},
+}
+
+@article{m91,
+author = {Marshall, R. J.},
+journal = {Applied Statistics},
+pages = {283-94},
+title = {Mapping Disease and Mortality Rates Using Empirical {B}ayes Estimators},
+volume = {40},
+year = {1991},
+}
+
+@article{xc98,
+author = {Xia, Hong and Bradley P. Carlin},
+journal = {Statistics in Medicine},
+pages = {2025-43},
+title = {Spatio-Temporal Models with Errors in Covariates: Mapping Ohio Lung Cancer Mortality},
+volume = {17},
+year = {1998},
+}
+@book{a92,
+address = {University of California, Santa Barbara, CA},
+author = {Anselin, Luc},
+publisher = {National Center for Geographic Information and Analysis (NCGIA)},
+title = {SpaceStat, a Software Program for Analysis of Spatial Data},
+year = {1992},
+}
+
+@book{ar14,
+address = {Chicago, IL},
+author = {Anselin, Luc and Sergio J. Rey},
+publisher = {GeoDa Press},
+title = {Modern Spatial Econometrics in Practice, a Guide to {GeoDa}, {GeoDaSpace} and {PySAL}},
+year = {2014},
+}
+
+@article{as96,
+author = {Anselin, Luc and Oleg Smirnov},
+journal = {Journal of Regional Science},
+pages = {67-89},
+title = {Efficient Algorithms for Constructing Proper Higher Order Spatial Lag Operators},
+volume = {36},
+year = {1996},
+}
+
+@article{y16,
+author = {Yamada, Ikuho},
+journal = {The International Encyclopedia of Geography},
+pages = {1-6},
+title = {Thiessen Polygons},
+year = {2016},
+}
+
+@article{hp94,
+author = {Hall, P. and P. Patil},
+journal = {Probability Theory and Related Fields},
+pages = {399-424},
+title = {Properties of Nonparametric Estimators of Autocovariance for Stationary Random Fields},
+volume = {99},
+year = {1994},
+}
+
+@article{kp07,
+author = {Kelejian, Harry H. and Ingmar R. Prucha},
+journal = {Journal of Econometrics},
+pages = {131-54},
+title = {HAC Estimation in a Spatial Framework},
+volume = {140},
+year = {2007},
+}
+
+@article{t70,
+author = {Tobler, Waldo},
+journal = {Economic Geography},
+pages = {234-40},
+title = {A Computer Movie Simulating Urban Growth in the Detroit Region},
+volume = {46},
+year = {1970},
+}
+
+@book{fbc02,
+address = {Chichester},
+author = {Fotheringham, A. Stewart and Chris Brunsdon and Martin Charlton},
+publisher = {John Wiley},
+title = {Geographically Weighted Regression},
+year = {2002},
+}
+
+@article{k96,
+author = {Kafadar, Karen},
+journal = {Statistics in Medicine},
+pages = {2539-60},
+title = {Smoothing Geographical Data, Particularly Rates of Disease},
+volume = {15},
+year = {1996},
+}
+
+@article{97,
+author = {Kafadar, Karen},
+journal = {Annals of Epidemiology},
+pages = {35-45},
+title = {Geographic Trends in Prostate Cancer Mortality: An Application of Spatial Smoothers and the Need for Adjustment},
+volume = {7},
+year = {1997},
+}
+
+@book{wg04,
+address = {Hoboken, NJ},
+author = {Waller, Lance A. and Carol A. Gotway},
+publisher = {John Wiley},
+title = {Applied Spatial Statistics for Public Health Data},
+year = {2004},
+}
+@incollection{a96,
+address = {London},
+author = {Anselin, Luc},
+booktitle = {Spatial Analytical Perspectives on {GIS} in Environmental and Socio-Economic Sciences},
+editor = {Manfred Fischer and Henk Scholten},
+pages = {111-25},
+publisher = {Taylor \& Francis},
+title = {The Moran Scatterplot as an ESDA Tool to Assess Local Instability in Spatial Association},
+year = {1996},
+}
+
+@article{al20,
+author = {Anselin, Luc and Xun Li},
+journal = {Geographical Analysis},
+title = {Tobler's Law in a Multivariate World},
+year = {2020},
+url = {https://doi.org/10.1111/gean.12237},
+}
+
+@article{bf01,
+author = {Bjornstad, Ottar N. and Wilhelm Falck},
+journal = {Environmental and Ecological Statistics},
+pages = {53-70},
+title = {Nonparametric Spatial Covariance Functions: Estimation and Testing},
+volume = {8},
+year = {2001},
+}
+
+@book{co73,
+address = {London},
+author = {Cliff, Andrew and J. Keith Ord},
+publisher = {Pion},
+title = {Spatial Autocorrelation},
+year = {1973},
+}
+
+@book{81,
+address = {London},
+author = {Cliff, Andrew and J. Keith Ord},
+publisher = {Pion},
+title = {Spatial Processes: Models and Applications},
+year = {1981},
+}
+
+@article{m48,
+author = {Moran, Patrick A. P.},
+journal = {Biometrika},
+pages = {255-60},
+title = {The Interpretation of Statistical Maps},
+volume = {35},
+year = {1948},
+}
+
+@article{mm96,
+author = {Munasinghe, Rajika L. and Robert D. Morris},
+journal = {Statistics in Medicine},
+pages = {893-905},
+title = {Localization of Disease Clusters Using Regional Measures of Spatial Autocorrelation},
+volume = {15},
+year = {1996},
+}
+
+@incollection{ass02,
+address = {University of California, Santa Barbara},
+author = {Anselin, Luc and Ibnu Syabri and Oleg Smirnov},
+booktitle = {New Tools for Spatial Data Analysis: Proceedings of the Specialist Meeting},
+editor = {Luc Anselin and Sergio Rey},
+publisher = {Center for Spatially Integrated Social Science (CSISS)},
+title = {Visualizing Multivariate Spatial Correlation with Dynamically Linked Windows},
+year = {2002},
+}
+
+@article{ar99,
+author = {Assun\c{c}\~{a}o, Renato and Edna A. Reis},
+journal = {Statistics in Medicine},
+pages = {2147-61},
+title = {A New Proposal to Adjust Moran's I for Population Density},
+volume = {18},
+year = {1999},
+}
+
+@article{l01,
+author = {Lee, Sang-Il},
+journal = {Journal of Geographical Systems},
+pages = {369-85},
+title = {Developing a Bivariate Spatial Association Measure: An Integration of Pearson's r and Moran's I},
+volume = {3},
+year = {2001},
+}
+
+@article{a95,
+author = {Anselin, Luc},
+journal = {Geographical Analysis},
+pages = {93-115},
+title = {Local Indicators of Spatial Association --- LISA},
+volume = {27},
+year = {1995},
+}
+
+@article{19,
+author = {Anselin, Luc},
+journal = {Geographical Analysis},
+number = {2},
+pages = {133-50},
+title = {A Local Indicator of Multivariate Spatial Association, Extending Geary's c},
+volume = {51},
+year = {2019},
+}
+
+@article{bh95,
+author = {Benjamini, Y. and Y. Hochberg},
+journal = {Journal of the Royal Statistical Society B},
+number = {1},
+pages = {289-300},
+title = {Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing},
+volume = {57},
+year = {1995},
+}
+
+@article{ds06,
+author = {de Castro, Maria Caldas and Burton H. Singer},
+journal = {Geographical Analysis},
+pages = {180-208},
+title = {Controlling the False Discovery Rate: An Application to Account for Multiple and Dependent Tests in Local Statistics of Spatial Association},
+volume = {38},
+year = {2006},
+}
+
+@book{eh16,
+address = {Cambridge, UK},
+author = {Efron, Bradley and Trevor Hastie},
+publisher = {Cambridge University Press},
+title = {Computer Age Statistical Inference: Algorithms, Evidence, and Data Science},
+year = {2016},
+}