Adding References #18

Open · wants to merge 3 commits into base: master
2 changes: 1 addition & 1 deletion 01-spatial-data-handling.Rmd
@@ -7,7 +7,7 @@ This R notebook covers the functionality of the [Spatial Data Handling](http://g
The notes are written with R beginners in mind; more seasoned R users can probably skip most of the comments
on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).

In this lab, we will use the City of Chicago open data portal to download data on abandoned vehicles. Our end goal is to create a choropleth map with abandoned vehicles per capita for Chicago community areas. Before we can create the maps, we will need to download the information, select observations, aggregate data, join different files and carry out variable transformations in order to obtain a so-called “spatially intensive” variable for mapping (i.e., not just a count of abandoned vehicles, but a per capita ratio).
In this lab, we will use the City of Chicago open data portal to download data on abandoned vehicles. Our end goal is to create a choropleth map with abandoned vehicles per capita for Chicago community areas. Before we can create the maps, we will need to download the information, select observations, aggregate data, join different files and carry out variable transformations in order to obtain a so-called “spatially intensive” variable for mapping (i.e., not just a count of abandoned vehicles, but a per capita ratio). These manipulations (also called data munging or wrangling) are typically required to get your data set ready for analysis. It is commonly argued that they take around 80% of the effort in a data science project (@data_mining).
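
As a minimal illustration of the kind of per capita transformation involved (the column names below are made-up stand-ins, not the actual portal fields):

```{r}
# hypothetical sketch: counts of abandoned vehicles per community area joined
# to population, then turned into a per capita ("spatially intensive") variable
library(dplyr)

veh.counts <- data.frame(community = c("1", "2", "3"),
                         vehicles = c(120, 45, 300))
pop <- data.frame(community = c("1", "2", "3"),
                  population = c(54000, 71000, 23000))

veh.capita <- veh.counts %>%
  inner_join(pop, by = "community") %>%
  mutate(veh.per.1000 = 1000 * vehicles / population)
veh.capita
```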

### Objectives {-}

8 changes: 4 additions & 4 deletions 02-eda-1.Rmd
@@ -118,8 +118,8 @@ We follow the discussion in the GeoDa workbook and start with the common univariate
descriptive graphs, the histogram and box plot. Before covering the specifics, we
provide a brief overview of the principles behind the **ggplot** operations.

Note that linking and brushing between a plot and a map is not (yet) readily
implemented in R, so that our discussion will focus primarily on static graphs.
Note that linking^[Linking refers to how a selection in any of the views results in the same observations being immediately selected in all other views.] and brushing^[Brushing is a dynamic extension of the linking process. For some early exposition and discussion of these ideas pertaining to so-called dynamic graphics, see, e.g., the classic references of @s87, @bc87, @bcw87, and @m89, as well as the outline of legacy `GeoDa` functionality in @ask06.] between a plot and a map is not (yet) readily
implemented in R, so that our discussion will focus primarily on static graphs.

### A quick introduction to **ggplot** {-}
We will be using the commands in the **ggplot2** package for the descriptive statistics plots. There are many options to create nice looking graphs in R, including the functionality in base R, but we chose **ggplot2** for its clean logic and its
@@ -654,7 +654,7 @@ results on the graph. We don't pursue this any further.

#### Loess smoother {-}
The default nonlinear smoother in **ggplot** uses the **loess** algorithm as a locally
weighted regression model. This is similar in spirit to the **LOWESS** method used in GeoDa, but not the same.^[See the GeoDa workbook for further discussion] The implementation is along the same lines as the linear smoother, using
weighted regression model. This is similar in spirit to the **LOWESS** method used in GeoDa, but not the same.^[See the GeoDa workbook, as well as @c79 and @l99, for further discussion.] The implementation is along the same lines as the linear smoother, using
`geom_smooth`, with the only difference that the `method` is now `loess`, as shown below.

```{r}
@@ -810,7 +810,7 @@ ggplot(nyc.data,aes(x=kids2000,y=pubast00)) +


### Chow test {-}
In GeoDa, a Chow test on the equality of the regression coefficients between the selected and unselected observations is calculated on the fly and shown at the
In GeoDa, a Chow test (@c60) on the equality of the regression coefficients between the selected and unselected observations is calculated on the fly and shown at the
bottom of the scatter plot. This is not supported by **ggplot**, but we can
run separate regressions for each subset using `lm`. We can also run the Chow test itself, using the `chow.test` command from the **gap** package.
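
A rough sketch of how the two pieces fit together is shown below; the data and the selection indicator are synthetic stand-ins for the NYC variables and the map/plot selection used in the lab.

```{r}
# synthetic illustration of separate subset regressions plus gap::chow.test()
library(gap)

set.seed(123)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
selected <- x > median(x)                   # stand-in for a brushed selection

lm.sel <- lm(y[selected] ~ x[selected])     # regression on the selected subset
lm.uns <- lm(y[!selected] ~ x[!selected])   # regression on the unselected subset
coef(lm.sel)
coef(lm.uns)

chow.test(y[selected], as.matrix(x[selected]),
          y[!selected], as.matrix(x[!selected]))
```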

2 changes: 1 addition & 1 deletion 03-eda-2.Rmd
@@ -360,7 +360,7 @@ plot_ly(nyc.data, x = ~kids2000, y = ~pubast00, z = ~rent2002) %>%
## True Multivariate EDA: Parallel Coordinate Plot and Conditional Plots {-}
True multivariate EDA deals with situations where more than three variables
are considered. We follow the GeoDa Workbook and illustrate the Parallel Coordinate
Plot, or PCP, and conditional plots. For the former, we again need to resort to
Plot, or PCP^[The parallel coordinate plot or PCP is designed to visually identify clusters and patterns in a multi-dimensional variable space. Originally suggested by @i85 (see also @i90), it has become a main feature in many visual data mining frameworks, e.g., @w90 and @wd03.], and conditional plots^[Conditional plots are also known as facet graphs or Trellis graphs (@bcs96).]. For the former, we again need to resort to
**GGally**, but for the latter, we can exploit the `facet_wrap` and `facet_grid` functions of **ggplot**. In addition, we can turn these plots into interactive graphs by means of the **plotly** functionality.


10 changes: 5 additions & 5 deletions 04-mapping.Rmd
@@ -555,7 +555,7 @@ tm_shape(nyc.bound) +

### Natural breaks map {-}

A natural breaks map is obtained by specifying the **style = "jenks"** in `tm_fill`. All
A natural breaks map^[A natural breaks map uses a nonlinear algorithm to group observations such that the within-group homogeneity is maximized, following the pathbreaking work of @f58 and @j77.] is obtained by specifying **style = "jenks"** in `tm_fill`. All
the other options are as before. Again, we illustrate this for four categories,
with **n=4**.
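
A minimal sketch of the call is shown below, assuming `nyc.bound` has been loaded as above; `"kids2000"` is only a stand-in for whichever variable is being mapped.

```{r}
# natural breaks (Jenks) map with four classes; "kids2000" is a placeholder name
library(tmap)

tm_shape(nyc.bound) +
  tm_fill("kids2000", style = "jenks", n = 4) +
  tm_borders()
```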

@@ -616,7 +616,7 @@ tm_shape(nyc.bound) +
## Extreme Value Maps {-}

In addition to the common map classifications, GeoDa also supports three types of extreme
value maps: a percentile map, box map, and standard deviation map. For details on the
value maps^[Extreme value maps are variations of common choropleth maps where the classification is designed to highlight extreme values at the lower and upper end of the scale, with the goal of identifying outliers. These maps were developed in the spirit of spatializing EDA, i.e., adding spatial features to commonly used approaches in non-spatial EDA (@a94).]: a percentile map, box map, and standard deviation map. For details on the
rationale and methodology behind these maps, we refer to the GeoDa Workbook.

Of the three extreme value maps, only
@@ -978,7 +978,7 @@ tm_shape(nyc.bound) +


### Co-location map {-}
A special case of a map for categorical variables is a so-called co-location map,
A special case of a map for categorical variables is a so-called co-location map^[The idea behind a co-location map is the extension of the unique value map concept to a multivariate context. In essence, it is the implementation of ideas related to the principles of map overlay or map algebra applied to categorical maps. Map algebra tends to be geared to applications for raster data, i.e., regular grids. However, since the polygons for the different variables are identical, the same principles can be applied in the context of the categorical maps. The classic reference on the principles of map algebra is @t90.],
implemented in GeoDa. This map shows the values for those locations where two
categorical variables take on the same value (it is up to the user to make sure
the values make sense). Further details are given in the GeoDa Workbook.
@@ -1103,7 +1103,7 @@ tm_shape(nyc.bound) +

## Conditional Map {-}

A conditional map, or facet map, or small multiples, is created by the `tm_facets` command.
A conditional map^[Discussed at length in @cp10.], or facet map, or small multiples, is created by the `tm_facets` command.
This largely follows the logic of the `facet_grid` command in **ggplot** that we covered in the
EDA notes. An extensive set of options is available to customize the facet maps. An in-depth
coverage of all the subtleties is beyond our scope
@@ -1154,7 +1154,7 @@ tm_shape(nyc.bound) +

## Cartogram {-}

A final map functionality that we replicate from the GeoDa Workbook is the cartogram. GeoDa
A final map functionality that we replicate from the GeoDa Workbook is the cartogram^[A cartogram is a map type where the original layout of the areal unit is replaced by a geometric form (usually a circle, rectangle, or hexagon) that is proportional to the value of the variable for the location. This is in contrast to a standard choropleth map, where the size of the polygon corresponds to the area of the location in question. The cartogram has a long history and many variants have been suggested, some quite creative. In essence, the construction of a cartogram is an example of a nonlinear optimization problem, where the geometric forms have to be located such that they reflect the topology (spatial arrangement) of the locations as closely as possible (see @t04 for an extensive discussion of various aspects of the cartogram).]. GeoDa
implements a so-called circular cartogram, where circles represent spatial units and their
size is proportional to a specified variable.

6 changes: 3 additions & 3 deletions 05-rate-mapping.Rmd
@@ -356,7 +356,7 @@ other observations. This idea goes back to the fundamental contributions of James
and Stein (the so-called James-Stein paradox), who showed that in some instances
biased estimators may have better precision in a mean squared error sense.

GeoDa includes three methods to smooth the rates: an Empirical Bayes approach, a
GeoDa includes three methods to smooth the rates: an Empirical Bayes approach^[There are several excellent books and articles on Bayesian statistics, with @gcsdvr14 as a classic reference.], a
spatial averaging approach, and a combination between the two. We will consider
the spatial approaches after we discuss distance-based spatial weights. Here, we
focus on the Empirical Bayes (EB) method. First, we provide some formal
@@ -396,7 +396,7 @@ prior and the likelihood in such a way that a proper posterior distribution
results. In the context of rate estimation, the standard approach is to specify a
Poisson distribution for the observed count of events (conditional upon the risk
parameter), and a Gamma distribution for the prior of the risk parameter $\pi$.
This is referred to as the Poisson-Gamma model.
This is referred to as the Poisson-Gamma model^[For an extensive discussion, see, for example, the classic papers by @ck87 and @m91.].

In this model, the prior distribution for the (unknown) risk parameter $\pi$ is
$Gamma(\alpha,\beta)$, where $\alpha$ and $\beta$ are the shape and scale
@@ -439,7 +439,7 @@ In essence, the EB technique consists of computing a weighted average between the
raw rate for each county and the state average, with weights proportional to the
underlying population at risk. Simply put, small counties (i.e., with a small
population at risk) will tend to have their rates adjusted considerably, whereas
for larger counties the rates will barely change.
for larger counties the rates will barely change^[For an extensive technical discussion, see also @alk06.].

More formally, the EB estimate for the risk in location i is:
$$\pi_i^{EB}=w_ir_i + (1-w_i)\theta$$
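
In R, a comparable Empirical Bayes smoother is available as `EBest` in **spdep**. The small sketch below uses made-up counts and populations rather than the lab data.

```{r}
# Empirical Bayes smoothing of raw rates with spdep::EBest(); synthetic inputs
library(spdep)

events <- c(2, 0, 5, 1, 12, 3)
population <- c(1000, 150, 4000, 300, 20000, 2500)

eb <- EBest(events, population)
eb$raw     # crude rates: events / population
eb$estmm   # EB estimates, shrunk toward the overall rate
```
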
8 changes: 4 additions & 4 deletions 06-contiguity-spatial-weights.Rmd
@@ -7,7 +7,7 @@ This notebook covers the functionality of the [Contiguity-Based Spatial Weights]
The notes are written with R beginners in mind; more seasoned R users can probably skip most of the comments
on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).

For this notebook, we use U.S. Homicide data. Our goal in this lab is show how to implement contiguity based spatial weights
For this notebook, we use U.S. Homicide data. Our goal in this lab is to show how to implement contiguity-based spatial weights.


```{r}
@@ -105,7 +105,7 @@ In practice, the construction of the spatial weights from the geometry of the da
cannot be done by visual inspection or manual calculation, except in the most
trivial of situations. To assess whether two polygons are contiguous requires the
use of explicit spatial data structures to deal with the location and arrangement of
the polygons. This is implemented through the spatial weights functionality in
the polygons^[Further technical details on spatial weights are contained in Chapters 3 and 4 of @ass02.]. This is implemented through the spatial weights functionality in
GeoDa. We will do this with the **sf** and **spdep** libraries.

We will create our neighbors using **sf** first, as the **spdep** library doesn't
@@ -235,7 +235,7 @@ sf.nb.queen <- as.nb.sgbp(sf.sgbp.queen)

## Higher Order Contiguity {-}

Now we move on to higher order contiguity weights. To make these we will need the
Now we move on to higher order contiguity weights^[Importantly, pure higher order contiguity differs from cumulative neighbor definitions in that it does not include any lower order neighbors. This is the notion appropriate for use in a statistical analysis of spatial autocorrelation at different spatial lag orders. In order to achieve this, all redundant and circular paths need to be removed (see @as96 for a technical discussion).]. To make these we will need the
**spdep** package. We will use the `nblag` and `nblag_cumul` functions to compute
the higher order weights.
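
As a rough sketch, using the queen neighbor list `sf.nb.queen` created above:

```{r}
# second order contiguity: nblag() gives the pure lag orders, nblag_cumul()
# the cumulative (first plus second order) neighbor list
library(spdep)

nb.lags <- nblag(sf.nb.queen, maxlag = 2)
second.order <- nb.lags[[2]]          # second order only, no first order neighbors
cumul.second <- nblag_cumul(nb.lags)  # first and second order combined
summary(second.order)
```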

@@ -431,7 +431,7 @@ summary(rook.card)

## Saving Neighbors {-}
To save our neighbors list, we use the `write.nb.gal` function from
the **spdep** package. The file format is a GAL Lattice file. We
the **spdep** package. The file format is a GAL Lattice file^[The GAL weights file is a simple text file that contains, for each observation, the number of neighbors and their identifiers. The format was suggested in the 1980s by the Geometric Algorithms Lab at Nottingham University and achieved widespread use after its inclusion in `SpaceStat` (@a92), and subsequent adoption by the R `spdep` package and others.]. We
input the neighbors list first and the filename second. We have two
options from this point: we can save the file with the old style
or the new GeoDa header style.
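
For example (the file name is arbitrary, and `sf.nb.queen` is the queen neighbor list built earlier):

```{r}
# save the queen neighbors list as a GAL file with the old style header
library(spdep)

write.nb.gal(sf.nb.queen, "us_homicide_queen.gal", oldstyle = TRUE)
```
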
4 changes: 2 additions & 2 deletions 07-distance-based-spatial-weights.Rmd
@@ -7,7 +7,7 @@ This notebook covers the functionality of the [Distance-Based Spatial Weights](ht
The notes are written with R beginners in mind; more seasoned R users can probably skip most of the comments
on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).

For this notebook, we use Cleveland homesale point data. Our goal in this lab is show how to implement distance-band spatial weights
For this notebook, we use Cleveland homesale point data. Our goal in this lab is to show how to implement distance-band spatial weights^[Further technical details on distance-based spatial weights are contained in Chapters 3 and 4 of @ar14, although the software illustrations are for an earlier `GeoDa` interface design.].



@@ -473,7 +473,7 @@ plot(k6, coords, lwd=.2, col="blue", cex = .5)
## Generalizing the Concept of Contiguity {-}

In GeoDa, the concept of contiguity can be generalized to point layers by converting
the latter to a tessellation, specifically Thiessen polygons. Queen or rook contiguity
the latter to a tessellation, specifically Thiessen polygons^[For a more extensive technical discussion and historical background, see @y16.]. Queen or rook contiguity
weights can then be created for the polygons, in the usual way.
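
A minimal sketch of the idea with **sf** is given below; the points are synthetic, but in the lab the Cleveland home sale points would take their place.

```{r}
# Thiessen (Voronoi) polygons from a point layer; queen or rook weights can
# then be built for these polygons just as for any other polygon layer
library(sf)

set.seed(3)
pts <- st_as_sf(data.frame(x = runif(10), y = runif(10)),
                coords = c("x", "y"))

thiessen <- st_collection_extract(st_voronoi(st_union(pts)), "POLYGON")
plot(thiessen)
```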

Similarly, the concepts of distance-band weights and k-nearest neighbor weights can be
6 changes: 3 additions & 3 deletions 08-spatial-weights-as-distance-functions.Rmd
@@ -96,7 +96,7 @@ $$w_{ij}=f(d_{ij},\theta)$$
with f as a functional form and $\theta$ a vector of parameters.

In order to conform to Tobler’s first law of geography, a distance decay effect must be
respected. In other words, the value of the function of distance needs to decrease with a
respected^[Tobler's so-called first law of geography postulates that everything is related to everything else, but closer things more so (@t70).]. In other words, the value of the function of distance needs to decrease with a
growing distance. More formally, the partial derivative of the distance function with respect
to distance should be negative, $\partial w_{ij}/\partial d_{ij} < 0$.
@@ -294,7 +294,7 @@ invd.weights.knn$weights[1]

Kernel weights are used in non-parametric approaches to model spatial covariance, such
as in the HAC method for heteroskedastic and spatial autocorrelation consistent
variance estimates.
variance estimates^[This method is currently not implemented in GeoDa, but is available in GeoDaSpace and PySAL (see @hp94 and @kp07, among others, for technical aspects, and @ass02 for implementation details).].

The kernel weights are defined as a function $K(z)$ of the ratio between the distance $d_{ij}$
from $i$ to $j$ and the bandwidth $h_i$, with $z=d_{ij}/h_i$. This ensures that $z$ is
@@ -334,7 +334,7 @@ farthest apart.

In creating kernel weights, we will cover two important options: the fixed bandwidth
and the variable bandwidth. For the fixed bandwidth, we will be using distance-band
neighbors. For the variable bandwidth we will need kth-nearest neighbors.
neighbors. For the variable bandwidth we will need kth-nearest neighbors^[In GeoDa the default value for k equals the cube root of the number of observations (following the recommendation in @kp07). In general, a wider bandwidth gives smoother and more robust results, so the bandwidth should always be set at least as large as the recommended default.].

To start, we will compute a new distance-band neighbors list with the critical threshold,
calculated earlier in the notebook.
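
To make the mechanics concrete, here is a small synthetic sketch of fixed-bandwidth kernel weights built by hand; in the notebook itself the critical threshold computed earlier plays the role of the bandwidth.

```{r}
# fixed-bandwidth kernel weights from scratch on synthetic points; the bandwidth
# is the largest nearest-neighbor distance, so every point has at least one neighbor
library(spdep)

set.seed(42)
coords <- cbind(runif(20), runif(20))
bandwidth <- max(unlist(nbdists(knn2nb(knearneigh(coords, k = 1)), coords)))

nb.band <- dnearneigh(coords, 0, bandwidth)          # distance-band neighbors
dists <- nbdists(nb.band, coords)                    # distances to those neighbors
kw <- lapply(dists, function(d) 1 - d / bandwidth)   # triangular kernel K(z) = 1 - z
kw[1]
```
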
9 changes: 4 additions & 5 deletions 09-applications-of-spatial-weights.Rmd
Expand Up @@ -503,7 +503,7 @@ kable(head(df))

$$[Wy]_i = \sum_j K_{ij}y_j$$


Kernel-based spatially lagged variables correspond to a form of local smoothing. They can be used in specialized regression specifications, such as geographically weighted regression (GWR)^[GWR is not implemented in GeoDa. For further details on the use of kernel-based spatially lagged variables in GWR, see, e.g., @fbc02.].
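
A small synthetic sketch of such a lagged variable with `lag.listw` follows; simple row-standardized k-nearest neighbor weights stand in for the kernel weights.

```{r}
# spatially lagged variable [Wy]_i = sum_j w_ij * y_j via spdep::lag.listw()
library(spdep)

set.seed(7)
coords <- cbind(runif(10), runif(10))
y <- rnorm(10)

nb <- knn2nb(knearneigh(coords, k = 3))
lw <- nb2listw(nb, style = "W")   # row-standardized weights as a stand-in
lag.listw(lw, y)
```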



@@ -513,7 +513,7 @@
### Principle {-}

A spatial rate smoother is a special case of a nonparametric rate estimator, based on
the principle of locally weighted estimation. Rather than applying a local average to
the principle of locally weighted estimation (see, e.g., @wg04). Rather than applying a local average to
the rate itself, as in an application of a spatial window average, the weighted
average is applied separately to the numerator and denominator.

@@ -530,7 +530,7 @@ diagonal)

Different smoothers are obtained for different spatial definitions of neighbors and/or
different weights applied to those neighbors (e.g., contiguity weights, inverse
distance weights, or kernel weights).
distance weights, or kernel weights)^[An early example was the spatial rate smoother outlined in @k96, based on the notion of a spatial moving average or window average (see also @97).].
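
The sketch below illustrates the principle on synthetic data: the same window average (here binary k-nearest neighbor weights that include the location itself) is applied to the numerator and the denominator, and only then is the ratio taken.

```{r}
# spatial rate smoother on synthetic counts and populations at risk
library(spdep)

set.seed(11)
coords <- cbind(runif(15), runif(15))
events <- rpois(15, lambda = 3)
population <- round(runif(15, 500, 5000))

nb <- include.self(knn2nb(knearneigh(coords, k = 4)))  # window includes i itself
lw <- nb2listw(nb, style = "B")                        # binary weights: window sums

smoothed <- lag.listw(lw, events) / lag.listw(lw, population)
cbind(raw = events / population, smoothed = round(smoothed, 4))
```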


The window average is not applied to the rate itself, but it is computed separately for the
@@ -942,8 +942,7 @@ set to zero.


The spatial EB smoothed rate is computed as a weighted average of the crude rate and
the prior, in the same manner as for the standard EB rate

the prior, in the same manner as for the standard EB rate (see the discussion in the Chapter on mapping rates, as well as @alk06, for technical details).
For reference, the EB weight in this case is:

$$w_i = \frac{\sigma_i^2}{\sigma_i^2 + \mu_i / P_i}$$
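
**spdep** provides a local (spatially structured) EB smoother along these lines in `EBlocal`, which takes the event counts, the populations at risk, and a neighbor list; the sketch below uses synthetic inputs.

```{r}
# local Empirical Bayes smoothing with spdep::EBlocal(); synthetic inputs
library(spdep)

set.seed(5)
coords <- cbind(runif(15), runif(15))
events <- rpois(15, lambda = 4)
population <- round(runif(15, 500, 5000))

nb <- knn2nb(knearneigh(coords, k = 4))
eb.local <- EBlocal(events, population, nb)
head(eb.local)   # raw = crude rate, est = locally smoothed EB rate
```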