
Tutorial of rgeoda v0.0.4


rgeoda is an R library for spatial data analysis. It is an R wrapper of the libgeoda C++ library, which is based on the GeoDa software. The version used in this tutorial is 0.0.4.


1. Install rgeoda


Like the GeoDa desktop software, rgeoda is available on multiple platforms, including Mac, Linux, and Windows.


Windows

+

In the R console, use the install.packages() function to install rgeoda from its source package at: https://github.com/lixun910/rgeoda/releases/download/0.0.4/rgeoda_0.0.4.zip
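A minimal sketch, assuming the package URL above (on Windows a .zip package is usually installed as a binary, so the type argument may need adjusting):

# install directly from the release file rather than from CRAN
install.packages(
  "https://github.com/lixun910/rgeoda/releases/download/0.0.4/rgeoda_0.0.4.zip",
  repos = NULL)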


Installing rgeoda on Windows from the source package is not recommended. Try it only if you know how to work with R devtools on Windows.


Load rgeoda library in R

+

If everything installed without error, you should be able to load rgeoda:
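library(rgeoda)  # should load without errors if the installation succeeded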


2. Load Spatial Data

+

The data formats that rgeoda can read directly include: ESRI Shapefile, MapInfo File, CSV, GML, GPX, KML, GeoJSON, TopoJSON, OpenFileGDB, GFT (Google Fusion Tables), and CouchDB.

+

Note: in this tutorial, we only tested loading ESRI shapefiles using rgeoda v0.0.4. Please create a ticket in rgeoda’s repository if you experience any issues when loading spatial data.

+

For example, to load the ESRI Shapefile Guerry.shp that comes with the package:
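A minimal sketch; the "extdata" location of the sample shapefile inside the installed package is an assumption:

# locate the sample data shipped with the package and open it
guerry_path <- system.file("extdata", "Guerry.shp", package = "rgeoda")
guerry <- geoda_open(guerry_path)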


The geoda_open function returns a geoda object, which can be used to access the meta-data, fields, and columns of the input dataset.


2.2 Access Table Data

+

The geoda instance has a data.frame attribute, table, which stores the data loaded from the dataset. To get the values of the "Crm_prp" column:
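crm_prp <- guerry$table$Crm_prp  # numeric vector of the Crm_prp column
head(crm_prp)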


3. Spatial Weights

+

Spatial weights are a central component of spatial data analysis. They represent the possible spatial interactions between observations in space. Like the GeoDa desktop software, rgeoda provides a rich variety of methods to create several different types of spatial weights, including contiguity-based, distance-based, and kernel weights.


3.1 Queen Contiguity Weights

+

To create Queen contiguity weights, we can call rgeoda's queen_weights() function


by passing the GeoDa object guerry we just created:
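queen_w <- queen_weights(guerry)  # create Queen contiguity weights from the geoda object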


The function queen_weights() returns an instance of the Weight object. One can access the metadata of the spatial weights through the attributes of the Weight object:


Attributes of Weight object

  • weight_type
  • is_symmetric
  • sparsity
  • density
  • min_nbrs
  • max_nbrs
  • mean_nbrs
  • median_nbrs
  • bool HasIsolates()
  • [] GetNeighbors(idx)
  • double SpatialLag(idx, [data])
  • SaveToFile()
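For example, a sketch of inspecting the weights; the $-style attribute access shown here is an assumption about how the Weight object exposes these fields:

queen_w$weight_type    # type of the weights
queen_w$is_symmetric   # whether the weights matrix is symmetric
queen_w$sparsity       # sparsity of the weights matrix
queen_w$min_nbrs       # minimum number of neighbors
queen_w$max_nbrs       # maximum number of neighbors
queen_w$HasIsolates()  # TRUE if any observation has no neighbors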

We can also access the details of the weights, e.g. list the neighbors of a specified observation, which is very helpful in exploratory spatial data analysis (the focus of the next tutorial):
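A sketch using GetNeighbors() from the attribute list above; whether the index is 0-based or 1-based is an assumption:

nbrs <- queen_w$GetNeighbors(0)  # neighbors of the first observation
nbrs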


We can also compute the spatial lag of a specified observation by passing the values of the selected variable:
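A sketch using SpatialLag() from the attribute list above, with the crm_prp values extracted earlier:

lag0 <- queen_w$SpatialLag(0, crm_prp)  # spatial lag of Crm_prp at the first observation
lag0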


3.2 Rook Contiguity Weights

+

To create Rook contiguity weights, we can call rgeoda's rook_weights() function


by passing the GeoDa object guerry we just created:
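A minimal sketch; the function name rook_weights() is assumed by analogy with queen_weights():

rook_w <- rook_weights(guerry)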


The weights we created are in memory, which makes them straightforward to use for spatial data analysis and convenient when programming your own application. To save the weights to a file, we need to call the Weight object's SaveToFile() function


The layer_name is the layer name of the loaded dataset. For an ESRI shapefile, the layer name is the file name without the suffix (e.g. Guerry).

+

The id_name is a key (column name) whose associated column contains unique values; it makes sure that the weights are connected to the correct observations in the data table.

+

The id_vec is the actual column data of id_name; it can be a vector of integer or string values.

+

For example, in the Guerry dataset, the column "CODE_DE" can be used as a key to save a weights file:
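A sketch; the exact SaveToFile() signature (file path, layer_name, id_name, id_vec) is an assumption based on the parameter descriptions above:

rook_w$SaveToFile("Guerry_r.gal", "Guerry", "CODE_DE", guerry$table$CODE_DE)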


Then, we should find the file “Guerry_r.gal” in the output directory.


3.5 Kernel Weights

+

Kernel weights apply a kernel function to determine the distance decay in the derived continuous weights. The kernel weights are defined as a function K(z) of the ratio between the distance dij from i to j and the bandwidth hi, with z = dij/hi.

+

The kernel functions include:

  • triangular
  • uniform
  • quadratic
  • epanechnikov
  • quartic
  • gaussian

Two functions are provided in rgeoda to create kernel weights.

+

Kernel Weights with fixed bandwidth

+

To create kernel weights with a fixed bandwidth:
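A sketch; the function name kernel_weights(), its argument names, and the bandwidth value are assumptions:

bandwidth <- 30.0  # hypothetical bandwidth, in the distance units of the data
kernel_w <- kernel_weights(guerry, bandwidth, kernel_method = "gaussian")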


Besides the options is_inverse, power, is_arc, and is_mile, which are the same as for the distance-based weights, this kernel weights function has another option:

use_kernel_diagonals
    (optional) FALSE (default) or TRUE, apply the kernel to the diagonal of the weights matrix

Kernel Weights with adaptive bandwidth

+

To create kernel weights with an adaptive bandwidth, or using the maximum k-nearest-neighbor distance as the bandwidth:
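A sketch; the function name kernel_knn_weights() and its argument order (k, then kernel) are assumptions:

adptkernel_w <- kernel_knn_weights(guerry, 6, kernel_method = "gaussian")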


This kernel weights function has two more options:

+
adaptive_bandwidth
    (optional) TRUE (default) or FALSE: TRUE uses an adaptive bandwidth calculated from the distance of the k nearest neighbors; FALSE uses the maximum distance of all observations to their k nearest neighbors

use_kernel_diagonals
    (optional) FALSE (default) or TRUE, apply the kernel to the diagonal of the weights matrix

4 Spatial Data Analysis


4.1 Local Spatial Autocorrelation

+

rgeoda 0.0.4 provides the following methods for univariate local spatial autocorrelation statistics:

+
  • Local Moran: local_moran()
  • Local Geary: local_geary()
  • Local Getis-Ord statistics: local_g() and local_gstar()
  • Local Join Count: local_joincount()

Methods for bivariate and multivariate local spatial autocorrelation statistics, as well as global spatial autocorrelation statistics, will be included in the next release of rgeoda.

+

In this tutorial, we will only introduce how to call these methods using rgeoda. For more information about the local spatial autocorrelation statistics, please read: http://geodacenter.github.io/workbook/6a_local_auto/lab6a.html.


4.1.1 Local Moran

+

The Local Moran statistic is a method to identify local clusters and local spatial outliers. For example, we can call the function local_moran() with the created Queen weights and the data "crm_prp" as input parameters:
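A minimal sketch of the call, using the crm_prp vector extracted from guerry$table earlier (the argument order is an assumption):

lisa <- local_moran(queen_w, crm_prp)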


The local_moran() function will return a lisa object, whose public functions we can call to access the results of the LISA computation.

+

For example, we can call the function GetLISAValues() to get the values of local Moran:
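GetLISAValues() is named above; the lisa$... method-call style is an assumption:

lms <- lisa$GetLISAValues()
lms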


To get the pseudo-p values of significance of local Moran computation:
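A sketch; the accessor name GetPValues() is an assumption:

pvals <- lisa$GetPValues()
pvals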


To get the cluster indicators of local Moran computation:
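A sketch; the accessor name GetClusterIndicators() is an assumption:

cats <- lisa$GetClusterIndicators()
cats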


The predefined values of the LISA cluster indicators are:

0 Not significant
1 High-High
2 Low-Low
3 High-Low
4 Low-High
5 Neighborless
6 Undefined

which can be accessed via function GetLabels():
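A sketch, using the same method-call style as above:

lbls <- lisa$GetLabels()
lbls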


By default, the local_moran() function will run with some default parameters, e.g.:

permutation number: 999
seed for random number generator: 123456789

These defaults are identical to those of the GeoDa desktop software, so we can replicate the results obtained with GeoDa. It is also easy to change a parameter and re-run the LISA computation by calling the Run() function.

+

For example, to re-run the above local Moran example using 9999 permutations:
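A sketch; Run() is named above, while SetPermutations() is an assumed method name:

lisa$SetPermutations(9999)  # assumed setter for the permutation number
lisa$Run()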


Then, we can use the same lisa object to get the new results after 9999 permutations:
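Using the same assumed accessor as above:

pvals <- lisa$GetPValues()
pvals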


rgeoda uses GeoDa's C++ code, in which multi-threading is used to accelerate the computation of LISA. We can specify how many threads to use for the computation:
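A sketch; SetThreads() is an assumed method name:

lisa$SetThreads(4)  # assumed setter for the number of threads
lisa$Run()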


To get the False Discovery Rate (FDR) value based on the current pseudo-p values:
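A sketch; GetFDR() is an assumed method name, with 0.05 as the target significance level:

fdr <- lisa$GetFDR(0.05)
fdr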


Then, one can set the FDR value as the cutoff p-value to filter the cluster results:
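A base-R sketch that keeps only the observations whose pseudo-p value passes the FDR cutoff:

sig_idx <- which(pvals < fdr)  # indices of observations significant under the FDR cutoff
sig_idx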


4.1.4 Local Getis-Ord Statistics

+

There are two types of local Getis-Ord statistics: one computes a ratio based on the weighted average of the values in the neighboring locations, not including the value at the location; the other includes the value at the location in both the numerator and the denominator.

+

A value larger than the mean suggests a high-high cluster or hot spot; a value smaller than the mean indicates a low-low cluster or cold spot.

+

For example, we can call the function local_g() with the created Queen weights and the data “crm_prp” as input parameters:
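A sketch, following the same calling pattern as local_moran() above:

lisa_g <- local_g(queen_w, crm_prp)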


To get the cluster indicators of the local G computation:
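Using the same assumed accessor as in the local Moran example:

cats_g <- lisa_g$GetClusterIndicators()
cats_g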


To get the pseudo-p values of the local G computation:
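Again using the assumed accessor:

pvals_g <- lisa_g$GetPValues()
pvals_g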


For the second type of local Getis-Ord statistics, we can call the function local_gstar() with the created Queen weights and the data “crm_prp” as input parameters:
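A sketch, following the same calling pattern:

lisa_gstar <- local_gstar(queen_w, crm_prp)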


4.1.5 Local Join Count

+

Local Join Count is a method to identify local clusters for binary data by using a local version of the so-called BB join count statistic. The statistic is only meaningful for those observations with value 1.

+

For example, we can call the function local_joincount() with the Queen weights and the data "TopCrm", which is a set of binary (0,1) values, as input parameters:
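A sketch; "TopCrm" is assumed to be a binary column in guerry$table:

top_crm <- guerry$table$TopCrm
lisa_jc <- local_joincount(queen_w, top_crm)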


To get the cluster indicators of the local Join Count computation:
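Using the same assumed accessor as above:

cats_jc <- lisa_jc$GetClusterIndicators()
cats_jc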


To get the pseudo-p values of the local Join Count computation:
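Again using the assumed accessor:

pvals_jc <- lisa_jc$GetPValues()
pvals_jc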


To get the number of neighbors of the local Join Count computation:
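A sketch; the accessor name GetNumNeighbors() is an assumption:

nn_jc <- lisa_jc$GetNumNeighbors()
nn_jc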


4.2 Spatializing Multivariate Analysis


4.2.1 Principal Components

+

The pca() function aims to reproduce the PCA feature and results of GeoDa. Alternatively, one can use the prcomp() function in R to apply the same PCA computations.

+

For example, the following code applies PCA to 6 variables. The standardize() function is called first to standardize the data, and the pca() function is then applied to the standardized data.

+

Other standardization functions include demean() and mad(), which match those in the GeoDa program.

+
data <- list(guerry$table$Crm_prs, guerry$table$Crm_prp, guerry$table$Litercy, guerry$table$Donatns, guerry$table$Infants, guerry$table$Suicids)
std_data <- standardize(data)
pc <- pca(std_data)
summary(pc)
+#> $`PCA method: `
+#> [1] "svd"
+#> 
+#> $`Standard Deviation:`
+#> [1] 1.4630340 1.0958195 1.0497845 0.8166800 0.7407258 0.5839707
+#> 
+#> $`Proportion of variance:`
+#> [1] 0.35674483 0.20013675 0.18367462 0.11116106 0.09144579 0.05683697
+#> 
+#> $`Cumulative proportion:`
+#> [1] 0.3567448 0.5568815 0.7405562 0.8517172 0.9431630 1.0000000
+#> 
+#> $`Kaiser criterion:`
+#> [1] 3
+#> 
+#> $`95% threshold criterion:`
+#> [1] 5
+#> 
+#> $`Eigen Values:`
+#> [1] 2.1404686 1.2008203 1.1020476 0.6669663 0.5486748 0.3410218
+#> 
+#> $`Variable Loadings:`
+#> $`Variable Loadings:`[[1]]
+#> [1] -0.06586844 -0.51232553  0.51175290 -0.10619514 -0.45133743 -0.50627047
+#> 
+#> $`Variable Loadings:`[[2]]
+#> [1] -0.5123255  0.5117529 -0.1061951 -0.4513374 -0.5062705 -0.5905984
+#> 
+#> $`Variable Loadings:`[[3]]
+#> [1]  0.5117529 -0.1061951 -0.4513374 -0.5062705 -0.5905984  0.0883673
+#> 
+#> $`Variable Loadings:`[[4]]
+#> [1] -0.1061951 -0.4513374 -0.5062705 -0.5905984  0.0883673  0.1293611
+#> 
+#> $`Variable Loadings:`[[5]]
+#> [1] -0.4513374 -0.5062705 -0.5905984  0.0883673  0.1293611 -0.6989980
+#> 
+#> $`Variable Loadings:`[[6]]
+#> [1] -0.5062705 -0.5905984  0.0883673  0.1293611 -0.6989980 -0.1033123
+#> 
+#> 
+#> $`Squared Correlations:`
+#> $`Squared Correlations:`[[1]]
+#> [1] 9.286812e-03 4.188536e-01 4.994276e-01 1.302188e-02 5.682668e-05
+#> [6] 5.935308e-02
+#> 
+#> $`Squared Correlations:`[[2]]
+#> [1] 0.561825454 0.009376963 0.250265568 0.006484959 0.010457415 0.161589921
+#> 
+#> $`Squared Correlations:`[[3]]
+#> [1] 5.605701e-01 2.009490e-02 4.815156e-02 4.233069e-05 3.700718e-01
+#> [6] 1.069382e-03
+#> 
+#> $`Squared Correlations:`[[4]]
+#> [1] 0.02413891 0.58671862 0.18833773 0.14920925 0.04125461 0.01034097
+#> 
+#> $`Squared Correlations:`[[5]]
+#> [1] 0.436025709 0.012816893 0.115551390 0.355727494 0.078226551 0.001651996
+#> 
+#> $`Squared Correlations:`[[6]]
+#> [1] 0.5486234426 0.1529589146 0.0003130297 0.1424801946 0.0486078747
+#> [6] 0.1070163175
+

With the returned object pc, one can call the get_kcomponents() function to get the first K components:

+

For example, to get the first 3 components:

+
get_kcomponents(pc, 3)
+#> [[1]]
+#>  [1] -2.15079784  1.24630284 -2.07714891  0.60372376  0.96266198
+#>  [6] -2.02217531  0.83754289 -2.31564236  1.71969211 -0.99525052
+#> [11] -1.30435145  1.50636125  1.09195995 -1.42777729 -1.00126719
+#> [16]  0.34988350 -1.24823558 -1.98039460  0.87822956 -2.18980956
+#> [21] -3.22537541 -1.10068703  1.73498404  0.58726311  1.32815874
+#> [26]  1.69590640 -1.29646063 -0.05494371 -0.14791185 -0.58604223
+#> [31]  0.80251622 -0.19949876 -1.47620130 -0.68854648  0.26585680
+#> [36]  0.04394481  0.94991171  0.07063645  0.61062211 -2.19973874
+#> [41] -4.76963949 -0.32964194  1.48016524 -0.74878561 -0.35999349
+#> [46]  0.26808232 -0.55363011  0.05715972  2.13901687  1.03593326
+#> [51] -0.63324934  1.69777215  1.27213883 -1.56710339  1.53337324
+#> [56] -0.67340422  1.28850329  1.13373721 -0.07612546  1.69624293
+#> [61] -2.13737488  0.08742628 -1.27229607 -0.01182331  1.59204471
+#> [66]  2.02312613  1.93588793  0.78383493 -0.53895044  0.14461297
+#> [71]  3.49998665  1.97162461  1.40403152  1.96229410 -0.12139815
+#> [76]  0.89586586 -1.20020568 -0.88919276 -0.01788428  0.84583360
+#> [81] -3.16539955 -0.47796935 -0.64045554  0.80463821  1.03328919
+#> 
+#> [[2]]
+#>  [1] -0.4527585506 -1.0172367096 -0.4506332278  0.8762004375  0.1175051630
+#>  [6]  1.2471966743 -1.0354368687  2.4068284035  0.2288773656  1.3052561283
+#> [11]  2.2329857349  0.8711666465 -2.7800273895  1.1981728077 -1.3957386017
+#> [16] -1.1938062906 -0.9319203496 -0.7427287698 -0.3932662606 -1.1850622892
+#> [21] -1.2337827682  0.0391844995  1.4692529440  0.9954308867 -0.5276314020
+#> [26]  0.0052912235 -3.7251703739  0.7714666128  0.9288847446  1.0575201511
+#> [31] -0.2037402093  1.3441133499 -0.6145500541 -1.9812527895 -0.3403103948
+#> [36]  0.4891647100  0.2438151687 -0.6851536632 -0.2789261043  0.1960459650
+#> [41]  2.3337526321 -0.2917928696  0.2283707857  1.4030313492  0.0939905122
+#> [46]  1.1593261957 -0.5887037516 -0.3914591968  0.6122305393 -0.0005217344
+#> [51] -0.2126427293 -0.1144591570  0.0500445664 -1.7837839127  0.2014352381
+#> [56] -1.1722066402 -0.6031852961 -0.7491376400 -1.0345443487 -0.2920551002
+#> [61]  0.7276004553  1.2423870564  2.2319626808  0.3553900719 -0.5595567822
+#> [66]  1.1510205269  0.6762453318 -0.6442477703 -0.3994759917 -0.6871493459
+#> [71]  0.7143861651 -0.2325166762 -0.2274321616  0.6617624760 -1.5715823174
+#> [76] -1.0274018049  1.0900812149  0.7256541848  0.8817523718  1.0478113890
+#> [81] -1.3642948866 -0.4680343866 -1.0020393133  0.6873189211  0.2874433696
+#> 
+#> [[3]]
+#>  [1] -1.66799724 -0.31462803 -0.07048267  0.47869289  0.00525923
+#>  [6]  1.27775264 -2.01790786  1.08377373  0.12464118 -0.26167497
+#> [11]  1.12701774  0.30989689  2.02507019 -0.70055324 -0.61895663
+#> [16]  1.26067650  0.13994022  0.67642587 -2.05223727  0.52698410
+#> [21] -3.13812375 -0.34984887  0.15795122  0.16967219  1.14865243
+#> [26] -0.04162211  1.35208511  0.65789407 -0.19445431 -0.75118661
+#> [31] -0.75471002 -0.23062927  1.05399203 -0.40653789  0.34343180
+#> [36] -0.31112960 -1.27419460  0.93029964  0.02720148 -1.22925198
+#> [41] -0.98689252 -0.03388155  0.13261238  1.21204257 -0.58923507
+#> [46]  1.41246438 -1.00878584 -1.31356204  0.07231057 -1.50667369
+#> [51] -1.14036369 -1.27124703 -1.30463350  1.13268960  1.08635914
+#> [56]  0.05002058 -0.83821458 -0.86786515 -0.61552852 -0.18724760
+#> [61] -0.11673179 -0.44532412 -0.12393313  1.63498080  1.39351642
+#> [66]  1.03854430 -0.37857199 -0.27956575 -1.32880223 -1.82409549
+#> [71]  0.39723620  0.51845258 -0.25960159  0.80245513  1.39410090
+#> [76] -1.57615566  1.20807958  0.50640631  0.05387082  0.43062329
+#> [81]  2.29930806  1.91082525  1.52598381 -0.72471607  0.01756289
+

4.2.2 Multi-Dimensional Scaling

+

The mds() function applies multi-dimensional scaling to the input data and outputs K-dimensional data, where K is an input parameter of the mds() function.

+

For example, to apply mds() to the 6 selected variables, scaling down to a 2-dimensional space:

+
data <- list(guerry$table$Crm_prs, guerry$table$Crm_prp, guerry$table$Litercy, guerry$table$Donatns, guerry$table$Infants, guerry$table$Suicids)

std_data <- standardize(data)
pc <- pca(std_data)
mds_v <- mds(std_data, 2)
mds_v
+#> [[1]]
+#>  [1] -2.15079914  1.24630268 -2.07714878  0.60372394  0.96266208
+#>  [6] -2.02217510  0.83754212 -2.31564183  1.71969219 -0.99525060
+#> [11] -1.30435076  1.50636158  1.09196128 -1.42777743 -1.00126761
+#> [16]  0.34988392 -1.24823585 -1.98039435  0.87822863 -2.18980981
+#> [21] -3.22537671 -1.10068748  1.73498453  0.58726327  1.32815937
+#> [26]  1.69590647 -1.29646036 -0.05494364 -0.14791185 -0.58604238
+#> [31]  0.80251602 -0.19949879 -1.47620137 -0.68854692  0.26585690
+#> [36]  0.04394477  0.94991127  0.07063695  0.61062210 -2.19973976
+#> [41] -4.76964008 -0.32964200  1.48016556 -0.74878501 -0.35999381
+#> [46]  0.26808279 -0.55363073  0.05715920  2.13901710  1.03593266
+#> [51] -0.63325014  1.69777170  1.27213835 -1.56710325  1.53337397
+#> [56] -0.67340435  1.28850297  1.13373674 -0.07612582  1.69624307
+#> [61] -2.13737510  0.08742638 -1.27229550 -0.01182236  1.59204555
+#> [66]  2.02312714  1.93588814  0.78383508 -0.53895129  0.14461212
+#> [71]  3.49998720  1.97162518  1.40403139  1.96229461 -0.12139763
+#> [76]  0.89586507 -1.20020531 -0.88919260 -0.01788424  0.84583380
+#> [81] -3.16539934 -0.47796907 -0.64045496  0.80463804  1.03328917
+#> 
+#> [[2]]
+#>  [1]  0.4527536271  1.0172360777  0.4506333060 -0.8761993256 -0.1175050604
+#>  [6] -1.2471938904  1.0354317522 -2.4068255542 -0.2288768376 -1.3052569054
+#> [11] -2.2329825675 -0.8711658576  2.7800335222 -1.1981746825  1.3957368458
+#> [16]  1.1938096673  0.9319203178  0.7427303650  0.3932609148  1.1850635503
+#> [21]  1.2337742453 -0.0391857512 -1.4692520664 -0.9954304939  0.5276346328
+#> [26] -0.0052911254  3.7251741858 -0.7714651225 -0.9288852134 -1.0575222027
+#> [31]  0.2037382670 -1.3441142042  0.6145525650  1.9812515041  0.3403112204
+#> [36] -0.4891656117 -0.2438184179  0.6851562571  0.2789261438 -0.1960495395
+#> [41] -2.3337555238  0.2917926603 -0.2283703585 -1.4030283522 -0.0939921633
+#> [46] -1.1593228571  0.5887008391  0.3914559889 -0.6122301079  0.0005178202
+#> [51]  0.2126394543  0.1144559991 -0.0500478842  1.7837865496 -0.2014320665
+#> [56]  1.1722066773  0.6031832609  0.7491354510  1.0345427824  0.2920548746
+#> [61] -0.7276009862 -1.2423880289 -2.2319623928 -0.3553856899  0.5595608248
+#> [66] -1.1510175196 -0.6762460298  0.6442473714  0.3994723166  0.6871446314
+#> [71] -0.7143845504  0.2325183231  0.2274315570 -0.6617601262  1.5715860386
+#> [76]  1.0273976841 -1.0900782001 -0.7256530463 -0.8817525785 -1.0478103198
+#> [81]  1.3643006281  0.4680391875  1.0020432867 -0.6873206957 -0.2874432665
+

4.3 Spatial Clustering

+

Spatial clustering aims to group a large number of geographic areas or points into a smaller number of regions based on similarities in one or more variables. Spatially constrained clustering is needed when clusters are required to be spatially contiguous.

+

In GeoDa, there are three different approaches that explicitly incorporate the contiguity constraint in the optimization process: SKATER, REDCAP, and Max-p. For more details, please check: http://geodacenter.github.io/workbook/8_spatial_clusters/lab8.html. All of these methods are included in rgeoda 0.0.4.

+

For example, to apply spatial clustering on the Guerry dataset, we use the queen weights to define the spatial contiguity and select 6 variables for the similarity measure: "Crm_prs", "Crm_prp", "Litercy", "Donatns", "Infants", "Suicids".

+

The following code is used to get a 2D data vector for the selected variables:
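data <- list(guerry$table$Crm_prs, guerry$table$Crm_prp, guerry$table$Litercy, guerry$table$Donatns, guerry$table$Infants, guerry$table$Suicids)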


4.3.1 SKATER

+

The Spatial C(K)luster Analysis by Tree Edge Removal (SKATER) algorithm introduced by Assuncao et al. (2006) is based on the optimal pruning of a minimum spanning tree that reflects the contiguity structure among the observations. It provides an optimized algorithm to prune the tree into several clusters so that the values of the selected variables within each cluster are as similar as possible.

+

rgeoda's SKATER function is skater().


For example, to create 4 spatially contiguous clusters using the Guerry dataset, the queen weights, and the values of the 6 selected variables:
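A sketch; the argument order (number of clusters, weights, data) is an assumption:

guerry_clusters <- skater(4, queen_w, data)
guerry_clusters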


This skater() function returns a 2D list representing the 4 clusters. Each cluster is composed of several contiguous areas, e.g. 15, 74, 16, 55, 60, 39, 68, 33, 17, 82, 81, 0, 2, 40, 20, 80.

+

rgeoda also provides utility functions to compute some descriptive statistics of the clustering results, e.g. to compute the ratio of between to total sum of squares:
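A base-R sketch of that ratio, computed directly from the cluster assignment (rgeoda's own utility function is not shown here); cluster member indices are assumed to be 0-based, as in the example above:

mat <- do.call(cbind, data)                  # 85 x 6 matrix of the selected variables
cluster_id <- integer(nrow(mat))
for (i in seq_along(guerry_clusters)) {
  cluster_id[guerry_clusters[[i]] + 1] <- i  # shift 0-based indices to R's 1-based rows
}
total_ss  <- sum(scale(mat, scale = FALSE)^2)  # total sum of squares
within_ss <- sum(sapply(split(seq_len(nrow(mat)), cluster_id),
                        function(idx) sum(scale(mat[idx, , drop = FALSE], scale = FALSE)^2)))
(total_ss - within_ss) / total_ss            # ratio of between to total sum of squares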


4.3.2 REDCAP

+

REDCAP (Regionalization with dynamically constrained agglomerative clustering and partitioning) was developed by D. Guo (2008). Like SKATER, REDCAP starts by building a spanning tree in 3 different ways (single-linkage, average-linkage, and complete-linkage); the single-linkage approach builds a minimum spanning tree. REDCAP then provides 2 different ways (first-order and full-order constraining) to prune the tree to find clusters. The first-order approach with a minimum spanning tree is exactly the same as SKATER. In GeoDa and rgeoda, the following methods are provided:

+
  • First-order and Single-linkage
  • Full-order and Complete-linkage
  • Full-order and Average-linkage
  • Full-order and Single-linkage

For example, to find 4 clusters using the same dataset and weights as above, using REDCAP with the Full-order and Complete-linkage method:
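A sketch; the function name redcap() and the method string are assumptions:

guerry_clusters <- redcap(4, queen_w, data, "fullorder-completelinkage")
guerry_clusters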


4.3.3 Max-p

+

The so-called max-p regions model (outlined in Duque, Anselin, and Rey 2012) uses a different approach and considers the regionalization problem as an application of integer programming. In addition, the number of regions is determined endogenously.

+

The algorithm itself consists of a search process that starts with an initial feasible solution and iteratively improves upon it while maintaining contiguity among the elements of each cluster. Like GeoDa, rgeoda provides three different heuristic algorithms to find an optimal solution for max-p:

+
  • greedy
  • Tabu Search
  • Simulated Annealing

Unlike SKATER and REDCAP, where one can specify the number of clusters as an input parameter, max-p does not allow the number of clusters to be specified explicitly. Instead, a constrained variable and a minimum bound value that each cluster must reach are used to find an optimized number of clusters.

+

For example, to use the greedy algorithm in the maxp function with the same dataset and weights as above to find optimal clusters:

+

First, we need to specify, for example, that every cluster must have a population >= 3236.67 thousand people:
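A sketch; "Pop1831" is assumed to be the population column in guerry$table:

bound_vals <- guerry$table$Pop1831
min_bound  <- 3236.67  # minimum total population (in thousands) per cluster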


Then, we can call the maxp function with the "greedy" algorithm, the bound values, and the minimum bound value:
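A sketch; the maxp() argument order and the "greedy" method string are assumptions:

maxp_clusters <- maxp(queen_w, data, bound_vals, min_bound, "greedy")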


We can also use the tabu search algorithm in the maxp function, with a tabu length parameter:
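A sketch; the "tabu" method string and the tabu_length argument name are assumptions:

maxp_tabu_clusters <- maxp(queen_w, data, bound_vals, min_bound, "tabu", tabu_length = 95)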


To apply the simulated annealing algorithm in the maxp function, with a cooling rate parameter:
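A sketch; the "sa" method string and the cooling_rate argument name are assumptions:

maxp_sa_clusters <- maxp(queen_w, data, bound_vals, min_bound, "sa", cooling_rate = 0.85)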


We can also increase the number of iterations of the local search process by specifying the parameter initial (default value is 99):
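A sketch; only the parameter name initial is given above, and the rest follows the greedy example:

maxp_clusters <- maxp(queen_w, data, bound_vals, min_bound, "greedy", initial = 999)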
