-
Notifications
You must be signed in to change notification settings - Fork 13
/
04-mapping.Rmd
1212 lines (929 loc) · 48.3 KB
/
04-mapping.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Basic Mapping
## Introduction {-}
This notebook covers the functionality of the [Basic Mapping](https://geodacenter.github.io/workbook/3a_mapping/lab3a.html) section of the GeoDa workbook. We refer to that document for details on the methodology, references, etc. The goal of these notes is to approximate as closely as possible the operations carried out using GeoDa by means of a range of R packages.
The notes are written with R beginners in mind, more seasoned R users can probably skip most of the comments
on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).
For this notebook, we will continue to use the socioeconomic data for 55 New York City sub-boroughs from the GeoDa website.
### Objectives {-}
After completing the notebook, you should know how to carry out the following tasks
(some of these were also covered in the spatial data handling notes):
- Reading and loading a shapefile
- Creating choropleth maps for different classifications
- Customizing choropleth maps
- Selecting appropriate color schemes
- Calculating and plotting polygon centroids
- Composing conditional maps
- Creating a cartogram
#### R Packages used {-}
- **sf**: to manipulate simple features
- **tmap**: to create and customize choropleth maps
- **RColorBrewer**: to select color schemes
- **cartogram**: to construct a cartogram
#### R Commands used {-}
Below follows a list of the commands used in this notebook. For further details
and a comprehensive list of options, please consult the
[R documentation](https://www.rdocumentation.org).
- **Base R**: `setwd`, `install.packages`, `library`, `summary`, `quantile`, `function`, `unname`,
`vector`, `floor`, `ceiling`, `cut`, `as.factor`, `as.character`, `as.numeric`, `which`, `length`, `rev`,
`unique`
- **sf**: `st_read`, `st_crs`, `plot`, `st_centroid`, `st_write`, `st_set_geometry(NULL)`
- **tmap**: `tm_shape`, `tm_polygons`, `tm_fill`, `tm_borders`, `tm_layout`,
`tmap_mode`, `tm_basemap`, `tmap_save`, `tm_dots`, `tm_facets`
- **RColorBrewer**: `brewer.pal`, `display.brewer.pal`
- **cartogram**: `cartogram_dorling`, `cartogram_cont`, `cartogram_ncont`
## Preliminaries {-}
Before starting, make sure to have the latest version of R and of packages that are compiled for the matching version of R (this document was created using R 3.5.1 of 2018-07-02). Also, make sure to set a working directory.^[Use `setwd(directorypath)` to specify the working directory.] We will use a relative path to the working directory to read the data set.
### Load packages {-}
First, we load all the required packages using the `library` command. If you don't have some of these in your system, make sure to install them first as well as
their dependencies.^[Use
`install.packages(packagename)`.] You will get an error message if something is missing. If needed, just install the missing piece and everything will work after that.
```{r}
library(sf)
library(tmap)
library(RColorBrewer)
library(cartogram)
library(geodaData)
```
### Obtaining the data {-}
The data set used to implement the operations in this workbook is the same as the one we
used for exploratory data analysis. The variables are contained in [NYC Data](https://geodacenter.github.io/data-and-lab/nyc/) on the GeoDa support
web site. After the file is downloaded, it must be unzipped (e.g., double click
on the file). The **nyc** folder should be moved to the current working directory
for the path names we use below to work correctly.
We use the `st_read` command from the **sf** package to read in the shape file information.
This provides a brief summary of the geometry of the layer, such as the path (your path will differ),
the driver (ESRI Shapefile), the geometry type (MULTIPOLYGON), bounding box and projection information.
The latter is contained in the **proj4string** information.
```{r}
nyc.bound <- nyc_sf
```
```{r echo=FALSE}
class(nyc.bound) <- c("sf", "data.frame")
```
The projection in question is the ESRI projection 102718, NAD 1983 stateplane New York Long Island
fips 3104 feet. It does not have an EPSG code (see the spatial data handling notes for further details
on projections), but it has a valid **proj4string**. So, as long as we don't change anything, we should
be OK. We can double check the projection information with `st_crs`:
```{r}
st_crs(nyc.bound)
```
As we saw in the spatial data handling notes, we can create a quick map using the **sf** `plot` command. This
will result in a choropleth map for the first few variables in the data frame.
```{r}
plot(nyc.bound)
```
## Basic Choropleth Mapping {-}
We now turn to the choropleth mapping functionality included in the **tmap** package.
We have already covered some
functionality of **tmap** in the spatial data handling notes. Here, we will explore
some further customizations.
The **tmap** package is extremely powerful,
and there are many features that we do not cover. Detailed information on **tmap** functionality can
be found at [tmap documentation](https://cran.r-project.org/web/packages/tmap/tmap.pdf). In addition, several
extensive examples are listed in
the review in the *Journal of Statistical Software* by @tennekes_tmap_2018. Another useful resource
is the [chapter on mapping](https://geocompr.robinlovelace.net/adv-map.html) in Lovelace, Nowosad,
and Muenchow's *Geocomputation with R* book.
### Default settings {-}
We have already seen (in the spatial data handling notes) that **tmap** uses the same layered
logic as **ggplot**. The initial command is `tm_shape`, which specifies the geography to which the
mapping is applied. This is followed by a number of `tm_*` options that select the type of map
and several optional customizations.
We follow the example in the GeoDa Workbook and start with a basic choropleth map of the **rent2008**
variable. In **tmap** there are two ways to create a choropleth map. The simplest one, which we
have already seen, is to use the `tm_polygons` command with the variable name as the argument
(under the hood, this is the value for the **col** parameter). In our example, the map
is created with two commands, as shown below. The color coding corresponds to `style = "pretty"`,
which is the default when the classification breaks are not set explicitly, and the default
values are used (see the discussion of map classifications below).
```{r}
tm_shape(nyc.bound) +
tm_polygons("rent2008")
```
The `tm_polygons` command is a wrapper around two other functions, `tm_fill` and `tm_borders`.
`tm_fill` controls the contents of the polygons (color, classification, etc.), while
`tm_borders` does the same for the polygon outlines.
For example, using the same shape (but no variable), whe obtain the outlines of the
sub-boroughs from the `tm_borders` command.
```{r}
tm_shape(nyc.bound) +
tm_borders()
```
Similarly, we obtain a choropleth map without the polygon outlines when we just
use the `tm_fill` command.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008")
```
When we combine the two commands, we obtain the same map as with `tm_polygons` (this
illustrates how in R one can often obtain the same result in a number of different ways).
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008") +
tm_borders()
```
An extensive set of options is available to customize the appearance of the map.
A full list is given in the [documentation page](https://www.rdocumentation.org/packages/tmap/versions/2.1-1/topics/tm_fill) for `tm_fill`.
In what follows, we briefly consider the most common ones.
### Color palette {-}
The range of colors used to depict the spatial distribution of a variable is
determined by the **palette**. The palette is an argument to the `tm_fill` function.
Several built-in palettes are contained in **tmap**. For example, using
`palette = "Reds"` would yield the following map for our example.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",palette="Reds") +
tm_borders()
```
Under the hood, **"Reds"** refers to one of the color schemes supported by the
**RColorBrewer** package (see below).
#### Custom color palettes
In addition to the built-in palettes, customized color ranges can be
created by specifying a vector with the desired colors as anchors. This will create
a spectrum of colors in the map that range between the colors specified in the vector.
For instance, if we used **c("red", "blue")**, the color spectrum would move from red to
purple, then to blue, with in between shades. In our example:
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",palette=c("red","blue")) +
tm_borders()
```
Not exactly a pretty picture. In order to mimic the *diverging* scale used in many
of GeoDa's extreme value maps, we insert "white" in between red and blue (but see the next
section for a better approach).
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",palette=c("red","white","blue")) +
tm_borders()
```
Better, but still not quite there.
#### ColorBrewer {-}
A preferred approach to select a color palette is to chose one of the schemes contained
in the **RColorBrewer** package. These are based on the research of cartographer Cynthia Brewer
([see the colorbrewer2 web site for details](http://colorbrewer2.org)). ColorBrewer makes
a distinction between *sequential* scales (for a scale that goes from low to high),
*diverging* scales (to highlight how values differ from a central tendency), and
*qualitative* scales (for categorical variables). For each scale, a series of single hue and
multi-hue scales are suggested. In the **RColorBrewer** package, these are referred to by a
name (e.g., the "Reds" palette we used above is an example). The full list is contained in the
[RColorBrewer documentation](https://www.rdocumentation.org/packages/RColorBrewer/versions/1.1-2/topics/RColorBrewer).
There are two very useful commands in this package. One sets a color palette by specifying its
name and the number of desired categories. The result is a character vector with the hex codes of the
corresponding colors.
For example, we select a sequential color scheme going from blue to green, as **BuGn**, by
means of the command `brewer.pal`, with the number of categories (6) and the scheme as
arguments. The resulting vector contains the HEX codes for the colors.
```{r}
pal <- brewer.pal(6,"BuGn")
pal
```
Using this palette in our map yields the following result.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",palette="BuGn") +
tm_borders()
```
The command `display.brewer.pal` allows us to explore different color schemes before
applying them to a map. For example:
```{r}
display.brewer.pal(6,"BuGn")
```
### Legend {-}
There are many options to change the formatting of the legend entries through the **legend.format**
argument. We refer to the `tm_fill` documentation for specific details.
Often, the automatic title for the legend is not that attractive, since it is simply
the variable name. This can be customized by setting the **title** argument to `tm_fill`.
For example, keeping all the other settings to the default, we change the legend to
**Rent in 2008**.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders()
```
Another important
aspect of the legend is its positioning. This is handled through the `tm_layout` function.
This function has a vast number of options, [as detailed in the documentation](https://www.rdocumentation.org/packages/tmap/versions/2.1-1/topics/tm_layout).
There are also specialized subsets of layout functions, focused on specific aspects
of the map, such as `tm_legend`, `tm_style` and `tm_format`. We illustrate the positioning of
the legend.
The default is to position the legend inside the map. Often, this default solution is appropriate,
but sometimes further control is needed. The **legend.position** argument to the `tm_layout`
function takes a vector of two string variables that determine both the horizontal
position ("left", "right", or "center") and the vertical position ("top", "bottom",
or "center").
For example, if we would want to move the legend to the lower-right position (clearly
inferior to the default solution), we would use the following set of commands.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(legend.position = c("right", "bottom"))
```
There is also the option to position the legend outside the frame of the map. This
is accomplished by setting **legend.outside** to TRUE (the default is FALSE), and
optionally also specify its position by means of **legend.outside.position**.
The latter can take the values "top", "bottom", "right", and "left".
For example, to position the legend outside and on the right, would be accomplished
by the following commands.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
```
We can also customize the size of the legend, its alignment, font, etc. We refer
to the documentation for specifics.
While there is no obvious way to show the number of observations in each
category (as is the case in GeoDa), **tmap** has an option to add a histogram
to the legend. This is accomplished by setting **legend.hist=TRUE** in the
`tm_fill` command. Further customization is possible, but is not covered here.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",legend.hist=TRUE) +
tm_borders() +
tm_layout(legend.outside = TRUE, legend.outside.position = "right")
```
### Title {-}
Another functionality of the `tm_layout` function is to set a title for
the map, and specify its position, size, etc. For example, we can
set the **title**, the **title.size** and the **title.position** as in
the example below (for details, see
[the `tm_layout` documentation](https://www.rdocumentation.org/packages/tmap/versions/2.1-1/topics/tm_layout)).
We made the font size a bit smaller (0.8) in order not to overwhelm the map, and positioned it in the
bottom right-hand corner.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(title = "Rent 2008 NYC Sub-Boroughs", title.size = 0.8, title.position = c("right","bottom"))
```
To have a title appear on top (or on the bottom) of the map, rather than inside
(the default), we need to set the **main.title**
argument of the `tm_layout` function, with the associated **main.title.position**, as
illustrated below (with **title.size** set to 1.5 to have a larger font).
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders() +
tm_layout(main.title = "Rent 2008 NYC Sub-Boroughs", title.size = 1.5,main.title.position="center")
```
### Border options {-}
So far, we have not specified any arguments to the `tm_borders` function. Common options
are the color for the border lines (**col**), their thickness (**lwd**) and the type
of line (**lty**). The line type is a base R functionality and can be set by specifying
a string (such as "solid", "dashed", "dotted", etc.), or the internal code (e.g., 1 for
solid, 2 for dashed, 3 for dotted). To illustrate this, we set the borders to blue, with
a line width of 2.0 and use a dotted line.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008") +
tm_borders(col="blue",lwd=2,lty=3)
```
### Interactive base map {-}
The default **mode** in **tmap** is **plot**, i.e., the standard plotting environment in
R to draw a map on the screen or to a device (e.g., a pdf file). There is also an
interactive mode, which builds upon leaflet to add a basemap and interact with the
map through zooming and identification of individual observations. The interactive
mode is referred to as **view**. We switch between modes by means of the `tmap_mode`
command.
```{r}
tmap_mode("view")
```
There are a number of different ways in which a basemap can be added. The current preferred
approach is through the `tm_basemap` command. This takes two important arguments. One is
the name of the server. A number of basemaps are supported (and no doubt, that number will
increase over time), but we will illustrate the **OpenStreetMap** option. A second argument
is the degree of transparency, or **alpha** level. This can also be set for the map
itself through `tm_fill`. In practice, one typically needs to experiment a bit to find the
right balance between the information in the choropleth map and the background.
In the example below, we set the **alpha** level for the main map to 0.7, and
for the base layer to 0.5. When we move the pointer over the polygons, their
ID value is shown. A click on a location also gives the value for the variable
that is being mapped. Zooming and panning are supported as well. In addition, it
is possible to stack several layers, but we won't pursue that here.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",alpha=0.7) +
tm_borders() +
tm_basemap(server="OpenStreetMap",alpha=0.5)
```
Before we proceed, we turn the mode back to **plot**.
```{r}
tmap_mode("plot")
```
### Saving a map as a file {-}
So far, we have only plotted the map to the screen or an interactive widget. In order
to save the map as an output, we first assign the result of a series of **tmap** commands
to a variable (an object). We can then save this object by means of the `tmap_save`
function.^[The older function `save_tmap` has been deprecated.]
This function takes the map object as the argument **tm** and the output file name as the
argument **filename**. The default output is a **png** file. Other formats are obtained
by including the proper file extension.
There are many other options in this command, such as specifying the **height**, **width**,
and **dpi**. Details can be found in the
[list of options](https://www.rdocumentation.org/packages/tmap/versions/2.1-1/topics/tmap_save).
For example, we assign our rudimentary default map to the variable **mymap** and then save that
to the file **mymap.png**.
```{r eval=FALSE}
mymap <- tm_shape(nyc.bound) +
tm_fill("kids2000") +
tm_borders()
tmap_save(tm = mymap, filename = "mymap.png")
```
```
## Map saved to mymap.png
```
```
## Resolution: 2100 by 1500 pixels
```
```
## Size: 7 by 5 inches (300 dpi)
```
### Shape centers {-}
GeoDa has functionality invoked from a map window to map or save shape centers, i.e., mean centers
and centroids. The `st_centroid` function is part of **sf** (there is no obvious counterpart
to the mean center functionality). It creates a point simple features layer and contains all the
variables of the original layer.
```{r}
nycentroid <- st_centroid(nyc.bound)
summary(nycentroid)
```
We can now exploit the layered logic of `tm_map` to first draw the outlines of the sub-boroughs
using `tm_borders`, followed by `tm_dots` for the point layer. The `tm_dots` command is a
specialized version of `tm_symbols`, which has a
[very large set of options](https://www.rdocumentation.org/packages/tmap/versions/2.1-1/topics/tm_symbols).
We use two simple options: **size=0.2** to make the points a little larger than the default,
and **col** to set the color to **red**.
```{r}
tm_shape(nyc.bound) +
tm_borders() +
tm_shape(nycentroid) +
tm_dots(size=0.2,col="red")
```
A close examination of the map reveals that the centroid locations can be problematic for
non-convex polygons.
We can save the centroid point layer as a shape file by means of the `st_write` function we saw earlier
(in the spatial data handling notes).
```{r eval=FALSE}
st_write(nycentroid, "nyc_centroids", driver = "ESRI Shapefile")
```
```
## Writing layer `nyc_centroids' to data source `nyc_centroids' using driver `ESRI Shapefile'
## Writing 55 features with 34 fields and geometry type Point.
```
## Common Map Classifications {-}
The statistical aspects of a choropleth map are expressed through the
map classification that is used to convert observations for a continuous variable into
a small number of bins, similar to what is the case for a histogram. Different
classifications reveal different aspects of the spatial distribution of the variable.
In **tmap**, the classification is selected by means of the **style** option in
`tm_fill`. So far, we have used the default, which is set to **pretty**. The latter is
a [base R function](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/pretty) to
compute a sequence of roughly equally spaced values. Note that the number of intervals that
results is not necessariy equal to **n** (the preferred number of classes), which can seem
confusing at first.^[For example, in our first map, the default value for n is 5, but the
*pretty* map that is constructed has 6 categories. Other classifications provide more
control over the number of intervals.]
There are many
classification types available in **tmap**, which each correspond to either a base R function, or to a
custom expression contained in the
[classIntervals functionality](https://www.rdocumentation.org/packages/classInt/versions/0.1-7/topics/classIntervals).
In this section, we review the quantile map, natural breaks map, and equal intervals map, and
also illustrate how to set custom breaks with their own labels. For a detailed discussion of
the methodological issues associated with these classifications, we refer to the GeoDa Workbook.
We continue to use the same example for **rent2008**.
### Quantile map {-}
A quantile map is obtained by setting **style="quantile"** in the `tm_fill` command.
The number of categories is taken from the **n** argument, which has a default value of 5.
So, for example, using this default will yield a quintile map with five categories
(we specify the type of map in the title through `tm_layout`).
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",style="quantile") +
tm_borders() +
tm_layout(title = "Quintile Map", title.position = c("right","bottom"))
```
We follow the example in the GeoDa Workbook and illustrate a quartile map.
This is created by specifying **n=4**, for four categories with **style="quantile"**. The
rest of the commands are the same as before.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",n=4,style="quantile") +
tm_borders() +
tm_layout(title = "Quartile Map", title.position = c("right","bottom"))
```
### Natural breaks map {-}
A natural breaks map is obtained by specifying the **style = "jenks"** in `tm_fill`. All
the other options are as before. Again, we illustrate this for four categories,
with **n=4**.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",n=4,style="jenks") +
tm_borders() +
tm_layout(title = "Natural Breaks Map", title.position = c("right","bottom"))
```
### Equal intervals map {-}
An equal intervals map for four categories is obtained by setting **n=4**,
and **style="equal"**. The resulting map replicates the result in GeoDa.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",n=4,style="equal") +
tm_borders() +
tm_layout(title = "Equal Intervals Map", title.position = c("right","bottom"))
```
### Custom breaks {-}
For all the built-in styles, the category breaks are computed internally. In order to
override these defaults, the breakpoints can be set explicitly by means of the **breaks**
argument to the `tm_fill` function. To illustrate this, we mimic the example in the GeoDa
Workbook, which uses the variable **kids2000**. The classification results in six categories.
Unlike what is the case in the GeoDa Category Editor, in **tmap** the breaks include a minimum
and maximum (in GeoDa, only the break points themselves need to be specified). As a result, in
order to end up with n categories, n+1 elements must be specified in the **breaks** option
(the values must be in increasing order).
It is always good practice to get some descriptive statistics on the variable before setting
the break points. For **kids2000**, a `summary` command yields the following results.
```{r}
summary(nyc.bound$kids2000)
```
Following the example in the GeoDa workbook, we set break points at 20, 30, 40, 45, and 50.
In addition, we also need to include a minimum and maximum, which we set at 0 and 65. Our
**breaks** vector is thus `c(0,20,30,40,45,50,60)`.
We also use the same color palette as in the GeoDa Workbook. This is a sequential scale, referred
to as **"YlOrBr"** in ColorBrewer.
The resulting map is identical to the one obtained with GeoDa.
```{r}
tm_shape(nyc.bound) +
tm_fill("kids2000",title="Percentage Kids",breaks=c(0,20,30,40,45,50,60),palette="YlOrBr") +
tm_borders() +
tm_layout(title = "Custom Breaks Map", title.position = c("right","bottom"))
```
## Extreme Value Maps {-}
In addition to the common map classifications, GeoDa also supports three types of extreme
value maps: a percentile map, box map, and standard deviation map. For details on the
rationale and methodology behind these maps, we refer to the GeoDa Workbook.
Of the three extreme value maps, only
the standard deviation map is included as a **style** in **tmap**. To create the
other two, we will need to use the custom breaks functionality. Ideally, we should
create an additional **style**, but that's beyond the scope of these notes.
The three extreme value maps in GeoDa share the same color palette. This is a diverging
scale with 6 categories, corresponding to the ColorBrewer **"RdBu"** scheme. We can
illustrate the associated colors with a `display.brewer.pal` command.
```{r}
display.brewer.pal(6,"RdBu")
```
This seems OK at first sight, but the ranking is the opposite of the one used in GeoDa. Therefore,
to reverse the order, we will need to preface the name with a **"-"** in the **palette** specification.
### Percentile map {-}
The percentile map is a special type of quantile map with six specific categories: 0-1\%,1-10\%,
10-50\%,50-90\%,90-99\%, and 99-100\%. We can obtain the corresponding breakpoints by means of the
base R `quantile` command, passing an explicit vector of cumulative probabilities
as `c(0,.01,.1,.5,.9,.99,1)`. Note that the begin and endpoint need to be included, as we
already pointed out in the discussion of the generic custom break points.
A percentile map is then created by passing the resulting break points to `tm_fill`.
In addition, we can also pass labels for the new categories, using the **labels** option,
for example, as `labels = c("< 1%", "1% - %10", "10% - 50%", "50% - 90%","90% - 99%", "> 99%")`.
#### Extracting a variable from an sf data frame {-}
There is one catch, however. When variables are extracted from an **sf** dataframe, the geometry
is extracted as well. For mapping and spatial manipulation, this is the expected behavior, but
many base R functions cannot deal with the geometry. Specifically, the `quantile` function gives
an error.
For example, this is illustrated when we extract the variable **rent2008** and then apply the quantile
function.
```{r error=TRUE}
percent <- c(0,.01,.1,.5,.9,.99,1)
var <- nyc.bound["rent2008"]
quantile(var,percent)
```
We remove the geometry by means of the `st_set_geometry(NULL)` operation in **sf**, but it still doesn't work.
```{r error=TRUE}
var <- nyc.bound["rent2008"] %>% st_set_geometry(NULL)
quantile(var,percent)
```
As it happens, the selection of the variable returns a data frame, not a vector, and the `quantile` function
doesn't know which column to use. We make that explicit by taking the first column (and all rows),
as `[,1]`. Now, it works.
```{r error=TRUE}
var <- nyc.bound["rent2008"] %>% st_set_geometry(NULL)
quantile(var[,1],percent)
```
Since we will have to carry out this kind of transformation any time we want to carry out
some computation on a variable, we create a simple helper function to extract the
variable **vname** from the simple features data frame **df**. We have added one extra feature,
i.e., we remove the name of the vector by means of `unname`.
```{r}
get.var <- function(vname,df) {
# function to extract a variable as a vector out of an sf data frame
# arguments:
# vname: variable name (as character, in quotes)
# df: name of sf data frame
# returns:
# v: vector with values (without a column name)
v <- df[vname] %>% st_set_geometry(NULL)
v <- unname(v[,1])
return(v)
}
```
#### A percentile map function {-}
To create a percentile map for the variable **rent2008** from scratch, we proceed in four steps.
First, we set the cumulative percentages to compute the quantiles, in the vector **percent**.
Second, we extract the variable as a vector using our function `get.var`. Next, we
compute the breaks with the `quantile` function. Finally, we create the map using
`tm_fill` with the **breaks** and **labels** option, and the palette set to **"-RdBu"** (to
reverse the default order of the colors).
```{r}
percent <- c(0,.01,.1,.5,.9,.99,1)
var <- get.var("rent2008",nyc.bound)
bperc <- quantile(var,percent)
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",breaks=bperc,palette="-RdBu",
labels=c("< 1%", "1% - %10", "10% - 50%", "50% - 90%","90% - 99%", "> 99%")) +
tm_borders() +
tm_layout(title = "Percentile Map", title.position = c("right","bottom"))
```
We can summarize this in a simple function. Note that this is just a bare bones implementation. Additional
arguments could be passed to customize various features of the map (such as the title, legend
positioning, etc.), but we leave that for an exercise. Also, the function assumes (and does not check)
that `get.var` is available. For general use, this should not be assumed.
The minimal arguments we can pass are the variable name (**vnam**), the data frame (**df**), a
title for the legend (**legtitle**, set to **NA** as default, the same default as in `tm_fill`), and
an optional title for the map (**mtitle**, set to **Percentile Map** as the default).
```{r}
percentmap <- function(vnam,df,legtitle=NA,mtitle="Percentile Map"){
# percentile map
# arguments:
# vnam: variable name (as character, in quotes)
# df: simple features polygon layer
# legtitle: legend title
# mtitle: map title
# returns:
# a tmap-element (plots a map)
percent <- c(0,.01,.1,.5,.9,.99,1)
var <- get.var(vnam,df)
bperc <- quantile(var,percent)
tm_shape(df) +
tm_fill(vnam,title=legtitle,breaks=bperc,palette="-RdBu",
labels=c("< 1%", "1% - %10", "10% - 50%", "50% - 90%","90% - 99%", "> 99%")) +
tm_borders() +
tm_layout(title = mtitle, title.position = c("right","bottom"))
}
```
We illustrate this for the **rent2008** variable, without a legend title.
```{r}
percentmap("rent2008",nyc.bound)
```
### Box map {-}
To create a box map, we proceed largely along the same lines as for the percentile map,
using the custom breaks specification. However, there is a complication. The break
points for the box map vary depending on whether lower or upper outliers are present.
In essence, a box map is an augmented quartile map, with an additional lower and
upper category. When there are lower outliers, then the starting point for the
breaks is the minimum value, and the second break is the lower fence. In contrast,
when there are no lower outliers, then the starting point for the breaks will
be the lower fence, and the second break is the minimum value (there will be
no observations that fall in the interval between the lower fence and the
minimum value).
The same complication occurs at the upper end of the distribution. When there are
upper outliers, then the last value for the breaks is the maximum value, and the
next to last value is the upper fence. In contrast, when there are no upper
outliers, then the last value is the upper fence, and the next to last is the maximum.
As such, however, this is not satisfactory, because the default operation of the
cuts is to have the **interval.closure="left"**. This implies that the observation
with the maximum value (the break point for the last category) will always be
assigned to the outlier category when there are in fact no outliers. Similarly, the
observation with the minimum value (the break point for the second category) will be
assigned to the first group.
We therefore create a small function to deal with these complications and compute
the break points for a box map. Then we create a function for the box map along
the same lines as for the percentile map.
#### Computing the box map break points {-}
We follow the logic outlined above to create the 7 breakpoints (for the 6 categories) for
the box map. We handle the problem with the interval closure by rounding the maximum
value such that its observation will always fall in the proper category. We need to
distinguish between a minimum and a maximum. For the former, we use the
`floor` function, for the latter, the `ceiling` function.
Our function **boxbreaks** returns the break points. It reuses the same calculations that we
already exploited to compute the fences for the box plot. It takes a vector of observations and optionally
a value for the multiplier as arguments (the default multiplier is 1.5). The function first
gets the quartile values from the `quantile` function (the default is to compute quartiles, so
we do not have to pass cumulative probabilities). Next, we calculate the upper and lower fence
for the given multiplier value.
The break points vector (**bb**) is initialized as a zero vector with 7 values (one more than the
number of categories). The middle three values (3 to 5) will correspond to the first quartile,
median, and third quartile. The first two values are allocated depending on whether there are
lower outliers, and the same for the last two values.
```{r}
boxbreaks <- function(v,mult=1.5) {
# break points for box map
# arguments:
# v: vector with observations
# mult: multiplier for IQR (default 1.5)
# returns:
# bb: vector with 7 break points
# compute quartile and fences
qv <- unname(quantile(v))
iqr <- qv[4] - qv[2]
upfence <- qv[4] + mult * iqr
lofence <- qv[2] - mult * iqr
# initialize break points vector
bb <- vector(mode="numeric",length=7)
# logic for lower and upper fences
if (lofence < qv[1]) { # no lower outliers
bb[1] <- lofence
bb[2] <- floor(qv[1])
} else {
bb[2] <- lofence
bb[1] <- qv[1]
}
if (upfence > qv[5]) { # no upper outliers
bb[7] <- upfence
bb[6] <- ceiling(qv[5])
} else {
bb[6] <- upfence
bb[7] <- qv[5]
}
bb[3:5] <- qv[2:4]
return(bb)
}
```
We illustrate this for **rent2008**, taking advantage of our `get.var` function.
```{r}
var <- get.var("rent2008",nyc.bound)
boxbreaks(var)
```
In our example, there are both lower and upper outliers. As a result, the first and last
values in the vector are, respectively, the minimum and maximum, and positions 2 and 6
are taken by the lower and upper fence.
#### A box map function {-}
Our function `boxmap` is designed along the same lines as the percentile map. It assumes
that both `get.var` and `boxbreaks` are available.
Again, the minimal arguments we can pass are the variable name (**vnam**), the data frame (**df**), a
title for the legend (**legtitle**, set to **NA** as default, the same default as in `tm_fill`), and
an optional title for the map (**mtitle**, set to **Percentile Map** as the default). In addition, we
also need to pass a value for the multiplier (**mult**, needed by `boxbreaks`).
```{r}
boxmap <- function(vnam,df,legtitle=NA,mtitle="Box Map",mult=1.5){
# box map
# arguments:
# vnam: variable name (as character, in quotes)
# df: simple features polygon layer
# legtitle: legend title
# mtitle: map title
# mult: multiplier for IQR
# returns:
# a tmap-element (plots a map)
var <- get.var(vnam,df)
bb <- boxbreaks(var)
tm_shape(df) +
tm_fill(vnam,title=legtitle,breaks=bb,palette="-RdBu",
labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%","> 75%", "upper outlier")) +
tm_borders() +
tm_layout(title = mtitle, title.position = c("right","bottom"))
}
```
Illustrated for the **rent2008** variable:
```{r}
boxmap("rent2008",nyc.bound)
```
### Standard deviation map
A standard deviation map is included in **tmap** as **style="sd"**. We follow the GeoDa
Workbook and return to the **rent2008** variable for our examples. Apart from setting
the style, we also specify the **palette="-RdBu"**. This yields the same map as GeoDa.
```{r}
tm_shape(nyc.bound) +
tm_fill("rent2008",title="Rent in 2008",style="sd",palette="-RdBu") +
tm_borders() +
tm_layout(title = "Standard Deviation Map", title.position = c("right","bottom"))
```
## Mapping Categorical Variables {-}
In the GeoDa Workbook, the mapping of categorical variables is illustrated by extracting
the classifications from a map as integer values. There doesn't seem to be an obvious
way to do this for a tmap element. Instead, we will resort to the base R function `cut` to
create integer values for a classification. We will then turn these into **factors** to
be able to use the **style="cat"** option in `tm_fill`. Note that it is important to have
the variable as a **factor**. When integer values are used as such, the resulting map follows
a standard sequential classification.
#### Creating categories {-}
We use the same example variables as in the GeoDa Workbook and create box map categories
for **kids2000** and **pubast00**. We first extract the variable, using our `get.var` function.
```{r}
var <- get.var("kids2000",nyc.bound)
```
Next, we construct a vector with the boxmap breakpoints using our `boxbreaks` function.
```{r}
vbreaks <- boxbreaks(var)
vbreaks
```
We verify the treatment of lower and upper fences by comparing the result to a `summary`.
```{r}
summary(var)
```
We observe that the first value in the breaks vector is the minimum, indicating lower outliers. However,
the last value is larger than the maximum, indicating the lack of upper outliers. Consequently, our
integer codes will not include the value 6.
We use the boxmap breaks to create a classification by means of the `cut` function. We pass the
vector of observations and the vector of breaks and then specify some important options. The default
in `cut` is to create labels for the categories as strings that show the lower and upper bound. Instead,
we want to have a numeric code, so we set the option **labels=FALSE** (the opposite of the default).
Also, we want to make sure that the mininum value is included if it is the lowest value (the default
is that it would not be) by setting **include.lowest=TRUE**. The result is a vector of integer codes.
Since we only have 55 observations, we can list the result (note the absence of a code of 6).
```{r}
vcat <- cut(var,breaks=vbreaks,labels=FALSE,include.lowest=TRUE)
vcat
```
Finally, we need to add the new variable as a **factor** to the simple features layer, using the
standard **$** notation. We call the new variable **kidscat**, as in the GeoDa Workbook. We could (and should)
turn this whole operation into a function, but we leave that as an exercise for now.
```{r}
nyc.bound$kidscat <- as.factor(vcat)
```
### Unique value map {-}
We are now in a position to create a unique value map for the new variable **kidscat**. All the
options are as before, except that **style="cat"** and we set **palette="Paired"** to replicate
the map in GeoDa.
```{r}
tm_shape(nyc.bound) +
tm_fill("kidscat",style="cat",palette="Paired") +
tm_borders() +
tm_layout(title = "Unique Value Map", title.position = c("right","bottom"))
```
### Co-location map {-}
A special case of a map for categorical variables is a so-called co-location map,
implemented in GeoDa. This map shows the values for those locations where two
categorical variables take on the same value (it is up to the user to make sure
the values make sense). Further details are given in the GeoDa Workbook.
To replicate a co-location map in R takes a little bit of work. First we need to
identify the locations that match. Then, in order to make a map that makes sense,
we need to make sure we pick the right colors from the color palette.
We follow the example in the GeoDa Workbook and create a co-location map for
the box map categories for **kids2000** and **pubast00**. We already created the
former, as the new variable **kidscat**. We still need to create a categorical
variable for the latter. We proceed in the same way as before to add the
variable **asstcat** to the **nyc.bound** layer.
```{r}
var2 <- get.var("pubast00",nyc.bound)
vb2 <- boxbreaks(var2)
vcat2 <- cut(var2,breaks=vb2,labels=FALSE,include.lowest=TRUE)
nyc.bound$asstcat <- as.factor(vcat2)