-
Notifications
You must be signed in to change notification settings - Fork 1
/
GMTMaps.jmd
244 lines (154 loc) · 8.58 KB
/
GMTMaps.jmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
# Visualizing Spatial Data with GMT
GMT is the "Generic Mapping Tools" a powerful toolset for geospatial visualizations. With great power comes great learning curves.
The goal here is to make it possible for someone who has a data analysis background but not much specific knowledge of geospatial software to
make maps that show their data in 2D maps, and simultaneously to learn something about the framework and what it is capable of, and how to
use it.
# Problem 1: a Choropleth map of US Census Data
One of the most common tasks in basic geospatial data analysis is to create a map in which regions of the world are colored based on
the value of some statistic pertaining to the region. We will start with a very simple map of population for each county in the US.
The data for this can be found from the Census bureau at https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv
```julia
using CSV,DataFrames,GMT,Printf,DataFramesMeta
countypopurl = "https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/totals/co-est2020-alldata.csv"
download(countypopurl,"data/countypops.csv")
cpopall = CSV.read("data/countypops.csv",DataFrame)
cpop = cpopall[:,[:STATE,:COUNTY,:STNAME,:CTYNAME,:POPESTIMATE2020]]
```
Now we have a large number of different estimates for each county in the US. For simplicity let's use the POPESTIMATE2020 variable. The COUNTY
variable is the FIPS county code for the county.
In order to build a Choropleth map, we will need to have a file that defines the spatial boundaries of each county! GMT is a generic mapping tool
It will read various geospatial data formats which can be used to define the regions to be plotted. The Census bureau has a file which defines
the boundaries of the counties at 3 different resolutions. These are in "shapefile" format. The files and other related shapefiles are available
[at the Census website](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). We will use the medium
resolution version of the county shapefiles (5 million meter resolution).
```julia{eval=false}
countyshapeurl = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip"
download(countyshapeurl,"data/cb_2018_us_county_5m.zip")
cd("data")
run(`unzip cb_2018_us_county_5m.zip`)
cd("..")
```
Inside the zip file are a number of files, the one ending in ".shp" is in the shape file format. We can read this using GMT's function
gmtread
```julia
counties = gmtread("data/cb_2018_us_county_5m.shp")
```
Now, what kind of thing is "counties?"
```julia
typeof(counties)
```
It's a Vector{GMTdataset}. Basically GMT treats the shape file as a collection of shapes, each shape gets its own GMTdataset, which is a structure
```{eval=false}
search: GMTdataset
No documentation found.
Summary
≡≡≡≡≡≡≡≡≡
mutable struct GMTdataset{T<:Real, N}
Fields
≡≡≡≡≡≡≡≡
data :: Array{T<:Real, N}
ds_bbox :: Vector{Float64}
bbox :: Vector{Float64}
attrib :: Dict{String, String}
colnames :: Vector{String}
text :: Vector{String}cpop.CFIPS = format("%03d",cpop.COUNTY)
geom :: Int64
Supertype Hierarchy
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
GMTdataset{T<:Real, N} <: AbstractArray{T<:Real, N} <: Any
```
In addition to "data" which is lattitude and longitude values in this case, there are a variety of metadata fields. one of which is "header".
Let's look at the header of the first county:
```julia
counties[1].header
counties[1].attrib
```
This shows that the header is a comma separated values string. We want to "join" our data to these counties such that the county FIPS identifier
is the same. The FIPS identifier is a unique numerical code assigned to each county (the names are not necessarily unique).
In this case, we have the value 39 being the FIPS code for the state of Ohio, and 071 being the county FIPS code for Highland County.
So we'd like to join field 2 of the header to the COUNTY column of our population data.
However our population data represents the COUNTY field as a number, not a 0 padded string. Let's fix that then do the join
```julia
cpop.COUNTY = Printf.format.(Ref(Printf.Format("%03d")),cpop.COUNTY)
cpop.STATE = Printf.format.(Ref(Printf.Format("%02d")),cpop.STATE)
cpop = cpop[cpop.COUNTY .!= "000",:] ## eliminate county "000" which is "the whole state" for each state
dfc = DataFrame(STATE = map(x->x.attrib["STATEFP"],counties),COUNTY=map(x -> x.attrib["COUNTYFP"],counties),ORDER=1:length(counties))
joineddata = @chain leftjoin(dfc,cpop,on= [:STATE,:COUNTY],makeunique=true) begin
@orderby(:ORDER)
end
cptvallog = makecpt(range=(log(1000),log(11e6)),C=:plasma)
#imshow(counties,level=joineddata.POPESTIMATE2020,cmap=cptval,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),proj=:guess,colorbar=true)
```
There are missing values of POPESTIMATE2020 so let's replace them with NaN so we at least still get the polygon drawn. GMT will draw NaN
as a special color.
```julia
joineddata.POPESTIMATE2020 = replace(joineddata.POPESTIMATE2020,missing => NaN)
GMT.plot(counties,level=log.(1.0 .+ joineddata.POPESTIMATE2020),cmap=cptvallog,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),
proj=:guess,colorbar=true,figname="choroplethlog.png",title="Continental US County log(Population)")
```
![Choropleth of log(population) at county level](choroplethlog.png)
Let's see what it looks like if we don't take the logarithm...
```julia
cptval = makecpt(range=(0,11_000_000),C=:plasma)
GMT.plot(counties,level=joineddata.POPESTIMATE2020,cmap=cptval,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),
proj=:guess,colorbar=true,figname="choroplethnolog.png",title="Continental US County Population")
```
![Continental Population](choroplethnolog.png)
It may be interesting to work further with the polygons that represent the counties in some additional ways. For example we could calculate the population
Density by dividing the population by the area of the polygon. We can calculate the area using the function gmtspatial.
```julia
measures = gmtspatial(counties, area="k")[1] # to get area in km^2
print(typeof(measures))
length(measures)
print(measures)
```
But some counties will consiste of several polygons, like some along the coast may include some islands, etc, to get the total area, we
need to group by county identifier, and sum all the areas.
```julia
countyareas = @chain dfc begin
@transform(:area = measures[:,3])
groupby([:STATE,:COUNTY])
@combine(:totarea = sum(:area))
@select(:STATE,:COUNTY,:totarea)
end
cpop = @chain leftjoin(cpop,countyareas,on = [:STATE, :COUNTY]) begin
@transform(:density = :POPESTIMATE2020 ./ :totarea)
@orderby(:STATE,:COUNTY)
end
print(cpop[1:10,:])
joineddata = @chain leftjoin(dfc,cpop,on = [:STATE,:COUNTY]) begin
@orderby(:ORDER)
end
```
```julia
denscpt = makecpt(range=(0,11e6/(100*100)),C=:plasma)
GMT.plot(counties,level=Vector{Float64}(replace(joineddata.density,missing=>NaN)),cmap=denscpt,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),
proj=:guess,colorbar=true,figname="choroplethdens.png",title="Continental US County Pop Density (1/km^2)")
```
![Continental US Density](choroplethdens.png)
## Smallest Area Containing ~50% of the population
Our approach to this question will be to sort the counties by density in decreasing order, cumsum the populations... and then create an indicator
variable for whether the cumsum is less than 50% of the total, then plot this indicator variable.
```julia
totpop = sum(cpop[cpop.COUNTY .!= "000",:POPESTIMATE2020]) ## ignore county 0, that's the "whole state"
print(totpop)
areads = @chain cpop begin
@subset(:COUNTY .!= "000")
@orderby(-:density)
@transform( :csum = cumsum(:POPESTIMATE2020))
@transform( :inset = :csum .< totpop/2.0,:in80set = :csum .< 0.8 * totpop)
end
joineddata = @chain leftjoin(dfc,areads,on=[:STATE,:COUNTY]) begin
@orderby(:ORDER)
end
insetcmap = makecpt(range(0,1),C=:plasma)
GMT.plot(counties,level=Float64.(replace(joineddata.inset,missing => NaN)),cmap=insetcmap,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),
proj=:guess,colorbar=true,figname="choroplethhpdset.png",title="Continental US smallest 50% set ")
```
![high probability density set 50%](choroplethhpdset.png)
## 80%tile smallest area
```julia
GMT.plot(counties,level=Float64.(replace(joineddata.in80set,missing => NaN)),cmap=insetcmap,close=true,fill="+z",pen=0.25,region=(-125,-65,24,50),
proj=:guess,colorbar=true,figname="choroplethhpd80set.png",title="Continental US smallest 80% set ")
```
![high probability density set 50%](choroplethhpd80set.png)