---
title: "analysis"
author: "Scott Brenstuhl"
date: "June 4, 2016"
output: html_document
---
```{r requirements, echo = FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(rgdal)
library(ggplot2)
library(ggmap)
library(maptools)
library(RSQLite)
```
### Getting the data
Acquiring all of the data for this project has been a somewhat hilarious process.
I was able to get a list of all the city parks, with some information about each
of them, from SF's fantastic data portal:
https://data.sfgov.org/Culture-and-Recreation/Recreation-Park-Department-Park-Info-Dataset/z76i-7s65
This needed some minor cleanup, mostly around malformed JSON in the location field
for some parks, but I'm told they will be releasing a clean, updated version
soon!
This gave me the size, location, and type of each park, but not much about the
sort of things that make people love parks. Luckily, Jason showed me a site
where SF Parks scores the features in parks and presents them to the public in
Tableau: http://sfparkscores.weebly.com/feature-scores.html After some curious
clicksploration I figured out how to download the raw data from Tableau. This
was awesome because I can assume that any feature they rate actually exists,
and since they rate every park at least once a year, I should have all the
features. As you might expect, getting all of the park names matched up with
one another was a circus, but it gave me some nice regex practice (a sketch of
the kind of cleanup involved follows).
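Here is a minimal, standalone sketch of that kind of name matching. The park
names below are made up for illustration, but the normalization steps
(lowercasing, expanding abbreviations, stripping punctuation, squeezing
whitespace) are representative of the real cleanup:
```{r name_matching_sketch}
# Hypothetical park names from two sources; not the actual data.
portal_names <- c("Alamo Square", "McLaren Park (John)", "24th St. Mini Park")
score_names <- c("ALAMO SQUARE", "John McLaren Park", "24th Street Mini Park")
normalize <- function(x) {
    x %>%
        tolower() %>%
        gsub("\\bst\\.", "street", .) %>%  # expand "St." to "Street"
        gsub("[[:punct:]]", " ", .) %>%    # drop punctuation
        gsub("\\s+", " ", .) %>%           # squeeze repeated whitespace
        trimws()
}
# Two of the three now match; stragglers like the McLaren variants still
# need hand-written rules (hence the circus).
normalize(portal_names)
normalize(score_names)
```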
The rest of the data collection was actaully pretty painless, with shapefiles
for plotting neighborhoods/zipcodes coming from the SF data portal:
https://data.sfgov.org/Geographic-Locations-and-Boundaries/San-Francisco-ZIP-Codes-Zipped-Shapefile-Format-/9q84-kc2y
### Add something here about the elevation API if I end up doing that.
The Presidio is part of the Golden Gate National Recreation Area, technically a
national park. I should find a way to get its info.
For some reason none of the Mission Bay parks are included:
- Bayfront Park?
- Mission Bay Commons Park?
- China Basin Park?
- Mission Creek Park?
Are the ZIP codes wrong? I should reverse geocode the lat/long and see what
address I get.
```{r, echo=FALSE}
# Pull the parks and their rated attributes out of the app's SQLite database.
sqlite <- dbDriver("SQLite")
con <- dbConnect(sqlite, "parks_discovery/db.sqlite3")
parks <- dbGetQuery(con,
                    "SELECT *
                    FROM parks_discovery_park")
park_attribs <- dbGetQuery(con,
                           "SELECT *
                           FROM parks_discovery_parkattribute")
zip2hood <- read.csv("data/zipcode_to_neighborhood.csv", stringsAsFactors = FALSE)
# Drop attribute rows that are really neighborhood, size, or park-type values.
park_attribs <- park_attribs[!(park_attribs$attribute %in% parks$neighborhood),]
park_attribs <- park_attribs[!(park_attribs$attribute %in% parks$size),]
park_attribs <- park_attribs[!(park_attribs$attribute %in% parks$park_type),]
# Pivot the long park/attribute pairs into a wide park-by-attribute grid.
attrib_grid <- table(park_attribs$park_id, park_attribs$attribute)
attrib_grid <- data.frame(id = rownames(attrib_grid),
                          as.data.frame.matrix(attrib_grid))
park_data <- merge(parks, attrib_grid, by = "id", all = TRUE)
# Parks with no rated attributes get 0s instead of NAs.
park_data[is.na(park_data)] <- 0
# Note: merge() takes `by =`; the pandas-style `on =` would be silently ignored.
park_data <- merge(zip2hood, park_data, by = "neighborhood")
```
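For anyone unfamiliar with the `table()` trick above, here is a tiny toy demo
(made-up data, not the real parks) of how it pivots long id/attribute pairs
into a wide indicator grid:
```{r pivot_demo}
toy <- data.frame(park_id = c(1, 1, 2),
                  attribute = c("Dog Play Areas", "Restrooms", "Restrooms"))
# Rows become park ids, columns become attributes, cells are counts.
table(toy$park_id, toy$attribute)
```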
```{r, echo = FALSE, eval=FALSE}
# Check each park's recorded ZIP by reverse geocoding its lat/long with ggmap.
rev_addresses <- mapply(function(lon, lat) {
    revgeocode(c(lon, lat))
    },
    park_data$longitude, park_data$latitude)
# Pull the ZIP out of the returned address string.
rev_zips <- gsub(".*CA ", "", rev_addresses) %>%
    gsub(", .*", "", .)
# Parks whose recorded ZIP disagrees with the reverse-geocoded one.
diff_zip <- park_data[!(rev_zips == park_data$zipcode),]
diff_zip['rev_zip'] <- rev_zips[!(rev_zips == park_data$zipcode)]
# Need to manually look all of these up and label them correctly.
View(select(diff_zip, name, zipcode, rev_zip))
looked_up_zips <- c(94114, 94105, 94105, 94105, 94102, 94115, 94122, 94134,
                    94116, 94132, 94132, 94109, 94122, 94116, 94118, 94127,
                    94122, 94122, 94122, 94114)
# These two didn't get addresses with ZIP codes back from the reverse
# geocode; will check them manually.
park_data[124,]
rev_addresses[124]
park_data[188,]
rev_addresses[188]
```
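Once the ZIPs have been checked by hand, something like the sketch below could
fold them back in. It assumes `looked_up_zips` is ordered row-for-row with
`diff_zip`, and that `zipcode_to_neighborhood.csv` has `zipcode` and
`neighborhood` columns; neither assumption is confirmed above:
```{r, eval=FALSE}
# Hypothetical write-back of the hand-verified ZIPs (assumes row alignment).
park_data$zipcode[match(diff_zip$id, park_data$id)] <- looked_up_zips
# Re-derive the neighborhood from the corrected ZIP via the lookup table.
park_data$neighborhood <- zip2hood$neighborhood[
    match(park_data$zipcode, zip2hood$zipcode)]
```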
```{r}
# Share of parks that have each rated attribute at least once.
attribs <- names(attrib_grid) %>%
    .[. != "id"]
total_attributes <- colSums(park_data[names(park_data) %in% attribs])
sort(total_attributes / nrow(park_data))
```
Hardscapes are things like sidewalks, gazebos, and non-athletic-court concrete
slabs (e.g., Tank Hill).
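A quick bar chart (just a sketch, reusing the `total_attributes` shares
computed in the chunk above) would make the long tail easier to see:
```{r, eval=FALSE}
# Sketch: prevalence of each attribute across parks, least to most common.
prevalence <- sort(total_attributes / nrow(park_data))
ggplot(data.frame(attribute = names(prevalence), share = prevalence),
       aes(x = reorder(attribute, share), y = share)) +
    geom_bar(stat = "identity") +
    coord_flip()
```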
```{r, echo = FALSE}
# Per-neighborhood summary: park count, total acreage, and per-park rates for
# a few features of interest. (dplyr lets later expressions in summarize()
# reuse columns created earlier in the same call, hence count as denominator.)
by_neighborhood <- group_by(park_data, neighborhood, zipcode) %>%
    summarize(count = n(),
              tot_acreage = sum(acreage),
              dog_areas = sum(Dog.Play.Areas)/count,
              child_areas = sum(Children.s.Play.Areas)/count,
              courts = sum(Outdoor.Courts)/count,
              athletic_fields = sum(Athletic.Fields)/count,
              #garden = sum(Community.Garden)/count,
              #mini_parks = sum(Mini.Park)/count,
              seating = sum(Table.Seating.Areas)/count) %>%
    arrange(desc(count))
```
```{r, echo=FALSE}
# Base map for the choropleths below; later chunks use `map`, so it needs to
# actually be created here.
map <- get_map(location = 'san francisco', zoom = 12, maptype = 'roadmap')
# Quick point check of the park locations:
# ggmap(map) +
#     geom_point(aes(x = longitude, y = latitude),
#                data = park_data)
# Guide I used:
# http://www.markhneedham.com/blog/2014/11/17/r-ggmap-overlay-shapefile-with-filled-polygon-of-regions/
sfn <- readOGR("data/sfzipcodes", "SFZipCodes") %>%
    spTransform(CRS("+proj=longlat +datum=WGS84"))
ggplot(data = sfn, aes(x = long, y = lat, group = group)) +
    geom_path()
# names(sfn)
# Flatten the polygons into a plain data frame, keyed by ZIP.
sfn.f <- sfn %>% fortify(region = 'zip')
# fortify() drops the attribute table, so merge it back on (this assumes
# sfn@data has an id column matching the fortify ids).
SFNeighbourhoods <- merge(sfn.f, sfn@data, by = 'id')
# Possibly should do a full merge so no ZIPs get dropped.
sf <- merge(SFNeighbourhoods, by_neighborhood,
            by.x = 'id', by.y = 'zipcode') %>%
    arrange(order)  # keep polygon vertices in drawing order
# ggplot(sf[sf$neighborhood == "St Francis Wood",], aes(long, lat, group = group)) +
#     geom_polygon(aes(fill = neighborhood))
# ggplot(sf, aes(long, lat, group = group)) +
#     geom_polygon(aes(fill = dog_areas))
```
```{r, echo=FALSE}
# Athletic fields per park, by ZIP.
ggmap(map) +
geom_polygon(aes(fill = athletic_fields, x = long, y = lat, group = group),
data = sf,
alpha = 0.8,
size = 0.2)
```
```{r, echo=FALSE}
# Dog play areas per park, by ZIP.
ggmap(map) +
geom_polygon(aes(fill = dog_areas, x = long, y = lat, group = group),
data = sf,
alpha = 0.8,
size = 0.2)
```
```{r, echo = FALSE}
# Children's play areas per park, by ZIP.
ggmap(map) +
geom_polygon(aes(fill = child_areas, x = long, y = lat, group = group),
data = sf,
alpha = 0.8,
size = 0.2)
```
```{r, echo=FALSE}
# Collapse back to one row per neighborhood by dropping the polygon columns.
by_neighborhood_plus <- unique(sf[,-1:-11])
# Neighborhoods where parkland exceeds the neighborhood's own area
# (1 acre = 0.0015625 sq mi).
by_neighborhood_plus[((by_neighborhood_plus$tot_acreage * 0.0015625) /
                          by_neighborhood_plus$sqmi) > 1,]
# Ha: all of Golden Gate Park is being assigned to Cole Valley. Gotta fix.
by_neighborhood_plus$neighborhood
```
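One way to start untangling the Golden Gate Park problem would be
point-in-polygon assignment with `sp::over`, sketched below. This assumes the
shapefile's attribute table has a `zip` column (as the `fortify(region = 'zip')`
call above suggests), and it would still dump all of a big park's acreage into
the single ZIP containing its recorded point; splitting acreage across ZIPs
properly would need park polygons rather than points.
```{r, eval=FALSE}
# Hypothetical sketch: assign each park to the ZIP polygon containing its
# recorded point, rather than trusting the zipcode field.
pts <- SpatialPoints(park_data[, c("longitude", "latitude")],
                     proj4string = CRS("+proj=longlat +datum=WGS84"))
park_data$zip_by_polygon <- over(pts, sfn)$zip
# Parks where the two disagree:
park_data[park_data$zip_by_polygon != park_data$zipcode,
          c("name", "zipcode", "zip_by_polygon")]
```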
### Todo:
- Acres of park per square mile in each neighborhood (need to map back to ZIP;
the file needed is in the data dir) - done
- Decide how to deal with Golden Gate Park more elegantly (actually I think the
ZIP codes are off; reverse geocoding them might take care of this) - done
- Are there no city parks in
    - The Presidio (it is a national park),
    - The little square that is missing (need to double-check the ZIP codes),
    - The Mission Bay/China Basin area (weirdly missing; might be newer than
    the data set)?
  `parks$name[!(parks$name %in% park_data$name)]` should list the unmatched names.
- Can I pull in elevation data to guess which parks have the best views? (A
sketch follows below.)
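As a starting point for the elevation question, here is a hedged sketch
against the Google Maps Elevation API. The endpoint and parameters are real,
but `YOUR_API_KEY` is a placeholder, and attaching the result straight onto
`park_data` is my own assumption about how it would be used:
```{r, eval=FALSE}
library(httr)
library(jsonlite)
# Query the Google Maps Elevation API for one point (meters above sea level).
get_elevation <- function(lat, lon, key) {
    resp <- GET("https://maps.googleapis.com/maps/api/elevation/json",
                query = list(locations = paste(lat, lon, sep = ","),
                             key = key))
    fromJSON(content(resp, as = "text"))$results$elevation
}
park_data$elevation <- mapply(get_elevation,
                              park_data$latitude, park_data$longitude,
                              MoreArgs = list(key = "YOUR_API_KEY"))
```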