Under active development; not ready for use
This repository contains scripts for reading, transforming and combining datasets that are relevant for analysis of behavioral health and social determinants of health (SDoH) at the local neighborhood level (using ‘census tract’ as a proxy for ‘neighborhood’) and, where necessary, county level.
This list includes datasets which are available at one or more of the following levels of aggregation:
- Address (allows for geocoding and attribution of location to census tract)
- Census Tract
- County
These lower levels of aggregation can be rolled up to state level using FIPS codes.
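The roll-up can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes tract identifiers are stored as 11-character strings (two-digit state, three-digit county, six-digit tract) and that the values are counts. The function names and sample counts are hypothetical.

```python
from collections import defaultdict


def split_tract_geoid(geoid):
    """Split an 11-character tract GEOID into (state, county, tract) FIPS parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def roll_up_to_state(tract_counts):
    """Sum tract-level counts into totals keyed by two-digit state FIPS code."""
    totals = defaultdict(int)
    for geoid, count in tract_counts.items():
        state, _county, _tract = split_tract_geoid(geoid)
        totals[state] += count
    return dict(totals)


# Made-up counts for illustration only (not real data).
example_counts = {
    "01001020100": 3,
    "01003010200": 2,
    "06037101110": 5,
}
state_totals = roll_up_to_state(example_counts)
```

Summing is only appropriate for counts; means, medians, and standard errors cannot be rolled up by simple addition.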
The list of datasets is tracked in the .csv file located in the `data` folder, with more specific documentation found below as issues are identified. Please push a commit marking the `complete` field in the .csv file as `TRUE`. There are currently 13 datasets completed for inclusion.
There are different output file formats for each level of aggregation in the data.
The following fields must be included in all files:
- `dataset`: A shortened name of the dataset, to allow for subsetting when datasets are combined.
- `state`: Two-digit state 2010 FIPS code.
- `county`: Three-digit county 2010 FIPS code.
- `tract`: Six-digit tract 2010 FIPS code.
- `year`: The year of the published dataset.
- `race`: Should be marked as `pooled` where data is not broken out by race. Should be marked as `NA` when the variable is not related to a population metric, such as in a count of facilities.
- `gender`: Should be marked as `pooled` where data is not broken out by gender. Should be marked as `NA` when the variable is not related to a population metric, such as in a count of facilities.
- `age_range`: Should be marked as `pooled` where data is not broken out by age range. Should be marked as `NA` when the variable is not related to a population metric, such as in a count of facilities.
- `var_name`: The name of the variable/metric being reported.
- `value`: The numeric value of the measure identified in `var_name`.
- `stat_type`: The type of summary statistic being reported in `value`. For example: `n`, `mean`, `se`, `median`, etc.
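A row in this layout can be checked with a small validator. The sketch below is illustrative only: `validate_row`, the example dataset name, and the example values are hypothetical and are not part of this repository. It assumes FIPS codes are stored as strings so that leading zeros are preserved.

```python
REQUIRED_FIELDS = [
    "dataset", "state", "county", "tract", "year", "race",
    "gender", "age_range", "var_name", "value", "stat_type",
]
# Expected number of digits for each FIPS component.
FIPS_WIDTHS = {"state": 2, "county": 3, "tract": 6}


def validate_row(row):
    """Return a list of problems found in one tract-level row (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    for name, width in FIPS_WIDTHS.items():
        if name not in row:
            continue  # already reported as missing
        code = str(row[name])
        if len(code) != width or not code.isdigit():
            problems.append(f"{name} should be a {width}-digit FIPS code, got {code!r}")
    return problems


# Made-up row for illustration only (not real data).
example_row = {
    "dataset": "example",
    "state": "01", "county": "001", "tract": "020100",
    "year": 2019,
    "race": "pooled", "gender": "pooled", "age_range": "pooled",
    "var_name": "population", "value": 1912, "stat_type": "n",
}
problems = validate_row(example_row)  # an empty list means the row passed
```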
All fields from the census tract level data should be included in all files, other than the `tract` variable.
Address-level datasets should include the following fields:
- `dataset`: A shortened name of the dataset, to allow for subsetting when datasets are combined.
- `state`:
- `county`:
- `tract`: The census tract within which the address is located, obtained by using the `TBDfun::census_tract` function.
- `address`:
- `lat`, `lon`: Geocoded latitude and longitude coordinates of `address`.
- `year`: The year of the published dataset.
- `...`: Other variables specific to the dataset, which may be of value to retain, though these will not be aggregated in the tract or county-level data.
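One way address-level records could feed the tract-level format is by counting records per tract. The sketch below is an assumption about how that might look, not the repository's method: it presumes `tract` has already been attached to each record, and the function name, the `facility_count` variable name, and the sample records are all hypothetical.

```python
from collections import Counter


def facility_counts_by_tract(address_rows, var_name):
    """Count address-level records per tract and emit tract-level rows."""
    counts = Counter(
        (r["dataset"], r["state"], r["county"], r["tract"], r["year"])
        for r in address_rows
    )
    return [
        {
            "dataset": dataset, "state": state, "county": county,
            "tract": tract, "year": year,
            # A count of facilities is not a population metric, so these are NA.
            "race": "NA", "gender": "NA", "age_range": "NA",
            "var_name": var_name, "value": n, "stat_type": "n",
        }
        for (dataset, state, county, tract, year), n in sorted(counts.items())
    ]


# Made-up records for illustration only (not real addresses or coordinates).
example_addresses = [
    {"dataset": "example_facilities", "state": "01", "county": "001",
     "tract": "020100", "address": "1 Example St", "lat": 32.47, "lon": -86.49,
     "year": 2020},
    {"dataset": "example_facilities", "state": "01", "county": "001",
     "tract": "020100", "address": "2 Example St", "lat": 32.48, "lon": -86.48,
     "year": 2020},
    {"dataset": "example_facilities", "state": "01", "county": "001",
     "tract": "020200", "address": "3 Example Ave", "lat": 32.46, "lon": -86.47,
     "year": 2020},
]
tract_rows = facility_counts_by_tract(example_addresses, "facility_count")
```

Dataset-specific columns such as `address`, `lat`, and `lon` are dropped in the output, consistent with the note that they are not aggregated.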
A list of the variables available in the combined datasets can be found in the data dictionary.