A data processing pipeline and interactive web visualization:
Previous version:
Connected project:
- Make sure you have a GitHub account and can work with git locally on your computer.
- Get access to the dcaction repo (this one, if this is not a fork).
- Clone the repository to your local machine (e.g., `git clone git@github.com:DCActionforChildren/dcaction.git`).
- Run a simple server to test a local instance (e.g., go to the repository directory in a terminal and run a simple Python server by entering `python -m SimpleHTTPServer`; on Python 3, the equivalent is `python3 -m http.server`).
- Load the web address in your browser to view the data tool (both commands serve on port 8000 by default, i.e. `http://localhost:8000`).
- Go to the data folder, change the date in fetch_acs.rb, and run it with Ruby (you may need to install gem/library dependencies) to create acs_tract_data.json. At some point it may be a good idea to check the ACS release info, in particular data product changes (e.g., for 2013).
- Then run crosswalk.rb, which uses the crosswalk Excel file in that folder to transform acs_tract_data.json into acs_nbhd_data.csv.
- Open up the Google Spreadsheet for DataBook updating.
- Check that all indicators are accounted for and up-to-date in the “Comparison” tab, and that the variable names correspond to the descriptions and explanations in the methodology.
- If ACS updates are needed, copy and paste the named variable columns (you can ignore the Census numerically-named ones in `acs_nbhd_data.csv` unless you need to debug) from the acs_nbhd_data.csv file into an ACS tab and add an NBHD cluster column for VLOOKUP.
- Make sure the “neighborhoods CSV” spreadsheet tab is calculating from the appropriate ACS tab via a VLOOKUP. The VLOOKUP looks like this: `=VALUE(VLOOKUP(A2,ACS2013!$A$1:$CA$45, 3, 0))`. It looks at the clusterID in `A2`, matches it to the first column in `ACS2013!$A$1:$CA$45`, and then takes the value in the cell in column `3` of that range. The final `0` requests an exact match, and `VALUE` converts the result to a number.
- Once the “neighborhoods CSV” spreadsheet tab is updated accordingly, it can be exported to CSV and saved in the Data folder (as neighborhoods.csv) to power the visualization. It is recommended to do this locally and test thoroughly before pushing to the main repo.
- The visualization will then be powered by the new data file.
- If additional data updates are needed (e.g., crime, health, child care), we suggest adding them as separate tabs, like ACS, in the “DCAC DataBook v2 Updating” Google Spreadsheet and having the values auto-calculate in the “neighborhoods CSV” tab via VLOOKUP so that they can be easily updated in the future. Alternatively, the data processing can be done via one script, but this may be trickier for DCAC to debug and maintain. (See issue #126.)
- data provenance and processing (refs issue #129)
- this repo's wiki (refs issue #130)
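
For anyone debugging the VLOOKUP step in the update instructions above, here is a minimal Python sketch of what an exact-match `=VALUE(VLOOKUP(key, range, 3, 0))` does. It is only an illustration, not part of this repo: the `vlookup` helper, the column names, and the sample rows are all hypothetical stand-ins for an ACS tab.

```python
import csv
import io


def vlookup(key, rows, col_index):
    """Exact-match lookup, like VLOOKUP(key, range, col_index, 0).

    rows: list of lists; the first element of each row is the lookup key.
    col_index: 1-based column number, as in the spreadsheet formula.
    """
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    raise KeyError(key)  # the spreadsheet would show #N/A here


# Hypothetical sample standing in for an ACS tab: cluster ID, name, value.
ACS_TAB = """cluster_id,cluster_name,pop_under_18
1,Example Cluster A,1200
2,Example Cluster B,3400
"""

rows = list(csv.reader(io.StringIO(ACS_TAB)))

# Rough equivalent of =VALUE(VLOOKUP(A2, <ACS range>, 3, 0)) with A2 = "2".
value = float(vlookup("2", rows, 3))
print(value)  # 3400.0
```

A missing cluster ID raises `KeyError` here, which corresponds to the `#N/A` error the spreadsheet shows when the “neighborhoods CSV” tab points at an ACS tab that lacks that cluster.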