Make Decisions about the Civis Dataset #2

DForbush · 2020-02-21T19:36:49Z

Importing the Civis data is cumbersome for two reasons:

In the Global File, the following code (lines 49-53) takes a long time to run:
    civistable <- "cic.pdb2019trv3_us"
    civisdata <- read_civis(civistable, database="City of Chicago") #this will take a minute or two
    civisdata <- as.data.table(civisdata)
    civisdata <- civisdata[match(shp_tracts$GEOID, civisdata$gidtr)]

This is because the entire dataset is imported and then matched with the Chicago-specific census tracts. Is there a way, in the read_civis import line, to only import the Chicago-specific census tracts?

There are 522 columns in the Civis dataset. We probably don't need most of them, and keeping them all makes the data unwieldy and slow. How should we decide which columns to keep?
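One possible approach to both questions is to push the filtering into the database. As far as I know, `read_civis` in the civis R package accepts a SQL query wrapped in `sql()` in place of a table name. The sketch below builds such a query. The helper `build_tract_query`, the sample GEOIDs, and the assumption that `gidtr` is stored as text are all illustrative and not taken from the project code.

```r
# Build a SQL query that selects only the requested tracts (and, optionally,
# only the requested columns) so that the filtering happens in the database.
# build_tract_query is a hypothetical helper, not part of the civis package.
build_tract_query <- function(table, geoids, columns = "*") {
  # GEOIDs are interpolated into the SQL text, so accept digits only.
  stopifnot(all(grepl("^[0-9]+$", geoids)))
  # Assumes gidtr is stored as text; drop the quotes if it is numeric.
  id_list <- paste0("'", unique(geoids), "'", collapse = ", ")
  sprintf("SELECT %s FROM %s WHERE gidtr IN (%s)",
          paste(columns, collapse = ", "), table, id_list)
}

# Stand-in for shp_tracts$GEOID (sample values for illustration only).
example_geoids <- c("17031010100", "17031010201")
query <- build_tract_query("cic.pdb2019trv3_us", example_geoids)
cat(query, "\n")

# With Civis credentials configured, the query could replace the table name:
# civisdata <- read_civis(sql(query), database = "City of Chicago")
# civisdata <- as.data.table(civisdata)
# civisdata <- civisdata[match(shp_tracts$GEOID, civisdata$gidtr)]
```

Passing a character vector as `columns` (always including `gidtr`, which the later `match` step needs) would cut down the 522 columns in the same query. Which columns to keep is still a project decision.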


DForbush · 2020-02-21T19:42:52Z

Another thought about this: the column names in the Civis dataset, which are based on the Census Planning Database, are impossible to understand on their own. We need to rely on the Census documentation (available here) to interpret the variables, and nobody else reading these reports will be able to make sense of the raw names. This isn't a problem for the data table, because it is easy to rename the columns there. But it is a problem for the map: if you click on a specific Census tract, the column name and value appear. We will probably need to rename the columns to make them understandable. Should we do that? And if so, where is the best place in the code to do it?

geneorama · 2020-02-21T20:41:54Z

I agree. I would consider keeping a lookup table.

sherryshenker · 2020-03-03T21:45:58Z

I've shared a table called cic.data_dict with you both. It maps the raw column names to more human-readable names. Let me know if it's helpful!
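A lookup-table rename along these lines could be sketched as follows. The structure of `cic.data_dict` isn't shown in this thread, so the column names `raw_name` and `readable_name`, the sample rows, and the helper `rename_from_dict` are all assumptions for illustration.

```r
# Hypothetical stand-in for cic.data_dict; the real column names may differ.
data_dict <- data.frame(
  raw_name      = c("gidtr", "tot_population_acs_13_17"),
  readable_name = c("Census tract ID", "Total population (ACS 2013-17)"),
  stringsAsFactors = FALSE
)

# Rename any columns found in the dictionary; leave the others unchanged.
# For a data.table, data.table::setnames(dt, old, new, skip_absent = TRUE)
# would do the same by reference.
rename_from_dict <- function(df, dict) {
  idx <- match(names(df), dict$raw_name)
  found <- !is.na(idx)
  names(df)[found] <- dict$readable_name[idx[found]]
  df
}

# Made-up rows standing in for the imported Civis data.
civis_sample <- data.frame(
  gidtr = c("17031010100", "17031010201"),
  tot_population_acs_13_17 = c(100, 200),
  unmapped_col = c(1, 2),
  stringsAsFactors = FALSE
)

renamed <- rename_from_dict(civis_sample, data_dict)
print(names(renamed))
```

Applying the rename late, just before the table and map are rendered, would keep the raw names available for steps that depend on them, such as matching on `gidtr`.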

DForbush assigned geneorama and DForbush Feb 21, 2020
