Make Decisions about the Civis Dataset #2

Open
DForbush opened this issue Feb 21, 2020 · 3 comments

@DForbush
Collaborator

Importing the Civis data is cumbersome for two reasons:

  1. In the Global File, the following code (lines 49-53) takes a long time to run:
    civistable <- "cic.pdb2019trv3_us"
    civisdata <- read_civis(civistable, database="City of Chicago") #this will take a minute or two
    civisdata <- as.data.table(civisdata)
    civisdata <- civisdata[match(shp_tracts$GEOID, civisdata$gidtr)]

This is because the entire dataset is imported and then matched with the Chicago-specific census tracts. Is there a way, in the read_civis import line, to only import the Chicago-specific census tracts?

  2. There are 522 columns in the Civis dataset. We probably don't need most of them, and carrying them all makes the data unwieldy and slow. How should we filter which columns we want to use?
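
One possible approach to both points is to push the row and column filtering into the query itself, so that only Chicago tracts and the needed columns are downloaded. The sketch below is a rough illustration, not tested against the real table: `build_civis_query` is a hypothetical helper, the two column names and the two GEOIDs are placeholders, and it assumes `gidtr` is stored as text (if it is numeric, the quotes around the IDs would need to be dropped).

```r
# Hypothetical helper: build a SQL query that keeps only the requested
# tracts (rows) and, optionally, only the requested columns.
build_civis_query <- function(table, geoids, columns = "*", id_col = "gidtr") {
  geoids <- unique(as.character(geoids))
  # Accept digit-only tract IDs and plain identifiers so that nothing
  # unexpected is pasted into the SQL text.
  stopifnot(length(geoids) > 0, all(grepl("^[0-9]+$", geoids)))
  stopifnot(all(grepl("^(\\*|[A-Za-z_][A-Za-z0-9_]*)$", columns)))
  # Always keep the tract ID so rows can still be matched to the shapefile.
  if (!identical(columns, "*")) columns <- union(id_col, columns)
  sprintf(
    "SELECT %s FROM %s WHERE %s IN (%s)",
    paste(columns, collapse = ", "), table, id_col,
    paste0("'", geoids, "'", collapse = ", ")
  )
}

# Placeholder inputs; in the Global File the IDs would come from shp_tracts$GEOID.
query <- build_civis_query(
  "cic.pdb2019trv3_us",
  geoids  = c("17031010100", "17031010201"),
  columns = c("tot_population_acs_13_17", "low_response_score")
)

# With Civis credentials configured, the query could replace the bare table name:
# civisdata <- civis::read_civis(civis::sql(query), database = "City of Chicago")
```

The existing `match(shp_tracts$GEOID, civisdata$gidtr)` step could stay afterwards to put the rows in shapefile order; it would just operate on a much smaller table.
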
@DForbush
Collaborator Author

Another thought about this: the column names in the Civis dataset, which come from the Census Planning Database, are impossible to understand on their own. We have to rely on the Census documentation (available here) to interpret the variables, and nobody else reading these reports will be able to make sense of the raw names. This isn't a problem for the data table, because it is easy to rename the columns there. It is a problem for the map, though: if you click on a specific Census tract, the raw column name and value appear. We will probably need to rename the columns to make them understandable. Should we do that? And if so, where is the best place in the code to do it?

@geneorama
Member

geneorama commented Feb 21, 2020 via email

@sherryshenker
Collaborator

I've shared a table called cic.data_dict with you both. It maps the raw column names to more human-readable ones. Let me know if it's helpful!
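
As a rough sketch of how such a dictionary could be applied before the data reaches the map: the layout of cic.data_dict isn't shown in this thread, so the `raw_name` and `readable_name` columns, the helper `rename_with_dict`, and the small stand-in tables below are all assumptions with made-up values.

```r
# Stand-in for cic.data_dict; the real column names and labels may differ.
data_dict <- data.frame(
  raw_name      = c("gidtr", "tot_population_acs_13_17"),
  readable_name = c("Census tract ID", "Total population (ACS 2013-2017)"),
  stringsAsFactors = FALSE
)

# Stand-in for a slice of the imported Civis data (made-up values).
civisdata <- data.frame(
  gidtr                    = c("17031010100", "17031010201"),
  tot_population_acs_13_17 = c(4000, 5200),
  low_response_score       = c(21.5, 19.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename every column that has a dictionary entry
# and leave the rest untouched.
rename_with_dict <- function(df, dict) {
  idx <- match(names(df), dict$raw_name)
  found <- !is.na(idx)
  new_names <- names(df)
  new_names[found] <- dict$readable_name[idx[found]]
  names(df) <- new_names
  df
}

labelled <- rename_with_dict(civisdata, data_dict)
names(labelled)
```

Renaming a copy right before the map popups are built, rather than at import, would keep the raw names available for joins such as the `gidtr` match.
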
