Skip to content

Modeling and visualization of NYC 311 hotline complaint type

Notifications You must be signed in to change notification settings

minaxixi/NYC-311-Complaint-Modeling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

NYC-311-Complaint-Modeling

Goal

This is a project to model the New York City 311 Hotline complaint type based on time and users' location.

Data

The New York City 311 Hotline data were downloaded from NYC Open Data and were served on AWS RedShift at the time of this project

There are 23.5 MM rows and 41 columns in this dataset in total. I sub-selected 500k rows from 2014-2019 to perform this analysis.

Summary and Key Conclusions

  • Developed a multi-class classification LightGBM model to predict the top 10 complaint types by count from 2014-2019.
  • Utilized the top-3 accuracy as metrics due to the nature of this problem (considered to be correct as long as the top-three predictions match).
  • Built a data processing pipeline for data cleaning, missing value imputation (KNN), and categorical variable encoding.
  • Created both time-lag and geo- features to correlate time and location to the target.
  • Achieved a top-3 accuracy of 0.97, which suggests that the caller's complaint type is correlated with their locations and time of the call.
  • Visualized the complaint types spatially via Tableau, and the dashboard is available upon request.

Package Dependencies

  • Python 3.6.5
  • Matplotlib 2.2.2
  • Seaborn 0.9.0
  • Pandas 0.23.0
  • Scikit-Learn 0.19.1
  • LightGBM 2.2.1

About

Modeling and visualization of NYC 311 hotline complaint type

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published