- Objective: Explore the best numeric feature subset and K-value for car price prediction.
- Dataset: from the 1985 Ward's Automotive Yearbook, available from https://archive.ics.uci.edu/dataset/10/automobile
- Data Cleansing and Data Transformation
- Univariate KNN model with single K-value
- Univariate KNN model with multiple K-values (Hyperparameter Tuning)
- Multivariate KNN model with single K-value
- Multivariate KNN model with multiple K-values (Hyperparameter Tuning & K-Fold Cross-Validation)
- Best Feature Subset: ['city-mpg', 'wheel-base', 'curb-weight', 'highway-mpg', 'peak-rpm']
- Best K-value: 1
- Best Average Accuracy: 86.43%
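
The multivariate search summarized above (feature subset, K-value, and K-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the data is synthetic, the pipeline layout, the parameter grid, and the number of folds are assumptions, and `r2` is only a guess at the metric reported as "accuracy".

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the cleaned numeric features and the price target
# (assumption: the real project would load the automobile dataset instead).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Scale features, keep the top-scoring ones by f_regression, then fit KNN.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(score_func=f_regression)),
    ("knn", KNeighborsRegressor()),
])

# Hypothetical grid: number of selected features and number of neighbors.
param_grid = {
    "select__k": [1, 2, 3, 4, 5],
    "knn__n_neighbors": [1, 3, 5, 7, 9],
}

# 5-fold cross-validation; the score is averaged over the folds.
search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
    scoring="r2",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
# Boolean mask marking which input columns the best pipeline kept.
selected_mask = search.best_estimator_.named_steps["select"].get_support()

print("Best parameters:", best_params)
print("Best average score:", round(best_score, 4))
print("Selected feature mask:", selected_mask)
```

Placing the scaler and the selector inside the pipeline means both are refit on each training fold, which keeps the held-out fold from leaking into feature selection.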
- With the multivariate KNN hyperparameter tuning, we used the `f_regression` scoring function for best feature subset selection. However, `f_regression` only examines the linear relationship between each feature and the target, and returns an F-statistic and a p-value.
- We can further apply the Mutual Information function from Scikit-learn for feature selection, to see whether capturing non-linear relationships improves the prediction.
- We only used numerical features for prediction. It would be interesting to explore whether combining categorical and numeric features can achieve better accuracy by applying One-Hot Encoding.
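
As a rough sketch of the two follow-up ideas above, the snippet below compares `f_regression` with Scikit-learn's `mutual_info_regression` on synthetic data where one feature affects the target only non-linearly, then appends a one-hot encoded categorical column to the scaled numeric features. The data, the variable names, and the `fuel_type` column are illustrative assumptions, not results from this project.

```python
import numpy as np
from sklearn.feature_selection import f_regression, mutual_info_regression
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

rng = np.random.default_rng(0)
n = 500

# Three synthetic numeric features: linear effect, quadratic effect, pure noise.
x_linear = rng.uniform(-1, 1, size=n)
x_nonlinear = rng.uniform(-1, 1, size=n)
x_noise = rng.uniform(-1, 1, size=n)
X_num = np.column_stack([x_linear, x_nonlinear, x_noise])
y = 0.5 * x_linear + 2.0 * x_nonlinear**2 + rng.normal(scale=0.05, size=n)

# f_regression scores each feature by its linear association with y.
f_stat, p_values = f_regression(X_num, y)

# Mutual information can also pick up non-linear dependence.
mi = mutual_info_regression(X_num, y, random_state=0)

print("F-statistics:      ", np.round(f_stat, 2))
print("p-values:          ", np.round(p_values, 4))
print("Mutual information:", np.round(mi, 3))

# One-hot encode a hypothetical categorical column and append it
# to the scaled numeric features.
fuel_type = rng.choice(["gas", "diesel"], size=n).reshape(-1, 1)
encoder = OneHotEncoder(handle_unknown="ignore")
fuel_encoded = encoder.fit_transform(fuel_type).toarray()
X_combined = np.hstack([MinMaxScaler().fit_transform(X_num), fuel_encoded])

print("Combined feature matrix shape:", X_combined.shape)
```

In this setup the quadratic feature tends to score poorly under `f_regression` but well under mutual information, which is the kind of gap the comparison is meant to reveal. Because KNN relies on distances, the one-hot columns sit on the same 0 to 1 scale as the min-max scaled numeric features.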