Conversation
Hi Alexandre (GitHub: @Alexatug) – PR #1 on branch assignment-1

Hits:
- Used info(), shape, and unique() to inspect the data, which correctly answered Q1.
- Standardized the predictors and set up KNN with GridSearchCV and 10-fold CV.

Issues & Advice:
- Data leakage: after standardizing the predictors, class was added back into the same DataFrame before splitting, so the model saw the target during training and testing. Avoid including class in your feature matrix when standardizing or splitting.
- Incorrect data split: the code used wine_df_train/wine_df_test with the class column still inside. You should split into X_train, y_train, X_test, y_test.
- Grid range: the parameter grid was range(1, 50), which excludes 50. It should be range(1, 51) to include all values from 1 to 50.

Ensure you remove the target from the predictors before scaling and splitting, and show the grid-search results explicitly; a corrected pipeline is sketched below.
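For reference, here is a minimal sketch of the corrected workflow. It assumes the data lives in a pandas DataFrame named wine_df with a class target column (names taken from the review); the seed and test size are placeholders and may differ in the actual notebook.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Separate the target from the predictors BEFORE any scaling or splitting,
# so "class" never appears in the feature matrix.
X = wine_df.drop(columns="class")  # wine_df is assumed to be loaded already
y = wine_df["class"]

# Split into X_train, y_train, X_test, y_test with a fixed random seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123, stratify=y
)

# Fit the scaler on the training predictors only, then apply it to both
# sets; this keeps test-set statistics out of training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# range(1, 51) covers k = 1 through k = 50, fixing the off-by-one grid.
param_grid = {"n_neighbors": range(1, 51)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)
grid.fit(X_train_scaled, y_train)

# Show the grid-search results explicitly.
print("Best k:", grid.best_params_["n_neighbors"])
print("Best 10-fold CV accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test_scaled, y_test))
```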
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
I am trying to add code that inspects the dataset, and to explain some classification concepts such as standardization and setting a random seed.
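As an illustration of those steps, here is a small sketch of the inspection and seed-setting code; the file name wine.csv and the class column name are assumptions, not necessarily what the assignment uses.

```python
import numpy as np
import pandas as pd

# Load the data (the file name here is hypothetical).
wine_df = pd.read_csv("wine.csv")

# Inspect the dataset: column types, dimensions, and the target's classes.
wine_df.info()
print(wine_df.shape)
print(wine_df["class"].unique())

# Set a random seed so splits and model results are reproducible.
np.random.seed(123)
```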
What did you learn from the changes you have made?
I learnt how to split a dataset into training and testing sets, standardize features, set a random seed, and fit a model with KNeighborsClassifier, using scikit-learn, numpy, and pandas. I also learnt how to train, test, and evaluate a classification model.
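For example, fitting and evaluating a KNeighborsClassifier on a held-out test set could look like the sketch below; it reuses the scaled splits from the pipeline sketched above, and k=5 is only a placeholder for the value the grid search would choose.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Fit KNN with a chosen k (k=5 is a placeholder) on the scaled training set.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Evaluate on the held-out test set.
y_pred = knn.predict(X_test_scaled)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```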
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
None
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
I faced challenges with removing and re-adding the response variable when performing standardization and data splitting.
I overcame them by participating in the work period.
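One way to avoid the remove-and-re-add step entirely, as the review suggests, is to keep the response variable separate from the predictors from the start (again assuming a wine_df DataFrame with a class column):

```python
# Keep the response variable out of the predictors from the beginning, so
# it never has to be re-added and cannot leak into scaling or splitting.
y = wine_df["class"]               # response variable
X = wine_df.drop(columns="class")  # predictors only
```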
How were these changes tested?
The changes were tested by running the code, and they worked well.
A reference to a related issue in your repository (if applicable)
Checklist