UofT-DSI | LCR- Assignment 1 #319
What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)
We're completing assignment 1: data inspection, standardization and data-splitting, model initialization and cross-validation, and model evaluation.
What did you learn from the changes you have made?
I learned the importance of data standardization and of proper data splitting. Splitting ensures the model is trained only on the training set and validated against the validation set; only once that is complete is the final model evaluated against the test data. I also learned that GridSearchCV can automate cross-validation to identify the best number of neighbours k for the data set (in this case, best k = 7); a sketch of that step is below. Finally, I learned that knn.score() returns accuracy (not recall), and that accuracy_score(y_true, y_pred) is the explicit way to calculate the same thing from predictions.
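A minimal sketch of that GridSearchCV step, using synthetic stand-in data (the assignment's actual data set and the candidate k range are assumptions here, not taken from the notebook):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the (already standardized) training predictors and classes.
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=42)

# GridSearchCV runs 5-fold cross-validation for each candidate k on the training set only.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 16))},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print(grid.best_params_)  # the k with the highest mean CV accuracy, e.g. {'n_neighbors': 7}
print(grid.best_score_)   # that k's mean cross-validated accuracy
```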
Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?
When looking at the assignment I saw that the predictors were standardized but the classes were not. I was concerned about an index mismatch when recombining the class column with the standardized predictors, but I solved this by standardizing only the predictor columns and leaving the class column untouched (sketched below).
I also initially used knn.score() to measure model accuracy; however, since we were told to use accuracy_score, I redid that step. I considered using precision as well, but the assignment specifically asks for accuracy_score.
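Roughly what the predictors-only standardization looks like, on a toy frame (the column names and values here are made up, not the assignment's data):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for the assignment's data: predictor columns plus a "class" label.
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [10.0, 20.0, 30.0, 40.0],
    "class":     ["yes", "no", "yes", "no"],
})

predictor_cols = df.columns.drop("class")

# Standardize only the predictors; rebuilding with the original index keeps the
# untouched class column aligned row-for-row when the pieces are recombined.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df[predictor_cols]),
    columns=predictor_cols,
    index=df.index,
)
standardized_df = pd.concat([scaled, df["class"]], axis=1)
print(standardized_df)
```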
Were there any challenges? If so, what issue(s) did you face? How did you overcome it?
At first it wasn't obvious why we don't just evaluate everything on the test set once it exists. I realized the test set is the "final exam": you shouldn't touch it during tuning, or you'll start picking k based on it (i.e. leakage). So: tune using cross-validation on the training set, and test only once at the end (see the sketch after the next paragraph).
I assumed knn.score() was mixing accuracy and recall or doing something more "advanced". After looking into it, I learned that for classification knn.score() is just accuracy; accuracy_score is the same idea, calculated explicitly from y_true and y_pred.
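A small sketch of both points, again on synthetic stand-in data and with k = 7 carried over from the tuning step above: the test set is split off first and scored exactly once, and knn.score() agrees with accuracy_score() computed from explicit predictions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data; the assignment's real data set is assumed here.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Carve off the "final exam" set up front; it plays no part in choosing k.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit with the k chosen earlier via cross-validation on the training data.
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

# Touch the test set once; for classifiers, .score() is mean accuracy,
# so it matches accuracy_score on the same predictions.
print(knn.score(X_test, y_test))
print(accuracy_score(y_test, knn.predict(X_test)))
```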
How were these changes tested?
Printed and checked the dataset after loading (head/info/shape) to make sure it looked right.
Confirmed the standardized predictors and the class column remained index-aligned after standardization (see the checks sketched below).
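Concretely, continuing from the toy standardization sketch above (the names df and standardized_df come from that sketch, not the actual notebook), the checks amounted to something like:

```python
# Inspect the loaded frame.
print(df.shape)
print(df.head())
df.info()

# Confirm standardization did not reorder rows or alter the class column.
assert standardized_df.index.equals(df.index)
assert standardized_df["class"].equals(df["class"])
```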