Diamonds Regression and Classification

This repository contains two notebook projects built on the Diamonds dataset: a linear regression and a logistic regression.

Both notebooks explore and test preprocessing, feature engineering, model training, and evaluation.

Notebooks Overview

Linear Regression
- Predicts diamond prices
- Techniques: Ordinal Encoding, Log-Scaling, Feature Engineering, Train/Test Split, and evaluation using Scatter Plots
- Key Takeaway: log-scaling improved the score drastically, from ~0.90 to ~0.97
- Some thoughts: I had an amazing time coding this model, especially when I was testing scaling methods and found out how big a difference log-scaling made
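
To illustrate why log-scaling can help this much, here is a minimal sketch on synthetic data, not the notebook's actual code. It assumes price grows roughly exponentially with carat, which is an assumption made for illustration; the coefficients, the noise level, and the helper name `r2_of_linear_fit` are all hypothetical. It uses plain NumPy least squares rather than the scikit-learn model from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the Diamonds data (assumption): price grows
# exponentially with carat, with multiplicative noise.
carat = rng.uniform(0.2, 3.0, size=2000)
price = np.exp(6.0 + 1.2 * carat + rng.normal(0.0, 0.15, size=carat.size))


def r2_of_linear_fit(x, y):
    """Fit y ~ a*x + b by least squares and return R^2 on the same data."""
    a, b = np.polyfit(x, y, deg=1)
    residual = y - (a * x + b)
    return 1.0 - residual.var() / y.var()


# Same feature, two targets: the raw price and its natural log.
r2_raw = r2_of_linear_fit(carat, price)
r2_log = r2_of_linear_fit(carat, np.log(price))

print(f"R^2 on raw price: {r2_raw:.3f}")
print(f"R^2 on log price: {r2_log:.3f}")
```

Each score is measured on its own target scale, so the comparison only shows how well a straight line fits each version of the target. A fair comparison in price units would transform the log predictions back with `np.exp` first.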

Logistic Regression
- Categorizes the prices into 4 bins
- Techniques: Ordinal Encoding, Feature Engineering, Standard Scaling, GridSearchCV + Pipeline, and Confusion Matrix for evaluation
- Key Takeaway: Even small differences in predicted price can result in misclassification; overall score: ~0.92
- Some thoughts: This is the first classifier that I ran on a regression dataset (converted into a classification dataset). Personally, the most important part of this notebook was the Confusion Matrix. It shows that a score of ~0.92 can be somewhat deceiving: in reality the predictions are often very close, but they count as wrong because a small difference in value put them into a bin adjacent to the correct one
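
The adjacent-bin effect described above can be reproduced in a small sketch. This is not the notebook's code: instead of training a logistic regression, it simulates predicted prices that are off by a few percent and bins both the true and predicted values. The quartile-based bin edges, the 5% noise level, and the synthetic price distribution are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" prices and predictions that are off by a few percent
# (both are assumptions, not values from the Diamonds dataset).
true_price = np.exp(rng.normal(8.0, 1.0, size=5000))
pred_price = true_price * np.exp(rng.normal(0.0, 0.05, size=true_price.size))

# Three quartile edges split the prices into 4 bins labelled 0..3.
edges = np.quantile(true_price, [0.25, 0.5, 0.75])
true_bin = np.digitize(true_price, edges)
pred_bin = np.digitize(pred_price, edges)

# Confusion matrix: rows are true bins, columns are predicted bins.
confusion = np.zeros((4, 4), dtype=int)
np.add.at(confusion, (true_bin, pred_bin), 1)

accuracy = np.trace(confusion) / confusion.sum()

# Among the misclassified samples, how many are only one bin away?
wrong = true_bin != pred_bin
adjacent_share = np.mean(np.abs(true_bin - pred_bin)[wrong] == 1)

print(confusion)
print(f"accuracy: {accuracy:.3f}")
print(f"share of errors landing in an adjacent bin: {adjacent_share:.3f}")
```

Reading the off-diagonal cells next to the diagonal is what reveals this pattern. The accuracy number alone treats a near miss at a bin edge the same as a prediction that is several bins off.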

Clone repository:

git clone https://github.com/lakalex/Linear-Logistic-Regression.git

Then open the notebooks in VS Code or Jupyter Notebook.

Both notebooks document each model and its behavior on the dataset.
Both show the importance of understanding core model principles and data preprocessing, and walk through the thought process behind certain decisions.
