The Candy Recommender System is designed to predict an individual's candy preferences based on their taste profile. The project uses machine learning techniques, including clustering and a similarity-based recommender, to analyze a dataset of 85 candies and their attributes.
The dataset includes the following features:
- chocolate: Contains chocolate or not.
- fruity: Fruit-flavored or not.
- caramel: Presence of caramel.
- peanutyalmondy: Contains peanuts, peanut butter, or almonds.
- nougat: Presence of nougat.
- crispedricewafer: Contains crisped rice, wafers, or a cookie component.
- hard: Hard candy or not.
- bar: Candy bar or not.
- pluribus: One of many candies in a bag or box.
- sugarpercent: Sugar content percentile.
- pricepercent: Price percentile.
- winpercent: Popularity measure based on vote percentage.
- Correlation Matrix: A heatmap was generated to visualize the correlations between different features in the dataset.
- Pairplot: Used for exploring the relationships between features.
- Checked for null values and found none.
- Removed duplicates, resulting in a dataset with 85 entries and 13 features.
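The exploration and cleaning steps above can be sketched with pandas, assuming the data sits in a DataFrame. The rows below are hypothetical placeholders, not the real candy data, and the "name" column is an assumed label column. Passing the resulting corr matrix to seaborn's heatmap, and the numeric columns to seaborn's pairplot, would produce the two plots described.

```python
import pandas as pd

# Hypothetical toy rows (not the real dataset); "name" is an assumed label column.
df = pd.DataFrame(
    {
        "name": ["Candy A", "Candy B", "Candy C", "Candy A"],
        "chocolate": [1, 0, 1, 1],
        "fruity": [0, 1, 0, 0],
        "sugarpercent": [0.70, 0.50, 0.60, 0.70],
        "winpercent": [65.0, 45.0, 55.0, 65.0],
    }
)

# Count missing values per column.
null_counts = df.isnull().sum()

# Drop exact duplicate rows (the last row repeats "Candy A").
df = df.drop_duplicates().reset_index(drop=True)

# Pearson correlations between the numeric columns; this is the matrix a heatmap would show.
corr = df.select_dtypes(include="number").corr()

print(int(null_counts.sum()), df.shape)
print(corr.round(2))
```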
- Feature Selection: Selected 9 features related to candy attributes for clustering.
- Standardization: Applied StandardScaler to standardize the features.
- Principal Component Analysis (PCA): Reduced the dataset to 2 principal components, explaining approximately 61.19% of the variance.
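A minimal sketch of the standardization and PCA steps with scikit-learn. It assumes the nine clustering features are the nine binary attributes listed earlier (this section does not name them), and it uses random binary data as a stand-in, so the explained-variance figure will not reproduce the reported 61.19%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the nine clustering features are the binary attributes.
FEATURES = ["chocolate", "fruity", "caramel", "peanutyalmondy", "nougat",
            "crispedricewafer", "hard", "bar", "pluribus"]

# Hypothetical stand-in data: 85 candies x 9 random binary attributes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(85, len(FEATURES))).astype(float)

# Rescale each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Project the standardized data onto its first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Fraction of total variance captured by the two components together.
explained = pca.explained_variance_ratio_.sum()
print(X_pca.shape, f"{explained:.2%}")
```

Standardizing before PCA keeps any single feature from dominating the components simply because of its scale.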
- K-Means Clustering: Compared several cluster counts using the silhouette method; the optimal number of clusters was 4.
- Scatter Plot: Visualized the clusters using the two principal components.
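The silhouette method can be sketched as follows: fit K-Means for each candidate k and keep the k with the highest mean silhouette coefficient. The candidate range 2 to 8 and the synthetic make_blobs data (a stand-in for the two PCA components) are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical 2-D points standing in for the PCA-projected candies.
X, _ = make_blobs(n_samples=85, centers=4, cluster_std=0.6, random_state=42)

# Fit K-Means for each candidate k and record the mean silhouette coefficient.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is taken as the best cluster count.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Plotting the two columns of X colored by the labels of the best model (for example with matplotlib's scatter) gives the kind of cluster scatter plot described above.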
- Cosine Similarity: Calculated the cosine similarity between candies based on their attributes to recommend similar candies.
- Recommendation Function: A function was implemented to recommend candies similar to a given candy.
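A minimal sketch of a cosine-similarity recommender, assuming each candy is represented by a row of attribute values. The candy names, the attribute matrix, and the recommend helper are hypothetical. Cosine similarity compares the direction of two attribute vectors, so candies that share the same set of attributes score close to 1.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: names and binary attributes
# (columns: chocolate, fruity, caramel, peanutyalmondy, crispedricewafer, bar).
names = ["Candy A", "Candy B", "Candy C", "Candy D", "Candy E"]
X = np.array([
    [1, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 0],
], dtype=float)

# n x n matrix of cosine similarities between every pair of candies.
sim = cosine_similarity(X)


def recommend(name, n=3):
    """Return the n candies most similar to `name`, excluding `name` itself."""
    idx = names.index(name)
    # Sort candies by similarity to the query, highest first; ties keep catalogue order.
    order = np.argsort(-sim[idx], kind="stable")
    return [names[j] for j in order if j != idx][:n]


print(recommend("Candy A"))
```

This version filters the query candy out of its own results. The example list in this document includes "Nestle Crunch" itself, so the project's function may not apply that filter.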
Example: the 5 candies recommended as most similar to "Nestle Crunch":
- Kit Kat
- Nestle Crunch
- 100 Grand
- Twix
- Tootsie Roll Snack Bars
- Root Mean Square Error (RMSE): The model's performance was evaluated using RMSE, which was calculated to be approximately 0.228.
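This section does not say which true and predicted values the RMSE was computed on, so the sketch below shows only the metric itself, RMSE = sqrt((1/n) * sum over i of (y_i - ŷ_i)^2), applied to hypothetical numbers.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical values, not the project's data.
print(rmse([0.2, 0.5, 0.9], [0.3, 0.5, 0.6]))
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.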
- Clustering: The clusters showed clear distinctions between candy types, indicating that the clustering model was effective.
- Recommendations: The recommender system successfully identified similar candies based on taste profiles.
- Positive: Helps users discover new candies they might enjoy, provides personalized recommendations, and can promote health awareness by suggesting lower-sugar options.
- Negative: Could increase sugar consumption, especially among children.
- Pranav Dadu
- Yojith Mandapati
- Thien Nguyen
- Kegan Edwards
- Ninh Nguyen