This repo contains my food calorie regression project with Codeup.
The data is sourced from USDA FoodData Central.
The goal of this project is to determine which vitamins, minerals, and other nutrients are the best predictors of calories, whether their quantity matters, and how accurately calories can be predicted. A second goal is to measure how accurately a food's group can be predicted from the vitamins, minerals, and other nutrients it contains.
It is important for people to understand what they are eating and how it may affect their bodies and physical goals. It is also important that foods are marketed correctly, which makes the ability to predict a food's group imperative.
1) Are the mean calories for each food group equal?
2) Is there a relationship between protein content and calories?
3) Are carbohydrates and calories correlated?
4) What kind of relationship exists between fats and calories?
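
One common way to approach these questions is a one-way ANOVA for question 1 (null hypothesis: every food group has the same mean calories) and Pearson correlation for questions 2 to 4. The sketch below computes the ANOVA F statistic by hand and the correlations with pandas `DataFrame.corrwith`. The rows are made up for illustration and are not values from the dataset; the column names follow the data dictionary. SciPy, already in the project's stack, offers `scipy.stats.f_oneway` and `scipy.stats.pearsonr`, which return the same statistics together with p-values.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the dataset.
df = pd.DataFrame({
    "food_group": ["Fruits", "Fruits", "Fruits", "Meats", "Meats", "Meats"],
    "calories": [50.0, 60.0, 70.0, 200.0, 210.0, 220.0],
    "protein": [1.0, 2.0, 3.0, 16.0, 17.0, 18.0],
    "carbohydrate": [12.0, 14.0, 16.0, 0.0, 0.0, 0.0],
    "fat": [0.2, 0.3, 0.4, 10.0, 11.0, 12.0],
})


def one_way_anova_f(data: pd.DataFrame, group_col: str, value_col: str) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    groups = [g[value_col].to_numpy() for _, g in data.groupby(group_col)]
    grand_mean = data[value_col].mean()
    k, n = len(groups), len(data)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))


# Question 1: a large F suggests the food-group means are not all equal.
f_stat = one_way_anova_f(df, "food_group", "calories")

# Questions 2-4: Pearson correlation of each macronutrient with calories.
correlations = df[["protein", "carbohydrate", "fat"]].corrwith(df["calories"])
```

A p-value for the F statistic requires the F distribution with (k - 1, n - k) degrees of freedom, which is what `scipy.stats.f_oneway` handles.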
Target | Meaning |
---|---|
calories | calories (kcal) per 100 g of the food item |
Variable | Meaning |
---|---|
food_group | the food group the item belongs to |
fat | amount in grams |
protein | amount in grams |
carbohydrate | amount in grams |
sugars | amount in grams |
fiber | amount in grams |
saturated fats | amount in grams |
water | amount in grams |
alcohol | amount in grams |
cholesterol | amount in milligrams |
calcium | amount in milligrams |
iron | amount in milligrams |
potassium | amount in milligrams |
magnesium | amount in milligrams |
vitamin c | amount in milligrams |
vitamin e alphatocopherol | amount in milligrams |
omega 3s | amount in milligrams |
omega 6s | amount in milligrams |
phosphorus | amount in milligrams |
sodium | amount in milligrams |
zinc | amount in milligrams |
copper | amount in milligrams |
thiamin b1 | amount in milligrams |
riboflavin b2 | amount in milligrams |
niacin b3 | amount in milligrams |
vitamin b6 | amount in milligrams |
choline | amount in milligrams |
fatty acids total monounsaturated | amount in milligrams |
fatty acids total polyunsaturated | amount in milligrams |
caffeine | amount in milligrams |
theobromine | amount in milligrams |
vitamin a | amount in micrograms |
vitamin b12 | amount in micrograms |
vitamin d | amount in micrograms |
selenium | amount in micrograms |
folate b9 | amount in micrograms |
folic acid | amount in micrograms |
food folate | amount in micrograms |
folate dfe | amount in micrograms |
retinol | amount in micrograms |
carotene beta | amount in micrograms |
carotene alpha | amount in micrograms |
lycopene | amount in micrograms |
lutein + zeaxanthin | amount in micrograms |
vitamin k | amount in micrograms |
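
The target and the macronutrient columns above are reported per 100 g, so one quick sanity check on the calories column, and a simple baseline for the regression, is the general Atwater system: about 4 kcal per gram of protein and of carbohydrate, 9 kcal per gram of fat, and 7 kcal per gram of alcohol. This baseline is an illustrative addition, not part of the original project. The general factors are approximations, so estimates will not match the listed calories exactly; the name `atwater_estimate` is hypothetical.

```python
import pandas as pd

# General Atwater factors in kcal per gram. Using them as a baseline is an
# assumption for illustration, not part of the original project.
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0, "alcohol": 7.0}


def atwater_estimate(df: pd.DataFrame) -> pd.Series:
    """Estimate calories per 100 g from macronutrient grams per 100 g."""
    total = pd.Series(0.0, index=df.index)
    for column, kcal_per_g in ATWATER_KCAL_PER_G.items():
        # Missing values are treated as 0 g of that macronutrient.
        total += df[column].fillna(0) * kcal_per_g
    return total


# Residuals show how far the listed calories sit from the simple estimate:
# residuals = food["calories"] - atwater_estimate(food)
```

Any learned model should beat this estimate on validation error to justify the extra nutrients.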
- Go to the Food Data website and open the file in Google Sheets. Before the data can be read with pandas, remove the following punctuation: `(`, `)`, `:`, `-`, `,`. Also drop the `Serving Weight 1-9 description g` columns (there are nine columns with this name, numbered 1 through 9).
- From the Google Sheet Readme, "all serving sizes are in 100 grams. Use the serving size conversion weights at the end of the file to convert values. For example to convert to an ounce (28.4g) multiply each value by 0.284".
- Download from Google Sheets as a CSV.
- Clone this repo and make sure `wrangle.py` and `prepare.py` are on your local machine.
- Verify that `*.csv` is listed in `.gitignore` so the CSV file is not pushed to GitHub.
- This project uses Python 3.9.5, pandas 1.3.5, Matplotlib 3.5.0, NumPy 1.21.2, Seaborn 0.11.2, SciPy 1.7.3, and scikit-learn 1.0.1. With these installed, the notebook `report.ipynb` should run.
- Wrangle the data from the CSV file.
- Explore the data with visualizations and statistical tests.
- Build regression and clustering machine learning models using ENTER CHOSEN MODELS HERE.
- Fit on the training data and check for overfitting with the validation data.
- Select the best-performing model, evaluate it on the test data, and move it into production.
- Discuss recommendations and next steps for this project.
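
The punctuation cleanup and column drop in the setup steps are done by hand in Google Sheets. Assuming the raw export loads with `pd.read_csv`, an alternative is to apply the same cleanup in pandas right after reading it. In this minimal sketch the raw header spellings (for example `Serving Weight 1 (description) (g)`), the function name `clean_columns`, and the file name `food_data.csv` are all hypothetical; check them against the actual file.

```python
import re

import pandas as pd

# The punctuation the setup steps say to remove: ( ) : - ,
PUNCTUATION = re.compile(r"[():,\-]")
# Assumed header text for the nine serving-weight columns once punctuation is gone.
SERVING_COLUMN = re.compile(r"Serving Weight [1-9] description g")


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove the listed punctuation from headers and drop the serving-weight columns."""
    cleaned = df.rename(
        # Delete the punctuation, then collapse any leftover runs of whitespace.
        columns=lambda c: " ".join(PUNCTUATION.sub("", str(c)).split())
    )
    to_drop = [c for c in cleaned.columns if SERVING_COLUMN.fullmatch(c)]
    return cleaned.drop(columns=to_drop)


# Usage with a hypothetical file name:
# food = clean_columns(pd.read_csv("food_data.csv"))
```

Deleting the hyphen outright, rather than replacing it with a space, turns a header such as `Vitamin E (Alpha-Tocopherol) (mg)` into `Vitamin E AlphaTocopherol mg`, which lines up with the `vitamin e alphatocopherol` entry in the data dictionary.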
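
The serving-size conversion quoted from the Google Sheet README is a single scale factor: a serving of `w` grams has `w / 100` times the per-100 g values, so an ounce (28.4 g) uses 0.284. The pandas sketch below applies that factor. The function name `per_serving` is hypothetical, and the sketch assumes every numeric column is a per-100 g nutrient amount.

```python
import pandas as pd


def per_serving(df_per_100g: pd.DataFrame, serving_grams: float) -> pd.DataFrame:
    """Return a copy with every numeric column rescaled from per-100 g to per-serving.

    Assumes all numeric columns are per-100 g amounts (drop ID-like columns first).
    """
    factor = serving_grams / 100.0  # e.g. 28.4 g -> 0.284
    out = df_per_100g.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols] * factor
    return out


# Example: per-ounce values (28.4 g), matching the README's own example.
# ounce = per_serving(food, 28.4)
```

Non-numeric columns such as `food_group` pass through unchanged, and the input DataFrame is not modified.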
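
The fit, validate, and select steps in the plan can be sketched as follows. The README does not name the models yet, so this sketch uses ordinary least squares via `numpy.linalg.lstsq` purely as a stand-in and compares it with a baseline that always predicts the training mean. The 60/20/20 split, the seed, the helper names, and the synthetic demo data are all hypothetical. With scikit-learn in the stack, `sklearn.model_selection.train_test_split` and `sklearn.linear_model.LinearRegression` would be the more idiomatic tools.

```python
import numpy as np
import pandas as pd


def train_validate_test_split(df, seed=123, train_frac=0.6, validate_frac=0.2):
    """Shuffle the rows, then cut them into train / validate / test DataFrames.

    The 60/20/20 proportions and the seed are hypothetical defaults.
    """
    shuffled = df.sample(frac=1.0, random_state=seed)
    n = len(shuffled)
    train_end = int(n * train_frac)
    validate_end = train_end + int(n * validate_frac)
    return (
        shuffled.iloc[:train_end],
        shuffled.iloc[train_end:validate_end],
        shuffled.iloc[validate_end:],
    )


def rmse(y_true, y_pred):
    """Root mean squared error, in the target's units (kcal here)."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


def fit_ols(X, y):
    """Ordinary least squares with an intercept; a stand-in for the unnamed models."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Synthetic demo data only: calories built from protein and fat plus noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"protein": rng.uniform(0, 30, 100), "fat": rng.uniform(0, 30, 100)})
demo["calories"] = 4 * demo["protein"] + 9 * demo["fat"] + rng.normal(0, 5, 100)

train, validate, test = train_validate_test_split(demo)
features = ["protein", "fat"]
coef = fit_ols(train[features].to_numpy(), train["calories"].to_numpy())

# Overfitting check: a train RMSE far below the validate RMSE is a warning sign.
train_rmse = rmse(train["calories"], predict_ols(coef, train[features].to_numpy()))

# Compare candidates on validate only; the test split is held back for the winner.
validate_scores = {
    "baseline_mean": rmse(validate["calories"],
                          np.full(len(validate), train["calories"].mean())),
    "ols": rmse(validate["calories"], predict_ols(coef, validate[features].to_numpy())),
}
best_model = min(validate_scores, key=validate_scores.get)
```

Only the model in `best_model` would then be scored once on `test`, so the test error stays an honest estimate of performance on unseen foods.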