[STORY] Explore ingredients of vegetarian Indian food #1

Open
2 of 18 tasks
dharashah410 opened this issue Mar 10, 2019 · 16 comments

dharashah410 commented Mar 10, 2019

Ideation

  • Decide story pitch with domain experts
    • Does the story idea add value?
    • Is data available?

Prepare

  • Data collection and cleaning
  • Story insight
  • Get story reviewed by domain experts (Is there even a story here?)

Plan

  • Structure of the story
  • Design the visualization
  • Get story reviewed by domain experts (Does the story plan make sense?)

Execute

  • Code the visualization
  • Write the essay
  • Get story reviewed by domain experts

Publish

  • Prepare social media images
  • Publish
    • Post on site
    • Post on Instagram
    • Post on Facebook
    • Post on Twitter
@dharashah410 dharashah410 changed the title [STORY] Explore vegetarian Indian food [STORY] Explore ingredients of vegetarian Indian food Mar 10, 2019

dharashah410 commented Mar 13, 2019

1. Clean up

Hi @rjpraj, first up: let us clean up the file and column names so the data structure is readable.

ingredient_table.csv

  • Rename CSV from ingredient_table.csv to ingredients.csv
  • Change column name from Ingrident_id to id
  • Change column name from ingridents to name

recipe_name.csv

  • ID cannot start from 0
  • Rename CSV from recipe_name.csv to recipes.csv
  • Rename column recipe_id to id
  • Rename column Recipe Name to name
  • Rename column desc to description
  • Rename column cook_time to cook_time_in_minutes
  • Rename column prep_time to preparation_time_in_minutes
  • Rename column total_time to total_time_in_minutes

quantity_table.csv

  • Rename CSV from quantity_table.csv to ingredient_quantities.csv
  • Rename column Recipe_Id to recipe_id
  • Rename column Ingredient_Id to ingredient_id

nutrition.csv

  • Rename CSV from nutrition.csv to nutrition_values.csv
  • Rename column Recipe_Id to recipe_id
  • Rename column Nutrient to key
  • Rename column Values to value

method_table.csv

  • Rename CSV table from method_table.csv to steps.csv
  • Rename column Recipe_Id to recipe_id
  • Rename column Step to step
  • Rename column Desc to description
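The column renames listed above could be applied with a small header-rewriting helper. This is only a sketch using the standard `csv` module, not the script used in this project; the `COLUMN_RENAMES` mapping covers two of the files as an illustration, and the sample row is made up.

```python
import csv
import io

# Header renames from the checklist above, keyed by the NEW file name.
# Only two files are shown; the others follow the same pattern.
COLUMN_RENAMES = {
    "ingredients.csv": {"Ingrident_id": "id", "ingridents": "name"},
    "steps.csv": {"Recipe_Id": "recipe_id", "Step": "step", "Desc": "description"},
}

def rename_columns(csv_text, renames):
    """Return csv_text with its header row renamed; data rows are left untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Columns without an entry in `renames` keep their current name.
    rows[0] = [renames.get(col, col) for col in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

# Hypothetical sample data, for illustration only.
sample = "Recipe_Id,Step,Desc\n1,1,Heat the oil\n"
print(rename_columns(sample, COLUMN_RENAMES["steps.csv"]))
```

Keeping the renames in one mapping makes it easy to review them against the checklist before touching the files.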

2. Upload the Code

  • Upload your Python script

@dharashah410

Finish the above tasks. Meanwhile, I will come up with new todos. Below is a draft that I am still working on.

#5 Things we need to improve after the above todos are done

Taking https://www.tarladalal.com/Achaari-Aloo-Roll-(-Wraps-and-Rolls)-32665r as an example:

  • To the recipes table, add a column for how much the recipe makes -> Makes 4 rolls
  • Add a tags table (name) and a taggable table (recipe_id, tag_id), and store -> Rolls, Saute, Refrigerator, Non Stick Kadai, Veg
  • From the breadcrumb, store --> Starters / Snacks, Rolls as tags
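One possible shape for the tags and taggable tables is sketched below. The function name `build_tag_tables` and the recipe id `1` are hypothetical; the tag names come from the example recipe above. Tag ids start at 1, in line with the earlier note that IDs cannot start from 0.

```python
def build_tag_tables(recipe_tags):
    """Turn {recipe_id: [tag names]} into a tags table and a taggable link table."""
    tag_ids = {}    # tag name -> tag id
    tags = []       # rows of (id, name)
    taggables = []  # rows of (recipe_id, tag_id)
    for recipe_id, names in recipe_tags.items():
        for name in names:
            if name not in tag_ids:
                # Each distinct tag name is stored once and gets the next id.
                tag_ids[name] = len(tag_ids) + 1
                tags.append((tag_ids[name], name))
            link = (recipe_id, tag_ids[name])
            if link not in taggables:
                taggables.append(link)
    return tags, taggables

# Recipe id 1 is a placeholder; the tag names are from the example recipe.
tags, taggables = build_tag_tables({1: ["Rolls", "Saute", "Veg", "Rolls"]})
print(tags)
print(taggables)
```

Storing each tag once and linking it through (recipe_id, tag_id) means a tag such as "Rolls" can be shared by many recipes without repeating its name.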


rjpraj commented Mar 13, 2019

I am making the suggested changes. However, since recipe_id and ingredient_id act as foreign keys, shouldn't each primary key have a unique name, rather than both the recipes and ingredients tables using "id"? Or should I change them to "id" anyway?

@dharashah410

Yes, change it to "id" only. Also, I opened 2 issues above.


dharashah410 commented Mar 17, 2019

1. Improve the data from Tarla Dalal.com

recipe_name.csv

  • Add a new column called image_url and store the URL of the recipe image
  • Add a new column called number_of_views and store how many times the recipe was viewed --> 150491 times
  • Clean data in cook_time_in_minutes so that values contain no units (e.g. minutes or hours), and convert all values given in hours to minutes.
  • Clean data in preparation_time_in_minutes so that values contain no units (e.g. minutes or hours), and convert all values given in hours to minutes.
  • Clean data in total_time_in_minutes so that values contain no units (e.g. minutes or hours), and convert all values given in hours to minutes.
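The time clean-up for the three columns above might look like the sketch below. The accepted spellings ("hour", "hr", "min", and so on) are assumptions about how the scraped values are written, so the pattern should be checked against the real data; `to_minutes` is a hypothetical helper name.

```python
import re

# A number followed by an hour or minute unit, e.g. "1 hour", "15 mins".
_TIME_PART = re.compile(
    r"(\d+(?:\.\d+)?)\s*(hours?|hrs?|h|minutes?|mins?|m)\b", re.IGNORECASE
)

def to_minutes(text):
    """Convert strings such as '1 hour 15 minutes' to whole minutes.

    A bare number is taken to be minutes already. Returns None when
    nothing can be parsed, so such rows can be reviewed by hand.
    """
    text = text.strip()
    if re.fullmatch(r"\d+", text):
        return int(text)
    parts = _TIME_PART.findall(text)
    if not parts:
        return None
    total = 0.0
    for number, unit in parts:
        factor = 60 if unit.lower().startswith("h") else 1
        total += float(number) * factor
    return round(total)

for raw in ["1 hour 15 minutes", "20 mins", "1.5 hours", "45"]:
    print(raw, "->", to_minutes(raw))
```

Returning None for unparseable values, instead of guessing, keeps bad rows visible during review.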

quantity_table.csv

  • Add a column called "unit" and split the value in the quantity column. Units like cup, tbsp, tsp, etc. should go into the new column, and the quantity column should contain only the number

nutrition.csv

  • Add a column called "unit" and split the value in the Values column. Units like cal, g, etc. should go into the new column, and the Values column should contain only the number
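Both unit splits (quantity and nutrition value) could share one helper along these lines. It is a sketch that assumes values start with a whole number, a decimal, a fraction, or a mixed number such as "1 1/2"; `split_quantity` is a hypothetical name, and anything without a leading number is returned unchanged for manual review.

```python
import re
from fractions import Fraction

# Leading number (mixed "1 1/2", fraction "1/2", or decimal "12.5"), then the unit.
_QUANTITY = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*(.*?)\s*$")

def split_quantity(text):
    """Split '1 1/2 cups' into (1.5, 'cups'); return (None, text) if no leading number."""
    match = _QUANTITY.match(text)
    if not match:
        return None, text.strip()
    number, unit = match.groups()
    # "1 1/2" becomes Fraction(1) + Fraction(1, 2) before converting to float.
    value = float(sum(Fraction(part) for part in number.split()))
    return value, unit

for raw in ["2 tbsp", "1/2 tsp", "1 1/2 cups", "12.5 g", "a pinch"]:
    print(raw, "->", split_quantity(raw))
```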

ingredient_table.csv

2. Upload the Code and Data

  • Upload your Python script

3. Improve data from Wikipedia

ingredient_table.csv

  • Add a new column called Wikipedia link. Search Wikipedia / Wikidata to find the page for each ingredient
  • Add a new column called Wikipedia image. Store the primary image of the page in this column.

4. Upload the Code

  • Upload your Python script


rjpraj commented Mar 18, 2019

I will make changes so that the ingredients are unique in the CSV files, but I have a question: when the form of an ingredient differs, should it be treated as a different ingredient? For example, Cinnamon sticks and Cinnamon powder should be different ingredients, right?

@dharashah410

@rjpraj We want Cinnamon sticks and Cinnamon powder to be treated as the same thing, since we are analyzing the data rather than helping readers cook recipes.


dharashah410 commented Mar 29, 2019

@rjpraj

  • Green Asparagus and Asparagus are in 2 rows. It should be 1 row for Asparagus.
  • Similarly, there are 5 different rows for Raw Mangoes -- Raw Mangoes, Raw Mango Cubes, Chopped Raw Mango, Grated Raw Mangoes. All of these are the same ingredient, "Raw Mango".
  • Certain Wikipedia links for ingredients are broken. For example, there is no page for Turmeric Powder (https://en.wikipedia.org/wiki/Turmeric Powder); in that case, link to Turmeric (the core ingredient, not the derived one). Clean up other such ingredients as well.
  • There should be no Hindi names in the "name" column, e.g. Kabuli Chana (White Chick Peas).
  • Remove Hindi names in brackets from the "name" column

Did you upload all your latest Python scripts?


rjpraj commented Mar 31, 2019

I have tried to remove all the redundancy in the ingredient names, but since this is a manual and time-consuming process, I may have missed some. Please let me know here if so.


dharashah410 commented Apr 1, 2019

@rjpraj

  • Replace Chopped Onions with "Onions", Chopped Green Chillies with "Green Chillies", Grated Ginger with "Ginger", Mashed Potatoes with "Potatoes", Boiled Green Peas with "Green Peas", Sliced Ladies Finger with "Ladies Finger", Chopped Tomatoes with "Tomatoes", Paneer Cubes with "Paneer"
  • I cannot see a row for "Raw Mango". Was it merged into "Mango"? Please note: "Mango" and "Raw Mango" are 2 different core ingredients.
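The replacements above follow a pattern that could be sketched as a word filter plus a small alias table. The word list and `core_ingredient` are illustrative only and would need extending as more cases turn up. "Raw" is deliberately not stripped, so "Raw Mango" stays separate from "Mango".

```python
# Preparation words to drop from ingredient names. "raw" is intentionally
# absent so that "Raw Mango" and "Mango" remain different ingredients.
PREP_WORDS = {"chopped", "grated", "mashed", "boiled", "sliced", "cubes", "cube"}

# Names that word-stripping alone cannot merge (both cases are from this thread).
ALIASES = {"Green Asparagus": "Asparagus", "Raw Mangoes": "Raw Mango"}

def core_ingredient(name):
    """Reduce an ingredient name to its core ingredient."""
    words = [w for w in name.split() if w.lower() not in PREP_WORDS]
    cleaned = " ".join(words)
    return ALIASES.get(cleaned, cleaned)

for raw in ["Chopped Onions", "Paneer Cubes", "Grated Raw Mangoes", "Mango"]:
    print(raw, "->", core_ingredient(raw))
```

Because the cleaning is a pure function of the name, it can be re-run on the whole table whenever the word list changes, instead of editing rows by hand.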

Once this is done, replace the broken Wikipedia links and add the primary image from Wikipedia for each ingredient.


rjpraj commented Apr 4, 2019

I have made all the changes suggested. There are no more broken links and I have added the image URL for the ingredients as well.


dharashah410 commented Apr 4, 2019

  • Rename Ingredients.csv file to "ingredients.csv" (lower case)
  • Rename column Wikipedia Links to "wikipedia_links"
  • Rename column Hindi name to "hindi_name"
  • Capitalise first letter of all Hindi names
  • Rename column Image Link to "wikipedia_image"
  • Upload the latest Python script


dharashah410 commented May 7, 2019

@rjpraj

Data collection

  • Scrape data for the remaining recipes (B-Z)

Data cleanup

  • Remove duplicate ingredients from the table
  • Create a mapping array that removes words like chopped, grated, sliced, etc.
  • Ensure that Hindi names are not present in English name columns
  • Dynamically generate Wikipedia links from ingredient names
  • Figure out the pattern to match remaining Wiki URLs (later)
  • Add image URLs of primary image from Wikipedia
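Generating links from ingredient names could start from a sketch like the one below. It only builds the URL and does not check that the page exists, so broken links still need a separate check. `CORE_PAGE` holds the one override mentioned in this thread; other entries would be added as they are found, and `wikipedia_link` is a hypothetical helper name.

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"

# Derived ingredients that should point at the core ingredient's page.
# "Turmeric Powder" is from this thread; extend as broken links are found.
CORE_PAGE = {"Turmeric Powder": "Turmeric"}

def wikipedia_link(ingredient_name):
    """Build an English Wikipedia URL for an ingredient (existence is NOT verified)."""
    title = CORE_PAGE.get(ingredient_name, ingredient_name).strip()
    # Wikipedia page URLs use underscores in place of spaces.
    return WIKI_BASE + quote(title.replace(" ", "_"))

print(wikipedia_link("Turmeric Powder"))
print(wikipedia_link("Green Peas"))
```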

Scripts

  • Write the code to pull data whenever a new recipe is added.
  • Write the code to pull only newly added recipes rather than scraping all the data again
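Pulling only newly added recipes comes down to comparing the current recipe listing against what has already been scraped. The sketch below shows that comparison only, with no network code; `new_recipe_urls` and the example.com URLs are placeholders, and it assumes the recipe URL is a stable identifier.

```python
def new_recipe_urls(listed_urls, scraped_urls):
    """Return listed URLs not scraped before, keeping listing order and dropping repeats."""
    seen = set(scraped_urls)
    fresh = []
    for url in listed_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh

# Placeholder URLs, for illustration only.
already_scraped = ["https://example.com/recipe-a", "https://example.com/recipe-b"]
current_listing = [
    "https://example.com/recipe-a",
    "https://example.com/recipe-c",
    "https://example.com/recipe-b",
    "https://example.com/recipe-c",
]
print(new_recipe_urls(current_listing, already_scraped))
```

In practice the list of scraped URLs could be read from the existing recipes CSV, so no separate state file would be needed.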
