Ethical_Scrapper:

A Jupyter notebook tutorial on data collection and web scraping for a financial news site, as part 1 of an NLP pipeline series.

Notebook Contents:

Ethical Scraping:
Efficient Scraping:
Pre-Code Analysis:
   1. Examining the Source
   2. Examining the HTML
Code:
   1. Environment and Setup
   2. Imports
   3. Making a request to a single page
   4. Code Structure
   5. Getting the details of a single article
   6. Getting the details of a single page (list of articles)
   7. Saving to CSV
   8. Looping over the pages of the category (the general function)
Checking the resulting dataset
Future Improvements
Up next: Starting our NLP pipeline for this dataset
Resources
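The pipeline outlined in the contents (check what you are allowed to fetch, request a page, extract its list of articles, save to CSV, loop over the category's pages) could be sketched roughly as follows. This is not the notebook's code: it is a minimal sketch that uses only the Python standard library, and the `article-link` class name, the function names, the User-Agent string and the injected `fetch` callable are all hypothetical placeholders. The robots.txt check and the pause between requests illustrate "ethical scraping" in general terms only.

```python
import csv
import io
import time
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical identifier: send an honest User-Agent so the site can identify you.
USER_AGENT = "EthicalScraperTutorial/0.1"


def make_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text once so every URL can be checked against it."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


class ArticleListParser(HTMLParser):
    """Collect {"title", "url"} dicts from <a class="article-link"> tags.

    "article-link" is a hypothetical class name; the real selector comes
    from examining the site's HTML first.
    """

    def __init__(self) -> None:
        super().__init__()
        self.articles = []
        self._href = None  # href of the link currently being read, if any
        self._text = []    # text fragments found inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "article-link" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # normalise whitespace
            self.articles.append({"title": title, "url": self._href})
            self._href = None


def parse_page(html: str) -> list:
    """Return the list of articles found on one category page."""
    parser = ArticleListParser()
    parser.feed(html)
    parser.close()
    return parser.articles


def to_csv_text(rows: list) -> str:
    """Serialise article dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def scrape_category(fetch, page_urls, robots_txt: str, delay: float = 1.0) -> list:
    """Loop over a category's pages politely.

    `fetch` is any callable mapping a URL to its HTML; injecting it keeps
    this sketch testable without network access.
    """
    rp = make_robot_parser(robots_txt)
    results = []
    first = True
    for url in page_urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue  # respect the site's crawl rules
        if not first:
            time.sleep(delay)  # rate-limit between consecutive requests
        first = False
        results.extend(parse_page(fetch(url)))
    return results
```

If you use the `requests` library, `fetch` could be something like `lambda url: requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10).text`, and the CSV text can be written to disk with `open(path, "w", newline="", encoding="utf-8")`. Injecting `fetch` keeps the parsing and looping logic separate from the network code, which makes each step easy to check on a saved copy of a page.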

By: Elamraoui Sohayb
