A Jupyter notebook tutorial on data collection and web scraping for a financial news site, as part 1 of an NLP pipeline series
-
Ethical Scraping:
-
Efficient Scraping:
-
Pre-Code Analysis:
- Examining the Source
- Examining the HTML
-
Code:
- Environment and Setup
- Imports
- Making a request to a single page
- Code Structure
- Getting the details of a single Article
- Getting the details of a single Page (list of Articles)
- Saving to CSV
- Looping over the Pages of the Category (the general function)
-
Checking the resulting dataset
-
Future Improvements
-
Up next: Starting our NLP pipeline for this dataset
-
Resources