-
- The objective of this project was to scrape data from multiple space-related websites and display it on a webpage. After compiling the scraped data, I stored it in a Mongo database. The data was then served from the database to a webpage through a Flask server. The webpage has a fairly basic design and layout, since the project focuses on scraping, Flask servers, and Mongo databases.
-
- Jupyter Notebook
- BeautifulSoup
- Pandas
- Splinter
- MongoDB
- Flask
- HTML
-
-
-
- I used Splinter to drive the Google Chrome browser and pull the HTML from each site, then used BeautifulSoup to parse that HTML. Initially, I set up the browser to visit a site, grab the HTML, parse it, extract the data needed, and then move on to the next site. I later reordered these steps to limit the number of times the browser opened and closed, which made the process more efficient.
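
The reordering described above can be sketched as two separate phases: collect all the HTML in one browser session, then parse it afterwards. This is a minimal sketch, not the project's actual code. The function names `fetch_pages` and `parse_title`, the `urls` mapping, the headless option, and the choice of the `h1` tag are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def fetch_pages(urls):
    """Phase 1: open the browser once and collect raw HTML for every URL.

    `urls` is a hypothetical dict mapping a short name to a page URL.
    """
    # Imported here so the parsing helper below works without a browser driver.
    from splinter import Browser

    pages = {}
    browser = Browser("chrome", headless=True)
    try:
        for name, url in urls.items():
            browser.visit(url)
            pages[name] = browser.html  # keep the HTML; parse it later
    finally:
        browser.quit()  # the browser is closed exactly once
    return pages


def parse_title(html):
    """Phase 2: parse saved HTML; no browser is needed at this point."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")  # assumed tag; real sites need their own selectors
    return heading.get_text(strip=True) if heading else None
```

Keeping the parsing functions free of any browser dependency also means they can be tried against saved HTML strings without launching Chrome.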
-
- Used Flask to create a server to host the webpage.
- Used PyMongo to set up the Mongo database.
- The app file has two main routes: one for the homepage and one for the scrape function.
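
A two-route layout like the one listed above might look roughly like the following. This is a sketch under assumptions: the `create_app` factory, the inline `PAGE` template, and the `mars_db.mars` database and collection names are hypothetical, and the real project may organize its app file and templates differently.

```python
from flask import Flask, redirect, render_template_string, url_for

# Inline template so the sketch is self-contained; a real app would
# more likely keep this in a templates folder.
PAGE = """
<h1>Mars data</h1>
<a href="{{ url_for('scrape') }}">Scrape new data</a>
<pre>{{ data }}</pre>
"""


def create_app(collection):
    """Build the Flask app around a Mongo collection (or any object
    that offers a compatible find_one method)."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        # Show whatever document is currently stored, if any.
        data = collection.find_one() or {}
        return render_template_string(PAGE, data=data)

    @app.route("/scrape")
    def scrape():
        # Placeholder: the scraping call would go here before redirecting.
        return redirect(url_for("home"))

    return app


def main():
    # Assumed local MongoDB instance plus database and collection names.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    create_app(client.mars_db.mars).run(debug=True)
```

Calling `main()` starts the development server. Passing the collection into `create_app` keeps the routes independent of a live database, which makes them easier to try out in isolation.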
-
Step 3: Combining the scraping process with the Flask server, webpage, and Mongo database through route paths.
- Once the server and database were configured, I imported the scraping function into the app.py file. When the scrape button on the homepage is pressed, the user is directed to the "scrape" route, which executes the scrape function. Once the data is scraped, the Mongo database is updated and the user is redirected to the home route, which displays the scraped data.
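
The database update performed by the scrape route could be written as a small helper like this one. It is a sketch, not the project's confirmed approach: the name `refresh_mars_data` is hypothetical, and storing everything as a single upserted document is an assumption about how the data is laid out.

```python
def refresh_mars_data(collection, scrape_fn):
    """Run the scraper and store its result as the single document
    in `collection`, returning the scraped data."""
    data = scrape_fn()  # expected to return a dict of scraped values
    # An empty filter with upsert=True inserts a document on the first
    # run and updates that same document on every later run.
    collection.update_one({}, {"$set": data}, upsert=True)
    return data
```

Inside the "scrape" route, this would be called with the PyMongo collection and the imported scraping function, followed by a redirect to the home route.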
-
-
- Scraping data is useful and interesting. Given how likely a site's HTML and layout are to change, it seems better suited to one-off data retrieval than to ongoing retrieval. Flask seems like a great tool for developing a web server; however, I'd like to see and use a version of Flask that could be used in deployment, not just development.
- Images of final page:
travisb98/Mars_Scraping