pandaproject/mozfest2012
WHAT SKILL LEVELS DO WE HAVE
SPLIT INTO PAIRS
INSTALL STUFF

Why write a screen scraper?
To get data that is available, but not in structured format.

What can I scrape?
With patience, almost anything. But the more tabular the data the more straightforward it will be.

When doesn't this work?
When you can't be certain you've found all the data (search only, no predictable urls)

What is PANDA?
http://pandaproject.net/

Why put data in PANDA?
To share with your colleagues. To search it.

Tools and technologies:
Python, Node, Ruby, Scraperwiki, Mechanize

What are we going to produce today?
A script you can run to extract structured data from an unstructured website.

What we aren't going to cover:
Sessions/cookies, regular expressions, POST urls/search params, broken HTML

Question:
Does the percentage of runners who finish the race vary with wind speed?

Step 1: Explain boilerplate
How to fetch a webpage
Scraping the year

Step 2: Scraping the registered and finished runners

Step 3: Scraping the wind speed

Step 4: Scraping all the urls
Writing to a csv

Step 5: Finished script that scrapes everything
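The steps above could be sketched roughly as follows. This is only an illustration, not the session's actual script: the sample HTML, the column names, and the helper names (TableParser, scrape_rows, finish_percentage, rows_to_csv) are all assumptions, since these notes do not give the real race site or its markup. It uses only the Python standard library and parses an inline string so it runs without network access.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page. The real race-results markup is not given in the
# session notes, so this table layout is an assumption for illustration only.
# In a live run the HTML would instead come from something like
# urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
<h1>Race results 2011</h1>
<table>
<tr><th>Registered</th><th>Finished</th><th>Wind speed (mph)</th></tr>
<tr><td>1200</td><td>1080</td><td>12</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Return all table rows found in the HTML as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def finish_percentage(registered, finished):
    """Percentage of registered runners who finished the race."""
    return 100 * finished / registered


def rows_to_csv(rows):
    """Serialize rows to CSV text (a real script would write to a file)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = scrape_rows(SAMPLE_HTML)
header, data = rows[0], rows[1]
percent = finish_percentage(int(data[0]), int(data[1]))
csv_text = rows_to_csv(rows)
print(csv_text)
```

Scraping every year would repeat scrape_rows over a list of predictable urls and append each result before writing the CSV once at the end.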
About
Mozilla Festival 2012 PANDA Project Session