Scrapes movie ticket prices from m.maoyan.com. See Main.ipynb.
Since the website uses Ajax to load the page dynamically, I used Selenium and PhantomJS to run the JavaScript inside the HTML before parsing it.
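The rendering step can be sketched roughly as follows. This is a minimal illustration, not the code from Main.ipynb: the helper names `build_show_url` and `render_page` and the fixed wait time are assumptions, and `webdriver.PhantomJS` is only available in older Selenium releases (3.x and earlier), since PhantomJS support was removed in Selenium 4.

```python
import time


def build_show_url(cinema_id):
    """Build the mobile show-list URL for a cinema (hypothetical helper)."""
    return "http://m.maoyan.com/shows/{}?v=yes".format(cinema_id)


def render_page(url, wait_seconds=3):
    """Load a page in PhantomJS and return the HTML after the scripts have run.

    Assumes Selenium 3.x or earlier and a phantomjs binary on the PATH.
    The fixed sleep is a simplification; an explicit wait would be more robust.
    """
    # Imported lazily so build_show_url works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.PhantomJS()
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # give the Ajax requests time to finish
        return driver.page_source
    finally:
        driver.quit()
```

Calling `render_page(build_show_url(881))` would return the rendered HTML for cinema 881, which can then be handed to BeautifulSoup.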
Maoyan also hides the price information by using a custom font (a woff file embedded in the HTML) that maps private glyph characters to digits; for example, one such character stands for the number 5. I used the `convert` command from ImageMagick to render each glyph as a 30dp x 20dp .jpg image file (in the mapping_num folder) and recognized the digit in it with a 3-layer neural network. For the training data source and the training of the neural network, refer to Training Neural Network.ipynb inside the training folder.
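The recognition and decoding steps can be sketched as below. This is an illustration under stated assumptions, not the notebook's code: the sigmoid activation, the function names, and the example glyph mapping are all hypothetical, and pure-Python lists are used instead of numpy to keep the sketch self-contained. A 3-layer network here means input, one hidden layer, and an output layer with one unit per digit.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (assumed)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_digit(pixels, w1, b1, w2, b2):
    """Forward pass: flattened glyph pixels -> hidden layer -> output scores.

    Returns the index of the highest-scoring output unit, i.e. the digit.
    """
    hidden = layer(pixels, w1, b1)
    output = layer(hidden, w2, b2)
    return max(range(len(output)), key=output.__getitem__)


def decode_price(text, glyph_to_digit):
    """Replace each obfuscated glyph with its recognized digit.

    Characters not in the mapping (such as the decimal point) pass through.
    """
    return "".join(glyph_to_digit.get(ch, ch) for ch in text)


# Hypothetical mapping, as it might look after classifying each glyph image.
example_mapping = {"\ue001": "3", "\ue002": "5"}
decoded = decode_price("\ue001\ue002.\ue001", example_mapping)
```

Because the font file can change between page loads, the glyph-to-digit mapping would need to be rebuilt for each downloaded font rather than hard-coded.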
data.sqlite: sample data scraped from http://m.maoyan.com/shows/881?v=yes (881 is the cinema id in Maoyan)
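To inspect the sample database, something like the following may help. The table layout of data.sqlite is not described here, so this sketch only lists whatever tables exist; `list_tables` is a hypothetical helper name.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Running `list_tables("data.sqlite")` from the repository root shows which tables to query next.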
Dependencies:
- Selenium
- PhantomJS
- ImageMagick
- BeautifulSoup
- numpy
- scipy