About • Content • Getting Started • How to use • License
This project is part of an academic monograph. It collects a corpus for analysing the statistical distribution of diacritic errors in European languages with a high frequency of accented characters, and for comparing them with Brazilian Portuguese.
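To make "accent frequency" concrete, here is a minimal sketch of how the share of accented letters in a text sample could be measured. It is not the project's analysis code: the function names `has_diacritic` and `diacritic_frequency` are hypothetical, and the approach (Unicode decomposition) is an assumption. Letters that do not decompose into a base letter plus a combining mark, such as the Turkish dotless `ı`, are not counted by this method.

```python
import unicodedata


def has_diacritic(char: str) -> bool:
    """True if the character decomposes into a base letter plus a combining mark."""
    decomposed = unicodedata.normalize("NFD", char)
    return any(unicodedata.combining(c) for c in decomposed)


def diacritic_frequency(text: str) -> float:
    """Share of alphabetic characters in `text` that carry a diacritic."""
    # Compose first so that "e" + combining accent is treated as one letter.
    text = unicodedata.normalize("NFC", text)
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    marked = sum(1 for c in letters if has_diacritic(c))
    return marked / len(letters)
```

Calling `diacritic_frequency` on the scraped text of each language would give one number per corpus that can be compared across languages.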
- Scrapers written in Python3
  - Italian
  - Turkish
- Scrapers written in NodeJS
  - Hungarian
  - French
To clone and run this application, you'll need:
- Git
- Node 8+
If you have multiple Node versions installed, use nvm to select a compatible one.
- Conda for Python3
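The prerequisites above can also be checked from a small script. The following is a sketch only: the helper names (`missing_tools`, `node_major`, `check_environment`) are hypothetical and not part of this repository, and the command names are assumed to match what is installed on your PATH.

```python
import shutil
import subprocess

# Command names taken from the prerequisites list (assumed to be on PATH).
REQUIRED_TOOLS = ["git", "node", "npm", "conda", "python"]
MIN_NODE_MAJOR = 8


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def node_major(version_output):
    """Parse the major version from `node --version` output such as 'v8.11.3'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"{tool} not found on PATH" for tool in missing_tools()]
    node = shutil.which("node")
    if node is not None:
        output = subprocess.run(
            [node, "--version"], capture_output=True, text=True
        ).stdout
        if node_major(output) < MIN_NODE_MAJOR:
            problems.append(
                f"Node {MIN_NODE_MAJOR}+ required, found {output.strip()}"
            )
    return problems
```

Printing each entry returned by `check_environment()` gives a quick report before running the installation steps.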
From your command line:
# Clone this repository
$ git clone https://github.com/rvitorgomes/textCrawler tripadvisor-crawler
# Go into the repository
$ cd tripadvisor-crawler
# Check for dependencies
$ conda --version; python --version; node --version; npm --version
# Install dependencies
$ npm install
# Change directory
$ cd tcc
# Create and activate a new conda environment
$ conda create -n crawler; conda activate crawler
# Install scrapy
$ conda install scrapy
# Run a crawler and watch the magic
$ scrapy runspider tcc/italian.py
Make sure the latest WebDriver for Firefox (geckodriver.exe) is inside the project root. If it is not, download it from https://github.com/mozilla/geckodriver/releases
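A quick way to confirm the driver is in place is sketched below. The function `find_geckodriver` is hypothetical and not part of this repository. It looks in the project root first, and the fallback to a `geckodriver` on PATH is an added assumption, since the crawlers may only look in the project root.

```python
import shutil
from pathlib import Path
from typing import Optional

# File names to look for; the non-.exe name covers Linux and macOS (assumption).
DRIVER_NAMES = ("geckodriver.exe", "geckodriver")


def find_geckodriver(project_root: str = ".") -> Optional[Path]:
    """Return the path to geckodriver, or None if it cannot be found."""
    root = Path(project_root)
    for name in DRIVER_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    # Assumed fallback: a driver installed system-wide and available on PATH.
    on_path = shutil.which("geckodriver")
    return Path(on_path) if on_path else None
```

If `find_geckodriver()` returns `None` when run from the project root, download the driver from the releases page before starting a crawler.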
This project is licensed under the Unlicense.
Not for commercial use.
For academic use or citation, contact me for instructions.
GitHub @rvitorgomes
Linkedin Rubens Gomes
Email [email protected]