TripAdvisor Scraper

Scrape millions of hotel reviews.

About • Content • Getting Started • How to use • License

About

This project is part of an academic monograph. It collects a corpus for analysing the statistical distribution of diacritic errors in European languages with a high frequency of accented characters, and for comparing that distribution with Brazilian Portuguese.

Content

Scrapers written in Python 3
- Italian
- Turkish
Scrapers written in Node.js
- Hungarian
- French

Getting Started

To clone and run this application, you'll need:

Git
Node 8+. If you have several Node versions installed, manage them with nvm.
Conda (for Python 3)
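
As a rough sketch, the prerequisites can also be checked from Python before going further. The executable names below are assumptions (for example, `python` may be `python3` on some systems), and `missing_tools` is a hypothetical helper, not part of this repository.

```python
import shutil

# Tools this README relies on. The executable names are assumptions:
# adjust them if your system uses different ones (e.g. "python3").
REQUIRED_TOOLS = ["git", "node", "npm", "conda", "python"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

This only confirms that each tool is reachable; it does not check versions, so Node 8+ still has to be verified with `node --version`.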

How To Use

From your command line:

# Clone this repository
$ git clone https://github.com/rvitorgomes/textCrawler tripadvisor-crawler

# Go into the repository
$ cd tripadvisor-crawler

# Check for dependencies
$ conda --version; python --version; node --version; npm --version

# Install dependencies
$ npm install

# Change directory
$ cd tcc

# Create and activate a new conda environment
$ conda create -n crawler; activate crawler

# Install scrapy
$ conda install scrapy

# Run a crawler and watch the magic
$ scrapy runspider tcc/italian.py
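
If you prefer to launch a spider from a script, the last step can be sketched in Python as below. `runspider_command` is a hypothetical helper and the output file name `italian_reviews.json` is an assumption; only `tcc/italian.py` comes from this README. The `-o` option is Scrapy's flag for writing scraped items to a file.

```python
import shlex


def runspider_command(spider_path, output_path=None):
    """Build the argument list for `scrapy runspider`.

    `output_path` is optional; when given, scraped items are written
    to that file through Scrapy's `-o` option.
    """
    command = ["scrapy", "runspider", spider_path]
    if output_path is not None:
        command += ["-o", output_path]
    return command


# The output file name is an assumption chosen for illustration.
command = runspider_command("tcc/italian.py", output_path="italian_reviews.json")
print(shlex.join(command))
# prints: scrapy runspider tcc/italian.py -o italian_reviews.json
```

To actually run the spider, pass the list to `subprocess.run(command, check=True)` from inside the activated conda environment.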

Common Errors

Check that the latest WebDriver for Firefox (geckodriver.exe) is in the project root. If it is missing, download it from https://github.com/mozilla/geckodriver/releases
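
The check above can be sketched in Python as follows. `find_geckodriver` is a hypothetical helper, and falling back to a driver found on PATH is an assumption; this README only mentions the project root.

```python
import shutil
from pathlib import Path

# File names to look for in the project root. "geckodriver" (without
# .exe) is an assumption for non-Windows systems.
DRIVER_NAMES = ("geckodriver.exe", "geckodriver")


def find_geckodriver(project_root="."):
    """Return the path of geckodriver, or None if it cannot be found.

    The project root is searched first; as a fallback (an assumption,
    not something the project requires) PATH is searched as well.
    """
    root = Path(project_root)
    for name in DRIVER_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    on_path = shutil.which("geckodriver")
    return Path(on_path) if on_path else None


driver = find_geckodriver()
if driver is None:
    print("geckodriver not found; download it from the releases page.")
else:
    print(f"Using geckodriver at {driver}")
```

The sketch only confirms that the file exists; it does not check that the driver version matches your installed Firefox.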

License

This project is licensed under the Unlicense.
Not for commercial use.
For academic use or citation, contact me for instructions.

GitHub @rvitorgomes
LinkedIn Rubens Gomes
Email [email protected]

