Tripadvisor reviewers scraping total num of reviews #9

Open
karmyras opened this issue Apr 10, 2022 · 1 comment


@karmyras

Hi, how can I find the total number of reviews on each Tripadvisor profile?
I have around 20k profiles, but I do not know how to get the total number of reviews for each of them.

E.g.:
https://www.tripadvisor.com/Profile/davideL8413AD
https://www.tripadvisor.com/Profile/PhilB2846
https://www.tripadvisor.com/Profile/Spockiwocki
https://www.tripadvisor.com/Profile/Peterkel
https://www.tripadvisor.com/Profile/SMCP1992
https://www.tripadvisor.com/Profile/yes2luvtravel

Expected result:
davideL8413AD: 12
PhilB2846: 33
Spockiwocki: 8

etc.

Thank you in advance,
Kostas

@furas
Owner

furas commented Apr 11, 2022

Hi,

The page uses JavaScript to add elements.

Using Selenium you can visit the page with ?tab=reviews to see all reviews.

It may also need to click the Show more button, because the page shows only the first 20 reviews.

And at the start it needs to click the I Accept button to accept cookies.

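from selenium import webdriver
from selenium.webdriver.common.by import By
# Selenium 4 takes the driver path through a Service object;
# the `executable_path` argument was deprecated and later removed
from selenium.webdriver.firefox.service import Service
#from selenium.webdriver.chrome.service import Service  # use this one for Chrome
#from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.firefox import GeckoDriverManager
import time

url = 'https://www.tripadvisor.com/Profile/yes2luvtravel?tab=reviews'

#driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))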

driver.get(url)

time.sleep(3)

# accept cookies
buttons = driver.find_elements(By.XPATH, '//button[@id="onetrust-accept-btn-handler"]')
if buttons:
    print('click Accept')
    buttons[0].click()

# click `Show More` (a few times) until the button is no longer on the page
while True:
    time.sleep(3)  # wait for JavaScript to add the next reviews
    buttons = driver.find_elements(By.XPATH, '//div[@id="content"]//button')
    if not buttons:
        break
    print('click Show More')
    buttons[0].click()

# count all reviews
all_items = driver.find_elements(By.XPATH, '//div[@id="content"]//div[contains(@class, "section")]')
print('len(all_items):', len(all_items))
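
For a long list of profiles, the steps above could be wrapped in a function and called in a loop. Below is only a rough sketch of that loop, not tested against the live site. The names `profile_name`, `reviews_url`, `count_all` and the `count_reviews` argument are made up for this example; `count_reviews` stands for a function you would write yourself that runs the Selenium code above for one URL and returns the number of items.

```python
from urllib.parse import urlsplit, urlunsplit


def profile_name(url):
    """Return the last path segment of a profile URL, e.g. 'PhilB2846'."""
    return urlsplit(url).path.rstrip('/').split('/')[-1]


def reviews_url(url):
    """Return the same profile URL with the query set to tab=reviews."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, 'tab=reviews', ''))


def count_all(urls, count_reviews):
    """Call count_reviews(url) for every profile and collect name -> count.

    count_reviews is a hypothetical callable (an assumption of this sketch)
    that wraps the Selenium steps and returns an int. A failure is stored
    as None so that one broken profile does not stop a long run.
    """
    results = {}
    for url in urls:
        name = profile_name(url)
        try:
            results[name] = count_reviews(reviews_url(url))
        except Exception as ex:
            print('error:', url, ex)
            results[name] = None
    return results
```

With a real `count_reviews` you could then print the results in the format from the question, for example `for name, count in results.items(): print(f'{name}: {count}')`. With 20k profiles and a few seconds of sleep per click this will take many hours, so it may be worth saving the results to a file from time to time.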
