Question in ch2 #76

shufanzhang · 2019-07-16T09:09:15Z

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs = BeautifulSoup(html, "html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))

I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, I get 11 matches. Why are the results inconsistent?

Proteusiq · 2019-07-17T06:43:22Z

That is because of casing: you only captured 'the prince' and left out 'The prince' :) I got 11 with a similar approach using requests. You can also pass find_prince to find_all in your original code and it will work too.

import re

import requests
from bs4 import BeautifulSoup

URL = "http://www.pythonscraping.com/pages/warandpeace.html"

# Case-insensitive pattern, so both 'the prince' and 'The prince' match
find_prince = re.compile(r'the prince', re.IGNORECASE)

s = requests.Session()
r = s.get(URL)

soup = BeautifulSoup(r.content,'html5lib')

prince_found = soup.find_all(text=find_prince)

print(len(prince_found))  # 11
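
To see why the two counts differ without fetching the page, here is a minimal standard-library sketch. It is a stand-in for BeautifulSoup, not its implementation: `SAMPLE_HTML` and `TextNodeCollector` are hypothetical names invented for this illustration, and the markup is made up rather than taken from the War and Peace page. It compares an exact string match against a case-insensitive regex search over the text nodes.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup (an assumption, not the real page content).
SAMPLE_HTML = (
    "<p><span>the prince</span> spoke. "
    "<span>The prince</span> left, and <span>the prince</span> returned.</p>"
)


class TextNodeCollector(HTMLParser):
    """Collect each text node, roughly what a text filter is applied to."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


collector = TextNodeCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# Plain string filter: the node must equal 'the prince' exactly.
exact = [node for node in collector.nodes if node == "the prince"]

# Compiled regex filter: matched with search(), ignoring case.
pattern = re.compile(r"the prince", re.IGNORECASE)
ignore_case = [node for node in collector.nodes if pattern.search(node)]

print(len(exact))
print(len(ignore_case))
```

On this sample the exact filter skips the capitalized 'The prince' node, while the regex filter picks it up, which is the same kind of gap as 7 versus 11 on the real page.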
