Output.txt doesn't contain anything. #1
Comments
Which Python version are you using? |
@laxmanverma, I am using Python 3.6. |
@laxmanverma Okay, I will try with Python 2.7 and let you know. |
I tried running it on 2.7 Error Stack Trace: Print output: |
OK, I'll check and let you know.
It is possible that the HTML DOM of this site has changed.
On Mon 16 Jul, 2018, 8:02 AM ChiragSoni, <***@***.***> wrote:
I tried running it on 2.7.
It gives the error below. I also tried printing pdfName and pdfLink, which
gives the following output:
Error Stack Trace:
Traceback (most recent call last):
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 83, in <module>
    lookUp ();
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 80, in lookUp
    crawlPage ( htmlSourceCode )
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 66, in crawlPage
    urllib.urlretrieve ( pdfLink, pdfName )
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 483, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: ''
Print output:
physics-All The Best For You Board Exams-TYPE html>
<title>CBSE 12th Science Previous Year Question Papers All
Subjects</title>
[image: Jobs, Recruitment, Result, Answer Key, Admit Card, News] .pdf
<http://www.4ono.com/>
|
Yes. |
If you only want to download PDF files, search the DOM of any site for
href attributes ending in .pdf.
I'll make the script generic by next weekend.
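
As a rough illustration of the "search for href ending with .pdf" idea, here is a minimal sketch in Python 3 using only the standard library. It is not the code from scrape_pdfs.py: the names PdfLinkParser and extract_pdf_links and the example.com URLs are made up for this example. It also resolves relative links against the page URL and keeps only http/https links, on the assumption that a link without a scheme is what led urlretrieve into open_local_file in the traceback above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href="..."> links whose path ends in .pdf.

    Hypothetical helper for illustration; not part of scrape_pdfs.py.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            # Missing or empty href: nothing to download.
            return
        # Turn relative links into absolute ones so they carry a scheme.
        absolute = urljoin(self.base_url, href)
        parsed = urlparse(absolute)
        is_web_link = parsed.scheme in ("http", "https")
        is_pdf = parsed.path.lower().endswith(".pdf")
        if is_web_link and is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return the de-duplicated PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # Made-up page content and base URL, for demonstration only.
    sample = (
        '<a href="/papers/physics.pdf">Physics</a>'
        '<a href="index.html">Home</a>'
        '<a href="">Empty</a>'
    )
    for link in extract_pdf_links(sample, "http://www.example.com/cbse/"):
        print(link)
```

Each returned URL could then be passed to urllib.request.urlretrieve (urllib.urlretrieve on Python 2.7). Because this only looks at anchor tags and the URL path, it does not depend on any one site's DOM layout, but it will miss PDFs served from URLs that do not end in .pdf.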
On Mon 16 Jul, 2018, 8:12 AM ChiragSoni, <***@***.***> wrote:
Yes.
So what can I do to make this generic? Every site has its own DOM
structure, and I want to retrieve as many PDFs from the web as I can.
Can you help with that?
|
Okay thanks!! |
What exactly do you store in output.txt apart from the PDF links? The script runs successfully, but nothing is written to output.txt (even after uncommenting the lines).