Output.txt doesn't contain anything. #1

ChiragSoni95 · 2018-07-13T23:23:58Z

What exactly do you store in output.txt apart from the PDF links? The script runs successfully, but nothing is written to output.txt (even after uncommenting the lines).

laxmanverma · 2018-07-14T11:00:04Z

Which Python version are you using?
Use 2.7.

ChiragSoni95 · 2018-07-16T02:07:34Z

@laxmanverma, I am using Python 3.6.
I tried uncommenting lines 46-48 and 65-68. It just shows the hyperlink I entered, then keeps running and never stops.
I get the following output:

ChiragSoni95 · 2018-07-16T02:08:44Z

@laxmanverma Okay, I will try with Python 2.7 and let you know.
Thanks.

ChiragSoni95 · 2018-07-16T02:32:13Z

I tried running it on Python 2.7.
It gives the error below. I also tried printing pdfName and pdfLink, which gives the following output:

Error Stack Trace:
Traceback (most recent call last):
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 83, in <module>
    lookUp ();
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 80, in lookUp
    crawlPage ( htmlSourceCode )
  File "/Users/chirag/PycharmProjects/LayoutLearning/scrape_pdfs.py", line 66, in crawlPage
    urllib.urlretrieve ( pdfLink, pdfName )
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 469, in open_file
    return self.open_local_file(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 483, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: ''

Print output:
physics-All The Best For You Board Exams-TYPE html>

<title>CBSE 12th Science Previous Year Question Papers All Subjects</title>

.pdf

laxmanverma · 2018-07-16T02:40:56Z

OK, I'll check and let you know. It is possible that the HTML DOM of this site has changed.


ChiragSoni95 · 2018-07-16T02:42:15Z

Yes.
So what can I do to make this generic? Every site will have a different DOM structure, and I want to retrieve as many PDFs from the web as I can.
What can I do for that?
Can you help?

laxmanverma · 2018-07-16T02:50:29Z

If you want to download only PDF files, then search for href values ending with .pdf in the DOM of any site. I'll make it generic by next weekend.
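A minimal sketch of that idea, assuming Python 3 and only the standard library. It is not the code in this repository; the names `PdfLinkParser` and `find_pdf_links` are hypothetical and exist only for illustration. It collects every `<a href>` whose path ends in `.pdf`, resolving relative links against the page URL:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href="..."> links that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # page URL, used to resolve relative hrefs
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value.strip())
                # Compare only the URL path, so "paper.pdf?download=1" still matches.
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)


def find_pdf_links(html_source, base_url):
    """Return all PDF link URLs found in html_source, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html_source)
    return parser.pdf_links
```

Because this looks only at the href values and not at any site-specific element layout, it should not depend on a particular DOM structure. Each returned URL could then be handed to a downloader such as `urllib.request.urlretrieve`; skipping empty or non-HTTP links first would likely avoid errors like the `IOError` above.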


ChiragSoni95 · 2018-07-16T02:51:15Z

Okay thanks!!
I will try!

ChiragSoni95 changed the title ~~Beautiful Soup takes a lot of time~~ DownloadSamplePaper.py takes a lot of time Jul 13, 2018

ChiragSoni95 changed the title ~~DownloadSamplePaper.py takes a lot of time~~ Output.txt doesn't contain anything. Jul 13, 2018
