ImageFromTweet Runnable Error #9

Open
Yicong-Huang opened this issue Oct 12, 2019 · 3 comments
@Yicong-Huang
Contributor

**Describe the bug**
When launching the `ImageFromTweet` Runnable, it gives the following error:

```
SQL: select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit 100
[DATABASE] HOST = cloudberry05.ics.uci.edu, CONNECTION COUNT = 13, MAXIMUM = 100
extracting [], results = []
error: Traceback (most recent call last):
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 25, in run
    f"select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit {batch_num}")})
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 23, in <dictcomp>
    self.dumper.insert({id: self.extractor.extract(text) for id, text in
  File "/extra/yicongh10/wildfires/backend/data_preparation/extractor/tweet_media_extractor.py", line 28, in extract
    link_type: MediaURL = URLClassifier.classify(short_url)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 91, in new_function
    return timeout_wrapper(*args, **kwargs)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 150, in __call__
    return self.value
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 173, in value
    raise load
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /how-to-make-a-hydrogen-conversion-kit-at-home/ (Caused by None)
```


**To Reproduce**
1. Start `TaskManager`.
2. Start a thread for `ImageFromTweet`.
3. Set the loop time to 600 and leave the other parameters at their defaults.

**Expected behavior**
Should work without error and extract image links from tweets.

**Desktop (please complete the following information):**
 - OS: CentOS 7 

@Yicong-Huang Yicong-Huang added the bug Something isn't working label Oct 12, 2019
@JHaoX JHaoX self-assigned this Oct 22, 2019
@JHaoX

JHaoX commented Nov 2, 2019

The `requests` library raises an exception when the URL is invalid.
After fixing the invalid URL problem, more errors appeared.
They occurred together and point to a problem with a deprecated library, PhantomJS. I have downloaded PhantomJS, but I am not sure where to put it.
[Two screenshots of the error output]
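
One possible way to keep a single unreachable URL from aborting the whole batch is to catch the error per tweet instead of letting it escape the dict comprehension shown in the traceback. The sketch below is only an illustration, not the fix that was actually applied: `extract_batch` and its parameters are hypothetical names, and `extract` stands in for `self.extractor.extract`. It relies on the fact that `requests.exceptions.RequestException` derives from `IOError`, which is an alias of `OSError` in Python 3, so catching `OSError` also catches the `ConnectionError` raised by `requests`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def extract_batch(
    rows: Iterable[Tuple[int, str]],
    extract: Callable[[str], List[str]],
) -> Tuple[Dict[int, List[str]], List[int]]:
    """Run `extract` on each (id, text) row and collect failures separately.

    Hypothetical helper: `extract` stands in for the project's extractor.
    """
    results: Dict[int, List[str]] = {}
    failed_ids: List[int] = []
    for tweet_id, text in rows:
        try:
            results[tweet_id] = extract(text)
        except OSError:
            # requests' RequestException subclasses IOError/OSError, so a
            # "Max retries exceeded" ConnectionError is caught here and the
            # remaining rows are still processed.
            failed_ids.append(tweet_id)
    return results, failed_ids
```

The caller would then pass `results` to the dumper and decide separately what to do with `failed_ids`, for example logging them or marking them so they are not selected again.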

@Yicong-Huang
Contributor Author

It is fine to use PhantomJS, even though it is deprecated.

It is used to render the HTML dynamically so that we can parse the content.

@JHaoX

JHaoX commented Nov 2, 2019

I found that I didn't have PhantomJS in my local bin directory. Mentioning it in the setup instructions would be helpful.
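
Until the setup instructions cover this, a startup check could report the missing binary directly. The following is a minimal sketch under the assumption that the executable is named `phantomjs` and is expected on `PATH`; `find_phantomjs` is a hypothetical helper, not part of the project.

```python
import shutil


def find_phantomjs(executable: str = "phantomjs") -> str:
    """Return the full path of the PhantomJS binary or raise a clear error.

    Hypothetical helper; the executable name is an assumption.
    """
    # shutil.which searches the directories listed in PATH, like a shell does.
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(
            f"{executable!r} was not found on PATH; install PhantomJS and put "
            "its binary in a directory listed in PATH before starting the task"
        )
    return path
```

Calling `find_phantomjs()` before the task starts would fail early with a message that says what is missing, rather than failing later inside the rendering step.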
