Selenium support for PhantomJS has been deprecated, #7

Open
wankio opened this issue Apr 29, 2019 · 6 comments

Comments

@wankio

wankio commented Apr 29, 2019

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead

It happens when I run this:
python3 -m mediascraper.general [WEB PAGE 1] [WEB PAGE 2] ...

@elvisyjlin
Owner

Hi, even though there is a deprecation warning for PhantomJS, you can still use it. However, I guess your problem is due to some bugs in mediascraper.general. I just fixed them. Please git pull and try again. Thank you! Let me know if you have other questions.
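
Since the warning is only a deprecation notice, it can be filtered out while PhantomJS is still in use. A minimal sketch using the standard `warnings` module; the function name `silence_phantomjs_deprecation` is hypothetical and not part of this project, and the message pattern is taken from the warning text quoted above:

```python
import warnings


def silence_phantomjs_deprecation():
    """Ignore only Selenium's PhantomJS deprecation UserWarning.

    The message argument is a regular expression matched against the
    start of the warning text, so other UserWarnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message="Selenium support for PhantomJS has been deprecated",
        category=UserWarning,
    )


# Call this once before the PhantomJS web driver is created.
silence_phantomjs_deprecation()
```

This hides the message only; it does not change how the driver behaves.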

@wankio
Author

wankio commented Apr 30, 2019

After that, I get this error:

Starting PhantomJS web driver...
.\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
52 media are found.
Downloading...
0%| | 0/52 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascraper\general.py", line 14, in <module>
    scraper.download(tasks=tasks, path='download/general')
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascrapers.py", line 107, in download
    download(url, path=target_path, rename=rename, replace=force)
UnboundLocalError: local variable 'target_path' referenced before assignment

@elvisyjlin
Owner

That's another bug, triggered when there is no subfolder. I've just fixed it.
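
An `UnboundLocalError` like the one above typically appears when a local variable is assigned in only one branch and then read unconditionally. A minimal sketch of the usual remedy, assuming the path depends on an optional subfolder; `resolve_target_path` and its parameters are hypothetical names for illustration, not the project's actual code:

```python
import os


def resolve_target_path(path, subfolder=None):
    """Return the download directory, with or without a subfolder."""
    # Bind a default first so the name exists on every code path.
    target_path = path
    if subfolder:
        # Only extend the path when a subfolder was actually given.
        target_path = os.path.join(path, subfolder)
    return target_path


print(resolve_target_path("download/general"))
print(resolve_target_path("download/general", "some-page"))
```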

@wankio
Author

wankio commented May 1, 2019

Another error. Also, is there any way to create a subfolder under general named after the webpage title, or domain/webpage title? Thanks.

Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascraper\general.py", line 14, in <module>
    scraper.download(tasks=tasks, path='download/general')
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascrapers.py", line 108, in download
    download(url, path=target_path, rename=rename, replace=force)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\util\url.py", line 47, in download
    r = requests.get(url, stream=True)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 640, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 731, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGhlaWdodD0iNDgwIiB3aWR0aD0iNjQwIiB2aWV3Qm94PSIwIDAgNjQwIDQ4MCI+ICA8ZGVmcz4gICAgPGNsaXBQYXRoIGlkPSJhIj4gICAgICA8cGF0aCBmaWxsLW9wYWNpdHk9Ii42NyIgZD0iTS04NS4zMzQgMGg2ODIuNjd2NTEyaC02ODIuNjd6Ii8+ICAgIDwvY2xpcFBhdGg+ICA8L2RlZnM+ICA8ZyBmaWxsLXJ1bGU9ImV2ZW5vZGQiIGNsaXAtcGF0aD0idXJsKCNhKSIgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoODAuMDAxKSBzY2FsZSguOTM3NSkiPiAgICA8cGF0aCBmaWxsPSIjZWMwMDE1IiBkPSJNLTEyOCAwaDc2OHY1MTJoLTc2OHoiLz4gICAgPHBhdGggZD0iTTM0OS41OSAzODEuMDVsLTg5LjU3Ni02Ni44OTMtODkuMTM3IDY3LjU1IDMzLjE1Mi0xMDkuNzctODguOTczLTY3Ljc4NCAxMTAuMDgtLjk0NSAzNC4xNDItMTA5LjQ0IDM0Ljg3MyAxMDkuMTkgMTEwLjA4LjE0NC04OC41MTcgNjguNDIzIDMzLjg4NCAxMDkuNTN6IiBmaWxsPSIjZmYwIi8+ICA8L2c+PC9zdmc+'
Exception ignored in: <function tqdm.__del__ at 0x0000028023C979D8>
Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 931, in __del__
    self.close()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_monitor.py", line 52, in exit
    self.join()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\threading.py", line 1029, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
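
The `InvalidSchema` line above shows `requests.get` being handed a `data:` URI (an inline base64 SVG), which has no network location to fetch. One possible guard, sketched under the assumption that only `http` and `https` media should be downloaded, is to filter URLs by scheme first. The names `is_downloadable` and `filter_downloadable` are hypothetical and not part of this project:

```python
from urllib.parse import urlsplit

# Assumption: only these schemes can be fetched over the network.
DOWNLOADABLE_SCHEMES = {"http", "https"}


def is_downloadable(url):
    """Return True if the URL uses a scheme that can be fetched over HTTP(S)."""
    return urlsplit(url).scheme.lower() in DOWNLOADABLE_SCHEMES


def filter_downloadable(urls):
    """Keep only fetchable URLs, dropping inline ones such as data: URIs."""
    return [url for url in urls if is_downloadable(url)]


urls = [
    "https://example.com/flag.svg",
    "data:image/svg+xml;base64,PHN2Zz48L3N2Zz4=",
]
print(filter_downloadable(urls))
```

Inline images could instead be decoded and written to disk, but skipping them is the simpler choice for a sketch.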

@elvisyjlin
Owner

  1. I've modified mediascraper.general to save media in a folder named after the page title.
  2. I cannot reproduce your error. Would you mind providing more information so I can reproduce it?

@sintaxx

sintaxx commented Apr 30, 2021

Is this project still active? I'm also having similar issues. I can paste the output if I get a response.

3 participants