This looks like an error related to parallel processing or exception handling; please take a look. Crawling does not work. #30

Open
bogus215 opened this issue May 27, 2022 · 1 comment

Hello, I emailed you about an error, and since you asked me to file it as a GitHub issue, I am registering it here.

I ran into the error output shown below. My OS and Python version are pop_os_20.04 and 3.8.10. I have added my own suspicion about the cause at the very bottom of the error output, so please refer to it.

(base) root@920410e6c84d:/mnthdd/Dropbox/D/project/COVID_NEWS# python crawler.py
{'start_year': 2020, 'start_month': 1, 'start_day': 1, 'end_year': 2020, 'end_month': 5, 'end_day': 31}
정치 PID: 3057177
IT과학 PID: 3057178
economy PID: 3057179
생활문화 PID: 3057180
오피니언 PID: 3057181
사회 PID: 3057182
세계 PID: 3057183
오피니언 Urls are generated
오피니언 is collecting ...
IT과학 Urls are generated
IT과학 is collecting ...
생활문화 Urls are generated
생활문화 is collecting ...
정치 Urls are generated
정치 is collecting ...
세계 Urls are generated
세계 is collecting ...
economy Urls are generated
economy is collecting ...
사회 Urls are generated
사회 is collecting ...
Process Process-7:
Process Process-6:
Traceback (most recent call last):
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
ConnectionResetError: [Errno 104] Connection reset by peer
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-4:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-5:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-2:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-3:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed
Process Process-1:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/opt/conda/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/opt/conda/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "", line 3, in raise_from
File "/opt/conda/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/opt/conda/lib/python3.8/http/client.py", line 1344, in getresponse
response.begin()
File "/opt/conda/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/opt/conda/lib/python3.8/http/client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/conda/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/opt/conda/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/opt/conda/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 131, in get_url_data
return requests.get(url, headers={'User-Agent':'Mozilla/5.0'})
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 170, in crawling
request_content = self.get_url_data(content_url)
File "/opt/conda/lib/python3.8/site-packages/korea_news_crawler/articlecrawler.py", line 132, in get_url_data
except requests.exceptions:
TypeError: catching classes that do not inherit from BaseException is not allowed

This looked like a problem with multiprocessing and `requests.exceptions`, so I changed lines 228 and 229 of articlecrawler.py to self.crawling(category_name) and print(f"{category_name} crawling start!"), changed line 132 to a bare except:, and ran it again. This time the following error occurred.

"ResponseTimeout()"
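The `TypeError` at the end of the traceback comes from naming a module, rather than an exception class, in the `except` clause. Below is a minimal sketch of that mechanism using only the standard library. It assumes `urllib.error` (a module) as a stand-in for `requests.exceptions`, and `try_catch` is a hypothetical helper written for this illustration, not part of the crawler.

```python
import urllib.error  # a module, standing in here for `requests.exceptions`


def try_catch(handler):
    """Raise a connection error and report what `except handler:` does with it."""
    try:
        try:
            raise ConnectionResetError(104, "Connection reset by peer")
        except handler:
            return "caught"
    except TypeError as err:
        # Python raises TypeError while matching the exception against a non-class.
        return f"TypeError: {err}"


# Naming a module in `except` fails as soon as an exception has to be matched.
print(try_catch(urllib.error))
# Naming an exception class (or a tuple of classes) works as intended.
print(try_catch(OSError))
```

With requests, the class to catch would be `requests.exceptions.RequestException`, which is the base class of the `requests.exceptions.ConnectionError` shown in the traceback. A bare `except:` also avoids the `TypeError`, but it swallows `KeyboardInterrupt` as well.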

@bogus215
Author

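After looking into it further in debug mode, it seems the crawl failed because part of the HTML structure has changed. With the changes below, the article headline, press company, and publication time are collected correctly again. It would be good if the library author could take another look at the remaining parts.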

    for url in target_urls:
        request = self.get_url_data(url)
        document = BeautifulSoup(request.content, 'html.parser')

        # html - newsflash_body - type06_headline, type06
        # Collect the articles listed on each page
        temp_post = document.select('.newsflash_body .type06_headline li dl')
        temp_post.extend(document.select('.newsflash_body .type06 li dl'))
        
        # Store the URLs of the articles on each page
        post_urls = []
        for line in temp_post:
            # Append the URL of every article on this page to post_urls
            post_urls.append(line.a.get('href'))
        del temp_post

        for content_url in post_urls:  # article URL
            # Delay between requests
            sleep(0.01)

            # Fetch the article HTML
            request_content = self.get_url_data(content_url)

            try:
                document_content = BeautifulSoup(request_content.content, 'html.parser')
            except Exception:
                continue

            try:
                # Get the article headline
                tag_headline = document_content.find_all('h2', attrs={'class': 'media_end_head_headline'})
                # Initialize the headline text
                text_headline = ''
                text_headline = text_headline + ArticleParser.clear_headline(str(tag_headline[0].find_all(text=True)))
                # Skip the article if the headline is empty
                if not text_headline:
                    continue

                # Get the press company of the article
                tag_company = document_content.find_all('p', {'class': 'c_text'})

                # Initialize the press company text
                text_company = ''
                text_company = text_company + str(tag_company[0].get_text())

                # Skip the article if the press company is empty
                if not text_company:
                    continue

                # Get the publication time of the article
                time = ""
                time = time + document_content.find_all("div", attrs={'class': 'media_end_head_info_datestamp_bunch'})[0].text.strip()

                if not time:
                    continue

                # Write the row to the CSV
                writer.write_row([time, category_name, text_company, text_headline])

                del time
                del text_company, text_headline
                del tag_company 
                del tag_headline
                del request_content, document_content

            # Skip the article on any error, e.g. UnicodeEncodeError,
            # or an IndexError when a selector matches nothing
            except Exception:
                del request_content, document_content
    writer.close()
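
Since the failure here came from a layout change rather than a network error, it can help to check which of the expected class names still appear in a fetched page before parsing it. The following is a rough sketch under several assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra dependencies, `ClassCollector` and `missing_classes` are hypothetical names, and the sample HTML is made up for illustration. Only the class names themselves are taken from the code above.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_classes(html, expected):
    """Return the expected class names that do not occur anywhere in `html`."""
    collector = ClassCollector()
    collector.feed(html)
    return [name for name in expected if name not in collector.classes]


# Class names used by the selectors in the code above.
EXPECTED = [
    "media_end_head_headline",
    "c_text",
    "media_end_head_info_datestamp_bunch",
]

# Made-up page fragment in which the press company element is absent.
sample_html = """
<h2 class="media_end_head_headline">Sample headline</h2>
<div class="media_end_head_info_datestamp_bunch">2020.01.01.</div>
"""

print(missing_classes(sample_html, EXPECTED))
```

A non-empty result suggests that a selector no longer matches, which would otherwise show up only as silently skipped articles, because the `except Exception` branch discards the `IndexError`.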
