wandb: Network error (ConnectionError), entering retry loop. #83

Open
akashAD98 opened this issue Jul 22, 2022 · 4 comments

akashAD98 commented Jul 22, 2022

I'm using an NVIDIA A100 40GB GPU on an Ubuntu 20.04 server to train my object detection model with the https://github.com/WongKinYiu/yolov7 repo, but I'm not able to train the model because of a wandb issue.

I'm using a batch script to launch my job.

Error log:

YOLOR 🚀 v0.1-43-g8b72ac7 torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40537.1875MB)

Retry attempt failed:
Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/connection.py", line 72, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/socket.py", line 918, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 358, in connect
self.sock = conn = self._new_conn()
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 489, in send
resp = conn.urlopen(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 108, in call
result = self._call_fn(*args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 158, in execute
return self.client.execute(*args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/Conda/envs/yolov7/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/transport/requests.py", line 38, in execute
request = requests.post(self.url, **post_args)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "/nlsasfs/home/reflexion/chandnip/.local/lib/python3.8/site-packages/requests/adapters.py", line 565, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f93b64ab520>: Failed to establish a new connection: [Errno -2] Name or service not known'))
wandb: Network error (ConnectionError), entering retry loop.
wandb: W&B API key is configured. Use wandb login --relogin to force relogin
wandb: Network error (ConnectionError), entering retry loop.
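
The root of the chained traceback above is `socket.gaierror: [Errno -2] Name or service not known`, which means the hostname `api.wandb.ai` could not be resolved from the node running the job. A minimal sketch for checking that, assuming only the Python standard library (the helper name `can_resolve` is made up for illustration and is not part of wandb):

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` resolves to at least one address, else False.

    This repeats the lookup that fails at the bottom of the traceback
    (socket.getaddrinfo) without opening any connection.
    """
    try:
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        # Same exception class as in the log: the name lookup itself failed.
        return False
```

Running `can_resolve("api.wandb.ai")` inside the batch job, rather than on the login node, may show whether the compute node has direct DNS and internet access at all. If it returns False there, a proxy is a likely requirement.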

akashAD98 (Author) commented:

Issue #34 reports the same problem, but I was not able to solve it from there.

akashAD98 (Author) commented:

Solved the issue by adding the lines below to my .sh file.

Use the proxy address below for the wandb connection:

export http_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/

export ftp_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/

export https_proxy=http://dgx-proxy-mn.mgmt.siddhi.param:9090/
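
If editing the launch script is not convenient, the same variables can be set from Python before wandb is imported. This is a rough sketch, not the author's method: the helper name `set_proxy_env` is hypothetical, and the proxy URL is the one from this cluster, so it has to be replaced with whatever your own administrators provide. The traceback shows wandb sending its requests through `requests`, which normally honours these environment variables.

```python
import os
import urllib.request

# Proxy URL copied from the comment above; it is specific to that cluster.
PROXY_URL = "http://dgx-proxy-mn.mgmt.siddhi.param:9090/"


def set_proxy_env(proxy_url):
    """Set the three proxy variables and return the proxies Python now sees."""
    for name in ("http_proxy", "https_proxy", "ftp_proxy"):
        os.environ[name] = proxy_url
    # getproxies() maps scheme -> proxy URL, read from the *_proxy variables.
    return urllib.request.getproxies()
```

Calling `set_proxy_env(PROXY_URL)` at the top of the training script and printing the returned mapping is a quick way to confirm that the job actually sees the proxy settings.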

Vita112 commented Jun 21, 2023

Hello, I encountered a similar problem (please see the picture attached below) and tried your suggested method, but it does not seem to work for me. How can I fix this problem?
My setup: the local OS is Windows 10 Education and the remote server runs Ubuntu 20.04.2 LTS.
[attached image]
