add max_retries for requests #101

Open
veenstrajelmer opened this issue Apr 26, 2024 · 5 comments

Comments

@veenstrajelmer
Collaborator

veenstrajelmer commented Apr 26, 2024

  • ddlpy version: 0.5.0
  • Python version: 3.11
  • Operating System: Windows

Description

Sometimes, in the middle of data retrieval, the server aborts the connection. I cannot reproduce this error (and I forgot to copy the traceback), but it is very inconvenient because it interrupts the download process.

Suggestion

Add a max_retries parameter for the requests calls to improve the robustness of ddlpy.

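import logging
import requests

from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# DEBUG logging makes the retry attempts visible in the log
logging.basicConfig(level=logging.DEBUG)

s = requests.Session()
# retry up to 3 times, with increasing waits, on these server-side status codes
retries = Retry(total=3, backoff_factor=1, status_forcelist=[502, 503, 504])
s.mount('http://', HTTPAdapter(max_retries=retries))

# test URL that answers with status 503, so the retries are triggered
s.get("http://httpstat.us/503")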
veenstrajelmer changed the title from "add retries" to "add max_retries for requests" on Apr 26, 2024
@Weidav

Weidav commented May 21, 2024

I think I'm getting the same error here, still on 0.4.0 though. Here's my traceback:

Traceback (most recent call last):
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 492, in _make_request
    raise new_e
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    self._validate_conn(conn)
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1097, in _validate_conn
    conn.connect()
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 611, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 212, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='waterwebservices.rijkswaterstaat.nl', port=443): Max retries exceeded with url: /ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/rws/rwsload.py", line 454, in <module>
    dsn=sentry_dsn,
^^^^^^
  File "~/rws/rwsload.py", line 436, in main
    insertion_status = ReportsInsertionService.process_report(session=session, reports_data=result)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "~/rws/rwsload.py", line 115, in fetch_data
    except JSONDecodeError:
                       ^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/ddlpy/ddlpy.py", line 357, in measurements
    measurement = _measurements_slice(
                  ^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/ddlpy/ddlpy.py", line 301, in _measurements_slice
    resp = requests.post(endpoint["url"], json=request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/adapters.py", line 507, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='waterwebservices.rijkswaterstaat.nl', port=443): Max retries exceeded with url: /ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)'))
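Two details in this traceback are worth noting: the call is `requests.post(endpoint["url"], json=request)` with no timeout (`connect timeout=None`), and it is a POST, which urllib3's `Retry` does not retry on read errors or status codes unless the method is explicitly allowed. The following is a minimal sketch of a session that addresses both, not ddlpy's actual implementation. The helper name `make_retrying_session` and the default values are assumptions for illustration, and `allowed_methods` requires urllib3 1.26 or newer.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=1.0):
    """Return a requests.Session that retries transient failures.

    Hypothetical helper for illustration; not part of ddlpy.
    """
    retries = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=[502, 503, 504],
        # POST is not in urllib3's default set of retryable methods,
        # so it has to be allowed explicitly.
        allowed_methods=["GET", "POST"],
    )
    adapter = HTTPAdapter(max_retries=retries)
    session = requests.Session()
    # Mount for both schemes; the endpoint in the traceback uses https.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A call would then look like `session.post(url, json=request, timeout=(5, 60))`, where the connect and read timeouts of 5 and 60 seconds are placeholder values. Retrying a POST is only safe when repeating the request has no side effects, which seems plausible for a data retrieval query but is an assumption here.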

@veenstrajelmer
Collaborator Author

@Weidav that could be the case. Fixing this issue would prevent your process from being interrupted by a single timeout. There could of course also be an outage of the Rijkswaterstaat server, in which case the process will fail either way. However, this problem is difficult to fix and debug, since we have no way to simulate a single timeout on the server side. It is also a nice-to-have feature, not as essential as the recently implemented developments. If you run into this issue again, please include a minimal code example that reproduces it, if it can be reproduced at all.
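One possible way to approximate a single timeout without touching the real server is a local stand-in that stalls on its first request and answers normally afterwards. The sketch below assumes this is an acceptable substitute for testing retry behaviour. `FlakyHandler`, `post_with_retries`, the payload, and the `{"ok": true}` response body are all made up for illustration and do not reflect the Rijkswaterstaat API. It simulates a read timeout, not the connect timeout from the traceback.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class FlakyHandler(BaseHTTPRequestHandler):
    """Stall on the first `slow_left` requests, then answer with 200."""

    slow_left = 1      # number of requests that will stall
    requests_seen = 0  # total number of requests received

    def do_POST(self):
        FlakyHandler.requests_seen += 1
        # Read the request body before deciding how to respond.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if FlakyHandler.slow_left > 0:
            FlakyHandler.slow_left -= 1
            time.sleep(1.0)  # longer than the client's read timeout
            return           # close the connection without a response
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def post_with_retries(url, payload):
    """POST with a short read timeout and retries enabled for POST."""
    retries = Retry(total=3, backoff_factor=0, allowed_methods=["POST"])
    session = requests.Session()
    session.trust_env = False  # ignore proxy settings for the local server
    session.mount("http://", HTTPAdapter(max_retries=retries))
    # (connect timeout, read timeout) in seconds
    return session.post(url, json=payload, timeout=(2, 0.3))


# Port 0 lets the operating system pick a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), FlakyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

response = post_with_retries(url, {"example": "payload"})

server.shutdown()
server.server_close()
```

With `slow_left = 1`, the first attempt should hit the read timeout and the second should succeed. Raising `slow_left` above the retry total would show the failure case instead.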

@Weidav

Weidav commented May 24, 2024

This keeps happening on a regular basis.

I use selected_stations.csv to store the result from ddlpy.locations(), because that endpoint causes issues on a regular basis. This approach with the CSV file is more stable and allows me to fetch the measurements directly. I tried updating the CSV file, thinking that the stations and their available parameters might have changed, but that didn't help.

Here's a little snippet from my code:

EDIT: I updated the CSV again, and that seems to lead to fewer exceptions with measurements. I'll keep you updated.

    selected_stations = pandas.read_csv("selected_stations.csv", index_col=0)

    # measurements-timezone is always in utc+1
    one_h_ago = datetime.utcnow() - timedelta(hours=2.1)
    tomorrow = datetime.utcnow() + timedelta(days=1, hours=1)

    # iterate over my known spots
    for rws_id, spot_id in spots_dict.items():
        try:
            station = selected_stations.loc[rws_id]
        except KeyError:
            logger.info(f"spot-id: {spot_id} source_station-id: {rws_id} has no measurements")
            continue

        # when a station has only one entry, it is usually incomplete and stored as a series
        if isinstance(station, pandas.Series):
            logger.debug(f"{spot_id} measurements are incomplete and will be ignored")
            continue  # a Series has no iterrows(), so skip it as the log message says

        i = 0
        # iterate over the different measurement-types (wind, waves...) from this station
        for index, station_data in station.iterrows():
            try:
                measurements = ddlpy.measurements(
                    station_data, start_date=one_h_ago, end_date=tomorrow
                )
            except JSONDecodeError:
                continue
    [...]
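Until retries are available inside ddlpy, a caller-side wrapper is one possible workaround for a loop like the one above. This is a minimal sketch under that assumption. `call_with_retries` and its parameters are hypothetical names, not part of ddlpy or requests.

```python
import time


def call_with_retries(func, *args, attempts=3, delay=1.0,
                      exceptions=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying when one of `exceptions` is raised.

    Hypothetical helper for illustration. The wait doubles after each
    failed attempt: delay, 2 * delay, 4 * delay, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * 2 ** (attempt - 1))
```

In the loop above, the call could then become `call_with_retries(ddlpy.measurements, station_data, start_date=one_h_ago, end_date=tomorrow, exceptions=(requests.exceptions.ConnectionError, requests.exceptions.Timeout))`. The `ConnectTimeout` from the traceback is a subclass of both exception types, so it would be retried, while a real outage would still surface after the last attempt.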

@Weidav

Weidav commented May 27, 2024

Update: I keep running into the same issues, even with an up-to-date locations CSV file.

@veenstrajelmer
Collaborator Author

Could you provide example code that reproduces the issue without any of your own files or local code? That is, a minimal example that requires only ddlpy and its dependencies.
