
Request to support proxied S3 connections #386

@ZachSARAO

Hi Ludwig,

I am experimenting with accessing the RDB files via a proxy. This works:

import katdal
link = "https://archive-gw-1.kat.ac.za/1750206187/1750206187_sdp_l0.full.rdb?token=ey...etc"
data = katdal.open(link)
print(data)

But this does not:

import katdal
proxied_link = "https://archive.sarao.ac.za/s3/1750206187/1750206187_sdp_l0.full.rdb?token=ey...etc (same token)"
data = katdal.open(proxied_link)
print(data)

However, in datasources.py I found that if I pass the proxied URL with the token directly to the request, then the proxied URL works:

rdb_store = S3ChunkStore(store_url, **kwargs)
# rdb_data = rdb_store.request('GET', rdb_url, process=_read_object)
rdb_data = rdb_store.request('GET', proxied_link, process=_read_object)
telstate.load_from_file(io.BytesIO(rdb_data))
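
A quick way to check whether `rdb_url` still carries the token is to compare it with the original link component by component. This is a minimal sketch using only the standard library; the `PLACEHOLDER` token value and the `stripped` URL are illustrative assumptions (the real token is elided above, and I have not confirmed what `rdb_url` actually contains):

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Break a URL into the parts that matter for this comparison."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        "path": parts.path,
        # Record only which query parameters are present, not their values,
        # so that tokens do not end up in logs.
        "query_keys": sorted(parse_qs(parts.query)),
    }


def diff_urls(expected, actual):
    """Return {field: (expected_value, actual_value)} for fields that differ."""
    a, b = describe_url(expected), describe_url(actual)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Placeholder token value; the real token is elided in this report.
link = "https://archive.sarao.ac.za/s3/1750206187/1750206187_sdp_l0.full.rdb?token=PLACEHOLDER"
# Hypothetical URL as it might look if the query string had been dropped.
stripped = "https://archive.sarao.ac.za/s3/1750206187/1750206187_sdp_l0.full.rdb"

print(diff_urls(link, stripped))
```

Calling `diff_urls(proxied_link, rdb_url)` just before the `rdb_store.request(...)` line would show whether only the query string differs or whether the path has changed as well.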

Somewhere the request wrapper is dropping the token. Looking into this a little more, in chunkstore_s3.py:

        # Use _standard_errors to filter errors emanating from within with-block
        with self._standard_errors(chunk_name), self._session_pool() as session:
            adapter = session.get_adapter(url)
            while True:
                # Initialise and reuse the same Retry object for the entire session
                adapter.max_retries = retries
                try:
                    with _request(session, method, url, timeout, **kwargs) as response:  # <== using a URL with the token works here
                        _raise_for_status(response, chunk_name, ignored_errors)
                        retries = response.raw.retries.new()
                        return process(response)

The /s3 proxy path is stripped by the proxy server, so the claim in the token is correct without the /s3 path (I saw there was an explicit check in katdal for this).
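
To inspect what the token actually claims, the payload of a JWT can be decoded without verifying the signature. This is a minimal sketch assuming the token is a standard three-part JWT (the `ey...` prefix suggests so). The claim name `prefix` and the example token are made up for illustration; I have not checked which claim katdal compares against the path:

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload of a JWT without verifying its signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj):
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Made-up token for demonstration only; "prefix" is a hypothetical claim name.
example_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"prefix": ["1750206187"]}),
    "signature-placeholder",
])

print(jwt_claims(example_token))
```

Running `jwt_claims` on the real token and comparing the result with the URL path, both with and without the leading `/s3`, would show which form the claim expects.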

@contextlib.contextmanager
def _request(session, method, url, timeout=(None, None), **kwargs):
    """A beefed-up request that facilitates retries on reading the response.

    This catches socket timeouts and reset connections while the response data
    is being read and reraises them as appropriate urllib3 exceptions that can
    be passed to a `Retry` object to trigger a read retry.
    """
    try:
        with session.request(method, url, timeout=timeout, **kwargs) as response:  # <== session.request is not getting the token, as far as I can tell
            yield response

If I set url to the original URL (with the token param), then it works.
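
One possible explanation, which I have not confirmed, is that the token is moved out of the query string and sent some other way (for example as an `Authorization: Bearer` header) that the gateway accepts but the proxy does not forward or honour. The sketch below builds both request variants with the standard library so they can be sent to each endpoint and compared. The `PLACEHOLDER` token and the helper name `build_variants` are assumptions for illustration:

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit, parse_qs


def build_variants(link):
    """Build two GET requests: token in the query string vs. in a header."""
    parts = urlsplit(link)
    token = parse_qs(parts.query)["token"][0]
    # Same URL with the query string (and any fragment) removed.
    bare_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

    with_query = urllib.request.Request(link, method="GET")
    with_header = urllib.request.Request(
        bare_url,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
    return with_query, with_header


# Placeholder token value; substitute the real link when testing.
link = "https://archive.sarao.ac.za/s3/1750206187/1750206187_sdp_l0.full.rdb?token=PLACEHOLDER"
with_query, with_header = build_variants(link)

print(with_query.full_url)
print(with_header.full_url, with_header.get_header("Authorization"))
```

Sending each request with `urllib.request.urlopen` against both the gateway and the proxy, and comparing the status codes, would show whether the proxy only accepts the token as a query parameter.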
