matrss

@matrss matrss commented Feb 27, 2025

Previously, an unknown URL (e.g. from a different special remote) would crash the uncurl remote. Additionally, if a key had more than one URL registered, only one was tried in some cases.

Now, if the data is not found at one location or the URL's scheme is unsupported, the next URL is tried instead.

Fixes #770.

@matrss matrss marked this pull request as ready for review February 27, 2025 17:26
@matrss matrss requested a review from mih as a code owner February 27, 2025 17:26
@mih
Member

mih commented Feb 28, 2025

Thanks for the PR!

If I understand correctly, the two changes are:

(1) catch a handler crash with ValueError
(2) do not return False, so that the next URL is tried, both in case of a crash like (1) and for any "unknown" report.

Can you provide a bit more insight into the situation that triggered the undesirable behavior for you?

What handler tripped? On which kind of URL?

Maybe some configuration is too broad in the defaults.

With this change, the return value semantics of the function are a bit crippled. I want to understand whether you are pointing to a conceptual issue with the implemented logic. Thanks!

@matrss
Author

matrss commented Feb 28, 2025

In the ERA5 dataset, some files are available both from the CDS using datalad-cds as well as from a data project on JUDAC called MeteoCloud. I want to register ssh:// as well as potentially file:// URLs to make DataLad and git-annex retrieve the files from the MeteoCloud project if possible.

One of those files now looks like this:

$ pixi run git annex whereis 01/2008010101_ml.grb
whereis 01/2008010101_ml.grb (2 copies) 
  	105ff811-c804-48c4-95b2-14ad9cf78a7e -- [uncurl]
   	923e2755-e747-42f4-890a-9c921068fb82 -- [cds]

  uncurl: file:///home/icg149/Playground/2008010101_ml.grb
  uncurl: ssh://judac.fz-juelich.de/p/data1/slmet/met_data/ecmwf/era5/grib/2008/01/2008010101_ml.grb

  cds: {"dataset":"reanalysis-era5-complete","sub-selection":{"class":"ea","date":"2008-01-01","expver":"1","format":"grib","grid":".3/.3","levelist":"1/to/137","levtype":"ml","param":"129.128/130.128/131.128/132.128/133.128/135.128/138.128/152.128/155.128/203.128/246.128/247.128/248.128","stream":"oper","time":"01","type":"an"}}
  cds: cds:v1-eyJkYXRhc2V0IjoicmVhbmFseXNpcy1lcmE1LWNvbXBsZXRlIiwic3ViLXNlbGVjdGlvbiI6eyJjbGFzcyI6ImVhIiwiZGF0ZSI6IjIwMDgtMDEtMDEiLCJleHB2ZXIiOiIxIiwiZm9ybWF0IjoiZ3JpYiIsImdyaWQiOiIuMy8uMyIsImxldmVsaXN0IjoiMS90by8xMzciLCJsZXZ0eXBlIjoibWwiLCJwYXJhbSI6IjEyOS4xMjgvMTMwLjEyOC8xMzEuMTI4LzEzMi4xMjgvMTMzLjEyOC8xMzUuMTI4LzEzOC4xMjgvMTUyLjEyOC8xNTUuMTI4LzIwMy4xMjgvMjQ2LjEyOC8yNDcuMTI4LzI0OC4xMjgiLCJzdHJlYW0iOiJvcGVyIiwidGltZSI6IjAxIiwidHlwZSI6ImFuIn19
ok

Now the two issues are these:

  1. Returning False on UrlOperationsResourceUnknown means that if the file is not found at the file:// URL, the uncurl remote doesn't fall back to the ssh:// URL and instead just fails to retrieve the file.
  2. As soon as the cds: URL is encountered the uncurl remote crashes because it gets a ValueError from here:
    raise ValueError(f'unsupported URL {url!r}')

This PR fixes those two issues by skipping the URL and trying the next one instead.
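
The skip-and-continue behavior described above can be illustrated with a minimal, self-contained sketch. This is not the actual uncurl code: `ResourceUnknown` is a local stand-in for UrlOperationsResourceUnknown, and `try_urls` and `demo_handler` are hypothetical names made up for illustration.

```python
class ResourceUnknown(Exception):
    """Stand-in (assumption) for UrlOperationsResourceUnknown."""


def try_urls(urls, handler):
    """Try each URL in order; return True on the first success.

    A URL is skipped, rather than aborting the whole operation, when
    the resource is not found there or its scheme is unsupported.
    """
    for url in urls:
        try:
            handler(url)
            return True
        except ResourceUnknown:
            # Data not present at this location: try the next URL.
            continue
        except ValueError:
            # Unsupported URL (e.g. one owned by another special remote).
            continue
    # No registered URL yielded the data.
    return False


def demo_handler(url):
    """Hypothetical handler: only ssh:// URLs succeed."""
    scheme = url.split(":", 1)[0]
    if scheme not in ("file", "ssh"):
        raise ValueError(f"unsupported URL {url!r}")
    if scheme == "file":
        raise ResourceUnknown(url)
```

With this sketch, `try_urls(["cds:v1-abc", "file:///missing", "ssh://host/path"], demo_handler)` skips the first two URLs and succeeds on the third, while a list containing only the cds: URL yields False instead of raising.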

Development

Successfully merging this pull request may close these issues.

uncurl special remote fails when a key has a URL from a different special remote that it does not understand