Reference X failed to fetch target #545

Open
dkczk opened this issue Feb 24, 2025 · 1 comment

dkczk commented Feb 24, 2025

Error Description

Hi folks,

I followed kerchunk's quick start for a single file but for some reason, reading the JSON file containing the chunk info at the end results in a "ReferenceNotReachable" error:

ReferenceNotReachable: Reference "lat/0" failed to fetch target ['c_gls_LAI300_201402280000_GLOBE_PROBAV_V1.0.1.nc', 972117, 376320]

At first glance this looks like a path issue, but I already tried opening the dataset using the same paths, and that worked (see the steps below).

Code

Setup

I have to connect to two network drives. The first is the file storage containing the NetCDF file I want to chunk; the second should store the JSON file containing the chunk info. I connect to both via the SMB protocol.

import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
import ujson
import json
from pathlib import Path


host_data = "data.storage.server.com/path/to/data_dir"
host_cube = "json.storage.server.com/path/to/json_dir"
usr = "MyUsername"
pwd = "MyPassword"

fs = fsspec.filesystem(
    "smb",
    host=host_data,
    username=usr,
    password=pwd
    )

fs_2 = fsspec.filesystem(
    "smb",
    host=host_cube,
    username=usr,
    password=pwd
)

To confirm that the connection works, I open/read the desired file with:

file = "c_gls_LAI300_201402280000_GLOBE_PROBAV_V1.0.1.nc"
with fs.open(file, mode="rb") as f:
    data = xr.open_dataset(f)

data

I then follow the steps proposed in the guide with slight variations, e.g. defining the name of the JSON file up front, since I only want to test the procedure:

def gen_json(file_url):
    with fs.open(file_url, **so) as infile:
        h5chunks = SingleHdf5ToZarr(infile, file_url, inline_threshold=500)

        with fs_2.open(outf, 'wb') as f:
            f.write(ujson.dumps(h5chunks.translate()).encode())

so = dict(mode='rb')
outf = "CGLS_LAI_20140228.json"

gen_json(file)

After checking the location I can confirm that the JSON file is there. So I proceed with loading the JSON into a variable:

with fs_2.open(outf) as f:
    reference = json.load(f)

The last step is to open the dataset using the loaded JSON references. Following the guide, with some adaptation for my SMB case, I did:

data_chunk = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": reference,
            "remote_protocoll": "smb",
            "remote_options": {
                "host": host_data,
                "username": usr,
                "password": pwd
            }
        }
    }
)

But after executing this I get the error from above.
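One thing that can be checked mechanically is whether every key in "storage_options" is spelled the way the reference filesystem expects. The sketch below is hypothetical helper code, not part of fsspec or kerchunk. The list of option names is an assumption based on the `ReferenceFileSystem` constructor as I understand it and should be verified against the installed fsspec version. As far as I know, unrecognized keyword arguments are accepted without an error, so a misspelled key may be ignored silently.

```python
import difflib

# Assumption: keyword names accepted by fsspec's ReferenceFileSystem.
# Verify this list against the fsspec version that is actually installed.
KNOWN_REFERENCE_OPTIONS = sorted([
    "fo", "target", "ref_storage_args", "target_protocol", "target_options",
    "remote_protocol", "remote_options", "fs", "template_overrides",
    "simple_templates", "max_gap", "max_block", "cache_size",
])


def find_unknown_options(storage_options):
    """Map each unrecognized key to its closest known option name (or None)."""
    suggestions = {}
    for key in storage_options:
        if key not in KNOWN_REFERENCE_OPTIONS:
            close = difflib.get_close_matches(key, KNOWN_REFERENCE_OPTIONS, n=1)
            suggestions[key] = close[0] if close else None
    return suggestions


# The keys used in the open_dataset call above (values are placeholders).
options = {"fo": {}, "remote_protocoll": "smb", "remote_options": {}}
for bad_key, suggestion in find_unknown_options(options).items():
    print(f"unknown option {bad_key!r}; did you mean {suggestion!r}?")
```

Run against the keys from the call above, this reports "remote_protocoll" and suggests "remote_protocol". If that key is dropped, the remote protocol would presumably fall back to whatever fsspec infers by default, which may not be SMB.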

Debugging Attempts

I checked two things. I opened the file directly, as at the beginning, and I looked through the JSON for incorrect paths, but found nothing wrong.
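The path check on the JSON can also be done programmatically rather than by eye. This is a minimal sketch, assuming the kerchunk version 1 layout in which chunk references sit under a "refs" key and each non-inline entry is a list of the form [url, offset, length] or [url]. The function names are made up for illustration, and the sample dictionary only mirrors the target from the error message.

```python
from collections import Counter


def summarize_targets(reference):
    """Count how many chunk references point at each target URL."""
    # Version 1 reference sets nest the mapping under "refs";
    # fall back to the top level for the older flat layout.
    refs = reference.get("refs", reference)
    counts = Counter()
    for value in refs.values():
        # Inline data is stored as a string; only lists refer to remote bytes.
        if isinstance(value, list) and value and isinstance(value[0], str):
            counts[value[0]] += 1
    return dict(counts)


def has_protocol(url):
    """True if the target carries an explicit protocol such as smb://."""
    return "://" in url


# Small sample shaped like the failing reference from the error message.
sample = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "lat/0": ["c_gls_LAI300_201402280000_GLOBE_PROBAV_V1.0.1.nc", 972117, 376320],
    },
}
for url, n_chunks in summarize_targets(sample).items():
    print(url, n_chunks, "explicit protocol" if has_protocol(url) else "no protocol")
```

Targets without an explicit protocol, like the bare file name in the error message, have to be resolved by whichever remote filesystem the reference filesystem ends up using, so it is worth confirming that this is the SMB one.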

File In Question

The file I'm trying to open is a Copernicus Global Land Service Leaf Area Index file. It has two dimensions (lon, lat), two coordinate axes (lon, lat), six data variables and two Pandas indexes.

System Info

OS

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"

Python

Version=3.9.19

Relevant Miniconda Packages

Name   Version   Build Channel
fsspec   2024.6.1   pyhff2d567_0   conda-forge
h5netcdf   1.3.0   pyhd8ed1ab_0   conda-forge
h5py   3.8.0   mpi_mpich_py39hadaddcd_0   conda-forge
hdf4   4.2.15   h9772cbc_5   conda-forge
hdf5   1.12.2   mpi_mpich_h5d83325_0   conda-forge
kerchunk   0.2.7   pyhd8ed1ab_0   conda-forge
netcdf4   1.6.3   nompi_py39h8b3a7bc_100   conda-forge
smbprotocol   1.15.0   pyhd8ed1ab_0   conda-forge
ujson   5.10.0   py39h84cc369_0   conda-forge
xarray   2024.7.0   pyhd8ed1ab_0   conda-forge
zarr   2.18.2   pyhd8ed1ab_0   conda-forge

@martindurant
Member

It's probably worth making the reference filesystem explicitly for debugging:

fs = fsspec.filesystem("reference", fo=reference, remote_protocol=..., remote_options=...)

and checking which files it thinks it can see and read. You can also look at the .fss attribute to see which filesystems it's trying to operate on.

It's also a good idea to turn on logging, something like:

fsspec.utils.setup_logging(logger_name="fsspec.reference")
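For anyone who prefers to configure logging by hand, the following is a rough standard-library sketch of the same idea. It assumes, as the call above implies, that the reference filesystem logs under the name "fsspec.reference". The helper name, handler and message format are illustrative choices and not necessarily what `setup_logging` installs.

```python
import logging
import sys


def enable_debug_logging(logger_name="fsspec.reference"):
    """Send DEBUG-level records from the named logger to stderr."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = enable_debug_logging()
```

With this in place, repeating the failing `xr.open_dataset` call should show any debug messages the reference filesystem emits while fetching chunks.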

Was there further information in the exception/traceback?
