Segmentation fault when reading string variable over DAP4 #3042

Open
lsterzinger opened this issue Oct 10, 2024 · 11 comments

@lsterzinger

Hello Unidata/NetCDF folks! A few of us at NASA GES DISC are running into a failure in netcdf-c when accessing a string variable over DAP4 from NASA's cloud-hosted OPeNDAP server. This seems to be an issue only when reading via DAP4; using netcdf to read the same file directly from disk works fine.

Authentication

If not already set up, an account is needed at https://uat.urs.earthdata.nasa.gov/ and the following line added to ~/.netrc in the user's home folder:

machine uat.urs.earthdata.nasa.gov login <username> password <password>
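
Before debugging the DAP4 read itself, it can help to confirm that the credentials file parses and contains an entry for the login host. The following is a minimal sketch using Python's standard `netrc` module; the helper name `has_netrc_entry` is hypothetical and not part of netCDF4 or any Earthdata tooling.

```python
import netrc


def has_netrc_entry(host, path=None):
    """Return True if the netrc file holds a login and password for `host`.

    `path=None` makes the standard library look at ~/.netrc.
    A missing, unreadable, or unparsable file counts as "no entry".
    """
    try:
        auth = netrc.netrc(path).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        return False
    # authenticators() returns (login, account, password) or None.
    return auth is not None and bool(auth[0]) and bool(auth[2])
```

Calling `has_netrc_entry("uat.urs.earthdata.nasa.gov")` checks the default `~/.netrc`. Note that `authenticators()` also falls back to a `default` entry if the file defines one.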

Issue

We came across this issue when accessing the following DAP4 URL via netcdf-python: dap4://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3%3ASNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc

import netCDF4 as nc4
f = nc4.Dataset("dap4://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3%3ASNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc")

One of the variables causing issues is obs_id, which is an array of strings.

In [3]: f['obs_id']
Out[3]:
<class 'netCDF4._netCDF4.Variable'>
vlen obs_id(atrack, xtrack)
    long_name: earth view observation id
    coverage_content_type: referenceInformation
    description: unique earth view observation identifier: yyyymmddThhmm.aa[a]Exx .  Includes gran_id plus two- or three-digit along-track index (01-45 or 001-135) and 2-digit cross-track index (01-96).
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (135, 96)

Trying to access the data from obs_id results in a segmentation fault:

In [4]: f['obs_id'][:]
Assertion failed: ((p)->offset+(size) <= (p)->limit), function NCD4_incr, file d4util.c, line 421.
[1]    17238 abort      ipython
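
The assertion message states the invariant that failed: advancing a read position by `size` must not move it past the end of the buffer (`offset + size <= limit`). The sketch below illustrates that invariant in Python. It is not the netcdf-c code, and it does not diagnose the cause of this crash. The `Cursor` class and the length-prefixed string framing (an 8-byte little-endian count followed by the bytes) are assumptions chosen only to show how a decoder can trip such a check when the data is shorter than a length field claims.

```python
import struct


class Cursor:
    """Bounds-checked read position over a bytes buffer (illustrative only)."""

    def __init__(self, data):
        self.data = data
        self.offset = 0
        self.limit = len(data)

    def incr(self, size):
        # Same condition as in the assertion message: offset + size <= limit.
        if self.offset + size > self.limit:
            raise ValueError(
                f"read of {size} bytes at offset {self.offset} "
                f"exceeds limit {self.limit}"
            )
        start = self.offset
        self.offset += size
        return self.data[start:self.offset]


def pack_strings(strings):
    """Encode strings with the assumed framing: 8-byte length, then UTF-8 bytes."""
    parts = []
    for text in strings:
        raw = text.encode("utf-8")
        parts.append(struct.pack("<Q", len(raw)) + raw)
    return b"".join(parts)


def read_strings(data, count):
    """Decode `count` length-prefixed strings, failing on any overrun."""
    cur = Cursor(data)
    out = []
    for _ in range(count):
        (nbytes,) = struct.unpack("<Q", cur.incr(8))
        out.append(cur.incr(nbytes).decode("utf-8"))
    return out
```

With this framing, `read_strings(pack_strings(["ab", "cde"]), 2)` round-trips, while asking for more strings than the buffer holds, or passing a truncated buffer, raises `ValueError` instead of aborting the process.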

Resources:

OPeNDAP landing page for this file: https://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3%3ASNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc.html

Raw file (in case of authentication issues or for local testing): SNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc.zip

I hope this is enough information but please let me know if anything else is needed.

Tagging @j-m-adams @eni-awowale

@lsterzinger
Author

I'm not really sure if this is some issue of DAP4 encoding on the server-side, or decoding proper DAP4 on the client side (or some combination of both). Any input would be greatly appreciated. Thanks!

@lsterzinger
Author

Forgot to include my versions:

  • netcdf library version 4.9.2 of Jul 5 2023 23:51:08 $
  • Python: 3.11.9
  • netcdf-python: 1.6.5
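
For anyone reproducing this, the same three versions can be collected in one step. This is a small sketch; `collect_versions` is a hypothetical helper name, while `netCDF4.__version__` and `netCDF4.getlibversion()` are the attributes netcdf4-python exposes for the package and linked C library versions.

```python
import platform


def collect_versions():
    """Return the Python, netcdf4-python, and netcdf-c versions as a dict.

    The netCDF entries are None when netCDF4 is not importable.
    """
    info = {
        "python": platform.python_version(),
        "netcdf4-python": None,
        "netcdf-c": None,
    }
    try:
        import netCDF4
    except ImportError:
        return info
    info["netcdf4-python"] = netCDF4.__version__
    # getlibversion() reports the netcdf-c library the module is linked to.
    info["netcdf-c"] = netCDF4.getlibversion()
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```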

@eni-awowale

I am also getting the same error with these versions:

  • netcdf-c: '4.9.3-development of Oct 24 2023 19:20:46 $'
  • netcdf4-python==1.6.5
  • Python: 3.12.5

And

  • netcdf-c: '4.8.1 of Oct 10 2024 19:24:58 $'
  • netcdf4-python==1.6.5
  • Python: 3.12.7

@DennisHeimbigner
Collaborator

Unfortunately, my attempt to register fails repeatedly with a recapture invalid key failure.
So I will not be able to help. Sorry.

@j-m-adams

j-m-adams commented Oct 22, 2024

Hi, Dennis -- I'm not sure what a recapture invalid key failure means, but it might be because the URL we provided was in our testing environment (".uat." in the URL), which may not be publicly accessible. Could you please try again with the production version of the URL copied below? Note that the opendap.earthdata.nasa.gov endpoint always requires authentication with an Earthdata Login. We are struggling to support some newer data types that are not DAP2 compatible and we'd really like to promote the dap4:// protocol to our opendap users. Thank you for your help. --Jennifer Adams

dap4://opendap.earthdata.nasa.gov/collections/C2559919298-GES_DISC/granules/SNDRSNML2RMS.3%3ASNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc

Edit: there is another data type (ubyte) that also causes a seg fault -- please also test the variable named 'obs_id'. Thanks

@DennisHeimbigner
Collaborator

I finally got the urs login problem solved.
I tried to replicate your problem but failed.
I tried several different commands using ncdump from the current main branch.
All of these appear to work:

  • .ncdump -v obs_id 'dap4://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3:SNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc'
  • ncdump 'dap4://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3:SNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc?dap4.ce=obs_id'
  • ncdump 'dap4://opendap.uat.earthdata.nasa.gov/collections/C1256351857-GES_DISC/granules/SNDRSNML2RMS.3:SNDR.SNPP.ATMS.20111210T0154.m06.g020.L2_RAMSES2_RET.std.v03_21.G.231124163951.nc?dap4.ce=obs_id[0][0]'
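
Note that these commands write the granule name with a literal `:` where the original report used the percent-encoded form `%3A`, and two of them append a `dap4.ce` constraint. The sketch below shows how the two spellings relate and how such a constraint URL can be assembled. The helper name `with_constraint` is hypothetical, and whether the server or client treats both spellings identically is not something this thread establishes.

```python
from urllib.parse import quote, unquote

BASE = (
    "dap4://opendap.uat.earthdata.nasa.gov/collections/"
    "C1256351857-GES_DISC/granules/"
)
GRANULE = (
    "SNDRSNML2RMS.3:SNDR.SNPP.ATMS.20111210T0154.m06.g020."
    "L2_RAMSES2_RET.std.v03_21.G.231124163951.nc"
)


def with_constraint(url, constraint):
    """Append a DAP4 constraint expression in the ?dap4.ce= form used above."""
    return f"{url}?dap4.ce={constraint}"


# Percent-encoding the granule name turns ':' into '%3A' and leaves the
# letters, digits, '.', and '_' unchanged; unquote() reverses it.
encoded = quote(GRANULE, safe="")
assert unquote(encoded) == GRANULE

print(BASE + encoded)
print(with_constraint(BASE + GRANULE, "obs_id[0][0]"))
```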

Do any of these commands correspond with what you are doing?

@j-m-adams

j-m-adams commented Oct 31, 2024

Very glad to see that ncdump is working! We are using python packages (xarray/datatree and netCDF4) to open those dap4:// urls, but are struggling to find anything that is linked with the 4.9.3 release candidate. Is there a dev version of any of these python packages that we can use to test the 4.9.3 release candidate?

@DennisHeimbigner
Collaborator

I am not all that familiar with the python/netcdf community. My guess is that there is a way to install 4.9.3
and have python/netcdf4 use that installation. But I do not know the details.

@WardF
Member

WardF commented Nov 1, 2024

I will second Dennis' comment; @dopplershift can you provide any guidance as to how to easily hook the latest netCDF-C release candidate into a python ecosystem?

@dopplershift
Member

As we've discussed previously, the first step is to get a build of the RC up on the conda-forge feedstock. When I attempted this previously, I ran into numerous build issues that we didn't get solved before the official release. Happy to be a part of getting that going over there if someone is prepared to look at any issues we encounter in doing so.

Depending on how tightly pinned the conda-forge version of netcdf4-python is to a version of libnetcdf, we may be able to just install the RC and have netcdf4-python use it, but I'm not sure about that yet.

@eni-awowale

Thanks @DennisHeimbigner and everyone for the engagement! We built the netcdf-c library in main and we are no longer getting segfaults with the 4.9.4-development version 🎉. We were able to link it to the python library by pip installing it.

With python

>>> import netCDF4 as nc4
>>> nc4.getlibversion()
'4.9.4-development of Oct  7 2024 10:41:37 $'

With ncdump

$ ncdump -h
ncdump [-c|-h] [-v ...] [[-b|-f] [c|f]] [-l len] [-n name] [-p n[,n]] [-k] [-x] [-s] [-t|-i] [-g ...] [-w] [-F] [-Ln] file
...
...
netcdf library version 4.9.4-development of Nov 13 2024 20:48:37 $

I am guessing that the netcdf4-python library is pulling from our built version of the netcdf-c library, but I wanted to confirm that this is correct. Additionally, do you all know when the python library is going to be released with the 4.9.4-development version of the netcdf-c library? Happy to contribute time to help with this.
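
One way to probe that question is to compare the two version strings shown above field by field. The sketch below parses the version label and the build timestamp out of each string; `parse_libversion` and `same_build` are hypothetical helper names, and the string layout is assumed from the two outputs in this comment. It relies on English month abbreviations, so `strptime` needs a C or English locale.

```python
import re
from datetime import datetime

# Matches "<version> of <Mon> <day> <year> <HH:MM:SS>" anywhere in the text.
_LIBVERSION = re.compile(
    r"(?P<version>\S+) of "
    r"(?P<built>[A-Za-z]{3}\s+\d{1,2} \d{4} \d{2}:\d{2}:\d{2})"
)


def parse_libversion(text):
    """Return (version_label, build_datetime) from a netcdf-c version string."""
    match = _LIBVERSION.search(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # Collapse the padded day ("Oct  7") before parsing the timestamp.
    built = " ".join(match["built"].split())
    return match["version"], datetime.strptime(built, "%b %d %Y %H:%M:%S")


def same_build(first, second):
    """True only if both the version label and the build timestamp agree."""
    return parse_libversion(first) == parse_libversion(second)


from_python = "4.9.4-development of Oct  7 2024 10:41:37 $"
from_ncdump = "netcdf library version 4.9.4-development of Nov 13 2024 20:48:37 $"

print(parse_libversion(from_python))
print(parse_libversion(from_ncdump))
print(same_build(from_python, from_ncdump))
```

For the two strings quoted here, the version labels match but the build timestamps differ (Oct 7 versus Nov 13), so `same_build` returns `False`. One possible reading is that Python and `ncdump` are picking up two different builds of the library, though the thread does not confirm this.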

Thanks again everyone!
