
Unpacked values from scaled/offset dataset are all NaN. #1312

Closed
rschmunk opened this issue Mar 5, 2024 · 6 comments · Fixed by #1315
Labels
bug Something isn't working

Comments

@rschmunk
Contributor

rschmunk commented Mar 5, 2024

Versions impacted by the bug

Any build of version 5.5.4 from the past month.

What went wrong?

A user wrote that a plot of his data was coming up blank when using a just-downloaded Panoply. A quick check revealed that Panoply was seeing all NaN values even though an ncdump showed good values. Some further testing demonstrated that this was also occurring with a copy of Panoply from a month ago, but that a copy from the very end of December 2023 was plotting as expected. In all three cases, the Panoply release was using an up-to-date build of the netcdf-Java 5.x-maint snapshot.

The dataset packs the data as shorts and specifies a scale value and an offset. Panoply acquires the dataset in enhanced mode and accepts whatever values netCDF-Java says are there.
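For reference, CF-style packing applies a linear transform to each stored short when the dataset is read in enhanced mode. A minimal sketch of the unpack arithmetic (the magnitudes below are hypothetical stand-ins, not values taken from the dataset):

```java
// Sketch of CF scale/offset unpacking: unpacked = packed * scale_factor + add_offset.
// The scaleFactor and addOffset values are hypothetical, chosen only to be
// "small" like the ones described later in this thread.
public class UnpackSketch {
  static double unpack(short packed, double scaleFactor, double addOffset) {
    return packed * scaleFactor + addOffset;
  }

  public static void main(String[] args) {
    double scaleFactor = 1e-12;
    double addOffset = 1e-7;
    short packed = 20000;
    System.out.println(unpack(packed, scaleFactor, addOffset)); // about 1.2e-7
  }
}
```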

If I modify Panoply to acquire the dataset in NOT enhanced mode, then netCDF-Java reports the raw packed short values and a plot is successfully made. If I build a copy of Panoply using a copy of netcdfAll that I built on December 28 and tell it to acquire the dataset in enhanced mode, the unpacked data values are correctly reported and a plot is successfully made.

Looking at the commit history on the 5.x-maint branch, it looks like there were some modifications to data unpacking made around January 12, which seems a likely point for the problem to have been introduced. Nothing in those changes jumps out at me as the obvious cause, but I haven't dug very deep yet.

Not sure if it's at all meaningful, but the scale and offset values are reeeaally small.

A copy of the example dataset that revealed the issue can be downloaded from Dropbox. It's a CF-1.6 dataset with lon-lat data on a relatively small array covering the region around Mexico City.

Relevant stack trace

No response

Relevant log messages

No response

If you have an example file that you can share, please attach it to this issue.

If so, may we include it in our test datasets to help ensure the bug does not return once fixed?
Note: the test datasets are publicly accessible without restriction.

Yes

Code of Conduct

  • I agree to follow the UCAR/Unidata Code of Conduct
@rschmunk rschmunk added the bug Something isn't working label Mar 5, 2024
@haileyajohnson haileyajohnson self-assigned this Mar 5, 2024
@haileyajohnson

@rschmunk just to clarify, the place where Panoply is getting incorrect data is from a call to NetcdfDatasets.openDataset(filename) with enhanced=all?

@rschmunk
Contributor Author

rschmunk commented Mar 8, 2024

Actually

NetcdfFile njfile = NetcdfFiles.open(filePathStr);
NetcdfDataset njdataset = NetcdfDatasets.enhance(njfile, ENHANCEMENT, null);

where ENHANCEMENT is the default enhancement plus allowing for an "incomplete" coordinate system.

If you're wondering why I acquire the dataset in two steps, it's actually because there's also a step where I acquire an unenhanced version in order to see the original CDL description.

@rschmunk
Contributor Author

rschmunk commented Mar 9, 2024

@haileyajohnson, I tracked this down, and it has to do with the changes made to the ConvertMissing filter on January 12. I'm not sure which commit in particular was the coup de grâce, as there were about 6 of them that day.

TL;DR - the unpacked data values from the sample dataset are all much smaller than Misc.defaultMaxRelativeDiffFloat.

Long version...

As noted above, the scale and offset values in the sample dataset are really small. Tiny even. The scale factor is of order 1E-12 and the offset of order 1E-7, and after unpacking, the output data results are of order 1E-8.

The dataset also specifies that the _FillValue is the same as the missing_value. Unpacked, the fill/missing value is ~ 0 (actually, also of order 1E-12).

The problem, then, is that ConvertMissing.isMissing receives an unpacked data value and checks whether it is less than Misc.defaultMaxRelativeDiffFloat away from the unpacked fill value. Since that default difference is of order 1E-5, the method of course returns true, every value in the array is deemed invalid, and the convert method sends back NaN for the entire array.

So it seems like a smaller difference value needs to be used in the test when dealing with such tiny scale factors and offsets.
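The failure mode described above can be reproduced with a toy version of the check. This is a sketch of the comparison as described in this thread, not the actual ConvertMissing code; the 1e-5 threshold is only a stand-in for Misc.defaultMaxRelativeDiffFloat:

```java
// Toy reproduction of the described bug: a tolerance of ~1e-5 applied to data
// whose unpacked values are all of order 1e-8 classifies every value as missing.
public class MissingCheckSketch {
  static final double TOLERANCE = 1e-5; // stand-in for Misc.defaultMaxRelativeDiffFloat

  // Sketch of the comparison as described above (not the real ConvertMissing code).
  static boolean isMissing(double value, double fillValue) {
    return Math.abs(value - fillValue) < TOLERANCE;
  }

  public static void main(String[] args) {
    double fill = 1e-12;                 // unpacked fill/missing value, ~0
    double[] data = {1e-8, 2e-8, 5e-8};  // unpacked data values, order 1e-8
    for (double v : data) {
      System.out.println(v + " -> missing? " + isMissing(v, fill)); // all true
    }
  }
}
```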

@haileyajohnson

Thanks for tracking that down! We did change the fuzzy-equals comparison to use a relative diff instead of an absolute diff, and that's what broke this case; I'm planning on reverting that change. But another issue, I think, is that we need to do the missing and fill value comparisons with unpacked data, so that the fuzzy-equals implementation doesn't matter. I should have a fix up shortly. Thanks for all your help!
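At these magnitudes the absolute and relative flavors of a fuzzy-equals test behave very differently, which is why the choice of implementation matters. A toy illustration (not the actual Misc code; the 1e-5 constant is only a stand-in for the default tolerance):

```java
// Comparing absolute vs. relative fuzzy-equals on values of the magnitudes
// discussed in this thread. The 1e-5 constant is a stand-in for the default
// tolerance; neither method is the actual Misc implementation.
public class FuzzyEqualsSketch {
  static final double MAX_DIFF = 1e-5;

  // Absolute-difference comparison.
  static boolean nearlyEqualsAbs(double a, double b) {
    return Math.abs(a - b) <= MAX_DIFF;
  }

  // Relative-difference comparison, scaled by the larger magnitude.
  static boolean nearlyEqualsRel(double a, double b) {
    return Math.abs(a - b) <= MAX_DIFF * Math.max(Math.abs(a), Math.abs(b));
  }

  public static void main(String[] args) {
    double fill = 1e-12; // unpacked fill value, ~0
    double data = 1e-8;  // a typical unpacked data value
    System.out.println("abs: " + nearlyEqualsAbs(data, fill)); // true: data looks like fill
    System.out.println("rel: " + nearlyEqualsRel(data, fill)); // false: hugely different in relative terms
  }
}
```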

@rschmunk
Contributor Author

@haileyajohnson, I thought about trying to hack up a test of modifying ConvertMissing to allow for specifying a more flexible relative difference in its fuzzy-equals test. But I took a second look at how many parameters there already are in the constructor and said, "Nah."

@rschmunk
Contributor Author

I downloaded a fresh snapshot and built it into Panoply. It plots the sample data as expected, so it looks like #1315 was what was needed.
