I was trying to virtualize references and then load some NASA MERRA2 data offered at GESDISC. I noticed that when the dataset is loaded using the references made by virtualizarr, the initial timestep has a `NaT` value, whereas the exact same dataset loaded using references made by kerchunk does not. Here is the data file used: https://data.gesdisc.earthdata.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800103.nc4

Diving deeper, the only difference I could find between the two JSONs is that the fill_value is `null` for kerchunk whereas virtualizarr sets it to `0`:

Kerchunk `time` zarray: `"time/.zarray": "{\"chunks\":[1],\"compressor\":null,\"dtype\":\"<i4\",\"fill_value\":null,\"filters\":[{\"elementsize\":4,\"id\":\"shuffle\"},{\"id\":\"zlib\",\"level\":2}],\"order\":\"C\",\"shape\":[24],\"zarr_format\":2}",`

Virtualizarr --> kerchunk `time` zarray: `"time/.zarray": "{\"shape\":[24],\"chunks\":[1],\"dtype\":\"<i4\",\"fill_value\":0,\"order\":\"C\",\"compressor\":null,\"filters\":[{\"elementsize\":4,\"id\":\"shuffle\"},{\"id\":\"zlib\",\"level\":2}],\"zarr_format\":2}",`

I tried manually changing the virtualizarr-produced JSON's fill_value to `null` and that fixed this issue. So TL;DR: it seems like there is a bug in the default fill_value (potentially datetime specific?). cc @TomNicholas @mpiannucci

Code used to produce the two JSONs and highlighting the issue:
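Separately from the reproduction code, the manual workaround described above (setting the `fill_value` of the `time` array to `null` in the virtualizarr-produced references) could be sketched roughly as follows. This is only an illustration, not the code from the original report: the helper name `null_fill_value` is hypothetical, and it assumes the input is the inner `refs` mapping of a kerchunk-style reference JSON, where each `.zarray` entry is stored as a JSON-encoded string as in the snippets shown.

```python
import json


def null_fill_value(refs: dict, array_name: str) -> dict:
    """Return a copy of kerchunk-style refs with fill_value set to null
    for one array. Hypothetical helper, for illustration only."""
    key = f"{array_name}/.zarray"
    patched = dict(refs)  # shallow copy so the caller's mapping is untouched
    zarray = json.loads(patched[key])  # .zarray metadata is a JSON string
    zarray["fill_value"] = None  # serialized as JSON null
    patched[key] = json.dumps(zarray)
    return patched


# The virtualizarr-produced entry quoted in this issue.
refs = {
    "time/.zarray": json.dumps(
        {
            "shape": [24],
            "chunks": [1],
            "dtype": "<i4",
            "fill_value": 0,
            "order": "C",
            "compressor": None,
            "filters": [
                {"elementsize": 4, "id": "shuffle"},
                {"id": "zlib", "level": 2},
            ],
            "zarr_format": 2,
        }
    )
}

patched = null_fill_value(refs, "time")
print(patched["time/.zarray"])
```

In a full reference file this mapping would typically sit under a top-level `"refs"` key, so the patched mapping would need to be written back there before reloading the dataset.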