next step issue with vzarr #1

@mdsumner

> library(zaro)
> store <- zaro("virtualizarr://https://raw.githubusercontent.com/mdsumner/virtualized/refs/heads/main/remote/ocean_temp_2023.parq")
[zaro] opening VirtualiZarr Parquet reference store: https://raw.githubusercontent.com/mdsumner/virtualized/refs/heads/main/remote/ocean_temp_2023.parq
> meta <- zaro_meta(store)
[zaro] found .zmetadata (Zarr V2 consolidated)
[zaro]   11 arrays: Time, Time_bnds, average_DT, average_T1, average_T2, nv, st_edges_ocean, st_ocean, temp, xt_ocean, yt_ocean
> data <- zaro_read(store, "temp", start = c(0, 0, 0, 0), count = c(1, 1, 1500, 3600), meta = temp)
[zaro] reading 60 chunk(s) for path 'temp' (V2)
Error in dim(values) <- actual_chunk_shape : 
  dims [product 90000] do not match the length of object [37093]
In addition: Warning messages:
1: unknown codec 'zlib', passing through unchanged 
2: unknown codec 'shuffle', passing through unchanged 

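The zlib and shuffle codecs are not recognized. The V2 filters use the names "zlib" and "shuffle", but the codec pipeline probably only knows "gzip", "zstd", etc. Both were passed through unchanged, so the chunk bytes were most likely still compressed when they were reshaped, which would explain a length of 37093 where 90000 was expected. zlib maps to gzip in Arrow (arrow::Codec$create("gzip")), and shuffle is a byte-reordering filter that needs its own implementation. On read the filters run in the reverse of the write order, so for the usual HDF5 layout (shuffle, then zlib on write) the chunk is decompressed first and the bytes are then unshuffled by element size.
The byte_range_read VSI path needs updating from the procedural vsi_open/vsi_seek/vsi_read calls to the VSIFile class or, better yet, a curl fallback for HTTP so this path stays GDAL-free: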

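  # Fetch `length` bytes starting at the 0-based `offset` with an HTTP Range request.
  if (grepl("^https?://", url) && requireNamespace("curl", quietly = TRUE)) {
    # Format the bounds without scientific notation (paste0(1e5) gives "1e+05").
    rng <- sprintf("%.0f-%.0f", offset, offset + length - 1)
    resp <- curl::curl_fetch_memory(url, handle = curl::new_handle(range = rng))
    # 206 Partial Content means the server honoured the range.
    if (resp$status_code == 206L) return(resp$content)
  }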

That, plus the zlib/shuffle codec mapping, is the next step.
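
A rough sketch of what the zlib/shuffle decode step could look like in base R, with no Arrow dependency. Everything here is an assumption for illustration: `unshuffle_bytes()`, `shuffle_bytes()` and `decode_chunk()` are hypothetical helpers (not part of zaro), the chunk is assumed to have been written as shuffle then zlib, and the element size is assumed to be 4 bytes (float32). It also leans on base R's `memDecompress(type = "gzip")` reading zlib-wrapped deflate streams, which is how I understand the R documentation.

```r
# Undo a byte shuffle. The shuffled buffer holds byte 0 of every element,
# then byte 1 of every element, and so on (hypothetical helper).
unshuffle_bytes <- function(x, elementsize) {
  stopifnot(is.raw(x), length(x) %% elementsize == 0)
  n <- length(x) %/% elementsize
  # Build the permutation on integer indices, then apply it to the raw vector.
  idx <- as.vector(t(matrix(seq_along(x), nrow = n, ncol = elementsize)))
  x[idx]
}

# Forward shuffle, only needed here to build a test chunk (hypothetical helper).
shuffle_bytes <- function(x, elementsize) {
  stopifnot(is.raw(x), length(x) %% elementsize == 0)
  n <- length(x) %/% elementsize
  idx <- as.vector(t(matrix(seq_along(x), nrow = elementsize, ncol = n)))
  x[idx]
}

# Decode a chunk assumed to be written as shuffle -> zlib:
# inflate first, then unshuffle (reverse of the write order).
decode_chunk <- function(bytes, elementsize) {
  inflated <- memDecompress(bytes, type = "gzip")
  unshuffle_bytes(inflated, elementsize)
}

# Round trip on a tiny float32 "chunk" (values chosen to be exact in float32).
values  <- c(1.5, -2.25, 1e6, 0)
raw_in  <- writeBin(values, raw(), size = 4L, endian = "little")
encoded <- memCompress(shuffle_bytes(raw_in, 4L), type = "gzip")
decoded <- readBin(decode_chunk(encoded, 4L), what = "numeric",
                   n = length(values), size = 4L, endian = "little")
```

In the codec pipeline this would amount to mapping the V2 name "zlib" to the existing gzip path and "shuffle" to something like `unshuffle_bytes()`, with the element size taken from the array dtype in the metadata. The sketch rejects buffers whose length is not a multiple of the element size rather than handling leftover bytes.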
