/vsicurl/: fix to allow reading partitioned Parquet datasets from a public Azure container using /vsicurl/ #11310

Open
wants to merge 1 commit into
base: master

@rouault (Member) commented Nov 20, 2024

Fixes #11309

@rouault added the "backport release/3.10" label (Backport to release/3.10 branch) Nov 20, 2024
@rouault added the "funded through GSP" label (Work funded through the GDAL Sponsorship Program) Nov 20, 2024
@mdsumner (Contributor)

I don't know if this is related, but I tried this fix branch because I don't think partitioned Parquet worked for me before. So why does this /vsis3/ form work

ogrinfo --config AWS_S3_ENDPOINT projects.pawsey.org.au --config AWS_VIRTUAL_HOSTING NO --config AWS_NO_SIGN_REQUEST YES PARQUET:/vsis3/vzarr/oisst-avhrr-v02r01.parquet

INFO: Open of `PARQUET:/vsis3/vzarr/oisst-avhrr-v02r01.parquet'
      using driver `Parquet' successful.
1: oisst-avhrr-v02r01 (None)

but not the /vsicurl/ form?

ogrinfo PARQUET:/vsicurl/https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet

(Is it perhaps a setting on the bucket for raw-URL use?)
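
For reference, the two commands above should address the same object when the endpoint uses path-style addressing (AWS_VIRTUAL_HOSTING NO). A minimal Python sketch of that mapping, assuming HTTPS and an unsigned public bucket; `vsis3_to_vsicurl` is a hypothetical helper written for illustration, not a GDAL function:

```python
def vsis3_to_vsicurl(vsis3_path, endpoint, virtual_hosting=False):
    """Map a /vsis3/ path to the /vsicurl/ path for the same object.

    Hypothetical helper for illustration only; assumes HTTPS and an
    unsigned (public) bucket.
    """
    prefix = "/vsis3/"
    if not vsis3_path.startswith(prefix):
        raise ValueError("expected a path starting with /vsis3/")
    # The first path segment is the bucket, the rest is the object key.
    bucket, _, key = vsis3_path[len(prefix):].partition("/")
    if virtual_hosting:
        # Virtual-hosted style: the bucket name becomes part of the host name.
        url = f"https://{bucket}.{endpoint}/{key}"
    else:
        # Path style (AWS_VIRTUAL_HOSTING NO): the bucket is the first path segment.
        url = f"https://{endpoint}/{bucket}/{key}"
    return "/vsicurl/" + url


print(vsis3_to_vsicurl("/vsis3/vzarr/oisst-avhrr-v02r01.parquet",
                       "projects.pawsey.org.au"))
```

The printed path is the one passed to the /vsicurl/ command, so the difference in behaviour is not a difference in the URL being requested.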

@rouault (Member, Author) commented Nov 20, 2024

> but not the /vsicurl/ form?

Yes, same reason. But in the case of https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet, there's nothing in the HTTP response headers that indicates it is an AWS directory (just a hint that it is a non-existent resource under an AWS bucket)...

$ curl -v -X HEAD https://projects.pawsey.org.au/vzarr/oisst-avhrr-v02r01.parquet
[....]
> HEAD /vzarr/oisst-avhrr-v02r01.parquet HTTP/1.1
> Host: projects.pawsey.org.au
> User-Agent: curl/7.68.0
> Accept: */*
> 
[....]
< HTTP/1.1 404 Not Found
< content-length: 218
< x-amz-request-id: tx000008893db01a3d16a91-00673dfb58-7d05beb-default
< accept-ranges: bytes
< content-type: application/xml
< date: Wed, 20 Nov 2024 15:08:08 GMT
[....]
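
To make that distinction concrete, here is a minimal Python sketch that classifies a HEAD response from its status code and headers. It only illustrates the observation above and is not GDAL's actual detection logic; `classify_head_response` and its return labels are hypothetical.

```python
def classify_head_response(status, headers):
    """Classify a HEAD response from its status code and headers.

    Hypothetical heuristic: a 404 carrying x-amz-* headers hints at a
    missing key under an S3-compatible bucket, but says nothing about
    whether that key is a directory prefix.
    """
    # HTTP header names are case-insensitive, so normalise before matching.
    names = {name.lower() for name in headers}
    s3_like = any(name.startswith("x-amz-") for name in names)
    if status == 200:
        return "object"
    if status == 404 and s3_like:
        return "missing-key-in-s3-like-bucket"
    if status == 404:
        return "missing"
    return "unknown"


# Response headers copied from the curl output above.
headers = {
    "content-length": "218",
    "x-amz-request-id": "tx000008893db01a3d16a91-00673dfb58-7d05beb-default",
    "accept-ranges": "bytes",
    "content-type": "application/xml",
    "date": "Wed, 20 Nov 2024 15:08:08 GMT",
}
print(classify_head_response(404, headers))
```

With these headers the sketch can only report a missing key in an S3-like bucket; no header is available to tell it the path is a partitioned dataset directory.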

Development

Successfully merging this pull request may close these issues.

GeoParquet fails to reads hive partioned data from Azure