What happens?
Summary
When using DuckLake with a Quack-backed metadata catalog, ducklake_add_data_files(...) appears to send the parquet_full_metadata(...) inspection query to the remote Quack catalog backend.
This differs from the Postgres metadata catalog path, where the Parquet inspection query is executed locally in the DuckDB process, and only metadata writes are sent to Postgres.
This means a remote Quack metadata service must also have access to the data files and object-store credentials, even though the DuckDB client already has those credentials.
Why this matters
For serverless Quack metadata backends, this creates an unexpected requirement:
- The DuckDB client has the TYPE r2 / TYPE s3 secret needed to read/write DuckLake data files.
- But ducklake_add_data_files(...) sends parquet_full_metadata('r2://...') through Quack.
- Therefore the remote Quack backend also needs R2/S3 access, or must reimplement Parquet metadata extraction.
This is surprising because the metadata catalog backend should not necessarily need object-store credentials. For example, Postgres does not need to read Parquet files.
Code path observed
In ducklake_add_data_files.cpp, DuckLakeFileProcessor::ReadParquetFullMetadata(...) builds a query containing:
FROM parquet_full_metadata(...)
and calls:
For the base metadata manager path, DuckLakeMetadataManager::Query(...) eventually executes the query locally through transaction.ExecuteRaw(...).
For Postgres, PostgresMetadataManager::Query(...) falls back to the base implementation, so parquet_full_metadata(...) runs locally in DuckDB. Postgres only overrides Execute(...) for metadata writes via postgres_execute(...).
For Quack, QuackMetadataManager::Query(...) overrides this behavior and wraps the query in:
CALL system.main.quack_query_by_name(...)
As a result, parquet_full_metadata(...) is executed by the remote Quack endpoint instead of the local DuckDB client.
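The dispatch difference described above can be modeled in a few lines. This is only a sketch under stated assumptions: MetadataManager, PostgresLikeManager, QuackLikeManager, QueryRoute, and InspectParquet are hypothetical stand-ins for the classes named in this issue, not the actual DuckLake sources, and the real wrapping in quack_query_by_name(...) is omitted.

```cpp
#include <string>

// Where a query ends up being executed in this simplified model.
enum class Target { Local, Remote };

struct QueryRoute {
    Target target;   // which process runs the statement
    std::string sql; // the statement handed to that process
};

// Hypothetical stand-in for the base metadata manager.
class MetadataManager {
public:
    virtual ~MetadataManager() = default;
    // Base behaviour: the query runs in the local DuckDB process.
    virtual QueryRoute Query(const std::string &sql) const {
        return {Target::Local, sql};
    }
};

// Models the Postgres case: Query(...) is not overridden, so the base
// (local) implementation is used.
class PostgresLikeManager : public MetadataManager {};

// Models the Quack case: Query(...) is overridden and every statement is
// forwarded to the remote endpoint. The real code wraps the statement in
// CALL system.main.quack_query_by_name(...); that wrapping is left out here.
class QuackLikeManager : public MetadataManager {
public:
    QueryRoute Query(const std::string &sql) const override {
        return {Target::Remote, sql};
    }
};

// Hypothetical helper mirroring the role of ReadParquetFullMetadata(...):
// it builds the inspection query and hands it to the attached manager.
QueryRoute InspectParquet(const MetadataManager &manager, const std::string &path) {
    return manager.Query("SELECT * FROM parquet_full_metadata('" + path + "')");
}
```

In this model, calling InspectParquet with a PostgresLikeManager yields Target::Local, while the same call with a QuackLikeManager yields Target::Remote, which is the asymmetry this issue reports.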
Expected behavior
ducklake_add_data_files(...) should inspect Parquet files in the DuckDB client process, where the relevant filesystem extensions and secrets already exist.
The Quack metadata catalog should only receive the resulting metadata writes/reads that truly belong to the metadata catalog.
Actual behavior
With a Quack metadata catalog, parquet_full_metadata(...) is sent to the remote Quack backend. A remote backend that only implements the metadata catalog SQL cannot support ducklake_add_data_files(...) unless it also implements Parquet metadata extraction and has access to the same data files.
Reproduction shape
Using a Quack-backed DuckLake catalog:
LOAD httpfs;
LOAD quack;
LOAD ducklake;

CREATE SECRET (
    TYPE quack,
    TOKEN '...'
);

CREATE SECRET lake_r2 (
    TYPE r2,
    KEY_ID '...',
    SECRET '...',
    ACCOUNT_ID '...',
    SCOPE 'r2://bucket/lake/'
);

ATTACH 'ducklake:quack:<host>:443' AS lake (
    DATA_PATH 'r2://bucket/lake/'
);

CALL ducklake_add_data_files('lake', 'some_table', 'r2://bucket/path/file.parquet');
The remote Quack service receives a query involving parquet_full_metadata(...).
Suggested direction
One possible fix would be to avoid routing client-local file inspection table functions through QuackMetadataManager::Query(...).
For example, DuckLake could split ducklake_add_data_files(...) into:
- Local DuckDB phase:
  - run parquet_full_metadata(...)
  - validate schema/types/partition info
  - build DuckLakeDataFile metadata
- Metadata catalog phase:
  - write the resulting DuckLake metadata rows through the metadata manager
Alternatively, the Quack metadata manager could distinguish metadata-catalog SQL from client-local helper queries and execute the latter locally.
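To make the two-phase split more concrete, here is a rough sketch of what the control flow might look like. Everything in it is an assumption for illustration: DataFileInfo is a heavily reduced stand-in for DuckLakeDataFile, and AddDataFiles, LocalInspector, and CatalogWriter are hypothetical names. The executors are injected as callbacks so the sketch does not depend on DuckDB itself.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, reduced shape of the per-file metadata; the real
// DuckLakeDataFile carries considerably more information.
struct DataFileInfo {
    std::string path;
    long long row_count;
};

// Runs in the DuckDB client, where filesystem extensions and secrets exist.
using LocalInspector = std::function<long long(const std::string &path)>;
// Hands one finished metadata record to the metadata catalog backend.
using CatalogWriter = std::function<void(const DataFileInfo &file)>;

std::vector<DataFileInfo> AddDataFiles(const std::vector<std::string> &paths,
                                       const LocalInspector &inspect_locally,
                                       const CatalogWriter &write_to_catalog) {
    // Phase 1 (local): inspect every file before touching the catalog, so
    // that no file-reading query is ever routed to the remote backend.
    std::vector<DataFileInfo> files;
    for (const auto &path : paths) {
        files.push_back({path, inspect_locally(path)});
    }
    // Phase 2 (catalog): send only the resulting metadata rows.
    for (const auto &file : files) {
        write_to_catalog(file);
    }
    return files;
}
```

With this shape, the catalog writer only ever sees already-extracted metadata, so a remote backend would not need object-store credentials or its own parquet_full_metadata(...) implementation.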
Impact
This would make Quack-backed DuckLake behavior align better with Postgres-backed DuckLake behavior and avoid requiring remote Quack metadata services to have data-file credentials or to reimplement DuckDB table functions such as parquet_full_metadata(...).
To Reproduce
See above
OS:
macOS
DuckDB Version:
1.5.2
DuckLake Version:
1.0
DuckDB Client:
CLI
Hardware:
No response
Full Name:
TobiLG
Affiliation:
None
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?