What happens?
Using the same single file, iceberg_scan() is slower than read_parquet(). The file is about 4GB with ~35 row groups.
A simple select * from iceberg_scan():
Run Time (s): real 8.250 user 4.033364 sys 4.176462
The same parquet file:
Run Time (s): real 0.918 user 6.176052 sys 12.617458
Only 40 rows are shown as usual; the CLI output is identical in both cases. Maybe it's as simple as not skipping row groups, or not pushing down the limit+offset when displaying the sample.
To Reproduce
select * from iceberg_scan('an iceberg table with the same single parquet')
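A minimal reproduction sketch in the DuckDB CLI, assuming the iceberg extension is installed and an Iceberg table backed by a single Parquet data file is available locally (the paths below are placeholders, not the reporter's actual data):

```sql
-- Placeholder paths; substitute your own Iceberg table and its Parquet data file.
INSTALL iceberg;
LOAD iceberg;
.timer on
-- Scan via the Iceberg metadata layer:
SELECT * FROM iceberg_scan('path/to/iceberg_table');
-- Scan the underlying Parquet file directly for comparison:
SELECT * FROM read_parquet('path/to/data.parquet');
```

With .timer on, the CLI prints the real/user/sys run times after each query, which is how the figures above can be compared.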
OS:
linux
DuckDB Version:
1.3.2
DuckDB Client:
CLI
Hardware:
No response
Full Name:
Adam Lippai
Affiliation:
N/A
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
Yes, I have