iceberg_scan() is slower than read_parquet() if multithreaded #419

@alippai

Description

What happens?

Reading the same single file, iceberg_scan() is slower than read_parquet().

The file is about 4 GB with ~35 row groups.
A simple select * from iceberg_scan():

Run Time (s): real 8.250 user 4.033364 sys 4.176462

The same parquet file:

Run Time (s): real 0.918 user 6.176052 sys 12.617458

Only 40 rows are shown as usual; the CLI output is identical. Maybe it's as simple as not skipping row groups, i.e. not pushing down the limit+offset when displaying the sample.
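If the limit really isn't reaching the scan, it should be visible in the query plan. A minimal way to check (paths are placeholders, not the actual data from this report):

```sql
-- Does the LIMIT get pushed into the iceberg scan?
EXPLAIN ANALYZE
SELECT * FROM iceberg_scan('path/to/iceberg_table') LIMIT 40;

-- Compare against the plan for the raw parquet file:
EXPLAIN ANALYZE
SELECT * FROM read_parquet('path/to/file.parquet') LIMIT 40;
```

If the parquet plan shows the limit applied at the scan while the iceberg plan reads all row groups first, that would support the hypothesis above.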

To Reproduce

select * from iceberg_scan('an iceberg table with the same single parquet')
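A fuller repro sketch in the DuckDB CLI, assuming the Iceberg table's single data file lives under its `data/` directory (hypothetical paths; substitute the real locations):

```sql
-- Enable the iceberg extension and per-query wall-clock timing.
INSTALL iceberg;
LOAD iceberg;
.timer on

-- Scan via the Iceberg metadata:
SELECT * FROM iceberg_scan('path/to/iceberg_table');

-- Scan the same parquet file directly:
SELECT * FROM read_parquet('path/to/iceberg_table/data/*.parquet');
```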

OS:

linux

DuckDB Version:

1.3.2

DuckDB Client:

CLI

Hardware:

No response

Full Name:

Adam Lippai

Affiliation:

N/A

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have
