Inconsistent reads from iceberg table #400

### What happens?

When reading an Iceberg table through a direct metadata file reference (`iceberg_scan` on a `metadata.json` path), queries return inconsistent results.

```
-- The value appears in the distinct list
SELECT DISTINCT dv FROM iceberg_scan('s3://.../00292-....metadata.json') WHERE runid = '11b...';
-- Result: Includes 'e70...'

-- But filtering directly on that value returns nothing
SELECT COUNT(*) FROM iceberg_scan('s3://.../00292-....metadata.json') WHERE dv = 'e70...';
-- Result: 0 rows

-- The more specific combination also fails
SELECT COUNT(*) FROM iceberg_scan('s3://.../00292-....metadata.json') WHERE runid = '11b...' AND dv = 'e70...';
-- Result: 0 rows
```

**Physical files exist in S3**
```
aws s3 ls "s3://.../"

# Result: 3 parquet files from multiple updates
# 2025-07-31 12:14:32  7954805  00000-0-a61....parquet
# 2025-07-31 12:30:30  7949663  00000-45-27a....parquet  
# 2025-07-31 21:29:26  7957154  00000-0-9da....parquet
```

Both `dv` and `runid` are partition keys, so I suspect the DuckDB iceberg extension can discover the partitions regardless of the metadata state but somehow cannot find the most recently written Parquet file. Athena, which points to the same table, returned consistent results. This indicates that the metadata contained enough information for Athena to read the latest data, but the DuckDB iceberg extension did not interpret it correctly.

Rewriting the data in the same partition fixed the problem. The partition now has a fourth Parquet file, and DuckDB reads it consistently.

I've also tried `FORCE INSTALL iceberg FROM core_nightly;` and reproduced the issue.
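For anyone who wants to check what the metadata file itself points to, here is a minimal sketch that reads an Iceberg `metadata.json` and reports the snapshot referenced by `current-snapshot-id`. The field names (`current-snapshot-id`, `snapshots`, `snapshot-id`, `timestamp-ms`, `manifest-list`, `summary`) come from the Iceberg table spec; the function name `describe_current_snapshot` and the sample values below are hypothetical and are not taken from the affected table.

```python
import json


def describe_current_snapshot(metadata: dict) -> dict:
    """Summarize the snapshot that `current-snapshot-id` points to."""
    current_id = metadata.get("current-snapshot-id")
    snapshots = metadata.get("snapshots", [])

    # Find the snapshot entry whose id matches the current pointer.
    current = next((s for s in snapshots if s["snapshot-id"] == current_id), None)
    if current is None:
        raise ValueError(f"current-snapshot-id {current_id!r} not found in snapshots")

    # The newest snapshot by commit timestamp, to compare against the pointer.
    newest = max(snapshots, key=lambda s: s["timestamp-ms"])

    return {
        "snapshot_id": current["snapshot-id"],
        "manifest_list": current.get("manifest-list"),
        "operation": current.get("summary", {}).get("operation"),
        "snapshot_count": len(snapshots),
        "is_newest": current["snapshot-id"] == newest["snapshot-id"],
    }


# Hypothetical metadata for illustration only (not the real table).
sample = {
    "current-snapshot-id": 2,
    "snapshots": [
        {
            "snapshot-id": 1,
            "timestamp-ms": 1000,
            "manifest-list": "s3://example-bucket/metadata/snap-1.avro",
            "summary": {"operation": "append"},
        },
        {
            "snapshot-id": 2,
            "timestamp-ms": 2000,
            "manifest-list": "s3://example-bucket/metadata/snap-2.avro",
            "summary": {"operation": "overwrite"},
        },
    ],
}

info = describe_current_snapshot(sample)
print(json.dumps(info, indent=2))
```

To run this against a real table, download the metadata file locally and pass the result of `json.load` on it instead of `sample`. If `is_newest` is false, or the manifest list differs from the one Athena resolves, that would narrow down where the two readers diverge.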

### To Reproduce

This happened after a large data backfill, so I am not sure what caused it. I suspect that one of the processes writing to the table was killed before its write completed.

### OS:

macOS Sequoia 15.5

### DuckDB Version:

v1.3.1 (Ossivalis)

### DuckDB Client:

Both Python and CLI

### Hardware:

Both AMD-based EC2 and Apple M3 Pro

### Full Name:

Ilkay Benian

### Affiliation:

Foursquare Labs

### What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

### Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

### Did you include all code required to reproduce the issue?

- [ ] Yes, I have

### Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

- [ ] Yes, I have
