What happens?
On a DuckLake catalog backed by Postgres, reading a VARCHAR column from a
table whose ducklake_inlined_data_<tid>_<sv> chunk holds more than ~2 GB
of cell bytes crashes with:
INTERNAL Error: Attempted to access index 0 within vector of size 0
duckdb::PostgresMetadataManager::TransformInlinedData+0x794
duckdb::DuckLakeInlinedDataReader::TryInitializeScan+0xa44
duckdb::MultiFileFunction<ParquetMultiFileInfo>::TryInitializeNextBatch+0x738
Other reads (count(*), non-VARCHAR columns, the same VARCHAR column on smaller
tables) all succeed. The session is left FATAL'd and must be reconnected.
Reproduces on the linux_arm64 build of the extension. The same script run
against the osx_arm64 build does NOT reproduce.
Related (same error string, different code path): #281.
Related (other Postgres-inlined-table bugs): #1123, #1089, #1155, #1219.
To Reproduce
Start an empty Postgres on localhost:55432:
docker run --rm -d -p 55432:5432 -e POSTGRES_DB=repro -e POSTGRES_USER=repro -e POSTGRES_PASSWORD=repro postgres:17-alpine
Dockerfile:
FROM --platform=linux/arm64 python:3.12-slim
RUN pip install --no-cache-dir 'duckdb==1.5.2' psycopg2-binary
COPY repro.py /repro.py
ENTRYPOINT ["python", "-u", "/repro.py"]
repro.py:
import duckdb, psycopg2, os, shutil
N_ROWS = int(os.environ.get("N_ROWS", "70"))
DSN = "host=host.docker.internal port=55432 dbname=repro user=repro password=repro"
DATA = "/tmp/dldata/"
pg = psycopg2.connect(host="host.docker.internal", port=55432,
                      dbname="repro", user="repro", password="repro")
pg.autocommit = True
pg.cursor().execute("DROP SCHEMA IF EXISTS public CASCADE; CREATE SCHEMA public")
pg.close()
shutil.rmtree(DATA, ignore_errors=True); os.makedirs(DATA, exist_ok=True)
c = duckdb.connect()
c.execute("FORCE INSTALL ducklake; LOAD ducklake; INSTALL postgres; LOAD postgres")
c.execute(f"ATTACH 'ducklake:postgres:{DSN}' AS dl (DATA_PATH '{DATA}')")
c.execute("CREATE TABLE dl.main.t (id INT, c VARCHAR)")
for i in range(N_ROWS):
    c.execute("INSERT INTO dl.main.t SELECT ?, repeat('A', 33554432)", [i])
c.close()
c = duckdb.connect()
c.execute("LOAD ducklake; LOAD postgres")
c.execute(f"ATTACH 'ducklake:postgres:{DSN}' AS dl (DATA_PATH '{DATA}')")
print(c.execute("SELECT length(c) FROM dl.main.t LIMIT 1").fetchone())
Build, then run once below the threshold (PASSES) and once above (CRASHES):
docker build --platform linux/arm64 -t ducklake-bug .
# 60 rows * 32 MiB = ~1.92 GB total -> passes
docker run --rm --platform linux/arm64 -e N_ROWS=60 ducklake-bug
# 70 rows * 32 MiB = ~2.24 GB total -> crashes
docker run --rm --platform linux/arm64 -e N_ROWS=70 ducklake-bug
Output at N_ROWS=60:
(33554432,)
Output at N_ROWS=70:
Traceback (most recent call last):
File "/repro.py", line 28, in <module>
print(c.execute("SELECT length(c) FROM dl.main.t LIMIT 1").fetchone())
duckdb.duckdb.InternalException: INTERNAL Error: Attempted to access index 0 within vector of size 0
Stack Trace:
/root/.duckdb/extensions/v1.5.2/linux_arm64/ducklake.duckdb_extension
PostgresMetadataManager::TransformInlinedData+0x794
DuckLakeInlinedDataReader::TryInitializeScan+0xa44
/usr/local/lib/python3.12/site-packages/_duckdb.cpython-312-aarch64-linux-gnu.so
MultiFileFunction<ParquetMultiFileInfo>::TryInitializeNextBatch+0x738
The trigger is the cumulative cell-byte size of the inlined chunk crossing
~2 GB, not any specific row content or the absolute number of rows. We first
hit it on a real catalog: 216 rows of pipeline state, at 2.04 GB cumulative
octet_length(state), started crashing, while 215 rows at 2.01 GB still read
cleanly.
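To check how close an existing catalog is to the threshold, the cumulative size can be measured directly in Postgres. The following is a minimal sketch, not something taken from DuckLake itself: the helper name `inlined_size_query` is hypothetical, the table id and schema version must be filled in for the catalog at hand, and it assumes the inlined table keeps the user column under its original name (as `octet_length(state)` above suggests).

```python
import re

# Only plain identifiers are accepted, since the name is spliced into SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def inlined_size_query(table_id: int, schema_version: int, column: str) -> str:
    """Build a Postgres query that reports the row count and cumulative
    cell bytes of one ducklake_inlined_data_<tid>_<sv> table.

    Hypothetical helper: table_id and schema_version are whatever values
    appear in the catalog's inlined-data table name.
    """
    if not _IDENT.match(column):
        raise ValueError(f"unexpected column name: {column!r}")
    table = f"ducklake_inlined_data_{int(table_id)}_{int(schema_version)}"
    return (
        "SELECT count(*) AS n_rows, "
        f'sum(octet_length("{column}")) AS total_bytes '
        f'FROM "{table}"'
    )


# Example with placeholder ids; run the printed query with psql or psycopg2.
print(inlined_size_query(1, 1, "c"))
```

Running the resulting query against the Postgres catalog (not through DuckDB) avoids the crashing read path, so it can be used on a catalog that already fails.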
The same script run against the osx_arm64 build of the extension (same
extension hash) returns 33554432 cleanly at N_ROWS=70.
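For anyone wanting to narrow the threshold between the passing and crashing runs, a bisection over N_ROWS is enough. This is only a sketch under stated assumptions: `first_failing` and `total_bytes` are hypothetical helpers, the `fails` callback is supplied by the caller (for example, a wrapper that runs the container with a given N_ROWS and checks the exit status), and it assumes that once a row count crashes, every larger count crashes too.

```python
from typing import Callable

CELL_BYTES = 33554432  # 32 MiB per row, as inserted by repro.py


def total_bytes(n_rows: int, cell_bytes: int = CELL_BYTES) -> int:
    """Cumulative cell bytes written for n_rows rows of the repro."""
    return n_rows * cell_bytes


def first_failing(passing: int, failing: int,
                  fails: Callable[[int], bool]) -> int:
    """Return the smallest n in (passing, failing] for which fails(n) is true.

    Assumes fails(passing) is false, fails(failing) is true, and that
    failures are monotonic in n.
    """
    while failing - passing > 1:
        mid = (passing + failing) // 2
        if fails(mid):
            failing = mid
        else:
            passing = mid
    return failing


# Totals for the two runs in this report.
for n in (60, 70):
    print(n, total_bytes(n))
```

Starting from `first_failing(60, 70, fails)`, at most four repro runs are needed to find the first crashing row count.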
OS:
Linux aarch64 (linux_arm64). Reproduced inside python:3.12-slim docker container with --platform linux/arm64. Does NOT reproduce on osx_arm64.
DuckDB Version:
1.5.2 (primary). Also reproduced on 1.5.3.
DuckLake Version:
v1.0 stable (extension hash 415a9eb on DuckDB 1.5.2, e6a3bd0 on DuckDB 1.5.3). Also reproduced on core_nightly (5e49991 on DuckDB 1.5.2, 93cc490 on DuckDB 1.5.3). 4/4 combinations crash.
DuckDB Client:
Python
Hardware:
Apple M-series host (arm64) running Docker Desktop's linux/arm64 VM. Postgres 17 catalog on the same host. Bug is not performance-related.
Full Name:
Giordano Mattoni
Affiliation:
Tensor Energy
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?