You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a case where data is clustered so that each file has a different value for the (integer) column foo I want to filter on. If I query with a simple filter where foo = 1, the file skipping works OK and only one file will be scanned. However, if I query for multiple values with an OR filter where foo = 1 OR foo = 2, this causes a full scan of all files. This seems to affect OR only, as where foo > 9 and foo < 12 also skips files. Here's a reproducible example, requires polars for writing the table:
import duckdb
import polars as pl
for file in range(50):
df = pl.DataFrame(
[
pl.Series("foo", [file], dtype=pl.Int64),
]
)
df.write_delta("test_table4", mode="append")
duckdb.execute("force install delta from core_nightly; UPDATE EXTENSIONS; load delta;").fetchall()
duckdb.execute("ATTACH './test_table4' AS my_delta_table (TYPE delta)")
# this scans only a subset of files as expected
res = duckdb.execute("""
explain analyze SELECT COUNT(*) from my_delta_table
where foo = 10
""").fetchall()
print(res[0][1])
# this scans all files
res = duckdb.execute("""
explain analyze SELECT COUNT(*) from my_delta_table
where foo = 10 or foo = 11
""").fetchall()
print(res[0][1])
# this skips files
res = duckdb.execute("""
explain analyze SELECT COUNT(*) from my_delta_table
where foo > 9 and foo < 12
""").fetchall()
print(res[0][1])
The text was updated successfully, but these errors were encountered:
Thanks for reporting @eeroel This is expected for now. The delta-kernel-rs FFI does not currently implement pushing down OR filters. Once it does I'll make sure to fix it!
Hi,
I have a case where data is clustered so that each file has a different value for the (integer) column
foo
I want to filter on. If I query with a simple filterwhere foo = 1
, the file skipping works OK and only one file will be scanned. However, if I query for multiple values with an OR filterwhere foo = 1 OR foo = 2
, this causes a full scan of all files. This seems to affect OR only, aswhere foo > 9 and foo < 12
also skips files. Here's a reproducible example, requirespolars
for writing the table:The text was updated successfully, but these errors were encountered: