Although DuckDB has implemented SEMI and ANTI joins as explicit join types, they're still just WHERE clauses behind the scenes; in particular, the columns from the table on the right of the SEMI/ANTI join are not available in the context of the SELECT clause.
However, SQLMesh treats these joins like "normal" joins and renders a SELECT * statement with the columns from the right table, which is incorrect.
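To make the "just WHERE clauses" point concrete, here is a minimal sketch that writes the SEMI and ANTI joins as EXISTS / NOT EXISTS filters over the same rows used in the MWE. It runs against SQLite through Python's standard-library sqlite3 module only so that it is self-contained; that engine swap and the table setup are assumptions for illustration, not what DuckDB or SQLMesh do internally. The result columns come from the left table only.

```python
import sqlite3

# Same rows as the issue's l and r CTEs.
# Assumption: SQLite stands in for DuckDB so the sketch needs no extra packages.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table l (id integer, val_l text);
    create table r (id integer, val_r text);
    insert into l values (1, 'a'), (2, 'b');
    insert into r values (1, 'X');
""")

# SEMI join written as a filter: keep rows of l that have a match in r.
semi = conn.execute(
    "select * from l where exists (select 1 from r where r.id = l.id)"
)
semi_columns = [d[0] for d in semi.description]
semi_rows = semi.fetchall()

# ANTI join written as a filter: keep rows of l that have no match in r.
anti = conn.execute(
    "select * from l where not exists (select 1 from r where r.id = l.id)"
)
anti_columns = [d[0] for d in anti.description]
anti_rows = anti.fetchall()

print(semi_columns, semi_rows)  # ['id', 'val_l'] [(1, 'a')]
print(anti_columns, anti_rows)  # ['id', 'val_l'] [(2, 'b')]
```

In both cases `select *` expands to `id` and `val_l` only, so `val_r` never appears in the output.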
MWE
Here's an example where the column names are incorrectly taken from the right table, so model evaluation fails:
model (name test.model);
with
l as (from values (1, 'a'), (2, 'b') as v(id, val_l)),
r as (from values (1, 'X') as v(id, val_r))
from l semi join r using (id)
;
DuckDB would return the following table for this query:
id  val_l
1   a
However, running sqlmesh evaluate test.model throws the following error:
...
duckdb.duckdb.BinderException: Binder Error: Referenced table "r" not found!
Candidate tables: "l"
LINE 1: ...", "val_r")) SELECT COALESCE("l"."id", "r"."id") AS "id", "l"."val_l" AS "val_...
This is because the rendered version of the model adds columns from the r table into the SELECT clause; the output of sqlmesh render test.model is:
Related: column names that exist in both tables are considered ambiguous when not prefixed with the left table name (which would indeed be the case if these were "normal" joins)
For example, given the model:
model (name test.model);
with
l as (from values (1, 'a'), (2, 'b') as v(id, val)),
r as (from values (1, 'X') as v(id, val))
select id, val
from l semi join r using (id)
;
...then sqlmesh evaluate test.model returns:
2024-12-22 10:20:49,341 - MainThread - sqlmesh.core.renderer - WARNING - Column '"val"' could not be resolved for model '"memory"."test"."model"', the column may not exist or is ambiguous (renderer.py:554)
2024-12-22 10:20:50,043 - MainThread - sqlmesh.core.renderer - WARNING - Column '"val"' could not be resolved for model '"memory"."test"."model"', the column may not exist or is ambiguous (renderer.py:554)
   id val
0   1   a
Summary
DuckDB has introduced explicit support for SEMI and ANTI joins by adding the SEMI and ANTI join types. In the output of sqlmesh render test.model, the CTEs are fine, but the SELECT part should contain only the columns of the left table l. The ANTI case is analogous to the SEMI case.
Environment details
OS: Windows 10
Python: 3.11.5
SQLMesh: 0.141.1
DuckDB: 1.1.3