Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Columns from SEMI and ANTI joins are handled like "normal" joins (DuckDB) #3557

Open
Bilbottom opened this issue Dec 22, 2024 · 3 comments
Open
Labels
Bug Something isn't working

Comments

@Bilbottom
Copy link

Summary

DuckDB has introduced explicit support for SEMI and ANTI joins by adding the SEMI and ANTI join types:

Although DuckDB has implemented these as explicit join types, they're still just WHERE clauses behind the scenes -- in particular, the columns from the table on the right of the SEMI/ANTI join are not available in the context of the SELECT clause.

However, SQLMesh treats these joins like "normal" joins and renders a SELECT * statement with the columns from the right table, which is incorrect.

MWE

Here's an example where the column names are incorrectly taken from the right table, so model evaluation fails:

model (name test.model);
with

l as (from values (1, 'a'), (2, 'b') as v(id, val_l)),
r as (from values (1, 'X') as v(id, val_r))

from l semi join r using (id)
;

DuckDB would return the following table for this query:

id val_l
1 a

However, running sqlmesh evaluate test.model throw the following error:

...

duckdb.duckdb.BinderException: Binder Error: Referenced table "r" not found!
Candidate tables: "l"
LINE 1: ...", "val_r")) SELECT COALESCE("l"."id", "r"."id") AS "id", "l"."val_l" AS "val_...

This is because the rendered version of the model adds columns from the r table into the SELECT clause; the output of sqlmesh render test.model is:

WITH "l" AS (
  SELECT
    "v"."id" AS "id",
    "v"."val_l" AS "val_l"
  FROM (VALUES
    (1, 'a'),
    (2, 'b')) AS "v"("id", "val_l")
), "r" AS (
  SELECT
    "v"."id" AS "id",
    "v"."val_r" AS "val_r"
  FROM (VALUES
    (1, 'X')) AS "v"("id", "val_r")
)
SELECT
  COALESCE("l"."id", "r"."id") AS "id",
  "l"."val_l" AS "val_l",
  "r"."val_r" AS "val_r"
FROM "l" AS "l"
SEMI JOIN "r" AS "r"
  ON "l"."id" = "r"."id"

The CTEs are fine, but the SELECT part should be:

SELECT
  "l"."id" AS "id",
  "l"."val_l" AS "val_l"
FROM "l" AS "l"
SEMI JOIN "r" AS "r"
  ON "l"."id" = "r"."id"

The ANTI case is analogous to the SEMI case.

Environment details

  • OS: Windows 10
  • Python: 3.11.5
  • SQLMesh: 0.141.1
  • DuckDB: 1.1.3
@Bilbottom
Copy link
Author

Bilbottom commented Dec 22, 2024

Related: column names that exist in both tables are considered ambiguous when not prefixed with the left table name (which would indeed be the case if these were "normal" joins)

For example, given the model:

model (name test.model);
with

l as (from values (1, 'a'), (2, 'b') as v(id, val)),
r as (from values (1, 'X') as v(id, val))

select id, val
from l semi join r using (id)
;

...then sqlmesh evaluate test.model returns:

2024-12-22 10:20:49,341 - MainThread - sqlmesh.core.renderer - WARNING - Column '"val"' could not be resolved for model '"memory"."test"."model"', the column may not exist or is ambiguous (renderer.py:554)
2024-12-22 10:20:50,043 - MainThread - sqlmesh.core.renderer - WARNING - Column '"val"' could not be resolved for model '"memory"."test"."model"', the column may not exist or is ambiguous (renderer.py:554)

   id val
0   1   a

@treysp treysp added the Bug Something isn't working label Dec 22, 2024
@treysp
Copy link
Contributor

treysp commented Dec 22, 2024

Hello - thanks for reporting this, looks like we need to add SEMI/ANTI join support to sqlglot. cc @georgesittas

@georgesittas
Copy link
Contributor

I see, yeah these joins are already supported in the parser, but the optimizer doesn't implement duckdb's semantics. I'll take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants