test(tpcds): add queries 28-63 #9736

Merged: 18 commits, merged Aug 1, 2024
.codespellrc: 2 changes (1 addition & 1 deletion)
@@ -1,6 +1,6 @@
 [codespell]
 # local codespell matches `./docs`, pre-commit codespell matches `docs`
-skip = *.lock,.direnv,.git,./docs/_freeze,./docs/_output/**,./docs/_inv/**,docs/_freeze/**,*.svg,*.css,*.html,*.js,ibis/backends/tests/tpc/queries/duckdb/ds/44.sql
+skip = *.lock,.direnv,.git,./docs/_freeze,./docs/_output/**,./docs/_inv/**,docs/_freeze/**,*.svg,*.css,*.html,*.js,ibis/backends/tests/tpc/queries/duckdb/ds/*.sql
 ignore-regex = \b(i[if]f|I[IF]F|AFE|alls)\b
 builtin = clear,rare,names
 ignore-words-list = tim,notin,ang
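The `skip` entry changes from the single file `44.sql` to the glob `*.sql`, so every DuckDB TPC-DS query file is excluded from spell-checking rather than just one. The sketch below checks which paths each pattern covers. It assumes codespell matches each `skip` entry as a shell-style glob, with Python's `fnmatch` standing in for its matcher. Apart from `44.sql`, the file paths are hypothetical examples.

```python
from fnmatch import fnmatch

# The two versions of the last `skip` entry from the diff above.
OLD = "ibis/backends/tests/tpc/queries/duckdb/ds/44.sql"
NEW = "ibis/backends/tests/tpc/queries/duckdb/ds/*.sql"

# Hypothetical paths; only 44.sql is named in the original config.
paths = [
    "ibis/backends/tests/tpc/queries/duckdb/ds/44.sql",
    "ibis/backends/tests/tpc/queries/duckdb/ds/28.sql",
    "ibis/backends/tests/tpc/queries/duckdb/h/01.sql",
]


def skipped(path: str, pattern: str) -> bool:
    """Return True if `path` matches the shell-style glob `pattern`."""
    return fnmatch(path, pattern)


old_hits = [p for p in paths if skipped(p, OLD)]
new_hits = [p for p in paths if skipped(p, NEW)]
```

With the old entry only `44.sql` is skipped. The new glob also covers the other files in `ds/` but not files in a sibling directory. Note that `fnmatch`'s `*` also matches `/`, so nested files under `ds/` would be skipped too.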
ibis/backends/tests/tpc/README.md: 25 changes (25 additions & 0 deletions)
@@ -0,0 +1,25 @@
# TPC queries with Ibis

These tests check the correctness of backends that are able to run some of
the TPC-H and TPC-DS queries.

The SQL text queries are assumed to be correct and, when transpiled correctly,
to produce the same results as the Ibis expression written for the same query.

**This is the assertion being made in these tests.**

The ground truth SQL text is taken from
[DuckDB](https://github.com/duckdb/duckdb/tree/main/extension/tpcds/dsdgen/queries)
and transpiled using SQLGlot to the dialect of whatever backend is under test.

Some queries have been altered from the upstream DuckDB repo to use static
column names and to explicitly cast date strings to dates, so that pedantic
engines like Trino will accept them. These alterations do not change the
computed results of the queries.

ClickHouse is a bit odd: a query that contains a cross join filtered by an
`OR` whose operands all share a common join condition will effectively never
finish. This is probably a bug in ClickHouse.

For those queries, the ClickHouse versions have been minimally rewritten to
pass by factoring the common join condition out of the `OR` into a single
`AND` operand.
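The ClickHouse rewrite relies on the identity that `(A AND B) OR (A AND C)` is equivalent to `A AND (B OR C)`. The sketch below filters a cross join of two small in-memory tables with both forms of the predicate and collects the surviving rows. The tables, column names, and filter values are hypothetical and are not taken from any TPC-DS query.

```python
from itertools import product

# Hypothetical tables, represented as lists of dicts.
sales = [
    {"item_id": 1, "qty": 5},
    {"item_id": 2, "qty": 50},
    {"item_id": 3, "qty": 150},
]
items = [
    {"id": 1, "category": "Books"},
    {"id": 2, "category": "Music"},
    {"id": 3, "category": "Books"},
]


def join_key(s, i):
    # The join condition shared by every operand of the OR.
    return s["item_id"] == i["id"]


def original(s, i):
    # Common condition repeated inside each OR operand.
    return (join_key(s, i) and i["category"] == "Books" and s["qty"] < 10) or (
        join_key(s, i) and i["category"] == "Music" and s["qty"] > 20
    )


def rewritten(s, i):
    # Common condition factored out into a single AND operand.
    return join_key(s, i) and (
        (i["category"] == "Books" and s["qty"] < 10)
        or (i["category"] == "Music" and s["qty"] > 20)
    )


pairs = list(product(sales, items))  # the cross join
before = [(s["item_id"], i["category"]) for s, i in pairs if original(s, i)]
after = [(s["item_id"], i["category"]) for s, i in pairs if rewritten(s, i)]
```

Both predicates keep exactly the rows `(1, "Books")` and `(2, "Music")`. This is why the rewrite can change how ClickHouse executes the query without changing its result.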
ibis/backends/tests/tpc/conftest.py: 25 changes (21 additions & 4 deletions)
@@ -52,9 +52,18 @@ def tpc_test(suite_name: Literal["h", "ds"], *, result_is_empty=False):
     def inner(test: Callable[..., ir.Table]):
         name = f"tpc{suite_name}"

-        @getattr(pytest.mark, name)
+        # so that clickhouse doesn't run forever when we hit one of its weird cross
+        # join performance black holes
+        #
+        # trino can sometimes take a while as well, especially in CI
+        #
+        # func_only=True doesn't include the fixture setup time in the duration
+        # of the test run, which is important since backends can take a hugely
+        # variable amount of time to load all the TPC-$WHATEVER tables.
+        @pytest.mark.timeout(60, func_only=True)
         @pytest.mark.usefixtures("backend")
         @pytest.mark.xdist_group(name)
+        @getattr(pytest.mark, name)
         @functools.wraps(test)
         def wrapper(*args, backend, **kwargs):
             backend_name = backend.name()
@@ -94,17 +103,25 @@ def wrapper(*args, backend, **kwargs):

             assert result_expr._find_backend(use_default=False) is backend.connection
             result = backend.connection.to_pandas(result_expr)
-            assert (result_is_empty and result.empty) or not result.empty
+
+            assert (result_is_empty and result.empty) or (
+                not result_is_empty and not result.empty
+            )

             expected = expected_expr.to_pandas()

             assert len(expected.columns) == len(result.columns)
-            assert all(r in e.lower() for r, e in zip(result.columns, expected.columns))
+            assert all(
+                r.lower() in e.lower() for r, e in zip(result.columns, expected.columns)
+            )

             expected.columns = result.columns

             expected = PandasData.convert_table(expected, result_expr.schema())
-            assert (result_is_empty and expected.empty) or not expected.empty
+
+            assert (result_is_empty and expected.empty) or (
+                not result_is_empty and not expected.empty
+            )

             assert len(expected) == len(result)
             assert result.columns.tolist() == expected.columns.tolist()
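This hunk tightens two of the result checks. The old emptiness check, `(result_is_empty and result.empty) or not result.empty`, still passes when a test declared with `result_is_empty=True` returns rows. The new form requires emptiness to match the flag. The column-name check now lowercases both sides, so a backend that returns upper-case column names still matches. The sketch below reproduces both predicates in plain Python, with booleans and lists of strings standing in for the pandas `DataFrame` attributes. The function names and the example column name `i_item_id` are illustrative only.

```python
def old_empty_check(result_is_empty: bool, empty: bool) -> bool:
    # Pre-PR form: passes whenever the frame is non-empty, even if the
    # test declared that the result should be empty.
    return (result_is_empty and empty) or not empty


def new_empty_check(result_is_empty: bool, empty: bool) -> bool:
    # Post-PR form: passes only when emptiness matches the flag,
    # i.e. it behaves like `result_is_empty == empty`.
    return (result_is_empty and empty) or (not result_is_empty and not empty)


def old_columns_match(result_cols, expected_cols) -> bool:
    # Pre-PR form: only the expected names are lowercased.
    return all(r in e.lower() for r, e in zip(result_cols, expected_cols))


def new_columns_match(result_cols, expected_cols) -> bool:
    # Post-PR form: both sides are lowercased before the substring test.
    return all(r.lower() in e.lower() for r, e in zip(result_cols, expected_cols))
```

For example, `old_empty_check(True, False)` is true while `new_empty_check(True, False)` is false. Likewise, `["I_ITEM_ID"]` only matches `["i_item_id"]` under the new column check.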